[119152] in cryptography@c2.net mail archive

home help back first fref pref prev next nref lref last post

convergent encryption reconsidered -- salting and key-strengthening

daemon@ATHENA.MIT.EDU (zooko)
Mon Mar 31 11:06:07 2008

In-Reply-To: <6D1C61D6-D9FE-4E9E-BCCA-6B9133AF2058@solarsail.hcs.harvard.edu>
From: zooko <zooko@zooko.com>
Date: Sun, 30 Mar 2008 19:37:15 -0600
To: theory and practice of decentralized computer networks <p2p-hackers@lists.zooko.com>,
 Cryptography <cryptography@metzdowd.com>,
 tahoe-dev@allmydata.org

[This conversation is spanning three mailing lists -- =20
cryptography@metzdowd.com, p2p-hackers@lists.zooko.com, and tahoe-=20
dev@allmydata.org .  Some of the posts have not reached all three of =20
those lists.  I've manually added Jerry Leichter and Ivan Krsti=C4=87 to =
=20
the approved-senders set for p2p-hackers and tahoe-dev, so that =20
further posts by them will appear on those lists.]

> On Mar 30, 2008, at 3:13 PM, Ivan Krsti=C4=87 wrote:
>
> Unless I'm misunderstanding Zooko's writeup, he's worried about an
> attacker going from a partially-known plaintext (e.g. a form bank
> letter) to a completely-known plaintext by repeating the following
> process:
>
> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible,
>     about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash
>     to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search.

That's right.  Your comparison to normal old brute-force/dictionary =20
attack on passwords is a good one, and one that Jim McCoy and Jerry =20
Leichter have alluded to as well.

Think of it like this:

Passwords are susceptible to brute-force and/or dictionary attack.  =20
We can't, in general, prevent attackers from trying guesses at our =20
passwords without also preventing users from using them, so instead =20
we employ various techniques:

  * salts (to break up the space of targets into subspaces, of which =20
at most one can be targeted by a given brute-force attack)
  * key strengthening (to increase by a constant factor the cost of =20
checking a password)
  * rate-limits for on-line tries (i.e., you get only a small fixed =20
number of wrong guesses in a row before you are locked out for a time-=20=

out period)

However, secrets other than passwords are not usually susceptible to =20
such attacks.  You can store your True Name, credit card number, bank =20=

account number, mother's maiden name, and so forth, on the same =20
server as your password, but you don't have to worry about using =20
salts or key strengthening on those latter secrets, because the =20
server doesn't run a service that allows unauthenticated remote =20
people to connect, submit a guess as to their value, and receive =20
confirmation, the way it does for your password.  (In other words, =20
for such data we generally use an extreme form of the third defense, =20
rate-limiting tries -- the number of guesses that an attacker gets is =20=

limited to none!)

Likewise, if you are going to store or transmit those kinds of =20
secrets in encrypted form using a traditional randomly-generated =20
encryption key, then you don't have to worry about brute-force/=20
dictionary attacks because your strong randomly-selected symmetric =20
encryption key defies them.

The Key Point:

  *** Convergent encryption exposes whatever data is put into it to =20
the sorts of attacks that already apply to passwords.


Convergent encryption had been invented, analyzed and used for many =20
years, but to the best of my knowledge the first time that anyone =20
noticed this issue was March 16 of this year (at 3 AM Chicago Time), =20
when Drew Perttula and Brian Warner made that observation.  (Although =20=

to be fair some of the best-known uses of convergent encryption =20
during these years have been sharing publicly-known files with =20
strangers, in which case I suppose it is assumed that the cleartext =20
does not contain secrets.)

Now PBKDF2 is a combination of the first two defenses -- salting and =20
key strengthening.  When you first suggested PBKDF2, I -- and =20
apparently Jerry Leichter -- thought that you were suggesting its =20
salting feature as a solution.  The solution that we've come up with =20
for Tahoe (described in my original note) is much like salting, =20
except that the added value that gets mixed in is secret and =20
unguessable, where I normally think of salt as non-secret.

Now I see that you are also emphasizing the key strengthening feature =20=

of PBKDF2.

"k" denotes symmetric encryption key, "p" denotes plaintext, "c" =20
denotes ciphertext, "s" denotes salt, "E(key, plaintext)" is =20
encryption, "H()" is secure hashing, "H^1000()" is secure hashing a =20
thousand times over, i.e."H(H(H(H(...))))" a thousand times.

Traditional encryption:

k =3D random()
c =3D E(k, p)

Traditional convergent encryption:

k =3D H(p)
c =3D E(k, p)

Tahoe-style convergent encryption with added secret ("s");  "s" can =20
be re-used for any number of files, but it is kept secret from =20
everyone except those with whom you wish to converge storage.

s =3D random()
k =3D H(s, p)
c =3D E(k, p)

PBKDF2 (simplified);  "s" can be re-used but is generally not, and it =20=

is not secret.

s =3D random()
k =3D H^1000(s, password)
c =3D E(k, p)

Now, one could imagine a variant of traditional convergent encryption =20=

which added key strengthening:

k =3D H^1000(p)
c =3D E(k, p)

This would have a performance impact on normal everyday use of Tahoe =20
without, in my current estimation, making a brute-force/dictionary =20
attack infeasible.  Key strengthening allows you to choose an amount =20
of wasted CPU that you are willing to impose on your users during =20
normal use, and multiply the attacker's costs by exactly that =20
amount.  If the attacker has 2^64 computational capacity, and the =20
users are willing to waste 2^10 extra computrons on each file access, =20=

then the attacker's effective capacity is reduced to 2^54.

The trade-off is actually worse than it appears since the attacker is =20=

attacking multiple users at once (in traditional convergent =20
encryption, he is attacking *all* users at once), so he gains an =20
economy of scale, and can profitably invest in specialized tools, =20
even specialized hardware such as a COPACOBANA [1].  At the very =20
least he can profitably devote many CPU cores to churning out new =20
guesses 24/7, while for normal users it is not profitable to allocate =20=

a 24/7 CPU load to strengthening their keys.  The reason for this =20
disparity is that the attacker gets to attack everyone at once for =20
the same cost as attacking only one target, where the defenders have =20
to pay each for his own defense.


Next, one could imagine a variant of Tahoe's convergent encryption =20
with added secret which adds key strengthening:

s =3D random()
k =3D H^1000(s, p)
c =3D E(k, p)

This would likewise be costly to normal users, but moreover it is not =20=

needed because the "s =3D random()" part of the algorithm locks out all =20=

attackers except those with whom s is shared from mounting such an =20
attack at all.

Thank you for your comments on this issue.  If you have further =20
ideas, especially as would be relevant to the Tahoe Least-Authority =20
Filesystem, I would love to hear them.

Regards,

Zooko O'Whielacronx

[1] http://copacobana.org/=

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@metzdowd.com

home help back first fref pref prev next nref lref last post