[119152] in cryptography@c2.net mail archive
convergent encryption reconsidered -- salting and key-strengthening
daemon@ATHENA.MIT.EDU (zooko)
Mon Mar 31 11:06:07 2008
In-Reply-To: <6D1C61D6-D9FE-4E9E-BCCA-6B9133AF2058@solarsail.hcs.harvard.edu>
From: zooko <zooko@zooko.com>
Date: Sun, 30 Mar 2008 19:37:15 -0600
To: theory and practice of decentralized computer networks <p2p-hackers@lists.zooko.com>,
Cryptography <cryptography@metzdowd.com>,
tahoe-dev@allmydata.org
[This conversation is spanning three mailing lists -- =20
cryptography@metzdowd.com, p2p-hackers@lists.zooko.com, and tahoe-=20
dev@allmydata.org . Some of the posts have not reached all three of =20
those lists. I've manually added Jerry Leichter and Ivan Krsti=C4=87 to =
=20
the approved-senders set for p2p-hackers and tahoe-dev, so that =20
further posts by them will appear on those lists.]
> On Mar 30, 2008, at 3:13 PM, Ivan Krsti=C4=87 wrote:
>
> Unless I'm misunderstanding Zooko's writeup, he's worried about an
> attacker going from a partially-known plaintext (e.g. a form bank
> letter) to a completely-known plaintext by repeating the following
> process:
>
> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible,
> about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash
> to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search.
That's right. Your comparison to normal old brute-force/dictionary =20
attack on passwords is a good one, and one that Jim McCoy and Jerry =20
Leichter have alluded to as well.
Think of it like this:
Passwords are susceptible to brute-force and/or dictionary attack. =20
We can't, in general, prevent attackers from trying guesses at our =20
passwords without also preventing users from using them, so instead =20
we employ various techniques:
* salts (to break up the space of targets into subspaces, of which =20
at most one can be targeted by a given brute-force attack)
* key strengthening (to increase by a constant factor the cost of =20
checking a password)
* rate-limits for on-line tries (i.e., you get only a small fixed =20
number of wrong guesses in a row before you are locked out for a time-=20=
out period)
However, secrets other than passwords are not usually susceptible to =20
such attacks. You can store your True Name, credit card number, bank =20=
account number, mother's maiden name, and so forth, on the same =20
server as your password, but you don't have to worry about using =20
salts or key strengthening on those latter secrets, because the =20
server doesn't run a service that allows unauthenticated remote =20
people to connect, submit a guess as to their value, and receive =20
confirmation, the way it does for your password. (In other words, =20
for such data we generally use an extreme form of the third defense, =20
rate-limiting tries -- the number of guesses that an attacker gets is =20=
limited to none!)
Likewise, if you are going to store or transmit those kinds of =20
secrets in encrypted form using a traditional randomly-generated =20
encryption key, then you don't have to worry about brute-force/=20
dictionary attacks because your strong randomly-selected symmetric =20
encryption key defies them.
The Key Point:
*** Convergent encryption exposes whatever data is put into it to =20
the sorts of attacks that already apply to passwords.
Convergent encryption had been invented, analyzed and used for many =20
years, but to the best of my knowledge the first time that anyone =20
noticed this issue was March 16 of this year (at 3 AM Chicago Time), =20
when Drew Perttula and Brian Warner made that observation. (Although =20=
to be fair some of the best-known uses of convergent encryption =20
during these years have been sharing publicly-known files with =20
strangers, in which case I suppose it is assumed that the cleartext =20
does not contain secrets.)
Now PBKDF2 is a combination of the first two defenses -- salting and =20
key strengthening. When you first suggested PBKDF2, I -- and =20
apparently Jerry Leichter -- thought that you were suggesting its =20
salting feature as a solution. The solution that we've come up with =20
for Tahoe (described in my original note) is much like salting, =20
except that the added value that gets mixed in is secret and =20
unguessable, where I normally think of salt as non-secret.
Now I see that you are also emphasizing the key strengthening feature =20=
of PBKDF2.
"k" denotes symmetric encryption key, "p" denotes plaintext, "c" =20
denotes ciphertext, "s" denotes salt, "E(key, plaintext)" is =20
encryption, "H()" is secure hashing, "H^1000()" is secure hashing a =20
thousand times over, i.e."H(H(H(H(...))))" a thousand times.
Traditional encryption:
k =3D random()
c =3D E(k, p)
Traditional convergent encryption:
k =3D H(p)
c =3D E(k, p)
Tahoe-style convergent encryption with added secret ("s"); "s" can =20
be re-used for any number of files, but it is kept secret from =20
everyone except those with whom you wish to converge storage.
s =3D random()
k =3D H(s, p)
c =3D E(k, p)
PBKDF2 (simplified); "s" can be re-used but is generally not, and it =20=
is not secret.
s =3D random()
k =3D H^1000(s, password)
c =3D E(k, p)
Now, one could imagine a variant of traditional convergent encryption =20=
which added key strengthening:
k =3D H^1000(p)
c =3D E(k, p)
This would have a performance impact on normal everyday use of Tahoe =20
without, in my current estimation, making a brute-force/dictionary =20
attack infeasible. Key strengthening allows you to choose an amount =20
of wasted CPU that you are willing to impose on your users during =20
normal use, and multiply the attacker's costs by exactly that =20
amount. If the attacker has 2^64 computational capacity, and the =20
users are willing to waste 2^10 extra computrons on each file access, =20=
then the attacker's effective capacity is reduced to 2^54.
The trade-off is actually worse than it appears since the attacker is =20=
attacking multiple users at once (in traditional convergent =20
encryption, he is attacking *all* users at once), so he gains an =20
economy of scale, and can profitably invest in specialized tools, =20
even specialized hardware such as a COPACOBANA [1]. At the very =20
least he can profitably devote many CPU cores to churning out new =20
guesses 24/7, while for normal users it is not profitable to allocate =20=
a 24/7 CPU load to strengthening their keys. The reason for this =20
disparity is that the attacker gets to attack everyone at once for =20
the same cost as attacking only one target, where the defenders have =20
to pay each for his own defense.
Next, one could imagine a variant of Tahoe's convergent encryption =20
with added secret which adds key strengthening:
s =3D random()
k =3D H^1000(s, p)
c =3D E(k, p)
This would likewise be costly to normal users, but moreover it is not =20=
needed because the "s =3D random()" part of the algorithm locks out all =20=
attackers except those with whom s is shared from mounting such an =20
attack at all.
Thank you for your comments on this issue. If you have further =20
ideas, especially as would be relevant to the Tahoe Least-Authority =20
Filesystem, I would love to hear them.
Regards,
Zooko O'Whielacronx
[1] http://copacobana.org/=
---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@metzdowd.com