[8401] in cryptography@c2.net mail archive
Re: Perfect compression and true randomness
daemon@ATHENA.MIT.EDU (Arnold G. Reinhold)
Mon Jan 8 15:01:14 2001
Mime-Version: 1.0
Message-Id: <v04210100b67e9fd5563b@[24.218.56.92]>
In-Reply-To: <200101050520.VAA10489@shell7.ba.best.com>
Date: Mon, 8 Jan 2001 09:41:09 -0500
To: Nick Szabo <szabo@best.com>, cryptography@c2.net
From: "Arnold G. Reinhold" <reinhold@world.std.com>
Cc: burton rosenberg <burt@math.miami.edu>
Content-Type: text/plain; charset="iso-8859-1" ; format="flowed"
Content-Transfer-Encoding: quoted-printable
I don't think Chaitin/Kolomogorv complexity is relevant here. In real=20
world systems both parties have a lot of a priori knowledge. Your=20
probably_perfect_compress program is not likely to compress this=20
sentence at all, but PKZIP can. The probably_perfect_compress=20
argument would work (ignoring run time) if Alice first had to send=20
Bob the entire PKZIP program, but in reality she doesn't. Also=20
discussing "perfect compression" doesn't make sense in the absence of=20
a space of possible messages and a probability distribution on that=20
space.
I don't agree that the assumption of randomness in OTP's is on the=20
same footing as "perfect" compression. The laws of physics let you=20
put a lower bound on the entropy per bit for practical noise=20
generators. You can then distill the collected bits to produce fewer=20
bits which are completely random.
In any case, as I tried to point out before, perfect compression,=20
what ever it may be, does not prevent a know-plaintext attack. If=20
Malfoy knows the plaintext and the compression algorithm, he has=20
every thing he needs to guess or exhaust keys. If he has a large=20
number of plaintexts or can choose plaintexts he might be able to=20
effect more sophisticated attacks attacks.
Arnold Reinhold
At 9:20 PM -0800 1/4/2001, Nick Szabo wrote:
>Anonymous wrote (responding to the idea of "perfect compression"):
>> ... Once you have specified
>> such a probability distribution, you can evaluate how well a particular
>> compression algorithm works. But speaking of absolute compression or
>> absolute entropy is meaningless.
>
>These ideas have on a Turing machine the same meaning as the idea of
>"truly random numbers", and for the same reason. The assumption of
>randomness used in proving that OTPs and other protocols are
>"unconditionally" secure is very similar to the assumption that a string
>is "perfectly compressed". The problem is that determining the absolute
>entropy of a string, as well as the equivalent problem of determining
>whether it is "real random", is both uncomputable and language-dependent.=
=A0
>
>Empirically, it seems likely that generating truly random numbers is much
>more practical than perfect compression. If one has access to certain
>well-observed physical phenomena, one can make highly confident, if
>still mathematically unproven, assumptions of "true randomness", but
>said phenomena don't help with perfect compression.=A0
>
>If we restrict ourselves to Turing machines, we can do something *close*
>to perfect compression and tests of true randomness -- but not quite.
>And *very* slow. From a better physical source there is still the problem
>that if we can't sufficiently test them, how can we be so confident
>they are random anyway? Such assumptions are based on the extensive and
>various, but imperfect, statistical tests physicists have done (has
>anybody tried cryptanalyzing radioactive decay? :-)
>
>We can come close to testing for true randomness and and doing perfect
>compression on a Turing machine. For example, here is an algorithm that,
>for sufficiently long but finite number of steps t, will *probably* give yo=
u
>the perfect compression (I believe the probability converges on
>a number related to Chaitin's "Omega" halting probability as t grows,
>but don't quote me -- this would make an interesting research topic).
>
>probably_perfect_compress(data,t) {
>for all binary programs smaller than data {
> run program until it halts or it has run for time t
> if (output of program =3D=3D data AND
> length(program) < length(shortest_program)) {
> shortest_program =3D program
> }
>}
>print "the data: ", data
>print "the (probably) perfect compression of the data", shortest_program
>return shortest_program
>}
>
>(We have to makes some reasonable assumption about what the binary
>programming language is -- see below).
>
>We can then use our probably-perfect compression algorithm as a statstical
>test of randomness as follows:
>
>probably_random_test(data,t) {
> if length(probably_perfect_compress(data,t)) =3D length(data)
> then print "data is probably random"
> else print "pattern found, data is not random"
>}
>
>We can't *prove* that we've found the perfect compression. However,
>I bet we can get a good idea of the *probability* that we've found the
>perfect compression by examining this algorithm in terms
>of the algorithmic probability of the data and Chaitin's halting
>probability.
>
>Nor is the above algorithm efficient. Similarly, you can't prove
>that you've found truly random numbers, nor is it efficient to
>generate such numbers on a Turing machine. (Pseudorandom
>numbers are another story, and numbers derived from non-Turing
>physical sources are another story).=A0=A0=A0=A0=A0=A0=A0=A0=A0
>
>We could generate (non-cryptographic) probably-random numbers as follows:
>
>probably_random_generate(seed,t) {
> return probably_perfect_compress(seed,t)
>}
>
>For cryptographic applications there are two important ideas,
>one-wayness and expanding rather than contracting the seed, that
>are not captured here. Probably_random_generate is more like the idea
>of hashing an imperfect entropy source to get the "core" entropy
>one believes exists in it. Only probably_random_generate is far=20
>more reliable,
>as one can actually formally analyze the probability of having generated
>truly random numbers. It is, alas, much slower than hashing. :-(
>
>Back to the theoretical point about whether there is such a thing
>as "absolute" entropy or compression. The Kolmogorov complexity
>(the smallest program that, when run, produce the decompressed data)
>is clearly defined and fully general for Turing machines. If we could
>determine the Kolmogorov complexity we wouldn't need to invoke any
>probability distribution to determine the absolute minimum possible
>entropy of any data to be compressed on a Turning machine.
>
>It is, alas, uncomputable. To find the Kolmogorov complexity we could
>simply search through the space of all programs smaller than the data.
>But due to the halting problem we cannot always be certain that there
>does not exist a smaller program that, run for a sufficiently long period
>of time, will produce the decompressed data. When we can't prove that
>there is no smaller program than the data which generates the data,
>we also can't prove that there is not a pattern hidden in the data which
>makes it less than "truly random". The finite version of this search
>process, in the program probably_perfect_compression, circumvents
>the halting problem by arbitrarily halting programs that have already
>run for t steps.
>
>Also, since the length of the program depends on what language it's
>written in, absolute Kolmogorov complexity is good only for
>analyzing growth rates. The choice of language adds a constant
>length to the program. We'd have to look at probably_perfect_compression
>in this context to see if the choice of binary language is a
>reasonable one or if other languages would give better compressions
>on the data we are likely to encounter.
>
>One consequence is that OTPs themselves have a big problem when we
>assume "truly random" numbers. This assumption is, in terms of
>the provability of security, no weaker or stronger than than
>an assumption of "perfect compression". (Which assumption is
>more practical is a different question -- as per above, if one has
>access to certain well-observed physical phenomena, one can make
>highly confident, if still mathematically unproven, assumptions of
>"true randomness", but said phenomena don't help with perfect
>compression).
>
>For more on this topic see http://www.best.com/~szabo/kolmogorov.html and
>the references therein.