[107794] in Cypherpunks

home help back first fref pref prev next nref lref last post

Re: Redundancy and Randomness [Information Theory]

daemon@ATHENA.MIT.EDU (Bennett Haselton)
Sat Jan 23 22:02:58 1999

Date: Sat, 23 Jan 1999 21:02:02 -0600
To: cypherpunks@cyberpass.net
From: Bennett Haselton <bennett@peacefire.org>
In-Reply-To: <Pine.LNX.3.96.990123191136.3911A-100000@albert>
Reply-To: Bennett Haselton <bennett@peacefire.org>

At 07:21 PM 1/23/99 -0500, mgraffam@idsi.net wrote:
>hey everyone.. 
>
>I've been working on some stuff lately, and it occurred to me that
>it hinges on the assumption that if a message has 0 redundancy, then it
>consists of essentially random bits. 

I think it's more than that.  If a message has 0 redundancy then it
consists of random bits, *and* the odds of each bit being 1 or 0 are
exactly 50%.

You could have a string of bits where each bit has a 2/3 chance of being a
0 and a 1/3 chance of being a 1 -- this is a sequence of random bits, but
its entropy is less than one bit per symbol, so it has redundancy and it
can be compressed.
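That claim can be checked numerically with the standard binary entropy
formula; this is just a minimal sketch, not anything from the original
thread:

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a single bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a perfectly predictable bit carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A fair bit carries exactly 1 bit of information...
print(binary_entropy(0.5))            # 1.0
# ...but a 2/3-vs-1/3 bit carries less, so the stream is compressible.
print(round(binary_entropy(1 / 3), 3))  # 0.918
```

About 0.918 bits per symbol means a long 2/3-biased stream can, in
principle, be squeezed to roughly 92% of its raw length.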

>I'm not comfortable with this assumption intuitively, so I was wondering
>if anyone knows of a proof that such a message must be random, or can
>think of an example where this would not be the case. 

If a string of bits has "redundancy", then that means there are bits at
particular places in the message for which you know with confidence of more
than 50% what the bits are, without looking at them.  If you try to guess
what those bits are, you'll be wrong some of the time, but you'll also be
right more than 50% of the time, on average.  So when you look at the bits
to find out what their actual values are, each bit contains slightly less
information than a truly random bit, because most of them will be
confirming something that you already knew was *probably* true.

I think that's what redundancy means.  So, for simplicity, just concentrate
on the redundant bits, i.e. the bits where you have a more than 50% chance
of correctly guessing what their values are, without being told.  This
problem is essentially the same as having a sequence of 1's and 0's where a
1 is more likely to occur than a 0.
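The "guessing more than 50% right" idea can be sketched with a hypothetical
2/3-biased source (the seed and sample size are arbitrary choices, just to
make the sketch reproducible):

```python
import random

random.seed(1999)  # fixed seed so the run is reproducible

# A hypothetical "redundant" stream: each bit is 0 with probability 2/3.
stream = [0 if random.random() < 2 / 3 else 1 for _ in range(100_000)]

# The best blind guess at every position is the more likely symbol, 0.
correct = sum(1 for bit in stream if bit == 0)
print(correct / len(stream))  # roughly 0.667: right well over half the time
```

Since each observed bit mostly confirms a guess you would have made
anyway, it carries less than one full bit of new information.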

I don't know how to prove it, but if you have a large enough stream of 1's
and 0's, and a 0 is generally more than 50% likely to occur, then the
stream can be compressed.  It's probably hard to show in general, but it's
easy if you pick extreme values, such as: a 1 only occurs once in a million
times.  Then instead of storing the sequence of 1's and 0's directly (a much
shorter, denser string is used below just to illustrate):
	10000000001000000000000000000000010000001
etc., you could store a sequence of integers (in binary) where each integer
represents the number of 0's between successive 1's:
	(9, 22, 6, etc.)
This would be much more efficient than storing a string of 1's and 0's
where a 1 only occurred once in a million times.  The hard part is proving
that you can also compress a sequence in which 1's occur 49% of the time.
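The gap-storing scheme above is a form of run-length encoding; a minimal
sketch, assuming (as in the example string) that the stream begins and ends
with a 1:

```python
def rle_encode(bits):
    """Return the lengths of the 0-runs between successive 1's.

    Assumes bits starts and ends with '1', as in the example above.
    """
    return [len(run) for run in bits.split('1')[1:-1]]

def rle_decode(runs):
    """Rebuild the bit string from the list of gap lengths."""
    return '1' + '1'.join('0' * n for n in runs) + '1'

s = "10000000001000000000000000000000010000001"
runs = rle_encode(s)
print(runs)                   # [9, 22, 6]
assert rle_decode(runs) == s  # lossless round trip
```

For a genuinely sparse stream (a 1 once per million bits), each gap fits
in about 20 bits instead of a million, which is where the compression
comes from.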

>An ideal compression algorithm that compressed a message down to its
>essential entropy would have 0 redundancy (by definition) .. and while
>it would not be 'philosophically random' (how could a deterministic
>algorithm get randomness from ordered bits?) it seems that such a
>compression function's output would be indistinguishable from true
>randomness. 

I think that's true.

	-Bennett

bennett@peacefire.org    (615) 421 5432    http://www.peacefire.org

