
The Broken Hut
Working my way up to a full-size building
28th-Jul-2007 07:48 pm - Information theory and message codes

Some more on information theory. Last time I covered issues to do with the reliability of the ‘channel’, the medium used to transfer the data between sender and receiver. To summarise what was in the last post:

  • Information is data transferred between sender and receiver, so that the receiver then learns something that they didn’t know before.
  • All transmission channels have some error rate, so it is useful to be able to talk about, and calculate the effects of, these errors on data we transmit.
  • Adding various types of redundancy can mitigate the errors. We know how much redundancy to use because we can make accurate calculations about error rates.

This deals with the problem of transmission, but we still don’t know how to create the message in the first place. What is the most efficient way of turning my thoughts into something which will travel over the channel? This is the coding problem, and it’s an age-old one: the problem of translating thoughts into voice or words or pictures.
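
To make “coding” a little more concrete, here is a tiny Python sketch. The letters and codewords are invented purely for illustration (in the spirit of Morse code, where common letters get short codes); a code is just a rule for turning a message into something the channel can carry.

    # A toy "code": map letters onto bits, giving the (assumed) common letters
    # the short codewords. Letters and codewords are made up for illustration.
    CODE = {"e": "0", "t": "10", "n": "110", "a": "111"}

    def encode(message):
        """Turn a message into the string of bits that travels over the channel."""
        return "".join(CODE[letter] for letter in message)

    print(encode("tenet"))   # -> "100110010", 9 bits
    # A fixed-length scheme for these four letters needs 2 bits per letter,
    # i.e. 10 bits for "tenet" -- the variable-length code is already shorter.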

( Read more about the coding problem and how information theory helps… )
11th-Jul-2007 07:18 pm - Information theory and redundancy

In my last post I introduced information theory as one of those obscure ideas which creationists love to misrepresent for their own purposes. Well, I can’t stand to see good science have its name besmirched like that, so let’s rebalance things by looking at the good stuff in information theory and what we use it for.

Information theory was invented in the 1940s to deal with the problems faced by long-distance telecommunications — the original paper was written by Claude Shannon under the title A Mathematical Theory of Communication. Since then it has been expanded upon and applied in nearly every aspect of communication, computing and more besides.

‘Information’, according to the theory, is the individual bits which make up a message to be transmitted. They could be computing-style bits (ie, binary digits, ones and zeroes) or they could be individual letters from the alphabet. The problem is that the sender and receiver are some distance apart and so they have to use some indirect means of transferring this information, called a ‘channel’.

Information is passed along a channel. This is an abstract way of saying “dots and dashes are passed down telegraph wires” or “smoke signals are sent into the sky” or “words are spoken into cocoa tins tied together with string”. The actual type of channel doesn’t matter for the theory, only that there is one. When the receiver picks up a signal they have to pull the sender’s information out from all the background noise.

And therein lies the problem. It would be dead easy to pull out the signal if you already knew what it was, right? You’d know exactly what to look for, but it would also be pointless. You wouldn’t have learned anything, so technically the message would have an information content of zero. The other extreme is a message that could be anything, such as a stream of random numbers. This has high information content because each number is completely unpredictable, even if you know all the others. With each new number that arrives, you’ve “learned” something.
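
You can put rough numbers on that idea. The sketch below uses Shannon’s standard measure of average surprise per symbol; the messages themselves are made up for illustration. A message whose symbols you can always predict carries zero information, while a stream of random symbols carries close to the maximum.

    import math
    import random
    from collections import Counter

    def bits_per_symbol(message):
        """Average surprise per symbol, in bits (Shannon's measure)."""
        counts = Counter(message)
        total = len(message)
        return sum((n / total) * math.log2(total / n) for n in counts.values())

    predictable = "A" * 10000                        # you always know what comes next
    unpredictable = "".join(random.choice("ABCDEFGH") for _ in range(10000))

    print(bits_per_symbol(predictable))    # 0.0 -- no information at all
    print(bits_per_symbol(unpredictable))  # about 3.0 -- eight equally likely symbols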

If a channel is noisy then many of the bits which are transmitted get corrupted before the receiver sees them. The reliability of the channel is the probability of some bit passing through a channel unscathed. If you have a 100% reliable channel and you’re receiving random numbers, then you know they are all correct. But no channel is completely noise-free, so how do you know which random number is intentional and which is a corruption?
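
You can get a feel for this by simulating a noisy channel. The sketch below is only an illustration, with a 90% reliability figure picked arbitrarily: each bit either survives or gets flipped.

    import random

    def noisy_channel(bits, reliability=0.9):
        """Each bit gets through intact with the given probability; otherwise it is flipped."""
        return [b if random.random() < reliability else 1 - b for b in bits]

    sent = [random.randint(0, 1) for _ in range(20)]
    received = noisy_channel(sent)

    print(sent)
    print(received)
    # Roughly one bit in ten differs, and because the message itself is random
    # the receiver has no way of telling which bits were the corrupted ones.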

The answer is redundancy. If the channel is not very noisy, you hardly need any checks at all. But if the channel is really unreliable you may need to send the same information several times to overcome the corruption. One of the big contributions from Shannon’s paper was the ability to calculate how much redundancy you need. You can tell how much information you’ll lose during transit and so decide how much extra resource you wish to spend on increasing reliability.
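
Here is a deliberately crude form of redundancy, the repeat-everything-three-times code, sketched as an illustration using the same kind of simulated noisy channel as above (real codes, and Shannon’s calculations, are far cleverer than this):

    import random

    def noisy_channel(bits, reliability=0.9):
        """Flip each bit with probability 1 - reliability."""
        return [b if random.random() < reliability else 1 - b for b in bits]

    def encode(bits):
        """Crude redundancy: send every bit three times."""
        return [b for b in bits for _ in range(3)]

    def decode(bits):
        """Majority vote over each group of three received copies."""
        return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

    message = [random.randint(0, 1) for _ in range(1000)]
    recovered = decode(noisy_channel(encode(message)))
    errors = sum(m != r for m, r in zip(message, recovered))

    print(errors)   # typically around 28 instead of around 100: a bit is only
                    # lost when two or more of its three copies get corrupted

This blunt tripling is nowhere near the best you can do, but it shows the basic trade: more redundancy, fewer surviving errors.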

You can see this use of redundancy in the real world with police radio codes. Police walkie-talkies are low quality, and they make lots of crackle and fuzz. So the officers don’t use single letters when communicating car registration numbers: they use whole words. Alpha Bravo Charlie has a lot more redundancy than A B C, so it’s clearer when you add a bit of static — ?lph? Bra?? Cha?lie.
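
Here is a rough sketch of why whole words beat single letters: even with characters lost to static, each garbled word still matches only one word in the phonetic alphabet. The matching rule below is deliberately simple-minded and only for illustration.

    PHONETIC = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"]

    def best_match(garbled):
        """Pick the phonetic word that agrees with the most surviving characters.
        '?' marks a character lost to static."""
        candidates = [w for w in PHONETIC if len(w) == len(garbled)]
        return max(candidates,
                   key=lambda word: sum(g == c for g, c in zip(garbled, word) if g != "?"))

    for heard in ["?lph?", "Bra??", "Cha?lie"]:
        print(heard, "->", best_match(heard))
    # ?lph? -> Alpha, Bra?? -> Bravo, Cha?lie -> Charlie
    # A lone letter hidden by static could have been anything; a whole word
    # rarely collapses into a different word, which is the redundancy at work.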

If you add redundancy then you increase reliability, but you also take up more room. It means more data to transmit. This slows down messages, which isn’t good for anybody. Shannon also covered this in his paper, and I’ll talk about it in my next post: compression.

10th-Jul-2007 02:07 pm - Half-truths and information theory

According to Darrell Huff, it’s easy to lie with statistics. But it’s just as easy to lie with any kind of science, for the same reason — there’s a lot of knowledge out there and most people aren’t familiar with the basics. So people can come up with intuitively plausible ideas like homeopathy using some quantum-theoretic means of healing, and many people will accept it.

Creationist lecturers make a living from this ambiguity: giving just enough knowledge to tempt the audience into making a leap to a fallacious conclusion. One of the examples the Intelligent Design creationists like is something called Information Theory, which they misrepresent and abuse readily.

Information Theory sounds like it might be about meaning and purpose. Of course, information in that sense is entirely in the eye of the beholder. (Imagine having a crossed line on the telephone while you’re talking to a friend. As far as you’re concerned, the other conversation going on is ‘noise’, while yours is ‘information’; the other people think the exact opposite.) There are no formalisms for deciding how useful something is. But the creationists don’t clarify what is meant by information; they just let you make assumptions because it suits their purpose.

In information theory, a message has information if you can’t predict what it says before you read it. So a dice with a single spot on each face provides no information — you knew it would show a 1 before you rolled. An ordinary dice is unpredictable, so when it stops you’ve learned some piece of information. This is how information theory regards information: as something which informs.

Whether it is interesting or useful is irrelevant. In fact, the most informative message possible is one which consists entirely of random numbers, since it has the most unpredictability.
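
If you want to put a number on that unpredictability, Shannon’s measure is the average surprise of each outcome. Here is a quick illustrative calculation for the two dice described above (the formula is standard; the script itself is just a sketch):

    import math

    def entropy(probabilities):
        """Shannon's information measure in bits: the average surprise of one outcome."""
        return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

    trick_die = [1.0, 0, 0, 0, 0, 0]   # a single spot on every face: always shows a 1
    fair_die = [1 / 6] * 6             # an ordinary die

    print(entropy(trick_die))   # 0.0 bits  -- you knew the result before rolling
    print(entropy(fair_die))    # ~2.58 bits -- every roll genuinely tells you something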

As well as not defining what is meant by information, creationists then claim that information cannot be created. (This is why they bring it up: to show that complex DNA cannot evolve.) But from the correct definition given above, we can see it is very easy to create complex information. Any noisy signals — leaves rustling, background radiation — are fantastic sources of information.

They mean to say that structured, interesting information cannot be spontaneously created, so genomes require a designer. But information theory gives them no such support, so they lie about what it says and hope to build a convincing argument from half-truths and fancy mathematical terms.

In further posts I’ll talk about the good stuff in information theory, and places where you’ll come across it. Tune in next time!
