In my last post I introduced information theory as one of those obscure ideas which creationists love to misrepresent for their own purposes. Well, I can’t stand to see good science have its name besmirched like that, so let’s rebalance things by looking at the good stuff in information theory and what we use it for.
Information theory was invented in the 1940s to deal with the problems faced by long-distance telecommunications — the original paper was written by Claude Shannon under the title ‘A Mathematical Theory of Communication’. Since then it has been expanded upon and applied in nearly every aspect of communication, computing and more besides.
‘Information’, according to the theory, means the individual bits that make up a message to be transmitted. They could be computing-style bits (ie, binary digits, ones and zeroes) or they could be individual letters from the alphabet. The problem is that the sender and receiver are some distance apart, so they have to use some indirect means of transferring this information, called a ‘channel’.
Information is passed along a channel. This is an abstract way of saying “dots and dashes are passed down telegraph wires” or “smoke signals are sent into the sky” or “words are spoken into cocoa tins tied together with string”. The actual type of channel doesn’t matter for the theory, only that there is one. When the receiver picks up a signal they have to pull the sender’s information out from all the background noise.
And therein lies the problem. It would be dead easy to pull out the signal if you already knew what it was, right? You’d know exactly what to look for, but it would also be pointless. You wouldn’t have learned anything, so technically the message would have an information content of zero. The other extreme is a message that could be anything, such as a stream of random numbers. This has high information content because each number is completely unpredictable, even if you know all the others. With each new number that arrives, you’ve “learned” something.
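Shannon put a number on this unpredictability: entropy, measured in bits per symbol. Here's a minimal Python sketch of the calculation (the `entropy` function and the sample strings are my own illustration, though the formula itself is Shannon's):

```python
from collections import Counter
from math import log2

def entropy(message):
    """Shannon entropy in bits per symbol: how unpredictable each symbol is."""
    counts = Counter(message)
    total = len(message)
    # Sum -p * log2(p) over the probability p of each distinct symbol.
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# A message you could have predicted entirely carries no information:
print(entropy("AAAAAAAA"))  # prints 0.0

# A message where every symbol is different is maximally unpredictable:
print(entropy("ABCDEFGH"))  # prints 3.0
```

Eight equally likely symbols need three bits each to describe (2³ = 8), which is exactly what the second call reports.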
If a channel is noisy then many of the bits which are transmitted get corrupted before the receiver sees them. The reliability of the channel is the probability of a given bit passing through unscathed. If you have a 100% reliable channel and you’re receiving random numbers, then you know they are all correct. But no channel is completely noise-free, so how do you know which random number is intentional and which is a corruption?
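A noisy channel like this is easy to simulate. The sketch below (my own illustration; the 90% reliability figure is arbitrary) flips each bit with probability one minus the reliability:

```python
import random

def noisy_channel(bits, reliability, rng):
    """Each bit survives with probability `reliability`; otherwise it flips."""
    return [b if rng.random() < reliability else 1 - b for b in bits]

rng = random.Random(0)  # fixed seed so the run is repeatable
sent = [rng.randrange(2) for _ in range(10_000)]
received = noisy_channel(sent, reliability=0.9, rng=rng)
error_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(error_rate)  # close to 0.1: roughly one bit in ten arrives corrupted
```

Crucially, the receiver can't tell from the received stream alone which tenth of the bits went wrong, which is exactly the problem redundancy solves.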
The answer is redundancy. If the channel is not very noisy, you hardly need any checks at all. But if the channel is really unreliable you may need to send the same information several times to overcome the corruption. One of the big contributions from Shannon’s paper was the ability to calculate how much redundancy you need. You can tell how much information you’ll lose during transit and so decide how much extra resource you wish to spend on increasing reliability.
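The crudest form of redundancy is simply repeating yourself. Here's a rough simulation of that idea (my own sketch, using a repetition code with majority voting, which is a much simpler scheme than the efficient codes Shannon's results point towards):

```python
import random

def transmit(bit, reliability, rng):
    """One pass through the noisy channel: the bit survives with probability `reliability`."""
    return bit if rng.random() < reliability else 1 - bit

def send_with_votes(bit, repeats, reliability, rng):
    """Send the same bit `repeats` times; the receiver takes a majority vote."""
    copies = [transmit(bit, reliability, rng) for _ in range(repeats)]
    return int(sum(copies) * 2 > repeats)

rng = random.Random(42)  # fixed seed for a repeatable run
reliability = 0.8        # a very noisy channel: one bit in five is corrupted
trials = 10_000
error_rate = {}
for repeats in (1, 3, 5):
    wrong = sum(send_with_votes(1, repeats, reliability, rng) != 1
                for _ in range(trials))
    error_rate[repeats] = wrong / trials
    print(f"{repeats} copies: {error_rate[repeats]:.1%} of messages still wrong")
```

With three copies the message is only lost if two or more of them are corrupted, so the error rate drops from about 20% to about 10%, and with five copies it falls further still. The price, of course, is sending three or five times as much data.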
You can see this use of redundancy in the real world with police radio codes. Police walkie-talkies are low quality, and they make lots of crackle and fuzz. So the officers don’t use single letters when communicating car registration numbers: they use whole words. Alpha Bravo Charlie has a lot more redundancy than A B C, so it’s clearer when you add a bit of static — ?lph? Bra?? Cha?lie.
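You can see why the whole-word scheme works by trying to match the damaged words back against the code book. This little sketch (a hypothetical `decode` helper of my own, using just a handful of the phonetic code words) treats each lost character as a wildcard:

```python
import re

CODE_WORDS = ["Alpha", "Bravo", "Charlie", "Delta", "Echo"]

def decode(garbled):
    """Match a static-damaged word ('?' marks a lost character) against the code book."""
    pattern = re.compile("^" + garbled.replace("?", ".") + "$")
    matches = [w for w in CODE_WORDS if pattern.match(w)]
    # Only report a result if exactly one code word fits the surviving letters.
    return matches[0] if len(matches) == 1 else None

print(decode("?lph?"))  # prints Alpha: enough letters survive to pin it down
print(decode("Bra??"))  # prints Bravo
print(decode("?????"))  # prints None: too much static, several words fit
```

A single damaged letter, by contrast, could have been almost anything, so the redundancy in the spare letters is what makes the message recoverable.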
If you add redundancy then you increase reliability, but you also take up more room. It means more data to transmit, which slows down messages — and that isn’t good for anybody. Shannon also covered this in his paper, and I’ll talk about it in my next post: compression.