Claude Shannon, born April 30, 1916 in Petoskey, Michigan, and died February 24, 2001 in Medford, Massachusetts, was an American mathematician, and he changed the world.

I’d like to explain and to pay homage.

In 1948 he published “A Mathematical Theory of Communication.” This work underpins Information Theory and is arguably more influential than another invention unveiled in 1948, the transistor.

His work underpins the internet – email, Facebook, Netflix, and videos of cats playing piano.

**What is Information Theory?**

I’m not a mathematician, but what the formula above expresses is a way to encode messages from a sender to a receiver using probabilities. Let me explain: “RICE, CHICKEN, and NOOO VEGETABLES!”

The other night I was getting takeout from an Asian restaurant. As I was entering, a man and a woman were talking ahead of me. The man turned and walked away. The woman, annoyed, asked, “What do you want?” The man, equally annoyed, replied while turning around, “Rice, chicken, and no plenaballs.” That prompted a “Huh? Just tell me what you want.” At this point we got the loud, slow, deliberate answer: “RICE… CHICKEN… AND NO VEGETABLES!”

In this case the information was sent more than once. The first time, not all of it was received, so it was sent again, this time with redundancy to make sure it got through.
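To make the idea of redundancy concrete, here is a minimal sketch of a repetition code, one of the simplest ways to add it: every bit is sent three times and the receiver takes a majority vote. The function names `encode` and `decode` are my own for illustration, and this is not the scheme from Shannon's paper, just a toy stand-in for "say it again, slowly."

```python
from collections import Counter


def encode(bits: str, repeat: int = 3) -> str:
    """Repeat every bit `repeat` times, e.g. '10' -> '111000'."""
    return "".join(bit * repeat for bit in bits)


def decode(received: str, repeat: int = 3) -> str:
    """Recover each bit by majority vote over its group of `repeat` copies."""
    groups = (received[i:i + repeat] for i in range(0, len(received), repeat))
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)


message = "101"
sent = encode(message)           # "111000111"
noisy = "110000111"              # one copy of the first bit got flipped on the way
print(decode(noisy) == message)  # the majority vote still recovers "101"
```

The price is that the message is three times longer, which is roughly what the man at the restaurant paid in volume and patience.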

Think of information as the resolution of uncertainty. If the data in a message reduces uncertainty, then it’s information. If there’s no change in uncertainty, then it’s simply data.

The ultimate example of this is Wheel of Fortune.

**From it to bit – how is all this applied?**

For a coin flip there are only two possible outcomes: heads or tails, or said another way, zero (0) for heads and one (1) for tails. Each has a probability of 0.50. In this case the information, heads or tails, is one bit: a 0 or a 1. Now suppose I have a deck of cards. There are four suits, each with 13 cards. In a full deck the likelihood that I draw a club is 0.25, or 25%. Picking out one of four equally likely suits takes two bits: 00, 10, 01, or 11. It’s just like flipping a coin twice, where I could get heads/heads (00), tails/heads (10), heads/tails (01), or tails/tails (11).
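The link between probability and bits can be sketched in a couple of lines. The usual measure is the negative base-2 logarithm of the probability; the helper name `information_bits` is just something I made up for this example.

```python
import math


def information_bits(probability: float) -> float:
    """Bits of information gained by seeing an outcome with this probability."""
    return -math.log2(probability)


print(information_bits(0.5))   # coin flip: 1.0 bit
print(information_bits(0.25))  # suit of a card: 2.0 bits
```

The less likely the outcome, the more bits it takes to pin it down.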

Now think about the alphabet again. For the sake of simplicity we’ll say there are 26 letters in the alphabet (not counting capitals, punctuation, or blank spaces between words). To represent 26 symbols (each letter is a symbol) I need 5 bits, because 4 bits give only 16 combinations while 5 bits give 32.

1 bit = 2 symbols (0, 1) or heads and tails

2 bits = 4 symbols (00, 10, 01, 11) or the suits in a deck of cards: clubs, spades, diamonds, and hearts

3 bits = 8 symbols (000, 001, 010, 100, 011, 101, 110, 111)

4 bits = 16 symbols (0000, 0001, 0010, 0100, 1000, 0011, 0101, 1001, 0110, 1100, 1110, 0111, 1101, 1011, 1010, 1111)

5 bits = 32 symbols (00000, 00001, 00010, 00100, 01000, 00011, 00101, 01001, 00110, 01100, 01110, 00111, 01101, 01011, 01010, 01111, 10000, 10001, 10010, 10100, 11000, 10011, 10101, 11001, 10110, 11100, 11110, 10111, 11101, 11011, 11010, 11111)
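The table above doesn't have to be written out by hand. As a rough sketch (the names `bits_needed` and `codewords` are hypothetical helpers, not anything standard), this works out how many bits a given number of symbols needs and lists every pattern of that length:

```python
from itertools import product


def bits_needed(symbol_count: int) -> int:
    """Smallest number of bits whose patterns can label `symbol_count` symbols."""
    # (n - 1).bit_length() is an integer-only way of rounding log2(n) up.
    return (symbol_count - 1).bit_length()


def codewords(bit_count: int) -> list[str]:
    """Every 0/1 pattern of the given length, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=bit_count)]


print(bits_needed(4))      # the four suits: 2
print(bits_needed(26))     # the alphabet: 5
print(codewords(2))        # ['00', '01', '10', '11']
print(len(codewords(5)))   # 32
```

With 5 bits there are 32 patterns for 26 letters, so six of them go unused.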

With 5 bits you can have 00000 equal “e” since we know “e” is the most frequent letter in English (in practice, computers usually store each character in 8 bits, and in ASCII “e” is 01100101). What this means, and I know it sounds weird, is that with a fixed-length code, picking “e” out of 26 letters takes 5 bits of information (8 bits in practice). The more symbols there are, the more information you need to distinguish which is which. Luckily, communicating is one of the few times in life where the past truly dictates the future. If I know “q” is part of the message or puzzle, then I know “u” is highly likely to follow, so the “u” carries very little new information.
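Here is a small sketch of why predictability saves bits. It prints the ASCII pattern for “e”, then uses Shannon's entropy formula to estimate the average bits per symbol in a piece of text. The function name `entropy_bits` is mine, and it only looks at how often each symbol shows up, not at context like “u” following “q”, so treat it as a simplified illustration.

```python
import math
from collections import Counter


def entropy_bits(text: str) -> float:
    """Average bits per symbol, based only on how often each symbol appears."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(format(ord("e"), "08b"))  # ASCII "e" as 8 bits: 01100101
print(math.log2(26))            # about 4.7 bits per letter if all 26 were equally likely
print(entropy_bits("abcd"))     # four equally likely symbols: 2.0
print(entropy_bits("aaab"))     # lopsided symbols: less than 1 bit each
```

When some symbols are much more likely than others, the average drops, which is why a good guesser needs fewer letters to solve the puzzle.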

Pat, I’d like to solve the puzzle.