Tutorial 6 for CSC 310

This tutorial illustrates the idea of capacity for the binary erasure channel, and compares this with the rate at which one would expect to be able to communicate without error.

Recall that a binary erasure channel takes inputs that are 0 or 1, and sometimes transmits them unchanged to the output, but other times (with probability f) changes the 0 or 1 to an "erasure" symbol, written as "?".

For a given input distribution, with p0 = P(X=0), and a given erasure probability, f, the mutual information between the channel input, X, and the channel output, Y, can be written in any of three ways:

    I(X;Y)  =  H(X) + H(Y) - H(X,Y)
            =  H(X) - H(X|Y)
            =  H(Y) - H(Y|X)

What will be the easiest way to calculate I(X;Y)? That depends on how easy it is to calculate the various entropies above...

H(X) is easy to calculate, since it depends only on p0. It is equal to H2(p0), where H2(p) is the "binary entropy", defined as p log(1/p) + (1-p) log(1/(1-p)).

H(Y) requires that we first calculate q0, q?, and q1, the probabilities of the three possible outputs. How do we do that? (Answer: an input gets through with probability 1-f and is erased with probability f, so q0 = p0 (1-f), q? = f, and q1 = (1-p0) (1-f).)

H(Y|X) is easy to calculate too. Whatever the input, the output is erased with probability f, so it's just H2(f).

H(X|Y) depends on the backwards probabilities, and for many channels it is the hardest to calculate. However, for the binary erasure channel, some of the backwards probabilities are easy. What are H(X|Y=0) and H(X|Y=1)? (Answer: both are zero.)

What is H(X|Y=?)? To answer this, we need P(X=0|Y=?). What is it?

                     P(X=0) P(Y=?|X=0)          p0 f
    P(X=0|Y=?)  =  --------------------  =  ----------------  =  p0
                          P(Y=?)             p0 f + (1-p0) f

So H(X|Y=?) = H2(p0). So what is H(X|Y)? (Answer: H(X|Y) = P(Y=?) H(X|Y=?), since H(X|Y=0) = H(X|Y=1) = 0, from which H(X|Y) = f H2(p0).)

The easiest way to calculate I(X;Y) therefore seems to be as H(X) - H(X|Y), which gives

    I(X;Y)  =  H2(p0) - f H2(p0)  =  (1-f) H2(p0).

We can now easily find the capacity of this channel. How? We need to choose p0 to maximize I(X;Y). This is easy, since it amounts to maximizing H2(p0), which is done by setting p0 = 1/2.
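The derivation above can be checked numerically. The sketch below is my own Python, not part of the tutorial; the function names (H2, bec_mutual_information) are invented for illustration. It computes I(X;Y) as H(Y) - H(Y|X) directly from the output probabilities, and compares the result with the closed form (1-f) H2(p0):

```python
import math

def H2(p):
    """Binary entropy in bits; H2(0) = H2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1/p) + (1-p) * math.log2(1/(1-p))

def bec_mutual_information(p0, f):
    """I(X;Y) for a binary erasure channel, computed as H(Y) - H(Y|X)."""
    # Output probabilities: q0 = p0(1-f), q? = f, q1 = (1-p0)(1-f).
    qs = [p0 * (1-f), f, (1-p0) * (1-f)]
    HY = sum(q * math.log2(1/q) for q in qs if q > 0)
    return HY - H2(f)   # H(Y|X) = H2(f): each input is erased with prob. f

# Should agree with the closed form derived above: I(X;Y) = (1-f) H2(p0).
p0, f = 0.3, 0.25
print(bec_mutual_information(p0, f))
print((1-f) * H2(p0))
```

Setting p0 = 1/2 makes bec_mutual_information(0.5, f) come out to 1-f, the capacity found above.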
The capacity is the value of I(X;Y) for this p0, which is 1-f. So the capacity of a binary erasure channel is just one minus the erasure probability.

Using a suitable coding and decoding scheme, we can hope to transmit information at any rate less than capacity with vanishingly small error probability. How might we do this?

Suppose we try transmitting blocks of K bits by encoding them into blocks of N bits, choosing K and N so that the rate R = K/N is less than the capacity, C = 1-f. We encode a block of K bits by mapping it to one of a set of 2^K codewords. How well this works will depend on what set of codewords we use. Let's suppose we just choose them at random from the 2^N possible blocks of size N.

Now imagine the receiver trying to figure out which codeword was transmitted by looking at the channel outputs. Before looking at any outputs, the receiver will consider any of the 2^K codewords to be possible.

Suppose the receiver looks at the first channel output (i.e., the first symbol in the received block of N symbols), and sees that it is 0. How does that reduce the set of possible codewords? (Answer: it eliminates all the codewords that start with 1.) How many codewords will be left? (Answer: it depends on the exact code, but if we chose codewords randomly, we'd expect to eliminate about half of them.) Similarly if the first channel output is 1. How about if the first channel output is the erasure symbol, "?"? That doesn't eliminate any codewords.

So for each non-erasure in the output, we expect to reduce the number of possible codewords by about half. There are 2^K codewords altogether. So how many non-erasure symbols do we need to isolate the single codeword that was actually sent? (Answer: about K non-erasures.) Put another way, we need fewer than about N-K erasures if we are to decode successfully.

Are we likely to get that few erasures? That will depend on f. If N is large, we expect to get about fN erasures (this is the "law of large numbers").
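The receiver's elimination process can be simulated directly. This is a sketch in my own Python, not part of the tutorial; the parameter values (K = 4, N = 12, f = 0.2) and helper names are illustrative choices. It draws a random codebook, sends one codeword through a simulated erasure channel, and keeps only the codewords consistent with the non-erased outputs:

```python
import random

def decode_survivors(codebook, received):
    """Codewords consistent with every non-erased output symbol."""
    return [cw for cw in codebook
            if all(r is None or r == c for r, c in zip(received, cw))]

def trial(K, N, f, rng):
    """One transmission with a fresh random codebook.
    Returns (number of erasures, number of surviving codewords)."""
    codebook = [tuple(rng.randint(0, 1) for _ in range(N))
                for _ in range(2**K)]
    sent = rng.choice(codebook)
    received = [None if rng.random() < f else b for b in sent]  # None = "?"
    return received.count(None), len(decode_survivors(codebook, received))

rng = random.Random(310)
K, N, f = 4, 12, 0.2          # rate K/N = 1/3, well below capacity 1-f = 0.8
successes = sum(trial(K, N, f, rng)[1] == 1 for _ in range(1000))
print(successes / 1000)
```

Since the rate is well below capacity, there are usually many more non-erasures than the roughly K needed, and the surviving set is almost always the single codeword that was actually sent.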
So we expect to have a good chance of decoding correctly if fN < N-K, which is equivalent to f < 1 - K/N, which is equivalent to K/N < 1-f, which is the same as R < C. That is, we expect to be able to decode reliably at any rate below the capacity.
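Concrete numbers may help here. This is a small arithmetic sketch of my own; the values f = 0.25 and N = 1000 are chosen purely for illustration:

```python
f = 0.25                    # erasure probability
N = 1000                    # block length
C = 1 - f                   # capacity of the BEC
expected_erasures = f * N   # law of large numbers: about fN erasures
# Decoding needs fewer than N - K erasures, so we need f*N < N - K,
# i.e. K < (1 - f) * N = C * N.
max_K = int(C * N)          # largest K we can hope to use
print(C, expected_erasures, max_K)   # 0.75 250.0 750
```

So at this erasure probability we expect about 250 erasures per block, leaving room for at most about 750 message bits per 1000-bit block, exactly the rate limit R < C = 0.75.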