===========================================================================
          CSC 310H       Lecture Summary for Week 1        Fall 2007
===========================================================================

------
Topics
------

In this course, we will study the problems of data compression and data
communication over a noisy channel.

In the context of data compression, a source produces an input X (which
could be a string, a computer file, an image, a movie) and we need to
encode X into another string Z that is as small as possible, but in such a
way that a decoder can later compute X by looking only at Z. If we require
that X be correctly retrieved all the time, we say that the compression
scheme is "lossless" (e.g. zip, compress). If we only require that X be
retrieved "with high probability", the compression scheme is "lossy"
(e.g. mp3, mpeg).

In the context of data communication over a noisy channel, we again start
with a source that produces an input X. We need to transmit X across a
communication channel (e.g. network cable, radio signal, CD) that might
introduce errors in the transmission. In order to boost the chance that X
is transmitted correctly, we encode X by adding "redundancy", producing a
string Z. We then transmit Z across the noisy channel and we retrieve Z' at
the other end. Finally, we remove the redundancy from Z' and we obtain X'.
Ideally, we want to have X' = X. Unlike the case of data compression, we
cannot always enforce this, because there is always a small chance that the
communication channel alters Z a lot. So we settle for asking that X' = X
with high probability.

In this course, we will study:

- The mathematical theory behind compression and communication problems,
  including two famous theorems proved by Claude Shannon in 1948 that give
  limits on how well *any* method can do.
- Practical algorithms for compression and communication, which appeared
  much later.
- Modelling issues involved in transforming a real-world problem into a
  mathematical form so that analytical tools can be applied to it.

-------------------
Compression example
-------------------

Suppose we are given as input a black/white image of size 10x50 and we
would like to compress it. In particular, consider the following image:

..................................................
..XX...XX...........XX.....XX.....................
..XX...XX...........XXX...XXX.....................
..XX...XX...X.......XX.XXX.XX.....................
..XXXXXXX...........XX..X..XX.....................
..XXXXXXX...X.......XX.....XX....XX....XXXXX.XXX..
..XX...XX...X.......XX.....XX...X..X...XX...X...X.
..XX...XX...X.......XX.....XX...X..X...XX...X...X.
..XX...XX...X.......XX.....XX....XX....XX...X...X.
..................................................

Uncompressed, it takes 10x50 = 500 bits.

If both encoder and decoder know that the image is actually a piece of
English text, and they also know the specific font that was used to
generate the image, it would be enough to encode the 6 characters: "H",
"i", space, "m", "o", "m". There are 26x2 = 52 letters plus a few
punctuation symbols, say at most 64 different symbols. We could encode each
one in binary with a 6-bit string. Thus, we would only send 6x6 = 36 bits!
This is a huge improvement on the 500 bits required by the raw image, but
it relies heavily on "extra information" shared by the encoder and the
decoder. Not every 10x50 b/w image can be encoded by this scheme.

Another idea: going left to right, if a column is identical to the previous
one, send a single bit that signals this situation. If the new column is
not identical to the previous one, simply include it as is in the encoding.
Note that even for a new column we still have to transmit the flag bit,
which now says "the new column is not identical to the previous one". The
image above would be encoded as follows:

  [0,0000000000],[1],[0,0111111110],[1],[0,0000110000],[1],[1] ...

Above, I included the delimiters "[", ",", "]" in order to make it clear
how the encoding is being constructed. These extra symbols would not be
part of the encoding.

How many bits does it take to encode the image? We use 1 bit for every
column identical to the one before it, and 11 bits for a new column. One
can check that the image contains 25 columns identical to the one before
them, and 25 new ones. Thus, the image would be encoded in
1x25 + 11x25 = 12x25 = 300 bits. That's pretty good compared to the raw 500
bits. Furthermore, note that unlike the "English letters" scheme before,
this scheme can encode any 10x50 b/w image. In the worst case, we will use
11 bits for every column, so any encoding has at most 550 bits.
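
To make the column-by-column scheme above concrete, here is a small Python
sketch of an encoder and a decoder for it. The function names and the data
representation (a list of row strings of '.' and 'X') are my own choices
for illustration; the bit convention follows the example encoding above
('0' followed by the 10 column bits for a new column, '1' for a column
identical to the previous one).

    def encode_columns(image):
        """Sketch of the 'same as previous column?' scheme described above.
        `image` is a list of rows, each a string of '.' (white) and 'X'
        (black).  Returns a bit string: '1' means "this column is identical
        to the previous one"; '0' followed by the column's bits (top to
        bottom) means "new column"."""
        n_rows, n_cols = len(image), len(image[0])
        bits = []
        prev = None
        for c in range(n_cols):
            col = "".join("1" if image[r][c] == "X" else "0"
                          for r in range(n_rows))
            if col == prev:
                bits.append("1")
            else:
                bits.append("0" + col)
            prev = col
        return "".join(bits)

    def decode_columns(bits, n_rows=10):
        """Inverse of encode_columns: rebuilds the columns, then the rows."""
        cols, i = [], 0
        while i < len(bits):
            if bits[i] == "1":              # repeat the previous column
                cols.append(cols[-1])
                i += 1
            else:                           # literal column after the '0' flag
                cols.append(bits[i + 1:i + 1 + n_rows])
                i += 1 + n_rows
        return ["".join("X" if col[r] == "1" else "." for col in cols)
                for r in range(n_rows)]

If the count of 25 repeated and 25 new columns given above is right, the
encoder produces exactly the 25x1 + 25x11 = 300 bits computed there, and
decode_columns(encode_columns(image)) returns the original image.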
One more idea: encode each black rectangle as a 4-tuple that specifies the
row and column of its top left and bottom right corners. We need 4 bits to
specify a row and 6 bits to specify a column, so 2x(4+6) = 20 bits per
black rectangle. The image above contains 20 of them, so it would be
encoded in 20x20 = 400 bits. Again, this scheme can encode any image.
However, if the black rectangles are all very small (say, a single pixel),
we end up using 20 bits for each such rectangle. In the worst case, the
input looks like a chess board with 500/2 = 250 single black pixels, which
would get encoded into 250x20 = 5000 bits.

--------------------
Moral of the example
--------------------

Which scheme is better? It ultimately depends on what kind of image we
expect to receive as input. The more we know about the source that
generates the input, the better we will be at encoding it.

The way we incorporate knowledge about the source into our scheme is to
build a probabilistic model for the inputs that we expect to receive. A
simple such model would be "the probability that any given pixel is white
is .7, independently of the colours of the pixels around it." A more
elaborate model could specify the probability that a pixel has colour C
given the colours of the pixels above it and to its left. Based on this
information, we can tune our encoding scheme to perform well on these types
of inputs.

If we know these probabilities before we even look at the image, we have a
static model. If, instead, we learn these probabilities as we go through
the image, we have an adaptive model.

----------------------------------
Shannon's Noiseless Coding Theorem
----------------------------------

How much can we compress the data coming from a given source? Shannon
proved in 1948 that there is a lower limit, called the entropy of the
source, below which no lossless compression scheme can go (on average).
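
The notes have not defined entropy yet, but as a rough preview of the kind
of number this theorem refers to, here is a short Python sketch (my own
illustration, using the standard binary entropy formula, which is not given
in these notes) applied to the simple model from the "Moral" section: each
pixel is white with probability .7, independently of the others. Under that
model, the theorem says no lossless scheme can use fewer than about 0.88
bits per pixel on average.

    from math import log2

    # Binary entropy of the simple model above: each pixel is white with
    # probability .7, independently of its neighbours.
    p = 0.7
    H = -p * log2(p) - (1 - p) * log2(1 - p)   # roughly 0.881 bits per pixel
    print(f"entropy ~ {H:.3f} bits/pixel,"
          f" i.e. ~ {500 * H:.0f} bits on average for a 10x50 image")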
---------------------
Communication example
---------------------

Suppose we have a communication channel that flips every bit passing
through it independently with probability .1. We want to send an 8-bit
message X through this channel.

First, let's try sending X as-is. What is the probability that the entire
message goes through with no errors? It's (.9)^8 = .430. That's pretty bad:
more than half the time the message will have errors.

What if we try sending each bit k times? Say, k = 3. Then, to decode the
received string, we take the majority of the bits received for every source
bit. Example:

   X  =  0   0   1   1   1   0   1   1
   Z  = 000 000 111 111 111 000 111 111
   Z' = 000 010 111 100 110 000 111 011
   X' =  0   0   1   0   1   0   1   1

As you can see, we were not able to fix the error in the 4th bit, because 2
of the 3 bits sent for that source bit were corrupted by the channel.

What is the probability that the entire message goes through with no
errors?

   Pr[X_i = X'_i]
      = Pr[Z_i1 = Z'_i1 & Z_i2 = Z'_i2 & Z_i3 = Z'_i3]
        + Pr[exactly 2 of Z_i1, Z_i2, Z_i3 are correct and 1 is wrong]
      = (.9)^3 + (3 choose 2) * (.9)^2 * (.1)
      = .972

   Pr[X = X'] = (.972)^8 = .797, much better than .430!

However, we are paying a price for this improvement. Define the "rate" of
the code to be |X|/|Z|. The first code had rate 1; the second has rate
1/3 = .333.

What if we want the probability that X goes through to be arbitrarily close
to 1? It shouldn't be hard to see that we would need k --> infinity, and
hence rate --> 0 (the sketch at the end of these notes tabulates this
trade-off). Is this unavoidable? Surprisingly: NO.

------------------------------
Shannon's Noisy Coding Theorem
------------------------------

There exists a non-zero constant that depends only on the channel, called
the capacity of the channel, such that there are encoding schemes that can
transmit messages with arbitrarily small probability of error at rates
close to this capacity.
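
To back up the repetition-code discussion above with numbers, here is a
small Python sketch (the function name and output format are my own) that
tabulates, for a few odd values of k, the rate 1/k of the k-fold repetition
code and the probability that the whole 8-bit message is decoded correctly
over the channel that flips each bit with probability .1. For k = 1 and
k = 3 it reproduces the .430 and .797 computed above, and it shows how
pushing the error probability towards 0 with this code drives the rate
towards 0, which is exactly the trade-off that the Noisy Coding Theorem
above says we do not have to accept.

    from math import comb

    def per_bit_success(k, p_flip=0.1):
        """Probability that majority decoding recovers one source bit when
        it is repeated k times (k odd) over a channel that flips each bit
        independently with probability p_flip."""
        need = k // 2 + 1                   # majority threshold
        return sum(comb(k, j) * (1 - p_flip) ** j * p_flip ** (k - j)
                   for j in range(need, k + 1))

    for k in (1, 3, 5, 11, 21):
        msg_ok = per_bit_success(k) ** 8    # 8 source bits decoded independently
        print(f"k = {k:2d}   rate = {1/k:.3f}   Pr[X' = X] = {msg_ok:.3f}")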