CSC148H - Summer 96

Assignment 2: Data Encryption

Due: Thursday 20 June, 6pm.

Introduction


The security of data, whether it is stored or communicated, has always been a concern of governments and military organizations. But with the age of electronic communication upon us, ordinary people are now concerned about security and privacy whether they are making a credit card transaction over the Internet, sending/receiving sensitive e-mail, or simply having a chat. Keeping information private and ensuring its authenticity (i.e., making sure it's not a forgery) have become major research areas in computer science. Cryptography is the science (and art) of protecting information by making it unintelligible to all but the authorized parties. In this assignment, we will look at two very simple cryptographic systems (cryptosystems).

The problem


Encryption is the process of transforming the original text, called the plaintext, into a new text, called the ciphertext. Decryption in the inverse process of transforming the ciphertext back to the original plaintext. A cryptosystem, for our purposes, is a set of one-to-one mappings over an alphabet of characters, each mapping of which is defined by a unique key. A mapping tells you how to encrypt/decrypt every character in the alphabet depending on the key.

In this assignment we will look at two very simple cryptosystems from a class of systems called substitution ciphers. A substitution cipher replaces each character (of the alphabet) in the plaintext message with a unique character from the alphabet thereby creating the ciphertext message. The key determines how each substitution is made. Thus, from the key, we can generate a substitution table that lists each individual substitution. For example, this ciphertext message:

Tkh sxusrvh ri wklv dvvljqphqw lv wr jlyh brx hashulhqfh lq ghdolqj
zlwk d odujh slhfh ri frgh, dqg wr whdfk brx derxw surjudp
vshflilfdwlrqv dqg fruuhfwqhvv.
was encrypted by applying this substitution table:
a b c d e f g h i j k l m n o p q r s t u v w x y z
tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241 tex2html_wrap_inline241
d e f g h i j k l m n o p q r s t u v w x y z a b c
to this plaintext message:
The purpose of this assignment is to give you experience in dealing
with a large piece of code, and to teach you about program
specifications and correctness.

The alphabet in this example is the set of lower case letters. Notice that upper case letters, punctuation, and spacing were left untouched. The substitution table was generated by shifting each character of the alphabet over by 3 places. This is just one mapping of a set known as the Caesar ciphers after Julius Caesar, who, not trusting his messengers, communicated with Marcus Tullius Cicero in 50 B.C. using this very mapping. The key of a Caesar cipher is the shift factor; in this case 3. Question: how many different Caesar ciphers are there? (Or, in the worst case, how many keys would you have to go through to break the code?)

Assumptions for the assignment: The alphabet will consist of only the lower case letters; however, a plaintext message may have more than just these letters in it, in which case the letters and symbols not in the alphabet should be left untouched in all encryption/decryption operations. So, a message that you encrypt and then decrypt should look identical to the original message.

Your tasks


We have divided the assignment into several stages, which you do not have to complete in the order shown. Some stages require you to design code, others to provide answers in English. The division is intended to provide a logical structure to a well-designed program. The end goal of your work is to develop a simple interface so that a user can encrypt and decrypt messages stored in files.

  1.   What properties make a substitution table valid? In other words, what must be true of the table so that the original message is always recoverable from its ciphertext. Write one clear and concise paragraph in English that explains your informal specification for a valid substitution table, and give an example of an invalid substitution table.

  2.   We have provided a Turing function that verifies the validity of a substitution table. It takes a substitution table of type subsTableType as a parameter, and returns true if the table is valid, and false otherwise. Your first task is to design the data structure for the substitution table. Next, in anticipation of proving that the main loop (marked in the function) is correct, your task is to specify the precondition(s), postcondition(s), and loop invariant for the loop. Write them in executable Turing using the assert and invariant statements. Your conditions and invariant may call other functions that you have written.

  3.   Prove the correctness of the main loop in the verify function. You must prove partial correctness and termination. This means that you must prove (1) that the loop invariant that you specified in ii is indeed a loop invariant, (2) that if the loop terminates, then the postcondition(s) is true, and (3) that the loop terminates.

  4.   Write a Turing procedure that generates a valid substitution table for a Caesar cipher. Its parameters should include an integer representing the key, and a substitution table that will be filled in.

  5.   Write a Turing procedure that generates a valid substitution table for a random cipher. A random cipher maps each character of the alphabet to a unique randomly-chosen character. To make the random choices, your procedure should use Turing's randnext, which generates a sequence of random numbers. You will `seed', i.e., initialize, the sequence with the key. (Documentation on randseed and randnext.) Note: the function should take the same parameters as the Caesar cipher generator. Question: how many keys would you have to go through to break this cipher (as compared to the Caesar cipher)?

  6.   Write a Turing procedure that encrypts a plaintext message stored in a file. It should output the ciphertext message into a file. It should take at least three parameters: the name of the plaintext file (to read from), the name of the ciphertext file (to write to), and a substitution table to apply in the encryption. Make sure that just the letters in the alphabet (the lower case letters) are encrypted. All others, including punctuation, spaces, and blank lines, should be passed through unchanged.

  7.   Write a Turing procedure that decrypts a ciphertext message stored in a file. It should take at least these three parameters: the name of the ciphertext file (to read from), the name of the plaintext file (to write to), and the substitution table to apply in the decryption.

  8.   Substitution ciphers are not very secure compared to current state-of-the-art cryptosystems, such as PGP (Pretty Good Privacy), which is pretty much unbreakable. To break a substitution cipher is to generate the substitution table using only the ciphertext, thereby allowing one to decrypt the message without knowing the key. One method to do this is to compare the frequency that characters occur in the ciphertext to the frequency that characters occur in normal English text. It is likely that the most frequent character in the ciphertext corresponds to the most frequent character in English; that the second most frequent characters correspond; and so on. While this method is not always perfect, it can be used to partially uncover the original message.

    Write a Turing procedure that attempts to break a substitution cipher. It should take at least three parameters: the name of a file containing the ciphertext, an array of characters ordered (sorted) by their frequency in normal English text, and a substitution table to fill in. It should first compute the frequencies, that is, the number of times each character (of the alphabet) occurs in the ciphertext. Then, it should match the ordering of characters by frequency in the ciphertext to the ordering in normal English text.

  9.   Develop a simple user interface that ties all of the above functions and procedures together. The user should be able to perform the following operations: generate a substitution table of her choice for a given key, display a table on the screen, verify that a table is valid, encrypt a file of her choice, decrypt a file of her choice, and attempt to break an encrypted file of her choice. Basic functionality and ease of use will determine your mark for this interface. We aren't looking for anything fancy, and you may assume that all user input is correct (i.e., no error-checking is necessary).

  10.   An encrypted secret message will be put on the web page. You must try to break its code, and decrypt it. Include the substitution table and the decrypted secret message in your report. Note: because of variations in frequency, the decrypted secret message may not be perfect.

  11.   Write a report, as described in the CSC148 Course Handbook. In the `Method' section, make sure you include a paragraph that describes your data structure for the substitution table, and paragraphs that describe how each of the above subprograms work. Justify all design decisions that you have made. In the `Tests' section, make sure you explain each of the tests that you performed.

Hints


All of the above subprograms rely on the substitution table; be careful in designing its data structure. You may also need to know about converting characters to and from their ASCII codes using the ord() and chr() functions.

On the web page you will find:

1. the verify function
2. documentation on randseed and randnext
3. a list of letters ordered by their frequency in normal English text
4. the secret message

What to hand in


Hand in the following:

To keep the marker happy, clearly label and organize everything that you hand in.

About this document ...

This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -split 0 -no_navigation as2.

The translation was initiated by Philip Edmonds on Wed May 29 20:55:21 EDT 1996


Philip Edmonds
Wed May 29 20:55:21 EDT 1996