Proof of a Looping Program

The following code can be traced with a few examples in order to discern its purpose.

Assume these declarations have been made:
    int n;
    // Read a value into n;
    int data[] = new int[n];
    int frequency[] = new int[n];
    int i;


Imagine this code comes later:
    /* Preconditions: ??? */
    i = 0;
    while (data[i] != -1)
    {
        frequency[data[i]] = frequency[data[i]] + 1;
        i++;
    }
    /* Postconditions: ??? */
    

Writing appropriate Preconditions and Postconditions

You will notice that the code could certainly have been clearer in stating its purpose. Comments would have helped. Once we figure out the purpose of the program, we can write specifications for it - guidelines that the program will follow.

1.Deciding on the Preconditions

We would like to figure what must be true in order for the program to work. From the code, we can see the purpose is to store the frequency of occurrences of elements in the data array in the corresponding positions in the frequency array. However, the code in between the declarations and the loop hasn't been specified, so it could be anything. We want the loop code to work so let's figure out some situations where it won't work. Once we do that, we can write preconditions so that those using the loop will know when it will work.

When the Loop Code will Fail:

  1. when the data array hasn't been initialized
  2. when the frequency array hasn't been initialized
  3. when the data array contains negative values before the first -1 or when it contains values greater than the size of the data array before the first -1 element.
  4. when the frequency array hasn't been initialized to all zeroes or to appropriate initial frequencies
  5. when the data array does not contain a -1
  6. when a -1 is in the data array before other data elements we would like to process.
We can now simplify these conditions by making some decisions as to how we would like the data structures to be given to us.

We know from conditions 1, 3, 4, and 5 that the data array has to be initialized, it can't have any negative numbers or numbers greater than the size of the frequency array before the first -1 element, it can, it must have at least one -1 element, and the first -1 element. Many different formats can work under these restrictions. We can make the preconditions simpler by imposing another restriction. We can say the -1 element has to be at the end of the data array. That way, there is only one -1 element to think about and all the data in the array up to the -1 element at the end will be processed, and none of the other elements will have "out-of-range" values.

For the frequency array, we know it must be initialized to some sensible values. We can further impose that all these values must be zero. Thus, the loop code will result in the frequency array containing frequencies of all the data elements in only the data array.

We can write the preconditions in a more mathematical way:

         data[n-1] = -1
         frequency[j] = 0          for all 0 <= j <= n-1
         0 <= data[j] < n-1        for all 0 <= j <= n-2
    

2.Figuring out the Postconditions

The postconditions assert what is true after the code is terminated (assuming of course, that the preconditions were true). With the purpose of the program in mind, what can we say is true once the loop is done? ... Well, we know that the frequencies of the data elements have been stored in the corresponding positions in the data array, so let that be our postcondition. Expressed in a mathematical way, the postcondition would be:

         for all 0 <= j <= n-1
         frequency[j] = the number of occurrences of j in data[0..n-2]
    

3. The Resulting Code

Assume these declarations have been made:
    int n;
    // Read a value into n;
    int data[] = new int[n];
    int frequency[] = new int[n];
    int i;


Imagine this code comes later:
    /* Preconditions: 
         data[n-1] = -1
         frequency[j] = 0          for all 0 <= j <= n-1
         0 <= data[j] < n-1        for all 0 <= j <= n-2       */
    i = 0;
    while (data[i] != -1)
    {
        frequency[data[i]] = frequency[data[i]] + 1;
        i++;
    }
    /* Postconditions: 
         for all 0 <= j <= n-1
         frequency[j] = the number of occurrences of j in data[0..n-2] */
    

Finding the Loop Invariants

We now have to find a loop invariant.. something that stays the same as the loop progresses. It also has to help us prove the postconditions on termination of the loop. Here is a loop invariant that shows the progress of the loop but stays true throughout the loop:

         for all 0 <= j <= n-1
             frequency[j] = #occurrences of j in data[0..i-1]
    
This simply means that at the beginning of each iteration i, we have processed the data array up to i - 1.

A second loop invariant would be that the data array remains unchanged:

         for all 0 <= j <= n
             data'[j] = data[j]
    

Proving that if the Code Terminates, the Postconditions will be True

If the preconditions are true and the loop terminates, then the postconditions should be true. We can prove this. We will be making use of an equivalence derived from the preconditions to do this proof: data[k] = -1 <=> k=n-1. These two statements are equivalent. Convince yourself of this by examining the preconditions. The equivalency remains true in the loop since the data array does not change (second loop invariant).

Now, assuming the loop invariants and the negation of the loop guard, we can prove the postconditions:

      LI#1:
          for all 0 <= j <= n-1
              frequency[j] = #occurrences of j in data[0..i-1]

      LI#2:
          for all 0 <= j <= n
              data'[j] = data[j]

      Negation of Loop guard:
          data[i] = -1

      Since the last statement is equivalent to i = n - 1, we also know
          i = n - 1

      We can now substitute this i into the loop invariant #1, yielding:
          for all 0 <= j <= n-1
              frequency[j] = #occurrences of j in data[0..n-2]
    
The last statement obtained is the postcondition we wanted! The above is a proof that the postconditions are true upon termination of the loop.

Proving Termination

For a loop to work, it has to terminate. If it loops infinitely, it is incorrect. So, part of proving a loop program correct is proving that the loop terminates.

First, we must find an integer-valued quantity that decreases on each iteration but can never become negative. This quantity is called the bound function,t - it is an upper bound on the number of iterations left.t must have these properties:

  1. if there are iterations left, then t > 0
  2. t decreases at every iteration.
Clearly then, if we can find such a function for a loop, the loop must terminate.

We can find such a bound on our loop. In the loop, we progress through the data array and exit upon reaching the last element, since, by the preconditions, it must be -1. Thus, a bound on the iterations left can be the number of elements left to look at: the index of the last element of the array minus the current position in the array. We can thus use (n-i)-1 as our bound function. This becomes zero when it is time to exit. n-i will also work. So will (n-i)+400. (n-i)-50 won't work because it violates the first condition.

A bound function exists for the loop, thus the loop terminates.

Proving that the Invariants are Invariant

We used the invariant to show that the postconditions were true if the loop terminated. However, in order to complete our proof, we must prove that the loop invariants are indeed invariant.

Our loop invariant #1 was:

        for all 0 <= j <= n-1
            frequency[j] = #occurrences of j in data[0..i-1]
    

We prove invariance of this statement using induction.

  1. Base Case:
    i = 0
    Arrays have been initialized and preconditions are met. The frequency array must have frequencies of all the data contained in data[0..-1] since i is 0. This is an array with zero elements. Since all data has occurred 0 times in such an array, the frequency array has the correct contents; it has been initialized so all its elements are zero (according to the preconditions). So the statement is true for i=0.
  2. Induction Hypothesis:
    Assume the statement is true at the beginning of an arbitrary iteration k, where k > 0:
                for all 0 <= j <= n-1
                    frequency[j] = #occurrences of j in data[0..k-1]
    	
  3. Induction Step:
    We show the statement is true at the start of the k+1-th iteration. on the i = k iteration, the following code is executed:
                frequency[data[k]] = frequency[data[k]] + 1
    	
    By inspection, we see that this is the correct processing for the k-th element of data. This means:
                frequency[data[k]] = #occurrences of data[k] in data[0..k]
    	
    Thus, given that the statement is true for k, we have shown it must be true for k+1.
  4. Induction Conclusion:
    The statement is true throughout the execution of the loop. In other words, it is invariant.
    Bonny Biswas
    Last modified: Wed Nov 11 18:26:01 EST 1998