CSCB09 2023 Summer Assignment 2

Due: July 1 Saturday 11:59PM
This assignment is worth 10% of the course grade.

In this assignment, you will use fread(), fwrite(), and fseek() to implement operations on fixed-length records in a binary data file.

As usual, you should aim for reasonably efficient algorithms and reasonably organized, comprehensible code.

Correctness (mostly auto-testing) is worth 90% of the marks; code quality is worth 10%.

Heapsort

In short, implement the Heapsort algorithm, except using random access (fseek()) on a file instead of an in-memory array!

See my old lecture slides on max-heaps and Heapsort.

I strongly encourage you to write a helper function for “heapify at v” because it is needed at two places, and it is the most annoying part of the algorithm, so you don’t want to repeat the code twice. Other helper functions can also be good ideas.

Although Heapsort is not a practical choice for files, it is still a good exercise on the nuances of random access on files. (Perhaps next year we will do a more justifiable but harder example, such as B-trees!)

File And Record Format

The file is a binary file of 0 or more customer records, each record matching the C struct customer in customer.h. The size conspires to be 48 bytes on the platform we use.

Of note is that customer name is not always a NUL-terminated C string. If the name is short, it is true that the remainder of the array is filled with NUL, but this is only for best practice in privacy. The longest name can really have length CUSTOMER_NAME_MAX, in which case there is no NUL right after.

And easy way find out the number of records in the file is: fseek() to the end, then ftell(), divide by sizeof(customer).

We want to sort the customer records by loyalty points in increasing order; among customers with the same loyalty points, sort by names in alphabetical order. It is also possible for multiple records to have the same loyalty points and the same name; then none of them should be dropped.

There are many good ways to examine the file format when you debug. You already know about the od program. Another choice is hexdump -C (see man hexdump). You should also write your own C code to print a record in whatever human format you understand.

A sample file sample.dat is provided, but only to the point of exemplifying the file format. It has way too few records for testing your sorting code. You should write your own code to generate more records for testing.

Here are the customers in the sample file:

name loyalty
Dennis Ritchie 1926
Charles Anthony Richard Hoare 71934
The Boy Who Lived And Defeated The Dark Lord 305419896
Alan Turing 305419896

Further Specifications

The sorting function (the function I will call for marking) is to be:

int heapsort(const char *filename);

It should return 0 (C false) if there is an error, 1 (C true) otherwise.

For simplicity, the only error you are required to catch is when fopen() fails. Actually this is to help you catch your own typos in filenames in your own tests. But you should also check errors for fseek() and fread() because they almost certainly mean you have bugs.

Marking will be done under a severe memory limit (4000KB) with much larger files; it will also be in a docker container with little or no spare disk space.

Your code will be compiled with -O2 -Wall when marking.

You can check memory usage with e.g.

/usr/bin/time -f%M ./a.out

It prints out how many kilobytes of memory you used.

You can test under the memory limit with e.g.

bash -c 'ulimit -v 4000; exec ./a.out'

ulimit is a bash command; see help ulimit and perhaps the textbook [PI] Chapter 36 (especially about RLIMIT_AS).

Debugging And Error Messages

If you like to print debugging or error messages for your own sake, please send them to stderr only.

Handing In

Please hand in heapsort.c containing heapsort() and helper functions.