In this assignment, you will use fread()
,
fwrite()
, and fseek()
to implement operations
on fixed-length records in a binary data file.
As usual, you should aim for reasonably efficient algorithms and reasonably organized, comprehensible code.
Correctness (mostly auto-testing) is worth 90% of the marks; code quality is worth 10%.
In short, implement the Heapsort algorithm, except using random
access (fseek()
) on a file instead of an in-memory
array!
See my old lecture slides on max-heaps and Heapsort.
I strongly encourage you to write a helper function for “heapify at v” because it is needed at two places, and it is the most annoying part of the algorithm, so you don’t want to repeat the code twice. Other helper functions can also be good ideas.
Although Heapsort is not a practical choice for files, it is still a good exercise on the nuances of random access on files. (Perhaps next year we will do a more justifiable but harder example, such as B-trees!)
The file is a binary file of 0 or more customer records, each record
matching the C struct customer
in customer.h. The size
conspires to be 48 bytes on the platform we use.
Of note is that customer name is not always a NUL-terminated C
string. If the name is short, it is true that the remainder of the array
is filled with NUL, but this is only for best practice in privacy. The
longest name can really have length CUSTOMER_NAME_MAX
, in
which case there is no NUL right after.
And easy way find out the number of records in the file is:
fseek()
to the end, then ftell()
, divide by
sizeof(customer)
.
We want to sort the customer records by loyalty points in increasing order; among customers with the same loyalty points, sort by names in alphabetical order. It is also possible for multiple records to have the same loyalty points and the same name; then none of them should be dropped.
There are many good ways to examine the file format when you debug.
You already know about the od
program. Another choice is
hexdump -C
(see man hexdump
). You should also
write your own C code to print a record in whatever human format you
understand.
A sample file sample.dat is provided, but only to the point of exemplifying the file format. It has way too few records for testing your sorting code. You should write your own code to generate more records for testing.
Here are the customers in the sample file:
name | loyalty |
---|---|
Dennis Ritchie | 1926 |
Charles Anthony Richard Hoare | 71934 |
The Boy Who Lived And Defeated The Dark Lord | 305419896 |
Alan Turing | 305419896 |
The sorting function (the function I will call for marking) is to be:
int heapsort(const char *filename);
It should return 0 (C false) if there is an error, 1 (C true) otherwise.
For simplicity, the only error you are required to catch is when
fopen()
fails. Actually this is to help you catch your own
typos in filenames in your own tests. But you should also check errors
for fseek()
and fread()
because they almost
certainly mean you have bugs.
Marking will be done under a severe memory limit (4000KB) with much larger files; it will also be in a docker container with little or no spare disk space.
Your code will be compiled with -O2 -Wall
when
marking.
You can check memory usage with e.g.
/usr/bin/time -f%M ./a.out
It prints out how many kilobytes of memory you used.
You can test under the memory limit with e.g.
bash -c 'ulimit -v 4000; exec ./a.out'
ulimit
is a bash command; see help ulimit
and perhaps the textbook [PI] Chapter 36 (especially about
RLIMIT_AS).
If you like to print debugging or error messages for your own sake, please send them to stderr only.
Please hand in heapsort.c containing heapsort()
and
helper functions.