Reading and writing composite things to a file


You can "write" a bunch of stuff at once

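You know by now that you can use "write" to write something out in binary. Because "write" expects a pointer to char, the address has to be cast. E.g.:
     fs.write (reinterpret_cast<const char *>(&i), sizeof(int));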
But "write" can also be used to write out composite structures, such as arrays. You simply tell it what memory address to start at, and how many bytes to write out. For example, if you had an array of 100 integers called "A", you could say:
     fs.write (reinterpret_cast<const char *>(A), 100*sizeof(int));
This works because the contents of an array are guaranteed to be contiguous in memory. (You don't explicitly take the address of the array because an array name can be used as a pointer to its first element.) You can even use "write" to write out a struct or an object. For example, suppose I have a class called Fraction. I could say:
      fstream fs;
      Fraction f;
      ...

      fs.write( reinterpret_cast<const char *>(&f), sizeof(Fraction) );
This will write out, in binary, the instance variables inside f. Note that you should never try to compute the size of an object by adding up the sizes of its instance variables, because the compiler may insert extra padding bytes so that members line up on convenient boundaries in memory. Always use sizeof.
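
To make the round trip concrete, here is a minimal sketch that writes a small struct and an array with "write" and reads them back with "read". The struct Sample and both function names are made up for this illustration, and the sketch assumes the struct contains no pointers.

```cpp
#include <cstddef>
#include <fstream>

// Hypothetical record type, invented for this sketch. It holds no pointers.
struct Sample {
    char tag;
    int  value;
};

// Writes one Sample followed by n ints to the file at `path`, in binary.
// Returns true if every write succeeded.
bool write_sample_and_array(const char* path, const Sample& s,
                            const int* a, std::size_t n) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    // sizeof(Sample) includes any padding the compiler added.
    out.write(reinterpret_cast<const char*>(&s), sizeof(Sample));
    out.write(reinterpret_cast<const char*>(a),
              static_cast<std::streamsize>(n * sizeof(int)));
    return out.good();
}

// Reads the same layout back, in the same order it was written.
bool read_sample_and_array(const char* path, Sample& s,
                           int* a, std::size_t n) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&s), sizeof(Sample));
    in.read(reinterpret_cast<char*>(a),
            static_cast<std::streamsize>(n * sizeof(int)));
    return !in.fail();
}
```

A caller would pass a filename of its choice, e.g. write_sample_and_array("data.bin", s, A, 100), and later call read_sample_and_array with the same count.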

Watch out if there are any pointers inside!

There is one major limitation with using "write": Whether you're writing a single variable, an array, a struct, or an object, if it contains a pointer to something in memory, all that will be written to the file concerning that thing is the pointer itself.

For example, if you have an array of "char *"s and you write it out, all that will go into the file is the pointers, not the character arrays that they point to. So the information about those character arrays is lost as soon as the program ends. On top of that, the pointers that were written to the file are completely useless the next time you read them in, because they point to memory that you were using the last time you ran the program. Who knows what's there now?!

So if you have an array, struct or object that contains a pointer, you'll have to create a function that writes out its components individually, making sure to follow any pointers and write out the thing pointed to, rather than the pointer itself. Any or all of this can still be written in binary format; it just has to be done piece by piece. If you are using an object of a class that you didn't write, you should assume it contains pointers, and you should not write it using a single "write" with sizeof the classname. Instead, use one of the class's own output functions.
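
As one possible way to do this piece by piece, the sketch below writes a struct that contains a "char *" by writing the id, then the length of the string, then the characters themselves. The struct Person, the function names, and the choice of a 32-bit length prefix are all assumptions made for this example, not a required format.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical struct containing a pointer.
struct Person {
    int   id;
    char* name;   // points to a NUL-terminated string elsewhere in memory
};

// Writes the id, then the string length, then the characters pointed to.
// The pointer value itself is never written.
bool write_person(std::ostream& out, const Person& p) {
    std::uint32_t len = static_cast<std::uint32_t>(std::strlen(p.name));
    out.write(reinterpret_cast<const char*>(&p.id), sizeof p.id);
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(p.name, len);
    return out.good();
}

// Reads the pieces back in the same order. The name is returned in a
// std::string so the caller does not have to manage the memory.
bool read_person(std::istream& in, int& id, std::string& name) {
    std::uint32_t len = 0;
    in.read(reinterpret_cast<char*>(&id), sizeof id);
    in.read(reinterpret_cast<char*>(&len), sizeof len);
    if (!in) return false;
    name.assign(len, '\0');
    if (len > 0) in.read(&name[0], len);
    return !in.fail();
}
```

Writing the length first is what lets the reader know how many characters to read back; writing the terminating NUL and scanning for it would be another option.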


What about "empty" spaces?

What if you use "write" on something that hasn't been completely filled with values? As a simple example, imagine we have an array of 100 ints and we've put values into only the first 36 slots. If we use "write" to write out the entire array -- all 100 slots -- there is no problem. The last 64 slots have something in them; it's just meaningless junk, and that's what will get written out. We can later read the array back into memory and we'll get the first 36 slots filled with our integers, and the last 64 filled with exactly the same meaningless junk. This is not a problem, as long as we know not to look past the first 36 slots; and we needed to know this even when we were just dealing with the array in memory. Of course, once we write the array out to the file and stop the program, we've lost track of the number 36. So we would probably want to write it to the file as well.
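
A minimal sketch of that idea: write the count of slots in use first, then the whole array. The names kSlots, save_array and load_array are invented here, and putting the count at the front of the file is just one reasonable layout.

```cpp
#include <fstream>

const int kSlots = 100;   // total slots in the array, as in the example above

// Writes the number of slots in use, followed by all kSlots slots.
bool save_array(const char* path, const int* a, int used) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&used), sizeof used);
    out.write(reinterpret_cast<const char*>(a), kSlots * sizeof(int));
    return out.good();
}

// Reads the count back first, then the whole array.
// Only a[0] .. a[used-1] are meaningful afterwards.
bool load_array(const char* path, int* a, int& used) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&used), sizeof used);
    in.read(reinterpret_cast<char*>(a), kSlots * sizeof(int));
    return !in.fail();
}
```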

You may wonder why one would ever write out all 100 slots of an array when only 36 slots are in use. We might do this if we wanted to reserve room in the file for the array to expand. By writing out the "junk" slots, we make sure that the next thing in the file goes after them. If we need to, we can seek back to a junk spot and put some real data there. This kind of thing is done, for example, when we write a B-tree node or a hash table bucket, as you have seen (or will see) in lecture.
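
Seeking back to a reserved slot might look like the sketch below. It assumes the file holds an array of ints starting at byte offset base; the function names and that layout are hypothetical.

```cpp
#include <fstream>

// Overwrites slot `index` of an int array that starts `base` bytes into
// the file. The file must already exist and be large enough.
bool overwrite_slot(const char* path, std::streamoff base, int index, int value) {
    // Open for both input and output so the existing contents are kept.
    std::fstream fs(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!fs) return false;
    std::streamoff offset =
        base + static_cast<std::streamoff>(index) *
               static_cast<std::streamoff>(sizeof(int));
    fs.seekp(offset, std::ios::beg);
    fs.write(reinterpret_cast<const char*>(&value), sizeof value);
    return fs.good();
}

// Reads slot `index` back from the same layout.
bool read_slot(const char* path, std::streamoff base, int index, int& value) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::streamoff offset =
        base + static_cast<std::streamoff>(index) *
               static_cast<std::streamoff>(sizeof(int));
    in.seekg(offset, std::ios::beg);
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return !in.fail();
}
```

Because every slot has the same size, the byte position of slot i is simply base + i*sizeof(int). That fixed-size property is what makes reserving "junk" slots useful.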


Only "read" what you've created using "write"

If you need a data file that you plan to read into memory (in whole or in part) using read, you should create that data file using write.

Why? Say some bytes from your data file are to be read into a struct. (The same observations hold for an instance of a class.) Even if you knew how to create a binary file using some editor, you'd still have to know exactly how many bytes to use for each member of the struct. This isn't as simple as it seems, because extra bytes may be used just so that a struct or part thereof uses some multiple of k bytes, for some k. [See the first Q&A question on page 353 of King.]

To see an example of this, make an instance of this struct and write it to a file:

        typedef struct {
                int id;
                char a[5];
        } rec;
Then use od to see how many bytes are there and what is in them. Try this again with array sizes other than 5 for member `a' of the struct.
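
A small sketch of this experiment is below. The function name write_rec and the values stored in the struct are made up; the struct is zeroed first only so that any padding bytes show up as zeros rather than junk.

```cpp
#include <cstring>
#include <fstream>

typedef struct {
        int id;
        char a[5];
} rec;

// Writes one rec to `path` and returns the number of bytes written,
// or -1 if the file could not be written.
long write_rec(const char* path) {
    rec r;
    std::memset(&r, 0, sizeof r);   // zero everything, padding included
    r.id = 1;
    std::strcpy(r.a, "abcd");       // 4 characters plus the terminating NUL
    std::ofstream out(path, std::ios::binary);
    if (!out) return -1;
    out.write(reinterpret_cast<const char*>(&r), sizeof r);
    if (!out) return -1;
    std::streamoff n = out.tellp(); // current position == bytes written
    return static_cast<long>(n);
}
```

The returned count equals sizeof(rec), which is often larger than sizeof(int) + 5. On many machines with 4-byte ints it comes out as 12 rather than 9, but the exact figure depends on your compiler and platform. Running "od -c" on the file shows the individual bytes.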

Using write to create an initial data file isn't a big problem. You could write a little program that prompts you for the data, puts it into the appropriate variable(s) in memory -- perhaps into a struct -- and then writes it. Or you could have your program randomly generate data values to write.
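
For the random-data option, a generator could be as short as the sketch below. The struct Reading, the function name, and the value range 0 to 999 are all assumptions chosen for illustration.

```cpp
#include <fstream>
#include <random>

// Hypothetical record type for the generated data file.
struct Reading {
    int id;
    int value;
};

// Writes n records with ids 0 .. n-1 and random values in 0 .. 999.
// Returns true if every write succeeded.
bool generate_data_file(const char* path, int n, unsigned seed) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 999);
    for (int i = 0; i < n; ++i) {
        Reading r;
        r.id = i;
        r.value = dist(gen);
        out.write(reinterpret_cast<const char*>(&r), sizeof r);
    }
    return out.good();
}
```

A program that later reads this file should use the same struct definition and be compiled the same way, so that sizeof(Reading) and the member layout match.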