How are data represented internally?

Physically, data are stored in transistors that can be in an off state (0, low voltage) or an on state (1, higher voltage). This allows a natural identification between collections of transistors and binary numbers (sequences of 0s and 1s, interpreted as a number in base 2). You should practice converting familiar decimal (base 10) numbers to binary, and back again.

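A char in C is represented by one byte, which is 8 bits (binary digits) on almost every machine you will meet (strictly, C guarantees at least 8 bits; the exact number is the constant CHAR_BIT in limits.h). An 8-bit char can represent 256 distinct values, for example 0 to 255. On most systems, chars in the range 0 to 127 represent ASCII values for characters, and the meaning of the remaining values is implementation dependent.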

An int in C is represented by at least 16 bits, often 32 bits, and occasionally 64 bits. You can figure this out by evaluating sizeof(int) on a particular machine, which will tell you how many bytes an int occupies.

Suppose you have a 32-bit unsigned int. It can represent values from 0 to $2^{32} - 1$, and its arithmetic is done modulo $2^{32}$ -- in other words, $1 + (2^{32} - 1) == 0$.

Signed int (the default) is a bit trickier. The most significant bit (msb), on the left, indicates the sign: 0 for non-negative and 1 for negative. Non-negative ints in the range 0 to $2^{31} - 1$ are represented by a 0 followed by their 31-bit binary representation. Negative ints with absolute values in the range 1 to $2^{31}$ are represented by their two's complement -- $2^{32}$ minus their absolute value, written as a 32-bit binary number. This representation is efficient for hardware (essentially, hardware subtraction becomes just addition).

Floats are commonly 32-bit, but you should check sizeof(float) on your platform. The first bit indicates the sign (1 for negative, 0 for positive). The next 32-FLT_MANT_DIG bits represent the exponent (the constant FLT_MANT_DIG is defined in float.h), usually to the base (aka radix) 2. Suppose we have an 8-bit exponent. Rather than being limited to exponents in the range 0..255, we shift (or bias) the exponent by subtracting 127, so the stored values 0..255 correspond to exponents in the range -127..128. (In the common IEEE 754 format the two extreme stored values, 0 and 255, are reserved for special cases, so ordinary floats use exponents -126..127.)

The mantissa (which has FLT_MANT_DIG bits, or 24 bits in our example) represents the significant digits of our float as a binary number in the range $1 \leq \text{num} < 2$. The exponent is chosen so that $\text{sign} \times 2^{\text{exponent}-127} \times \text{1.\{mantissa\}}$ equals our float, where sign is $+1$ or $-1$. Notice that all binary numbers that are no smaller than 1 and smaller than 2 have a leading bit 1. In floating point representation, this leading 1 is omitted (it's assumed), allowing us to represent a 24-bit number with 23 bits. For a 32-bit float this gives us 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.

Perhaps you're wondering: how could you ever represent 0 with a mantissa that has a leading 1? The answer is that there is one further trick. A floating point number with all zeroes in the exponent is interpreted as $2^{-126} \times \text{0.\{mantissa\}}$. In this way, very small floats (including 0) can be represented.

Danny Heap 2002-12-16