The case for immutable dictionaries; and the central misunderstanding of PEP 351
[If this story looks too long to you, then skip to the summary (last paragraph)]
Immutable data types. Python has a nice variety of built-in data types. I know of lists, tuples, dictionaries,
sets, and frozensets; there may be more. Some are immutable versions of others: tuples are immutable lists, and
frozensets are immutable sets.
Restrictions like immutability provide invariants. One may wonder what is the purpose of immutable lists or sets, if there's simply less you can do with
them. The answer becomes immediately apparent when one asks an OOP programmer why he keeps
most variables private instead of public. Private variables only allow fewer uses than public variables - so why would one prefer private ones? Why
restrict oneself in such a way? The answer is that restricted access makes it possible to guarantee a variety of invariants. Invariants allow one to make assumptions about the state of objects,
and assumptions often pave the way for new functionality.
Advantage #1: safer aliasing. So what functionality is made possible by having immutable data types? Lots. In many functional languages, such as Lisp, ML, and Haskell, lists are composed of what Lisp
calls cons pairs, which specify both the first element and the remainder of the list (usually another
cons pair... etcetera). Because cons pairs are immutable, one can very cheaply create a new list, composed of some first object h prepended to an existing list
l: simply create a cons pair of h and l, and that cons pair represents your new list. Even though very little memory is set aside for this new list, nobody will have to worry
about the new list unintentionally being changed when l is changed by another part of the program, simply because l is immutable. If
cons pairs would have been mutable, creation of the list [h]+l would require creating an entire new list from scratch, with no memory
sharing possible. Such advantages are found in Python, too: having multiple references (a.k.a. aliasing) to an immutable object is less dangerous than
aliasing of a mutable object. Like in Lisp, it's not always entirely harmless, because the elements of the aliased list may still be mutable - it's
only mutation of the list itself that's prevented. Still, it's a significant advantage.
Advantage #2: hashability. The most directly visible advantage in Python is the fact that immutable objects may be hashable. If a container (such as a list or tuple) is guaranteed to never change its contents, and
if the contents (the contained objects) are guaranteed to never change either, and their contents (if any) are similarly immutable, etcetera
recursively, then absolutely nothing is ever going to change about that container. If absolutely nothing is ever going to change about an object,
Python is willing to include it in hash-based containers, such as dictionaries and sets. If one wishes to use an object in a hashing container, one must "promise" to
never change the object, or any object directly or indirectly contained in it. In other words, the object must be recursively immutable
(assuming no overriding of the __hash__ method). A Python programmer can promise not to change an object, by choosing an immutable data type for that
object.
Advantage #3: error detection. Another advantage, often emphasized by OOP programmers, is this: if an object is immutable, you'll get
an error message if you try to modify it by mistake. Such mistakes happen, and the earlier an error becomes apparent, the better.
Advantage #4: code analysis potential. At some time in the future, Python might introduce significant compile-time or even run-time partial evaluation-based optimization, or programmer-assisting analysis (see PyChecker for an example of static analysis). Automatic code analysis (automatic semantic
understanding) will be much more powerful when dealing with immutable objects.
Frozensets are always hashable. In short, immutable objects have a variety of advantages. That's why Python has not only set and list objects, but also
frozenset and tuple objects, which are simply immutable versions of the former. frozenset objects are quite interesting. Being a
set, they require that their contents are hashable (recursively immutable, usually). Being immutable, they themselves can't be changed either. Thus,
both the frozenset itself, and anything recursively contained, must be immutable. Thus, frozensets are guaranteed to be recursively
immutable and therefore hashable. Compare this to tuples (immutable lists), which are hashable only if their contents are all hashable.
Immutability is different from hashability. This advantage of frozenset objects, and some common uses of tuples, lead some
Python programmers to mistakenly think that the only advantage of immutable objects is their hashability. That's not true. It's one of the advantages,
and perhaps the biggest, but certainly not the only one. Even if an immutable container has mutable contents, and therefore can't be hashed, it still
serves a purpose. That's why Python has tuples, and doesn't require that their contents be hashable. Tuples don't guarantee
hashability. Nonetheless, they are very useful. Their own immutability is useful in itself, and if their contents are recursively immutable, they
become recursively immutable (and thus hashable) themselves. Again, the useful fact that frozenset objects are guaranteed to be hashable, is
not because they themselves are immutable. It's because of the interaction of two features: their immutability, and their being a set.
Python should introduce immutable dictionaries. Because immutable objects have so many good uses, it is my opinion that Python should introduce immutable dictionaries. Those would have the usual
advantages of being immutable (less dangerous aliasing, and early error detection), and they would be hashable if their contents are hashable. Like
tuples (immutable lists), they should not require that their contents be hashable. Immutable dictionaries, thus, should not
guarantee hashability. They should be hashable if and only if all of their contents are hashable - just like tuples.
Incorrect formulation in PEP 351. It is unfortunate that immutable dictionaries have been formally proposed for Python in a bad way, in PEP 351. That document makes the mistake of assuming that immutability serves only to make objects
hashable. Consequently, the proposal is a bad one, and was correctly rejected. The correct semantics for immutable dictionaries are that an immutable
dictionary is itself immutable, but does not require its contents (its "values") to be immutable or hashable. If some immutable dictionary
does have hashable values, it will be hashable after all. However, that should not be required.
Why a built-in is justified. Of course, it's easy to write a class that implements immutable dictionaries. One can simply inherit from
dict, replace the modification methods with something that raises an exception, and provide a __hash__ based on the hash of the keys and
values. However, the same is true about frozensets: they can easily be made on top of sets. The reason why Python should provide
immutable dictionaries built-in, is to encourage the use of immutable objects. Immutable objects have many advantages, some of which are listed above. (For the same reason, there should be a
map()-like function that takes a function and a tuple, and returns a tuple; but that's another story).
Summary. Frozensets are guaranteed to be hashable, but only because of an interaction between their immutability and their being a set. This
should not lead one to think that all immutable objects should be hashable. In fact, tuples aren't, and are still very useful (even those that
are not hashable). Python should introduce immutable dictionaries, which, like tuples, should not require their contents to be immutable. Those
immutable dictionaries should be hashable if and only if all of their contents are hashable - again, just like tuples.
Back