The case for immutable dictionaries; and the central misunderstanding of PEP 351

[If this story looks too long to you, then skip to the summary (last paragraph)]

Immutable data types. Python has a nice variety of built-in data types. I know of lists, tuples, dictionaries, sets, and frozensets; there may be more. Some are immutable versions of others: tuples are immutable lists, and frozensets are immutable sets.

Restrictions like immutability provide invariants. One may wonder what the purpose of immutable lists or sets is, if there's simply less you can do with them. The answer becomes immediately apparent when one asks an OOP programmer why they keep most variables private instead of public. Private variables allow fewer uses than public ones - so why would one prefer them? Why restrict oneself in such a way? The answer is that restricted access makes it possible to guarantee a variety of invariants. Invariants allow one to make assumptions about the state of objects, and assumptions often pave the way for new functionality.

Advantage #1: safer aliasing. So what functionality is made possible by having immutable data types? Lots. In many functional languages, such as Lisp, ML, and Haskell, lists are composed of what Lisp calls cons pairs, which specify both the first element and the remainder of the list (usually another cons pair... etcetera). Because cons pairs are immutable, one can very cheaply create a new list, composed of some first object h prepended to an existing list l: simply create a cons pair of h and l, and that cons pair represents your new list. Even though very little memory is set aside for this new list, nobody has to worry about the new list unintentionally being changed when l is changed by another part of the program, simply because l is immutable. If cons pairs were mutable, creating the list [h]+l would require building an entire new list from scratch, with no memory sharing possible. Such advantages are found in Python, too: having multiple references (a.k.a. aliasing) to an immutable object is less dangerous than aliasing of a mutable object. As in Lisp, it's not always entirely harmless, because the elements of the aliased list may still be mutable - it's only mutation of the list itself that's prevented. Still, it's a significant advantage.
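The cons-pair idea above can be sketched in plain Python, using tuples as the immutable pairs (the names cons, head, tail and nil are illustrative, not a real library):

```python
# A minimal sketch of Lisp-style cons pairs, built from Python tuples.

def cons(h, t):
    """Prepend h to the list t, sharing t's memory instead of copying it."""
    return (h, t)

def head(l):
    return l[0]

def tail(l):
    return l[1]

nil = None                  # the empty list

l = cons(2, cons(3, nil))   # the list [2, 3]
new = cons(1, l)            # the list [1, 2, 3]; l is shared, not copied

# Because cons pairs are immutable, l can never change "under new's feet":
assert head(new) == 1
assert tail(new) is l       # memory sharing: new's tail *is* l
```

Note that building `new` took constant time and memory, yet both lists stay valid forever - exactly the safe-aliasing property the paragraph describes.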

Advantage #2: hashability. The most directly visible advantage in Python is the fact that immutable objects may be hashable. If a container (such as a list or tuple) is guaranteed to never change its contents, and if the contents (the contained objects) are guaranteed to never change either, and their contents (if any) are similarly immutable, etcetera recursively, then absolutely nothing is ever going to change about that container. If absolutely nothing is ever going to change about an object, Python is willing to include it in hash-based containers, for example making it a key in a dictionary or an element in a set. If one wishes to use an object in a hash-based container, one must "promise" to never change the object, or any object directly or indirectly contained in it. In other words, the object must be recursively immutable (assuming no overriding of the __hash__ method). A Python programmer can promise not to change an object by choosing an immutable data type for that object.
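Recursive immutability in action - a tuple is hashable precisely when everything it (transitively) contains is hashable:

```python
# A recursively immutable tuple can serve as a dictionary key...
hashable = (1, "two", (3, 4))
d = {hashable: "ok"}
looked_up = d[(1, "two", (3, 4))]   # equal contents, equal hash

# ...but a tuple holding a mutable list cannot be hashed at all,
# even though the tuple itself is immutable.
unhashable = (1, [2, 3])
tuple_hash_failed = False
try:
    hash(unhashable)
except TypeError:
    tuple_hash_failed = True        # the list breaks the promise
```

The second case is the crucial one for this article: immutability of the container alone is not enough for hashability.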

Advantage #3: error detection. Another advantage, often emphasized by OOP programmers, is this: if an object is immutable, you'll get an error message if you try to modify it by mistake. Such mistakes happen, and the earlier an error becomes apparent, the better.
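A two-line illustration of this early failure, using a tuple:

```python
# Mutating an immutable object fails loudly and immediately with a
# TypeError, instead of silently corrupting state shared elsewhere.
point = (3, 4)
caught = False
try:
    point[0] = 5        # accidental mutation attempt
except TypeError:
    caught = True       # the mistake surfaces right here, not later
```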

Advantage #4: code analysis potential. At some time in the future, Python might introduce significant compile-time or even run-time partial evaluation-based optimization, or programmer-assisting analysis (see PyChecker for an example of static analysis). Automatic code analysis (automatic semantic understanding) will be much more powerful when dealing with immutable objects.

Frozensets are always hashable. In short, immutable objects have a variety of advantages. That's why Python has not only set and list objects, but also frozenset and tuple objects, which are simply immutable versions of the former. frozenset objects are quite interesting. Being a set, they require that their contents be hashable (recursively immutable, usually). Being immutable, they themselves can't be changed either. Thus, both the frozenset itself, and anything recursively contained, must be immutable. Thus, frozensets are guaranteed to be recursively immutable and therefore hashable. Compare this to tuples (immutable lists), which are hashable only if their contents are all hashable.
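The contrast is easy to demonstrate: a frozenset rejects unhashable elements up front, while a tuple accepts mutable contents and only fails later, at hashing time:

```python
# A frozenset is always hashable, because it refuses unhashable
# elements at construction time.
fs = frozenset({1, 2, 3})
fs_hash = hash(fs)                  # always works for a frozenset

rejected_early = False
try:
    frozenset([[1, 2]])             # a list element: rejected immediately
except TypeError:
    rejected_early = True

# A tuple happily holds a mutable list...
t = ([1, 2],)
rejected_late = False
try:
    hash(t)                         # ...and only fails when hashed
except TypeError:
    rejected_late = True
```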

Immutability is different from hashability. This advantage of frozenset objects, and some common uses of tuples, lead some Python programmers to mistakenly think that the only advantage of immutable objects is their hashability. That's not true. It's one of the advantages, and perhaps the biggest, but certainly not the only one. Even if an immutable container has mutable contents, and therefore can't be hashed, it still serves a purpose. That's why Python has tuples, and doesn't require that their contents be hashable. Tuples don't guarantee hashability. Nonetheless, they are very useful. Their own immutability is useful in itself, and if their contents are recursively immutable, they become recursively immutable (and thus hashable) themselves. Again, frozenset objects are guaranteed to be hashable not simply because they themselves are immutable, but because of the interaction of two features: their immutability, and their being a set.

Python should introduce immutable dictionaries. Because immutable objects have so many good uses, it is my opinion that Python should introduce immutable dictionaries. Those would have the usual advantages of being immutable (less dangerous aliasing, and early error detection), and they would be hashable if their contents are hashable. Like tuples (immutable lists), they should not require that their contents be hashable. Immutable dictionaries, thus, should not guarantee hashability. They should be hashable if and only if all of their contents are hashable - just like tuples.

Incorrect formulation in PEP 351. It is unfortunate that immutable dictionaries have been formally proposed for Python in a bad way, in PEP 351. That document makes the mistake of assuming that immutability serves only to make objects hashable. Consequently, the proposal is a bad one, and was correctly rejected. The correct semantics for immutable dictionaries are that an immutable dictionary is itself immutable, but does not require its contents (its "values") to be immutable or hashable. If some immutable dictionary does have hashable values, it will be hashable after all. However, that should not be required.

Why a built-in is justified. Of course, it's easy to write a class that implements immutable dictionaries. One can simply inherit from dict, replace the modification methods with something that raises an exception, and provide a __hash__ based on the hash of the keys and values. However, the same is true of frozensets: they can easily be built on top of sets. The reason why Python should provide immutable dictionaries built-in is to encourage the use of immutable objects. Immutable objects have many advantages, some of which are listed above. (For the same reason, there should be a map()-like function that takes a function and a tuple, and returns a tuple; but that's another story).
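For concreteness, here is a minimal sketch of such a class, with exactly the semantics argued for above: immutable itself, but hashable if and only if its values are hashable. The name FrozenDict is mine, not from PEP 351 or any standard library:

```python
# A sketch of an immutable dictionary built on top of dict.
class FrozenDict(dict):
    def _readonly(self, *args, **kwargs):
        raise TypeError("FrozenDict is immutable")

    # Replace every modification method with one that raises:
    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly

    def __hash__(self):
        # Hashable if and only if all values are hashable -
        # just like a tuple of its items. (Keys are always hashable.)
        return hash(frozenset(self.items()))

fd = FrozenDict(a=1, b=2)
lookup = fd["a"]                        # reading works as usual

mutation_caught = False
try:
    fd["a"] = 99                        # writing is forbidden
except TypeError:
    mutation_caught = True

# Hashability depends on the values:
int_values_hash = hash(FrozenDict(a=1)) # fine: values are hashable

list_values_unhashable = False
try:
    hash(FrozenDict(a=[1, 2]))          # a list value blocks hashing
except TypeError:
    list_values_unhashable = True
```

The point of the article stands regardless of this sketch: the value of a built-in is not that the class is hard to write, but that providing it would encourage immutable-by-default style.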

Summary. Frozensets are guaranteed to be hashable, but only because of an interaction between their immutability and their being a set. This should not lead one to think that all immutable objects should be hashable. In fact, tuples aren't, and are still very useful (even those that are not hashable). Python should introduce immutable dictionaries, which, like tuples, should not require their contents to be immutable. Those immutable dictionaries should be hashable if and only if all of their contents are hashable - again, just like tuples.
