This place provides the following:
To understand the proposed changes to the Haskell standard, just consider that the standard is already biased towards the facilities provided in the current Prelude:
The current state of the library is "just after a redesign": I have complete formal specification of the library interface and I expect them to be stable. The implementations, however have not been updated (they still work with old specifications from other modules) and some minor functions have not yet been integrated into the new interface specification. The implementation already contains all major algorithms (and they work for the old interface whose functions are in detail almost the same). Perhaps I will change it's structure when adapting it to the new interface. I will do this some time later, that's why I already put all the code here.
The current specification is here: AlgebraBasic.hs (Collection and subclasses with extensive rationale), AlgebraMap.hs (OrdMap), Relation.hs (auxiliary functions on Sets of Pairs). These files don't represent the final module structure, see below.
Collection
^
/ \
/ \
/ \
Sequence OrdColl OrdMap
^
/ \
/ \
/ \
OrdBag OrdSet
Axioms:
Most programs will use the standard instances of the Collections so they need only write import Collection (or import Prelude depending on the standard) and it will work. One can get further instances just by importing another module, which will only export the instance(s), type name(s) and aliases of constructor function(s) which are restricted to the concrete type, so one can write deque [1, 2, 3] instead of [1, 2, 3] :: (Num a) => Deque a. (The general type would be [1, 2, 3] :: (Num a, Collection coll a) => coll a.)
OrdMap is not an instance of Collection since type constructor is of another kind. In imperative Dessy that doesn't hurt, there we have class MAP[A, B] inherit TRAVERSABLE[TUPLE[A, B] ... , but Haskell's type system really doesn't allow a similar construction, neither do I see a simple extension which would allow it. Therefore OrdMap does repeat the functions and laws from Collection and also the functions on Relations (Sets of Pairs). This creates problems of naming which are treated below. Furthermore, it would be nice to also have Relations as a type class, so we could specialise the algorithms for different types of sets and do other nice things, e.g.: store the relation together with its inverse, so we can have fast images in both directions. However, making Relation into a type class would destroy all its connection to Sets (like the OrdMap) and that would mean losing to much expressive power: alone \/ and /\ are worth it.
It is interesting to compare Dessy's classes with those of Okasaki's Edison library. Okasaki makes two further distinctions, which Dessy doesn't have: First, he has unordered collections besides the ordered ones, and second, he distinguishes between collections whose elements are observable and collections whose elements are not observable. Okasaki judges this necessary because the collection implementations use the equality function == to distinguish among different elements, but it may be that this relation is two weak and two "==-equal" elements are actually different.
In Dessy on the other hand all Collections are ordered (a rationale is given in AlgebraBasic.hs) and it is assumed that "==-equal" elements are really indistinguishable. This is for two reasons: first, we assume that all types are abstract data types that distinguish between the representation and an abstract view. If two values with different representations are equal by == then they are indistinguishable if we observe them only with abstract functions. Without this property we wouldn't have the law x == y ==> f x == f y and reasoning about programs would be infeasible. Second, if wanted to distinguish between "==-equal" elements and specify the Abstract Collections, we would have to introduced some stronger equality relation to express the (now more complicated) behaviour of the data structures. But this relation would only exist in the specification and not be testable, nor be of much use for the user. In other words: to make our classes as simple and useful as possible, we make an assumption about the client code, namely that he isn't interested in the difference of "==-equal" elements. This is nothing but a simple precondition. Long live Design by Contract.
Sequence: first, but_first, last, but_last, front, back, OrdColl: minimum, but_min, maximum, but_max, smaller, larger, smallest, largestHowever, since the six common functions are already defined in class Collection (same semantics on all Collections), we perhaps don't need the synonyms. In any case, we can give up the notion of "left" and "right" when splitting a collection: its the front and back part (or the smaller and larger part) instead.
Dessy's names are designed such that normal users can use it without the need for qualified imports. This is because:
For the collection hierarchy this works very fine, because names are shared via the super class and they also share a common semantics. Furthermore, some functions are sufficiently different as to merit completely different names:
However, there are naming conflicts between Collection and OrdMap, and between Relation and OrdMap, where the functions really have the same semantics, but cannot share names via a type class. In this cases I considered two alternatives: first, suffixing the functions as I did in OrdMap. This is suggested by some small semantic differences between those functions and the Collection equivalents (e.g. apply_map doesn't change the domain of a Map). And it gives nicely pronounceable names because its a verb-object composition: fold_map, split_map, ..., or a predicate-noun composition: empty_map, single_map, is_empty_map. Only size_map would rather be map_size, that's why I don't have this functions yet.
And as another alternative I used qualified names: most functions on Relations (Sets of Pairs) have exactly the same semantics as those on OrdMap, precisely we have: f = relation_map . Relation.f . map_relation. That's why I used qualified names for the functions on Relations.
(Internally, Dessy uses qualified names since many helper functions have the same name as functions on Collection.)
The text of this section is partly obsoleted by what is written in AlgebraBasic.hs. Nevertheless it might be interesting to the reader.
Terminology until I find a better one: The folding function is the function that folds. The folding operator is the function with which is folded.
Folding can be very simple: think of the reduction operator of APL or the operators ∑, ∏, ∀, ∃ in mathematics. And the specification of fold (+) collection "just insert (*) between the elements of collection" is equally trivial. But on the other hand, Haskell has already five different folding functions for its functional stack implementation of Sequences (usually called "list") alone. You can imagine how many functions more we would get when doing this for any implementation of our three general collection classes!
Consequently, to determine the specifications of the folding functions for our Collections we have to reengineer the ideas behind the conventional folding functions and see which of them apply to the generalised Collections. It is already clear, that some implementation specific functions will not be carried over.
Haskells folding functions are especially hard to translate to other data structures, because the form a menu that has been designed according to many aspects. The function foldr for example is isomorphic to the data definition of lists: replace [] by e, : by f and [a] by the recursive call. If we were to design folding functions for our new data structures along this lines, we would have to offer a folding function for each of the many concrete data structures. Incidentally such folding functions are helpful internally to implement the abstract data structures. But the problem is: what folding functions should the abstract data structures provide?
But foldr has another interesting aspect: it is lazy. If the folding operator is lazy, too, foldr will consume the argument list piecewise as results are needed. On the other hand, a good way to explain foldl is to say that it is an optimised version of foldr which exploits the associativity of the folding operator and uses and accumulator to provide imperative-like efficiency: constant space and linear time with a loop-like constant factor. But again this is only one aspect: foldl has also useful applications for non-associative folding operators. (Even more complicated: some of these non-associative folding functions g have been chosen such that foldl g d = foldr f e, where the right-hand side is a specification for given f and e and the left hand side is a more efficient implementation for g and d to be calculated. I believe Horner's algorithm for the evaluation of polynomials can be derived this way.)
So we are unsatisfied with the implementation-dependency of the foldr function, which supposes a [] and a (:) data constructor. Our abstract Sequences on the other hand don't prefer one constructor over another: cons, snoc and (++) are all of the same status and so are the corresponding dissection functions. This means that for Sequences there is no more "standard parenthesising" and therefore a folding function should be expected to put the operator anywhere between the collection elements. Incidentally, if we look at the Prelude uses of fold then most of the non-associative uses are with the operator (:) -- and in a world of Sequences they would be written in terms of (++) and the problem would have disappeared. Indeed the only other use of foldl with non-associatiative folding function in the Prelude is the calculation of the n-ary value of a number given as a list of digits. And this algorithm is identical to the Horner algorithm just mentioned.
It is interesting to look at the implementations of map and filter on the data types (LeafTree tree) => tree a and (LeafTree tree) => Maybe (tree a):
tmap f = leaf_fold (leaf . f) (<+>) tfilter p = leaf_fold (\x -> if p x then leaf' x else Nothing) (<*>) map = fmap . tmap filter = (>>=) . tfilter
First we note that the implementation using leaf_fold is more straight-forward than the one using traditional folds. And the lifting of the functions to the Maybe type is also very instructive: map is just a combination of fmap on Maybe and tmap on Leaf Trees. And filter uses the Maybe-Monadic >>= to "merge" a Nothing argument with a Nothing result from tfilter to one single Nothing result for filter.
The fact that Leaf Trees are already an abstraction over the concrete underlying trees and that Leaf Trees are especially designed to form a useful algebra, suggests that we should define our folding function along this algebra. The three constructors Empty, Single and Disjoint Union are not only valid for Sets (as used by Hinze), but they are also valid for Maps and Sequences (with ++ instead of Disjoint Union). (Interesting point: Sets and Maps have also the \/ (union) or <+ (functional overwrite), respectively, whereas Sequences have no equivalent. In Dessy's internal design this is reflected in the Ordered instance of LeafTree, where the <+> and <*> operators change from disjoint to merging semantics.)
An interesting remark: For all of the associative operators used with fold, the starting element e forms the unit of the operator. This means precisely that we have fold1 f = fold f e, or in other words: the starting element will only be needed to fold the empty Collection (then it will be the result), it is never needed to fold a non-empty collection. This fits particularly well with our view of Collections as Leaf Trees:
reduce1 :: (LeafTree tree) => (a -> a -> a) -> tree a -> a reduce1 = leaf_fold id reduce :: (LeafTree tree) => (a -> a -> a) -> Maybe (tree a) -> a reduce f unit = maybe unit (reduce1 f) -- or even shorter: reduce = flip maybe . reduce1
It seems therefore, that these are the best specifications for reduce. It seems at first to be a huge restriction, that the result type of reduce is the same as the element type, but that corresponds exactly to the associative function restriction and for other cases we still have leaf_fold which generalises the fold on lists. The few non-associative operators that won't work with this functions can easily be programmed using inorder and the traditional folders.
A generalised folding functions which takes three arguments corresponding to the three constructors of non-empty Leaf Trees would look like this:
gen_fold e f g = maybe e (leaf_fold f g)
Since Sets in general have no ordering, we would expect that the corresponding reduce functions would only work for commutative operators. But since in our algebra Sets are always ordered, we can have the same semantics as for Sequences. The only thing, one must know is that min = first, but on the other hand, one must also know that the reduced front (or smaller part) is the first argument of the reduction operator, and the reduced back is the second part. Commutativity is only needed in some laws:
-- on (non-empty) Sequences reduce1 (*) (a ++ b) = reduce1 (*) a * reduce1 (*) b -- on (non-empty) OrdSet, if (*) is commutative or a <= b reduce1 (*) (a \/ b) = reduce1 (*) a * reduce1 (*) b
At the moment, Functional Dessy is written in pure Haskell 98. I think this is also important to validate the design: dependence on non-standard mechanisms may mean it's too complicated. However, they are several (already proposed and also implemented) language features that would make Dessy's code simpler.
To make the Collection classes usable at all, we need manifest notations for the abstract data types and rules to choose standard instances. Both are already implemented for the number classes, e.g. 1 :: (Num a) => a, and show 1 will automatically use 1 :: Integer. Likewise we should have [.., ..] :: (Sequence seq) => seq a and use a democratic standard implementation. (This is no unjustified preferring of one implementation over another. Having [.., ..] always denote functional stacks is much worse!) Perhaps we should also have manifest syntax for Sets and Maps, but I think writing set [.., ..] is not demanded too much. These changes in the language are really easy to make and absolutely consistent with the treatment of the numeric classes!
Most of Dessy's data structure implementations are based on a framework of abstract leaf trees, based on an idea of Ralf Hinze. The framework combines Leaf Trees (used to implement Sequences), and Search Trees (used to implement Sets, Bags and Maps). On these leaf trees I implemented two balancing algorithms: one weight-based, and one level-based (with an invariant stolen from Andersson). Both balancing algorithms are shorter and more functional than anything that was published before. I'm currently preparing them for publication.
The Dessy release will also contain a module "Deque", which provides the Sequence implementations List, Tsil, Deque and SDeque, where ...
data List a = Nil | Cons a (List a) -- Traditional lists
-- should probably call them "stack"
data Tsil a = Snoc (Tsil a) a | Lin -- the inverse
... and Deque and SDeque are the implementations currently living in democratic/deque.hs. I propose to abandon the special syntax for traditional lists and use the manifest list notation for Sequences instead.
The current implementations live in the following modules:
Collection,
Sequence,
OrdSet,
OrdMap,
LeafClass,
LeafTreeTools,
AnderssonTree
where the latter three are internal (i.e. offer nothing useful to most
clients). A separate note contains the
discussing of Sequence implementations and the correct proof and
derivation of the weight-based balancing algorithm.
I believe that any kind of overloading that is not based on Desing by Contract (in Haskell that means: type classes with algebraic specifications) is pure syntatic sugar. It makes short programs shorter and complex programs shorter and harder to understand.