How to write a Lisp evaluator in Haskell

I promised my students last semster that I’d show how to write (some of) a Lisp in Haskell, in order to demonstrate the usefulness of algebraic data types for prototyping structured applications – and also for fun, obviously. Source code for everything in this post can be found here.

This short tutorial is targeted at Haskell beginners, and doesn’t even use a single monad, unlike more popular and full-featured tutorials like write yourself a scheme and write you a haskell. I achieve brevity and simplicity in the following way:

Only implementing the “EPL” part of the REPL, because parsing is Boring.
Not implementing macros.
Cutting lots of corners.

Ready? Let’s begin.

Abstract syntax

We’re going to assume somebody already implemented the Lisp’s Reader for us; it has the following type:

reader :: [Char] -> Program

i.e. the Reader converts a character stream into a Haskell data structure which we can manipulate. So what’s a Lisp program? Why, nothing more than a sequence of S-Expressions.

type Program = [SExp]

data SExp = Literal Integer  -- e.g. 5, 42
          | Symbol String    -- e.g. x, first, lambda, -
          | SExp [SExp]      -- e.g. (+ (* 3 2) 4), (define fact (lambda n (fact (- n 1))))

Note that we are using Strings in our compiler to denote Lisp symbols; the Lisp we are writing has no string primitives of its own, although there’s no reason why it can’t.

It’s handy to implement an instance of Show for SExp (omitted) for debugging.

Values

Now let’s define datatypes for values, i.e. the thing the S-Expressions evaluate into. This is also where we would define a type for macros… if we had any.

data Value = Number Integer 
           | List [SExp]
           | Lambda (Value -> Value)

-- omitted implementation of `Show` --

Our syntax for lambda functions will look like (lambda x (+ x 2)), and only takes one argument for simplicity. To make functions with multiple arguments one could just write functions that just return functions, e.g. (lambda x (lambda y (+ x y))). We’ll introduce currying by default, so

(f x y z)

is equivalent to

(((f x) y) z)

We also need to encapsulate the symbol table in a type, which associates Values to their identifiers. Presumably the symbol table already has some definitions in it before the program is evaluated, e.g. with definitions of + and cons.

type SymbolTable = [(String, Value)]

Evaluation

Evalutating an S-Expression (in this language at least) always produces a value and sometimes also updates the symbol table (e.g. when we define a new function). Hence we have the type:

-- Yes, a State monad is applicable here, but it's not really necessary
-- for this exposition.
eval :: SExp -> SymbolTable -> (Value, SymbolTable)

Now all we need to do is define evaluation on a case-by-case basis for each kind of SExp. The first two are trivial to define:

-- If the expression is just an integer, it has no effect on the symbol table
-- but we should print out that integer.
eval (Literal n) table = (Number n, table)

eval (Symbol s)  table = 
  case (lookup s table) of
    Just value -> (value, table)
    Nothing    -> error ("The value " ++ s ++ " is not defined! Aborting.")

Certain keywords are neither functions nor macros, but built right in to the language. Specifically, we’ll add define, quote, and lambda. I leave let and defmacro as exercises.

In the interest of simplicity, our language will require that define be a top-level expression (though we won’t throw an error if it’s not). Hence something like (+ (define x 3) 4) will not update the symbol table.

eval (SExp (Symbol "define" : Symbol s : expr : [])) table = 
  let value = fst $ eval expr table
  in
  -- Return the new value and update the symbol table, overwriting the old 
  -- definition in the stupidest way possible.
  (value, (s, value) : table)
  
-- Note that in real Lisps, the expression '(1 2 3) is a *reader shorthand*
-- that expands to (quote 1 2 3).
eval (SExp (Symbol "quote" : exprs)) table = (List exprs, table)

Now, lambda is a bit less straightforward. To evaluate a term like (lambda x (+ x 2)) we need to construct a Haskell function of type Value -> Value which will associate the given value with the symbol x. Luckily, eval does just that.

eval (SExp (Symbol "lambda" : Symbol s : expr : [])) table =
  let fn = \value -> fst $ eval expr ((s, value) : table)
  in
  (Lambda fn, table)

And finally, the general case, where the first value evaluates to some function and we do curried function application.

-- This is the general case, 
eval (SExp (car : cdr)) table =
  let (fn, _) = eval car table
      args    = map fst $ map (\expr -> eval expr table) cdr
  in
  (applyRecursive fn args, table)


-- This will get us Curried function application.
applyRecursive :: Value -> [Value] -> Value
applyRecursive f args = foldr next f args where
  next (Lambda fn) value = fn value
  next not_a_function _  = error ("Cannot apply non-function " ++ (show not_a_function))

(R)EPL

All that’s left is to show how to evaluate a sequence of S-Expressions.

repl :: Program -> [Value]
repl prog = fst $ foldr eval_and_print (customPrelude, []) prog where

  eval_and_print :: SExp -> ([Value], SymbolTable) -> ([Value], SymbolTable)
  eval_and_print expr (values, table) =
    let (newval, newtable) = eval expr table
    in
    (newval : values, newtable)

It’s not terribly fun to write some primitive functions like + and cons in the meta-language, but certainly possible. We’d put load them all into the initial symbol table customPrelude :: SymbolTable mentioned in repl.

main = putStrLn $ repl 
  [SExp [Symbol "+", Literal 5, Literal 2]]
-- Prints "[7]"