Language design is library design. — Bjarne Stroustrup
I'm tired of repeating myself: isn't that what computers are for? — Me
While programmers might not use the terminology "language extension", their main activity is extending their programming languages: defining new concepts by combining the existing named concepts of the language, and naming the combinations. A library defines an extended language, by collecting a particular set of newly defined concepts. Languages differ in the kinds of concepts that can be defined and the separations imposed on them.
For example, Java allows the definition of primitive values, reference values, methods and classes (among others). Naming and using values is distinct from naming and using methods, classes and interfaces. Values are named by declaring and assigning to a variable, and values may be passed around (as parameters and return values). Methods have the same set of allowable names, but a separate syntax for naming. Methods may not be passed around.
Java has also named the concepts if, for, while, etc.
But the user can't define new concepts of this kind.
For example, algorithms expressed with "until" can't be reflected directly in Java.
Methods and classes can capture some specific code patterns,
but the user can't put the "until" code pattern into a library for reuse.
Instead of defining it once as "while not", naming it "until",
putting it into a library and simply referring to it, the user writes out the pattern manually each time.
Much (but not all) Design Patterns work is due to the inability to put richer code patterns into the libraries of mainstream languages. Many design patterns are essentially instructions for the user on how to manually generate the pattern code each time they want to use it.
We will take existing languages (I'm switching back now to the usual definition of a programming language, i.e. C, Python, Java, etc., not the language plus new libraries) and expand the kinds of concepts that the user can define in them (and also make the definition of those concepts more uniform).
Before I discuss the implementation of the approach (Steps 1 and 2 below), let me give you a preview of some of its properties and link to existing work:
People have already done this for:
There is room to contribute to those projects, write about them (perhaps comparing the finer details of their approaches), or explore them by writing some "real" code using them. We could also try our own variations of the approach for those languages. (This is also a natural place to make note of Linj, a Lisp-like language with sexp syntax and macros, but with a semantics designed for easier mapping to human-readable Java and a compiler that generates such Java. There is also Felix, which seems to have some sort of extensibility.)
We start by designing a uniform and more explicit syntax for an existing language, using S-expressions (sexps). Then we write a translator from our syntax into the original syntax: this is fairly easy (compare with, e.g., the reverse process) since we've designed an explicit syntax and sexps are a very explicit representation of tree structure.
For example, the SC Language System does this for C. "Experience with SC: Transformation-based Implementation of Various Extensions to C" has an example comparing the SC syntax with the usual syntax.
In the SC syntax:
(def (sum a n) (fn int (ptr int) int)
  (def s int 0)
  (def i int 0)
  (do-while 1
    (if (>= i n) (break))
    (+= s (aref a (inc i))))
  (return s))
Corresponding generated C code:
int sum (int* a, int n) {
  int s = 0;
  int i = 0;
  do {
    if (i >= n) break;
    s += a[i++];
  } while (1);
  return s;
}
They use Common Lisp to do the translation, citing two features:
Some other languages (not just Lisps) have libraries for sexps as well, though they vary in how well integrated sexps are (especially how well sexps can be specified as literals in source code).
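To make the point about integration concrete, here is a minimal sketch of a sexp reader in Python, a language with no sexp literals. The names `TOKEN` and `read_sexp` are my own inventions for illustration, and the representation (nested lists, symbols as plain strings, string literals keeping their quotes) is just one assumption among several reasonable ones. Comments and quote syntax are not handled.

```python
import re

# Tokens: a parenthesis, a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def read_sexp(text):
    """Parse one S-expression from text into nested Python lists.

    Integers become int. Everything else stays a str; string literals
    keep their double quotes so they can be told apart from symbols.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # skip the closing parenthesis
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        try:
            return int(tok)
        except ValueError:
            return tok

    result = parse()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return result


tree = read_sexp('(if flat ((print "hi there")))')
```

Note that the program text has to live inside a Python string and be parsed at run time, which is exactly the kind of weaker integration mentioned above; in a Lisp the reader does this for you.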
One could limit the use of sexps to parsing, then translate to XML. The syntax of XML was not meant to be written and read by humans, so we definitely want to stick to sexps for our language syntax. The semantics of sexps and XML are similar enough that internally the choice is less important. However, while libraries for XML are now common in Lisps and non-Lisps, specifying XML literals in source code is usually much less transparent than specifying sexp literals in Lisps.
As for dynamic variables, their integration with the language is less important, and probably can be mimicked well enough in most languages.
Here's a quick and dirty (well, dirty for Scheme) beginning of Step 1 for Python. It implements enough of an sexp syntax for Python to write the Python 2.5 tutorial Hello World as:
(= the_world_is_flat 1)
(if the_world_is_flat
    ((print "Be careful not to fall off!")))
and turns it into
the_world_is_flat = 1
if the_world_is_flat :
    print "Be careful not to fall off!"
Try it as mzscheme --script spite.scm to see the Python code,
and mzscheme --script spite.scm | python to see Python execute it.
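The same Step 1 translation can also be sketched in Python itself. This is not a port of spite.scm: it assumes the sexp program is already available as nested Python lists, it emits Python 3 (so `print` becomes a function call rather than the Python 2.5 statement), and the names `expr`, `stmt` and `translate` are hypothetical. Only assignment, `if`/`while` and function calls are covered.

```python
INDENT = "    "


def expr(form):
    """Translate an expression form into Python source text."""
    if isinstance(form, list):
        # In this sketch every list in expression position is a function call.
        head, *args = form
        return head + "(" + ", ".join(expr(a) for a in args) + ")"
    return str(form)  # symbols, numbers and quoted strings pass through


def stmt(form, depth=0):
    """Translate a statement form into a list of indented source lines."""
    pad = INDENT * depth
    if isinstance(form, list) and form and form[0] == "=":
        _, name, value = form
        return [pad + name + " = " + expr(value)]
    if isinstance(form, list) and form and form[0] in ("if", "while"):
        keyword, test, body = form
        lines = [pad + keyword + " " + expr(test) + ":"]
        for inner in body:  # the body is a list of statement forms
            lines.extend(stmt(inner, depth + 1))
        if not body:
            lines.append(pad + INDENT + "pass")  # keep the output valid Python
        return lines
    return [pad + expr(form)]


def translate(program):
    """Translate a list of top-level statement forms into Python source."""
    return "\n".join(line for form in program for line in stmt(form)) + "\n"


program = [
    ["=", "the_world_is_flat", 1],
    ["if", "the_world_is_flat",
        [["print", '"Be careful not to fall off!"']]],
]
source = translate(program)
print(source)
```

Because the tree structure is explicit, the translator never has to guess where a block ends: each nesting level simply adds one level of indentation.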
Now we add an extension system. Roughly, we allow the user to specify, name and apply code transformations.
In Lisps these are called macros (not to be confused with the much more primitive macros of C/C++, which, e.g., suffer from lack of hygiene). Lisp people have researched macro systems for over forty years, and the Scheme people especially are doing a lot of deep work. Much of that work can be adapted to any language with sexp syntax, not just Lisps. People outside of Lisp aren't used to such power from within the language, so when they reinvent this idea they call it "Metaprogramming", "Code is Data", "Design Patterns", "Refactoring", etc. (On the subject of Design Patterns: Greg Sullivan discusses and creates an object system in Scheme to better support the Gang of Four patterns. Since his aim is to study that particular object system, macros are used only in building the system, not in its use when implementing the patterns.)
The Scheme syntax-rules system is a good one for us to begin with.
It applies well to any lexically scoped language with sexp syntax,
and is quite powerful and natural when defining simpler macros.
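To give a feel for the pattern-and-template style, here is a rough Python sketch that defines "until" as "while not" over sexps held as nested lists. It is only in the spirit of syntax-rules, under loud assumptions: there is no hygiene, no ellipsis, no literals list and only one rule per macro, and pattern variables are marked with a leading `?`, a convention made up for this sketch. The names `match`, `instantiate`, `define_macro` and `expand` are hypothetical.

```python
MACROS = {}  # macro name -> (pattern, template)


def match(pattern, form, bindings):
    """Match form against pattern, recording variables in bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern] = form  # a pattern variable matches anything
        return True
    if isinstance(pattern, list):
        return (isinstance(form, list)
                and len(pattern) == len(form)
                and all(match(p, f, bindings) for p, f in zip(pattern, form)))
    return pattern == form  # keywords and constants must match exactly


def instantiate(template, bindings):
    """Copy template, replacing pattern variables by what they matched."""
    if isinstance(template, list):
        return [instantiate(t, bindings) for t in template]
    if isinstance(template, str) and template in bindings:
        return bindings[template]
    return template


def define_macro(name, pattern, template):
    MACROS[name] = (pattern, template)


def expand(form):
    """Expand macro uses in form, outermost first, until none remain."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in MACROS:
        pattern, template = MACROS[head]
        bindings = {}
        if not match(pattern, form, bindings):
            raise SyntaxError("bad use of macro " + head)
        return expand(instantiate(template, bindings))
    return [expand(item) for item in form]


# "until" is defined once, as "while not", and from then on simply used.
define_macro("until",
             ["until", "?test", "?body"],
             ["while", ["not", "?test"], "?body"])

expanded = expand(["until", ["done", "x"], [["step", "x"]]])
```

After expansion only forms of the base language remain, so the translator to the original syntax never needs to know that "until" exists.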
Some lectures on syntax-rules in Scheme are
here,
here and
here.
Examples of using syntax-rules to add common features of other languages (and beyond)
to Scheme are here
(I need to add these to the
Scheme Cookbook at some point;
here's another example:
Ruby's upto in Scheme.)
We could use existing implementations of syntax-rules from
alexpander,
riaxpander,
eiod, or one of the three
by Dorai Sitaram.
There is also syntax-case, which subsumes syntax-rules.
It is now part of the R6RS Scheme standard. Two implementations are discussed
here.
Some ideas were mentioned above when the existing systems for Java, Haskell and C were mentioned.
Or you could apply the approach to another language. Separate tasks could be:
When writing code in the new language you'd look for repetition (idioms, design patterns, boilerplate, etc.) that is difficult to capture in the original language but that now can be added to the library as a macro. This can be done as part of the syntax design process, before the translator and macro system are implemented. The code wouldn't be executable yet, but it would let us get a feel for how comfortable the syntax is (though the extension capabilities we're adding make this less significant than in other languages).