Code is Data – A Python to Scheme Compiler for a Subset of Python
The night before a 5 minute (!) talk on easily capturing the modern approach to program transformation in Scheme, I spent a half hour or so writing a couple dozen lines of Scheme to compile the subset of Python used in the first few Python programs of the introductory programming course CSC 108 to Scheme.
All the code for the compiler is here, in fact the entire semantic analysis and code generation program (i.e. everything after the syntactic parsing phase) is displayed in The Compiler (and a Compiler Framework) – take a quick look now! The reaction from one student taking the traditional Compilers and Interpreters course was:
“I loved how you took a Python program that converted Python into S-Expressions, then were able to run that same program in Scheme. That kind of stuff is never even considered in [any] course in this school.”
Now for the details read on.
1 The Subset of Python
students = 132 |
seats = 196 |
empty_seats = seats - students |
lab_capacity = 32 |
labs_needed = students / lab_capacity |
def f(x): |
return x ** 2 |
def g(x): |
return x + 5 |
number = f(3) |
print number |
my_average = 87.5 / 112 * 100 |
print "My average on the BIO test was", my_average |
if my_average >= 90: |
print "Awesome!" |
elif my_average >= 70: |
print "Doing fine" |
elif my_average >= 50: |
print "So so" |
elif my_average < 50: |
print "Uh oh!!" |
This was enough for a serious (but still 5 minute) demonstration.
2 Syntactic Analysis – Parsing
Parsing is a necessary but relatively uninteresting step in modern code analysis and transformation, because any reasonable parser produces a good enough representative syntax tree to work with. Modern computers are fast enough, with orders of magnitude more memory than source code size, that if different forms of the tree are useful one simply post-processes the initial tree.
So a simple Python program (py2sexp.py – about a page long) suffices, calling Python’s parser to read python source code text and produce a tree object, then traversing the tree writing it out in Scheme’s literal notation for tree-structured data (S-Expressions). The details of this are not important for the rest of the discussion, nor is diving into Python’s parser to understand the form of the trees it produces – instead it’s simpler to examine the output when run on the sample programs.
3 Python Code Trees
(define p3 |
'(Module |
(Stmt |
(Assign |
(line 1) |
(AssName (line 1) "my_average" "OP_ASSIGN") |
(Mul |
(line 1) |
(Div (line 1) (Const (line 1) 87.5) (Const (line 1) 112)) |
(Const (line 1) 100))) |
(Printnl |
(line 2) |
(Const (line 2) "My average on the BIO test was") |
(Name (line 2) "my_average")) |
(If |
(line 3) |
(Compare (line 3) (Name (line 3) "my_average") ">=" (Const (line 3) 90)) |
(Stmt (Printnl (line 4) (Const (line 4) "Awesome!"))) |
(Compare (line 5) (Name (line 5) "my_average") ">=" (Const (line 5) 70)) |
(Stmt (Printnl (line 6) (Const (line 6) "Doing fine"))) |
(Compare (line 7) (Name (line 7) "my_average") ">=" (Const (line 7) 50)) |
(Stmt (Printnl (line 8) (Const (line 8) "So so"))) |
(Compare (line 9) (Name (line 9) "my_average") "<" (Const (line 9) 50)) |
(Stmt (Printnl (line 10) (Const (line 10) "Uh oh!!"))))))) |
PLT Scheme forms and functions, e.g. define, are linked to the online PLT Scheme Reference. In many cases, there will be a link in the margin there referring to a discussion in the PLT Scheme Guide, which newcomers to Scheme should look at instead.
I studied this (and the results for the first and second program) to understand how Python’s native parser represents Python code. For example,
(Assign |
(line 1) |
(AssName (line 1) "my_average" "OP_ASSIGN") |
(Mul (line 1) |
(Div (line 1) (Const (line 1) 87.5) (Const (line 1) 112)) |
(Const (line 1) 100))) |
a line number child
an AssName (their name for it – not mine) subtree
an expression subtree for the value to assign
(AssName (line <number>) <variable-name-string> "OP_ASSIGN") |
(Assign |
(line <n1>) |
(AssName (line <n2>) <variable-name-string> "OP_ASSIGN") |
<value-expression>) |
4 Scheme Code to Produce
(`(Assign ,_ |
(AssName ,_ ,<variable-name-string> "OP_ASSIGN") |
,<expression-value>) |
`(define |
,(string->symbol <variable-name-string>) |
,<expression-value>)) |
The “<...>” convention could be made understandable to Scheme with a bit more Scheme code outside the scope of this discussion
5 The Compiler Rules
Most of the rules for the other types of Python statements used in the three sample programs are as easy or easier. The two more elaborate ones are not much harder, and the difficulty is in understanding the meaning of compiling the statement from Python to Scheme, not in writing it down. Scheme makes the coding part easy if you know exactly what the application is supposed to do (and if you don’t, then it is hard to code in any language).
Unfortunately, many programmers don’t have a proper specification (were not given one by the application designers, or did not make one if also in charge of design), and instead work backwards writing a program that ends up implicitly (but quite obscurely when read as such) being the only precise specification. And that “specification” contains any unintended behaviour of the code.
(`(Module ,<s>) <s>) |
(`(Const ,_ ,<c>) |
(if (or (number? <c>) (string? <c>)) |
<c> |
(error <c>))) |
(`(Compare ,_ ,<ls> ,<op> ,<rs>) |
`(,(string->symbol <op>) ,<ls> ,<rs>)) |
(`(Assign ,_ (AssName ,_ ,<var> "OP_ASSIGN") ,<val>) |
`(define ,(string->symbol <var>) ,<val>)) |
(`(Name ,_ ,<n>) (string->symbol <n>)) |
(`(Stmt . ,<s>) `(begin . ,<s>)) |
(`(Sub ,_ ,<e1> ,<e2>) `(- ,<e1> ,<e2>)) |
(`(Div ,_ ,<e1> ,<e2>) `(/ ,<e1> ,<e2>)) |
(`(Mul ,_ ,<e1> ,<e2>) `(* ,<e1> ,<e2>)) |
(`(Add ,_ ,<e1> ,<e2>) `(+ ,<e1> ,<e2>)) |
(`(Power ,_ ,<e1> ,<e2>) `(expt ,<e1> ,<e2>)) |
(`(Return ,_ ,<e>) <e>) |
(`(CallFunc ,_ ,<f> ,<a>) `(,<f> ,<a>)) |
(`(Function ,_ ,<f> (,<v>) ,<s>) |
`(define (,(string->symbol <f>) ,(string->symbol <v>)) ,<s>)) |
(`(Printnl ,_ . ,<es>) |
`(begin (for-each (λ (e) (display e) (display " ")) |
(list . ,<es>)) |
(newline))) |
(`(If ,_ ,<c> ,<then> . ,<else>) |
`(if ,<c> ,<then> ,(if (not (empty? <else>)) |
`(If _ . ,<else>) |
`(void)))) |
Can you write a better specification?
It has to be one that someone who doesn’t know what compiling Python to Scheme means (the computer doesn’t) can be shown to follow blindly (the computer does) and get the correct result.
6 The Compiler (and a Compiler Framework)
Making the rules run is straightforward in Scheme, building on a common pattern matching library and making a function rewrite that captures the idea of “do something everywhere in some structured data until nothing changes”.
(define python->scheme-rules |
(match-lambda |
(`(Module ,<s>) <s>) |
(`(Const ,_ ,<c>) |
(if (or (number? <c>) (string? <c>)) |
<c> |
(error <c>))) |
(`(Compare ,_ ,<ls> ,<op> ,<rs>) |
`(,(string->symbol <op>) ,<ls> ,<rs>)) |
(`(Assign ,_ (AssName ,_ ,<var> "OP_ASSIGN") ,<val>) |
`(define ,(string->symbol <var>) ,<val>)) |
(`(Name ,_ ,<n>) (string->symbol <n>)) |
(`(Stmt . ,<s>) `(begin . ,<s>)) |
(`(Sub ,_ ,<e1> ,<e2>) `(- ,<e1> ,<e2>)) |
(`(Div ,_ ,<e1> ,<e2>) `(/ ,<e1> ,<e2>)) |
(`(Mul ,_ ,<e1> ,<e2>) `(* ,<e1> ,<e2>)) |
(`(Add ,_ ,<e1> ,<e2>) `(+ ,<e1> ,<e2>)) |
(`(Power ,_ ,<e1> ,<e2>) `(expt ,<e1> ,<e2>)) |
(`(Return ,_ ,<e>) <e>) |
(`(CallFunc ,_ ,<f> ,<a>) `(,<f> ,<a>)) |
(`(Function ,_ ,<f> (,<v>) ,<s>) |
`(define (,(string->symbol <f>) ,(string->symbol <v>)) ,<s>)) |
(`(Printnl ,_ . ,<es>) |
`(begin (for-each (λ (e) (display e) (display " ")) |
(list . ,<es>)) |
(newline))) |
(`(If ,_ ,<c> ,<then> . ,<else>) |
`(if ,<c> ,<then> ,(if (not (empty? <else>)) |
`(If _ . ,<else>) |
`(void)))) |
(<e> <e>))) |
Can you add define-recursive and if-differ to your library in your favourite language? In most languages, which only allow defining call-by-value operations, the answer is no. For example, the obvious implementation of if-differ as a call-by-value function usually behaves incorrectly when the argument expressions have side-effects.
(define-recursive (rewrite rule s) (on s) |
(define with-subparts-rewritten |
(if (list? s) (map rewrite s) s)) |
(define with-also-rule-self |
(rule with-subparts-rewritten)) |
(if-differ with-subparts-rewritten with-also-rule-self |
(rewrite with-also-rule-self))) |
The Stratego language was built around variations of this idea (called "Term Rewriting"). Not surprisingly, Stratego is not used much because it has few libraries for the breadth of tasks one wants to accomplish in most modern applications, especially when best programmed in other styles.
One should not make new languages: instead make a language expressive enough that more concepts can be written in the language and put into a library to be used exactly where desired.
7 Do It!
> (begin |
(define my_average (* (/ 87.5 112) 100)) |
(begin |
(for-each |
(λ (e) (display e) (display " ")) |
(list |
"My average on the BIO test was" |
my_average)) |
(newline)) |
(if (>= my_average 90) |
(begin |
(begin |
(for-each |
(λ (e) (display e) (display " ")) |
(list "Awesome!")) |
(newline))) |
(if (>= my_average 70) |
(begin |
(begin |
(for-each |
(λ (e) (display e) (display " ")) |
(list "Doing fine")) |
(newline))) |
(if (>= my_average 50) |
(begin |
(begin |
(for-each |
(λ (e) (display e) (display " ")) |
(list "So so")) |
(newline))) |
(if (< my_average 50) |
(begin |
(begin |
(for-each |
(λ (e) (display e) (display " ")) |
(list "Uh oh!!")) |
(newline))) |
(void)))))) |
My average on the BIO test was 78.125 |
Doing fine |
> (eval p3-as-scheme |
; run as if run on its own |
(make-base-namespace)) |
My average on the BIO test was 78.125 |
Doing fine |