The focus (at first) is on style issues that beginning programmers (i.e. students in our courses!) have, especially issues that suggest a misunderstanding of programming or the language and that may even indicate errors. This focus distinguishes it from a lot of existing style checker work (e.g. Checkstyle and PMD, last I checked, though they may have added checks relevant to us since).
An implementation goal is to make a general framework for style checking (and more: TBA!) across multiple languages.
The approach to implementation (see below for working Java and Python checkers) is:
Producing an sexp representation is a simple recursive walk of the tree, with a bit of massaging of the leaf data (e.g. re-escaping parts of string literals). Some parsers and compilers even produce sexp output as a canonical representation for debugging, storing and transmitting. Unlike the "Richer Language Extension" project, the sexp format requires almost no design: humans won't be writing and reading it.
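As a rough illustration of that recursive walk, here is a minimal Python sketch. It assumes the parse tree is given as nested tuples of the form (label, child, ...) with token text as string leaves; that layout, the node labels and the function names are all hypothetical and are not taken from the actual translators.

```python
def escape_string(text):
    """Re-escape leaf text so it can be read back as a double-quoted string."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_sexp(node):
    """Recursively render a (label, child, ...) tuple tree as an S-expression."""
    if isinstance(node, str):
        return escape_string(node)  # leaf: token text, re-escaped
    label, *children = node
    parts = [label] + [to_sexp(child) for child in children]
    return "(" + " ".join(parts) + ")"


# Hypothetical tree for a comparison of s against a string containing a quote.
tree = ("EQUAL", ("IDENT", "s"), ("STRING_LITERAL", 'a"b'))
print(to_sexp(tree))
```

The only "massaging" here is the escaping of backslashes and double quotes in leaves; everything else is plain structural recursion.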
Analyzing the code via the sexp is mostly tree recursion of the kind taught in CSC 148 and 324, with some graph recursion for control flow analysis (i.e. determining which statements are reachable from which, due to looping, exceptions, etc.).
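The graph-recursion part can be sketched as follows, in Python. The control-flow graph here is a hand-written, hypothetical dictionary mapping each statement name to its possible successors; building such a graph from the sexp is not shown. The `seen` set is what separates this from tree recursion: loops create cycles, so already visited statements must be skipped.

```python
def reachable(flow, start, seen=None):
    """Return the set of statements reachable from `start` (depth-first)."""
    if seen is None:
        seen = set()
    if start in seen:
        return seen  # already visited: stops infinite recursion on loops
    seen.add(start)
    for successor in flow.get(start, ()):
        reachable(flow, successor, seen)
    return seen


# Hypothetical control-flow graph of a method with a loop and dead code.
flow = {
    "entry": ["loop_test"],
    "loop_test": ["loop_body", "return"],
    "loop_body": ["loop_test"],   # back edge created by the loop
    "return": [],
    "after_return": ["exit"],     # nothing leads here
    "exit": [],
}

live = reachable(flow, "entry")
dead = sorted(set(flow) - live)
print(dead)
```

Statements that never appear in the reachable set are candidates for an "unreachable code" report.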
Ideas for how to contribute:
For example, Java source translated to an S-Expression that can be read by Scheme (keep in mind: it's not for human eyes).
This requires a parser that recognizes Java. The parser itself does not have to be in Java.
I used Antlr, which is in fact a parser generator. It is written in Java. It takes a grammar for a language and generates a parser for the language. The generated parser is in Java or one of a few other languages.
There are Antlr-compatible grammars for Java 1.5. To generate the above example:
Generate the parser from the grammar: java -cp antlr-2.7.4.jar antlr.Tool java15.g
Add Main.java and SchemeAST.java to the directory.
Compile everything: javac -cp antlr-2.7.4.jar *.java
Take a Java source file, C.java, and run the parser on it: java -cp .:antlr-2.7.4.jar Main < C.java
Version 0: all in one file, finds 4 problems involving booleans.
Version 1: in four files (three are general-purpose libraries), finds 6 problems.
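The checkers themselves are written in Scheme; purely to illustrate what a pattern-based check looks like, here is a small Python sketch. It assumes the sexp has been read into nested lists, and it looks for one plausible kind of boolean problem, comparing an expression with a boolean literal. The node label "EQUAL", the literal spellings and the function names are hypothetical, and the checks actually performed by Version 0 and Version 1 may differ.

```python
BOOLEAN_LITERALS = {"true", "false"}


def is_boolean_literal(node):
    """True when the node is the token text of a boolean literal."""
    return isinstance(node, str) and node in BOOLEAN_LITERALS


def find_boolean_comparisons(node, found=None):
    """Collect every [EQUAL, a, b] subtree where a or b is a boolean literal."""
    if found is None:
        found = []
    if isinstance(node, list):
        if (len(node) == 3 and node[0] == "EQUAL"
                and (is_boolean_literal(node[1]) or is_boolean_literal(node[2]))):
            found.append(node)
        for child in node[1:]:
            find_boolean_comparisons(child, found)  # recurse into subtrees
    return found


# Hypothetical tree for a conditional that compares a call result with true.
tree = ["if", ["EQUAL", ["call", "isEmpty"], "true"], ["return", "x"]]
problems = find_boolean_comparisons(tree)
print(problems)
```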
We write the translator in Python, using the Python parser module.
The translator: py2sexp.py.
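For a feel of what such a translator does, here is a minimal sketch. Note the swap: it uses the standard `ast` module rather than the `parser` module (which is not available in recent Python versions), so it walks an abstract syntax tree instead of the grammar-level parse tree, and its output will not match that of py2sexp.py. The function name and the rendering of constants are assumptions made for illustration.

```python
import ast


def py_to_sexp(node):
    """Render a Python AST (from the ast module) as an S-expression string."""
    if isinstance(node, ast.AST):
        # Node: (ClassName field1 field2 ...), in the order ast defines them.
        fields = [py_to_sexp(value) for _, value in ast.iter_fields(node)]
        return "(" + " ".join([type(node).__name__] + fields) + ")"
    if isinstance(node, list):
        return "(" + " ".join(py_to_sexp(item) for item in node) + ")"
    if isinstance(node, str):
        return '"' + node.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(node, bool):          # must be tested before other constants
        return "#t" if node else "#f"
    if node is None:
        return "()"
    return repr(node)                   # numbers and other constants, as-is


print(py_to_sexp(ast.parse("x == True")))
```

The exact fields printed depend on the Python version, since the `ast` node definitions have changed over time. Line numbers are not included here because `ast` stores them as attributes rather than fields.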
We use the framework of Version 1 of the Java checker above, modifying check.scm to understand the sexp representation of the Python grammar. This requires changing the two procedures for stripping and extracting line numbers, and changing the patterns in the checks.
Here it is with line number handling and the first check updated:
checkpy.scm.
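To make the role of the two line-number procedures concrete, here is a Python sketch of the idea; the real procedures are in Scheme and are not reproduced here. It assumes a hypothetical layout in which every node is a list [label, line, child, ...]; the layouts actually produced by the Java and Python translators may well differ, which is why those two procedures need changing.

```python
def line_of(node):
    """Extract the line number stored as the second element of a node."""
    return node[1]


def strip_lines(node):
    """Return a copy of the tree with every line number removed."""
    if not isinstance(node, list):
        return node  # leaf token text has no line number of its own
    label, _line, *children = node
    return [label] + [strip_lines(child) for child in children]


# Hypothetical tree: every node carries the source line it came from.
tree = ["if", 3, ["EQUAL", 3, ["IDENT", 3, "done"], ["LITERAL", 3, "true"]]]
print(strip_lines(tree))
print(line_of(tree[2]))
```

With the numbers stripped, the check patterns only have to describe the shape of the code; the extracted line number is then used when reporting a match.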