Style Checker(s)

Goals

The focus (at first) is on style issues that beginning programmers have (i.e. students in our courses!), especially issues that suggest a misunderstanding of programming or the language and that may even indicate errors. This focus distinguishes the project from a lot of existing style checker work (e.g. Checkstyle and PMD, at least when I last checked; they may have added checks relevant to us since).

An implementation goal is to make a general framework for style checking (and more: TBA!) across multiple languages.

Implementation Approach

The approach to implementation (see below for working Java and Python checkers) is:

  1. Parse the code using an existing parser for the language.
  2. Produce an S-expression (sexp) representation (think XML but simpler) of the AST / parse tree.
  3. Process the sexp in Scheme, using its powerful pattern matching and tree manipulation.

Producing an sexp representation is a simple recursive walk of the tree, with a bit of massaging of the leaf data (e.g. re-escaping parts of string literals). Some parsers and compilers even produce sexp output as a canonical representation for debugging, storing and transmitting. Unlike the "Richer Language Extension" project, the sexp format requires almost no design: humans won't be writing and reading it.
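To make the recursive walk concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not the translator used in this project: it walks trees from Python's standard ast module (rather than the parser module mentioned further down, which recent Python versions no longer ship), and the names to_sexp and escape_string, as well as the choice of #t, #f and () for leaf values, are made up for the example.

```python
import ast


def escape_string(text):
    """Re-escape a Python string so a Scheme reader can read it as a string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_sexp(node):
    """Render an ast node, a list of nodes, or a leaf value as sexp text."""
    if isinstance(node, ast.AST):
        # Interior node: the class name becomes the tag, the fields become children.
        children = [to_sexp(value) for _, value in ast.iter_fields(node)]
        return "(" + " ".join([type(node).__name__] + children) + ")"
    if isinstance(node, list):
        return "(" + " ".join(to_sexp(item) for item in node) + ")"
    # Leaves: only the common cases are massaged; anything else falls back to str().
    if isinstance(node, str):
        return escape_string(node)
    if isinstance(node, bool):
        return "#t" if node else "#f"
    if node is None:
        return "()"
    return str(node)


if __name__ == "__main__":
    print(to_sexp(ast.parse('done = "say \\"hi\\""')))
```

The exact fields printed depend on the Python version, since the ast module's node definitions change between releases; that is harmless here because the output is meant for a program, not a reader.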

Analyzing the code via the sexp is mostly tree recursion of the kind taught in CSC 148 and 324, with some graph recursion for control flow analysis (i.e. determining which statements are reachable from which due to looping, exceptions, etc.).
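The two kinds of recursion can be sketched as follows. This is only an illustration in Python (the real checkers do this in Scheme), and everything about the data is assumed: sexps are modelled as nested lists whose first element is a tag, the tags "==", "if", "call" and "return" are invented, and the control flow graph is a hand-written dictionary from statement labels to successor labels.

```python
BOOL_LITERALS = ("true", "false")


def is_bool_comparison(sexp):
    """True for a node shaped like (== e true), (!= false e), and so on."""
    return (isinstance(sexp, list) and len(sexp) == 3
            and sexp[0] in ("==", "!=")
            and (sexp[1] in BOOL_LITERALS or sexp[2] in BOOL_LITERALS))


def find_bool_comparisons(sexp):
    """Tree recursion: collect every subtree that compares against a boolean literal."""
    if not isinstance(sexp, list):
        return []
    matches = [sexp] if is_bool_comparison(sexp) else []
    for child in sexp:
        matches.extend(find_bool_comparisons(child))
    return matches


def reachable(graph, start):
    """Graph recursion: the set of statements reachable from start."""
    seen = set()

    def visit(node):
        if node in seen:
            return  # already visited; this is what stops us looping forever on loops
        seen.add(node)
        for successor in graph.get(node, ()):
            visit(successor)

    visit(start)
    return seen


if __name__ == "__main__":
    tree = ["if", ["==", ["call", "isEmpty"], "true"],
            ["return", "false"],
            ["return", "true"]]
    print(find_bool_comparisons(tree))

    flow = {"init": ["loop"], "loop": ["body", "after"],
            "body": ["loop"], "after": [], "dead": ["after"]}
    print(sorted(reachable(flow, "init")))
```

The only real difference between the two functions is the seen set: a tree cannot revisit a node, but a control flow graph with a loop can.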

Ideas for how to contribute:

  1. Investigate parsers for various languages (other than Java and Python, which are done below) and use them to generate sexps, or XML (which one of the many Scheme libraries can then turn into sexps).
  2. Catalog what existing checkers check.
  3. Work on the Java and Python checkers: port the Java checks to the Python framework (the first one has been done as an example), and/or add more checks to either. Contact me first if you want to do a lot of work on this: I've got a bunch of checks from an old version that will go into it shortly.
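For idea 1, if a parser can only emit XML, the conversion to sexps is itself a small tree recursion. The sketch below shows roughly what such a conversion does, using Python's standard xml.etree.ElementTree. It is not one of the Scheme libraries mentioned above, and the output conventions (attributes written as (@name "value"), text between child elements ignored) are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def quote(text):
    """Write text as a double-quoted string literal with backslash escapes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def xml_to_sexp(element):
    """Render an XML element and its descendants as sexp text."""
    parts = [element.tag]
    # Attributes first, sorted so the output does not depend on source order.
    for name, value in sorted(element.attrib.items()):
        parts.append("(@%s %s)" % (name, quote(value)))
    # Leading text of the element, if any; tail text after children is dropped.
    text = (element.text or "").strip()
    if text:
        parts.append(quote(text))
    parts.extend(xml_to_sexp(child) for child in element)
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    root = ET.fromstring('<if line="3"><cond>x</cond><then/></if>')
    print(xml_to_sexp(root))  # (if (@line "3") (cond "x") (then))
```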

Java Checker

Converting Java Source to Sexps

Example: Java source translated to an S-expression that can be read by Scheme (keep in mind: it's not for human eyes).

This requires a parser that recognizes Java. The parser itself does not have to be in Java.

I used Antlr, which is in fact a parser generator, written in Java: it takes a grammar for a language and generates a parser for that language. The generated parser is in Java or one of a few other languages.

There are Antlr-compatible grammars for Java 1.5. To generate the above example:

  1. Download Antlr 2.
  2. Download an Antlr 2 Java 1.5 grammar.
  3. Put them in the same directory, and run (ignore the warnings): java -cp antlr-2.7.4.jar antlr.Tool java15.g
  4. This generates a parser for Java, in Java, that we can call from our own code.
  5. Add my Main.java and SchemeAST.java to the directory.
  6. Compile the parser and our code (ignore the warnings): javac -cp antlr-2.7.4.jar *.java
  7. Download C.java, and run the parser on it: java -cp .:antlr-2.7.4.jar Main < C.java

Style Checking Via the Sexp

Version 0: all in one file, finds 4 problems involving booleans.

Version 1: in four files (three are general-purpose libraries), finds 6 problems.

Python Checker

Converting Python Source to Sexps

We write the translator in Python, using Python's parser module. The translator: py2sexp.py.

Style Checking Via the Sexp

We use the framework of Version 1 of the Java checker above, modifying check.scm to understand the sexp representation of the Python grammar. This requires changing the two procedures for stripping and extracting line numbers, and changing the patterns in the checks. Here it is with line number handling and the first check updated: checkpy.scm.
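To give a feel for what the two line number procedures do, here is a hedged sketch in Python. It does not reflect how check.scm or checkpy.scm actually store line numbers: the representation (a ["@line", n] marker inside each node) and the names line_of and strip_lines are invented for illustration. The point is that checks want to match on structure alone, and look the line number up separately when reporting.

```python
def is_line_marker(item):
    """True for the hypothetical ["@line", n] annotation."""
    return isinstance(item, list) and len(item) == 2 and item[0] == "@line"


def line_of(sexp):
    """Extract the line number attached directly to a node, or None if it has none."""
    if isinstance(sexp, list):
        for item in sexp:
            if is_line_marker(item):
                return item[1]
    return None


def strip_lines(sexp):
    """Return a copy of the tree with every line marker removed, at every depth."""
    if not isinstance(sexp, list):
        return sexp
    return [strip_lines(item) for item in sexp if not is_line_marker(item)]


if __name__ == "__main__":
    node = ["==", ["@line", 7], ["name", ["@line", 7], "done"], "true"]
    print(line_of(node))
    print(strip_lines(node))
```

A different parser will attach line numbers differently (or only to leaves), which is why these two procedures are the ones that need changing per language while the rest of the framework stays put.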


