CSC207 Software Design
Parsing Text Files

How to Build a Build Tool?

You should never have to write your own build tool

Lots of good ones out there

But instructive to see how tools like Make are constructed

Good application for object-oriented design

Shows how useful graph theory is

Helps you see that the tools you use are just like the programs you write...

...even if they're bigger

Major parts

Reading build files

Executing rules

Requirements

Command-line tool, not GUI

Read rules from a single input file

Real Make allows files to include other files

Ignore macros, pattern rules, special variables, etc.

None of these are especially difficult

But they will all be easier to tackle with regular expressions

What does that leave?

Blank lines

Comments

Rules consisting of:

A head made up of a target and some dependencies, and

A body containing zero or more actions

Plan of Attack

1. Build placeholder Rule class

A single target, a set of pre-requisites, and a list of actions

2. Build a parser

3. Go back and improve rule representation

Note 1: Often write temporary placeholders, then replace them later

Note 2: Much easier to do if the system is modular

If two pieces communicate through an interface, they can be modified independently

Modular systems are not just easier to maintain, they are also easier to build

Simple Rule Class

import java.util.ArrayList;
import java.util.List;

class Rule {

    // Construct empty rule.
    public Rule() {
        init();
    }

    // Construct rule with target, pre-requisites, and actions.
    public Rule(String target, String[] prereqs, String[] actions) {
        init();
        fTarget = target;
        for (String prereq : prereqs) {
            fPrereqs.add(prereq);
        }
        for (String action : actions) {
            fActions.add(action);
        }
    }

    // ...Get rule's target...
    // ...Set rule's target...
    // ...Add a pre-requisite...
    // ...Iterate through pre-requisites...
    // ...Add an action...
    // ...Iterate through actions...
    // ...Convert to printable form...

    // Common initialization.
    protected void init() {
        fTarget  = null;
        fPrereqs = new ArrayList<String>();
        fActions = new ArrayList<String>();
    }

    protected String       fTarget;        // target
    protected List<String> fPrereqs;       // pre-requisites
    protected List<String> fActions;       // actions
}
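The accessors are elided on the slide. One way they might be filled in, as a sketch: setTarget, addPrereq and addAction are the names the parser code later in these notes calls, while getTarget, prereqIterator, actionIterator and the exact toString format are assumptions. The iterators come from read-only views, so callers cannot modify a rule behind its back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class Rule {

    // Construct empty rule.
    public Rule() {
        init();
    }

    // Get and set the rule's target.
    public String getTarget() {
        return fTarget;
    }

    public void setTarget(String target) {
        fTarget = target;
    }

    // Add a pre-requisite; iterate through pre-requisites (read-only).
    public void addPrereq(String prereq) {
        fPrereqs.add(prereq);
    }

    public Iterator<String> prereqIterator() {
        return Collections.unmodifiableList(fPrereqs).iterator();
    }

    // Add an action; iterate through actions (read-only).
    public void addAction(String action) {
        fActions.add(action);
    }

    public Iterator<String> actionIterator() {
        return Collections.unmodifiableList(fActions).iterator();
    }

    // Convert to Makefile-like text: "target : p1 p2", then one
    // tab-indented action per line (format is an assumption).
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(fTarget).append(" :");
        for (String prereq : fPrereqs) {
            sb.append(' ').append(prereq);
        }
        for (String action : fActions) {
            sb.append("\n\t").append(action);
        }
        return sb.toString();
    }

    // Common initialization.
    protected void init() {
        fTarget  = null;
        fPrereqs = new ArrayList<String>();
        fActions = new ArrayList<String>();
    }

    protected String       fTarget;   // target
    protected List<String> fPrereqs;  // pre-requisites
    protected List<String> fActions;  // actions
}
```

With this format, a rule built from the sample input prints back in roughly the same shape it was read, which makes the expected/actual comparison in the test Makefile easy to eyeball.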

Notes on Style

Preferred order is:

Static variables

Static methods

Public methods

Protected or private methods

Member variables

Use fName for fields

Less error-prone than using no prefix at all, or than relying on this.name

Always put common initialization in a protected method

Declare members to be of most abstract types

Makes it easy to change mind later

Prefer protected to private

Sooner or later, you're going to want to override it...

...or test it

Everything is negotiable

Consistency is more important than any detail of style

First Parser

Parser class uses static methods

Reads to the end of stream, creating rules

Can handle standard input as well as files

Restrict input

No blank lines or comments

One pre-requisite per rule

One action per rule

Rules for exploratory programming:

Do it

Do the simplest thing that could possibly work

Build one to throw away

Because you will, even if you don't plan to

Main Driver

public static void main(String[] args) {

    // Connect to input
    InputStreamReader stream = new InputStreamReader(System.in);
    BufferedReader input = new BufferedReader(stream);

    // Get rules
    List<Rule> rules = parse(input);

    // Show results
    for (Rule rule : rules) {
        System.out.println(rule);
    }
}
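The first parser is meant to handle files as well as standard input, but this driver only wraps System.in. A minimal sketch of choosing the source, assuming the file name (if any) is the first command-line argument and the file is UTF-8; the class InputOpener and its method openInput are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class InputOpener {

    // Read from the file named by the first argument if there is one,
    // otherwise fall back to standard input.
    static BufferedReader openInput(String[] args) throws IOException {
        if (args.length > 0) {
            return Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
        }
        return new BufferedReader(new InputStreamReader(System.in));
    }
}
```

main() would then begin with `BufferedReader input = InputOpener.openInput(args);`, and both `java ParseSimple simple_01.mk` and `java ParseSimple < simple_01.mk` would work.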

Parser Skeleton

// Parse Makefile rules from input stream
public static List<Rule> parse(BufferedReader input) {

    // Initialize
    List<Rule> result = new ArrayList<Rule>();
 
    // Have to watch out for I/O exceptions
    try {
        // Read input a line at a time
        String line;
        while ((line = input.readLine()) != null) {
            // ...Split the head into fields...
            // ...Read the action...
            // ...Create the rule...
        }
    }
    catch (IOException e) {
        System.err.println(e);
    }
    return result;
}

Filling In the Body

    // Split the head into fields
    StringTokenizer st = new StringTokenizer(line, ":");
    String target = st.nextToken().trim();
    String prereq = st.nextToken().trim();

    // Read the action
    String action = input.readLine().trim();

    // Create the rule
    Rule rule = new Rule();
    rule.setTarget(target);
    rule.addPrereq(prereq);
    rule.addAction(action);
    result.add(rule);
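This body splits the head on ':' only, so everything after the colon becomes a single pre-requisite string. A small sketch tracing the sample head `a : b c` through the same two StringTokenizer calls (the class HeadSplitDemo and method splitHead are hypothetical):

```java
import java.util.StringTokenizer;

class HeadSplitDemo {

    // Split a rule head exactly as the first parser does: on ':' only.
    static String[] splitHead(String line) {
        StringTokenizer st = new StringTokenizer(line, ":");
        String target = st.nextToken().trim();
        String prereq = st.nextToken().trim();
        return new String[] { target, prereq };
    }

    public static void main(String[] args) {
        String[] head = splitHead("a : b c");
        System.out.println("[" + head[0] + "] [" + head[1] + "]");  // prints [a] [b c]
    }
}
```

That is acceptable under the "one pre-requisite per rule" restriction; splitting the second field again on whitespace, as startRule() does later, is what turns `b c` into two pre-requisites.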

Testing

Step 1: create sample input (action lines start with a tab, as real Make requires)

a : b c
        fix a
d : e
        fix d

Step 2: put commands in Makefile

Your project's long-term memory

Allows you to share tests with others

all : simple

simple : ParseSimple.class
        @echo "EXPECTED"
        @cat simple_01.mk
        @echo "ACTUAL"
        @java ParseSimple < simple_01.mk
        @echo "EXPECTED"
        @cat simple_02.mk
        @echo "ACTUAL"
        @java ParseSimple < simple_02.mk

Explore ways to improve on this later

Objectifying the Parser

Generally avoid static methods in Java

Less flexible than instance methods

Can't override (no virtual static)

State has to be static as well...

...which means you can't run two parsers at once

Not idiomatic

Every language has its way of doing things

Doing things other ways confuses readers

Solution: make the parser an object

Have main() create and run an instance of its own class
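To see why static state rules out two parsers at once, here is a sketch that reduces a parser to its line counter; StaticCounter, InstanceCounter and CounterDemo are hypothetical names. Two instance-based counters fed interleaved input keep separate counts, while the static counter sees every line from both inputs.

```java
class StaticCounter {
    static int sLineNum = 0;            // shared by every "parser"

    static void consume(String line) {
        sLineNum += 1;
    }
}

class InstanceCounter {
    protected int fLineNum = 0;         // one counter per parser object

    void consume(String line) {
        fLineNum += 1;
    }

    int getLineNum() {
        return fLineNum;
    }
}

class CounterDemo {
    public static void main(String[] args) {
        InstanceCounter a = new InstanceCounter();
        InstanceCounter b = new InstanceCounter();
        a.consume("x : y");
        StaticCounter.consume("x : y");
        b.consume("p : q");
        StaticCounter.consume("p : q");
        a.consume("\tfix x");
        StaticCounter.consume("\tfix x");
        // Instances report 2 and 1; the shared static count is 3.
        System.out.println(a.getLineNum() + " " + b.getLineNum() + " " + StaticCounter.sLineNum);  // prints 2 1 3
    }
}
```

A line number reported from the static version would be wrong for both inputs, which is exactly the kind of error message the parser later relies on getting right.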

New Main Driver

public static void main(String[] args) {
    InputStreamReader stream = new InputStreamReader(System.in);
    BufferedReader input = new BufferedReader(stream);
    ParseBlankObject parser = new ParseBlankObject();
    List<Rule> rules = parser.parse(input);
    for (Rule rule : rules) {
        System.out.println(rule);
    }
}

Blank Lines, Comments, and All That

Real input is (much) more complicated than what the previous parser handles

Blank lines within and between rules

Comments

For the moment, assume they're on lines of their own

Multiple pre-requisites per rule

Multiple actions per rule

For the moment, keep assuming one action per rule

Create a method to get "interesting" lines from input

I.e., lines that are neither blank nor comments

Keeps the main parser simple

A filter

Line Reader

protected String readLine(BufferedReader input) {
    try {
        String line;
        while ((line = input.readLine()) != null) {
            String tmp = line.trim();
            if ((tmp.length() > 0) && (tmp.charAt(0) != '#')) {
                return line;
            }
        }
    }
    catch (IOException e) {
        System.err.println(e);
    }

    // No valid line found
    return null;
}

Wrapping vs. Deriving

Could derive a class from BufferedReader that skips blank lines and comments

A filter class

Advantages

Re-use

Disadvantages

Do we know enough yet to design it?

Make a note, see how many similar methods we implement, then return to the idea
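For comparison, this is roughly what the derived filter class could look like. It is a sketch under assumptions: the class name InterestingLineReader and the getLineNum() method are hypothetical. It overrides BufferedReader.readLine(), so any code that reads lines gets the filtering automatically, and it counts physical lines (including skipped ones) so error messages can still report where they occurred.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class InterestingLineReader extends BufferedReader {

    protected int fLineNum = 0;   // physical lines read, including skipped ones

    public InterestingLineReader(Reader in) {
        super(in);
    }

    // Return the next line that is neither blank nor a comment,
    // or null at end of input.
    @Override
    public String readLine() throws IOException {
        String line;
        while ((line = super.readLine()) != null) {
            fLineNum += 1;
            String tmp = line.trim();
            if ((tmp.length() > 0) && (tmp.charAt(0) != '#')) {
                return line;
            }
        }
        return null;
    }

    public int getLineNum() {
        return fLineNum;
    }

    public static void main(String[] args) throws IOException {
        String text = "\n# comment\na : b\n\n\tfix a\n";
        InterestingLineReader in = new InterestingLineReader(new StringReader(text));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(in.getLineNum() + ": " + line.trim());
        }
        // prints "3: a : b" then "5: fix a"
    }
}
```

The standard library's java.io.LineNumberReader, itself a subclass of BufferedReader, already tracks line numbers and could serve as the base class instead.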

Handling Multiple Actions

Real rules have multiple actions

Or none at all

Can't tell when a rule's actions are done until the head of the next rule is read

Re-write main parsing loop

Keep a rule "in hand" until the next one starts

Then finish current, and start next

State Machine

Parser is a finite state machine

Circles show the states the parser can be in

Arrows show transitions between states triggered by different inputs

This state machine has:

Four states (including an error state)

One is the start state

Only one is a legal end state

Two kinds of transitions (head and action)

[State Machine Diagram]

Another Way to Show the Error State

[State Machine Diagram]

Define Legal States

protected static final String STATE_INITIAL   = "initial";
protected static final String STATE_HEAD_ONLY = "head only";
protected static final String STATE_HEAD_BODY = "head and body";

Note: there are more robust ways to do this
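One more robust alternative to String constants is a Java enum: a misspelled state becomes a compile error instead of a silent mismatch in equals(). The sketch below is an assumption, not the lecture's code; ParseState, next and isLegalEnd are hypothetical names. It encodes the transitions described on these slides: initial goes to head-only on a head line, head-only goes to head-and-body on an action, head-and-body stays put on actions and returns to head-only on a new head, and anything else leads to ERROR.

```java
enum ParseState {
    INITIAL, HEAD_ONLY, HEAD_BODY, ERROR;

    // Transition on one interesting line: isAction is true for an
    // action (tab-indented) line and false for a rule head.
    ParseState next(boolean isAction) {
        switch (this) {
            case INITIAL:
                return isAction ? ERROR : HEAD_ONLY;
            case HEAD_ONLY:
                return isAction ? HEAD_BODY : ERROR;
            case HEAD_BODY:
                // Another action extends the rule; a new head starts the next one.
                return isAction ? HEAD_BODY : HEAD_ONLY;
            default:
                return ERROR;   // once in the error state, stay there
        }
    }

    // Only one state is a legal place to stop.
    boolean isLegalEnd() {
        return this == HEAD_BODY;
    }
}
```

The parsing loop would then hold a ParseState and could use a switch over it rather than a chain of equals() calls.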

Implement State Machine

Remember to check for legal state at end of input

public List<Rule> parse(BufferedReader input) {

    // Initialize
    fLineNum = 0;
    fRules = new ArrayList<Rule>();
    String state = STATE_INITIAL;

    // Read input a line at a time
    String line;
    while ((line = readLine(input)) != null) {

        // Initial empty state goes to head-only state.
        if (state.equals(STATE_INITIAL)) {
            state = handleInitialState(line);
        }

        // Head-only state goes to head-and-body state.
        else if (state.equals(STATE_HEAD_ONLY)) {
            state = handleHeadOnlyState(line);
        }

        // Head-and-body state can go either way.
        else if (state.equals(STATE_HEAD_BODY)) {
            state = handleHeadBodyState(line);
        }

        else {
            fail("Illegal state '" + state + "'");
        }
    }

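    // Finish last rule and report.
    // Only head-and-body is a legal end state; in the initial state
    // there is no current rule, so finishRule() would add null.
    if (state.equals(STATE_INITIAL)) {
        fail("No rules found");
    }
    if (state.equals(STATE_HEAD_ONLY)) {
        fail("Last rule has head but no body");
    }
    finishRule();
    return fRules;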
}

Handling a State

protected String handleHeadOnlyState(String line) {
    assert fCurrentRule != null : "Current rule null in head-only state";
    if (!isActionLine(line)) {
        fail("Non-body line after head line");
    }
    fCurrentRule.addAction(line.trim());
    return STATE_HEAD_BODY;
}

Other state handlers are similar

Difference between assert and fail():

Use assert (built into Java) to check for bugs in our code (assertions are only checked when the JVM runs with -ea)

Use fail() (a method in this class) to check for badly-formatted input from user
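The slides say the other state handlers are similar. A self-contained sketch of what handleInitialState and handleHeadBodyState might look like, with the head-only handler copied from the slide. To keep it short it makes assumptions that differ from the lecture's code: Rule is cut down to a target plus actions, startRule keeps only the text before ':', fail() throws instead of exiting, and the driver takes already-filtered lines. The class name HandlerSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerSketch {

    static final String STATE_INITIAL   = "initial";
    static final String STATE_HEAD_ONLY = "head only";
    static final String STATE_HEAD_BODY = "head and body";

    // Cut-down rule: target and actions only (an assumption for brevity).
    static class Rule {
        String target;
        List<String> actions = new ArrayList<String>();
    }

    protected List<Rule> fRules = new ArrayList<Rule>();
    protected Rule fCurrentRule = null;

    // Initial state: the first interesting line must be a rule head.
    protected String handleInitialState(String line) {
        if (!isHeadLine(line)) {
            fail("Action line before first rule head");
        }
        startRule(line);
        return STATE_HEAD_ONLY;
    }

    // Head-only state, as on the slide: the next line must be an action.
    protected String handleHeadOnlyState(String line) {
        assert fCurrentRule != null : "Current rule null in head-only state";
        if (!isActionLine(line)) {
            fail("Non-body line after head line");
        }
        fCurrentRule.actions.add(line.trim());
        return STATE_HEAD_BODY;
    }

    // Head-and-body state: an action extends the current rule;
    // a head finishes the rule in hand and starts the next one.
    protected String handleHeadBodyState(String line) {
        assert fCurrentRule != null : "Current rule null in head-and-body state";
        if (isActionLine(line)) {
            fCurrentRule.actions.add(line.trim());
            return STATE_HEAD_BODY;
        }
        finishRule();
        startRule(line);
        return STATE_HEAD_ONLY;
    }

    protected boolean isHeadLine(String line)   { return line.charAt(0) != '\t'; }
    protected boolean isActionLine(String line) { return line.charAt(0) == '\t'; }

    // Simplified head parsing: keep only the text before ':' as the target.
    protected void startRule(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            fail("No ':' in rule head");
        }
        fCurrentRule = new Rule();
        fCurrentRule.target = line.substring(0, colon).trim();
    }

    protected void finishRule() {
        fRules.add(fCurrentRule);
        fCurrentRule = null;
    }

    // Throws instead of exiting so the sketch can be exercised in tests.
    protected void fail(String message) {
        throw new IllegalArgumentException(message);
    }

    // Drive the handlers over lines that have already been filtered.
    public List<Rule> parse(List<String> lines) {
        String state = STATE_INITIAL;
        for (String line : lines) {
            if (state.equals(STATE_INITIAL)) {
                state = handleInitialState(line);
            } else if (state.equals(STATE_HEAD_ONLY)) {
                state = handleHeadOnlyState(line);
            } else {
                state = handleHeadBodyState(line);
            }
        }
        if (!state.equals(STATE_HEAD_BODY)) {
            fail("Input does not end with a complete rule");
        }
        finishRule();
        return fRules;
    }
}
```

The head-and-body handler is where the "rule in hand" idea shows up: a rule is only added to the list once the next head (or the end of input) proves its actions are complete.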

Input Classifiers

protected boolean isHeadLine(String line) {
    return line.charAt(0) != '\t';
}

protected boolean isActionLine(String line) {
    return line.charAt(0) == '\t';
}
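Both classifiers call charAt(0), which is safe only because readLine() never passes them an empty line. If they were ever reused on unfiltered input, a version based on String.startsWith avoids the StringIndexOutOfBoundsException. A sketch, with the hypothetical class name LineClassifier; note that, as with Make's default recipe prefix, a line indented with spaces is not an action line.

```java
class LineClassifier {

    // An action line starts with a tab character (spaces do not count).
    static boolean isActionLine(String line) {
        return line.startsWith("\t");
    }

    // A head line is any non-empty line that does not start with a tab.
    static boolean isHeadLine(String line) {
        return !line.isEmpty() && !line.startsWith("\t");
    }
}
```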

Starting a Rule

protected void startRule(String line) {
    fCurrentRule = new Rule();
    try {
        // Get target
        StringTokenizer st = new StringTokenizer(line, ":");
        String target = st.nextToken().trim();
        fCurrentRule.setTarget(target);

        // Get pre-requisites
        String allPrereqs = st.nextToken().trim();
        StringTokenizer pt = new StringTokenizer(allPrereqs);
        while (pt.hasMoreTokens()) {
            fCurrentRule.addPrereq(pt.nextToken());
        }
    }
    catch (NoSuchElementException e) {
        fail("Unable to find tokens on line");
    }
}
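startRule() calls nextToken() twice, so a head with nothing after the colon, such as `clean :`, throws NoSuchElementException and is reported as an error, even though Make accepts rules with no pre-requisites. The StringTokenizer documentation also describes it as a legacy class and recommends String.split or java.util.regex instead. A sketch of the same split using String.split; the class HeadSplitter and its convention of returning null for a malformed head are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class HeadSplitter {

    // Returns [target, prereq1, prereq2, ...], or null if the line has
    // no ':' or no target. A limit of 2 splits at the first ':' only and
    // keeps an empty pre-requisite field instead of dropping it.
    static List<String> splitHead(String line) {
        String[] halves = line.split(":", 2);
        if (halves.length < 2) {
            return null;
        }
        String target = halves[0].trim();
        if (target.isEmpty()) {
            return null;
        }
        List<String> result = new ArrayList<String>();
        result.add(target);
        String rest = halves[1].trim();
        if (!rest.isEmpty()) {          // "".split(...) would yield [""]
            for (String prereq : rest.split("\\s+")) {
                result.add(prereq);
            }
        }
        return result;
    }
}
```

startRule() could then call fail() when splitHead returns null and otherwise copy the target and pre-requisites into the new Rule.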

Finishing a Rule

protected void finishRule() {
    fRules.add(fCurrentRule);
    fCurrentRule = null;
}

Handling Errors

Handling all errors with one method helps make it consistent

Reporting line numbers is essential

Imagine if your Java compiler just said "missing semi-colon", but didn't tell you where...

protected void fail(String message) {
    message += " (line " + fLineNum + ")";
    System.err.println(message);
    System.exit(1);   // non-zero status tells the shell (and Make) that parsing failed
}
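Because fail() ends the program, a test that feeds bad input to the parser cannot check the error message. Since fail() is protected, a test can override it: the "or test it" case from the style notes. A sketch, with ParserBase standing in for the real parser class and TestableParser a hypothetical test subclass; throwing also stops the calling handler, just as exiting would.

```java
// Stand-in for the parser class; only the error handling is shown.
class ParserBase {
    protected int fLineNum = 0;

    protected void fail(String message) {
        message += " (line " + fLineNum + ")";
        System.err.println(message);
        System.exit(1);
    }
}

// Test double: records the message and throws instead of exiting.
class TestableParser extends ParserBase {
    protected String fLastError = null;

    @Override
    protected void fail(String message) {
        fLastError = message + " (line " + fLineNum + ")";
        throw new IllegalStateException(fLastError);
    }
}
```

A test can then catch IllegalStateException and compare its message, line number included, against what it expects.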

Reading Lines

protected String readLine(BufferedReader input) {
    try {
        String line;
        while ((line = input.readLine()) != null) {
            fLineNum += 1;
            String tmp = line.trim();
            if ((tmp.length() > 0) && (tmp.charAt(0) != '#')) {
                return line;
            }
        }
    }
    catch (IOException e) {
        System.err.println(e);
    }

    // No valid line found
    return null;
}

$Id: parse.html,v 1.1.1.1 2004/01/04 05:02:31 reid Exp $