Generating Abstract Syntax Tree

csc488, Spring 1998

Description

The abstract syntax tree (AST) stores an intermediate representation of the input between passes of the compiler. In this project we are implementing a multiple-pass compiler, so the only semantic routines executed during the parse are those that create the AST. The rest of the processing (type checking, code generation, optimization) is performed on the AST.

In this part of the project, you will modify parser.y to create an abstract syntax tree. Do not attempt other stages of the project before the AST functions work. We are providing header files and a printTree() routine, so your job is to add the corresponding semantic actions to the parser to populate the tree. You may find that efficient semantic analysis needs more information than these structures hold; if so, try to keep modifications of ast.h and printTree() to a minimum.

Data structures

The following are the data structures from ast.h:
/* types of objects                                     */
typedef enum tempObjType {
  DECL_O,           /* declaration                      */
  STMT_O,           /* statement (no return)            */
  EXP_O,            /* expression (return)              */
  LIT_O,            /* literal (constant)               */
  ID_O              /* identifier                       */
} ObjType;


/* data types                                           */
typedef enum tempDataType {
  INTEGER_D,        /* integer                          */
  BOOLEAN_D,        /* boolean                          */
  STRING_D,         /* string                           */
  VOID_D            /* none                             */
} DataType;

/* This is used to store literal values                 */
typedef struct {
  DataType  type;  /* type of literal stored            */
  union litValue_tag {
    char* str;     /* value for string constant         */
    int   num;     /* value for numeric constant,       */
    /* or 0 (false) or 1 (true) for boolean constants   */
  } litValue;
} Value;

typedef struct tempObject*  ObjectP;
typedef struct tempArgs*    ArgsP;
typedef Value*              ValueP;
typedef struct tempDecl*    DeclP;
typedef struct tempFArgs*   FArgsP;

/*  Store information about the objects - either a      */
/*  command string or information about literals        */
typedef union {
  ValueP   litVal;    /* information about literals     */
  char*    name;      /* everything else is a character */
  /* string with the name of the function or operator   */
} ValueInfo;

/* Object structure.  Used to store anything that      */
/* will result in code generation: expressions,        */
/* statements, declarations, etc.                      */
typedef struct tempObject {
  ObjType  type;
  DeclP    decl;
  ValueInfo value;
  ArgsP    args;
} Object;

/* Linked list of objects.  Used to specify parameters */
/* to statements and expressions.                      */
typedef struct tempArgs {
  ObjectP  object;
  ArgsP    next;
} Args;


/* Declarations of variables and functions              */
typedef struct tempDecl {
  DataType  type;       /* type                         */
  char*     name;       /* variable/func name           */
  FArgsP    fargs;      /* formal arguments (if any)    */
  ObjectP   bound;      /* array bounds (if any)        */
  ObjectP   object;     /* code associated with it      */
  DeclP     next;       /* next declaration             */
} Decl;

/* Formal arguments (to functions and procedures)       */
typedef struct tempFArgs {
  char*     name;       /* argument name                */
  DataType  type;       /* argument datatype            */
  FArgsP    next;       /* next argument                */
} FArgs;

Using Data Structures

There are two basic data structures: one to store objects and one to store declarations. Objects are stored in an n-ary tree: each object can have several other objects as its arguments. The value field of an Object usually stores a string indicating what the object is. For example, the statement x <- 2 would be stored as follows:
 {
  Statement Start: ASSIGN,
      {
       Identifier Object: x,
      }
      {
       Literal Object: integer <2>,
      }
 }
Here x is an identifier with value "x" (indicating that "x" should be looked up to determine its type and l-value), and 2 is a literal with type INTEGER_D and value 2. The statement assigns 2 to x: its value is the command "ASSIGN", and its two arguments are x and 2.
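The tree for x <- 2 can be built by hand with a few small constructors. The following is only a sketch: the helper names newNamedObject, newIntLiteral, addArg and buildAssignExample are hypothetical (they are not part of the provided header files), and the type definitions are a trimmed copy of the ast.h listing above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Types trimmed from the ast.h listing above (Decl and FArgs omitted). */
typedef enum tempObjType { DECL_O, STMT_O, EXP_O, LIT_O, ID_O } ObjType;
typedef enum tempDataType { INTEGER_D, BOOLEAN_D, STRING_D, VOID_D } DataType;
typedef struct {
  DataType type;
  union litValue_tag { char* str; int num; } litValue;
} Value;
typedef struct tempObject* ObjectP;
typedef struct tempArgs*   ArgsP;
typedef Value*             ValueP;
typedef struct tempDecl*   DeclP;
typedef union { ValueP litVal; char* name; } ValueInfo;
typedef struct tempObject {
  ObjType type; DeclP decl; ValueInfo value; ArgsP args;
} Object;
typedef struct tempArgs { ObjectP object; ArgsP next; } Args;

/* Hypothetical helper: an object whose value is a name, i.e. a
   command string such as "ASSIGN" or an identifier name such as "x". */
ObjectP newNamedObject(ObjType type, const char* name) {
  ObjectP obj = malloc(sizeof *obj);
  assert(obj != NULL);
  obj->type = type;
  obj->decl = NULL;
  obj->args = NULL;
  obj->value.name = malloc(strlen(name) + 1);
  assert(obj->value.name != NULL);
  strcpy(obj->value.name, name);
  return obj;
}

/* Hypothetical helper: a literal object holding an integer constant. */
ObjectP newIntLiteral(int n) {
  ObjectP obj = malloc(sizeof *obj);
  ValueP  val = malloc(sizeof *val);
  assert(obj != NULL && val != NULL);
  val->type = INTEGER_D;
  val->litValue.num = n;
  obj->type = LIT_O;
  obj->decl = NULL;
  obj->args = NULL;
  obj->value.litVal = val;
  return obj;
}

/* Hypothetical helper: append child to the end of parent's argument
   list, so arguments keep the order in which they were added. */
void addArg(ObjectP parent, ObjectP child) {
  ArgsP  cell = malloc(sizeof *cell);
  ArgsP* tail = &parent->args;
  assert(cell != NULL);
  cell->object = child;
  cell->next = NULL;
  while (*tail != NULL) tail = &(*tail)->next;
  *tail = cell;
}

/* Build the tree for the statement  x <- 2. */
ObjectP buildAssignExample(void) {
  ObjectP assign = newNamedObject(STMT_O, "ASSIGN");
  addArg(assign, newNamedObject(ID_O, "x"));
  addArg(assign, newIntLiteral(2));
  return assign;
}
```

In parser.y, a semantic action for the assignment rule could call helpers of this kind with the objects already built for the left-hand and right-hand sides.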

In general, all commands are represented as strings. The possible commands are:

"NOP"    - no operation
"SEQ"    - statement sequence
"ASSIGN" - assignment statement
"EXIT"   - EXIT statement
"PUT"    - PUT statement
"GET"    - GET statement
"SKIP"   - SKIP statement
"IF"     - IF statement
"RET"    - RETURN statement
"WHILE"  - WHILE statement
"LOOP"   - LOOP statement
"UMINUS" - unary operation -
"PLUS"   - binary operation +
"MINUS"  - binary -
"TIMES"  - binary *
"DIV"    - binary /
"POWER"  - binary power
"NOT"    - not operation
"AND"    - and operation
"OR"     - or operation
"EQ"     - equality
"NEQ"    - inequality
"GEQ"    - greater than or equal to
"GT"     - greater than
"LT"     - less than
"LEQ"    - less than or equal to
and names of identifiers. No other commands should be necessary.
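Because commands are plain strings, a misspelled command is not caught by the C compiler. One possible debugging aid, sketched below, is a lookup against the list above. The names kCommands and isKnownCommand are hypothetical and not part of the provided files; since identifier names are also legal values, a check like this only makes sense for objects that are supposed to hold one of the fixed commands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The fixed command strings listed above. */
static const char* const kCommands[] = {
  "NOP", "SEQ", "ASSIGN", "EXIT", "PUT", "GET", "SKIP", "IF", "RET",
  "WHILE", "LOOP", "UMINUS", "PLUS", "MINUS", "TIMES", "DIV", "POWER",
  "NOT", "AND", "OR", "EQ", "NEQ", "GEQ", "GT", "LT", "LEQ"
};

/* Hypothetical helper: returns 1 if name is one of the fixed
   commands, 0 otherwise.  The comparison is case-sensitive. */
int isKnownCommand(const char* name) {
  size_t i;
  assert(name != NULL);
  for (i = 0; i < sizeof kCommands / sizeof kCommands[0]; i++) {
    if (strcmp(name, kCommands[i]) == 0) return 1;
  }
  return 0;
}
```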

Declarations are entered into the Decl data structure. It is a linked list of declarations, each consisting of the type of the variable (or the return type of a function; void for a procedure), its name, its arguments, and its code. Declarations are pointed to by a scope (represented as an Object). If the scope consists of just declarations, an empty Object (containing the command "NOP") is created, pointing to the declarations.
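As an illustration of a declaration list hanging off a scope, here is a minimal sketch that builds a scope containing only variable declarations. The helpers copyString, newVarDecl, appendDecl and newDeclOnlyScope are hypothetical names, the types are a trimmed copy of the ast.h listing, and the choice of STMT_O as the object type of the scope is an assumption (the listing does not say which ObjType a scope uses).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Types trimmed from the ast.h listing above. */
typedef enum tempObjType { DECL_O, STMT_O, EXP_O, LIT_O, ID_O } ObjType;
typedef enum tempDataType { INTEGER_D, BOOLEAN_D, STRING_D, VOID_D } DataType;
typedef struct {
  DataType type;
  union litValue_tag { char* str; int num; } litValue;
} Value;
typedef struct tempObject* ObjectP;
typedef struct tempArgs*   ArgsP;
typedef Value*             ValueP;
typedef struct tempDecl*   DeclP;
typedef struct tempFArgs*  FArgsP;
typedef union { ValueP litVal; char* name; } ValueInfo;
typedef struct tempObject {
  ObjType type; DeclP decl; ValueInfo value; ArgsP args;
} Object;
typedef struct tempDecl {
  DataType type; char* name; FArgsP fargs;
  ObjectP bound; ObjectP object; DeclP next;
} Decl;

/* Hypothetical helper: heap copy of a C string. */
char* copyString(const char* s) {
  char* copy = malloc(strlen(s) + 1);
  assert(copy != NULL);
  strcpy(copy, s);
  return copy;
}

/* Hypothetical helper: a scalar variable declaration with no formal
   arguments, no array bound and no associated code. */
DeclP newVarDecl(DataType type, const char* name) {
  DeclP d = malloc(sizeof *d);
  assert(d != NULL);
  d->type = type;
  d->name = copyString(name);
  d->fargs = NULL;
  d->bound = NULL;
  d->object = NULL;
  d->next = NULL;
  return d;
}

/* Hypothetical helper: append d to the end of list, return the head. */
DeclP appendDecl(DeclP list, DeclP d) {
  DeclP* tail = &list;
  while (*tail != NULL) tail = &(*tail)->next;
  *tail = d;
  return list;
}

/* Hypothetical helper: a scope that holds only declarations, i.e. an
   otherwise empty "NOP" object pointing to the declaration list.
   STMT_O is an assumed object type for the scope. */
ObjectP newDeclOnlyScope(DeclP decls) {
  ObjectP scope = malloc(sizeof *scope);
  assert(scope != NULL);
  scope->type = STMT_O;
  scope->decl = decls;
  scope->value.name = copyString("NOP");
  scope->args = NULL;
  return scope;
}
```

A function or procedure declaration would additionally fill in fargs with its formal arguments and object with the scope that forms its body.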

Graphical Illustration

The following two figures illustrate the data structures created for several small examples. Comments for the examples:

Examples

The following is a series of small examples followed by output of printTree().
Program
=================================================

begin
  put "hello, world"
end



AST printout
=================================================

 {
  Scope Start: PUT,
  +++ Arguments Start +++
      {
       Literal Object: string <"hello, world">,
      }
  +++ Arguments End +++++
 }

Program
=================================================
begin
  put "hello", skip
  put "world"
end

AST printout 
=================================================
 {
  Scope Start: SEQ,
  +++ Arguments Start +++
      {
       Statement Object: PUT,
       +++ Arguments Start +++
           {
            Literal Object: string <"hello">,
           }
           {
            Statement Object: SKIP,
           }
       +++ Arguments End +++++
      }
      {
       Statement Object: PUT,
       +++ Arguments Start +++
           {
            Literal Object: string <"world">,
           }
       +++ Arguments End +++++
      }
  +++ Arguments End +++++
 }


Program
=================================================
/* A program using arrays but not functions or procedures  */
begin
  integer : n
  integer: array[n]
 
  n <- 123
  array[2] <- 4
end

AST printout
=================================================
 {
  Scope Start: SEQ,
  --- Declaration Starts ---
  integer n 
  integer array [
                 {
                  Identifier Object: n,
                 }
                ]
  --- Declaration Ends -----
      {
       Statement Object: ASSIGN,
           {
            Identifier Object: n,
           }
           {
            Literal Object: integer <123>,
           }
      }
      {
       Statement Object: ASSIGN,
           {
            Identifier Object: array,
                {
                 Literal Object: integer <2>,
                }
           }
           {
            Literal Object: integer <4>,
           }
      }
 }

Program
=================================================
begin
  integer function hello (integer : i, integer : j )
  begin
  end
  procedure foo (boolean : b)
  begin
    boolean function trueORfalse (integer: guess)
    begin
      return(FALSE)
    end
  end
end

AST printout
=================================================
 {
  Scope Start: NOP,
  --- Declaration Starts ---
  integer hello (integer : i, integer : j)
      {
       Scope Start: NOP,
      }
  string foo (boolean : b)
      {
       Scope Start: NOP,
       --- Declaration Starts ---
       boolean trueORfalse (integer : guess)
           {
            Scope Start: RET,
                {
                 Literal Object: boolean <0>,
                }
           }
       --- Declaration Ends -----
      }
  --- Declaration Ends -----
 }
 


/* AST: A long  example  */
/* A program using input and output  */
/* Computation of the harmonic series: 1 + 1/2 + 1/3 + 1/4 + ... */
 
begin
  integer : Terms
  integer :Num_1
  integer : Sum
 
  put "Please, enter the number of Term to sum"
  get Terms 
  /* The computation is done here ! */
  /* comments are bracketed by * and / */
  while Num_1 not = terms do
    Sum <- Sum + (1/Num_1)
    Num_1 <- Num_1 + 1
  end
   put "The result of the first ", Terms, " terms is:"
  put "aB )'$"
  put "He said ","hello",".", skip
end 

 *** Printing Tree ***
 {
  Scope Start: SEQ,
  --- Declaration Starts ---
  integer Terms 
  integer Num_1 
  integer Sum 
  --- Declaration Ends -----
  +++ Arguments Start +++
      {
       Statement Object: PUT,
       Declaration_None,
       +++ Arguments Start +++
           {
            Literal Object: string <"Please, enter the number of Term to sum">,
            Declaration_None,
            Arguments_None
           }
       +++ Arguments End +++++
      }
      {
       Statement Object: SEQ,
       Declaration_None,
       +++ Arguments Start +++
           {
            Statement Object: GET,
            Declaration_None,
            +++ Arguments Start +++
                {
                 Identifier Object: Terms,
                 Declaration_None,
                 Arguments_None
                }
            +++ Arguments End +++++
           }
           {
            Statement Object: SEQ,
            Declaration_None,
            +++ Arguments Start +++
                {
                 Statement Object: WHILE,
                 Declaration_None,
                 +++ Arguments Start +++
                     {
                      Expression Object: NEQ,
                      Declaration_None,
                      +++ Arguments Start +++
                          {
                           Identifier Object: Num_1,
                           Declaration_None,
                           Arguments_None
                          }
                          {
                           Identifier Object: terms,
                           Declaration_None,
                           Arguments_None
                          }
                      +++ Arguments End +++++
                     }
                     {
                      Statement Object: SEQ,
                      Declaration_None,
                      +++ Arguments Start +++
                          {
                           Statement Object: ASSIGN,
                           Declaration_None,
                           +++ Arguments Start +++
                               {
                                Identifier Object: Sum,
                                Declaration_None,
                                Arguments_None
                               }
                               {
                                Expression Object: PLUS,
                                Declaration_None,
                                +++ Arguments Start +++
                                    {
                                     Identifier Object: Sum,
                                     Declaration_None,
                                     Arguments_None
                                    }
                                    {
                                     Expression Object: DIV,
                                     Declaration_None,
                                     +++ Arguments Start +++
                                         {
                                          Literal Object: integer <1>,
                                          Declaration_None,
                                          Arguments_None
                                         }
                                         {
                                          Identifier Object: Num_1,
                                          Declaration_None,
                                          Arguments_None
                                         }
                                     +++ Arguments End +++++
                                    }
                                +++ Arguments End +++++
                               }
                           +++ Arguments End +++++
                          }
                          {
                           Statement Object: ASSIGN,
                           Declaration_None,
                           +++ Arguments Start +++
                               {
                                Identifier Object: Num_1,
                                Declaration_None,
                                Arguments_None
                               }
                               {
                                Expression Object: PLUS,
                                Declaration_None,
                                +++ Arguments Start +++
                                    {
                                     Identifier Object: Num_1,
                                     Declaration_None,
                                     Arguments_None
                                    }
                                    {
                                     Literal Object: integer <1>,
                                     Declaration_None,
                                     Arguments_None
                                    }
                                +++ Arguments End +++++
                               }
                           +++ Arguments End +++++
                          }
                      +++ Arguments End +++++
                     }
                 +++ Arguments End +++++
                }
                {
                 Statement Object: SEQ,
                 Declaration_None,
                 +++ Arguments Start +++
                     {
                      Statement Object: PUT,
                      Declaration_None,
                      +++ Arguments Start +++
                          {
                           Literal Object: string <"The result of the first ">,
                           Declaration_None,
                           Arguments_None
                          }
                          {
                           Identifier Object: Terms,
                           Declaration_None,
                           Arguments_None
                          }
                          {
                           Literal Object: string <" terms is:">,
                           Declaration_None,
                           Arguments_None
                          }
                      +++ Arguments End +++++
                     }
                     {
                      Statement Object: SEQ,
                      Declaration_None,
                      +++ Arguments Start +++
                          {
                           Statement Object: PUT,
                           Declaration_None,
                           +++ Arguments Start +++
                               {
                                Literal Object: string <"aB )'$">,
                                Declaration_None,
                                Arguments_None
                               }
                           +++ Arguments End +++++
                          }
                          {
                           Statement Object: PUT,
                           Declaration_None,
                           +++ Arguments Start +++
                               {
                                Literal Object: string <"He said ">,
                                Declaration_None,
                                Arguments_None
                               }
                               {
                                Literal Object: string <"hello">,
                                Declaration_None,
                                Arguments_None
                               }
                               {
                                Literal Object: string <".">,
                                Declaration_None,
                                Arguments_None
                               }
                               {
                                Statement Object: SKIP,
                                Declaration_None,
                                Arguments_None
                               }
                           +++ Arguments End +++++
                          }
                      +++ Arguments End +++++
                     }
                 +++ Arguments End +++++
                }
            +++ Arguments End +++++
           }
       +++ Arguments End +++++
      }
  +++ Arguments End +++++
 }

Compilation Ends
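In the printout above, a run of statements appears as right-nested "SEQ" objects with two arguments each: the first statement, then a "SEQ" holding the rest, with the last "SEQ" holding the final two statements. The sketch below reproduces that shape from an array of statement objects. The helpers newStmt, addArg and seqFromArray are hypothetical names, the types are a trimmed copy of the ast.h listing, and returning a "NOP" object for an empty run is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Types trimmed from the ast.h listing above (Decl and FArgs omitted). */
typedef enum tempObjType { DECL_O, STMT_O, EXP_O, LIT_O, ID_O } ObjType;
typedef enum tempDataType { INTEGER_D, BOOLEAN_D, STRING_D, VOID_D } DataType;
typedef struct {
  DataType type;
  union litValue_tag { char* str; int num; } litValue;
} Value;
typedef struct tempObject* ObjectP;
typedef struct tempArgs*   ArgsP;
typedef Value*             ValueP;
typedef struct tempDecl*   DeclP;
typedef union { ValueP litVal; char* name; } ValueInfo;
typedef struct tempObject {
  ObjType type; DeclP decl; ValueInfo value; ArgsP args;
} Object;
typedef struct tempArgs { ObjectP object; ArgsP next; } Args;

/* Hypothetical helper: a statement object for the given command. */
ObjectP newStmt(const char* command) {
  ObjectP obj = malloc(sizeof *obj);
  assert(obj != NULL);
  obj->type = STMT_O;
  obj->decl = NULL;
  obj->args = NULL;
  obj->value.name = malloc(strlen(command) + 1);
  assert(obj->value.name != NULL);
  strcpy(obj->value.name, command);
  return obj;
}

/* Hypothetical helper: append child to parent's argument list. */
void addArg(ObjectP parent, ObjectP child) {
  ArgsP  cell = malloc(sizeof *cell);
  ArgsP* tail = &parent->args;
  assert(cell != NULL);
  cell->object = child;
  cell->next = NULL;
  while (*tail != NULL) tail = &(*tail)->next;
  *tail = cell;
}

/* Hypothetical helper: fold n statements into right-nested SEQ
   objects.  No statements gives a NOP (an assumption), a single
   statement is returned unchanged, and otherwise the result is
   SEQ(stmts[0], <sequence of the remaining statements>). */
ObjectP seqFromArray(ObjectP* stmts, size_t n) {
  ObjectP seq;
  if (n == 0) return newStmt("NOP");
  if (n == 1) return stmts[0];
  seq = newStmt("SEQ");
  addArg(seq, stmts[0]);
  addArg(seq, seqFromArray(stmts + 1, n - 1));
  return seq;
}
```

In a yacc grammar the same shape tends to fall out of a right-recursive statement-list rule, with each reduction creating one "SEQ" object, so an explicit array is not required there.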

 
/* AST example               */
/* YIELDS statement          */
begin
    integer : a
    a <- 89 + { integer : x integer : y
                get x , y yields x + y * x ^ y }
    put "Test yield", skip
end




*** Printing Tree ***
 {
  Program Start: SEQ,
  --- Declaration Starts ---
  integer a 
  --- Declaration Ends -----
  +++ Arguments Start +++
      {
       Statement Object: ASSIGN,
       Declaration_None,
       +++ Arguments Start +++
           {
            Identifier Object: a,
            Declaration_None,
            Argument_None
           }
           {
            Expression Object: PLUS,
            Declaration_None,
            +++ Arguments Start +++
                {
                 Literal Object: integer <89>,
                 Declaration_None,
                 Argument_None
                }
                {
                 Statement Object: SEQ,
                 --- Declaration Starts ---
                 integer x 
                 integer y 
                 --- Declaration Ends -----
                 +++ Arguments Start +++
                     {
                      Statement Object: GET,
                      Declaration_None,
                      +++ Arguments Start +++
                          {
                           Identifier Object: x,
                           Declaration_None,
                           Argument_None
                          }
                          {
                           Identifier Object: y,
                           Declaration_None,
                           Argument_None
                          }
                      +++ Arguments End +++++
                     }
                     {
                      Expression Object: RET,
                      Declaration_None,
                      +++ Arguments Start +++
                          {
                           Expression Object: PLUS,
                           Declaration_None,
                           +++ Arguments Start +++
                               {
                                Identifier Object: x,
                                Declaration_None,
                                Argument_None
                               }
                               {
                                Expression Object: TIMES,
                                Declaration_None,
                                +++ Arguments Start +++
                                    {
                                     Identifier Object: y,
                                     Declaration_None,
                                     Argument_None
                                    }
                                    {
                                     Expression Object: POWER,
                                     Declaration_None,
                                     +++ Arguments Start +++
                                         {
                                          Identifier Object: x,
                                          Declaration_None,
                                          Argument_None
                                         }
                                         {
                                          Identifier Object: y,
                                          Declaration_None,
                                          Argument_None
                                         }
                                     +++ Arguments End +++++
                                    }
                                +++ Arguments End +++++
                               }
                           +++ Arguments End +++++
                          }
                      +++ Arguments End +++++
                     }
                 +++ Arguments End +++++
                }
            +++ Arguments End +++++
           }
       +++ Arguments End +++++
      }
      {
       Statement Object: PUT,
       Declaration_None,
       +++ Arguments Start +++
           {
            Literal Object: string <"Test yield">,
            Declaration_None,
            Argument_None
           }
           {
            Statement Object: SKIP,
            Declaration_None,
            Argument_None
           }
       +++ Arguments End +++++
      }
  +++ Arguments End +++++
 }

Compilation Ends
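Later passes process trees like the ones printed above by walking them recursively, much as printTree() does. The following sketch shows the basic recursion over argument lists. The functions countObjects and treeDepth are hypothetical examples, not part of the provided files; the types are a trimmed copy of the ast.h listing, and for brevity the walk follows only the args lists, not the objects reachable through declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Types trimmed from the ast.h listing above (Decl and FArgs omitted). */
typedef enum tempObjType { DECL_O, STMT_O, EXP_O, LIT_O, ID_O } ObjType;
typedef enum tempDataType { INTEGER_D, BOOLEAN_D, STRING_D, VOID_D } DataType;
typedef struct {
  DataType type;
  union litValue_tag { char* str; int num; } litValue;
} Value;
typedef struct tempObject* ObjectP;
typedef struct tempArgs*   ArgsP;
typedef Value*             ValueP;
typedef struct tempDecl*   DeclP;
typedef union { ValueP litVal; char* name; } ValueInfo;
typedef struct tempObject {
  ObjType type; DeclP decl; ValueInfo value; ArgsP args;
} Object;
typedef struct tempArgs { ObjectP object; ArgsP next; } Args;

/* Hypothetical example: count the objects of one type in the tree
   rooted at root.  Only argument lists are followed. */
int countObjects(const Object* root, ObjType wanted) {
  int count;
  const Args* a;
  if (root == NULL) return 0;
  count = (root->type == wanted) ? 1 : 0;
  for (a = root->args; a != NULL; a = a->next) {
    count += countObjects(a->object, wanted);
  }
  return count;
}

/* Hypothetical example: nesting depth of the tree.  An object with
   no arguments has depth 1; an empty tree has depth 0. */
int treeDepth(const Object* root) {
  int deepest = 0;
  const Args* a;
  if (root == NULL) return 0;
  for (a = root->args; a != NULL; a = a->next) {
    int d;
    assert(a->object != NULL); /* each argument cell should hold an object */
    d = treeDepth(a->object);
    if (d > deepest) deepest = d;
  }
  return deepest + 1;
}
```

A type-checking or code-generation pass would follow the same pattern, doing its work at each object instead of counting.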


To illustrate some critical points, some more examples will be added when needed.
Marsha Chechik
Last modified on February 2, 1998