Reverse Engineering Goal Models from Legacy Code

A reverse engineering process aims at reconstructing high-level abstractions from source code. This paper presents a novel reverse engineering methodology for recovering requirements goal models from both structured and unstructured legacy code. The methodology consists of the following major steps: 1) Refactor source code by extracting methods based on comments; 2) Convert the refactored code into an abstract structured program through statechart refactoring and hammock graph construction; 3) Extract a goal model from the structured program abstract syntax tree; 4) Identify non-functional requirements and derive softgoals based on the traceability between the code and the goal model. To illustrate this requirements recovery process, we refactor requirements goal models from two legacy software code bases: an unstructured Web-based email in PHP (SquirrelMail) and a structured email client system in Java (Columba).
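As a rough illustration of step 1, the sketch below groups the statements that follow each line comment into a candidate for Extract Method and derives a method name from the comment text. It is a simplified sketch under stated assumptions, not the tooling used in the case studies: the names `Candidate`, `method_name`, and `split_by_comments` are hypothetical, only `//` line comments are recognized, and the sample statements are invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A block of statements that could become one extracted method (hypothetical type)."""
    name: str
    comment: str
    body: list = field(default_factory=list)


def method_name(comment: str) -> str:
    """Derive a camelCase identifier from the words of a comment."""
    words = re.findall(r"[A-Za-z0-9]+", comment)
    if not words:
        return "unnamedStep"
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def split_by_comments(lines):
    """Group statements under the `//` comment that precedes them.

    Statements that appear before the first comment are ignored here;
    a real refactoring would also need data-flow analysis to decide
    the parameters and return value of each extracted method.
    """
    candidates = []
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("//"):
            comment = stripped[2:].strip()
            current = Candidate(method_name(comment), comment)
            candidates.append(current)
        elif stripped and current is not None:
            current.body.append(stripped)
    return candidates


# Invented sample input, for illustration only.
source = """
// Load the message folder
folder = store.open(name);
folder.refresh();
// Display message list
view.show(folder.messages());
""".splitlines()

for candidate in split_by_comments(source):
    print(candidate.name, len(candidate.body))
```

Each candidate name then becomes a natural label for a goal, which is why comment-driven extraction is a useful first step before building the goal model.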
The presentation at the Requirements Engineering conference (Paris, 2005) does not include all the technical details. The material below may help you reproduce our work and gain more insight. We conducted two case studies:

Columba: a structured Java email client

Here are the instructions on how to compile Columba inside Eclipse, and here is the resulting Eclipse workspace for Columba. The studied source code and the reverse engineered goal models are shown as follows:
  • the original Columba code and the goal model converted from the AST of the original program
  • the refactored Columba code and the goal model converted from the AST of the refactored program
  • the code with identified NFRs and the goal model converted from the AST of the program without NFRs
  • the code with softgoals introduced for the NFR goals and the goal model converted from the AST of the program with softgoals
  • The goal models above are based on a simple meta-model for goal models, defined using the Eclipse Modeling Framework (EMF).
    They were generated from the abstract syntax tree of the above Java code using a tool based on the JDT API.
  • After the above analysis, a goal model with the softgoals is visualized as follows:
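Separately from the diagram, the conversion from a structured program's AST to a goal model mentioned in the list above can be sketched in a few lines. This is a minimal sketch under assumptions that the page itself does not state: sequential composition is mapped to an AND decomposition, alternative branches to an OR decomposition, and individual calls to leaf goals. The tuple-based AST, the `Goal` class, and all goal names are hypothetical and stand in for the JDT AST and the EMF meta-model.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A node of a toy goal model (hypothetical stand-in for the EMF meta-model)."""
    name: str
    decomposition: str = "LEAF"  # "AND", "OR", or "LEAF"
    children: list = field(default_factory=list)


def to_goal(node):
    """Convert a toy structured AST node into a Goal.

    Assumed node shapes:
      ("call", name)              -> leaf goal
      ("seq",  name, [children])  -> AND decomposition (all steps are needed)
      ("if",   name, [branches])  -> OR decomposition (one branch is taken)
    """
    kind = node[0]
    if kind == "call":
        return Goal(node[1])
    if kind == "seq":
        return Goal(node[1], "AND", [to_goal(child) for child in node[2]])
    if kind == "if":
        return Goal(node[1], "OR", [to_goal(child) for child in node[2]])
    raise ValueError(f"unsupported node kind: {kind}")


def render(goal, depth=0):
    """Return the goal tree as indented text lines."""
    if goal.decomposition == "LEAF":
        label = goal.name
    else:
        label = f"{goal.name} [{goal.decomposition}]"
    lines = ["  " * depth + label]
    for child in goal.children:
        lines.extend(render(child, depth + 1))
    return lines


# Invented example program, for illustration only.
ast = ("seq", "sendMessage", [
    ("call", "composeMessage"),
    ("if", "chooseTransport", [
        ("call", "sendViaSmtp"),
        ("call", "saveAsDraft"),
    ]),
])

print("\n".join(render(to_goal(ast))))
```

Because the mapping only relies on sequences and alternatives, it presupposes structured code, which is why unstructured programs are restructured before this step.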

SquirrelMail: from unstructured code to goal model

SquirrelMail is used in our department as a Web-based email client.

  • The refactored top-level statechart of SquirrelMail models the behavior of the browser, based on the refactoring of the PHP source code.
  • The code before GOTO removal is converted from the above statechart by introducing GOTOs for the non-hammock graphs, i.e. multi-entry or multi-exit states.
  • The code after GOTO removal is produced by the FPT compiler. Its equivalent Java code is shown here.
  • The annotated goal model is created by the AST conversion tool. A visual representation of it is shown as follows:

  • The annotated goal model converted from the structurized program.
  • After identifying the NFRs, a goal model with the softgoals is constructed as follows:

Resources

  • Here is a technical report explaining the whole process.
  • Eclipse is a comprehensive IDE with semi-automatic Extract Method refactoring support. We also use its EMF to represent our goal models and its JDT API to convert structured Java programs into annotated goal models.
  • GOTOs are removed using the FPT compiler, which is based on the algorithms discussed in this TSE paper. The paper uses hammock graphs to represent single-entry, single-exit blocks; the same concept underlies our refactoring tool support, which uses it to determine the scope of method extraction.
  • Currently, this AST conversion tool creates an annotated goal model for Java programs. For other programming languages, such as Fortran, one can use f2j (a Fortran-to-Java translator). We plan to extend the work on the FPT compiler in the future, so that the structurizing refactoring step can be linked smoothly with the current tool support.
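To make the hammock-graph idea in the resources above more concrete, the following sketch checks whether a set of control-flow nodes forms a single-entry, single-exit region. It is an illustrative simplification, not the algorithm of the FPT compiler or of the cited paper: the function names and the example graphs are hypothetical, and a region that contains the graph's start node (and so has no incoming edge from outside) is not treated specially.

```python
def entry_nodes(cfg, region):
    """Nodes inside the region that are reached by an edge from outside it."""
    return {
        dst
        for src, dsts in cfg.items()
        if src not in region
        for dst in dsts
        if dst in region
    }


def exit_targets(cfg, region):
    """Nodes outside the region that are reached by an edge from inside it."""
    return {
        dst
        for src in region
        for dst in cfg.get(src, ())
        if dst not in region
    }


def is_hammock(cfg, region):
    """True if control enters the region at one node and leaves to one node."""
    region = set(region)
    return len(entry_nodes(cfg, region)) == 1 and len(exit_targets(cfg, region)) == 1


# Invented control-flow graph of an if/else: node -> list of successors.
cfg = {
    "start": ["check"],
    "check": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["end"],
    "end": [],
}

# The same graph with an extra jump into the middle of the conditional,
# as a GOTO would introduce.
cfg_with_goto = dict(cfg, start=["check", "then"])

whole_if = {"check", "then", "else", "join"}
print(is_hammock(cfg, whole_if))              # whole if/else block
print(is_hammock(cfg, {"check", "then"}))     # half of the block
print(is_hammock(cfg_with_goto, whole_if))    # block with a second entry
```

A region that passes this check can be lifted out as one method, since control enters and leaves it at a single point; the multi-entry case is the kind of state that requires GOTOs before restructuring.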