Genesis

Genesis is an automatic patch generation system that works on an automatically inferred search space. Genesis obtains its search space with its automatic code transform inference algorithm.

The instructions in this document are for reproducing the results in the full Genesis paper (MIT-CSAIL-TR-2017-008) available at:

http://www.cs.toronto.edu/~fanl/program_repair/genesis-rep/genesis-full.pdf

https://dspace.mit.edu/handle/1721.1/108619

In the rest of this document, "the full paper" refers to the full version of the Genesis paper and "the FSE paper" refers to the paper accepted by FSE. Note that the full paper contains a superset of the results in the FSE paper.

1. VM for Genesis

If you have trouble setting up the environment for Genesis, we provide a VMware virtual machine with Genesis ready to use. We prepared the VM by following exactly the steps described in Section 2.

Here is the link to the VM:

http://www.cs.toronto.edu/~fanl/program_repair/genesis-rep/Ubuntu-16.04-Genesis.zip

The login password of the VM is "genesis".

2. Install Instructions

We recommend working with the VM provided in Section 1 to avoid potential environment setup issues, since building and setting up an automatic patch generation system can be hard. The following step-by-step instructions build Genesis and reproduce the patch generation experiments.

First, install a Linux-like operating system. The following instructions assume Ubuntu 16.04.1 64-bit, downloaded from the official Ubuntu website. In theory, Genesis works on all 64-bit Linux-like systems, but the behavior of the benchmark applications may change. We recommend staying with the same or a similar operating system.

Install Java. We recommend openjdk-8.

a. Open a terminal.

b. Run "sudo apt-get install openjdk-8-jdk openjdk-8-jre".

Install Apache Maven. Open a terminal and run "sudo apt-get install maven".

Install MySQL, which Genesis uses.

a. Open a terminal.

b. Run "sudo apt-get install mysql-server".

c. When setting up the root password for MySQL, use "genesis" (without the quotes) as the password. Genesis accesses the database directly with user "root" and password "genesis".

Set up the Genesis MySQL database. Genesis uses a database to store metadata about the training patches and benchmark applications. This step is required if you want to reproduce our experiments with Genesis.

a. Download the snapshot of the Genesis database at:
```
http://www.cs.toronto.edu/~fanl/program_repair/genesis-rep/genesis-v0.2.sql.tar.gz
```
b. Run "tar xvzf genesis-v0.2.sql.tar.gz" to untar the downloaded file.

c. Run "mysql -u root -p < genesis-v0.2.sql".

d. If asked for a password, type "genesis" (the MySQL password you set up earlier).

Install the MySQL Java Connector. Open a terminal and run "sudo apt-get install libmysql-java".

Build and install the Spoon library. Genesis uses a modified version of the Spoon library.

a. Download the modified version of the Spoon library at:
```
http://www.cs.toronto.edu/~fanl/program_repair/genesis-rep/spoon-fork-for-genesis.tar.gz
```
b. Untar it to your desired directory. In our example, we assume the Spoon library is at "~/Workspace/spoon", where ~ is your home directory.

c. Inside the directory "~/Workspace/spoon", run "mvn compile". Make sure your Internet connection is up, because Maven will automatically download the required dependencies.

d. Then run "mvn install" to install Spoon.

Install Genesis itself.

a. Download the Genesis source code at:
```
http://www.cs.toronto.edu/~fanl/program_repair/genesis-rep/genesis-0.21.tar.gz
```
b. Untar it to your desired directory. In our example, we assume Genesis is at "~/Workspace/genesis", where ~ is your home directory.

c. Inside the directory "~/Workspace/genesis", run "mvn compile". Make sure your Internet connection is up.

Configure Genesis properly.

a. Inside the directory "~/Workspace/genesis", run "cp global-sample.conf global.conf".

b. Open "global.conf" in an editor. Change the value of the option "pythonsrcdir" to "/home/ubuntu/Workspace/..." (your Genesis path).

c. Run "mvn test"; you should see all Genesis unit tests pass.

3. Reproduce Our Experiments

Now you can use Genesis to reproduce the experiments in the full paper. Here are detailed instructions for applying Genesis to the 20 collected NullPointerException (NPE) errors, 13 collected OutOfBound (OOB) errors, and 16 collected ClassCastException (CCE) errors in our benchmark set.

Obtain the benchmark error. Applying Genesis to an application error requires meta-information that specifies the failing and passing JUnit test cases of the buggy version of the application. We provide Python scripts that automatically download our benchmark applications from GitHub and set them up for you. To use our script to obtain an error:

a. Make sure the system has git. If not, run "sudo apt-get install git". Note that our VM already includes git.

b. We need the Python MySQLdb library. Run "sudo apt-get install python-mysqldb" to install it if it is missing. Note that our VM already includes MySQLdb.

c. Suppose Genesis resides in the directory "~/Workspace/genesis". Inside that directory, run "python/create-case.py X DirY", where X is the integer case number and DirY is the directory that will hold the created case. To create OOB error cases, add the flag "--oobcase". To create CCE error cases, add the flag "--ccecase".

For example:
```
python/create-case.py 1 npe-case1
```
The above command creates the directory npe-case1 holding the first NPE case. Note that the NPE benchmark cases are numbered in the order in which they appear in Table 3, Table 6, and Table 9 in the appendix of the full paper. Therefore, after running the above command, npe-case1 will contain the benchmark case corresponding to the revision "2ec5459" of the "caelum-stella" repository (see row 1 of Table 3 in the full paper).
```
python/create-case.py --oobcase 10 oob-case10
```
The above command creates the directory oob-case10 holding the 10th OOB case in our experiments. Note that OOB cases are numbered following the order of Table 4, Table 7, and Table 10 in the full paper. After running this command, oob-case10 will contain the benchmark case corresponding to the revision "df400ac" of the "jPOS" repository (see row 10 of Table 4 in the full paper).

d. Inside the case directory, the "testcase.txt" file specifies the passing and failing JUnit test cases. The goal of Genesis is to generate a patch that fixes the error and passes all test cases.
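To fetch several benchmark cases in one go, the create-case.py invocations above can be generated rather than typed by hand. The following is a minimal sketch, not part of Genesis: the helper names are hypothetical, the case counts (20 NPE, 13 OOB, 16 CCE) and flags are taken from this document, and the "cce-caseN" directory name is an assumed extension of the naming used in the examples.

```python
# Sketch: build the create-case.py command lines for the benchmark cases.
# Helper names are hypothetical; counts and flags come from this README.

CASE_COUNTS = {"npe": 20, "oob": 13, "cce": 16}
CASE_FLAGS = {"npe": None, "oob": "--oobcase", "cce": "--ccecase"}


def create_case_command(kind, number):
    """Return the argv list that creates one benchmark case."""
    if kind not in CASE_COUNTS:
        raise ValueError("unknown error class: %s" % kind)
    if not 1 <= number <= CASE_COUNTS[kind]:
        raise ValueError("case number out of range: %d" % number)
    argv = ["python/create-case.py"]
    if CASE_FLAGS[kind] is not None:
        argv.append(CASE_FLAGS[kind])
    # Directory name follows the examples above (assumed for CCE cases).
    argv += [str(number), "%s-case%d" % (kind, number)]
    return argv


def all_case_commands():
    """Return the argv lists for every case of every error class."""
    commands = []
    for kind in ("npe", "oob", "cce"):
        for number in range(1, CASE_COUNTS[kind] + 1):
            commands.append(create_case_command(kind, number))
    return commands


if __name__ == "__main__":
    for argv in all_case_commands():
        print(" ".join(argv))
```

The printed lines are meant to be run from inside the Genesis directory, one at a time or pasted into a shell script.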

Apply Genesis to the benchmark error.

a. Suppose the case directory is "npe-case1". We first run
```
mvn exec:java -Dexec.mainClass="genesis.repair.Main" -Dexec.args="-c npe-space-tv/candidate -s npe-space-tv/space.txt -init-only -w npe-wdir1 npe-case1/case.conf"
```
This command initializes the repair process, runs defect localization, and creates a work directory "npe-wdir1". This takes roughly two minutes for the first NPE case.

Note that "-Dexec.mainClass" specifies the Java class that Maven is going to run. Here we run genesis.repair.Main, which is the main class of the Genesis repair system. "-Dexec.args" specifies the arguments we pass to that class. "-c" and "-s" point to the search space files that Genesis uses to drive the patch generation process. In a future release we may combine these two arguments to simplify the command line. "-init-only" means that Genesis exits after initializing the repair process. "-w" specifies the work directory that Genesis creates during initialization. We separate the initialization from the repair process because this helps with debugging and with understanding the flow of the system.

Note that "npe-space-tv/candidate" and "npe-space-tv/space.txt" correspond to the NPE search space inferred by Genesis. Use the directory "oob-space-tv" instead when running OOB cases, and "cce-space-tv" when running CCE cases. If you want to use the composite search space, use "any-space-tv" instead. See the next section for an explanation of the different search spaces.

b. Then we run:
```
mvn exec:java -Dexec.mainClass="genesis.repair.Main" -Dexec.args="-c npe-space-tv/candidate -s npe-space-tv/space.txt -skip-init -w npe-wdir1"
```
This command runs the patch generation process. For the first NPE case it takes roughly forty minutes to finish, but you do not need to wait for the whole generation process: the first correct patch is generated within about two minutes.

c. In total, Genesis generates 21 different patches, stored as files named "__patchX.java". The first line of each file specifies the name of the file that Genesis is going to change. You can search for the phrase "genesis generated change" to locate the changed part.

d. In the first NPE case, the first generated patch "__patch1.java" is actually a correct patch that is semantically equivalent to the later developer patch for this error. This patch will be generated after running for less than two minutes.
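Inspecting 21 patch files by hand gets tedious, so the lookup described in steps c and d can be scripted. The sketch below is an illustration only: the function name is hypothetical, the directory containing the "__patchX.java" files is an assumption you must supply (this document does not state where they are written), and the marker is matched case-insensitively as a precaution.

```python
# Sketch: summarize the generated patch files in a directory.
# Assumes each file starts with the target file name and marks the
# changed part with the phrase "genesis generated change".
import glob
import io
import os

MARKER = "genesis generated change"


def summarize_patches(directory):
    """Return (patch file, target file, marker line numbers) per patch."""
    summary = []
    pattern = os.path.join(directory, "__patch*.java")
    # Lexicographic order, so __patch10.java sorts before __patch2.java.
    for path in sorted(glob.glob(pattern)):
        with io.open(path, encoding="utf-8", errors="replace") as handle:
            lines = handle.read().splitlines()
        target = lines[0].strip() if lines else ""
        hits = [i + 1 for i, line in enumerate(lines)
                if MARKER in line.lower()]
        summary.append((os.path.basename(path), target, hits))
    return summary


if __name__ == "__main__":
    # "." is a placeholder; pass the directory that holds the patches.
    for name, target, hits in summarize_patches("."):
        print("%s -> %s (marker at lines %s)" % (name, target, hits))
```

Each tuple gives the patch file, the file it would modify, and the 1-based line numbers where the marker appears.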

Note that our experimental results were obtained on Amazon EC2 m4.xlarge machines rather than in the VM. However, you can follow the same procedure in Section 2 to set up Genesis on an Amazon EC2 instance starting from a clean Ubuntu 16.04 LTS AMI. In our experience, the VM takes slightly longer than the times reported in the appendix of the full paper. Also, for some cases Genesis takes more than half an hour to generate the first correct patch. See the appendix of the full paper for the experimental details of each benchmark case.
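When repeating the two-phase procedure for other cases, only the search space directory, the case directory, and the work directory change. The following sketch builds both Maven invocations from those three values. It is not part of Genesis: the function name and the short keys ("npe", "oob", "cce", "any") are hypothetical, while the class name, options, and directory names are copied from the commands above.

```python
# Sketch: build the initialization and repair commands for one case.
# Keys are illustrative; directory names are the ones listed above.

SPACE_DIRS = {
    "npe": "npe-space-tv",
    "oob": "oob-space-tv",
    "cce": "cce-space-tv",
    "any": "any-space-tv",  # composite search space
}


def _mvn(args):
    """Wrap an argument string in the mvn exec:java invocation."""
    return ["mvn", "exec:java",
            "-Dexec.mainClass=genesis.repair.Main",
            "-Dexec.args=" + args]


def repair_commands(space, case_dir, work_dir):
    """Return (init argv, repair argv) for the given search space."""
    space_dir = SPACE_DIRS[space]
    common = "-c %s/candidate -s %s/space.txt" % (space_dir, space_dir)
    init_args = "%s -init-only -w %s %s/case.conf" % (
        common, work_dir, case_dir)
    run_args = "%s -skip-init -w %s" % (common, work_dir)
    return _mvn(init_args), _mvn(run_args)


if __name__ == "__main__":
    for argv in repair_commands("npe", "npe-case1", "npe-wdir1"):
        print(argv)
```

Each returned list can be passed to subprocess.check_call with cwd set to the Genesis directory; because the arguments are passed as a list, no shell quoting is needed around the class name or the argument string.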

Apply Genesis to new errors

It is possible to apply Genesis to a new error. You just need to create a case directory similar to the one created by the Python script. Inside the case directory, "case.conf" specifies meta-information about the application, "testcase.txt" specifies the passing and failing JUnit test cases, and "src_orig" contains the application repository. The current version of Genesis supports any application that is built with Maven and tested with JUnit.
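Before pointing Genesis at a hand-made case directory, it can help to confirm that the three entries named above are present. This is a minimal sketch with a hypothetical function name. It assumes "src_orig" is a directory and the other two entries are regular files, and it checks only for their existence, not their contents.

```python
# Sketch: check that a case directory has the entries described above.
# Assumes case.conf and testcase.txt are files and src_orig a directory.
import os

REQUIRED_FILES = ("case.conf", "testcase.txt")
REQUIRED_DIRS = ("src_orig",)


def missing_case_entries(case_dir):
    """Return the names of required entries missing from case_dir."""
    missing = [name for name in REQUIRED_FILES
               if not os.path.isfile(os.path.join(case_dir, name))]
    missing += [name for name in REQUIRED_DIRS
                if not os.path.isdir(os.path.join(case_dir, name))]
    return missing


if __name__ == "__main__":
    # "npe-case1" is the example directory created earlier.
    problems = missing_case_entries("npe-case1")
    print("missing: %s" % problems if problems else "layout looks complete")
```

An empty result only means the layout matches; a case created by the Python script remains the best reference for what each file should contain.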

Fault localization algorithm in Genesis

Like other patch generation systems, Genesis relies on a fault localization algorithm to identify the list of suspicious statements to which it applies transforms. The current implementation uses a stack-trace-based algorithm. You can find its source code at src/main/java/genesis/repair/localization/*. Note that Genesis can work with any fault localization algorithm. If you want to implement your own localization algorithm for Genesis, you can simply replace the code in that directory with your own.

4. The Inferred Search Space

There are seven different inferred search spaces included in the current Genesis system. All of them are automatically inferred from the training databases we collected. There are two for each of the three error classes (npe-space-tv, npe-space-vo, oob-space-tv, oob-space-vo, cce-space-tv, and cce-space-vo) and one composite search space (any-space-tv). "TV" and "VO" denote the different configurations we used to infer the search spaces; see Section 5.1 of the full paper for a detailed explanation. Note that in the FSE paper, we only present the results of the "TV" search spaces and the composite search space.

Print the search space.

If you want to investigate the transforms inside a search space, you can run the following command to print the "npe-space-tv" search space to the console:
```
mvn exec:java -Dexec.mainClass="genesis.space.SearchSpace" -Dexec.args="npe-space-tv/space.txt npe-space-tv/candidate"
```
You can use similar commands to print the other six inferred search spaces.

Run the inference algorithm

It is possible to run the inference algorithm to regenerate the search space from the training databases. However, in our experiments we used an Amazon EC2 server with 36 cores and 64GB memory to run the inference algorithm.

We therefore do not recommend running the inference algorithm in the VM. If you built your own environment on a multi-core server, you can rerun the inference step with the following instructions:
1. Perform the same installation steps described in Section 2.
2. Download the training database "http://www.cs.toronto.edu/~fanl/program_repair/genesis-rep/data-v0.2.tar.gz". Untar the downloaded tarball. The database contains serialized Java ASTs of the collected training GitHub revisions. In the remaining instructions, we assume you untarred the database to "/home/ubuntu/Workspace/github/data". If you use a different location, please modify "global.conf" accordingly.
3. Run the following command to infer the NPE (TV) search space:
```
MAVEN_OPTS="-Xms256m -Xmx40g" mvn exec:java -Dexec.mainClass="genesis.learning.Main" -Dexec.args="1 /home/ubuntu/Workspace/github/data 503 362 483 36"
```
  Here MAVEN_OPTS raises the maximum heap size of the Java VM to 40GB. The last argument is the number of threads that run in parallel during training. The remaining arguments denote the total training set size and the split between training and validation sets.
4. Run the following command to infer the OOB (TV) search space:
```
MAVEN_OPTS="-Xms256m -Xmx40g" mvn exec:java -Dexec.mainClass="genesis.learning.Main" -Dexec.args="2 /home/ubuntu/Workspace/github/data 212 149 199 36"
```
5. Run the following command to infer the CCE (TV) search space:
```
MAVEN_OPTS="-Xms256m -Xmx40g" mvn exec:java -Dexec.mainClass="genesis.learning.Main" -Dexec.args="3 /home/ubuntu/Workspace/github/data 303 215 287 36"
```
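The three inference commands above differ only in their first argument and in the three size arguments, so they can be generated from a small table when the data location, thread count, or heap size differs from ours. The sketch below is illustrative: the function name, keys, and default values are hypothetical conveniences, and the numeric arguments are copied verbatim from the commands above without interpreting them further.

```python
# Sketch: build the environment and argv for one inference run.
# The numeric arguments are copied verbatim from the commands above.

INFERENCE_ARGS = {
    "npe": ("1", ("503", "362", "483")),
    "oob": ("2", ("212", "149", "199")),
    "cce": ("3", ("303", "215", "287")),
}


def inference_command(kind, data_dir, threads=36, max_heap="40g"):
    """Return (extra environment, argv) for inferring one TV space."""
    first, sizes = INFERENCE_ARGS[kind]
    env = {"MAVEN_OPTS": "-Xms256m -Xmx%s" % max_heap}
    args = " ".join([first, data_dir] + list(sizes) + [str(threads)])
    argv = ["mvn", "exec:java",
            "-Dexec.mainClass=genesis.learning.Main",
            "-Dexec.args=" + args]
    return env, argv


if __name__ == "__main__":
    env, argv = inference_command(
        "npe", "/home/ubuntu/Workspace/github/data")
    print(env)
    print(argv)
```

When launching the command with subprocess, merge the returned dictionary into a copy of os.environ so that Maven still sees the rest of your environment.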