README

1. Replication Package Overview

This replication package contains the files necessary to reproduce the results of the ESEC/FSE 2015 paper "Staged Program Repair with Condition Synthesis". The paper presents SPR, a novel patch generation system that uses the staged condition synthesis technique. The replication package provides a way to reproduce all of our experiments with SPR.

The original replication package relied on Amazon EC2 images to reproduce the exact execution environment of the experiments. Although that approach gives 100% replication, it is not as convenient as VM images, and the instance types we used to run our experiments are no longer provided by Amazon. We therefore rebuilt the replication package with VMWare VMs. As a result, the running times you obtain with the VM runs will differ from the times originally reported in the paper.

The replication package consists of the following:

1) A VMWare VM image for reproducing all of our experiments except fbc. http://www.cs.toronto.edu/~fanl/program_repair/vm/Ubuntu-prophet-64-bit.tar.gz

2) A 32-bit VM image for reproducing all of the fbc experiments. http://www.cs.toronto.edu/~fanl/program_repair/vm/Ubuntu-14.04.1-32bit.tar.gz

3) Scenario tarballs that contain necessary files to reproduce each of the defects in our experiments. http://www.cs.toronto.edu/~fanl/program_repair/scenarios/

4) SPR-generated patches for the benchmark defects: http://www.cs.toronto.edu/~fanl/program_repair/spr-rep/spr-result/ and http://www.cs.toronto.edu/~fanl/program_repair/spr-rep/spr-wsf-result/

Note that the SPR source code in the replication package is licensed under GPLv3. Relevant license documents are included in the VM images.

1.1 Consistency and Completeness

A user can use the replication package to reproduce all of the SPR experiments we performed in the paper. Specifically, the replication package is able to reproduce all SPR results in Table 1, Table 2, and Table 3 in the paper. Note that the results of GenProg and AE are obtained from previous work.

The results obtained from the replication package will in general be consistent with the results reported in the paper. For each defect for which we claim, in the paper, that SPR generates a plausible or correct patch, the user can use the replication package to obtain the same patch.

1.2 Ease of Reuse

Program defects often require a specific environment setup to reproduce. To facilitate reproduction, we packed our systems into VM images and our benchmark defects into scenario tarballs. We also provide scripts to facilitate the generation of every number in Table 1, Table 2, and Table 3. Note that our experiments include manual analysis of the generated patches to identify whether each patch is correct or not. We provide a description of each correct SPR-generated patch.

1.3 Potential to facilitate future research

One benefit to future research is that SPR in this replication package can be used as a baseline system. As all the source code of SPR is available inside the provided VM images, researchers who want to build new patch generation systems can build their systems on top of SPR and reuse (part of) the SPR source code.

Another benefit to future research in the field is that the benchmark scenario tarballs in the replication package provide useful infrastructure (test cases and scripts) to evaluate future patch generation systems. Note that SPR is evaluated on the GenProg 2012 benchmark set and we fixed several known issues in the original GenProg 2012 infrastructure.

1.4 Quality of the documentation

Section 2 provides step-by-step instructions for reproducing the SPR experiments with this replication package. Section 3 describes our manual analysis of each correct SPR patch. Section 4 describes how to apply SPR to new defects if desired. We recommend that users of this replication package start with the step-by-step instructions in Section 2.

2. Reproduce SPR experiments

As described in our paper, we performed all experiments except fbc in Amazon EC2 machines (running Ubuntu) and we performed fbc experiments on a 32-bit VM (running Ubuntu). In this replication package, we provide two VM images of our experiment environments, one for 64-bit and one for 32-bit (fbc experiments).

We evaluated 105 defects/changes in the GenProg 2012 benchmark set. Due to the disk space limit, we cannot package all of the defects/changes into the VM images. Instead, we separately provide a scenario tarball for each defect/change we evaluated in our experiments.

We next provide step-by-step instructions for running SPR to generate patches for the php defect php-309579-309580. Our experiments on other defects can be reproduced similarly.

2.1 Step-by-step instructions for reproducing php-309579-309580

(1) Launch the VM image.
  (a) Download the VMWare image from: http://www.cs.toronto.edu/~fanl/program_repair/vm/Ubuntu-prophet-64-bit.tar.gz
  (b) Untar the tarball to obtain the VMWare image directory.
  (c) Use VMWare to start this image. The login name is ubuntu and the password is "password".

(2) Go to the directory ~/Workspace/prophet/build/tests. Type "cd ~/Workspace/prophet/build/tests" in the terminal.

(3) Run "../../tests/scripts/reproduce.py php-309579-309580" to reproduce the php-309579-309580 case with SPR. This case takes approximately 40 minutes to complete.

  This script automatically downloads the corresponding scenario tarball
  from our server, untars the tarball, and runs SPR on the defect scenario.
  For php-309579-309580, the untarred directory php-case-2adf58 contains all
  files of the scenario. 2adf58 is the revision number of the php case in
  the GitHub repository. For some applications, we use different repository
  systems than the GenProg 2012 paper because the old repository systems are
  often no longer maintained.

  To reproduce the experiments in which SPR runs with a specified source
  file name to repair, run:
    "../../tests/scripts/reproduce.py --bug-file php-309579-309580"

  This "--bug-file" flag causes SPR to explore only those locations inside
  the specified source file name that the developer patch modifies. Note that 
  GenProg and AE require this information to run, so we provide this option
  to enable a fair comparison between SPR and those systems.

(4) The produced php-fix-2adf58XXX.c is the generated patch file, where XXX is the filename of the original source file to be modified.

(5) At the end of the execution, SPR prints a line like: Total XXXX different repair candidate schemas!!!!

  The XXXX is the size of the SPR search space. It corresponds to the
  number presented in the column "Search Space" in Table 3.

  SPR also prints a line like: Generate a patch at candidate schema no XXXX

  Here the XXXX is the rank of the candidate schema at which SPR generates
  the first plausible patch. It corresponds to "Gen At" in Table 3.

(6) At the end of the script, it prints timing information. Note that because we keep improving the SPR system, the search space numbers and the running times may differ slightly from the numbers in the submission. See our draft of the final version for the updated numbers.

  The timing numbers are presented in the columns "SPR Time" and "SPR(WSF)
  Time" in Table 3. They are also used to compute the average time "SPR(WSF)
  Time" in Table 1.

(7) At the end of the script, SPR also prints two lines like:
  Total cnt of passed cond schemas: XXX
  Total cnt of cond schemas: YYY

  The YYY is the total number of schemas SPR encounters that manipulate
  branch conditions. The XXX is the number of such schemas for which SPR
  discovers a sequence of abstract condition values that generates correct
  output for the test case inputs. For the 13 defects for which SPR
  generates correct patches, these two numbers correspond to the column
  "Condition Value Search On" in Table 2.
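
  The summary lines described in the steps above can be collected from a saved
  log with a small parser. The following is only a sketch: the patterns are
  copied from the example lines quoted in this README, and it assumes the
  placeholders are printed as plain integers. The function name
  parse_spr_summary and the sample numbers are hypothetical and are not part
  of SPR.

```python
import re

# Patterns follow the example lines quoted in this README. The exact wording
# and spacing of real SPR output is an assumption.
PATTERNS = {
    "search_space": re.compile(r"Total (\d+) different repair candidate schemas"),
    "gen_at": re.compile(r"Generate a patch at candidate schema no (\d+)"),
    "passed_cond_schemas": re.compile(r"Total cnt of passed cond schemas: (\d+)"),
    "cond_schemas": re.compile(r"Total cnt of cond schemas: (\d+)"),
}


def parse_spr_summary(log_text):
    """Return the last integer matched by each pattern; unmatched keys are omitted."""
    summary = {}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[key] = int(match.group(1))
    return summary


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = """\
Total 1234 different repair candidate schemas!!!!
Generate a patch at candidate schema no 56
Total cnt of passed cond schemas: 7
Total cnt of cond schemas: 89
"""
    print(parse_spr_summary(sample))
```

  Saving the script output to a file first (for example with shell
  redirection) keeps the 40 minute run separate from the parsing step.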

(8) Search for "prophet" in the generated patch source code to locate the SPR changes.

  In our experiments, we manually analyzed each of the generated patches
  and determined whether it is correct. For php-309579-309580, SPR
  generates a patch that changes the condition at ext/date/php_date.c:3766.
  It is a correct patch and it is equivalent to the developer patch for this
  defect. See Section 2 of our paper for more detail.

  This manual analysis produces the "Result" columns for both SPR and SPR
  (With Specified File Name) in Table 3. The "SPR" and "SPR(WSF)" columns in
  Table 1 summarize these results.

  See Section 3 of this README for more details on each correct SPR patch.

(9) For the 20 defects for which the SPR search space contains at least one correct patch, we manually identified the correct patch in our experiments. To obtain the position of the correct patches in the search space, we wrote scripts that dump the SPR search space and parse the dump.

  Run "../../tests/scripts/reproduce.py --parse-space php-309579-309580".
  It takes approximately 5 minutes to finish and you will see a line like:
  Correct at schema XX blowup YY ratio ZZ

  The XX corresponds to the position of the correct patch in the search
  space ("Correct At" in Table 3), and the ZZ corresponds to the search
  space blowup if we turn off the staged condition synthesis ("Condition
  Value Search Off" in Table 2).

  Run "../../tests/scripts/reproduce.py --parse-space --bug-file php-309579-309580".
  You will see a line like:
  Correct at schema XX blowup YY ratio ZZ

  The XX corresponds to the position of the correct patch in the search
  space when running SPR with a specified source file name information 
  ("Correct At" in Table 3).
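
  The "Correct at schema" line can be extracted in the same spirit. This is a
  minimal sketch that assumes XX is an integer and that YY and ZZ are printed
  as plain integers or decimals; the function name parse_correct_at and the
  sample values are hypothetical.

```python
import re

# Pattern taken from the example line in this README; the numeric formats
# of the three fields are assumptions.
CORRECT_AT = re.compile(r"Correct at schema (\S+) blowup (\S+) ratio (\S+)")


def parse_correct_at(line):
    """Return the three fields of a 'Correct at schema' line, or None if absent."""
    match = CORRECT_AT.search(line)
    if match is None:
        return None
    schema, blowup, ratio = match.groups()
    return {
        "correct_at": int(schema),   # "Correct At" in Table 3
        "blowup": float(blowup),
        "ratio": float(ratio),       # "Condition Value Search Off" in Table 2
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(parse_correct_at("Correct at schema 12 blowup 340 ratio 28.3"))
```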

(10) The "Init Time" column in Table 1 is the average running time SPR takes to initialize a defect scenario. At the initialization step, SPR 1) verifies that the original program passes all positive test cases, 2) verifies that the original program fails all negative test cases, and 3) runs the error localization algorithm to identify potential statements (program points) to modify.

  If you want to reproduce the initialization step of SPR for the defect
  php-309579-309580, you can run "../../tests/scripts/reproduce.py --init
  php-309579-309580".

  It takes approximately 40 minutes to finish. At the end of the execution,
  it will print the running time of the initialization.
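
  The two verification checks of the initialization step can be pictured with
  the sketch below. It is not SPR's implementation and it leaves out error
  localization entirely. The callable run_test is a hypothetical stand-in for
  running one test case against the original, unpatched program.

```python
def check_scenario(run_test, positive_ids, negative_ids):
    """Check that positive tests pass and negative tests fail on the original program.

    run_test is a hypothetical callable: run_test(test_id) returns True when
    the unpatched program passes that test case.
    """
    failing_positive = [t for t in positive_ids if not run_test(t)]
    passing_negative = [t for t in negative_ids if run_test(t)]
    return {
        # The scenario is usable only if both lists are empty.
        "ok": not failing_positive and not passing_negative,
        "failing_positive": failing_positive,
        "passing_negative": passing_negative,
    }


if __name__ == "__main__":
    # Toy runner: pretend only test "neg-1" fails on the original program.
    def toy_runner(test_id):
        return test_id != "neg-1"

    print(check_scenario(toy_runner, ["pos-1", "pos-2"], ["neg-1"]))
```

  A scenario that fails either check does not describe a reproducible defect
  in the current environment, which is why these checks come before any
  patch search.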

(11) The rest of the cases in the benchmark set can be replicated similarly: replace "php-309579-309580" in the above commands with another case id. Running "../../tests/scripts/reproduce.py" without any argument prints all of the case ids. Section 4 of this README contains general instructions on how to apply SPR to other applications and defects.
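
  When replicating many cases, the command lines from the steps above can be
  assembled programmatically. The sketch below only builds argument lists
  using the flags shown in this README (--bug-file, --parse-space, --init);
  the helper name reproduce_command is hypothetical, and combining --init
  with the other flags is not described here, so treat that combination as
  untested.

```python
REPRODUCE = "../../tests/scripts/reproduce.py"


def reproduce_command(case_id, bug_file=False, parse_space=False, init=False):
    """Build the reproduce.py argument list for one benchmark case."""
    cmd = [REPRODUCE]
    if init:
        cmd.append("--init")
    if parse_space:
        cmd.append("--parse-space")
    if bug_file:
        cmd.append("--bug-file")
    cmd.append(case_id)
    return cmd


if __name__ == "__main__":
    for case in ["php-309579-309580", "fbc-5458-5459"]:
        print(" ".join(reproduce_command(case)))
        print(" ".join(reproduce_command(case, bug_file=True, parse_space=True)))
```

  Each list can be passed to subprocess.run with cwd set to
  ~/Workspace/prophet/build/tests (expanded with os.path.expanduser), since
  the script path is relative to that directory.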

(12) All SPR-generated fixes are checked against all negative and positive test cases by automated scripts. For example, the php-309579-309580 fix is checked with ~/Workspace/prophet/tools/php-test.py. Internally, php-test.py runs the test cases through run-tests.php, the same script used in the manual test below and in the GenProg setup. Unlike GenProg, our test script for php checks not only the exit code but also the output of the execution. If you want to manually test the generated fix for php-309579-309580, inside the VM you can:

  a) Make sure the current directory is ~/Workspace/prophet/build/tests.
  b) Make sure you have run the reproduce script in the previous steps to generate patches for php-309579-309580.
  c) Replace php-case-2adf58/php-2adf58-workdir/src/ext/date/php_date.c with the generated fix file. The name of this fix file is php-fix-2adf58extdatephpdate.c in our example:
      "cp php-fix-2adf58extdatephpdate.c php-case-2adf58/php-2adf58-workdir/src/ext/date/php_date.c"
  d) Run "cd php-case-2adf58/php-2adf58-workdir/src" and then "make".
  e) The newly built php binary sits in "~/Workspace/prophet/build/tests/php-case-2adf58/php-2adf58-workdir/src/sapi/cli/php". To test the negative case, type in:

      "./sapi/cli/php run-tests.php -p ./sapi/cli/php ../tests/03996.phpt"

      You should see output indicating that the test case passed. 

      In the file ../../php-2adf58.revlog, you will find the ids of the positive
      and negative test cases. For php, all cases have a five-digit id. Open the
      ../../php-2adf58.revlog file and find, for example, the id 00051 among
      the positive cases.

      "./sapi/cli/php run-tests.php -p ./sapi/cli/php ../tests/00051.phpt"

      Again, you should see that the tests pass.

(13) If you run the replication on an Amazon EC2 instance rather than a local VM, terminate the instance once you are done to avoid additional costs. You may also need to delete the volumes used by the instance after it terminates.

2.2 fbc Experiments

The instructions above also apply to the fbc experiments. The only difference is that at step (1) you need to start the provided 32-bit VM image instead of the 64-bit VM image. For fbc experiments:

(1) Launch the VM image.
  (a) Download the VMWare image from: http://www.cs.toronto.edu/~fanl/program_repair/vm/Ubuntu-14.04.1-32bit.tar.gz
  (b) Untar the tarball to obtain the VMWare image directory.
  (c) Use VMWare to start this image. The login name is fanl and the password is "password".

(2)-(13) Same as the instructions in Section 2.1. There are three fbc cases "fbc-5251-5252", "fbc-5458-5459", and "fbc-5556-5557".

2.3 Known issues

The current SPR implementation fails to perform initialization on six cases in the GenProg 2012 benchmark set: php-308046-308051, libtiff-ed4969a-8a184dc, python-69831-69833, wireshark-35419-35414, wireshark-37171-37170, and wireshark-37190-37191. Note that we report that SPR generates no patch for these defects in our paper.

The plausible (but incorrect) patch that SPR generates for python-70019-70023 is compiler dependent and possibly machine dependent. The patch attempts to get around the test case by tampering with memory library calls. Its behavior therefore depends heavily on the underlying implementation of the memory routines it links to. The binary generated during the SPR repair process is known to pass the test case on Amazon EC2 (with the clang compiler); however, the binary generated by the gcc compiler cannot pass the test case. Thanks to Alex Zhikhartsev for reporting this.

3. SPR Correct Patches

SPR generates correct patches for 12 defects. For each defect, we provide a URL for the developer patch, and we either state that the SPR patch is semantically equivalent to the developer patch or provide a brief analysis of why the SPR patch is correct.

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-307562-307561/php-fix-f455f8%5e1-f455f8extdom_document.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-307562-307561/php-fix-f455f8%5e1-f455f8extdom_document.c

The SPR generated patch is identical to the developer patch. Note that this is a regression that occurs in the repository. The reference correct revision occurs before the buggy revision.

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-307846-307853/php-fix-1e91069extdatephpdate.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-307846-307853/php-fix-1e91069extdatephpdate.c

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-308734-308761/php-fix-1d984a7exttokenizer_tokenizer.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-308734-308761/php-fix-1d984a7exttokenizer_tokenizer.c

Analysis: The statement order of the developer patch is slightly different from that of the SPR-generated patch, but the two patches are semantically equivalent at a high level.

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-309516-309535/php-fix-991ba131extdatephpdate.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-309516-309535/php-fix-991ba131extdatephpdate.c

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-309579-309580/php-fix-2adf58extdatephpdate.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-309579-309580/php-fix-2adf58extdatephpdate.c

Analysis: The SPR-generated patch is semantically equivalent to the developer patch. This is the motivating example in our paper. See our paper for details.

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-309892-309910/php-fix-5a8c917extstandard_string.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-309892-309910/php-fix-5a8c917extstandard_string.c

Analysis: The developer patch removes an if statement block. The SPR-generated patch conjoins the branch condition of the if statement with 0, which effectively nullifies the whole if statement block. The two patches are semantically equivalent.

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-310991-310999/php-fix-8ba00176Zendzend_compile.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-310991-310999/php-fix-8ba00176Zendzend_compile.c

The developer patch changes an if statement condition from (A || (B && C)) to ((A || B) && C). The SPR patch changes the condition to ((A || (B && C)) && C), which is semantically equivalent to the developer patch.
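
The equivalence of the two conditions can be confirmed by checking all eight truth assignments. The sketch below treats A, B, and C as side-effect-free boolean values, which is an assumption about the original C expressions; the function names are illustrative only.

```python
from itertools import product


def developer_condition(a, b, c):
    # Condition after the developer patch: ((A || B) && C)
    return (a or b) and c


def spr_condition(a, b, c):
    # Condition after the SPR patch: ((A || (B && C)) && C)
    return (a or (b and c)) and c


def conditions_equivalent():
    """Compare the two conditions on every combination of truth values."""
    return all(
        developer_condition(a, b, c) == spr_condition(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )


if __name__ == "__main__":
    print(conditions_equivalent())
```

The check succeeds because the outer "&& C" makes the inner "&& C" redundant: whenever C is false both conditions are false, and whenever C is true both reduce to (A || B).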

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-311346-311348/php-fix-1056c57fextstandardurlscanner_ex.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-311346-311348/php-fix-1056c57fextstandardurlscanner_ex.c

The developer-patched code works as follows: if "ctx->buf.len" (which holds the length of "ctx->buf") is not zero, then "handled_output" is assigned the concatenation of "ctx->buf" and "output"; otherwise "handled_output" is assigned "output". When "ctx->buf.len" is zero, the code in the then branch has the same effect as the else branch since the string "ctx->buf" is empty. So the SPR patch, which eliminates the condition and lets the program always execute the then branch, is also correct.
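
The argument above can be illustrated with a small model that uses Python strings in place of the C buffers. This is only an analogy under that assumption: the real code operates on php string buffers, and the function names below are hypothetical.

```python
def developer_logic(buf, output):
    # Then branch: concatenate the buffer and the output.
    if len(buf) != 0:
        return buf + output
    # Else branch: pass the output through unchanged.
    return output


def spr_logic(buf, output):
    # The SPR patch always takes the then branch.
    return buf + output


if __name__ == "__main__":
    for buf in ["", "abc"]:
        for output in ["", "xyz"]:
            print(repr(buf), repr(output), developer_logic(buf, output) == spr_logic(buf, output))
```

Concatenating an empty buffer is the identity operation, so the two versions agree on every input.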

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/libtiff-ee2ce5b7-b5691a5a/libtiff-fix-tests-eec7ec0toolstiff2pdf.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/libtiff-ee2ce5b7-b5691a5a/libtiff-fix-tests-eec7ec0toolstiff2pdf.c

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/gmp-13420-13421/gmp-fix-13421mpngeneric_powm.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/gmp-13420-13421/gmp-fix-13421mpngeneric_powm.c

The developer patch removes the variable "b2p" and the assignment statement "b2p = tp + 2 * n". It then replaces every occurrence of "b2p" with "rp". The SPR patch simply changes the assignment "b2p = tp + 2 * n" to "b2p = rp", which is semantically equivalent to the developer patch at a high level.

The developer patch: http://git.savannah.gnu.org/cgit/gzip.git/commit/?id=f17cbd13a1d0a7

The SPR correct patches: http://www.cs.toronto.edu/~fanl/program_repair/spr-rep/spr-wsf-result/gzip-a1d3d4019d-f17cbd13a1/gzip-fix-f17cbd13a1d0a7gzip.c

Both the developer patch and the SPR patch insert an assignment statement to initialize the variable "ifd" to 0. The two patches are semantically equivalent at a high level.

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-result/php-307914-307915/php-fix-09273098521913aextphar_phar.c

http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/php-307914-307915/php-fix-09273098521913aextphar_phar.c

The SPR correct patches: http://www.cs.toronto.edu/~fanl/programrepair/spr-rep/spr-wsf-result/python-69783-69784/python-fix-69784Modulestimemodule.c

Analysis: Both the developer patch and the SPR patch remove an if statement block. They are semantically equivalent. Note that this case is a deliberate functionality change during development, not a defect.

4. How to Run SPR in New Environments? (New Applications/Defects)

Defects often require specific environments to reproduce. We strongly recommend that users of this replication package reproduce our experiments via the provided VM images with the above instructions. However, SPR can also be applied to other UNIX-like environments, other applications, and other defects. Here are general instructions on how to do so:

0) The application needs to be able to build with both gcc and clang. SPR uses clang to run its error localization algorithm. It requires llvm and clang 3.6.1.

1) Write a script that builds the application. See, for example, ~/Workspace/prophet/tools/fbc-build.py in the SPR VM, which is the script for building fbc.

2) Write another script that tests the application. See, for example, ~/Workspace/prophet/tools/fbc-test.py in the SPR VM. The script takes the built src directory, the testcase directory if any, and a set of testcase ids. It outputs the list of passed testcase ids.
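
The interface described in step 2 might be shaped like the sketch below. It is not the actual fbc-test.py: the argument order, the space-separated output format, and the names run_single_test, passed_tests, and main are all assumptions made for illustration. Check the scripts in ~/Workspace/prophet/tools in the VM for the real conventions.

```python
def run_single_test(src_dir, test_dir, test_id):
    """Hypothetical placeholder: return True if test_id passes.

    A real script would run the application built in src_dir on the test
    case found in test_dir and compare the result with the expected one.
    """
    raise NotImplementedError("application-specific")


def passed_tests(src_dir, test_dir, test_ids, runner=run_single_test):
    """Return the ids of the test cases that pass, in input order."""
    return [t for t in test_ids if runner(src_dir, test_dir, t)]


def main(argv, runner=run_single_test):
    # Assumed argument order: src_dir test_dir id1 id2 ...
    if len(argv) < 3:
        print("usage: test.py SRC_DIR TEST_DIR ID [ID ...]")
        return 2
    src_dir, test_dir, *test_ids = argv
    # Assumed output format: passed ids separated by spaces on one line.
    print(" ".join(passed_tests(src_dir, test_dir, test_ids, runner)))
    return 0
```

A real script would end by calling main(sys.argv[1:]) with an application-specific runner. Keeping the runner as a parameter makes the pass/fail bookkeeping testable without building the application.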

3) Write or generate a log file that specifies the testcase ids of the positive testcase set and the testcase ids of the negative testcase set. For example, if you untar the scenario fbc-5458-5459 (fbc-5459.tar.gz), you can find the log file for the scenario at ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.revlog in the SPR 32-bit VM.

4) Write a configuration file that specifies:
  a) the location of the scripts
  b) the source location of the application
  c) the location of the test cases
  d) the location of the log file that specifies negative and positive testcases
  e) a line "localizer=profile" to enable the error localizer
  For example, if you untar the scenario fbc-5458-5459 (fbc-5459.tar.gz), you can find the configuration file of the scenario at ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf in the SPR 32-bit VM.

5) Invoke SPR with the configuration file. For example, to run SPR you can call:
  ../src/prophet ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf

When you run it in this way, SPR creates a temporary workdir. You can instead specify the workdir name and make the workdir permanent. To run the initialization step first, invoke:
  ../src/prophet ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf -r workdir -init-only
SPR will create a directory called "workdir" to hold all results after initialization.

Then, after initialization, you can run SPR by calling either
  ../src/prophet -r workdir -skip-verify
or
  ../src/prophet -r workdir -skip-verify -first-n-loc 200 -consider-all

a) The skip-verify flag tells SPR to skip the initialization step.
b) The first-n-loc and consider-all flags tell SPR to ignore the supplied bug file information, if any, and run SPR without file information.

The benefit of having a working directory is that you can avoid running the error localization algorithm and the testcase verification again.
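
The two-phase workflow with a permanent working directory can be scripted as sketched below. The flags are the ones shown in this section; the helper names init_command and repair_command are hypothetical, and the default of 200 for first-n-loc simply mirrors the example command above.

```python
PROPHET = "../src/prophet"


def init_command(conf_path, workdir):
    """Initialization only: verify test cases and run error localization."""
    return [PROPHET, conf_path, "-r", workdir, "-init-only"]


def repair_command(workdir, ignore_bug_file=False, first_n_loc=200):
    """Repair run that reuses an already initialized working directory."""
    cmd = [PROPHET, "-r", workdir, "-skip-verify"]
    if ignore_bug_file:
        # Run without the supplied bug file information, as in the example above.
        cmd += ["-first-n-loc", str(first_n_loc), "-consider-all"]
    return cmd


if __name__ == "__main__":
    conf = "~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf"
    print(" ".join(init_command(conf, "workdir")))
    print(" ".join(repair_command("workdir")))
    print(" ".join(repair_command("workdir", ignore_bug_file=True)))
```

Running the initialization command once and then issuing several repair commands against the same workdir avoids repeating the verification and localization work.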
