CS2125 Paper Review Form - Winter 2019
Reviewer: Hazem Ibrahim
Paper Title: DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars
Author(s): Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray

1) Is the paper technically correct?
[X] Yes
[ ] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality
[ ] Very good (very novel, trailblazing work)
[X] Good
[ ] Marginal (very incremental)
[ ] Poor (little or nothing that is new)

3) Technical Depth
[ ] Very good (comparable to best conference papers)
[X] Good (comparable to typical conference papers)
[ ] Marginal depth
[ ] Little or no depth

4) Impact/Significance
[ ] Very significant
[X] Significant
[ ] Marginal significance.
[ ] Little or no significance.

5) Presentation
[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating
[ ] Strong accept (award quality)
[X] Accept (high quality - would argue for acceptance)
[ ] Weak Accept (borderline, but lean towards acceptance)
[ ] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)

In this paper, the authors address a number of research questions regarding the possibility of evaluating Deep Neural Networks (DNNs) used in autonomous driving systems. Firstly, they introduce the idea of "neuron coverage", which is analogous to "code coverage" in traditional software. They show that neuron coverage is correlated with input-output diversity and that it can be used for systematic test generation. Building on this idea of neuron coverage, the authors test whether various image transformations (linear, affine, and convolutional) activate different neurons in the DNNs under study. They found that different image transformations do indeed activate different sets of neurons.
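To make the metric concrete, the neuron-coverage idea described in the paper can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the authors' implementation: the function name is hypothetical, and the min-max scaling of each layer's activations and the 0.2 threshold are representative choices, not necessarily the exact ones used in DeepTest.

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.2):
    """Fraction of neurons whose scaled activation exceeds `threshold`.

    `layer_activations`: list of 1-D arrays, one per layer, holding each
    neuron's raw output for a single input image.
    """
    covered = 0
    total = 0
    for layer in layer_activations:
        # Scale this layer's outputs into [0, 1] so one threshold
        # applies uniformly across layers.
        lo, hi = layer.min(), layer.max()
        if hi > lo:
            scaled = (layer - lo) / (hi - lo)
        else:
            scaled = np.zeros_like(layer)
        # A neuron counts as "covered" if its scaled activation
        # exceeds the threshold.
        covered += int((scaled > threshold).sum())
        total += layer.size
    return covered / total
```

In this sketch, generating tests that raise this fraction is what drives the input diversity the authors observe; different transformations push different neurons above the threshold.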
Moreover, the authors noted that neuron coverage can be increased further by combining different image transformations on the same image (brightness and contrast, for example). In addition, by testing the DNNs with sets of transformed images, the authors were able to expose over 1,000 erroneous behaviours that went undetected when the models were run on the original images alone. Finally, using the DeepTest framework, the authors showed that the accuracy of a DNN can be improved by up to 46% by retraining it with synthetic images. While the study suffers from some weaknesses, as listed below, this paper introduces some interesting contributions to the field of testing and verification of DNNs, and I would argue for its acceptance due to the strong results, which are indicative of improvements to the verification and testing of DNNs.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)

S1. The paper is well written and easy to follow, even for readers who do not have a strong machine-learning background. It goes into some detail on the architectural differences between Convolutional Neural Networks and Recurrent Neural Networks, giving the reader a solid understanding of the concepts without burdening them with too much detail.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)

W1. The sample images presented, such as those in Figures 7 and 8, are very small; as a result, the differences between the original and transformed images are not easily identifiable.

W2. Some details, such as the difference between the cumulative coverage of transformations introduced in the discussion of RQ2 and the cumulative transformations in RQ3, are not explained in enough detail.

W3.
While the authors acknowledge that this paper studies only steering angle, they do not address the concern that changes to the steering angle must also be accounted for in the adjustment of the acceleration and brake controls (when the steering angle is adjusted sharply, one must increase braking and reduce acceleration). In other words, the adjustment of the steering angle cannot happen in a vacuum; it most likely must be accompanied by a change in brake and acceleration values.