CS2125 Paper Review Form - Winter 2019
Reviewer: Nils Wenzler
Paper Title: DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems
Author(s): Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid

1) Is the paper technically correct?
[X] Yes
[ ] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality
[ ] Very good (very novel, trailblazing work)
[ ] Good
[X] Marginal (very incremental)
[ ] Poor (little or nothing that is new)

3) Technical Depth
[ ] Very good (comparable to best conference papers)
[X] Good (comparable to typical conference papers)
[ ] Marginal depth
[ ] Little or no depth

4) Impact/Significance
[ ] Very significant
[ ] Significant
[X] Marginal significance.
[ ] Little or no significance.

5) Presentation
[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating
[ ] Strong accept (award quality)
[ ] Accept (high quality - would argue for acceptance)
[X] Weak Accept (borderline, but lean towards acceptance)
[ ] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)
In their paper, the authors present an approach to automatically generating more realistic test inputs for testing deep neural networks. Their approach uses generative adversarial networks (GANs) to simulate different weather conditions such as fog and snow. Their main motivation seems to be a weakness they identified in earlier papers, which used very basic approaches to generate new input images. Although their input generation differs, their target value generation relies on metamorphic relations, as in earlier papers.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)
S1: The paper addresses a weakness of earlier approaches to automated testing, whose generated inputs were somewhat unrealistic.
S2: The paper is in general well written.
S3: The paper has fancy-looking 3D plots.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)
W1: The originality of their approach feels borderline to me. They basically just applied GANs to generate input images.
W2: Since I am looking for borderline cases to be generated, how do I train my GAN if I don't have enough images of borderline cases in the first place?
W3: GANs may themselves produce semantically wrong, unrealistic images. The paper does not resolve this new problem.