CS2125 Paper Review Form - Winter 2019
Reviewer: Zi Yi Chen
Paper Title: DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems
Author(s): Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid

1) Is the paper technically correct?
[X] Yes
[ ] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality
[ ] Very good (very novel, trailblazing work)
[ ] Good
[X] Marginal (very incremental)
[ ] Poor (little or nothing that is new)

3) Technical Depth
[X] Very good (comparable to best conference papers)
[ ] Good (comparable to typical conference papers)
[ ] Marginal depth
[ ] Little or no depth

4) Impact/Significance
[ ] Very significant
[X] Significant
[ ] Marginal significance.
[ ] Little or no significance.

5) Presentation
[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating
[ ] Strong accept (award quality)
[X] Accept (high quality - would argue for acceptance)
[ ] Weak Accept (borderline, but lean towards acceptance)
[ ] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)

This paper criticizes DeepTest for being unable to generate authentic images that reflect realistic road conditions. The authors use UNIT, a DNN-based method for unsupervised image-to-image translation composed of a GAN and a VAE. DeepRoad then feeds the synthesized images, along with the original images, into autonomous driving systems for metamorphic testing. The results show that DeepRoad can generate realistic synthesized images and detect inconsistent driving behaviours. However, the paper mostly relies on off-the-shelf technologies such as UNIT, and evaluates them with the same metamorphic testing technique used in DeepTest, so it lacks some originality.
The metamorphic relation they propose is also questionable: changing the driving environment can legitimately change driving behaviour, yet the authors treat any such change as inconsistent behaviour. Although they tested various error bounds, they do not state what error bound is appropriate for a behaviour to be considered inconsistent.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)

S1: The GAN-based image transformation shows a significant improvement in the authenticity of the generated images.

S2: They tested their system with multiple ADSs, showing that it can not only detect inconsistent driving behaviours but also serve as a tool to compare how well one ADS performs against others.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)

W1: Their metamorphic relation does not reflect real driving behaviour, as described above.

W2: Using a neural network to generate test cases for another neural network risks circularity; the few photos shown in the paper look realistic, but how do the authors validate that all generated test cases are realistic?
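For reference, the kind of metamorphic consistency check discussed in questions 7 and W1 can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the predictor, image representation, and error bound value are all assumed stand-ins.

```python
# Hypothetical sketch of a DeepRoad-style metamorphic check: run the same
# driving model on an original scene and its GAN-transformed counterpart
# (e.g., a rainy or snowy version), then flag an inconsistency if the
# predicted steering angles diverge beyond a chosen error bound.
# All names and values here are illustrative, not taken from the paper.

def is_inconsistent(predict, original_image, transformed_image, error_bound):
    """Return True if the model's predictions on the two scenes differ
    by more than error_bound (the metamorphic relation is violated)."""
    angle_original = predict(original_image)
    angle_transformed = predict(transformed_image)
    return abs(angle_original - angle_transformed) > error_bound

if __name__ == "__main__":
    # Toy stand-in "model" that averages pixel intensities; real ADS
    # models would be CNNs predicting steering angles from camera frames.
    fake_model = lambda image: sum(image) / len(image)
    sunny = [0.1, 0.2, 0.3]
    snowy = [0.4, 0.5, 0.6]
    print(is_inconsistent(fake_model, sunny, snowy, error_bound=0.1))  # True
```

As W1 notes, the weakness of this relation is visible even in the sketch: any error bound small enough to catch real faults will also flag cases where a changed environment should genuinely change the driving behaviour.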