CS2125 Paper Review Form - Winter 2019

Reviewer: Nils Wenzler
Paper Title: DeepMutation: Mutation Testing of Deep Learning Systems
Author(s): Lei Ma, Fuyun Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Felix Juefei-Xu, Chao Xie, Li Li, Yang Liu, Jianjun Zhao, and Yadong Wang

1) Is the paper technically correct?
[X] Yes
[ ] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality
[ ] Very good (very novel, trailblazing work)
[X] Good
[ ] Marginal (very incremental)
[ ] Poor (little or nothing that is new)

3) Technical Depth
[ ] Very good (comparable to best conference papers)
[X] Good (comparable to typical conference papers)
[ ] Marginal depth
[ ] Little or no depth

4) Impact/Significance
[ ] Very significant
[X] Significant
[ ] Marginal significance
[ ] Little or no significance

5) Presentation
[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating
[ ] Strong accept (award quality)
[X] Accept (high quality - would argue for acceptance)
[ ] Weak Accept (borderline, but lean towards acceptance)
[ ] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)
This paper presents an intriguing combination of mutation testing, a technique well known in software engineering, with deep neural networks. As in classical mutation testing, the authors want to evaluate the quality of a set of test cases, that is, how sensitive it is to injected faults. They introduce two substantially different ways of implementing mutation testing for deep neural networks. On the one hand, they apply mutations to the training data set. On the other hand, they apply mutations to the trained model itself.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)
S1: The paper motivates its approach reasonably and explains well why it considers the two general cases.
S2: The approach is, in general, very interesting.
S3: The mutations used are explained thoroughly.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)
W1: The findings/results of the paper are very hard to understand. It is still unclear to me whether this approach worked or not.
W2: Some of the mutations, such as switching neurons, may be irrelevant when the network is trained with dropout.
W3: Classical mutation testing introduces realistic faults. In real training, neurons will never be accidentally switched, for example.
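
To make W2 and W3 concrete, here is a minimal sketch of what a model-level neuron-switch mutation might look like on a toy two-layer network. It is not the authors' implementation: the network, the function names (forward, neuron_switch, is_killed), and the simplified kill criterion (a mutant counts as killed if any test input changes its predicted class) are assumptions made for illustration only.

```python
import numpy as np

def forward(x, w1, w2):
    """Toy two-layer ReLU network (hypothetical); returns predicted class indices."""
    hidden = np.maximum(x @ w1, 0.0)
    return np.argmax(hidden @ w2, axis=1)

def neuron_switch(w2, i, j):
    """Return a copy of w2 with the outgoing weights of hidden neurons i and j swapped.

    Only the outgoing weights are swapped. Swapping the incoming weights as
    well would merely relabel the two neurons and leave the network's
    function unchanged.
    """
    mutant = w2.copy()
    mutant[[i, j]] = mutant[[j, i]]
    return mutant

def is_killed(x, w1, w2, w2_mutant):
    """Simplified kill criterion (assumption): any test input changes its predicted class."""
    return bool(np.any(forward(x, w1, w2) != forward(x, w1, w2_mutant)))

# Random weights stand in for a trained model; random inputs stand in for a test set.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(8, 3))
tests = rng.normal(size=(50, 4))

mutant = neuron_switch(w2, 0, 1)
print("mutant killed:", is_killed(tests, w1, w2, mutant))
```

The sketch shows why W3 matters: the mutation is a deliberate edit of the weight matrix, not a fault that a developer would plausibly introduce while writing or running training code.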