CS2125 Paper Review Form - Winter 2019
Reviewer: Ali Harakeh
Paper Title: DeepXplore: Automated Whitebox Testing of Deep Learning Systems
Author(s): Pei, Cao, Yang, and Jana

1) Is the paper technically correct?
[ ] Yes
[X] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality
[ ] Very good (very novel, trailblazing work)
[X] Good
[ ] Marginal (very incremental)
[ ] Poor (little or nothing that is new)

3) Technical Depth
[ ] Very good (comparable to best conference papers)
[X] Good (comparable to typical conference papers)
[ ] Marginal depth
[ ] Little or no depth

4) Impact/Significance
[ ] Very significant
[X] Significant
[ ] Marginal significance
[ ] Little or no significance

5) Presentation
[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating
[ ] Strong accept (award quality)
[X] Accept (high quality - would argue for acceptance)
[ ] Weak Accept (borderline, but lean towards acceptance)
[ ] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)
This paper presents DeepXplore, which the authors describe as the first whitebox framework for systematically testing real-world deep learning systems. Testing is guided by neuron coverage, the paper's main metric, which measures the fraction of a network's neurons activated by a set of test inputs. This metric is used jointly with multiple deep models trained for the same task in order to find inputs on which their behavior differs. The inputs found through this exploration can then be added to the training data to improve the accuracy of the target deep neural networks.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)
S1: The paper is one of the first to tackle whitebox coverage testing of deep neural networks.
S2: The proposed optimization procedure is useful for finding inputs that are problematic enough to trigger different behaviors among models in the same set.
S3: The same optimization procedure can further be used to drive all models in the set toward consistent behavior on the same input.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)
W1: The optimization procedure relies on domain-specific constraints (for images, e.g., simulated lighting changes or occlusions) to keep generated inputs realistic. Extending DeepXplore to new domains requires formulating new constraints, which might be non-trivial.
W2: DeepXplore requires multiple neural networks that perform the same task, because their disagreements serve as the test oracle. This hinders coverage testing of novel neural networks on novel tasks, for which no comparable models may exist.
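To make the two mechanisms discussed in this review concrete (neuron coverage, and behavioral differences between models trained for the same task), here is a minimal Python sketch. It is not the authors' implementation and omits DeepXplore's gradient-based input generation entirely. It assumes activations have already been recorded per layer as NumPy arrays; the function names `neuron_coverage` and `disagreeing_inputs`, the input layouts, and the 0.25 default threshold are hypothetical choices for illustration.

```python
import numpy as np


def neuron_coverage(layer_activations, threshold=0.25):
    """Fraction of neurons whose output exceeds `threshold` for at least one test input.

    Hypothetical input format: a list with one array per layer, each shaped
    (num_inputs, num_neurons_in_layer). The threshold is an assumed parameter.
    """
    covered = 0
    total = 0
    for acts in layer_activations:
        acts = np.asarray(acts, dtype=float)
        # A neuron counts as covered if any test input drives it above the threshold.
        covered += int((acts > threshold).any(axis=0).sum())
        total += acts.shape[1]
    return covered / total if total else 0.0


def disagreeing_inputs(predicted_labels):
    """Indices of test inputs on which the models do not all predict the same label.

    Hypothetical input format: shaped (num_models, num_inputs); each row holds
    one model's predicted class for every input.
    """
    preds = np.asarray(predicted_labels)
    # Compare every model against the first one; any mismatch marks a difference.
    return np.flatnonzero((preds != preds[0]).any(axis=0)).tolist()


layer1 = [[0.1, 0.9, 0.0],
          [0.3, 0.2, 0.0]]  # 2 inputs, 3 neurons
layer2 = [[0.0, 0.5],
          [0.1, 0.0]]       # 2 inputs, 2 neurons
print(neuron_coverage([layer1, layer2]))           # 3 of 5 neurons covered -> 0.6
print(disagreeing_inputs([[1, 2, 3], [1, 0, 3]]))  # -> [1]
```

The coverage value depends on the chosen threshold: lowering it to 0.0 in the example above raises coverage from 0.6 to 0.8, so any coverage number should be read together with its threshold.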