CS2125 Paper Review Form - Winter 2019

Reviewer: Eric Langlois
Paper Title: Semantic Adversarial Deep Learning
Author(s): Tommaso Dreossi, Somesh Jha, Sanjit A. Seshia

1) Is the paper technically correct?

[X] Yes
[ ] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality

[ ] Very good (very novel, trailblazing work)
[ ] Good
[ ] Marginal (very incremental)
[X] Poor (little or nothing that is new)

3) Technical Depth

[ ] Very good (comparable to best conference papers)
[ ] Good (comparable to typical conference papers)
[X] Marginal depth
[ ] Little or no depth

4) Impact/Significance

[ ] Very significant
[ ] Significant
[X] Marginal significance
[ ] Little or no significance

5) Presentation

[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating

[ ] Strong accept (award quality)
[ ] Accept (high quality - would argue for acceptance)
[ ] Weak Accept (borderline, but lean towards acceptance)
[X] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)

The main contribution of this paper is an abstract compositional verification approach for systems involving machine-learned (ML) components. In the proposed approach, system-level constraints are used to derive a "region of uncertainty" (ROU) for an integrated ML component. The ROU then guides an ML-specific analyzer to find prediction errors, which are then checked against the system-level constraints as a whole (a sketch of my understanding of this loop appears at the end of this review). The authors perform several experiments identifying errors in ML models. The paper also includes a large background section on machine learning models and attacks.

While the proposed approach is interesting, I find that this paper lacks focus and depth. After describing the compositional verification algorithm only in abstract terms, the authors proceed to experiment with the non-novel part: the model-specific counterexample generation. The paper then moves on to an analysis of the relatively old and well-studied hinge loss function. I recommend that the authors shorten the background, remove the discussion of the hinge loss, describe the ROU generation in greater detail, and perform experiments investigating the entire compositional verification algorithm.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)

S1. The proposed compositional verification algorithm appears practical and potentially effective. Its description is clear and well written.

S2. The focus on semantic adversarial analysis is well motivated, and I support the authors' advocacy of semantics as a guiding principle when generating counterexamples for machine learning systems.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)

W1. None of the experiments appear to involve the main proposal: the compositional verification approach. The experiments focus on finding counterexamples using (1) a simple sampling strategy (2) over a space that appears hand-designed rather than derived from system-level constraints.

W2. The analysis of the hinge loss is out of place and does not contribute to the paper. Given the long history of the hinge loss function, I find it unlikely that the content presented is particularly novel.

W3. The region-of-uncertainty generation algorithm is not sufficiently analyzed. How is the "completely-wrong classifier" defined when there are multiple possible classes?
What if the system uses the probabilities output by the model, as the authors advocate, rather than just the predicted classes? And if the system invokes the model multiple times, how is the combinatorial growth of possible predictions handled?
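
To make W1 and W3 concrete, below is a minimal sketch (in Python) of the compositional falsification loop as I understand it. All names are hypothetical (the paper does not prescribe this interface), and the ROU search is simplified to plain enumeration, which matches the simple sampling strategy used in the experiments.

    from typing import Callable, Iterable, Optional, Tuple

    # Hypothetical types for illustration; not taken from the paper.
    Input = Tuple[float, ...]  # a point in the semantic (environment) space
    Prediction = int           # a class label from the ML component

    def compositional_falsify(
        system_check: Callable[[Input, Prediction], bool],  # system-level property
        model: Callable[[Input], Prediction],               # the ML component
        region_of_uncertainty: Iterable[Input],             # ROU from system analysis
    ) -> Optional[Input]:
        """Search the ROU for a semantic counterexample.

        The ROU is assumed to have been derived by analyzing the rest of
        the system with the ML component abstracted away: inputs outside
        the ROU cannot cause a system-level violation regardless of what
        the model predicts.
        """
        for x in region_of_uncertainty:
            y = model(x)
            # A model error matters only if it violates the system-level property.
            if not system_check(x, y):
                return x  # semantic counterexample found
        return None  # no violation found among the sampled ROU points

Note that in this sketch the model returns a single class label per invocation. If the system instead consumes the model's output probabilities, or queries the model several times per run, both this interface and the ROU construction must change; the paper does not address either case.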