CS2125 Paper Review Form - Winter 2018
Reviewer: Azadeh Assadi
Paper Title: Mega-modeling for Big Data Analytics
Author(s): Stefano Ceri, Emanuele Della Valle, Dino Pedreschi, and Roberto Trasarti

1) Is the paper technically correct?
[X] Yes
[ ] Mostly (minor flaws, but mostly solid)
[ ] No

2) Originality
[ ] Very good (very novel, trailblazing work)
[X] Good
[ ] Marginal (very incremental)
[ ] Poor (little or nothing that is new)

3) Technical Depth
[ ] Very good (comparable to best conference papers)
[X] Good (comparable to typical conference papers)
[ ] Marginal depth
[ ] Little or no depth

4) Impact/Significance
[ ] Very significant
[X] Significant
[ ] Marginal significance.
[ ] Little or no significance.

5) Presentation
[ ] Very well written
[X] Generally well written
[ ] Readable
[ ] Needs considerable work
[ ] Unacceptably bad

6) Overall Rating
[ ] Strong accept (award quality)
[X] Accept (high quality - would argue for acceptance)
[ ] Weak Accept (borderline, but lean towards acceptance)
[ ] Weak Reject (not sure why this paper was published)

7) Summary of the paper's main contribution and rationale for your recommendation. (1-2 paragraphs)

The authors of this article propose the concept of mega-modeling as a new way of modeling systems that handle big data. Mega-models are meant to denote the higher-order relationships between models and to trace the dependencies between them as they evolve. These models therefore have a computational nature that supports the dynamic aspects of big data with respect to inspection, adaptation, and integration. The authors begin by explaining that mega-modules are software components capable of processing "big data" for analysis. They correctly state that data is domain specific while the patterns within the data are domain independent, and that the schemas of those patterns potentially formulate the science that is to be discovered and understood.
They also suggest that the mega-schema of data prepared for analysis should follow a domain-specific ontological design. The authors then describe the three phases of mega-module computation: data preparation (processing input data to ready it for further analysis), data analysis (i.e., the scientific processing itself), and data evaluation (post-processing and contextualizing the results for the user). This setup allows for two inspection points and for feedback through mega-module controls. Composition abstraction is described as the means by which mega-modules are combined to enable more complex analyses. Two types are discussed and explained: general-purpose composition abstractions (pipeline decomposition, parallel decomposition, and map-reduce decomposition) and specific composition abstractions (what-if control, drift control, and component-based graph decomposition). Finally, the authors apply their modeling technique to a few post-design examples to illustrate its application.

Given that this paper was originally published in 2012, its contents and approach to big data were novel and impactful to the field; many of its proposed analytical techniques are widely used in big data today.

8) List 1-3 strengths of the paper. (1-2 sentences each, identified as S1, S2, S3.)

S1 – Well written overall, with good real-world examples.
S2 – Excellent graphics illustrating the various composition abstraction techniques.

9) List 1-3 weaknesses of the paper (1-2 sentences each, identified as W1, W2, W3.)

W1 – It would have been beneficial if the authors had explained the concept of "big data" more clearly, to distinguish it from merely "large data".
W2 – It would also have been beneficial to mention future areas of focus and improvement (e.g., developing mechanisms by which these models and their results could be checked/verified for accuracy).
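As an aside, the three-phase mega-module computation summarized in item 7 (data preparation, data analysis, data evaluation, with inspection points between phases composed as a pipeline) can be sketched in a few lines of code. This is purely an illustrative sketch by the reviewer, not code from the paper: the function names, the toy data, and the `inspect` hook standing in for the mega-module control/inspection points are all assumptions.

```python
# Illustrative sketch (not from the paper) of a three-phase mega-module
# pipeline: data preparation -> data analysis -> data evaluation,
# with an inspect hook standing in for the inspection/control points.

def prepare(raw):
    # Data preparation: clean the raw input for analysis (toy example:
    # drop missing values).
    return [x for x in raw if x is not None]

def analyze(data):
    # Data analysis: the "scientific processing" step (toy example: mean).
    return sum(data) / len(data)

def evaluate(result):
    # Data evaluation: post-process and contextualize the result for
    # the user (toy example: wrap it with a simple sanity flag).
    return {"mean": result, "ok": result >= 0}

def pipeline(raw, inspect=lambda *args: None):
    # Pipeline decomposition: compose the three phases in sequence.
    # The inspect callback marks the two inspection points the paper
    # places between phases, where controls could apply feedback.
    data = prepare(raw)
    inspect("after-preparation", data)
    result = analyze(data)
    inspect("after-analysis", result)
    return evaluate(result)

print(pipeline([1, 2, None, 3]))  # {'mean': 2.0, 'ok': True}
```

Parallel and map-reduce decomposition would replace the sequential composition in `pipeline` with fan-out over independent mega-modules, but the inspection-point idea carries over unchanged.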