## Papers

**On the descriptive complexity of groups without Abelian normal subgroups**.

arXiv:2209.13725 [cs.LO, cs.CC, math.GR, math.LO], Sep 2022 (arXiv)

Abstract BibTeX

@misc{GrochowLevetWL2, AUTHOR = {Joshua A. Grochow and Michael Levet}, TITLE = {On the descriptive complexity of groups without {Abelian} normal subgroups}, YEAR = {2022}, HOWPUBLISHED = {arXiv:2209.13725 [cs.LO]}, }

In this paper, we explore the descriptive complexity theory of finite groups by examining the power of the second Ehrenfeucht-Fraïssé bijective pebble game in Hella's (Ann. Pure Appl. Log., 1989) heirarchy. This is a Spoiler-Duplicator game in which Spoiler can place up to two pebbles each round. While it trivially solves graph isomorphism, it may be nontrivial for finite groups, and other ternary relational structures. We first provide a novel generalization of Weisfeiler-Leman (WL) coloring, which we call 2-ary WL. We then show that the 2-ary WL is equivalent to the second Ehrenfeucht-Fraïssé bijective pebble game in Hella's heirarchy.

Our main result is that, in the pebble game characterization, only O(1) pebbles and O(1) rounds are sufficient to identify all groups without Abelian normal subgroups (a class of groups for which isomorphism testing is known to be in P; Babai, Codenotti, & Qiao, ICALP 2012). In particular, we show that within the first few rounds, Spoiler can force Duplicator to select an isomorphism between two such groups at each subsequent round. By Hella's results (*ibid.*), this is equivalent to saying that these groups are identified by formulas in first-order logic with generalized 2-ary quantifiers, using only O(1) variables and O(1) quantifier depth.

**Matrix multiplication via matrix groups**.

arXiv:2204.03826 [math.GR, cs.DS, math.CO], Apr 2022 (arXiv)

*14th Innovations in Theoretical Computer Science (ITCS)*2023

Abstract BibTeX

@misc{MatMultMatGroups, AUTHOR = {Blasiak, Jonah and Cohn, Henry and Grochow, Joshua A. and Pratt, Kevin and Umans, Chris} TITLE = {Matrix multiplication via matrix groups}, YEAR = {2022}, HOWPUBLISHED = {arXiv:2209.13725 [cs.LO]}, NOTE = {To appear, ITCS '23}, }

In 2003, Cohn and Umans proposed a group-theoretic approach to bounding the exponent of matrix multiplication. Previous work within this approach ruled out certain families of groups as a route to obtaining ω=2, while other families of groups remain potentially viable. In this paper we turn our attention to matrix groups, whose usefulness within this framework was relatively unexplored.

We first show that groups of Lie type cannot prove ω=2 within the group-theoretic approach. This is based on a representation-theoretic argument that identifies the second-smallest dimension of an irreducible representation of a group as a key parameter that determines its viability in this framework. Our proofs build son Gowers's result concerning product-free sets in quasirandom groups. We then give another barrier that rules out certain natural matrix group constructions that make use of subgroups that are far from being self-normalizing.

Our barrier results leave open several natural paths to obtain ω=2 via matrix groups. To explore these routes we propose working in the continuous setting of Lie groups, in which we develop an analogous theory. Obtaining the analogue of ω=2 in this potentially easier setting is a key challenge that represents an intermediate goal short of actually proving ω=2. We give two constructions in the continuous setting, each of which evades one of our two barriers.

**Experience Report: Standards-Based Grading at Scale in Algorithms**.

Proc. 27th ACM Conf. Innovation and Technology in Computer Science Education (ITiCSE '22) (doi)

arXiv:2204.12046 [cs.CY], Dec 2021 (arXiv)

Abstract BibTeX

@inproceedings{CGLLSBG, AUTHOR = {Lijun Chen and Joshua A. Grochow and Ryan Layer and Michael Levet}, TITLE = {Experience Report: Standards-Based Grading at Scale in Algorithms}, YEAR = {2022}, BOOKTITLE = {Proc. 27th ACM Conf. Innovation and Technology in Computer Science Education (ITiCSE '22)}, DOI = {10.1145/3502718.3524750}, NOTE = {Preprint of full version available at arXiv:2204.12046 [cs.CY]}, }

We report our experiences implementing standards-based grading at scale in an Algorithms course, which serves as the terminal required CS Theory course in our department's undergraduate curriculum. The course had 200-400 students, taught by two instructors, eight graduate teaching assistants, and supported by two additional graders and several undergraduate course assistants. We highlight the role of standards-based grading in supporting our students during the COVID-19 pandemic. We conclude by detailing the successes and adjustments we would make to the course structure.

**On the parallel complexity of Group Isomorphism via Weisfeiler-Leman**.

arXiv:1905.02518 [cs.DS, cs.CC, cs.LO, math.GR], Dec 2021 (arXiv)

Abstract BibTeX

@misc{GrochowLevetWL, AUTHOR = {Joshua A. Grochow and Michael Levet}, TITLE = {On the parallel complexity of {Group} {Isomorphism} via {Weisfeiler}--{Leman}}, YEAR = {2021}, HOWPUBLISHED = {arXiv:2112.11487 [cs.DS]}, }

In this ppaer, we show that the constant-dimension Weisfeiler-Leman algorithm for groups (Brachter & Schweitzer, LICS 2020, arXiv) can be fruitfully used to improved parallel complexity upper bounds on isomorphism testing in several families of groups. In particular, we show:

- Groups with an Abelian normal Hall subgroup whose complement is O(1)-generated are identified by constant-dimensional Weisfeiler-Leman using only a constant number of rounds. This places isomorphism testing for this family of groups into LOGSPACE; the previous upper bound for isomorphism testing was P (Qiao, Sarma, & Tang, STACS 2011).
- We use the individualize-and-refine paradigm to obtain a quasiSAC
^{1}isomorphism test for groups without Abelian normal subgroups, previously only known to be in P (Babai, Codenotti, & Qiao, ICALP 2012). - We extend a result of Brachter & Schweitzer (arXiv, 2021) on direct products of groups to the parallel setting. Namely, we also show that Weisfeiler-Leman can identify direct products in parallel, provided it can identify each of the indecomposable direct factors in parallel. They previously showed the analogous result for P.

We finally consider the count-free Weisfeiler-Leman algorithm, where we show that count-free WL is unable to even distinguish Abelian groups in polynomial-time. Nonetheless, we use count-free WL in tandem with bounded non-determinism and limited counting to obtain a new upper bound of β_{1}MAC^{0}(FOLL) for isomorphism testing of Abelian groups. This improves upon the previous TC^{0}(FOLL) upper bound due to Chattopadhyay, Torán, & Wagner (ACM Trans. Comput. Theory, 2013)

**Polynomial-time Axioms of Choice and polynomial-time cardinality**.

arXiv:2301.07123 [cs.CC, math.LO], Jan 2023 (arXiv)

To appear in commemorative volume of Theory of Computing Systems in honor of Alan L. Selman

Abstract BibTeX

@misc{GrochowAC, AUTHOR = {Grochow, Joshua A.}, TITLE = {Polynomial-time Axioms of Choice and polynomial-time cardinality}, YEAR = {2023}, HOWPUBLISHED = {arXiv:2301.07123 [cs.CC]}, NOTE = {To appear in commemorative volume of Theory of Computing Systems in honor of Alan L. Selman}, }

There is no single canonical polynomial-time version of the Axiom of Choice (AC); several statements of AC that are equivalent in Zermelo-Fraenkel (ZF) set theory are already inequivalent from a constructive point of view, and are similarly inequivalent from a complexity-theoretic point of view. In this paper we show that many classical formulations of AC, when restricted to polynomial time in natural ways, are equivalent to standard complexity-theoretic hypotheses, including several that were of interest to Selman. This provides a unified view of these hypotheses, and we hope provides additional motivation for studying some of the lesser-known hypotheses that appear here.

Additionally, because several classical forms of AC are formulated in terms of cardinals, we develop a theory of polynomial-time cardinality. Nerode & Remmel (*Contemp. Math.* 106, 1990 and Springer Lec. Notes Math. 1432, 1990) developed a related theory, but restricted to unary sets. Downey (Math. Reviews MR1071525) suggested that such a theory over larger alphabets could have interesting connections to more standard complexity questions, and we illustrate some of those connections here.

The connections between AC, cardinality, and complexity questions also allow us to highlight some of Selman's work. We hope this paper is more of a beginning than an end, introducing new concepts and raising many new questions, ripe for further research.

**On p-Group Isomorphism: search-to-decision, counting-to-decision, and nilpotency class reductions via tensors**.

*Computational Complexity Conference (CCC)*, 2021, based on part of this preprint. (doi)

CCC conference video

Abstract BibTeX

@incollection{GrochowQiaoPGroups, AUTHOR = {Grochow, Joshua A. and Qiao, Youming}, TITLE = {On p-Group Isomorphism: search-to-decision, counting-to-decision, and nilpotency class reductions via tensors}, YEAR = {2021}, BOOKTITLE = {36th Computational Complexity Conference (CCC '21)}, DOI = {10.4230/LIPIcs.CCC.2021.16}, NOTE = {Preliminary version available as part of arXiv:1907.00309 [cs.CC]}, }

In this paper we study some classical complexity-theoretic questions regarding Group Isomorphism (GpI). We focus on p-groups (groups of prime power order) with odd p, which are believed to be a bottleneck case for GpI, and work in the model of matrix groups over finite fields. Our main results are as follows.

- Although search-to-decision and counting-to-decision reductions have been
known for over four decades for Graph Isomorphism (GI), they had remained open for GpI, explicitly asked by Arvind & Torán (
*Bull. EATCS*, 2005). Extending methods from Tensor Isomorphism (Grochow & Qiao, ITCS 2021), we show moderately exponential-time such reductions within p-groups of class 2 and exponent p. - Despite the widely held belief that p-groups of class 2 and exponent p
are the hardest cases of GpI, there was no reduction to these groups from
*any*larger class of groups. Again using methods from Tensor Isomorphism (ibid.), we show the first such reduction, namely from isomorphism testing of p-groups of "small" class and exponent p to those of class*two*and exponent p.

For the first results, our main innovation is to develop linear-algebraic analogues of classical graph coloring gadgets, a key technique in studying the structural complexity of GI. Unlike the graph coloring gadgets, which support restricting to various subgroups of the symmetric group, the problems we study require restricting to various subgroups of the general linear group, which entails significantly different and more complicated gadgets. The
analysis of one of our gadgets relies on a classical result from group theory
regarding random generation of classical groups (Kantor & Lubotzky, *Geom. Dedicata*, 1990). For the nilpotency class reduction, we combine a runtime analysis
of the Lazard Correspondence with Tensor Isomorphism-completeness
results (Grochow & Qiao, *ibid.*).

**An improved algorithm for coarse-graining cellular automata**.

arXiv:2012.12153 [nlin.CG, cond-mat.stat-mech, cs.DS, nlin.PS], December 2020 (arXiv)

Abstract BibTeX

@misc{SongGrochowCA, AUTHOR = {Song, Yerim and Grochow, Joshua A.}, TITLE = {An improved algorithm for coarse-graining cellular automata}, YEAR = {2020}, HOWPUBLISHED = {arXiv:2012.12153 [nlin.CG]}, }

*Phys. Rev. Lett. 2004, Phys. Rev. E 2006*) showed how to coarse-grain (elementary) cellular automata (CA). Their algorithm for finding coarse-grainings of supercell size N took doubly-exponential 2

^{2N}-time, and thus only allowed them to explore supercell sizes N ≤ 4. Here we introduce a new, more efficient algorithm for finding coarse-grainings between any two given CA that allows us to systematically explore all elementary CA with supercell sizes up to N=7, and to explore individual examples of even larger supercell size. Our algorithm is based on a backtracking search, similar to the DPLL algorithm with unit propagation for the NP-complete problem of Boolean Satisfiability.

**Average-case algorithms for testing isomorphism of polynomials, algebras, and multilinear forms**.

*Symposium on Theoretical Aspects of Computer Science (STACS), 2021*(doi)

arXiv:2012.01085 [cs.DS], Dec 2020 (arXiv)

Abstract BibTeX

@InProceedings{GrochowQiaoTangAvg, author = {Grochow, Joshua A. and Qiao, Youming and Tang, Gang}, title = {Average-Case Algorithms for Testing Isomorphism of Polynomials, Algebras, and Multilinear Forms}, booktitle = {38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)}, pages = {38:1--38:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-180-1}, ISSN = {1868-8969}, year = {2021}, volume = {187}, editor = {Bl\"{a}ser, Markus and Monmege, Benjamin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, URN = {urn:nbn:de:0030-drops-136836}, doi = {10.4230/LIPIcs.STACS.2021.38}, note = {Preprint of full version at arXiv:2012.01085 [cs.DS]} }

_{q}[x

_{1},...,x

_{n}], and decides whether f and g are isomorphic in time q

^{O(n)}for most f. This average-case setting has direct practical implications, having been studied in multivariate cryptography since the 1990s. Our second result concerns the complexity of testing equivalence of alternating trilinear forms. This problem is of interest in both mathematics and cryptography. We show that this problem is polynomial-time equivalent to testing equivalence of symmetric trilinear forms, by showing that they are both Tensor Isomorphism-complete (Grochow-Qiao, ITCS, '21; arXiv), therefore is equivalent to testing isomorphism of cubic forms over most fields.

**Complexity in ideals of polynomials: questions on algebraic complexity of circuits and proofs**.

*Bull. EATCS*No. 130, Feb 2020 (pdf)

Abstract BibTeX

@article{GrochowBEATCS, AUTHOR = {Grochow, Joshua A.}, TITLE = {Complexity in ideals of polynomials: questions on algebraic complexity of circuits and proofs}, YEAR = {2020}, FJOURNAL = {Bulletin of the European Association for Theoretical Computer Science}, JOURNAL = {Bull. EATCS}, NUMBER = {130}, URL = {http://bulletin.eatcs.org/index.php/beatcs/article/view/607}, }

_{n}⊆ F[x

_{1},...,x

_{n}] for each n, what can we say about the circuit complexity of polynomial families f

_{n}in those ideals, that is, such that f

_{n}∈ I

_{n}for all n? Such ideals and their cosets arise naturally in algebraic circuit lower bounds, geometric complexity theory, and algebraic proof complexity. For ideals generated by a single element, this is the question of relating the complexity of a polynomial to the complexity of its factors, which has a long and rich history. For general ideals, essentially nothing beyond that is known, even for ideals generated by just 2 elements. For a few examples of specific ideals of interest coming from circuit lower bounds or proof complexity, some lower bounds on polynomials in these ideals are known using succinct hitting sets (Forbes-Shpilka-Volk,

*Theory Comput.*, 2019) and circuit techniques (Forbes-Shpilka-Tzameret-Wigderson, CCC 2016). In this survey, we review these connections & motivations, and raise many questions that we hope will help shed light on the complexity landscape of polynomials in ideals.

**Isomorphism problems for tensors, groups, and cubic forms: completeness and reductions**.

arXiv:1907.00309 [cs.CC, math.AG, math.GR, math.RT, quant-ph], Jul 2019 (arXiv)

*12th Innovations in Theoretical Computer Science (ITCS)*2021 (doi)

ITCS full video, ITCS shorter live version with Q & A

Video from Banff BIRS 2019 Workshop on Algebraic Techniques in Computational Complexity

Abstract BibTeX

@misc{GrochowQiaoTensors, AUTHOR = {Grochow, Joshua A. and Qiao, Youming}, TITLE = {Isomorphism problems for tensors, groups, and cubic forms: completeness and reductions}, YEAR = {2019}, HOWPUBLISHED = {arXiv:1907.00309 [cs.CC]}, }

- All the aforementioned isomorphism problems are equivalent under polynomial-time reductions, in conjunction with the recent results of Futorny-Grochow-Sergeichuk (Lin. Alg. Appl., 2019; arXiv:1810.09219).
- Isomorphism of d-tensors reduces to isomorphism of 3-tensors, for any d≥3.

- We give a reduction from GpI for p-groups of exponent p and small class (c<p) to GpI for p-groups of exponent p and class 2. The latter are widely believed to be the hardest cases of GpI, but as far as we know, this is the first reduction from any more general class of groups to this class.
- We give a search-to-decision reduction for isomorphism of p-groups of exponent p and class 2 in time |G|
^{O(loglog|G|)}. While search-to-decision reductions for Graph Isomorphism (GI) have been known for more than 40 years, as far as we know this is the first non-trivial search-to-decision reduction in the context of GpI.

**Ecogeographical rules and the macroecology of food webs**.

*Global Ecology & Biogeography*, 28:1204-1218, 2019, concept paper (doi)

Abstract BibTeX

@article{BaiserEtAlMacroFoodWebs, AUTHOR = {Baiser, Benjamin and Gravel, Dominique and Cirtwell, Alyssa R. and Dunne, Jennifer A. and Fahimipour, Ashkaan K. and Gilarranz, Luis J. and Grochow, Joshua A. and Li, Daijiang and Martinez, Neo D. and McGrew, Alicia and Romanuk, Tamara N. and Stouffer, Daniel B. and Trotta, Lauren B. and Valdovinos, Fernanda S. and Williams, Richard J. and Wood, Spencer A. and Yeakel, Justin D.}, TITLE = {Ecogeographical rules and the macroecology of food webs}, YEAR = {2019}, NUMBER = {28}, PAGES = {1204--1218}, FJOURNAL = {Global Ecology and Biogeography---A Journal of Macroecology}, JOURNAL = {Glob. Ecol. \& Biogeog.}, DOI = {10.111/geb.12925}, NOTE = {Available online May 20, 2019}, }

**Incorporating Weisfeiler-Leman into algorithms for group isomorphism**.

arXiv:1905.02518 [cs.CC, math.GR], May 2019 (arXiv)

Abstract BibTeX

@misc{WLGroupIso, AUTHOR = {Brooksbank, Peter A. and Grochow, Joshua A. and Li, Yinan and Qiao, Youming and Wilson, James B.}, TITLE = {Incorporating {Weisfeiler}--{Leman} into algorithms for group isomorphism}, YEAR = {2019}, HOWPUBLISHED = {arXiv:1905.02518 [cs.CC]}, }

- Let the width of a filter be the dimension of the largest quotient of two adjacent subgroups of the filter; the color-ratio of our hypergraph captures how much smaller a color class is compared to the layer of the filter it is coloring. When we use genus-g quotients and hypergraph k-WL, we can solve isomorphism for solvable groups of order n in time
(n / color-ratio)

In the "base case," where the solvable radical is itself low-genus and the semisimple part acts trivially, we can get a better guaranteed running time of n^{width}poly(n) + n^{O(gk)}^{O(log log n)}, by combining cohomological techniques (Grochow-Qiao, CCC '14, SIAM J. Comput., 2017), code equivalence (Babai-Codenotti-Grochow-Qiao, SODA '11), and low-genus isomorphism ([BMW], ibid.). - We introduce a new random model of finite groups. Unlike previous models, we prove that our model has good coverage, in that it produces a wide variety of groups, and in particular a number of distinct isomorphism types that is logarithmically equivalent to the number of all isomorphism types. In this random model, we show that our filter-and-1-WL refinement method results in constant average width (the above result uses max width).
- For p-groups of class 2 and exponent p—widely believed to be the hardest cases of GpI, and where we also expect the above techniques to get stuck—we improve on the average-case algorithm of Li-Qiao (FOCS '17). Our new algorithm is simpler and applies to a larger fraction of random p-groups of class 2 and exponent p. The previous algorithm was based on a linear-algebraic analogue of the individualize-and-refine technique; our new algorithm combines that technique with concepts from isomorphism of low-genus groups. We also implement this algorithm in MAGMA and show that in experiments it improves over the default (brute force) algorithm for this problem.

**Wildness for tensors**.

*Linear Algebra and its Applications*, 566(1):212-244 (doi)

Preprint available as arXiv:1810.09219 [math.RT] (arXiv)

Abstract BibTeX

@article{FutornyGrochowSergeichukTensorWild, AUTHOR = {Vyacheslav Futorny and Joshua A. Grochow and Sergeichuk, Vladimir V.}, TITLE = {Wildness for tensors}, JOURNAL = {Lin. Alg. Appl.}, FJOURNAL = {Linear Algebra and its Applications}, YEAR = {2019}, VOLUME = {566}, ISSUE = {1}, PAGES = {212--244}, DOI = {10.1016/j.laa.2018.12.022}, NOTE = {Preprint arXiv:1810.09219 [math.RT]}, }

[ain which [u_{ijk}]_{i=1}^{m}_{j=1}^{n}_{k=1}^{t}→ [Σ_{ijk}a_{ijk}u_{ii'}v_{jj'}w_{kk'}]_{i'=1}^{m}_{j'=1}^{n}_{k'=1}^{t}

_{ii'}], [v

_{jj'}], [w

_{kk'}] are nonsingular matrices: this problem contains the problem of classifying an arbitrary system of tensors of order at most three.

**New applications of the polynomial method: The cap set conjecture and beyond**.

*Bulletin of the American Mathematical Society*, 56(1):29-64, 2019 (online Oct 2018) (doi)

Abstract BibTeX

@article{GrochowCapSet, AUTHOR = {Joshua A. Grochow}, TITLE = {New applications of the polynomial method: The cap set conjecture and beyond}, JOURNAL = {Bulletin of the AMS}, YEAR = {2019}, MONTH = {Jan}, VOLUME = {56}, NUMBER = {1}, PAGES = {29--64}, DOI = {10.1090/bull/1648}, NOTE = {Published electronically Oct 2018}, }

^{n}be and contain no lines or, more generally, how can large a subset of (Z/pZ)

^{n}be and contain no arithmetic progressions. This problem was motivated by deep questions about structure in the prime numbers, the geometry of lattice points, and the design of statistical experiments. In 2016, Croot, Lev, and Pach solved the analogous problem in (Z/4Z)

^{n}, showing that the largest set without arithmetic progressions had size at most c

^{n}for some c < 4. Their proof was as elegant as it was unexpected, being a departure from the tried and true methods of Fourier analysis that had dominated the field for half a century. Shortly thereafter, Ellenberg and Gijswijt leveraged their method to resolve the original cap set problem. This expository article covers the history and motivation for the cap set problem and some of the many applications of the technique: from removing triangles from graphs, to rigidity of matrices, and to algorithms for matrix multiplication. The latter application turns out to give back to the original problem, sharpening our understanding of the techniques involved and of what is needed for showing that the current bounds are tight. Most of our exposition assumes only familiarity with basic linear algebra, polynomials, and the integers modulo N.

**Beyond number of bit erasures: New complexity questions raised by recently discovered thermodynamic costs of computation**.

*ACM SIGACT News*, June 2018 (doi)

Abstract BibTeX

@article{GrochowWolpertThermoSIGACTNews, AUTHOR = {Joshua A. Grochow and David H. Wolpert}, TITLE = {Beyond number of bit erasures: New complexity questions raised by recently discovered thermodynamic costs of computation}, JOURNAL = {SIGACT News}, VOLUME = {49}, ISSUE = {2}, PAGES = {33--56}, YEAR = {2018}, MONTH = {June}, DOI = {10.1145/3232679.3232689}, }

**Computational topology and the Unique Games Conjecture**.

*International Symposium on Computational Geometry (SoCG), 2018*(doi)

arXiv:1803.06800 [cs.CC, cs.CG, cs.DM, math.AT], Mar 2018 (arXiv)

Abstract BibTeX

@inproceedings{GrochowTuckerFoltzUG, AUTHOR = {Joshua A. Grochow and Jamie Tucker-Foltz}, TITLE = {Computational topology and the {Unique} {Games} {Conjecture}}, BOOKTITLE = {34th International Symposium on Computational Geometry (SoCG 2018)}, PAGES = {43:1--43:16}, SERIES = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-066-8}, ISSN = {1868-8969}, YEAR = {2018}, VOLUME = {99}, EDITOR = {Bettina Speckmann and Csaba D. T{\'o}th}, PUBLISHER = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik}, NOTE = {Preprint of full version available at arXiv:1803.06800 [cs.CC]}, DOI = {10.4230/LIPIcs.SoCG.2018.43}, }

Covering spaces of graphs have long been useful for studying expanders (as "graph lifts") and unique games (as the "label-extended graph"). In this paper we advocate for the thesis that there is a much deeper relationship between computational topology and the Unique Games Conjecture. Our starting point is Linial's 2005 observation that the only known problems whose inapproximability is equivalent to the Unique Games Conjecture—Unique Games and Max-2Lin—are instances of Maximum Section of a Covering Space on graphs. We then observe that the reduction between these two problems (Khot-Kindler-Mossel-O'Donnell, FOCS 2004 and SICOMP, 2007) gives a well-defined map of covering spaces. We further prove that inapproximability for Maximum Section of a Covering Space on (cell decompositions of) closed 2-manifolds is also equivalent to the Unique Games Conjecture. This gives the first new "Unique Games-complete" problem in over a decade.

Our results partially settle an open question of Chen and Freedman (SODA 2010 and Disc. Comput. Geom., 2011) from computational topology, by showing that their question is almost equivalent to the Unique Games Conjecture. (The main difference is that they ask for inapproximability over Z/2Z>, and we show Unique Games-completeness over Z/kZ for large k.) This equivalence comes from the fact that when the structure group G of the covering space is Abelian—or more generally for principal G-bundles—Maximum Section of a G-Covering Space is the same as the well-studied problem of 1-Homology Localization.

Although our most technically demanding result is an application of Unique Games to computational topology, we hope that our observations on the topological nature of the Unique Games Conjecture will lead to applications of algebraic topology to the Unique Games Conjecture in the future.

**Which groups are amenable to proving exponent two for matrix multiplication?**.

arXiv:1712.02302 [math.GR, cs.DS, math.CO], Dec 2017 (arXiv)

Abstract BibTeX

@misc{BCCGUgroupsMM, AUTHOR = {Blasiak, Jonah and Church, Thomas and Cohn, Henry and Grochow, Joshua A. and and Umans, Chris}, TITLE = {Which groups are amenable to proving exponent two for matrix multiplication?}, YEAR = {2017}, HOWPUBLISHED = {arXiv:1712.02302 [math.GR]}, }

The Cohn-Umans group-theoretic approach to matrix multiplication suggests embedding matrix multiplication into group algebra multiplication, and bounding ω in terms of the representation theory of the host group. This framework is general enough to capture the best known upper bounds on ω and is conjectured to be powerful enough to prove ω = 2, although finding a suitable group and constructing such an embedding has remained elusive. Recently it was shown, by a generalization of the proof of the Cap Set Conjecture, that abelian groups of bounded exponent cannot prove ω = 2 in this framework, which ruled out a family of potential constructions in the literature.

In this paper we study nonabelian groups as potential hosts for an embedding. We prove two main results:

- We show that a large class of nonabelian groups—nilpotent groups of bounded exponent satisfying a mild additional condition—cannot prove ω = 2 in this framework. We do this by showing that the shrinkage rates of powers of the augmentation ideal is similar to the shrinkage rate of the number of functions over (Z/pZ)
^{n}that are degree d polynomials; our proof technique can be seen as a generalization of the polynomial method used to resolve the Cap Set Conjecture. - We show that the symmetric groups S
_{n}cannot prove nontrivial bounds on ω when the embedding is via three Young subgroups—subgroups of the form S_{k1}x S_{k2}x ... x S_{kℓ}—which is a natural strategy that includes all known constructions in S_{n}.

By developing techniques for negative results in this paper, we hope to catalyze a fruitful interplay between the search for constructions proving bounds on ω and methods for ruling them out.

**Minimum circuit size, graph isomorphism, and related problems**.

*Innovations in Theoretical Computer Science (ITCS), 2018*(doi)

*SIAM J. Comput.*47(4):1339-1372, 2018 (doi)

arXiv:1710.09806 [cs.CC] (arXiv) and ECCC Technical Report TR17-158, October 2017 (ECCC)

Abstract BibTeX

@article{AGMMM, TITLE = {Minimum circuit size, graph isomorphism, and related problems}, AUTHOR = {Eric Allender and Joshua A. Grochow and Dieter van Melkebeek and Cristopher Moore and Andrew Morgan}, JOURNAL = {SIAM J. Comput.}, FJOURNAL = {SIAM Journal on Computing}, YEAR = {2018}, VOLUME = {47}, NUMBER = {4}, PAGES = {1339--1372}, NOTE = {Preliminary version in Innovations in Theoretics Computer Science (ITCS) 2018 (DOI:10.4230/LIPIcs.ITCS.2018.20). Also available as arXiv:1710.09806 [cs.CC] and ECCC Technical Report TR17-158.}, DOI = {10.1137/17M1157970}, }

**Designing Strassen's algorithm**.

arXiv:1708.09398 [cs.DS, cs.CC, cs.SC, math.RT], 2017. (arXiv) and ECCC Technical Report TR17-131, August 2017 (ECCC)

Video at the LA Combinatorics & Complexity seminar

Abstract BibTeX

@misc{GrochowMooreStrassen, AUTHOR = {Joshua A. Grochow and Cristopher Moore}, TITLE = {Designing {Strassen's} algorithm}, YEAR = {2017}, HOWPUBLISHED = {arXiv:1708.09398 [cs.DS]}, }

^{3}). While the recursive construction in his algorithm is very clear, the key gain was made by showing that 2 x 2 matrix multiplication could be performed with only 7 multiplications instead of 8. The latter construction was arrived at by a process of elimination and appears to come out of thin air. Here, we give the simplest and most transparent proof of Strassen's algorithm that we are aware of, using only a simple unitary 2-design and a few easy lines of calculation. Moreover, using basic facts from the representation theory of finite groups, we use 2-designs coming from group orbits to generalize our construction to all n (although the resulting algorithms aren't optimal for n at least 3).

**Comparing information-theoretic measures of complexity in Boltzmann machines**.

*Entropy*19(7):310, 2017. (doi)

arXiv:1706.09667 [cs.IT, cs.NE, q-bio.NC], 2017. (arXiv)

Abstract BibTeX

@article{KanwalGrochowAyMeasures, AUTHOR = {Maxinder S. Kanwal and Joshua A. Grochow and Nihat Ay}, TITLE = {Comparing Information-Theoretic Measures of Complexity in {Boltzmann} Machines}, YEAR = {2017}, JOURNAL = {Entropy}, VOLUME = {19}, ISSUE = {7}, PAGES = {310}, NOTE = {Open access. Special issue ``{Information} {Geometry} {II}.'' Also available as arXiv:1706.09667 [cs.IT]}, }

**On the records**.

arXiv:1705.04353 [physics.soc-ph, q-bio.PE, nlin.AO, cs.SI, cs.MA], May 2017. (arXiv)

Abstract BibTeX

@misc{72hS, AUTHOR = {Berdahl, Andrew and Bhat, Uttam and Ferdinand, Vanessa and Garland, Joshua and Ghazi-Zahedi, Keyan and Grana, Justin and Grochow, Joshua A. and Hobson, Elizabeth A. and Kallus, Yoav and Kempes, Christopher P. and Kolchinsky, Artemy and Larremore, Daniel B. and Libby, Eric and Power, Eleanor A. and Tracey, Brendan D. (Santa Fe Institute Postdocs)}, TITLE = {On the records}, YEAR = {2017}, HOWPUBLISHED = {arXiv:1705.04353 [physics.soc-ph]}, NOTE = {This paper was produced, from conception of idea, to execution, to writing, by a team in just 72 hours (see Appendix)}, }

**Towards an algebraic natural proofs barrier via polynomial identity testing**.

arXiv:1701.01717 [cs.CC, math.AG], January 2017 (arXiv) and ECCC Technical Report TR17-009, January 2017 (ECCC)

Abstract BibTeX

@misc{GKSSAlgebraicNaturalProofs, AUTHOR = {Grochow, Joshua A. and Kumar, Mrinal and Saks, Michael and Saraf, Shubhangi}, TITLE = {Towards an algebraic natural proofs barrier via polynomial identity testing}, YEAR = {2017}, HOWPUBLISHED = {ECCC Tech. Report TR17-009 and arXiv:1701.01717 [cs.CC]}, }

We observe that a certain kind of algebraic proof—which covers essentially all known algebraic circuit lower bounds to date—cannot be used to prove lower bounds against VP if and only if what we call succinct hitting sets exist for VP. This is analogous to the Razborov-Rudich natural proofs barrier in Boolean circuit complexity, in that we rule out a large class of lower bound techniques under a derandomization assumption. We also discuss connections between this algebraic natural proofs barrier, geometric complexity theory, and (algebraic) proof complexity.

**Matrix multiplication algorithms from group orbits**.

arXiv:1612.01527 [cs.CC, cs.DS, math.AG, math.RT], December 2016 (arXiv)

Abstract BibTeX

@misc{GrochowMooreMM, AUTHOR = {Grochow, Joshua A. and Moore, Cristopher}, TITLE = {Matrix multiplication algorithms from group orbits}, YEAR = {2016}, HOWPUBLISHED = {arXiv:1612.01527 [cs.CC]}, }

We show how to construct highly symmetric algorithms for matrix
multiplication. In particular, we consider algorithms which decompose the
matrix multiplication tensor into a sum of rank-1 tensors, where the
decomposition itself consists of orbits under some finite group action. We show
how to use the representation theory of the corresponding group to derive
simple constraints on the decomposition, which we solve by hand for n=2,3,4,5,
recovering Strassen's algorithm (in a particularly symmetric form) and new
algorithms for larger n. While these new algorithms do not improve the known
upper bounds on tensor rank or the matrix multiplication exponent, they are
beautiful in their own right, and we point out modifications of this idea that
could plausibly lead to further improvements. Our constructions also suggest
further patterns that could be mined for new algorithms, including a
tantalizing connection with lattices. In particular, using lattices we give the most transparent proof to date of Strassen's algorithm; the same proof works for all n, to yield a decomposition with n^{3} - n + 1 terms.

**A quantitative definition of organismality and its application to lichen**.

arXiv:1612.00036 [q-bio.OT], December 2016 (arXiv)

Abstract BibTeX

@misc{LibbyGrochowDeDeoWolpertOrganismalitySSCLichen, AUTHOR = {Libby, Eric and Wolpert, David H. and Grochow, Joshua A. and DeDeo, Simon}, TITLE = {A quantitative definition of organismality and its application to lichen}, YEAR = {2016}, HOWPUBLISHED = {arXiv:1612.00036 [q-bio.OT]}, }

The organism is a fundamental concept in biology. However there is no universally accepted, formal, and yet broadly applicable definition of what an organism is. Here we introduce a candidate definition. We adopt the view that the "organism" is a functional concept, used by scientists to address particular questions concerning the future state of a biological system, rather than something wholly defined by that system. In this approach organisms are a coarse-graining of a fine-grained dynamical model of a biological system. Crucially, the coarse-graining of the system into organisms is chosen so that their dynamics can be used by scientists to make accurate predictions of those features of the biological system that interests them, and do so with minimal computational burden. To illustrate our framework we apply it to a dynamic model of lichen symbiosis—a system where either the lichen or its constituent fungi and algae could reasonably be considered "organisms." We find that the best choice for what organisms are in this scenario are complex mixtures of many entities that do not resemble standard notions of organisms. When we restrict our allowed coarse-grainings to more traditional types of organisms, we find that ecological conditions, such as niche competition and predation pressure, play a significant role in determining the best choice for organisms.

**NP-hard sets are not sparse unless P=NP: An exposition of a simple proof of Mahaney's Theorem, with applications**.

arXiv:1610.05825 [cs.CC, math.CO, math.RT] (arXiv) and ECCC Technical Report TR15-162, October 2016. (ECCC)

Abstract BibTeX

@misc{GrochowAgrawalMahaney, AUTHOR = {Grochow, Joshua A.}, TITLE = {NP-hard sets are not sparse unless P=NP: {An} exposition of a simple proof of {Mahaney's} {Theorem}, with applications}, YEAR = {2016}, HOWPUBLISHED = {arXiv:1610.05825 [cs.CC]}, }

Mahaney's Theorem states that, assuming P ≠ NP, no NP-hard set can have a polynomially bounded number of yes-instances at each input length. We give an exposition of a very simple unpublished proof of Manindra Agrawal whose ideas appear in Agrawal-Arvind (Theoret. Comp. Sci., 1996). This proof is so simple that it can easily be taught to undergraduates or a general graduate CS audience - not just theorists! - in about 10 minutes, which the author has done successfully several times. We also include applications of Mahaney's Theorem to fundamental questions that bright undergraduates would ask which could be used to fill the remaining hour of a lecture, as well as an application (due to Ikenmeyer, Mulmuley, and Walter, arXiv:1507.02955) to the representation theory of the symmetric group and the Geometric Complexity Theory Program. To this author, the fact that sparsity results on NP-complete sets have an application to classical questions in representation theory says that they are not only a gem of classical theoretical computer science, but indeed a gem of mathematics.

**On cap sets and the group-theoretic approach to matrix multiplication**.

Discrete Analysis 2017:3 (doi)

arXiv:1605.06702 [math.CO, cs.DS, math.GR], May 2016 (arXiv)

Abstract BibTeX

@article{BCCGNSUCapSetMM, AUTHOR = {Blasiak, Jonah and Church, Thomas and Cohn, Henry and Grochow, Joshua A. and Naslund, Eric and Sawin, William F. and Umans, Chris}, TITLE = {On cap sets and the group-theoretic approach to matrix multiplication}, YEAR = {2017}, JOURNAL = {Disc. Analysis}, FJOURNAL = {Discrete Analysis}, NUMBER = {3}, NOTE = {Available as arXiv:1605.06702 [math.CO]}, DOI = {10.19086/da.1245}, }

In 2003, Cohn and Umans described a framework for proving upper bounds on the exponent ω of matrix multiplication by reducing matrix multiplication to group algebra multiplication, and in 2005 Cohn, Kleinberg, Szegedy, and Umans proposed specific conjectures for how to obtain ω=2 in this framework. In this paper we rule out obtaining ω=2 in this framework from the abelian groups of bounded exponent. To do this, we bound the size of tricolored sum-free sets in such groups, extending the breakthrough results of Croot, Lev, Pach, Ellenberg, and Gijswijt on cap sets. As a byproduct of our proof, we show that a variant of tensor rank due to Tao gives a quantitative understanding of the notion of unstable tensor from geometric invariant theory.

**Boundaries of VP and VNP**.

*International Colloquium on Automata, Languages, and Programming (ICALP)*, 2016. (doi)

arXiv:1605.02815 [cs.CC, math.AG, math.CO, math.RT], May 2016 (arXiv)

Abstract BibTeX

@inproceedings{GrochowMulmuleyQiaoBoundaries, AUTHOR = {Grochow, Joshua A. and Mulmuley, Ketan D. and Qiao, Youming}, TITLE = {Boundaries of {$\mathsf{VP}$} and {$\mathsf{VNP}$}}, YEAR = {2016}, PAGES = {34:1--34:14}, SERIES = {Leibniz International Proceedings in Informatics (LIPIcs)}, BOOKTITLE = {43rd International Colloquium on Automata, Languages, and Programming (ICALP16)}, VOLUME = {55}, DOI = {10.4230/LIPIcs.ICALP.2016.34}, NOTE = {Preprint of the full version available as arXiv:1605.02815 [cs.CC]}, }

One fundamental question in the context of the geometric complexity theory approach to the VP versus VNP conjecture is whether VP = VP, where VP is the class of families of polynomials that can be computed by arithmetic circuits of polynomial degree and size, and VP is the class of families of polynomials that can be approximated infinitesimally closely by arithmetic circuits of polynomial degree and size. The goal of this article is to study the conjecture in (Mulmuley, arXiv:1209.5993 [cs.CC] and FOCS 2012) that VP is not contained in VP.

Towards that end, we introduce three degenerations of VP (i.e., sets of points in VP), namely the stable degeneration Stable-VP, the Newton degeneration Newton-VP, and the p-definable one-parameter degeneration VP^{*}. We also introduce analogous degenerations of VNP.
We show that Stable-VP ⊆ Newton-VP ⊆ VP^{*} ⊆ VNP, and Stable-VNP = Newton-VNP = VNP^{*} = VNP. The three notions of degenerations and the proof of this result shed light on the problem of separating VP from VP.

Although we do not yet construct explicit candidates for the polynomial families in VP \ VP, we prove results which tell us where not to look for such families. Specifically, we demonstrate that the families in Newton-VP \ VP based on semi-invariants of quivers would have to be non-generic by showing that, for many finite quivers (including some wild ones), Newton degeneration of any generic semi-invariant can be computed by a circuit of polynomial size. We also show that the Newton degenerations of perfect matching Pfaffians, monotone arithmetic circuits over the reals, and Schur polynomials have polynomial-size circuits.

**Dynamics of beneficial epidemics**.

*Scientific Reports*,

**9**, Article no. 15093, 2019. (doi)

Preprint arXiv:1604.02096 [physics.soc-ph, q-bio.PE, nlin.AO, cs.SI, cs.MA], April 2016. (arXiv)

Abstract BibTeX

@article{72hS, AUTHOR = {Berdahl, Andrew and Brelsford, Christa and De Bacco, Caterina and Dumas, Marion and Ferdinand, Vanessa and Grochow, Joshua A. and H\'{e}bert-Dufresne, Laurent and Kallus, Yoav and Kempes, Christopher P. and Kolchinsky, Artemy and Larremore, Daniel B. and Libby, Eric and Power, Eleanor A. and Stern, Caitlin A. and Tracey, Brendan D. (Santa Fe Institute Postdocs)}, TITLE = {Dynamics of beneficial epidemics}, YEAR = {2019}, JOURNAL = {Sci. Rep.}, FJOURNAL = {Scientific Reports}, VOLUME = {9}, PAGES = {15093}, NOTE = {Preprint available as arXiv:1604.02096 [physics.soc-ph]}, DOI = {10.1038/s41598-019-50039-w}, NOTE = {This paper was produced, from conception of idea, to execution, to writing, by a team in just 72 hours (see Appendix)}, }

**Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition**.

*Scientific Reports*,

**6**, Article no. 31708, 2016. (doi)

Preprint available as arXiv:1510.08542 [physics.soc-ph, cond-math.dis-nn, cs.DM, math.CO, cs.SI], October 2015. (arXiv)

Abstract BibTeX

@article{HebertDufresneGrochowAllardOnion, AUTHOR = {H\'{e}bert-Dufresne, Laurent and Grochow, Joshua A. and Allard, Antoine}, TITLE = {Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition}, YEAR = {2016}, JOURNAL = {Sci. Rep.}, FJOURNAL = {Scientific Reports}, VOLUME = {6}, PAGES = {31708}, NOTE = {Preprint available as arXiv:1510.08542 [physics.soc-ph]}, DOI = {10.1038/srep31708}, }

*the onion spectrum*, is based on the

*onion decomposition*, which refines the k-core decomposition, a standard network fingerprinting method. The onion spectrum is exactly as easy to compute as the k-cores: It is based on the stages at which each vertex gets removed from a graph in the standard algorithm for computing the k-cores. Yet, the onion spectrum reveals much more information about a network, and at multiple scales; for example, it can be used to quantify node heterogeneity, degree correlations, centrality, and tree- or lattice-likeness. Furthermore, unlike the k-core decomposition, the combined degree-onion spectrum immediately gives a clear local picture of the network around each node which allows the detection of interesting subgraphs whose topological structure differs from the global network organization. This local description can also be leveraged to easily generate samples from the ensemble of networks with a given joint degree-onion distribution. We demonstrate the utility of the onion spectrum for understanding both static and dynamic properties on several standard graph models and on many real-world networks.

**Monotone projection lower bounds from extended formulation lower bounds**.

*Theory of Computing*13:18, 2017. (doi)

ECCC Technical Report TR15-171 (ECCC) and arXiv:1510.08417 [cs.CC] (arXiv), October 2015.

Abstract BibTeX

@article{GrochowMonotone, AUTHOR = {Grochow, Joshua A.}, TITLE = {Monotone Projection Lower Bounds from Extended Formulation Lower Bounds}, YEAR = {2017}, PAGES = {1--15}, DOI = {10.4086/toc.2017.v013a018}, PUBLISHER = {Theory of Computing}, JOURNAL = {Theory of Computing}, VOLUME = {13}, NUMBER = {18}, URL = {http://www.theoryofcomputing.org/articles/v013a018}, NOTE = {Preprint originally appeared as ECCC Tech. Report TR15-171 and arXiv:1510.08417 [cs.CC]}, }

In this short note, we reduce lower bounds on monotone projections of polynomials to lower bounds on extended formulations of polytopes. Applying our reduction to the seminal extended formulation lower bounds of Fiorini, Massar, Pokutta, Tiwari, & de Wolf (STOC 2012; J. ACM, 2015) and Rothvoss (STOC 2014; J. ACM, 2017), we obtain the following interesting consequences.

- The Hamiltonian Cycle polynomial is not a monotone subexponential-size projection of the permanent; this both rules out a natural attempt at a monotone lower bound on the Boolean permanent, and shows that the permanent is not complete for non-negative polynomials in VNP
_{R}under monotone p-projections. - The cut polynomials and the perfect matching polynomial (or "unsigned Pfaffian") are not monotone p-projections of the permanent. The latter, over the Boolean and-or semi-ring, rules out monotone reductions in one of the natural approaches to reducing perfect matchings in general graphs to perfect matchings in bipartite graphs.

As the permanent is universal for monotone formulas, these results also imply exponential lower bounds on the monotone formula size and monotone circuit size of these polynomials.

**Graph isomorphism and circuit size**.

arXiv:1511.08189 [cs.CC] (arXiv) and ECCC Technical Report TR15-162, October 2015. (ECCC)

Watch Eric's talk about it!

Abstract BibTeX

@misc{AllenderGrochowMooreGI, AUTHOR = {Allender, Eric and Grochow, Joshua A. and Moore, Cristopher}, TITLE = {Graph isomorphism and circuit size}, YEAR = {2015}, HOWPUBLISHED = {arXiv:1511.08189 [cs.CC] and ECCC Tech. Report TR15-162}, }

^{MKTP}that is not known to lie in NP∩coNP. We also show that this approach can be used to provide a reduction of the Graph Isomorphism problem to MKTP.

**Polynomial-time isomorphism test of groups that are tame extensions**.

*26th International Symposium on Algorithms and Computation (ISAAC)*, 2015. (doi)

arXiv:1507.01917 [cs.DS, cs.CC, math.GR, math.RT] (arXiv)

Abstract BibTeX

@inproceedings{GrochowQiaoTame, AUTHOR = {Grochow, Joshua A. and Qiao, Youming}, TITLE = {Polynomial-time isomorphism test of groups that are tame extensions}, YEAR = {2015}, BOOKTITLE = {26th International Symposium on Algorithms and Computation (ISAAC) (Springer Lecture Notes in Computer Science 9472)}, PAGES = {578--589}, DOI = {10.1007/978-3-662-48971-0_49}, NOTE = {Full version available as arXiv:1507.01917 [cs.DS]}, }

We give new polynomial-time algorithms for testing isomorphism of a class of groups given by multiplication tables (GpI). Two results (Cannon & Holt, J. Symb. Comput. 2003; Babai, Codenotti & Qiao, ICALP 2012) imply that GpI reduces to the following: given groups G, H with characteristic subgroups of the same type and isomorphic to Z_{p}^{d}, and given the coset of isomorphisms Iso(G/Z_{p}^{d}, H/Z_{p}^{d}), compute Iso(G, H) in time poly(|G|). Babai & Qiao (STACS 2012) solved this problem when a Sylow p-subgroup of G/Z_{p}^{d} is trivial. In this paper, we solve the preceding problem in the so-called "tame" case, i.e., when a Sylow p-subgroup of G/Z_{p}^{d} is cyclic, dihedral, semi-dihedral, or generalized quaternion. These cases correspond exactly to the group algebra F_{p}[G/Z_{p}^{d}] being of tame type, as in the celebrated tame-wild dichotomy in representation theory. We then solve new cases of GpI in polynomial time.

Our result relies crucially on the divide-and-conquer strategy proposed earlier by the authors (CCC 2014), which splits GpI into two problems, one on group actions (representations), and one on group cohomology. Based on this strategy, we combine permutation group and representation algorithms with new mathematical results, including bounds on the number of indecomposable representations of groups in the tame case, and on the size of their cohomology groups.

Finally, we note that when a group extension is not tame, the preceding bounds do not hold. This suggests a precise sense in which the tame-wild dichotomy from representation theory may also be a dividing line between the (currently) easy and hard instances of GpI.

**A framework for optimal high-level descriptions in science and engineering—preliminary report**.

Chapter in

*From matter to life: information and causality*, S. Walker and P. Davies (eds.), Cambridge University Press, in press, 2015.

Preliminary version available as arXiv:1409.7403 [cs.IT, cond-mat.stat-mech, cs.AI, cs.CE, q-bio.PE] (arXiv) and SFI Working Paper 2015-06-017 (SFI)

Abstract BibTeX

@incollection{WolpertGrochowLibbyDeDeoSSC, AUTHOR = {Wolpert, David H. and Grochow, Joshua A. and Libby, Eric and DeDeo, Simon}, TITLE = {Optimal high-level descriptions of dynamical systems}, YEAR = {2015}, EDITOR = {S. I. Walker and P. Davies}, BOOKTITLE = {From Matter to Life: Information and Causality}, PUBLISHER = {Cambridge University Press}, NOTE = {In press. Preliminary version available as arXiv:1409.7403 [cs.IT] and SFI Working Paper 2015-06-017}, }

Both science and engineering rely on the use of high-level descriptions. These go under various names, including "macrostates," "coarse-grainings," and "effective theories". The ideal gas is a high-level description of a large collection of point particles, just as a a set of interacting firms is a high-level description of individuals participating in an economy and just as a cell a high-level description of a set of biochemical interactions. Typically, these descriptions are constructed in an *ad hoc* manner, without an explicit understanding of their purpose. Here, we formalize and quantify that purpose as a combination of the need to accurately predict observables of interest, and to do so efficiently and with bounded computational resources. This State Space Compression framework makes it possible to solve for the optimal high-level description of a given dynamical system, rather than relying on human intuition alone.

In this preliminary report, we present our framework, show its application to a diverse set of examples in Computer Science, Biology, Physics and Networks, and develop some technical machinery for evaluating accuracy and computation costs in a variety of systems.

**Circuit complexity, proof complexity, and polynomial identity testing: The ideal proof system**.

*Journal of the ACM*, 65(6), Article No. 37, November 2018. (doi)

*IEEE Symposium on Foundations of Computer Science (FOCS)*, October 2014. (doi) (watch the video of Toni's FOCS talk!)

Also available as ECCC Technical Report TR14-052, April 2014 (ECCC) and arXiv:1404.3820 [cs.CC, cs.LO, math.LO] (arXiv)

Abstract BibTeX

@article{GrochowPitassiIPS, AUTHOR = {Grochow, Joshua A. and Pitassi, Toniann}, TITLE = {Circuit complexity, proof complexity, and polynomial identity testing: The ideal proof system}, YEAR = {2018}, JOURNAL = {J. ACM}, FJOURNAL = {Journal of the Association for Computing Machinery}, VOLUME = {65}, ISSUE = {6}, PAGES = {37}, NOTE = {Preliminary version appeared in FOCS 2014 (doi:10.1109/FOCS.2014.20). Also available as arXiv:1404.3820 [cs.CC] and ECCC Technical Report TR14-052.}, }

We introduce a new and natural algebraic proof system, whose complexity measure is essentially the algebraic circuit size of Nullstellensatz certificates. This enables us to exhibit close connections between effective Nullstellensatzë, proof complexity, and (algebraic) circuit complexity. In particular, we show that any super-polynomial lower bound on any Boolean tautology in our proof system implies that the permanent does not have polynomial-size algebraic circuits (VNP is not equal to VP). We also show that super-polynomial lower bounds on the number of lines in Polynomial Calculus proofs imply the Permanent versus Determinant Conjecture. Note that there was no proof system prior to ours for which lower bounds on an arbitrary tautology implied any complexity class lower bound.

Our proof system helps clarify the relationships between previous algebraic proof systems. In doing so, we highlight the importance of polynomial identity testing (PIT) in proof complexity. In particular, we use PIT to illuminate AC^{0}[p]-Frege lower bounds, which have been open for nearly 30 years, with no satisfactory explanation as to their apparent difficulty.

Finally, we explain the obstacles that must be overcome in any attempt to extend techniques from algebraic circuit complexity to prove lower bounds in proof complexity. Using the algebraic structure of our proof system, we propose a novel route to such lower bounds. Although such lower bounds remain elusive, this proposal should be contrasted with the diffculty of extending AC^{0}[p] circuit lower bounds to AC^{0}[p]-Frege lower bounds.

**Algorithms for group isomorphism via group extensions and cohomology**.

*SIAM J. Comput.*46(4):1153-1216, 2017,

*Open Access*(doi)

Preliminary version appeared as

*IEEE Conference on Computational Complexity (CCC)*, June 2014. (doi)

Also available as arXiv:1309.1776 [cs.DS] (arXiv) and ECCC Technical Report TR13-123, September 2013. (ECCC)

Abstract BibTeX

@article{GrochowQiaoGpIso, TITLE = {Algorithms for group isomorphism via group extensions and cohomology}, AUTHOR = {Grochow, Joshua A. and Qiao, Youming}, JOURNAL = {SIAM J. Comput.}, FJOURNAL = {SIAM Journal on Computing}, YEAR = {2017}, VOLUME = {46}, NUMBER = {4}, PAGES = {1153--1216}, NOTE = {Preliminary version in IEEE Conference on Computational Complexity (CCC) 2014 (DOI:10.1109/CCC.2014.19). Also available as arXiv:1309.1776 [cs.DS] and ECCC Technical Report TR13-123.}, DOI = {10.1137/15M1009767}, }

The isomorphism problem for groups given by their multiplication tables (GpI) has long been known to be solvable in n^{O(log n)} time, but only recently has there been significant progress towards polynomial time. For example, Babai, Codenotti & Qiao (ICALP 2012) gave a polynomial-time algorithm for groups with no abelian normal subgroups. Thus, at present it is crucial to understand groups with abelian normal subgroups to develop n^{o(log n)}-time algorithms.

Towards this goal we advocate a strategy via the extension theory of groups, which describes how a normal subgroup N is related to the quotient group G/N via G. This strategy "splits" GpI into two subproblems: one regarding group actions on other groups, and one regarding group cohomology. The solution of these problems is essentially necessary and sufficient to solve GpI. Most previous works on GpI naturally align with this strategy, and it thus helps explain in a unified way the recent polynomial-time algorithms for other group classes. In particular, most results in the Cayley table model have focused on the group action aspect, despite the general necessity of cohomology, for example for p-groups of class 2—believed to be the hardest case of GpI.

To make progress on the group cohomology aspect of GpI, we consider *central-radical groups*, proposed in Babai *et al.* (SODA 2011): the class of groups such that G mod its center has no abelian normal subgroups. Recall that Babai *et al.* (ICALP 2012) consider the class of groups G such that G itself has no abelian normal subgroups. Following the above strategy, we solve GpI in n^{O(log log n)} time for central-radical groups, and in polynomial time for several prominent sub-classes of central-radical groups. We also achieve an n^{O(log log n)}-time bound for groups whose solvable normal subgroups are elementary abelian but not necessarily central. **As far as we are aware, this is the first time that a nontrivial algorithm with worst-case guarantees has tackled both aspects of GpI—actions and cohomology—simultaneously.**

**Prior to our work, nothing better than the trivial n ^{O(log n)}-time algorithm was known,** even for groups with a central radical of constant size, such as Z(G)=Z

_{2}. To develop these algorithms we utilize several mathematical results on the detailed structure of cohomology classes, as well as algorithmic results for code equivalence, coset intersection and cyclicity testing of modules over finite-dimensional associative algebras. We also suggest several promising directions for future work.

**Rotor-routing and spanning trees on planar graphs**.

*International Mathematics Research Notices*11:3225-3244, 2015. (doi) (first published online 2014)

Also available as arXiv:1308.2677 [math.CO] August 2013. (arXiv)

Abstract BibTeX

@article{ChanChurchGrochowRotor, AUTHOR = {Chan, Melody and Church, Thomas and Grochow, Joshua A.}, TITLE = {Rotor-routing and spanning trees on planar graphs}, JOURNAL = {Int. Math Res. Not.}, FJOURNAL = {International Mathematics Research Notices}, YEAR = {2015}, VOLUME = {11}, PAGES = {3225--3244}, NOTE = {Also available as arXiv:1308.2677 [math.CO]}, DOI = {10.1093/imrn/rnu025}, }

^{0}(G) of a finite graph G is a discrete analogue of the Jacobian of a Riemann surface which was rediscovered several times in the contexts of arithmetic geometry, self-organized criticality, random walks, and algorithms. Given a ribbon graph G, Holroyd

*et al.*used the "rotor-routing" model to define a free and transitive action of Pic

^{0}(G) on the set of spanning trees of G. However, their construction depends

*a priori*on a choice of basepoint vertex. Ellenberg asked whether this action does in fact depend on the choice of basepoint. We answer this question by proving that the action of Pic

^{0}(G) is independent of the basepoint if and only if G is a planar ribbon graph.

**Unifying known lower bounds via geometric complexity theory**.

J. A. Grochow

*Computational Complexity*, Special Issue, 24(2):393-475, 2015. (doi)

Extended abstract appeared in

*IEEE Conference on Computational Complexity (CCC)*, June 2014. (doi)

Preliminary version available as arXiv:1304.6333 [cs.CC] April 2013, but the final official version is Open Access!

Abstract BibTeX

@article{GrochowGCTUnity, TITLE = {Unifying known lower bounds via geometric complexity theory}, AUTHOR = {Grochow, Joshua A.}, JOURNAL = {computational complexity}, YEAR = {2015}, VOLUME = {24}, ISSUE = {2}, PAGES = {393--475}, NOTE = {Special issue from IEEE CCC 2014. Open access.}, DOI = {10.1007/s00037-015-0103-x}, }

^{0}[p], multilinear formula and circuit size lower bounds (Raz

*et al.*), the degree bound (Strassen, Baur–Strassen), the connected components technique (Ben-Or), depth 3 arithmetic circuit lower bounds over finite fields (Grigoriev–Karpinski), lower bounds on permanent versus determinant (Mignon–Ressayre, Landsberg–Manivel–Ressayre), lower bounds on matrix multiplication (Bürgisser–Ikenmeyer) (these last two were already known to fit into GCT), the chasms at depth 3 and 4 (Gupta–Kayal–Kamath–Saptharishi; Agrawal–Vinay; Koiran), matrix rigidity (Valiant) and others. That is, the original proofs, with what is often just a little extra work, already provide representation-theoretic obstructions in the sense of GCT for their respective lower bounds. This enables us to expose a new viewpoint on GCT, whereby it is a natural unification of known results and broad generalization of known techniques. It also shows that the framework of GCT is at least as powerful as known methods, and gives many new proofs-of-concept that GCT can indeed provide significant asymptotic lower bounds. This new viewpoint also opens up the possibility of fruitful two-way interactions between previous results and the new methods of GCT; we provide several concrete suggestions of such interactions. For example, the representation-theoretic viewpoint of GCT naturally provides new properties to consider in the search for new lower bounds.

**Matrix Lie algebra isomorphism**. (Previously: Lie algebra conjugacy. More accurately: Matrix isomorphism of matrix Lie algebras.)

J. A. Grochow

*IEEE Conference on Computational Complexity (CCC)*, June 2012. (doi)

Also available as arXiv:1112.2012 [cs.CC, cs.DS, cs.SC, math.RT] (arXiv) and ECCC Technical Report TR11-168 (ECCC)

See my Ph.D. thesis for a more complete version.

Short Abstract Detailed Abstract BibTeX

@inproceedings{GrochowLieAlgebraIso, AUTHOR = {Grochow, Joshua A.}, TITLE = {Matrix {Lie} algebra isomorphism}, BOOKTITLE = {IEEE Conference on Computational Complexity (CCC12)}, YEAR = {2012}, PAGES = {203--213}, NOTE = {Also available as arXiv:1112.2012 [cs.CC] and ECCC Technical Report TR11-168.}, DOI = {10.1109/CCC.2012.34}, }

*Comm. ACM*, 2012, and references therein). A matrix Lie algebra is a set L of matrices that is closed under linear combinations and the operation [A,B] = AB - BA. Two matrix Lie algebras L, L' are matrix isomorphic if there is an invertible matrix M such that conjugating every matrix in L by M yields the set L'. We show that certain cases of MatIsoLie—for the wide and widely studied classes of

*semisimple*and

*abelian*Lie algebras—are equivalent to graph isomorphism and linear code equivalence, respectively. On the other hand, we give polynomial-time algorithms for other cases of MatIsoLie, which allow us to mostly derandomize a recent result of Kayal on affine equivalence of polynomials.

*Comm. ACM*, 2012, and references therein). A matrix Lie algebra is a set L of matrices such that A,B ∈ L implies AB - BA ∈ L. Two matrix Lie algebras are conjugate if there is an invertible matrix M such that L

_{1}= ML

_{2}M

^{-1}. We show that certain cases of Lie algebra conjugacy are equivalent to graph isomorphism. On the other hand, we give polynomial-time algorithms for other cases of Lie algebra conjugacy, which allow us to mostly derandomize a recent result of Kayal on affine equivalence of polynomials. Affine equivalence is related to many complexity problems such as factoring integers, graph isomorphism, matrix multiplication, and permanent versus determinant. Specifically, we show:

- Abelian Lie algebra conjugacy is as hard as graph isomorphism. A Lie algebra is abelian if all of its matrices commute pairwise.
- Abelian diagonalizable Lie algebra conjugacy of n × n matrices can be solved in poly(n) time when the Lie algebras have dimension O(1). The dimension of a Lie algebra is the maximum number of linearly independent matrices it contains. A Lie algebra L is diagonalizable if there is a single matrix M such that for every A in L, MAM
^{-1}is diagonal. - Semisimple Lie algebra conjugacy is equivalent to graph isomorphism. A Lie algebra is semisimple if it is a direct sum of simple Lie algebras.
- Semisimple Lie algebra conjugacy of n × n matrices can be solved in polynomial time when the Lie algebras consist of only O(log n) simple direct summands.
- Conjugacy of completely reducible Lie algebras—that is, a direct sum of an abelian diagonalizable and a semisimple Lie algebra—can be solved in polynomial time when the abelian part has dimension O(1) and the semisimple part has O(log n) simple direct summands.

**Report on "Mathematical Aspects of P vs. NP and its Variants"**.

arXiv:1203.2888 [cs.CC, math.AG, math.NT, math.RT] (arxiv)

Abstract BibTeX

@misc{GrochowRusekReport, AUTHOR = {Grochow, Joshua A. and Rusek, Korben}, TITLE = {Report on ``{Mathematical} {Aspects} of {P} vs. {NP} and its {Variants}''}, YEAR = {2012}, HOWPUBLISHED = {arXiv:1203.2888 [cs.CC]}, NOTE = {Workshop held at {Brown--ICERM} in August, 2011, organizers: {Saugata} {Basu}, {J.} {M.} {Landsberg,} and {J.} {Maurice} {Rojas}}, }

*Comm. ACM*, 2012, and references therein), and number theory and other ideas in the Blum-Shub-Smale model.

**Complexity classes of equivalence problems revisited**.

*Information and Computation*209(4):748-763, 2011. (doi)

Also available as arXiv:0907.4775v2 [cs.CC], 2009. (arXiv)

Originally my master's thesis. See my my Ph.D. thesis for the latest updates.

Abstract BibTeX

@article{FortnowGrochowPEq, AUTHOR = {Fortnow, Lance and Grochow, Joshua A.}, TITLE = {Complexity classes of equivalence problems revisited}, JOURNAL = {Inform. and Comput.}, FJOURNAL = {Information and Computation}, VOLUME = {209}, YEAR = {2011}, NUMBER = {4}, PAGES = {748--763}, ISSN = {0890-5401}, NOTE = {Also available as arXiv:0907.4775 [cs.CC]}, DOI = {10.1016/j.ic.2011.01.006}, }

*canonical form*for the equivalence relation of set equality. Other canonical forms arise in graph isomorphism algorithms. To determine if two graphs are cospectral (have the same eigenvalues), we compute their characteristic polynomials and see if they are equal; the characteristic polynomial is a

*complete invariant*for cospectrality. Finally, an equivalence relation may be decidable in P without either a complete invariant or canonical form. Blass and Gurevich (

*SIAM J. Comput.*, 1984) ask whether these conditions on equivalence relations—having an FP canonical form, having an FP complete invariant, and being in P—are distinct. They showed that this question requires non-relativizing techniques to resolve. We extend their results, and give new connections to probabilistic and quantum computation.

**Code equivalence and group isomorphism**.

*ACM-SIAM Symposium on Discrete Algorithms (SODA)*, 2011. (pdf) (doi)

Abstract BibTeX

@inproceedings {BabaiCodenottiGrochowQiaoSODA11, AUTHOR = {Babai, L{\'a}szl{\'o} and Codenotti, Paolo and Grochow, Joshua A. and Qiao, Youming}, TITLE = {Code equivalence and group isomorphism}, BOOKTITLE = {Proceedings of the {Twenty-Second} {Annual} {ACM--SIAM} {Symposium} on {Discrete} {Algorithms} ({SODA11})}, PAGES = {1395--1408}, PUBLISHER = {SIAM}, ADDRESS = {Philadelphia, PA}, YEAR = {2011}, DOI = {10.1137/1.9781611973082.107}, }

The isomorphism problem for groups given by their multiplication tables has long been known to be solvable in time n^{log n + O(1)}. The decades-old quest for a polynomial-time algorithm has focused on the very difficult case of class-2 nilpotent groups (groups whose quotient by their center is abelian), with little success. In this paper we consider the opposite end of the spectrum and initiate a more hopeful program to find a polynomial-time algorithm for *semisimple groups*, defined as groups without abelian normal subgroups. First, we prove that the isomorphism problem for this class can be solved in time n^{O(log log n)}. We then identify certain bottlenecks to polynomial-time solvability and give a polynomial-time solution to a rich subclass, namely the semisimple groups where each minimal normal subgroup has a bounded number of simple factors. We relate the results to the filtration of groups introduced by Babai and Beals (1999).

One of our tools is an algorithm for equivalence of (not necessarily linear) codes in simply-exponential time in the length of the code, obtained by modifying Luks's algorithm for hypergraph isomorphism in simply-exponential time in the number of vertices (STOC 1999).

We comment on the complexity of the closely related problem of permutational isomorphism of permutation groups.

**Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture**.

*Molecular Systems Biology*5:294, 2009. (pdf) (doi)

Abstract BibTeX

@article{jothiBalajiEtAlMSB2009, AUTHOR = {Jothi, Raja and Balaji, S. and Wuster, Arthur and Grochow, Joshua A. and Gsponer, J\"{o}rg and Przytycka, Teresa M. and Aravind, L. and Babu, M. Madan}, TITLE = {Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture}, JOURNAL = {Mol. Syst. Biol.}, FJOURNAL = {Molecular Systems Biology}, VOLUME = {5}, NUMBER = {294}, YEAR = {2009}, PUBLISHER = {EMBO and Nature Publishing Group}, DOI = {10.1038/msb.2009.52}, }

**Network motif discovery using subgraph enumeration and symmetry-breaking**.

In

*RECOMB 2007*, Lecture Notes in Computer Science 4453, pp. 92-106. Springer-Verlag, 2007. (pdf) (doi)

See my master's thesis for a more complete version.

Abstract BibTeX

@inproceedings{GrochowKellisRECOMB2007, AUTHOR = {Grochow, Joshua A. and Kellis, Manolis}, TITLE = {Network motif discovery using subgraph enumeration and symmetry-breaking}, BOOKTITLE = {Research in Computational Molecular Biology (RECOMB07)}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {4453}, YEAR = {2007}, PAGES = {92--106}, PUBLISHER = {Springer-Verlag}, ISBN = {978-3-540-71680-8}, ISSN = {0302-9743}, DOI = {10.1007/978-3-540-71681-5_7}, }

The study of biological networks and network motifs can yield significant new insights into systems biology. Previous methods of discovering network motifs—network-centric subgraph enumeration and sampling—have been limited to motifs of 6 to 8 nodes, revealing only the smallest network components. New methods are necessary to identify larger network sub-structures and functional motifs.

Here we present a novel algorithm for discovering large network motifs that achieves these goals, based on a novel symmetry-breaking technique, which eliminates repeated isomorphism testing, leading to an exponential speed-up over previous methods. This technique is made possible by reversing the traditional network-based search at the heart of the algorithm to a motif-based search, which also eliminates the need to store all motifs of a given size and enables parallelization and scaling. Additionally, our method enables us to study the clustering properties of discovered motifs, revealing even larger network elements.

We apply this algorithm to the protein-protein interaction network and transcription regulatory network of *S. cerevisiae*, and discover several large network motifs, which were previously inaccessible to existing methods, including a 29-node cluster of 15-node motifs corresponding to the key transcription machinery of *S. cerevisiae*.

## Theses

**Symmetry and equivalence relations in classical and geometric complexity theory**.

J. A. Grochow

Doctoral dissertation, U. Chicago, 2012. Advisors: Prof. Ketan Mulmuley and Prof. Lance Fortnow (pdf)

Informal Summary Abstract BibTeX

@phdthesis{grochowPhD, AUTHOR = {Grochow, Joshua A.}, TITLE = {Symmetry and equivalence relations in classical and geometric complexity theory}, YEAR = {2012}, SCHOOL = {University of Chicago}, ADDRESS = {Chicago, IL}, }

This thesis studies some of the ways in which symmetries and equivalence relations arise in classical and geometric complexity theory. The Geometric Complexity Theory Program is aimed at resolving central questions in complexity such as P versus NP using techniques from algebraic geometry and representation theory. The equivalence relations we study are mostly algebraic in nature and we heavily use algebraic techniques to reason about the computational properties of these problems. We first provide a tutorial and survey on Geometric Complexity Theory to provide perspective and motivate the other problems we study.

One equivalence relation we study is matrix isomorphism of matrix Lie algebras, which is a problem that arises naturally in Geometric Complexity Theory. For certain cases of matrix isomorphism of Lie algebras we provide polynomial-time algorithms, and for other cases we show that the problem is as hard as graph isomorphism. To our knowledge, this is the first time graph isomorphism has appeared in connection with any lower bounds program.

Finally, we study algorithms for equivalence relations more generally (joint work with Lance Fortnow). Two techniques are often employed for algorithmically deciding equivalence relations: 1) finding a complete set of easily computable invariants, or 2) finding an algorithm which will compute a canonical form for each equivalence class. Some equivalence relations in the literature have been solved efficiently by other means as well. We ask whether these three conditions—having an efficient solution, having an efficiently computable complete invariant, and having an efficiently computable canonical form—are equivalent. We show that this question requires non-relativizing techniques to resolve, and provide new connections between this question and factoring integers, probabilistic algorithms, and quantum computation.

**The complexity of equivalence relations**.

J. A. Grochow.

Master's thesis, U. Chicago, 2008. Advisor: Prof. László Babai (pdf)

Journal version above.

Abstract BibTeX

@mastersthesis{GrochowEquivalence2008, AUTHOR = {Grochow, Joshua A.}, TITLE = {The complexity of equivalence relations}, SCHOOL = {University of Chicago}, YEAR = {2008}, MONTH = {December}, }

To determine if two given lists of numbers are the same set, we would sort both lists and see if we get the same result. The sorted list is a *canonical form* for the equivalence relation of set equality. Other canonical forms for equivalences arise in graph isomorphism and its variants, and the equality of permutation groups given by generators. To determine if two given graphs are cospectral, however, we compute their characteristic polynomials and see if they are the same; the characteristic polynomial is a *complete invariant* for the equivalence relation of cospectrality. This is weaker than a canonical form, and it is not known whether a canonical form for cospectrality exists. Note that it is a priori possible for an equivalence relation to be decidable in polynomial time without either a complete invariant or canonical form.

Blass and Gurevich (*SIAM J. Comput.*, 1984) ask whether these conditions on equivalence relations—having an FP canonical form, having an FP complete invariant, and simply being in P—are in fact different. They showed that this question requires non-relativizing techniques to resolve. Here we extend their results using generic oracles, and give new connections to probabilistic and quantum computation.

We denote the class of equivalence problems in P by PEq, the class of problems with complete FP invariants Ker, and the class with FP canonical forms CF; CF ⊆ Ker ⊆ PEq, and we ask whether these inclusions are proper. If x ~ y implies |y| ≤ poly(|x|), we say that ~ is polynomially bounded; we denote the corresponding classes of equivalence relation CF_{p}, Ker_{p}, and PEq_{p}. Our main results are:

- If CF=PEq then NP=UP=RP and thus PH = BPP;
- If CF = Ker then NP = UP, PH = ZPP
^{NP}, integers can be factored in probabilistic polynomial time, and deterministic collision-free hash functions do not exist; - If Ker=PEq then UP ⊆ BQP;
- There is an oracle relative to which CF ≠ Ker ≠ PEq; and
- There is an oracle relative to which CF
_{p}= Ker_{p}and Ker ≠ PEq.

**On the structure and evolution of protein interaction networks**.

J. A. Grochow.

Master's thesis, M. I. T., 2006. Advisor: Prof. Manolis Kellis (pdf)

One chapter was published in RECOMB 2007

(This thesis won the Charles and Jennifer Johnson Thesis Award.)

Abstract BibTeX

@mastersthesis{GrochowNetworks2006, AUTHOR = {Grochow, Joshua A.}, TITLE = {On the structure and evolution of protein interaction networks}, SCHOOL = {Massachusetts Institute of Technology}, YEAR = {2006}, MONTH = {August}, }

The study of protein interactions from the networks point of view has yielded new insights into systems biology. In particular, "network motifs" become apparent as a useful and systematic tool for describing and exploring networks. Finding motifs has involved either exact counting (e.g. Milo *et al.*, *Science*, 2002) or subgraph sampling (e.g. Kashtan *et al.*, *Bioinf.* 2004 and Middendorf *et al.*, *PNAS* 2005). In this thesis we develop an algorithm to count all instances of a particular subgraph, which can be used to query whether a given subgraph is a significant motif. This method can be used to perform exact counting of network motifs faster and with less memory than previous methods, and can also be combined with subgraph sampling to find larger motifs than ever before—we have found motifs with up to 15 nodes and explored subgraphs up to 20 nodes. Unlike previous methods, this method can also be used to explore motif clustering and can be combined with network alignment techniques (e.g. Graemlin or pathBLAST).

We also present new methods of estimating parameters for models of biological network growth, and present a new model based on these parameters and underlying binding domains.

Finally, we propose an experiment to explore the effect of the whole genome duplication on the protein-protein interaction network of *S. cerevisiae*, allowing us to distinguish between cases of subfunctionalization and neofunctionalization.