This page has been prepared as a guide for the type of work with which a summer research student can get involved. If you are thinking about working with us over the summer, feel free to contact us to get more details on the projects. Research problems are always evolving in our group, and we have given samples of 4 ongoing projects in the group here. But, the exact nature of the work may change by May when the internship officially starts.

Common Information for All Projects

We do research in formal methods, programming languages, and logic. We are interested in solid well-studied theoretical results with impact in solving real world problems. There are many different ways to get involved. Your involvmenet can be near the theoretical end of this spectrum, if your interests lie there and you have the right background for this. Or, it can be near the practical end of this, where you can help in turning the theoretical insights into practical solutions.

Background

Having a backgraound in any of the following areas is useful, although each project may require a subset of this list.

Theory Basic:

Logic: propositinal, first order, second order, temporal logics, etc.
Automata: classic undergraduate-level automata theory course, and additional knowledge of automata on trees, graphs, infinite words, infinite trees, etc.
Algebra: Undergraduate level algebra: monoids, groups, fields, etc.
Algorithms and Complexity: undergraduate level algorithms course and an understanding of basic complexity classes.
Knowledge of probabilities to understand probabilistic programming

Theory Additional:

Foundations of Program Analysis: basic lattice theory, abstract interpretation, etc.
Knowledge in Theory of Concurrency: Petrinets, event structures, traces, etc.
Types: a basic understading of types and type checking algorithms for programs.

Practice:

SAT and SMT: Z3, CVC4, etc.
Experience with a theorem prover: Dafny, Coq, Lean, etc.
Functional Programming: fluency in one or more functional programming langauges, like OCaml.
Good implementation skills
Experience with Concurrent or Distributed Programming

Projects

Slicing Probabilistic Programs

Slicing a program means producing a fragment of the program — for instance, a subset of the program instructions — that preserves its behaviours. The behaviours that we wish to preserve may be specified in a variety of ways; commonly, we wish to preserve the values of some set of variables.

There is a large body of work on slicing in general. However, probabilistic programs differ significantly from classical programs; slicing techniques in the classical setting may not translate into the probabilistic setting. There are a few techniques for slicing probabilistic programs in a manner that preserves variables’ values [2, 3, 4]. A recent paper [1] presents a slicing technique that takes a more refined view on preserving a program’s behaviour: a slice preserves a specification. For instance, slicing a program that is equipped with a pre- and post-condition will produce a program for which the pre- and post-condition still hold.

Probabilistic programs may be used in order to perform inference for machine learning. One way that this is done is by repeatedly sampling an output of the program and then computing a statistic on the resultant values [5]. Slicing a program that is used in this fashion is desirable because running the program repeatedly may use significant computational resources. However, slicing in a manner that preserves variables may produce a slice that contains too many instructions and is therefore far from optimal; we would like our slice to merely preserve a statistic on the output values, such as the average. The aim of this project is to develop a technique for slicing probabilistic programs in the manner of [1] that does not preserve a specification but, rather, preserves a statistic.

Slicing a program to preserve a property only makes sense if the program is correct — that is, if the property is actually preserved to begin with. However, in the case of faulty programs, slicing may also play a dual role: slicing may be used to determine which subset of program instructions are responsible for a faulty program’s failure to satisfy a specification [6]. In this manner, we also wish to investigate if slicing to preserve a statistic has a dual use in the case of faulty programs.

References

[1] M. Navarro and F. Olmedo, Slicing of Probabilistic Programs based on Specifications, May 07, 2022, arXiv: arXiv:2205.03707. doi: 10.48550/arXiv.2205.03707.

[2] T. Amtoft and A. Banerjee, A Theory of Slicing for Imperative Probabilistic Programs, ACM Trans. Program. Lang. Syst., vol. 42, no. 2, p. 6:1-6:71, Apr. 2020, doi: 10.1145/3372895.

[3] A Theory of Slicing for Imperative Probabilistic Programs

ACM Transactions on Programming Languages and Systems. Accessed: Nov. 15, 2024. [Online]. Available: https://dl.acm.org/doi/10.1145/3372895

[6] Soremekun, E., Kirschner, L., Bohme, M., & Zeller, A. (2021). Locating faults with program slicing: an empirical analysis. Empirical Software Engineering, 26, 1-45.

Verification for recursive programs with exception-like constructs

Operator Precedence Languages (OPL) are a subclass of context-free languages that were introduced long ago [1] and re-discovered in recent years [2]. It is the largest known class that enjoys the nice properties of regular languages, such as closure under Boolean operations and decidability for the emptiness problem; it is capable of modeling not only one-to-one matching, e.g., between function calls and returns, but also one-to-many or many-to-one matching. These make them a promising choice for the formal verification of procedural programs with exception-like constructs.

Model checking algorithms have been developed for OPL [3, 4], which verifies finite-state programs (in the sense that the domain of program variables is finite) modeled by OPL against temporal, context-free properties. The student will investigate the problem of verifying infinite-state programs modeled by OPL.

To verify an infinite-state program, the use of an abstraction is crucial, so that the program behaviors can be over-approximated in a finite way. Moreover, when the over-approximation is not precise enough, we want to be able to refine it by analyzing the proof of a spurious counterexample. Here are some initial questions to investigate:

What is the correct notion of proofs for these counterexamples? Syntactically, these counterexamples are just words accepted by an operator precedence automaton, so the answer is naturally tied to the semantics. There are many advanced programming constructs that can be modeled by OPL, including exceptions and continuation is there a recipe to obtain the correct notion of proofs for any construct that can be modeled by OPL?
How to obtain such proofs algorithmically? In particular, we want to ensure that each assertion only refers to variables that are in scope. By the same token, we are interested in a recipe to design such an algorithm for any construct that can be modeled by OPL.

The student may also explore alternative methods for verifying infinite-state programs modeled by OPL, possibly against simpler properties, such as pre/postconditions and safety.

An introduction to structured context-free languages, including OPL: https://doi.org/10.1016/j.cosrev.2017.12.001
Predicate abstraction and counterexample-guided abstraction refinement: https://doi.org/10.1007/978-3-319-10575-8_15

References

[1] R. W. Floyd, Syntactic Analysis and Operator Precedence, J. ACM, vol. 10, no. 3, pp. 316–333, Jul. 1963, doi: 10.1145/321172.321179.

[2] S. Crespi Reghizzi and D. Mandrioli, Operator precedence and the visibly pushdown property, Journal of Computer and System Sciences, vol. 78, no. 6, pp. 1837–1867, Nov. 2012, doi: 10.1016/j.jcss.2011.12.006.

[3] M. Chiari, D. Mandrioli, and M. Pradella, Model-Checking Structured Context-Free Languages, in Computer Aided Verification, A. Silva and K. R. M. Leino, Eds., Cham: Springer International Publishing, 2021, pp. 387–410. doi: 10.1007/978-3-030-81688-9_18.

[4] M. Chiari, L. Geatti, N. Gigante, and M. Pradella, SMT-Based Symbolic Model-Checking for Operator Precedence Languages, in Computer Aided Verification, A. Gurfinkel and V. Ganesh, Eds., Cham: Springer Nature Switzerland, 2024, pp. 387–408. doi: 10.1007/978-3-031-65627-9_19.

Differential Privacy via Quantitative Transformer Semantics

Program instructions may be interpreted as transformations of predicates. For instance, the instruction x := x + 1 transforms the predicate $P_1: x \ge 0 \land y \ge 0$ into $P_2: x - 1 \ge 0 \land y \ge 0$ since the x referred to in the original predicate is one less than the value of x after executing the instruction; we call $P_2$ the strongest post-condition of $P_1$ under instruction x := x + 1. This sort of transformer may be used to check if a program is correct with respect to a given pre- and post-condition: starting from a program’s pre-condition as our predicate, we can analyze an entire program by transforming each predicate in turn by each subsequent program instruction and determining if the final predicate implies the post-condition. The dual of the strongest post-condition is called the weakest pre-condition; using the weakest pre-condition transformer requires you to analyze the program in reverse order, starting from the post-condition.

There exist variants of these transformers that operate over quantitative domains rather than over predicates [1]. There are quantitative analogues of both strongest post-condition as well as weakest pre-condition. One notable quantitative weakest pre-condition transformer is the weakest pre-expectation which operates on expectations essentially, expected values of random variables over probabilistic programs. A recent paper [2] proposed a novel unifying framework for quantitative transformers via a transformer they call weakest hyper pre (whp) which can be instantiated for expectations and a broad swath of other quantitative properties.

The aim of this project is to investigate whether whp can be used to check if a program is privacy-preserving. In particular, we are interested in differential privacy [3], a widely-used formalization of statistical privacy; at a high level, differentially private algorithms should successfully obscure whether or not a participant’s private data was used to produce the output. We aim to determine whether whp is complete for differential privacy and, if not, which restrictions on programs or properties would make the technique complete. Finally, we wish to implement this technique in a tool that allows us to automatically verify a probabilistic program’s privacy and evaluate this tool against existing tools for checking differential privacy.

References

[1] B. Kaminski, Advanced Weakest Precondition Calculi for Probabilistic Programs, Ph.D. dissertation, RWTH Aachen University, Aachen, Germany, 2019. https://publications.rwth-aachen.de/record/755408/files/755408.pdf

[2] L. Zhang, N. Zilberstein, B. L. Kaminski, and A. Silva, Quantitative Weakest Hyper Pre: Unifying Correctness and Incorrectness Hyperproperties via Predicate Transformers, Proc. ACM Program. Lang., vol. 8, no. OOPSLA2, p. 300:817-300:845, Oct. 2024, doi: 10.1145/3689740.

[3] C. Dwork, A. Roth, The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci., 9, 211-407, 2014. https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

Theory of Concurrency

Trace Theory denotes a mathematical theory of free partially commutative monoids from the perspective of concurrent or parallel systems. Traces, or equivalently, elements in a free partially commutative monoid, are given by a sequence of letters (or atomic actions). Two sequences are assumed to be equal if they can be transformed into each other by equations of type ab = ba, where the pair (a,b) belongs to a predefined relation between letters. This relation is usually called partial commutation or independence.

The analysis of sequential programs describes a run of a program as a sequence of atomic actions. On an abstract level such a sequence is simply a string in a free monoid over some (finite) alphabet of letters. This purely abstract viewpoint embeds program analysis into a rich theory of combinatorics on words and a theory of automata and formal languages. Trace theory is used to lift this rich theory to analysis of concurrent programs. For a simple introductory survery, see this paper.

In this project, we want to aim for a new mathematical structure that accommodates combining reasoning about commutativity and reasoning about program abstractions all under a unified mathematical strucutre. Partially commutative monoids capture commutativity reasoning perfectly, but they do not accommodate reasoning for abstraction.

Knowing the thoeretical basics listed earlier in this page is essential to the success of a student getting involved in this project. Good knowledge of algebra, automata, and their connection can be additionally very helpful. A good test of your abilities to engage with this project can be how easily you can read throught (at least the first half of) this paper.