Image of Andrew Li

Andrew Li
PhD Candidate, Computer Science
University of Toronto

Google Scholar


I am a first-year PhD student in Computer Science at the University of Toronto and the Vector Institute for Artificial Intelligence, supervised by Sheila McIlraith. My research interests lie at the intersection of Machine Learning (particularly Reinforcement Learning), AI Planning, and Knowledge Representation & Reasoning. I aim to develop AI which learns over a long lifetime by acquiring knowledge from its interactions with the world, abstracting knowledge into generalizable concepts, and reasoning at a high-level to robustly handle new situations.

As an undergrad at the University of Waterloo, I researched combinatorial optimization algorithms for learning Bayesian networks under structural constraints.


  • Candidate for Ph.D. in Computer Science, 2021 - present
    University of Toronto
  • M.Sc. in Computer Science, 2019 - 2021
    University of Toronto
  • B.Math. in Computer Science, Combinatorics & Optimization (Double Major), 2015 - 2019
    University of Waterloo

Selected Publications

  • LTL2Action: Generalizing LTL Instructions for Multi-Task RL   (ICML 2021)

    by Pashootan Vaezipoor*, Andrew Li*, Rodrigo Toro Icarte, Sheila McIlraith
    PaperTalkPosterBlog PostCode

    Abstract: We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments. Instructions are expressed in a well-known formal language – linear temporal logic (LTL) – and can specify a diversity of complex, temporally extended behaviours, including conditionals and alternative realizations. Our proposed learning approach exploits the compositional syntax and the semantics of LTL, enabling our RL agent to learn task-conditioned policies that generalize to new instructions, not observed during training. To reduce the overhead of learning LTL semantics, we introduce an environment-agnostic LTL pretraining scheme which improves sample-efficiency in downstream environments. Experiments on discrete and continuous domains target combinatorial task sets of up to ∼1039 unique tasks and demonstrate the strength of our approach in learning to solve (unseen) tasks, given LTL instructions.

  • Interpretable Sequence Classification via Discrete Optimization   (AAAI 2021)

    by Maayan Shvo, Andrew Li, Rodrigo Toro Icarte, Sheila McIlraith

    Abstract: Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neural networks, and in particular LSTMs, our classifiers take the form of finite state automata and are learned via discrete optimization. Our automata-based classifiers are interpretable—supporting explanation, counterfactual reasoning, and human-in-the-loop modification—and have strong empirical performance. Experiments over a suite of goal recognition and behaviour classification datasets show our learned automata-based classifiers to have comparable test performance to LSTM-based classifiers, with the added advantage of being interpretable.

  • Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior   (KR2ML 2020)

    by Phillip Christofferson, Andrew Li, Rodrigo Toro Icarte, Sheila McIlraith

    Abstract: Many real-world reinforcement learning (RL) problems necessitate learning complex, temporally extended behavior that may only receive reward signal when the behavior is completed. If the reward-worthy behavior is known, it can be specified in terms of a non-Markovian reward function—a function that depends on aspects of the state-action history, rather than just the current state and action. Such reward functions yield sparse rewards, necessitating an inordinate number of experiences to find a policy that captures the reward-worthy pattern of behavior. Recent work has leveraged Knowledge Representation (KR) to provide a symbolic abstraction of aspects of the state that summarize reward-relevant properties of the state-action history and support learning a Markovian decomposition of the problem in terms of an automaton over the KR. Providing such a decomposition has been shown to vastly improve learning rates, especially when coupled with algorithms that exploit automaton structure. Nevertheless, such techniques rely on a priori knowledge of the KR. In this work, we explore how to automatically discover useful state abstractions that support learning automata over the state-action history. The result is an end-to-end algorithm that can learn optimal policies with significantly fewer environment samples than state-of-the-art RL on simple non-Markovian domains.

  • Bayesian Network Structure Learning with Side Constraints   (PGM 2018)

    by Andrew Li, Peter van Beek

    Abstract: Hybrid methods for Bayesian network structure learning that incorporate both observed data and expert knowledge have proven to be important in many fields. Previous studies have presented both exact and approximate hybrid methods for structure learning. In this paper, we propose an approximate method based on local search that is capable of efficiently handling a variety of prior knowledge constraints, including an important class of non-decomposable ancestral constraints that assert indirect causation between random variables. In our experiments, our proposed approximate method is able to significantly outperform an existing state-of-the-art approximate method in finding feasible solutions when hard constraints are imposed. Our approach is able to find nearoptimal networks while scaling up to almost fifty random variables. In contrast, previous exact methods are unable to handle more than twenty random variables. Furthermore, we show that when prior knowledge is integrated, we are often able to produce a network much closer to the ground truth network, particularly when the amount of data is limited.