The Linearized Activation Function TRick (LAFTR) enabled an efficient parametric approximation of function space distance (FSD) for linear networks with ReLU activations. We extend this to a more general LInearized Function TRick (LIFTR) that enables data-free FSD estimation for arbitrary architectures, with a particular focus on transformers. On a modular arithmetic continual learning task, a stochastic variant of LIFTR approaches oracle performance while outperforming parameter-space linearization baselines.
We generalize existing worst-case frameworks to estimate the sensitivity of causal estimates to violations of three common assumptions in causal inference. Empirically, worst-case conclusions about sensitivity can rely on unrealistic changes in the data-generating process. To overcome this limitation, we introduce a new criterion, the Bayesian Sensitivity Value (BSV), which computes the expected sensitivity of an estimate to assumption violations under priors constructed from real-world evidence.
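The core idea of replacing a worst-case bound by an expectation under a prior can be sketched in a few lines. The following is an illustrative Monte Carlo sketch, not the paper's actual procedure: `estimate_fn` is a hypothetical function mapping a violation strength to the resulting causal estimate, and the half-normal prior in the usage example stands in for the evidence-based priors the abstract describes.

```python
import numpy as np

def bayesian_sensitivity_value(estimate_fn, prior_sampler, n_draws=10_000, seed=0):
    """Monte Carlo estimate of expected sensitivity: the average shift in the
    causal estimate when the assumption-violation parameter is drawn from a
    prior, rather than taking the worst case over all admissible violations."""
    rng = np.random.default_rng(seed)
    baseline = estimate_fn(0.0)  # estimate under no assumption violation
    gammas = prior_sampler(rng, n_draws)
    shifts = np.abs(np.array([estimate_fn(g) for g in gammas]) - baseline)
    return shifts.mean()

# Toy usage: estimate shifts linearly with violation strength gamma,
# and gamma is given a half-normal prior (both purely illustrative).
est = lambda gamma: 2.0 + 0.5 * gamma
half_normal = lambda rng, n: np.abs(rng.normal(0.0, 1.0, size=n))
bsv = bayesian_sensitivity_value(est, half_normal, n_draws=50_000)
```

A worst-case analysis over the same prior's support would report an unbounded (or support-maximal) shift; the expectation instead weights each violation by how plausible the prior considers it.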
We introduce NATURAL, a novel family of causal effect estimators built with LLMs that operate over datasets of unstructured text. Our estimators use LLM conditional distributions (over variables of interest, given the text data) to assist in the computation of classical estimators of causal effect. NATURAL estimators demonstrate remarkable performance, yielding causal effect estimates that fall within 3 percentage points of their ground truth counterparts, including on real-world Phase 3/4 clinical trials. Our results suggest that unstructured text data is a rich source of causal effect information, and NATURAL is a first step towards an automated pipeline to tap this resource.
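To make the idea concrete, here is a minimal sketch of how an LLM's conditional distribution could feed a classical estimator. The abstract does not specify which estimators NATURAL uses; this example plugs hypothetical LLM-derived probabilities p(treated | text) into a standard inverse-propensity-weighted (IPW) estimate of the average treatment effect. All names and numbers are illustrative.

```python
import numpy as np

def ipw_ate(treatment, outcome, propensity):
    """Inverse-propensity-weighted average treatment effect.
    `propensity` would, in a NATURAL-style pipeline, come from an LLM's
    conditional distribution over treatment status given the raw text;
    here it is just an array of probabilities supplied by the caller."""
    t = np.asarray(treatment, dtype=float)
    y = np.asarray(outcome, dtype=float)
    # Clip to avoid exploding weights from near-0/1 propensities.
    e = np.clip(np.asarray(propensity, dtype=float), 1e-3, 1 - 1e-3)
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

# Toy usage with hand-written "LLM" propensities:
ate = ipw_ate(treatment=[1, 0, 1, 0], outcome=[1, 0, 1, 0],
              propensity=[0.5, 0.5, 0.5, 0.5])
```

The LLM's role in this sketch is confined to producing the `propensity` array; the downstream estimator is entirely classical, which matches the abstract's framing of LLM conditionals "assisting" standard causal estimators.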
We consider the problem of approximating the function space distance (FSD) over the training set, i.e. the average distance between the outputs of two ReLU neural networks, by approximating the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our parametric approximation is competitive with state-of-the-art nonparametric approximations that have larger memory requirements, when applied to continual learning and influence function estimation.