Master's Thesis: Counterfactual Policy Introspection using Structural Causal Models


Inspired by a growing interest in applying reinforcement learning (RL) to healthcare, we introduce a procedure for performing qualitative introspection and ‘debugging’ of models and policies. In particular, we make use of counterfactual trajectories, which describe a model's implicit belief about ‘what would have happened’ if a policy had been applied. These serve to decompose model-based estimates of reward into specific claims about specific trajectories, a useful tool for ‘debugging’ models and policies, especially when side information is available for domain experts to review alongside the counterfactual claims. More specifically, we give a general procedure (using structural causal models) to generate counterfactuals based on an existing model of the environment, including common models used in model-based RL. We apply our procedure to a pair of synthetic applications to build intuition, and conclude with an application to real healthcare data, introspecting a policy for sepsis management learned in the recently published work of Komorowski et al. (2018).

Michael Oberst
PhD Student

Michael’s research interests include developing learning algorithms for dealing with non-stationarity / dataset shift in predictive modelling, as well as robust learning of treatment policies from observational data.