# Variable elimination and graph reduction: towards an efficient g-formula for causal DAGs

Consider a study where the causal structure is known and described by a directed acyclic graph (DAG). A causal quantity of interest, say a counterfactual mean, can often be expressed, via the g-formula (also known as the “truncated factorization”), as a functional of the observed-data distribution. The g-formula, which can be written down directly from the graph, usually takes the form of an integral involving conditional expectations of the variables in the graph.
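As a minimal illustration of the standard truncated factorization (our own example, not one taken from the procedure below), let $V$ be the variables of the DAG and intervene by setting a single treatment $A$ to $a$. Every factor for a non-intervened variable is kept, the factor for $A$ is dropped, and $a$ is substituted wherever $A$ appears among the parents:

$$
p\big(v \mid \mathrm{do}(A=a)\big) \;=\; \prod_{V_i \in V,\; V_i \neq A} p\big(v_i \mid \mathrm{pa}(v_i)\big)\Big|_{A=a}.
$$

For the hypothetical three-node DAG $L \to A \to Y$ with $L \to Y$, this gives $p(l, y \mid \mathrm{do}(A=a)) = p(l)\, p(y \mid a, l)$, and integrating out $y$ yields the familiar g-formula for the counterfactual mean:

$$
\mathbb{E}[Y(a)] \;=\; \int\!\!\int y \, p(y \mid a, l)\, p(l)\, dy\, dl \;=\; \int \mathbb{E}[Y \mid A=a, L=l]\, dP(l).
$$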

Naturally, to estimate the causal quantity efficiently, one can use a plug-in estimator of the g-formula, in which every conditional expectation is replaced by its maximum likelihood estimate (MLE). However, we find that, asymptotically, not every variable appearing in the g-formula carries information for estimation. In fact, the causal quantity can often be estimated with an “efficient” g-formula that drops the redundant variables, so that the cost of measuring them can be saved.
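To make the plug-in idea concrete, here is a minimal sketch for a hypothetical DAG $L \to A \to Y$ with $L \to Y$, assuming a discrete $L$. In that case the nonparametric MLE of $\mathbb{E}[Y \mid A=a, L=l]$ is the empirical cell mean and the MLE of $P(L=l)$ is the empirical frequency. The function name `plugin_gformula` and the simulated data-generating process are illustrative assumptions. The sketch shows only the ordinary plug-in estimator, not the variable-elimination procedure described next.

```python
import numpy as np


def plugin_gformula(L, A, Y, a):
    """Plug-in g-formula estimate of E[Y(a)] for the DAG L -> A -> Y, L -> Y.

    Hypothetical helper. Assumes L is discrete, so that
    E[Y | A=a, L=l] is estimated by the empirical cell mean and
    P(L=l) by the empirical frequency (both nonparametric MLEs).
    """
    L, A, Y = map(np.asarray, (L, A, Y))
    estimate = 0.0
    for l in np.unique(L):
        in_stratum = L == l
        cell = in_stratum & (A == a)
        if not cell.any():
            # The plug-in is undefined when a stratum has no treated units.
            raise ValueError(f"no units with A={a} in stratum L={l}")
        # MLE of E[Y | A=a, L=l] weighted by the MLE of P(L=l).
        estimate += Y[cell].mean() * in_stratum.mean()
    return float(estimate)


# Simulated example (assumed data-generating process): L confounds A and Y.
rng = np.random.default_rng(0)
n = 100_000
L = rng.integers(0, 2, size=n)               # L ~ Bernoulli(0.5)
A = rng.binomial(1, 0.2 + 0.6 * L)           # P(A=1 | L) = 0.2 or 0.8
Y = 1.0 + 2.0 * A + 3.0 * L + rng.normal(size=n)

# The true E[Y(1)] is 1 + 2 + 3 * 0.5 = 4.5; the naive mean of Y among
# A=1 units targets 1 + 2 + 3 * 0.8 = 5.4 instead, because of confounding by L.
print(plugin_gformula(L, A, Y, a=1))
print(Y[A == 1].mean())
```

Standardizing over the empirical distribution of $L$, rather than conditioning on $A=a$ alone, is what removes the confounding bias in this toy example.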

We present a graphical procedure towards this goal. First, we identify a set of graphical conditions that are necessary and sufficient for eliminating redundant variables. Second, we construct a reduced DAG on the non-redundant variables only, from which the “efficient” g-formula can be derived. The reduced DAG is obtained from the original DAG through a set of “moves” that may traverse both within and between Markov equivalence classes, yet nonetheless preserve the semiparametric efficiency bound for estimating the causal quantity.