Disclaimer — This post assumes you are familiar with the language of causal inference, and have no fundamental ontological objections to using DAGs to describe causal models and causal relationships.
If you work in empirical/quantitative fields, chances are that you find yourself struggling with writing down causal inference problems on a daily basis. And if you also employ Directed Acyclic Graphs (in short, DAGs) in your causal adventures (for clarity, identification, debiasing, or any other reason), then you can probably recognize that DAGs may suffer from a bit of a “cold start” problem.
In fact, it can be difficult to draw a minimal DAG right away when all you have is a causal question or a list of variables. At least, I have always struggled with this causal-inference version of writer's block.
While working as a teaching assistant for a Causal Inference class, I shared some tips and tricks with students who were having trouble plotting and connecting a large number of given variables, all relating to the same causal question. Then I thought I could share these with a wider, possibly interested (and equally struggling) audience.
So, here are some practical suggestions that can help you get started with your minimal DAG, and get one step closer to the identification and estimation of your causal effect of interest.
Defining the causal question already gives you a sense of the quantities involved in the causal effect. To get from the causal question to a minimal DAG, you will need to define at least three key variables:
An agent, unit, or individual which experiences a certain state change.
If the unit is on an aggregate scale (e.g. average sales, average price change), it may help to re-define the unit on an individual scale (e.g. instead of average sales, sales level for one single customer), at least when writing the first version of the DAG.
An outcome variable.
A state, intervention, or treatment variable, which, when changed, is believed to be inducing a change in the outcome for the unit.
Pearl (2009) also recommends thinking about other quantities when you begin to define your causal model. These quantities are called causal parents (PA), and are all the relevant and immediate (observable) causes of the outcome variable. A causal parent PA must be included in the list of variables if it also affects other variables modeled in the system. If you exclude such variables, there will be unobserved disturbances that influence several variables simultaneously, and this will cause many subsequent assumptions to be violated.
Once you have the list of (1) who/what is the unit of analysis, (2) the outcome, (3) the treatment, (4) the immediate causal parents of the outcome, you can start drawing the DAG in a few steps:
Write down the outcome variable.
Write down the treatment variable and link it to the outcome variable.
Write down the causal parents of the outcome: the immediate and relevant causes of the outcome. Link them to the outcome using arrows.
Make sure that the causal parents of the outcome that affect more than one variable (i.e. more than just the outcome) in the system are explicitly defined, and their links with the other variables are explicitly defined.
(From Pearl, 2009) Think about and list the causal parents of the causal parents of the outcome variable (i.e. the causal grandparents).
If you think they are not directly relevant for your causal question, you can keep them in a separate, more extensive list, or in a separate, more extensive DAG for future reference.
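To make the steps above concrete, here is a minimal sketch in plain Python. The causal question (does a discount change a customer's sales?) and every variable name in it (`discount`, `sales`, `season`, `income`) are hypothetical and chosen only for illustration. The helpers `is_acyclic` and `parents` are illustrative names too, not part of any library. The DAG is stored as a set of (cause, effect) pairs, and the acyclicity check is a simple version of Kahn's algorithm.

```python
# Hypothetical example: does a discount (treatment) change one customer's
# sales (outcome)? All variable names are assumptions for illustration.
outcome = "sales"
treatment = "discount"
# Immediate, relevant causes of the outcome (its causal parents).
parents_of_outcome = ["season", "income"]

edges = set()                        # each edge is a (cause, effect) pair
edges.add((treatment, outcome))      # link the treatment to the outcome
for parent in parents_of_outcome:    # link each causal parent to the outcome
    edges.add((parent, outcome))
# A causal parent that affects more than the outcome gets its extra arrow too
# (assumed here: the season also influences whether a discount is offered).
edges.add(("season", treatment))


def is_acyclic(edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    for _, effect in edges:
        indegree[effect] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for cause, effect in edges:
            if cause == node:
                indegree[effect] -= 1
                if indegree[effect] == 0:
                    ready.append(effect)
    # Every node is visited only when no cycle blocks the ordering.
    return visited == len(nodes)


def parents(node, edges):
    """Return the sorted direct causes of `node`."""
    return sorted(cause for cause, effect in edges if effect == node)


print("Is a DAG:", is_acyclic(edges))
print("Parents of the outcome:", parents(outcome, edges))
print("Parents of the treatment:", parents(treatment, edges))
```

Keeping the graph as a bare set of pairs makes it easy to add or drop an arrow while you iterate on the drawing; a dedicated graph library would work just as well.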
These steps will give you the first, minimal DAG representing the relevant causal question. Afterwards, you may want to check a couple more things:
The missing arrows: missing arrows in the DAG represent your assumptions. For every pair of variables that is not connected by an arrow, you are implicitly assuming that neither variable directly causes the other. Are these assumptions plausible and reasonable? Are they justified by previous research or existing theories?
The unobservable disturbances or errors: so far, the DAG only contains observable quantities. Each quantity also has an unobserved component. For example, variable X has an unobserved component Ux, represented as: Ux → X
How is each variable's unobserved component connected to the variables in the DAG? Is any variable's U connected to any other observable variable? For example, if my DAG contains X, Y, and Z, it can happen that, according to my intuition, experience, or reference theories, Ux affects both X and Z:
X ← Ux → Z
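Both checks can be written down mechanically. The sketch below assumes a small observed DAG over X, Y, and Z; the arrows X → Y and Z → Y are invented for illustration, as are the helper names `missing_arrows` and `shared_disturbances`. It first lists every pair of observed variables with no arrow between them, then attaches one disturbance to each variable, adds the suspected Ux → Z arrow, and reports any disturbance that reaches more than one observed variable.

```python
from itertools import combinations

# Hypothetical observed DAG (these arrows are assumptions for illustration).
observed_edges = {("X", "Y"), ("Z", "Y")}
nodes = sorted({node for edge in observed_edges for node in edge})


def missing_arrows(nodes, edges):
    """Return pairs of variables with no arrow in either direction."""
    return [
        (a, b)
        for a, b in combinations(nodes, 2)
        if (a, b) not in edges and (b, a) not in edges
    ]


# By default, each observed variable gets its own disturbance: Ux -> X, etc.
disturbance_edges = {("U" + node.lower(), node) for node in nodes}
# Suspicion from intuition, experience, or theory: Ux also affects Z.
disturbance_edges.add(("Ux", "Z"))


def shared_disturbances(disturbance_edges):
    """Return disturbances that point at more than one observed variable."""
    children = {}
    for disturbance, variable in disturbance_edges:
        children.setdefault(disturbance, set()).add(variable)
    return {
        disturbance: sorted(variables)
        for disturbance, variables in children.items()
        if len(variables) > 1
    }


print("Missing arrows to justify:", missing_arrows(nodes, observed_edges))
print("Shared disturbances:", shared_disturbances(disturbance_edges))
```

In this toy graph the only missing arrow is the one between X and Z, and the shared disturbance Ux connects exactly that pair, so the two variables would still be associated even though neither causes the other.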
Finally, you can initially reason about the treatment assignment mechanisms (although they would deserve an entirely separate article), by checking these quantities in the DAG:
Is there any observed variable directly causing my treatment variable?
Is there any unobserved variable affecting both my treatment and the outcome? E.g. if my treatment is T, the outcome is Y, I may have a situation like:
Y ← Ut → T
Intuitively, this is a violation of the ignorability assumption.
Is there any unobserved variable affecting both the causes of my treatment and the outcome? E.g. if my treatment is T, the outcome is Y, the cause of my treatment is Z, I may have a situation like:
Y ← Uz → Z
Likewise, this is a violation of the excludability assumption.
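The three questions above can be asked of a DAG programmatically. The following is a rough sketch under stated assumptions: the graph combines the T, Y, Z, Ut, and Uz examples from the text into one hypothetical DAG, the helper names are invented for illustration, and only direct parents are inspected, so longer confounding paths would need a fuller tool such as the backdoor criterion.

```python
# Hypothetical DAG combining the examples above: Z causes T, T causes Y,
# Ut affects both T and Y, and Uz affects both Z and Y.
edges = {
    ("T", "Y"),
    ("Z", "T"),
    ("Ut", "T"), ("Ut", "Y"),
    ("Uz", "Z"), ("Uz", "Y"),
}
unobserved = {"Ut", "Uz"}


def parents(node, edges):
    """Return the set of direct causes of `node`."""
    return {cause for cause, effect in edges if effect == node}


def observed_causes(node, edges, unobserved):
    """Observed variables that directly cause `node`."""
    return sorted(parents(node, edges) - unobserved)


def unobserved_common_causes(a, b, edges, unobserved):
    """Unobserved variables that directly cause both `a` and `b`."""
    return sorted(parents(a, edges) & parents(b, edges) & unobserved)


# Question 1: is there any observed variable directly causing the treatment?
treatment_causes = observed_causes("T", edges, unobserved)
print("Observed causes of T:", treatment_causes)

# Question 2: does an unobserved variable affect both treatment and outcome?
print("Shared by T and Y:", unobserved_common_causes("T", "Y", edges, unobserved))

# Question 3: does an unobserved variable affect both a cause of the
# treatment and the outcome?
for cause in treatment_causes:
    shared = unobserved_common_causes(cause, "Y", edges, unobserved)
    print(f"Shared by {cause} and Y:", shared)
```

A non-empty answer to the second or third question flags an arrow pattern worth revisiting before moving on to identification.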
You can iterate these steps over different versions and updates of the DAG. Eventually, when you’re happy with the minimal version, you can move on to more sophisticated graphical diagnostic tools for your model, like the d-separation criterion or the backdoor criterion. But I’ll leave those fun things for the next causally oriented post.
Happy minimal-DAG drawing!
Translated from: https://medium.com/@martina.pocchiari/creating-minimal-dags-step-by-step-d604cb05e59a