Exciting research is being done by investigators in the University of Bristol’s MRC Integrative Epidemiology Unit (MRC IEU). For instance, is there a causal relationship between maternal BMI and a baby's birth weight? An international team, including Professor Debbie Lawlor (of the MRC IEU), examined this with Mendelian randomization (MR). The key to understanding MR is knowing that a genetic risk score (here, a score of BMI-associated variants, or "genetic BMI") that’s correlated with a trait of interest (maternal BMI) can be used to reduce the likelihood of confounding and strengthen causal inference. A figure from their paper nicely describes how MR works. See their explanation:
Still confused? The difficulty is that MR is often explained with another term that few people, if they know it, seem readily able to explain (they'd have to look it up to talk fluently about it). And this is the term instrumental variable. MR is the use of genetic variants as instrumental variables [2,3]. In the caption beneath the figure above, the authors write: "If a maternal trait causally influences offspring birth weight, then a risk score of genetic variants associated with that trait will also be associated with birth weight." Here the maternal risk score of genetic variants associated with BMI is the instrumental variable for maternal BMI.
So what is an instrumental variable? And wait, I thought we couldn't talk causality with observational (non-experimental) designs, but the authors use the word causally. You might be thinking that, right? Well, this brings up another obscure term: confounding. To many of you (unless you are a statistician or a health or social scientist), I'll be talking gibberish if I start bandying about the words confounding and confounders without saying what I mean. So what's a confounder? A confounder is something (a variable) that causes a spurious association. It does this by being associated with both the exposure (independent variable) and the outcome (dependent variable) while not being in the pathway between the exposure and the outcome (it doesn't mediate the association). Confounding makes causal inference in observational studies difficult. In the figure, socioeconomic factors are potential confounders of the association between maternal BMI and offspring birth weight. You can see why, right? Socioeconomic status (SES) is linked to BMI (being poor is associated with having a higher BMI), and SES can also affect birth weight. Therefore, at least some of the association between maternal BMI and offspring birth weight is likely explained by SES.
Instrumental variables (long used in economics) are a way around this. An instrumental variable is one that is associated (correlated) with the exposure (here maternal BMI) but not correlated with confounders of the exposure/outcome relationship and not related to the outcome (offspring birth weight) except through its effect on the exposure. In other words, the instrument can be used as a proxy for the exposure (thus the metaphor of an instrument: a device or tool used to facilitate) to improve causal inference, because the instrument is free from the confounding that mucked up the association between the exposure and the outcome. However, do you see "fetal genotype" in the figure? The authors point out that there's a problem with choosing a maternal genetic risk score as an instrument: the baby inherits half of its genes from mom. This means the maternal genetic risk score could affect the baby's birth weight directly, through the genes mom passes to the baby, rather than through mom's BMI, which would break the requirement that the instrument act on the outcome only through the exposure. To account for this, the authors create a BMI genetic risk score for the fetus and include it in their statistical model, which adjusts for this pathway.
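To make the instrumental-variable idea concrete, here is a minimal simulation sketch. It is not the authors' analysis: every number in it (the effect sizes, the sample size, the true causal effect of 0.5) is made up for illustration, and the variable names are hypothetical. It uses a single simulated genetic score as the instrument and the simple ratio estimator (covariance of instrument and outcome divided by covariance of instrument and exposure), and it leaves out the fetal genotype issue entirely.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=0.5, seed=1):
    """Simulate a confounded exposure/outcome pair plus a valid instrument.

    All coefficients are illustrative assumptions, not estimates from any study.
    Returns (naive_slope, iv_estimate).
    """
    rng = random.Random(seed)
    score, exposure, outcome = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)  # genetic risk score (the instrument)
        u = rng.gauss(0, 1)  # unmeasured confounder (think: SES)
        # Exposure depends on the instrument and on the confounder.
        x = 0.5 * g + 1.0 * u + rng.gauss(0, 1)
        # Outcome depends on the exposure and on the confounder,
        # but on the instrument only through the exposure.
        y = true_effect * x + 1.0 * u + rng.gauss(0, 1)
        score.append(g)
        exposure.append(x)
        outcome.append(y)

    # Naive regression slope of outcome on exposure: biased by the confounder.
    naive_slope = cov(exposure, outcome) / cov(exposure, exposure)
    # Ratio (instrumental-variable) estimate: uses only the variation in the
    # exposure that is driven by the instrument.
    iv_estimate = cov(score, outcome) / cov(score, exposure)
    return naive_slope, iv_estimate


if __name__ == "__main__":
    naive, iv = simulate()
    print(f"naive slope: {naive:.2f}")
    print(f"IV estimate: {iv:.2f}")
```

In this toy setup the naive slope overshoots the true effect because the confounder pushes the exposure and the outcome in the same direction, while the ratio estimate should land close to 0.5. That only works because the simulated score was built to satisfy the instrument conditions described above; with a real genetic score you'd have to argue for them.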
But there's something I haven't yet explained, and that's why genetic variants are attractive candidates as instrumental variables. The use of genetic variants as instrumental variables is like a natural experiment, where the laws of nature, in this case Mendel's laws of segregation and independent assortment, randomly assign a value to the exposure (the exposure is the genotype, and the level is the genetic variant). In this sense, the assignment is random, as it is in experimental designs where investigators randomly allocate participants to a level of the treatment. Two benefits arise from this:
But if you are an epidemiologist, you may be wondering about the validity of a given MR analysis, that is, the extent to which a proposed genetic instrument adheres to the assumptions of MR (and you'd likely know there are assumptions). For MR to work, the following assumptions must be met to minimize bias:
So MR can't be deployed carelessly. That said, when its assumptions hold, MR facilitates causal inference and helps us identify modifiable factors. For example, if we have strong reason to believe that a mother's BMI influences offspring birth weight, this knowledge should motivate prevention efforts to modify lifestyle factors likely to influence BMI in premenopausal women.
I'm a Public Health Genetics PhD student at the University of Washington and a molecular epidemiology research fellow at the Fred Hutchinson Cancer Research Center. I post (mostly) about topics in epidemiology and genetics.