This post emerged from a 24-hour Twitter debate with Stuart Buck regarding the (de)merits of a randomized controlled trial (RCT) to test whether paying former criminals reduces how often they murder or are murdered.
Background: Richmond (known for its high murder rate) has initiated a policy of paying former criminals not to murder, and its homicide rate has declined, with the majority of former inmates in the program remaining alive and not killing anyone else. People are skeptical that paying criminals is worth the cost and are calling for an RCT, aware that before-and-after analyses (i.e., the murder rate in Richmond before and after initiation of the payment policy) are imperfect and that an RCT provides the gold standard for causal inference. The push for an RCT is motivated by economics and by the belief that payment actually makes no difference to murder outcomes.
But dismissing the knowledge that payment appears to have kept people alive is problematic. It fails to take into account that, to date, the Richmond experience suggests that initiating payment is protective, at least temporarily. Given this, how could randomization into the non-payment group be justified? When the outcome of a trial is murder, how is withholding something that might prevent murder even defensible? Further, supposing an RCT went forward, the informed consent process would necessitate full disclosure of the risks, which would likely alter or bias participant behavior, calling into question the validity and usefulness of the trial itself.
Though an RCT could complement and inform government policy, does that justify it? As soon as we leave practice and enter the research enterprise, we are held to a different (and higher) standard, one that places the interests of the particular humans enrolled first. The interests of the State are no longer on the table. When we take our sights off individuals, we risk violating trust, and we fail as scientists, even if the research is statistically meaningful and economically useful.
Invoking the Declaration of Helsinki, "while the primary purpose of medical research is to generate new knowledge, this goal can never take precedence over the rights and interests of individual research subjects."
Nothing, not even the benefits to the State gained from causal inference, can impinge on the rights and interests of individual humans in the context of research. As soon as we conceive of former criminals as research participants, we are bound by the principles of beneficence and respect and have a duty not to randomize, given the knowledge that payment seems to be protective. And this holds until the state of knowledge from other natural (policy) experiments, such as the one in Richmond, changes. Thus, given what we know now, the proposal for an RCT of payment on murder should be denied.
Vernot and colleagues recently mined the genomes of Europeans and East Asians, scouting for sequences from Neandertals and Denisovans (two archaic hominin lineages with whom ancient humans interbred). They found an absence of archaic hominin sequence in some regions of the genome. These 'deserts', as they call them, may reflect loci where Neandertal and Denisovan DNA was deleterious and purged. Remarkably, one of the deserts contains FOXP2, a gene involved in speech and language (I refer readers to Geoffrey Pullum's Language Log write-up for how people have thought about FOXP2 and language, as well as to Steven Pinker's elegant discussion of the topic).
The suggestion that FOXP2 has not evolved neutrally is not new [3,4]. In 2002, Enard and colleagues in Svante Pääbo's lab compared FOXP2 in humans and chimps, speculating that the human-specific signatures in FOXP2 arose relatively recently, about 200,000 years ago. Later, these human-specific variants were thought to have arisen somewhat earlier, around 300,000 to 400,000 years ago, before the most recent common ancestor of humans and Neandertals. Why? Because, while humans and chimps have different FOXP2 sequences, it was discovered that humans and Neandertals share the same evolutionary changes in FOXP2.
Since humans and Neandertals have comparable FOXP2 sequences, it is interesting that FOXP2 falls in a large desert where no archaic hominin introgression was observed. Why is that? The authors state that their findings are inconsistent with neutral evolution (implying some kind of regional target for selection?) but that the mechanism explaining the deserts is uncertain. Structural variation (alterations involving segments longer than 1 kb) is one possibility, as large regions of the genome can be substrates for natural selection.
Update (and parenthetical)
A reader let me know that the above synopsis was too technical. He attributed this to his not having the background to understand it. But I believe the issue is mine, not his. At the risk of introducing (or exposing) errors in my own thinking, here's my attempt to say what I wrote above in plain English.
Ancient humans had sex with their now-extinct cousins (Neandertals and Denisovans). By comparing our DNA with theirs, we can get clues about what makes us unique. There are some long stretches of our DNA that don't have much of our extinct cousins' DNA mixed in. One of these regions contains a gene called FOXP2, which was the first gene discovered (years ago) to influence speech development. Language is a feature unique to humans. Interestingly, humans and Neandertals have the same DNA for FOXP2, which implies that Neandertals may have been equipped, like us, to make fine movements with their mouths. This is in contrast with chimpanzees: chimps have a different DNA signature at FOXP2 than we do, and they don't speak. What's super interesting about the recent research is that these long regions without Neandertal DNA mixed in may give us clues to our evolution. For some reason, FOXP2 sits in a region where keeping our own DNA may have been protective for our ancestors' survival.
An international team led by researchers at the University of Washington made a splash when their newest article, revealing the sexual escapades of our hominin ancestors, made it into the popular press. See their figure below displaying how ancestral humans interbred.
This led to a flurry of activity on Twitter with speculations about the frequency of matings and how often mating resulted in viable offspring. I found the tweets by Matthew Herper and Carl Zimmer, two science journalists, to be especially interesting.
In thinking about Carl Zimmer's comment that the offspring of human-Neanderthal matings were not mules, several thoughts come to mind. First, what's a mule? Answer: a mule is the product of interbreeding between two distinct species, horses and donkeys, which have different numbers of chromosomes. As a result, mules are sterile. So to say that the offspring of human-Neanderthal (or human-Denisovan) matings weren't mules implies that they weren't sterile. Indeed, humans, Neanderthals, and Denisovans have the same number of chromosomes, making those hybrid offspring unlike mules. But more than this, Neanderthal and Denisovan DNA survives in us. For this reason, we can infer that, minimally, some of the offspring of the ancient interbreeders were fertile.
Second (and more tangential) thought: what is our closest relative with a different number of chromosomes? (Perhaps this is something you learned in junior high or high school biology, but I missed this piece of not-so-trivial trivia.) Well, if you already know that humans and chimps share most of their DNA, then chimps are a good guess. As it turns out, chimps have 48 chromosomes. We have 46. So how can we have a different number of chromosomes yet share so much DNA? Part of the answer is that our chromosome 2 is actually a fusion of two more ancient chromosomes that remain separate in the other members of the Hominidae family (humans, Neanderthals, and Denisovans being the exceptions). The evidence for this is that human chromosome 2 contains a vestigial centromere as well as vestigial telomere sequences at the fusion point, the remains of two ancestral chromosomes joining end to end to form the single human chromosome 2 [2,3]. The genes on our chromosome 2 and the corresponding ones on the two chimp chromosomes match up.
The set of anatomically modern humans who left Africa and interbred with Neanderthals ~50,000 years ago left a genomic legacy: 1.5-4% of the DNA within today's Eurasians comes from Neanderthals [1,2]. Simonti and colleagues recently aimed to study the extent to which this Neanderthal heritage impacts risk of disease. To do so, they identified Neanderthal alleles in the genotyping data available within the Electronic Medical Records and Genomics (eMERGE) cohort, which contains both genotyping data and phenotype data for health outcomes from electronic medical records. They then analyzed the set of Neanderthal-introgressed alleles in a genome-phenotype association of 28,416 adults of European ancestry. Surprisingly, they found an enrichment of circadian genes among the set of alleles that explain an observed risk for depression. Depression is a condition that's been linked to light, and the persistence of Neanderthal circadian-related alleles may reflect a tolerance to light exposure at higher latitudes. So what does this imply?
Given the widespread use of artificial lighting at night and the lack of strong daytime light exposure that accompanies indoor work, exposure to light today differs substantially from what it was for Neanderthals. This statement is equally true for those without Neanderthal ancestry. What we don't know is the extent to which circadian disruption from chronically aberrant light exposure at the wrong times varies with variation in circadian genes, which may have been selected for under the different light regimes at different latitudes.
Exciting research is being done by investigators in the University of Bristol’s MRC Integrative Epidemiology Unit (MRC IEU). For instance, is there a causal relationship between maternal BMI and baby's birthweight? An international team, including professor Debbie Lawlor (of the MRC IEU), examined this with Mendelian randomization (MR). Key to understanding this is knowing that a genetic risk score (genetic BMI) that’s correlated with a trait of interest (maternal BMI) can be used to reduce the likelihood of confounding and strengthen causal inference. A figure from their paper nicely describes how MR works. See their explanation:
Still confused? The difficulty is that MR is often explained with another term that few people, if they know it, seem readily able to explain (they'd have to look it up to talk fluently about it). And this is the term instrumental variable. MR is the use of genetic variants as instrumental variables [2,3]. In the caption beneath the figure above, the authors write: "If a maternal trait causally influences offspring birth weight, then a risk score of genetic variants associated with that trait will also be associated with birth weight." Here the maternal risk score of genetic variants associated with BMI is the instrumental variable for maternal BMI.
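To make the instrument concrete: a genetic risk score of this sort is typically a weighted sum of risk-allele counts across trait-associated variants. Here is a minimal sketch; the variants, weights, and genotypes are invented for illustration, not taken from the paper:

```python
import numpy as np

def genetic_risk_score(genotypes, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    genotypes: array of shape (n_people, n_variants)
    weights:   per-variant effect sizes, e.g. from a BMI GWAS
    """
    return genotypes @ weights

# Hypothetical data: 4 people genotyped at 3 BMI-associated variants
genotypes = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 0, 0],
])
weights = np.array([0.10, 0.05, 0.20])  # illustrative effect sizes

print(genetic_risk_score(genotypes, weights))
```

Each person's score summarizes their genetic predisposition to higher BMI in a single number, and it is this score (not BMI itself) that serves as the instrumental variable.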
So what is an instrumental variable? And wait, I thought we couldn't talk causality with observational (non-experimental) designs, but the authors use the word causally. You might be thinking that, right? Well, this brings up another obscure term: confounding. To many of you (unless you are a statistician, health or social scientist), I'll be talking gibberish if I start bandying about the words confounding and confounders without saying what I mean. So what's a confounder? A confounder is something (a variable) that causes a spurious association. It does this by being associated with both the exposure (independent variable) and the outcome (dependent variable) while not being in the pathway between the exposure and the outcome (it doesn't mediate the association). Confounding makes causal inference in observational studies difficult. In the figure, socioeconomic factors are potential confounders of the association between maternal BMI and offspring birth weight. You can see why, right? Socioeconomic status (SES) is linked to BMI (being poor is associated with having a higher BMI), and socioeconomic factors can impact birth weight. Therefore, at least some of the association between maternal BMI and offspring birth weight is likely explained by SES.
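Confounding is easy to see in a quick simulation. In this toy setup (all numbers invented for illustration), SES drives both maternal BMI and birth weight, and the true causal effect of BMI on birth weight is set to exactly zero, yet a naive regression still finds an association:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: SES (the confounder) affects
# both the exposure and the outcome; the true causal effect of BMI on
# birth weight is zero by construction.
ses = rng.normal(size=n)
bmi = -0.5 * ses + rng.normal(size=n)            # lower SES -> higher BMI
birth_weight = 0.0 * bmi + 0.4 * ses + rng.normal(size=n)

# Naive regression slope of birth weight on BMI, ignoring SES:
naive_slope = np.cov(bmi, birth_weight)[0, 1] / np.var(bmi)
print(naive_slope)  # clearly negative, despite a true effect of zero
```

The spurious negative slope is exactly the kind of artifact that adjusting for (or designing around) the confounder is meant to remove.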
Instrumental variables (long used in economics) are a way around this. An instrumental variable is one that is associated (correlated) with the exposure (here maternal BMI) but not correlated with confounders of the exposure/outcome relationship and not related to the outcome (offspring birth weight) except through its effect on the exposure. In other words, the instrument can be used as a proxy (thus the metaphor of instrument: a device or tool used to facilitate) for the exposure to improve causal inference, because the instrument is free from the confounding that mucked up the association between the exposure and the outcome. However, do you see "fetal genotype" in the figure? The authors point out that there's a problem with the choice of a maternal genetic risk score as an instrument because mom's genes directly impact baby's genes. And this means that the genetic risk score (potentially) directly affects the baby's birth weight through the influence of mom's genes in the baby, and not only through the influence of mom's BMI. To account for this, the authors create a BMI genetic risk score for the fetus and include it in their statistical model, which adjusts away this problem.
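Extending the same kind of toy simulation: when a genetic score satisfies the instrument conditions (it shifts BMI, is assigned independently of SES, and touches birth weight only through BMI), the simple ratio (Wald) estimator recovers the true effect even though the naive estimate stays biased. This is a sketch with made-up parameters, not the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical setup: G is a genetic score, random at conception,
# that influences maternal BMI and affects birth weight only via BMI.
g = rng.binomial(2, 0.3, size=n)                 # risk-allele count
ses = rng.normal(size=n)                         # unmeasured confounder
bmi = 0.5 * g - 0.5 * ses + rng.normal(size=n)
true_effect = 0.2
birth_weight = true_effect * bmi + 0.4 * ses + rng.normal(size=n)

# Naive estimate is biased by SES:
naive = np.cov(bmi, birth_weight)[0, 1] / np.var(bmi)

# Ratio (Wald) IV estimate: effect of G on the outcome divided by
# the effect of G on the exposure.
iv = (np.cov(g, birth_weight)[0, 1] / np.var(g)) / \
     (np.cov(g, bmi)[0, 1] / np.var(g))
print(naive, iv)  # iv lands near 0.2; naive does not
```

The division works because any effect of G on birth weight must, under the instrument conditions, have flowed through BMI; scaling by G's effect on BMI leaves just the BMI-to-birth-weight effect.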
But there's something I haven't yet explained, and that's why genetic variants are attractive candidates as instrumental variables. Using genetic variants as instruments is like running a natural experiment, where the laws of nature, in this case Mendel's law of independent assortment, assign genotypes randomly. In this sense, the assignment resembles an experimental design in which investigators randomly allocate participants to receive a level of the treatment. Two benefits arise from this: first, because genotype is fixed at conception, reverse causation is ruled out (the outcome cannot alter the genotype); second, because alleles are assorted independently of lifestyle and environment, the genetic instrument is largely free of the confounding that distorts the directly measured exposure.
But if you are an epidemiologist, you may be wondering about the validity of a given MR analysis, the extent to which a proposed genetic variable as an instrument adheres to the assumptions of MR, as you'd likely know there are assumptions. In order for MR to minimize bias, three assumptions must hold: (1) the genetic variants are robustly associated with the exposure (relevance); (2) the variants are not associated with confounders of the exposure-outcome relationship (independence); and (3) the variants affect the outcome only through the exposure (the exclusion restriction, violated, for example, by pleiotropy).
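Of these, relevance is the one you can check directly from the data. A common diagnostic is the first-stage F statistic for the regression of the exposure on the instrument, with F below roughly 10 flagging a weak instrument. A small illustration on simulated data (all parameters invented):

```python
import numpy as np

def first_stage_f(instrument, exposure):
    """F statistic for a simple regression of exposure on instrument.
    A common rule of thumb flags F < 10 as a weak instrument."""
    r = np.corrcoef(instrument, exposure)[0, 1]
    n = len(instrument)
    return (n - 2) * r**2 / (1 - r**2)

rng = np.random.default_rng(2)
n = 5_000
g = rng.binomial(2, 0.3, size=n)
strong = 0.5 * g + rng.normal(size=n)    # instrument explains real variance
weak = 0.01 * g + rng.normal(size=n)     # near-zero first-stage effect

print(first_stage_f(g, strong), first_stage_f(g, weak))
```

A weak instrument doesn't just lose power; it amplifies any small violation of the other assumptions, which is why this check comes first in practice.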
So MR can't be deployed carelessly. That said, if the assumptions are not violated, its use facilitates causal inference and helps us identify modifiable factors. Namely, if we have strong reason to believe that mother's BMI influences offspring birth weight, this knowledge should motivate prevention efforts to modify lifestyle factors likely to influence BMI in premenopausal women.
I'm a Public Health Genetics PhD student at the University of Washington and a molecular epidemiology research fellow at the Fred Hutchinson Cancer Research Center. I post (mostly) about topics in epidemiology and genetics.