Contextualizing selection bias in Mendelian randomization: how bad is it likely to be?

Mar 11, 2018

Apostolos Gkatzionis, Stephen Burgess

Introduction

Mendelian randomization is the use of genetic information to assess the existence of a causal relationship between a risk factor and an outcome of interest [1, 2]. It is the application of instrumental variable analysis in the context of genetic epidemiology, where genetic variants are used as instruments. To be a valid instrumental variable, a genetic variant must be associated with the risk factor in a specific way: it cannot influence the outcome except via its association with the risk factor, and it cannot be associated with any confounder of the risk factor–outcome association. An association between a valid instrumental variable and the outcome is indicative of a causal effect of the risk factor on the outcome [3, 4].

Selection Bias in Mendelian Randomization

This paper discusses selection bias in Mendelian randomization. In general, selection bias arises when individuals included in the study population are not a representative sample of the target population [5]. Selection bias is likely to be present in all epidemiological analyses to some extent. Bias due to non-representative selection usually occurs as an example of collider bias [6, 7, 8]. A collider is a variable that is a common effect of two variables (it is causally downstream of both variables). Collider bias occurs when conditioning on such a variable: even if the two initial variables were unrelated (marginally independent), they will typically become related when conditioning on a collider (conditionally dependent).

An example of this is the so-called Berkson’s bias [7]: two diseases A and B that often cause hospitalization may be independent across the population, but they will typically be dependent among hospitalized individuals since being hospitalized and not having disease A means one is more likely to have disease B.
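As an illustration of the hospitalization example, the following minimal sketch simulates two independent diseases and compares their association in the whole population and among hospitalized individuals. The prevalences, the baseline admission rate, the admission probability per disease, and the names `simulate_population` and `prob_b` are all assumptions chosen for illustration; none of them come from the paper.

```python
import random


def simulate_population(n=200_000, prev_a=0.1, prev_b=0.1,
                        base_rate=0.05, admit_prob=0.5, seed=1):
    """Return (has_a, has_b, hospitalized) tuples; A and B are drawn independently."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        has_a = rng.random() < prev_a
        has_b = rng.random() < prev_b
        # Assumed admission model: a small baseline admission rate, and each
        # disease that is present independently triggers admission with
        # probability admit_prob.
        p_not_admitted = (1 - base_rate) * (1 - admit_prob) ** (has_a + has_b)
        hospitalized = rng.random() < 1 - p_not_admitted
        people.append((has_a, has_b, hospitalized))
    return people


def prob_b(people, has_a, hospitalized_only):
    """Proportion with disease B among people with the given disease A status."""
    rows = [b for a, b, h in people
            if a == has_a and (h or not hospitalized_only)]
    return sum(rows) / len(rows)


people = simulate_population()
for label, only_hosp in [("whole population", False), ("hospitalized only", True)]:
    print(label,
          "| P(B | A) =", round(prob_b(people, True, only_hosp), 3),
          "| P(B | not A) =", round(prob_b(people, False, only_hosp), 3))
```

In the whole population the two proportions should be close to the assumed prevalence of 0.1. Among hospitalized individuals, the assumed admission model implies that disease B is markedly more common in those without disease A (about 0.54 in expectation) than in those with it (about 0.14), even though the diseases were generated independently.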

Throughout this paper, we assume that risk factor–outcome confounding is represented by a single variable, referred to as the confounder. Collider bias in Mendelian randomization studies often results in a violation of the instrumental variable assumptions. By assumption, an instrumental variable and the confounder are marginally independent. Conditioning on a collider of the instrumental variable and the confounder would induce an association between the two [9] and would lead to the instrumental variable becoming invalid. Hence, selection bias can lead to an association between the instrumental variable and the outcome in the absence of a causal effect of the risk factor on the outcome [10].
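The mechanism described above can be sketched with a small simulation. The data-generating model here is an assumption made for illustration only: a variant `g` coded as an allele count, a single normally distributed confounder `u`, a risk factor `x` that depends on both, an outcome `y` that depends on the confounder but not on the risk factor (a null causal effect), and a logistic selection probability that increases with the risk factor. The effect sizes and the helper names are hypothetical and are not taken from the paper.

```python
import math
import random


def simulate(n=200_000, seed=7):
    """Simulate (g, u, x, y, selected) rows under a null effect of x on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                              # confounder
        x = 0.5 * g + u + rng.gauss(0, 1)                # risk factor
        y = u + rng.gauss(0, 1)                          # outcome, no effect of x
        p_select = 1 / (1 + math.exp(-2 * x))            # selection depends on x
        rows.append((g, u, x, y, rng.random() < p_select))
    return rows


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


def summarize(rows):
    """Return (variant-confounder correlation, slope of outcome on variant)."""
    g = [r[0] for r in rows]
    u = [r[1] for r in rows]
    y = [r[3] for r in rows]
    return corr(g, u), cov(g, y) / cov(g, g)


rows = simulate()
full = summarize(rows)
selected = summarize([r for r in rows if r[4]])
print("full population:  corr(g, u) = %.3f, slope of y on g = %.3f" % full)
print("selected sample:  corr(g, u) = %.3f, slope of y on g = %.3f" % selected)
```

Under these assumptions, both quantities should be close to zero in the full population and negative in the selected sample. Among selected individuals, carrying more risk-increasing alleles makes a high confounder value less necessary for selection, so the variant becomes associated with the confounder, and therefore with the outcome, despite the null causal effect.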

Visualizing Collider Bias in Mendelian Randomization

Collider bias in Mendelian randomization can be visualized through causal diagrams. Directed acyclic graphs indicating the relationships between the genetic variant, risk factor, confounder, and outcome are shown in Figure 1. We can see that the risk factor and outcome are both colliders between the genetic variant and the confounder. This means that if selection into the sample population is a function of the risk factor then selection bias will occur (Figure 1, left). The same will occur if selection is a function of the outcome (Figure 1, right), but not if it is a function of the confounder alone, as the confounder is not a collider [11].
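The reasoning from the diagram can also be checked programmatically. Figure 1 is not reproduced in this text, so the sketch below assumes the standard Mendelian randomization graph with edges G → X, U → X, X → Y and U → Y (G the genetic variant, X the risk factor, U the confounder, Y the outcome). The function `opens_g_u_path` is a hypothetical helper: it tests whether a selection variable is a common descendant of the variant and the confounder, which is sufficient in this simple graph to identify variables whose conditioning induces a variant–confounder association.

```python
# Assumed graph (parent -> list of children); not taken from the paper's figure.
dag = {
    "G": ["X"],        # genetic variant affects the risk factor
    "U": ["X", "Y"],   # confounder affects risk factor and outcome
    "X": ["Y"],        # risk factor affects the outcome
    "Y": [],
}


def descendants(graph, node):
    """Return all nodes reachable from `node` along directed edges."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def opens_g_u_path(graph, selection_var, instrument="G", confounder="U"):
    """True if selection_var is a common effect of the instrument and confounder."""
    return (selection_var in descendants(graph, instrument)
            and selection_var in descendants(graph, confounder))


for var in ["X", "Y", "U"]:
    print(var, opens_g_u_path(dag, var))
```

With the assumed edges, the check flags the risk factor and the outcome but not the confounder, in line with the description of Figure 1.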

Dealing with Selection Bias

The possibility that selection bias may undermine instrumental variable analyses, and Mendelian randomization in particular, has long been noted in the literature [12]. However, simply saying that selection bias may undermine a Mendelian randomization study is a platitude – it is a true statement, but not a helpful one. Such unhelpful statements are pervasive in epidemiology papers – it is common in the discussion of papers analysing observational data to read bald statements highlighting the possibility that findings could have been adversely affected by selection bias, or similar phenomena such as unmeasured confounding and measurement error. It would be more helpful to evaluate to what extent selection bias is likely to influence findings in terms of bias or Type 1 error rate inflation, or to suggest the magnitude of selection bias that would be required for a positive finding to be explained through bias alone [13].

In this paper, we aim to contextualize to what extent selection bias affects a typical Mendelian randomization investigation. Our hope is that this paper will help investigators make an informed judgement about the relative importance of selection bias in their work compared to other potential sources of bias. We first list some typical scenarios for Mendelian.
