Planned Missing Data

Overview
Planned missing data designs have long been a consideration in research methods (Graham, 2006; Shoemaker, 1971; Twisk, 2001). In these designs, participants are intentionally not presented with all of the items used in a study. The aim is that reducing the demands made of participants will reduce participant fatigue and drop-out (attrition). For online surveys such as the Wellbeing Assessment, planned missing data designs are intended to reduce attrition by asking each participant to complete fewer items.

The multiform approach
There are many approaches to planned missing data designs. For the Wellbeing Assessment, we use the multiform approach, which is sometimes referred to as a split questionnaire or efficiency design. We use a within-block version of this design: participants receive some items from each dimension, but they do not receive all of the items from every dimension. A hypothetical example of a within-block design for a single survey respondent is presented in Table 1. In that example, the respondent receives items 1 and 3 from Scale A, items 2 and 4 from Scale B, and items 2 and 3 from Scale C.

Table 1. Sample within-block design for Participant A

Item #   Scale A items   Scale B items   Scale C items
1        x
2                        x               x
3        x                               x
4                        x
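
As a rough illustration of how a within-block assignment like the one in Table 1 could be generated, the sketch below draws a random subset of items from every scale for one respondent. The scale names, item numbers, seed, and the function name within_block_assignment are hypothetical and chosen only to mirror the table; this is not the procedure used by the Wellbeing Assessment.

```python
import random

# Hypothetical item pool: three scales with four items each, as in Table 1.
SCALES = {
    "A": [1, 2, 3, 4],
    "B": [1, 2, 3, 4],
    "C": [1, 2, 3, 4],
}


def within_block_assignment(scales, items_per_scale, rng):
    """Draw a random subset of items from every scale for one respondent."""
    return {
        name: sorted(rng.sample(items, items_per_scale))
        for name, items in scales.items()
    }


rng = random.Random(2024)  # fixed seed (an assumption) so the draw is reproducible
assignment = within_block_assignment(SCALES, items_per_scale=2, rng=rng)
print(assignment)
```

Every scale appears in the result, but only two of its four items are administered, which is the defining feature of the within-block layout.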

In contrast, a between-block design gives participants the items from some dimensions, but none from others. A hypothetical example of a between-block design for a single survey respondent is presented in Table 2. In that example, the respondent receives all the items from Scale A, but none from Scale B or Scale C.

Table 2. Sample between-block design for Participant A

Item #   Scale A items   Scale B items   Scale C items
1        x
2        x
3        x
4        x

In most studies of how planned missing data designs perform, within-block designs lose less statistical power than between-block designs. Within-block designs are typically more powerful because every participant provides at least a few between-scale data points for every possible between-scale combination. Within-block designs are particularly useful for research questions that focus on estimating relations between sets of items or latent factors, which is one of the key goals of the Wellbeing Assessment: we want to know how wellbeing outcomes affect each other and how they are associated with pathway items and other variables such as life satisfaction, mood, GPA, and intent to transfer.

In contrast, between-block designs capture some between-scale data points within each participant, but some between-scale combinations are not captured at all. The benefit of the between-block design is that it is more robust for research questions that focus on the associations among items within a scale, such as factor analytic studies that have little focus on between-scale comparisons.
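
The trade-off described above can be made concrete by counting, for one respondent, how many item pairs fall within a scale and how many span two scales under each design. The sketch below is only an illustration under assumed settings: four hypothetical scales of four items each, with both designs administering eight items so that respondent burden is equal. None of the names or sizes come from the Wellbeing Assessment.

```python
import itertools
import random

# Hypothetical pool: scale name -> number of items in that scale.
SCALES = {"A": 4, "B": 4, "C": 4, "D": 4}


def within_block(rng, per_scale=2):
    """Some items from every scale."""
    return {
        (scale, item)
        for scale, n_items in SCALES.items()
        for item in rng.sample(range(1, n_items + 1), per_scale)
    }


def between_block(rng, n_scales=2):
    """Every item from a random subset of scales, nothing from the others."""
    chosen = rng.sample(sorted(SCALES), n_scales)
    return {(scale, item) for scale in chosen for item in range(1, SCALES[scale] + 1)}


def pair_counts(items):
    """Return (within-scale pairs, between-scale pairs, scale pairings covered)."""
    within = between = 0
    scale_pairs = set()
    for (s1, _), (s2, _) in itertools.combinations(sorted(items), 2):
        if s1 == s2:
            within += 1
        else:
            between += 1
            scale_pairs.add(frozenset((s1, s2)))
    return within, between, len(scale_pairs)


rng = random.Random(0)  # fixed seed (an assumption); the counts do not depend on it
within = pair_counts(within_block(rng))
between = pair_counts(between_block(rng))
print("within-block: ", within)
print("between-block:", between)
```

With these assumed sizes, the within-block respondent contributes 24 between-scale item pairs covering all 6 scale pairings but only 4 within-scale pairs, while the between-block respondent contributes 16 between-scale pairs covering a single scale pairing and 12 within-scale pairs. The counts illustrate why the first layout favors between-scale questions and the second favors within-scale questions; they are not a power analysis.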

Within-block design details
The within-block design is implemented by creating an “X-block” of items that is administered to all participants. Guidelines for its design vary, but methodological recommendations typically advise that the X-block contain items that capture information not captured by other items (i.e., items that are not correlated with other items). Recommendations sometimes also advise that the X-block contain at least one item from every scale, or items that are key to the research question. However, recommendations also tend to advise that the size and contents of the X-block be determined by the research questions and the anticipated respondent burden.

For the Wellbeing Assessment, we enter two types of items into the X-block: (a) items that are not well correlated with other items and (b) a small set of items that are highly correlated with many other items. The former includes items such as demographics. We cannot “recapture” this information through other items in the Assessment, so having them all in the X-block is important. The latter includes the mood items. At least some of the items in each of the wellbeing dimensions are correlated with at least some of the mood items, making the mood items very good at helping to “recapture” some of the information that is lost in the planned missing data design. We have limited the X-block to these two types of items because including other groups of items would impair the design’s ability to reduce the number of items seen by each respondent.
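
A minimal sketch of how one respondent's form could be assembled from an X-block plus a sample of items from each dimension is given below. All item identifiers, the number of dimensions, the number of items sampled per dimension, and the function name build_form are hypothetical; the sketch only illustrates the idea that X-block items are always administered while the remaining items are sampled.

```python
import random

# Hypothetical item identifiers; the Assessment's real item names are not shown here.
X_BLOCK = ["demo_1", "demo_2", "mood_1", "mood_2"]
DIMENSIONS = {
    "dimension_1": ["d1_item1", "d1_item2", "d1_item3", "d1_item4"],
    "dimension_2": ["d2_item1", "d2_item2", "d2_item3", "d2_item4"],
    "dimension_3": ["d3_item1", "d3_item2", "d3_item3", "d3_item4"],
}


def build_form(x_block, dimensions, items_per_dimension, rng):
    """Return the X-block followed by a random subset of each dimension's items."""
    form = list(x_block)  # every respondent receives all X-block items
    for items in dimensions.values():
        chosen = set(rng.sample(items, items_per_dimension))
        # Keep the sampled items in their original within-dimension order.
        form.extend(item for item in items if item in chosen)
    return form


rng = random.Random(7)  # fixed seed (an assumption) for reproducibility
form = build_form(X_BLOCK, DIMENSIONS, items_per_dimension=2, rng=rng)
total_items = len(X_BLOCK) + sum(len(items) for items in DIMENSIONS.values())
print(len(form), "of", total_items, "items administered")
```

In this toy setup each respondent sees 10 of 16 items. Enlarging the X-block raises that count for every respondent, which is the reason for keeping the X-block small.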

To further reduce the impact of unplanned missingness, we also randomize the order in which participants receive the wellbeing dimensions. Each participant receives the same demographic and mood items at the beginning of the survey and the same demographic items at the end. The order in which the wellbeing dimensions are presented between these two sets of items varies across participants. As a result, unplanned missingness due to attrition is distributed approximately evenly across the wellbeing dimensions rather than concentrated in whichever dimensions would otherwise always appear last.
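
The ordering scheme described above can be sketched as follows: the opening and closing item sets stay fixed, and only the dimensions between them are shuffled for each respondent. The block names, the number of dimensions, and the function name survey_order are hypothetical placeholders, not the Assessment's actual configuration.

```python
import random
from collections import Counter

# Hypothetical block names standing in for the fixed and randomized parts of the survey.
OPENING = ["demographics_opening", "mood"]
CLOSING = ["demographics_closing"]
DIMENSIONS = ["dimension_1", "dimension_2", "dimension_3", "dimension_4"]


def survey_order(opening, dimensions, closing, rng):
    """Fixed opening and closing blocks with the dimensions shuffled in between."""
    middle = list(dimensions)
    rng.shuffle(middle)  # a fresh random order for each respondent
    return list(opening) + middle + list(closing)


rng = random.Random(11)  # fixed seed (an assumption) for reproducibility
orders = [survey_order(OPENING, DIMENSIONS, CLOSING, rng) for _ in range(1000)]

# Tally which dimension each simulated respondent sees last, just before the closing block.
last_position = len(OPENING) + len(DIMENSIONS) - 1
last_counts = Counter(order[last_position] for order in orders)
print(last_counts)
```

Because each dimension is about equally likely to land in the final position, respondents who quit partway through do not systematically skip the same dimension.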

Further reading
The following two articles provide very accessible overviews of planned missing data designs:

  • Little, T., & Rhemtulla, M. (2013). Planned missing data designs for developmental researchers. Child Development Perspectives, 7(4), 199-204. DOI: 10.1111/cdep.12043
  • Rhemtulla, M., & Hancock, G. (2016). Planned missing data designs in educational psychology research. Educational Psychologist, 51(3-4), 305-316. DOI: 10.1080/00461520.2016.1208094