A Letter from Our Director
I’m often asked why a particular program did not meet Blueprints criteria. Most programs fail to receive Blueprints certification because their scientific evaluations have major limitations. These limitations mean that apparently positive results could stem not from participation in the program but from design problems in the evaluation. The most obvious problems are not having a control group (so that participants can be compared with non-participants) or not assigning participants to the intervention and control groups randomly. Randomized controlled trials do the most to ensure that the control and intervention groups are equivalent at the start of a program in all ways except that one receives the intervention and the other does not. Some quasi-experimental designs, in which participants are statistically matched to non-participants on key sociodemographic and/or behavioral measures, come close to the quality of randomized trials in ensuring group equivalence at the start of a program.
Even randomized controlled trials and quasi-experimental designs can face serious problems, however. Programs approved by Blueprints must take appropriate steps to overcome these common problems. We cannot offer a comprehensive list of the methodological problems that the Board considers in its program reviews, but we can highlight those that often prevent Blueprints certification.
Some of the Most Common Methodological Problems
Biased Measures. The best outcome measures are obtained from independent sources. Researchers who rate subjects participating in their own programs, or mothers in parenting programs who rate their own child’s behavior, are prone to bias. Strong studies have more objective measures based on subject self-reports, blind ratings, or external data sources. Additionally, the study should collect outcome data for treatment and control subjects in the same way and at the same time, and it should report the intervention’s effects on all outcomes the study measured, not just the positive ones.
Limited Outcomes. The outcomes to be measured should be behavioral and not simply attitudes or intervening risk and protective factors that are closely related to program content. Success in changing attitudes or intervening factors does not always translate into success in changing the ultimate behavioral outcomes of interest to Blueprints. For instance, a program designed to reduce drug use should measure actual drug use patterns and not just intentions or attitudes towards drugs.
Dropping Subjects. Believing that only subjects who complete an intervention should be evaluated, researchers sometimes drop those who did not participate fully. However, this approach keeps only the subjects most likely to do well and leads to biased results. Strong studies use an “intent-to-treat” approach that analyzes all subjects in the condition to which they were originally assigned.
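The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration only: the records, the field layout, and the function names are all invented for the example, and lower scores are assumed to mean less problem behavior.

```python
from statistics import mean

# Hypothetical records: (assigned condition, completed program?, outcome score).
# All values are invented; lower scores mean less problem behavior.
subjects = [
    ("intervention", True, 2.0),
    ("intervention", True, 3.0),
    ("intervention", False, 7.0),
    ("intervention", False, 8.0),
    ("control", True, 5.0),
    ("control", True, 5.0),
    ("control", True, 6.0),
    ("control", True, 4.0),
]

def group_mean(records, condition, completers_only=False):
    """Mean outcome for one assigned condition, optionally dropping non-completers."""
    scores = [
        score
        for assigned, completed, score in records
        if assigned == condition and (completed or not completers_only)
    ]
    return mean(scores)

def effect(records, completers_only=False):
    """Intervention mean minus control mean."""
    return (group_mean(records, "intervention", completers_only)
            - group_mean(records, "control", completers_only))

# Intent-to-treat: everyone is analyzed in the condition originally assigned.
itt_effect = effect(subjects)

# Completers only: dropping non-completers flatters the intervention group.
completer_effect = effect(subjects, completers_only=True)
```

With these made-up numbers, the completers-only comparison shows an apparent benefit that disappears once every assigned subject is counted.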
Non-Equivalent Groups. Even in randomized trials, studies need to examine the baseline equivalence of the intervention and control groups. Statistical tests on all sociodemographic and outcome measures, taken before the intervention begins, should show few significant differences between the groups.
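Alongside significance tests, one common descriptive check is the standardized mean difference between groups on each baseline measure. The sketch below uses that swapped-in measure rather than the tests described above, and it is not a Blueprints criterion. The measure names, the data, and the 0.25 flagging threshold are all assumptions made for illustration.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)

# Hypothetical baseline data: measure -> (intervention values, control values).
baseline = {
    "age": ([12.0, 13.0, 14.0, 13.0, 12.0, 14.0],
            [12.0, 13.0, 14.0, 13.0, 13.0, 13.0]),
    "prior_incidents": ([1.0, 2.0, 3.0, 2.0, 1.0, 3.0],
                        [2.0, 3.0, 4.0, 3.0, 2.0, 4.0]),
}

THRESHOLD = 0.25  # assumed cutoff for illustration only

smd = {
    name: standardized_mean_difference(treat, ctrl)
    for name, (treat, ctrl) in baseline.items()
}

# Measures on which the groups look imbalanced before the program starts.
flagged = [name for name, value in smd.items() if abs(value) > THRESHOLD]
```

In this invented data set the groups match on age but differ on prior incidents, so only the second measure is flagged.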
Differential Attrition. Most studies face loss of subjects, but if the loss differs between the intervention and control groups, it introduces potential bias. Strong studies track subjects by condition at baseline and at each follow-up measurement to ensure that participants lost to attrition do not differ across conditions, baseline measures, or the interaction of condition by baseline measures.
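A first step in checking for differential attrition is to compare loss rates by condition. The sketch below does this with invented counts and a Pearson chi-square test on the 2x2 table of condition by retention (no continuity correction). The function names are hypothetical, and the example covers only the comparison across conditions, not the checks against baseline measures.

```python
import math

def attrition_rate(n_baseline, n_followup):
    """Share of baseline subjects missing at follow-up."""
    return (n_baseline - n_followup) / n_baseline

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], without continuity
    correction. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 200 subjects per condition at baseline.
intervention_rate = attrition_rate(200, 150)  # 50 lost
control_rate = attrition_rate(200, 180)       # 20 lost
rate_gap = intervention_rate - control_rate

# Rows: intervention, control. Columns: retained, lost.
chi2, p_value = chi_square_2x2(150, 50, 180, 20)
```

Here the intervention group loses 25 percent of its subjects and the control group 10 percent, a gap that the test marks as unlikely to be due to chance.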
Incorrect Level of Analysis. Researchers often randomize at one level, such as the school level, but conduct analyses at a different level (e.g., students). When the individuals analyzed belong to larger groups or organizations that were the units of randomization, treating them as separate cases violates the assumption made in tests of statistical significance that the observations are independent. To avoid overstating the statistical significance of the results, multilevel models, robust standard errors, or other adjustments should be used.
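One way to see how much clustering matters is the standard design effect, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. The sketch below assumes equal cluster sizes, and the numbers (20 schools, 50 students each, ICC of 0.05) are invented. It illustrates the size of the problem and is not a substitute for the multilevel models or robust standard errors mentioned above.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(num_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    total = num_clusters * cluster_size
    return total / design_effect(cluster_size, icc)

# Hypothetical trial: 20 schools randomized, 50 students in each.
deff = design_effect(cluster_size=50, icc=0.05)
n_effective = effective_sample_size(num_clusters=20, cluster_size=50, icc=0.05)

# Standard errors from a naive student-level analysis are too small
# by roughly this factor.
se_inflation = math.sqrt(deff)
```

Under these assumptions, 1,000 students carry about as much information as 290 independent observations, so an analysis that ignores the schools overstates significance.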
Inconsistent or Weak Program Benefits. Most studies examine program effects on multiple outcomes, often including a wide range of measures, risk and protective factors, and behaviors. To minimize the role chance may play in multiple statistical tests, Blueprints looks for consistent results. Although it is hard to provide a simple formula to define consistency, the Board looks for robust and reliable benefits that are not limited to a small portion of the outcome measures and hold across time points and subgroups. Ideally, the program benefits are strong enough that they will show when the program is used with other samples and in different contexts.
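The role of chance in multiple tests can be made concrete with a short calculation. The sketch below assumes the tests are independent and each uses a 0.05 significance level, which is a simplification since outcomes in a real study are usually correlated. The function names are hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests comes out
    significant at level alpha when the program truly has no effect."""
    return 1 - (1 - alpha) ** num_tests

def expected_false_positives(num_tests, alpha=0.05):
    """Average number of significant results produced by chance alone."""
    return num_tests * alpha

one_test = chance_of_false_positive(1)
twenty_tests = chance_of_false_positive(20)
expected_in_twenty = expected_false_positives(20)
```

With 20 outcomes, a program with no real effect has roughly a 64 percent chance of showing at least one significant result, which is why a single isolated finding carries little weight.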
Blueprints Bulletin, Issue 3, July 2017.