“Black Box Thinking:” Drawing on disappointing findings to improve youth mentoring programs

by Jean Rhodes

After a plane crash, investigators examine black box data to determine what went wrong and to make the changes needed to protect against future incidents. Unfortunately, that is not always the case in evaluations of prevention programs. In fact, we often find it easier to reframe and dismiss disappointing results than to delve into what might have produced them and, if indicated, change our beliefs about a program’s effectiveness. This lack of curiosity closes loops of learning and forecloses opportunities for improvement. According to British researcher Nick Axford, “This is unhelpful for several reasons, not least that it skews the evidence base, contributes to research ‘waste’, undermines respect for science, and stifles creativity in intervention development.”

This point was brought home to me by a 2020 Prevention Science evaluation by Axford and colleagues entitled “The effectiveness of a community-based mentoring program for children aged 5–11 years: Results from a randomized controlled trial.” As summarized in the Chronicle, this randomized controlled trial assessed the effectiveness and implementation of a year-long mentoring program for children with behavioral issues in London.

  • Analyses found that children in the intervention group did not significantly differ from the control group on parent and teacher ratings of behavior.
  • The authors conclude that the mentoring intervention did not affect children’s behavior or emotional well-being as intended, and that changes to its design, implementation, and program content are needed to satisfactorily address key risk and protective factors and increase effectiveness.
  • The authors note that, “There was no statistically significant effect on any outcome. Given the high level of need of children at baseline, it is possible that many participants were recruited at a point of crisis, and that this level of need in both arms naturally reduced slightly over time. Effect sizes at endpoint are small and none are statistically significant…given the relatively serious needs of the children at recruitment, the lack of effect may be related in part to what mentors actually deliver and whether program content focuses sufficiently and efficaciously on relevant issues.”

Axford and colleagues drew on this and other evaluations to formulate an important essay in Prevention Science, “Promoting learning from null or negative results in prevention science trials,” excerpted below.

“In his best-selling book, Black Box Thinking, Matthew Syed (2015) argues that aviation is much better than other fields in acknowledging and learning from performance failure. If an aeroplane crashes, the black box containing essential flight data is recovered, the data are analyzed, and any ensuing lessons are shared rapidly across the industry in order to improve engineering practice or pilot behavior and reduce the risk of a repeat event. He contrasts this with healthcare, contending that there can be a tendency to cover up or explain away treatment that is ineffective or harmful, or at least not to use this valuable information as an opportunity to learn and contribute to continuous improvement. We think there is a danger of similarly unhelpful behavior in prevention science when randomized controlled trials find a null or negative effect, and use this article to explore how to foster a more constructive approach. As will be seen, this might mean challenging the value of different types of research design in prevention science and what they can bring to improving the knowledge base from which learning can take place.

We recognize that there are complexities when trying to identify null or negative effect trials owing to issues with methodological quality and the pattern of results; taking the extremes, there is a world of difference between a well-conducted trial showing no effect on any measure of any outcome and a poorly executed trial showing no effect on the primary outcome but small effects on some measures of some secondary outcomes. The picture is further muddied by reporting practices that claim an effect when there is none. For the purposes of this article, we define null effect trials in terms of failure to disprove the null hypothesis on the primary outcome, despite what the authors may say or do, and negative effect trials as those that find a negative effect on the primary outcome.

Our interest in this subject was triggered by our experience of conducting several null effect superiority trials (Berry et al. 2016; Lloyd et al. 2018; Axford et al. 2020a, b, c). This prompted us to reflect on how we and other stakeholders responded, the relative value of the results (including whether they would even get published), and, in our (NA, TH) darker moments, whether the primary outcome meant a null effect was inevitable, whether the research design limited learning, and even whether the trials should have gone ahead in the first place. But our experience and concerns are not uncommon (Bonafide and Keren 2018; Oldehinkel 2018); a significant and possibly growing proportion of trials in prevention science and beyond (e.g., Kaplan and Irvin 2015) find no or even harmful effects…

It would be remiss if, as a field, we did not reflect on how to learn from well-conducted null and negative effect trials, particularly because how we respond affects not just what happens after a trial but how we think about and design interventions and tests of interventions… In what follows, we describe how researchers often respond to null or negative trial results and the implications of their responses, set out what stakeholders might decide to do with the intervention following the results, hypothesize what influences those decisions, and finally propose a series of actions to promote learning from null or negative effect trial results. The suggested steps are designed to minimize the likelihood of unhelpful null effect trials—for example, those that are poorly designed or provide little or no explanation for the findings—and increase the proportion of trials which, even if they have null or negative effect findings, advance our learning. We draw on examples from our own and other people’s work in prevention science… Both individually and collectively, unhelpful researcher responses to null or negative trial results limit learning.

First, by unfairly casting doubt on robust findings, or artificially creating or inflating positive results, it contributes to a skewed impression of “what works” in a given subject area, inadvertently suggesting that some forms of intervention are more effective than they are (de Vries et al. 2018). This has the potential to cause harm. While there are techniques in meta-analysis to identify and compensate for publication bias (funnel plot, trim-and-fill algorithm, fail-safe N), they are necessarily imperfect (Carter et al. 2019).
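
As a rough illustration of one of the publication-bias checks named above, here is a minimal sketch of Rosenthal’s fail-safe N (Rosenthal 1979), assuming a hypothetical set of z-scores from published trials; it estimates how many unpublished null-result studies would be needed to wash out an apparently significant pooled result.

```python
from scipy.stats import norm

def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N: the number of unpublished null-result studies
    needed to reduce a pooled result to non-significance."""
    k = len(z_scores)
    z_crit = norm.ppf(1 - alpha)        # one-tailed critical z, ~1.645
    z_sum = sum(z_scores)
    return (z_sum ** 2) / (z_crit ** 2) - k

# Hypothetical z-scores from five published prevention trials
print(round(fail_safe_n([2.1, 1.8, 2.5, 1.6, 2.0]), 1))  # ~32 "file drawer" studies
```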

Second, it contributes to research “waste”, which can increase risk and reduce benefits for service users. Accurate knowledge of earlier null or negative findings helps make future research more suitable and may even render some proposed studies unnecessary and irrelevant (Ioannidis et al. 2014). Third, it risks undermining the credibility of prevention science. Critics have highlighted what they perceive to be behaviors that artificially inflate reported intervention effectiveness (e.g., Gorman 2014); we should not ignore the issues. Fourth, it fosters a fear of null or negative results, which in turn stifles creativity and new approaches to intervention development and evaluation.

Deciding What to Do with the Intervention

When a rigorous trial shows that an intervention is not effective, or that it is harmful, there are essentially three options for what to do with the intervention. Depending on the context, they may or may not represent appropriate learning.

The first possible response is to continue to commission or deliver the intervention. Stakeholders might accept the null or negative results but conclude that there are no better alternatives, or that the intervention is commendable for reasons besides its (non-)effect on outcomes. For example, despite the lack of effect in a trial of the PATHS social-emotional learning program in one city in the UK (Berry et al. 2016), the intervention continued to be commissioned in local schools for a further 3 years, at least in part because coaches, teachers, and students liked it. Of course, continuing to deliver the intervention may also happen if the results are not accepted by commissioners or are explained away by researchers.

A second response is to stop delivering and/or refining the intervention. This might take the form of decommissioning an established intervention or, if evidence accumulates from several null or negative effect trials of essentially similar programs albeit with different heritage or branding, de-implementing a class of interventions (Norton and Chambers 2020). Of course, if an intervention only existed as part of a trial, as in the school-based obesity prevention program tested in the Healthy Lifestyles Program (HeLP) trial (Lloyd et al. 2018), there may be nothing to decommission, but further development might cease. Additionally, when evidence from numerous null or negative effect trials accumulates, developers of health guidelines, such as the National Institute for Health and Care Excellence (NICE) in the UK, may issue “do not do” recommendations for clinical practices that should be discontinued or not used routinely.

A third response is to adapt the intervention and then test those changes. The rationale is that the trial results are broadly trustworthy and yield important lessons that need to be acted upon. In such cases, it is deemed premature to cease delivery but continuing with the intervention unchanged is not viable. In this way, the trial results are used as a platform for intentionally improving the intervention. Decisions about what to adjust are likely to be informed not only by outcome patterns but also, where available, by process evaluation results, not to mention wider evidence and expert opinion. Examples of this option include the reworking of a group parenting program (Ghate 2018) following a null effect trial (Simkiss et al. 2013) and the rapid cycle testing of adaptations to the Family Nurse Partnership home visiting program (FNP National Unit and Dartington Service Design Lab 2020) following disappointing trial results (Robling et al. 2016).

Such practice and policy decisions arise from a range of stakeholder responses which, we hypothesize, are shaped by the following four sets of potentially competing and interacting factors (Table 2). Exactly how these impact on decision-making is complex: their importance will vary by stakeholder and may change over time. We have derived these factors from our collective experience of responding to trials in which we have been directly involved as well as from our observations of other researchers and stakeholders.

Table 2 Influences on what happens to an intervention following a null or negative effect trial

The Intervention

An important issue is where the intervention is in its gestation. Finding a lack of effect early in its development is arguably less of an issue, and therefore easier to deal with, than if the intervention is considered to be mature and commissioned widely; the emphasis for newly developed interventions can be put on learning and re-design as there is little, if anything, to de-implement. Indeed, guidance on developing and evaluating complex interventions includes a feasibility and piloting stage as a critical step in the process (Craig et al. 2008).

A related factor concerns the profile and perceived importance of the intervention. If it is well established or politically important, for instance because it has been introduced by or received significant funding from government, it may be “too big to fail”, leading perhaps to a temptation to dismiss the results or plow on regardless with implementation and scale-up.

A further intervention-related factor is the degree to which it is possible to implement the intervention easily and well and whether it is acceptable to practitioners and users. An intervention that is well received or superior to its competitors in these respects may be more likely to continue to be commissioned, despite trial results showing no effect (see the PATHS example above).

Finally, the outcome(s) that the intervention seeks to address influences how trial results are treated. Specifically, some outcomes might be regarded as more important than others, for instance in terms of threat to health or cost to society if not achieved, such that null or negative results spur stakeholders into action in terms of discontinuing or modifying the intervention.

Towards a More Constructive Approach

So how do we cultivate a stronger culture of learning in response to evidence that an intervention was ineffective or harmful, and in so doing foster a climate for intervention design and testing that encourages learning for the field (i.e., beyond benefit for that specific intervention)? Broadly, the actions identified fall into five categories.

Culture

It is necessary to cultivate a learning culture among key stakeholders, that is, those people who will shape the decision about what to do with the intervention following the trial. This requires agreeing why the trial is being conducted, namely to learn about an intervention’s effectiveness and factors that contribute to this, with a view to improving the quality of services provided for children and families. The influence may be direct. For instance, provision may be enhanced by the incorporation of the intervention if it is found to be effective, or by efforts to improve the intervention if the results are equivocal or disappointing, or by replacing it with something that is more effective. Lessons from the evaluation may also contribute to services more indirectly through being picked up in systematic reviews or meta-analyses, which in turn have the potential to shape policy and practice. While achieving consensus among key stakeholders about trial purpose and value may be challenging, failure to do so will seriously undermine efforts to respond appropriately to the results should they be null or negative.

A learning culture can further be enhanced by managing expectations about results, namely the possibility of null or negative results (based on precedent), and by articulating likely and unlikely scenarios, such as the relatively common experience of seeing some effects on some measures of some outcomes and the rare experience of finding large effects on most outcomes. In order to reinforce a sense of openness and realism among stakeholders, it may help to develop outline plans for communicating positive, mixed, null, or negative results publicly. The overarching aim is to counter the erroneous belief that the trial will unquestionably prove the intervention to be effective and thereby give it a ticket to scale.

The aim should also be to encourage a collegiate culture, so that investigators and key stakeholders, especially program developers, feel that they are working together on a shared endeavor. This requires early and ongoing consultation, partly to understand different perspectives, motivations, and needs and thereby identify potential tensions but also to discuss trial design and conduct. For example, agreeing outcome constructs and measures before the trial commences guards against the temptation to criticize or regret the choice of measures post hoc once disappointing results are known and thereby undermine confidence in the null or negative effect. Failure to work together can create an adversarial culture in which, for instance, the deliverers of the intervention feel “done to” or under surveillance, which in turn contributes (unsurprisingly) to a reticence to accept and act on results.

Process

In addition to working collaboratively, learning from null or negative results is more likely if the process of conducting the trial is done carefully and thoughtfully. There are various aspects to this. First, a definitive trial should only proceed if it is clearly necessary and appropriate, meaning that all of the following apply: (i) it has a plausible evidence-informed theory of change; (ii) potential harms have been considered and ruled out; (iii) intervention feasibility and acceptability have been established; (iv) there is genuine uncertainty about intervention effectiveness relative to the control (“equipoise”); (v) alternative methods of impact evaluation are unsuitable; and (vi) key stakeholders agree that a null or negative result is as worthy, interesting, and publication-worthy as a positive result. If an established or scaled intervention lacks a sound theory of change, efforts should be made to develop one retrospectively before proceeding to a trial, for example through an evaluability assessment (Davies 2013). Moreover, since many purportedly “innovative” interventions are highly derivative, it is arguable that testing their effectiveness in a definitive trial is unlikely to tell us anything important that we do not already know. In these cases, time and effort would be better spent improving the intervention so that it better embodies features known to be associated with or predictive of stronger effects. For example, a structured approach to doing this has been used to strengthen juvenile justice provision (Lipsey et al. 2010)…

Intervention Design

Much has been written about good intervention design elsewhere (for a review, see O’Cathain et al. 2019), so here we highlight only a few points. One is the importance of drawing on relevant literature that has been appraised carefully and is deemed to be reliable. This, in turn, requires that the quality of basic research is improved, for instance through study pre-registration, better data sharing, and more replication research (Lortie-Forgues and Inglis 2019). Next, design is likely to be further strengthened by building trusting relationships with intervention developers, professional development providers, and people with lived experience of the issue targeted by the intervention and collaborating with them in a process of human-centered co-design (Lyon and Koerner 2016). A further consideration should be intervention context, specifically the factors (e.g., political, organizational, cultural, social, economic, geographical, financial) that are anticipated to impact on implementation and therefore outcomes. An implementation research framework (e.g., Damschroder et al. 2009) and guidance on how to take account of context in intervention research (Craig et al. 2018) could usefully inform this exercise, shaping both intervention design and implementation strategy. Lastly, possible unintended adverse effects of the intervention (which may contribute to null or negative effects) should be considered and the design adjusted accordingly (Bonell et al. 2015). In addition to asking stakeholders to consider likely adverse effects freely and without prompting, it can be useful to work together through common types such as psychological stress, widening health inequalities, deviancy training, and opportunity costs (Lorenc and Oliver 2013).

Trial Design

Trial design has a significant bearing on the extent to which the results are conducive to learning. Several steps can be taken to minimize the likelihood of results leaving ambiguities in the event of null or negative effects, thereby making them more informative. Equally, certain actions enable the exploration and therefore potential elimination of competing explanations for an intervention being ineffective or harmful, thereby pointing to possible improvements or practices to avoid.

The first is ensuring that the study is adequately powered, either by increasing sample size if practical or, if not, by focusing on more targeted subgroups or using more targeted outcome measures (Lortie-Forgues and Inglis 2019). This helps to avoid finding no effect because the sample was too small. Second, it pays to record carefully the services received by control arm participants. If they significantly exceed those received by intervention participants, or resemble the intervention, it may help to account for null or negative effects. Third, the timing of follow-up points should be calibrated according to theoretical and empirical evidence on when outcomes are likely to be observed. If an effect on the primary outcome is not expected until 12 months post-intervention, this data collection point should be built into the study design. Fourth, statistical mediation analysis (O’Rourke and MacKinnon 2018) and qualitative techniques such as contribution analysis (Mayne 2008) can be used to explore whether the theory of change has materialized in practice, which may help explain null or negative effects. Fifth, all aspects of fidelity need to be recorded, including delivery (dose, adherence, quality, responsiveness), implementer training, and the degree to which participants enact what the intervention focuses on (Borrelli 2011). This helps with determining if and how poor fidelity accounts for a lack of effect. Sixth, there is much value in conducting pre-specified ancillary analyses that explore the relationship between outcomes on the one hand and sample characteristics and fidelity on the other. This involves sufficiently powered subgroup analyses to explore whether some types of participant benefit more than others, and complier average causal effect (CACE) analysis, which compares “compliers” in the intervention arm with a comparable group in the control arm (Hewitt et al. 2006). Finally, robust data should be gathered on implementation context, as this affects intervention effectiveness (Craig et al. 2018), and possible adverse or neutralizing effects (see above). Many of the suggested actions here align with the trend towards mixed methods and realist trials (Hesse-Biber 2012; Bonell et al. 2012), which move from answering “Does it work?” to “For whom does it work, why and in what context?”
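
As a rough illustration of the first point about statistical power, a minimal sketch of the standard normal-approximation sample-size calculation for a two-arm trial, using hypothetical numbers; a real prevention trial would also need to adjust for clustering and attrition.

```python
import math
from scipy.stats import norm

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a standardized
    mean difference (Cohen's d) in a two-arm, two-sided superiority trial."""
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A small effect (d = 0.2), common for universal prevention programs,
# already requires roughly 393 participants per arm.
print(n_per_arm(0.2))
```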

Environment

As indicated earlier, the behavior of investigators and key stakeholders is shaped by multiple incentives and constraints. For this reason, their ability to enact our recommendations demands a suitable infrastructure and supporting climate. This requires collaboration from a number of actors besides investigators and program developers (the audience for most of the preceding recommendations).

First, funders need to be willing to pay for feasibility studies and pilot trials, and for “thicker” trials that incorporate robust process evaluations and analyses of mediators, moderators, and fidelity × outcome interaction effects. They should also fund—and indeed insist on—protocol sharing and publication of results, regardless of what form they take. If investment in trials is seen as part of a developmental process, there is also a case for a guaranteed “improvement fund” should re-design be the preferred option or a protected “decommission fund” if an established intervention is deemed to have no future. While these suggestions have cost implications, funders can save money by being more selective about the trials they fund, which might include paying for evaluations that use other methods where suitable.

Second, publishers—supported by journal editors and editorial boards—need to make it easier to publish null and negative trial results. Strategies might include results-free peer review or accepting results papers “in principle” on acceptance of a protocol article. Additional steps to support honest reporting of results and reduce potentially biased post hoc critique of methods include only publishing trial results if the protocol and analysis plan are in the public domain, making more space available in journals for trial protocols, and allowing room in journals for authors and critics to debate the merits of a given trial design before results are known (Chan and Hróbjartsson 2018).

Third, intermediary organizations concerned with promoting research utilization could play a valuable role in supporting developers and purveyors with intervention design, improvement, and evaluation. This includes helping them to develop interventions that are less likely to produce null or negative effects, which might entail assistance with finding and applying existing research evidence in the context of a human-centered co-design process. It might also involve adapting interventions sensibly in the light of disappointing findings, or encouraging the use of evaluation methods that contribute to intervention improvement rather than progressing prematurely to a trial.

Fourth, EBP registries should encourage the appropriate generation and use of evidence. This might entail providing credit for robust evidence of a null or negative effect and issuing guidance on how to weigh such evidence, for example highlighting that depending on other factors (see above) it need not mean discontinuing the intervention. It could also involve providing stronger ratings for well-conducted non-trial impact evaluations that nevertheless go some way towards attributing causal inference, and highlighting programs that display features or common elements of effective interventions (even if they have not themselves been evaluated experimentally). These steps would mitigate the pressure felt by developers and purveyors to subject their intervention to a trial prematurely in order to attain a rating that will, they believe, increase its likelihood of being commissioned.

Lastly, academic institutions could credit investigators who share trial protocols (Chan and Hróbjartsson 2018) and publish null or negative trial results.

Conclusion

We have sought to recast null or negative trial results as something to learn from, not fear. The learning should be for the field and not restricted to the intervention in question. This depends on trials being designed and conducted with a learning mindset and in a commissioning and policy climate that encourages innovation and experimentation and reduces associated disincentives. There is also a need for researchers, funders, and developers to reflect on the fact that while simple behavioral interventions are easier to implement and to evaluate through trials, they are less likely to work in tackling complex social and health problems with complex causes (Ghate 2016; Rutter et al. 2017). In other words, the system that encourages such activity inadvertently increases the likelihood of null effect trials.

More empirical research is needed into how stakeholders manage and respond to null and negative effect trials and the factors that predict this, since this will help with understanding the barriers to and facilitators of learning. This should entail a combination of desk-based research to code responses to null or negative effect trials and in-depth interviews with key stakeholders about post-trial decision-making to illuminate what happened and why. We also plan to conduct a Delphi exercise to synthesize multiple stakeholders’ perspectives on our recommendations with a view to producing guidance for investigators. In the meantime, we look forward to a time when there will be fewer but more informative null and negative effect trials—essentially more mixed method trials of potentially ground-breaking innovations—and a stronger emphasis on applying the lessons from such studies to embedded practice.

Notes

  1. In this article, we refer primarily to superiority trials investigating the hypothesized added value of an innovation/intervention over a services as usual comparator. But the arguments may also apply to null results from equivalence or non-inferiority trials, which suggest that the new intervention is likely to be inferior to or not as good (by a defined margin) as standard practice or an alternative treatment.

  2. https://www.gla.ac.uk/researchinstitutes/healthwellbeing/news/hawkeye2018onwards/march2019/headline_641840_en.html

  3. Our experience has been with: Blueprints for Healthy Youth Development; the Early Intervention Foundation Guidebook; Project Oracle; the EMCDDA XChange database; Evidence2Success; and Investing in Children.

  4. The Guidebook is a UK-based registry of prevention and early intervention programs: https://guidebook.eif.org.uk

References

  1. Axford, N., Bjornstad, G., Clarkson, S., Ukoumunne, O. C., Wrigley, Z., Matthews, J., et al. (2020a). The effectiveness of the KiVa bullying prevention programme in Wales, UK: Results from a pragmatic cluster randomized controlled trial. Prevention Science, 21, 615–626.

  2. Axford, N., Bjornstad, G., Matthews, J., Whybra, L., Berry, V., Ukoumunne, O. C., et al. (2020b). The effectiveness of a community-based mentoring program for children aged 5–11 years: Results from a randomized controlled trial. Prevention Science. https://doi.org/10.1007/s11121-020-01132-4.

  3. Axford, N., Bjornstad, G., Matthews, J., Heilmann, S., Raja, A., Ukoumunne, O., Berry, V., et al. (2020c). The effectiveness of a therapeutic parenting programme for children aged 6–11 years with behavioural or emotional difficulties: Results from a randomized controlled trial. Children and Youth Services Review. https://doi.org/10.1016/j.childyouth.2020.105245.

  4. Berry, V., Axford, N., Blower, S., Taylor, R. S., Edwards, R. T., Tobin, K., et al. (2016). The effectiveness and micro-costing analysis of a universal, school-based, social-emotional learning programme in the UK: A cluster-randomised controlled trial. School Mental Health, 8, 238–256.

  5. Bonafide, C. P., & Keren, R. (2018). Editorial: Negative studies and the science of deimplementation. JAMA Pediatrics 23 July, E1-E2.

  6. Bonell, C., Fletcher, A., Morton, M., Lorenc, T., & Moore, L. (2012). Realist randomised controlled trials: A new approach to evaluating complex public health interventions. Social Science and Medicine, 75, 2299–2306.

  7. Bonell, C., Jamal, F., Melendez-Torres, G. J., & Cummins, S. (2015). ‘Dark logic’: Theorising the harmful consequences of public health interventions. Journal of Epidemiology and Community Health, 69, 95–98.

  8. Bywater, T., Berry, V., Blower, S. L., Cohen, J., Gridley, N., Kiernan, K., et al. (2018). Enhancing social-emotional health and wellbeing in the early years (E-SEE): A study protocol of a community-based randomised controlled trial with process and economic evaluations of the Incredible Years infant and toddler parenting programmes, delivered in a proportionate universal model. BMJ Open, 8, e026906.

  9. Carter, E. C., Schönbrodt, F. D., Gervais, W. M. & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2, 115–144.

  10. Cartwright, N., & Hardie, J. (2012). Evidence-based policy: A practical guide to doing it better. Oxford: Oxford University Press.

  11. Chan, A.-W., & Hróbjartsson, A. (2018). Promoting public access to clinical trial protocols: Challenges and recommendations. Trials, 19, 116.

  12. Chen, Y.-F., Hemming, K., Stevens, A. J., & Lilford, R. J. (2016). Secular trends and evaluation of complex interventions: The rising tide phenomenon. BMJ Quality and Safety, 25, 303–310.

  13. Chow, J., & Eckholm, E. (2018). Do published studies yield larger effect sizes than unpublished studies in education and special education? A meta-review. Educational Psychology Review, 30, 727–744.

  14. Craig, P., Dieppe, P., Macintyre, S., Michie, S., Nazareth, I., & Petticrew, M. (2008). Developing and evaluating complex interventions: The new Medical Research Council guidance. BMJ, 337, a1655.

  15. Craig, P., Di Ruggiero, E., Frohlich, K. L., Mykhalovskiy, E., White, M., et al. (2018). Taking account of context in population health intervention research: Guidance for producers, users and funders of research. Southampton: NIHR Evaluation, Trials and Studies Coordinating Centre.

  16. Damschroder, L. J., Aron, D. C., Keith, R. E., Kirsh, S. R., Alexander, J. A., & Lowery, J. C. (2009). Fostering implementation of health services research findings into practice: A consolidated framework for advancing implementation science. Implementation Science, 4, 50.

  17. Davies, R. (2013). Planning evaluability assessments: A synthesis of the literature with recommendations. London: Department for International Development.

  18. De Vries, Y. A., Roest, A. M., de Jonge, P., Cuijpers, P., Munafò, M. R., & Bastiaansen, J. A. (2018). The cumulative effect of reporting and citation biases on the apparent efficacy of treatments: The case of depression. Psychological Medicine, 48, 2453–2455.

  19. Duyx, B., Urlings, M. J. E., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. (2017). Scientific citations favour positive results: A systematic review and meta-analysis. Journal of Clinical Epidemiology, 88, 92–101.

  20. Eisner, M. (2009). No effects in independent prevention trials: Can we reject the cynical view? Journal of Experimental Criminology, 5, 163–183.

  21. Evans, R. E., Craig, P., Hoddinott, P., Littlecott, H., Moore, L., Murphy, S., et al. (2019). When and how do ‘effective’ interventions need to be adapted and/or re-evaluated in new contexts? The need for guidance. Journal of Epidemiology and Community Health, 73, 481–482.

  22. FNP National Unit, & Dartington Service Design Lab. (2020). FNP ADAPT: Using evidence, pragmatism and collaboration to change the Family Nurse Partnership programme in England. London: FNP National Unit.

  23. Fonagy, P., Butler, S., Cottrell, D., Scott, S., Pilling, S., Eisler, I., et al. (2018). Multisystemic therapy versus management as usual in the treatment of adolescent antisocial behaviour (START): A pragmatic, randomised controlled, superiority trial. The Lancet Psychiatry, 5, 119–133.

  24. Ghate, D. (2016). From programs to systems: Deploying implementation science and practice for sustained real world effectiveness in services for children and families. Journal of Clinical Child & Adolescent Psychology, 45, 812–826.

  25. Ghate, D. (2018). Developing theories of change for social programmes: Co-producing evidence-supported quality improvement. Palgrave Communications, 4, 90.

  26. Gorman, D. M. (2014). Is Project Towards No Drug Abuse (TND) an evidence-based drug and violence prevention program? A review and reappraisal of the evaluation studies. Journal of Primary Prevention, 35, 217–232.

  27. Gorman, D. M. (2018). Can we trust positive findings of intervention research? The role of conflict of interest. Prevention Science, 19, 295–305.

  28. Gottfredson, D. C., Cook, T. D., Gardner, F. E., Gorman-Smith, D., Howe, G. W., Sandler, I. N., & Zafft, K. M. (2015). Standards of evidence for efficacy, effectiveness, and scale-up research in prevention science. Prevention Science, 16, 893–926.

  29. Grant, S., Mayo-Wilson, E., Montgomery, P., Macdonald, G., Michie, S., Hopewell, S., Moher, D., & for the CONSORT-SPI Group. (2018). CONSORT-SPI 2018 explanation and elaboration: Guidance for reporting social and psychological intervention trials. Trials, 19, 406.

  30. Greenberg, M. T., & Abenavoli, R. (2017). Universal interventions: Fully exploring their impacts and potential to produce population-level impacts. Journal of Research on Educational Effectiveness, 10, 40–67.

  31. Hesse-Biber, S. (2012). Weaving a multimethodology and mixed methods praxis into randomised control trials to enhance credibility. Qualitative Inquiry, 18, 876–889.

  32. Hewitt, C. E., Torgerson, D. J., & Miles, J. N. V. (2006). Is there another way to take account of noncompliance in randomized controlled trials? Canadian Medical Association Journal, 175, 347–348.

  33. Hill, K. G., Woodward, D., Woelfel, T., Hawkins, J. D., & Green, S. (2016). Planning for long-term follow-up: Strategies learned from longitudinal studies. Prevention Science, 17, 806–818.

  34. Hopewell, S., Loudon, K., Clarke, M. J., Oxman, A. D., & Dickersin, K. (2009). Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database of Systematic Reviews 2009, Issue 1.

  35. Humayun, S., Herlitz, L., Chesnokov, M., Doolan, M., Landau, S., & Scott, S. (2017). Randomized controlled trial of Functional Family Therapy for offending and antisocial behavior in UK youth. Journal of Child Psychology and Psychiatry, 58, 1023–1032.

  36. Ioannidis, J. P., Greenland, S., Hlatky, M. A., Khoury, M. J., Macleod, M. R., Moher, D., Schulz, K. F., & Tibshirani, R. (2014). Increasing value and reducing waste in research design, conduct, and analysis. Lancet, 383, 166–175.

  37. Kaplan, R. M., & Irvin, V. L. (2015). Likelihood of null effects of large NHLBI clinical trials has increased over time. PLoS One, 10, e132382.

  38. Kasenda, B., Schandelmaier, S., Sun, X., von Elm, E., You, J., Blümle, A., et al. (2014). Subgroup analyses in randomised controlled trials: Cohort study on trial protocols and journal publications. BMJ, 349, g4539.

  39. Kirkpatrick, T., Lennox, C., Taylor, R., Anderson, R., Maguire, M., Haddad, M., et al. (2018). Evaluation of a complex intervention (Engager) for prisoners with common mental health problems, near to and after release: Study protocol for a randomized controlled trial. BMJ Open, 8, e017931.

  40. Lipsey, M. W., Howell, J. C., Kelly, M. R., Chapman, G., & Carver, D. (2010). Improving the effectiveness of juvenile programs: A new perspective on evidence-based practice. Washington, DC: Georgetown University, Center for Juvenile Justice Reform.

  41. Lloyd, J., Creanor, S., Logan, S., Green, C., Dean, S. G., Hillsdon, M., et al. (2018). Effectiveness of the Healthy Lifestyles Programme (HeLP) to prevent obesity in UK primary-school children: A cluster randomised controlled trial. Lancet Child and Adolescent Health, 2, 35–45.

  42. Lorenc, T., & Oliver, K. (2013). Adverse effects of public health interventions: A conceptual framework. Journal of Epidemiology and Community Health, 68, 288–290.

  43. Lortie-Forgues, H., & Inglis, M. (2019). Rigorous large-scale educational RCTs are often uninformative: Should we be concerned? Educational Researcher, 48, 158–166.

  44. Lyon, A. R., & Koerner, K. (2016). User-centered design for psychosocial intervention development and implementation. Clinical Psychology: Science and Practice, 23, 180–200.

  45. Martin, J., McBride, T., Brims, L., Doubell, L., Pote, I., & Clarke, A. (2018). Evaluating early intervention programmes: Six common pitfalls, and how to avoid them. London: EIF.

  46. Mayne, J. (2008). Contribution analysis: An approach to exploring cause and effect. Institutional Learning and Change (ILAC) Initiative.

  47. Mihalic, S. F., & Elliott, D. S. (2015). Evidence-based programs registry: Blueprints for Healthy Youth Development. Evaluation and Program Planning, 48, 124–131.

  48. Moore, G. F., Audrey, S., Barker, M., Bond, L., Bonell, C., Hardeman, W., et al. (2015). Process evaluation of complex interventions: Medical Research Council guidance. BMJ, 350, h1258.

  49. Moore, G. F., Evans, R. E., Hawkins, J., Littlecott, H., Melendez-Torres, G. J., Bonell, C., et al. (2019). From complex social interventions to interventions in complex social systems: Future directions and unresolved questions for intervention development and evaluation. Evaluation, 25, 23–45.

  50. Norton, W. E., & Chambers, D. A. (2020). Unpacking the complexities of de-implementing inappropriate health interventions. Implementation Science, 15, 1–7.

  51. O’Cathain, A., Croot, L., Sworn, K., Duncan, E., Rousseau, N., Turner, K., Yardley, L., & Hoddinott, P. (2019). Taxonomy of approaches to developing interventions to improve health: A systematic methods overview. Pilot and Feasibility Studies, 5, 1–27.

  52. O’Rourke, H. P., & MacKinnon, D. P. (2018). Reasons for testing mediation in the absence of an intervention effect: A research imperative in prevention and intervention research. Journal of Studies on Alcohol and Drugs, 79, 171–181.

  53. Oldehinkel, A. J. (2018). Editorial: Sweet nothings–The value of negative findings for scientific progress. Journal of Child Psychology and Psychiatry, 59, 829–830.

  54. Robling, M., Bekkers, M.-J., Bell, K., Butler, C. C., Cannings-John, R., Channon, S., et al. (2016). Effectiveness of a nurse-led intensive home-visitation programme for first-time teenage mothers (Building Blocks): A pragmatic randomised controlled trial. Lancet, 387, 146–155.

  55. Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86, 638–641.

  56. Rutter, H., Savona, N., Glonti, K., Bibby, J., Cummins, S., Finegood, D. T., et al. (2017). The need for a complex systems model of evidence for public health. Lancet, 390, 2602–2604.

  57. Schulz, K. F., Altman, D. G., Moher, D., & for the CONSORT Group. (2010). CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials. BMJ, 340, c332.

  58. Segrott, J., Rothwell, H., Hewitt, G., Playle, R., Huang, C., Murphy, S., Moore, L., Hickman, M., & Reed, H. (2015). Preventing alcohol misuse in young people: An exploratory cluster randomised controlled trial of the Kids, Adults Together (KAT) programme. Public Health Research, 3, 15.

  59. Simkiss, D. E., Snooks, H. A., Stallard, N., Kimani, P. K., Sewell, B., Fitzsimmons, D., et al. (2013). Effectiveness and cost-effectiveness of a universal parenting skills programme in deprived communities: A multicentre randomised controlled trial. BMJ Open, 2013, e002851.

  60. Syed, M. (2015). Black box thinking: The surprising truth about success (and why some people never learn from their mistakes). London: John Murray.

Acknowledgments

We are grateful to Lorna Burns for undertaking a literature search for this article and to Leandra Box and Sarah Darton for helpful comments on a draft. The time of Nick Axford and Vashti Berry is supported by the National Institute for Health Research (NIHR) Applied Research Collaboration South West Peninsula (PenARC). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.

Author information

Corresponding author

Correspondence to Vashti Berry.

Ethics declarations

Conflict of Interest

Two authors are involved with assessing programs for the Early Intervention Foundation Guidebook (NA, VB) and the Xchange database of the European Monitoring Centre for Drugs and Drug Addiction (NA). NA is a member of the EIF and Xchange Evidence Panels. The other authors declare that they have no conflict of interest.

Ethical Approval

Not applicable.

Informed Consent

Not applicable.

Human and Animal Studies

This article does not contain any studies with human participants or animals performed by any of the authors.

Data Access Statement

This study did not generate any new data.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Recommended actions to promote learning from null and negative effect trials in prevention science

A. Culture

Foster a learning culture among key stakeholders by:

  • [1] agreeing pre-trial that the goal is to help improve population health outcomes, whether through the selected or other interventions, and learning how best to do this
  • [2] agreeing the opportunities for learning (i.e., questions the trial will help answer)
  • [3] managing expectations about outcomes (e.g., possibility of null or negative results)
  • [4] planning for how to interpret and communicate results, whatever form they take

Foster a collegiate culture by:

  • [5] engaging from the outset in regular and ongoing consultation about decisions regarding the intervention and trial

B. Process

Proceed carefully, thoughtfully, and collaboratively, so that:

  • [6] a definitive trial is only conducted if necessary and appropriate, by:
    • [i] developing a clear and logical theory of change
    • [ii] considering potential harms and either putting in place mitigating actions or redesigning the intervention to reduce or eliminate potential harms
    • [iii] establishing intervention feasibility and acceptability
    • [iv] ensuring that there is genuine uncertainty about intervention effectiveness relative to the control (“equipoise”)
    • [v] obtaining consensus among key stakeholders that a null or negative result is as interesting, useful, and publication-worthy as a positive result
    • [vi] considering and ruling out alternative (non-trial) methods of impact evaluation
  • [7] the trial is terminated early if appropriate, by developing and, if necessary, applying early stopping rules
  • [8] results are considered in an honest way, by sharing process evaluation results within the research team first, then sharing the outcome results (blind to trial arm in the first instance)
  • [9] results are reported openly and fairly, by:
    • [i] stating success criteria before the trial commences, in particular the primary outcome(s) and minimum effect size that is of practical significance
    • [ii] registering the trial on a relevant online database, publishing the trial protocol, and developing (and making publicly available) a detailed analysis plan (statistical and qualitative) that aligns with the protocol
    • [iii] publishing the results as fully and in as publicly accessible a way as possible

C. Intervention design

Design the intervention in such a way that it is less likely to have a null or negative effect, more likely to be suitable for the context and more likely to be implemented well, by:

  • [10] drawing on literature that has been appraised as being reliable to inform the intervention design
  • [11] co-designing the intervention with practitioners, professional development providers, and people with lived experience of the issue
  • [12] identifying at the outset possible unintended adverse effects that might contribute to null or negative effects, and either redesigning the intervention completely or making adaptations accordingly
  • [13] understanding and taking account of context and the system in which the intervention will be implemented

D. Trial design

Design and conduct the trial in such a way that it is:

  • [14] less likely to leave ambiguities and more likely to be informative, by:
    • [i] ensuring the study is adequately powered
    • [ii] recording what intervention and control group participants are receiving by way of the intervention and other (non-intervention) services
    • [iii] calibrating follow-up data collection time points based on theory and empirical evidence on when effects are expected to be observed
    • [iv] gathering robust data on all aspects of fidelity
    • [v] exploring mechanisms of impact both qualitatively and quantitatively
    • [vi] undertaking pre-specified and sufficiently powered moderator analyses
    • [vii] undertaking appropriate fidelity × outcome analyses
    • [viii] gathering robust data on implementation context
  • [15] less open to post hoc criticism, by agreeing measures and other aspects of design a priori (see above)
  • [16] alert to possible adverse effects or at least neutralizing influences, by gathering appropriate data on such influences and undertaking relevant analyses

E. Environment

Enable all of the above by cultivating an infrastructure and climate that incentivize desired behaviors and disincentivize undesired behaviors on the part of investigators and program developers. This involves the following:

  • [17] funders paying for: feasibility and pilot studies; “thicker” trials with substantial process evaluations and ancillary analyses; protocol sharing; open access results publication; alternative evaluation methods where suitable; post-trial action plans
  • [18] academic publishers: mandating protocol publication prior to trial results publication; making more space for protocol sharing and debate on trial methods as specified in protocols; making space for publication of statistical analysis plans; offering results-free peer review; and accepting trial results articles “in principle” at the point of accepting a protocol for publication
  • [19] intermediary organizations: providing support and training with intervention design/adaptation; and assisting developers and purveyors with service improvement and evaluation
  • [20] registries of EBPs providing credit for: interventions subjected to a high-quality null effect trial; non-trial impact evaluation; and non-trialed programs assessed as displaying key features of effective programs
  • [21] academic institutions crediting investigators who share trial protocols and publish null or negative trial results

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Cite this article

Axford, N., Berry, V., Lloyd, J. et al. Promoting Learning from Null or Negative Results in Prevention Science Trials. Prev Sci (2020). https://doi.org/10.1007/s11121-020-01140-4
