Chapter 15 Evidence Quality

Chapter lead: Jon Duke

15.1 Understanding Evidence Quality

How do we know if the results of a study are reliable? Can they be trusted for use in clinical settings? What about in regulatory decision-making? Can they serve as a foundation for future research? Each time a new study is published or disseminated, readers must consider these questions, regardless of whether the work was a randomized controlled trial, an observational study, or another type of analysis.

One of the concerns often raised around observational studies and the use of “real world data” is data quality (Botsis et al. 2010; Hersh et al. 2013; Sherman et al. 2016). A common observation is that data used in observational research were not originally gathered for research purposes and thus may suffer from incomplete or inaccurate capture, as well as inherent biases. These concerns have given rise to a growing body of research on how to measure, characterize, and ideally improve data quality (Kahn et al. 2012; Liaw et al. 2013; N. G. Weiskopf and Weng 2013a). The OHDSI community is a strong advocate of such research, and community members have led and participated in many studies examining data quality in the OMOP CDM and the OHDSI network (Huser et al. 2016; Kahn et al. 2015; Callahan et al. 2017; Yoon et al. 2016).

Given the findings of the past decade in this area, it has become apparent that data quality is not perfect and never will be. This notion is nicely reflected in this quote from Dr. Clem McDonald, a pioneer in the field of medical informatics:

Loss of fidelity begins with the movement of data from the doctor’s brain to the medical record.

Thus, as a community we must ask the question – given imperfect data, how can we achieve the most reliable evidence? The OHDSI community is seeking to address this question through a holistic focus on “evidence quality.” Evidence quality considers not only the quality of observational data but also the validity of the methods, software, and clinical definitions used in our observational analyses.

In the following chapters, we will explore four components of evidence quality:

Data Quality: Are the data completely captured, with plausible values, in a manner that is conformant to agreed-upon structure and conventions?

Clinical Validity: To what extent does the analysis conducted match the clinical intention?

Software Validity: Can we trust that the process transforming and analyzing the data does what it is supposed to do?

Method Validity: Is the methodology appropriate for the question, given the strengths and weaknesses of the data?
15.2 Communicating Evidence Quality

An important aspect of evidence quality is the ability to express the uncertainty that arises from imperfect data. Thus, our efforts around evidence quality include not only concepts but also specific tools and community processes. The overarching goal of OHDSI’s work on evidence quality is to give health care decision-makers confidence that the evidence generated by OHDSI, while undoubtedly imperfect in many ways, has been consistently measured for its strengths and weaknesses, and that this information has been communicated in a rigorous and open manner.


Botsis, Taxiarchis, Gunnar Hartvigsen, Fei Chen, and Chunhua Weng. 2010. “Secondary Use of EHR: Data Quality Issues and Informatics Opportunities.” Summit on Translational Bioinformatics 2010: 1.

Callahan, Tiffany J, Alan E Bauck, David Bertoch, Jeff Brown, Ritu Khare, Patrick B Ryan, Jenny Staab, Meredith N Zozus, and Michael G Kahn. 2017. “A Comparison of Data Quality Assessment Checks in Six Data Sharing Networks.” eGEMs 5 (1).

Hersh, William R, Mark G Weiner, Peter J Embi, Judith R Logan, Philip RO Payne, Elmer V Bernstam, Harold P Lehmann, et al. 2013. “Caveats for the Use of Operational Electronic Health Record Data in Comparative Effectiveness Research.” Medical Care 51 (8 0 3): S30.

Huser, Vojtech, Frank J. DeFalco, Martijn Schuemie, Patrick B. Ryan, Ning Shang, Mark Velez, Rae Woong Park, et al. 2016. “Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets.” EGEMS (Washington, DC) 4 (1): 1239.

Kahn, Michael G., Jeffrey S. Brown, Alein T. Chun, Bruce N. Davidson, Daniella Meeker, P. B. Ryan, Lisa M. Schilling, Nicole G. Weiskopf, Andrew E. Williams, and Meredith Nahm Zozus. 2015. “Transparent Reporting of Data Quality in Distributed Data Networks.” EGEMS (Washington, DC) 3 (1): 1052.

Kahn, Michael G, Marsha A Raebel, Jason M Glanz, Karen Riedlinger, and John F Steiner. 2012. “A Pragmatic Framework for Single-Site and Multisite Data Quality Assessment in Electronic Health Record-Based Clinical Research.” Medical Care 50.

Liaw, Siaw-Teng, Alireza Rahimi, Pradeep Ray, Jane Taggart, Sarah Dennis, Simon de Lusignan, B Jalaludin, AET Yeo, and Amir Talaei-Khoei. 2013. “Towards an Ontology for Data Quality in Integrated Chronic Disease Management: A Realist Review of the Literature.” International Journal of Medical Informatics 82 (1): 10–24.

Sherman, Rachel E, Steven A Anderson, Gerald J Dal Pan, Gerry W Gray, Thomas Gross, Nina L Hunter, Lisa LaVange, et al. 2016. “Real-World Evidence—What Is It and What Can It Tell Us.” N Engl J Med 375 (23): 2293–7.

Weiskopf, Nicole Gray, and Chunhua Weng. 2013a. “Methods and Dimensions of Electronic Health Record Data Quality Assessment: Enabling Reuse for Clinical Research.” Journal of the American Medical Informatics Association 20 (1): 144–51.

Yoon, D., E. K. Ahn, M. Y. Park, S. Y. Cho, P. Ryan, M. J. Schuemie, D. Shin, H. Park, and R. W. Park. 2016. “Conversion and Data Quality Assessment of Electronic Health Record Data at a Korean Tertiary Teaching Hospital to a Common Data Model for Distributed Network Research.” Healthc Inform Res 22 (1): 54–58.