We’re excited to share a new paper out of the lab, in collaboration with and funded by the FDA along with many wonderful colleagues at VUMC, Washington Health Research Institute, University of Pennsylvania, Brigham and Women’s Hospital, and Harvard Pilgrim Health Care Institute.
Postmarketing safety surveillance depends in part on the ability to detect concerning clinical events
at scale. Spontaneous reporting might be an effective component of safety surveillance, but it requires
awareness and understanding among healthcare professionals to achieve its potential. Reliance on
readily available structured data such as diagnostic codes risks under-coding and imprecision. Clinical
textual data might bridge these gaps, and natural language processing (NLP) has been shown to
aid in scalable phenotyping across healthcare records in multiple clinical domains.
In this study, we developed and validated a novel incident phenotyping approach that uses unstructured clinical textual data and is agnostic to Electronic Health Record (EHR) and note type. It is based on a published, validated
approach (PheRe) used to ascertain social determinants of health and suicidality across entire
healthcare records. To demonstrate generalizability, we validated this approach on two separate
phenotypes that share common challenges with respect to accurate ascertainment: (1) suicide
attempt; (2) sleep-related behaviors. With samples of 89,428 records and 35,863 records for suicide
attempt and sleep-related behaviors, respectively, we conducted silver standard (diagnostic coding)
and gold standard (manual chart review) validation. We showed an Area Under the Precision-Recall Curve (AUPR)
of ~0.77 (95% CI 0.75–0.78) for suicide attempt and an AUPR of ~0.31 (95% CI 0.28–0.34) for sleep-related
behaviors. We also evaluated performance by coded race and found that differences in performance
by race varied across phenotypes. Scalable phenotyping models, like most healthcare AI, require
algorithmovigilance and debiasing prior to implementation.
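
For readers less familiar with the metric, here is a minimal sketch of how an AUPR and a 95% confidence interval can be computed from model scores and chart-review labels. It is an illustration only, not the paper's code: the function names `average_precision` and `bootstrap_ci` are hypothetical, the step-wise "average precision" estimate is just one common way to summarize the precision-recall curve, and the percentile bootstrap is an assumption, since this post does not say how the reported intervals were derived.

```python
import random


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 0/1 outcomes (e.g. chart-review result per record)
    scores: model scores, higher meaning more likely positive
    Tied scores are kept in input order, a simplification for this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the estimate above (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if sum(ys) == 0:
            continue  # a resample with no positives is undefined, so redraw
        stats.append(average_precision(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy data, not study data.
    y = [1, 0, 1, 0, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
    print(average_precision(y, s), bootstrap_ci(y, s, n_boot=200))
```

Running the same two functions separately on each coded-race subgroup is one simple way to surface the kind of performance differences described above.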
