This new study out in Scientific Reports, led by Dr. Adi Bejan, uses Natural Language Processing (NLP) to improve how well we identify suicidal thoughts and behaviors in healthcare data. Methods relying on diagnostic codes to identify suicidal ideation and suicide attempt in Electronic Health Records (EHRs) at scale are suboptimal because suicide-related outcomes are heavily under-coded.
With our team, Dr. Bejan adapted the NLP system from his earlier work on homelessness and adverse childhood events to ascertain 1) suicidal thoughts and 2) suicidal behaviors from clinical text in an EHR-agnostic manner. We validated it with manual chart review and compared NLP-based ascertainment with ICD codes and other clinical forms. The system achieved a high positive predictive value (PPV) of 95% or more. A helpful finding is that ICD-10 codes alone notably outperformed ICD-9 codes (85% vs. 58% PPV).
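For readers less familiar with the metric, PPV is the share of flagged charts that manual review confirms as true cases. Here is a minimal sketch of that calculation in Python. The function name and the review outcomes are hypothetical and purely illustrative; they are not data or code from the study.

```python
def positive_predictive_value(confirmed: int, flagged: int) -> float:
    """Return the fraction of flagged charts that reviewers confirmed."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= confirmed <= flagged:
        raise ValueError("confirmed must be between 0 and flagged")
    return confirmed / flagged


# Hypothetical chart review outcomes for 20 flagged charts (illustrative only):
# True means the reviewer confirmed the flagged chart as a true case.
reviews = [True] * 19 + [False]

ppv = positive_predictive_value(sum(reviews), len(reviews))
print(f"PPV = {ppv:.2f}")
```

The same calculation applies whether the charts were flagged by an NLP system or by diagnostic codes, which is what makes a side-by-side PPV comparison possible.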
Overall, we demonstrated that scalable and accurate NLP methods can be developed to identify suicidal behavior in EHRs to enhance prevention efforts, predictive models, and precision medicine. This work has already had direct impacts on how we assess the scope of this problem, how we power genetic analyses to better understand it, and how we will evaluate clinical prevention.
The full open access article is available here. Many thanks to co-authors Michael Ripperger, Drew Wilimitis, Ryan Ahmed, JooEun Kang, Theodore Morley and Douglas Ruderfer for work on this study and to the NIMH and thoughtful reviewers who funded the R01 supporting this effort.