Extraction of Orofacial Pain Comorbidities from Clinical Notes Using Large Language Models
Key Investigators
- Alban Gaydamour (University of Michigan, USA)
- Lucia Cevidanes (University of Michigan, USA)
- Steve Pieper (Isomics, USA)
- David Hanauer (University of Michigan, USA)
- Juan Prieto (University of North Carolina, USA)
- Lucie Dole (University of North Carolina, USA)
Project Description
Temporomandibular Disorders (TMDs) are often accompanied by complex comorbidities that are difficult to extract from long free-text clinical notes. This project leverages Large Language Models (LLMs) to identify and summarize these comorbidities, enabling structured analysis and visualization across patient cohorts.
Objective
- Fine-tune open-source LLMs to extract a curated list of TMD-related comorbidities from clinical notes.
- Generate structured patient-level outputs from model predictions.
- Visualize comorbidity data using an interactive dashboard.
- Compare model performance to determine the most clinically effective approach.
Approach and Plan
- Annotate clinical notes with summaries across 56 comorbidity criteria.
- Fine-tune LLMs such as `facebook/bart-large-cnn` using chunked note inputs (see the sketch after this list).
- Generate structured outputs and compile them into a CSV.
- Visualize cohort-level trends using a Python-based dashboard.
- Evaluate model performance and deploy the tool so it is accessible within 3D Slicer.
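
A minimal sketch of the chunked fine-tuning step, assuming the Hugging Face `transformers` and `datasets` libraries. The annotation file name, column names ("note", "summary"), chunk parameters, and hyperparameters are illustrative, not the project's exact configuration:

```python
# Sketch: fine-tune facebook/bart-large-cnn on chunked clinical notes.
# File name, column names, and hyperparameters are hypothetical.
from datasets import load_dataset
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(MODEL)
model = BartForConditionalGeneration.from_pretrained(MODEL)

def chunk_ids(text, max_tokens=1024, stride=128):
    """Split a long note into overlapping token chunks that fit
    BART's 1024-token encoder limit."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    step = max_tokens - stride
    return [ids[i:i + max_tokens] for i in range(0, max(len(ids), 1), step)]

def preprocess(example):
    # For brevity this keeps only the first chunk per note; the full
    # pipeline would fan out every chunk and merge the predictions.
    inputs = tokenizer(
        tokenizer.decode(chunk_ids(example["note"])[0]),
        max_length=1024, truncation=True,
    )
    inputs["labels"] = tokenizer(
        text_target=example["summary"], max_length=256, truncation=True
    )["input_ids"]
    return inputs

dataset = load_dataset("csv", data_files="annotated_summaries.csv")["train"]
dataset = dataset.map(preprocess, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="bart-tmd-comorbidities",
        per_device_train_batch_size=2,
        num_train_epochs=3,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The overlapping stride keeps context that straddles a chunk boundary visible to at least one chunk, at the cost of some duplicated tokens.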
Progress and Next Steps
- Deidentified clinical notes were obtained and manually summarized for 112 patients; a total of 500 are planned.
- Fine-tuned `facebook/bart-large-cnn` on these summaries to generate structured outputs across 56 comorbidity fields.
- Generated CSV outputs from model summaries and created a dashboard to visualize cohort-level patterns (a minimal sketch follows this list).
- Currently working on fine-tuning larger models and expanding the dataset.
- Next steps include completing 500 patient summaries, comparing model performance, and deploying the tool for use in 3D Slicer.
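
As a rough illustration of the CSV compilation and dashboard steps, here is a minimal pandas/matplotlib sketch. The patient IDs, field names, and file names are hypothetical, and the real table spans 56 comorbidity fields:

```python
# Sketch: compile per-patient model predictions into a CSV and plot
# cohort-level comorbidity prevalence. All names below are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

model_outputs = {
    "P001": {"headache": "chronic migraine", "anxiety": "GAD, treated"},
    "P002": {"sleep_apnea": "moderate OSA on CPAP"},
}
FIELDS = ["headache", "sleep_apnea", "anxiety"]  # 56 fields in practice

# One row per patient, one column per comorbidity field.
rows = [
    {"patient_id": pid, **{f: preds.get(f, "") for f in FIELDS}}
    for pid, preds in model_outputs.items()
]
df = pd.DataFrame(rows)
df.to_csv("comorbidities.csv", index=False)

# Cohort-level view: fraction of patients with any extracted text per field.
prevalence = (df[FIELDS] != "").mean()
prevalence.plot(kind="bar", ylabel="Fraction of cohort")
plt.tight_layout()
plt.savefig("comorbidity_prevalence.png")
```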
Illustrations

Background and References
- GitHub page: https://github.com/DCBIA-OrthoLab/MedEx
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.