NA-MIC Project Weeks

Extraction of Orofacial Pain Comorbidities from Clinical Notes Using Large Language Models

Key Investigators

Project Description

Temporomandibular disorders (TMDs) are often associated with complex comorbidities that are difficult to extract from long, free-text clinical notes. This project uses large language models (LLMs) to identify and summarize these comorbidities so that they can be analyzed in structured form and visualized across patient cohorts.

Objective

  1. Fine-tune open-source LLMs to extract a curated list of TMD-related comorbidities from clinical notes.
  2. Generate structured patient-level outputs from model predictions.
  3. Visualize comorbidity data using an interactive dashboard.
  4. Compare model performance to determine the most clinically effective approach.
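
One simple way to compare models for objective 4 is per-field agreement with the manual annotations. The sketch below is only an illustration: the function name `field_agreement`, the field names, and the choice of case-insensitive exact match as the metric are assumptions, not the project's evaluation protocol.

```python
def field_agreement(predicted, reference):
    """Per-field fraction of patients where the prediction matches the annotation.

    Both arguments are lists of dicts (one dict per patient, same order).
    Matching is a case-insensitive exact string match, which is an assumed,
    deliberately simple metric.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must cover the same patients")
    if not reference:
        return {}

    scores = {}
    for field in reference[0]:
        matches = sum(
            pred.get(field, "").strip().lower() == ref[field].strip().lower()
            for pred, ref in zip(predicted, reference)
        )
        scores[field] = matches / len(reference)
    return scores


# Hypothetical field names and values, for illustration only.
reference = [
    {"headache": "yes", "anxiety": "no"},
    {"headache": "yes", "anxiety": "yes"},
]
predicted = [
    {"headache": "Yes", "anxiety": "no"},
    {"headache": "no", "anxiety": "yes"},
]
scores = field_agreement(predicted, reference)
```

Running the same function on the outputs of each candidate model gives a per-field table that can be compared side by side.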

Approach and Plan

  1. Annotate clinical notes with summaries across 56 comorbidity criteria.
  2. Fine-tune LLMs such as facebook/bart-large-cnn using chunked note inputs.
  3. Generate structured outputs and compile them into a CSV.
  4. Visualize cohort-level trends using a Python-based dashboard.
  5. Evaluate model performance and deploy the tool so that it is accessible from 3D Slicer.
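
Step 2 uses chunked note inputs because facebook/bart-large-cnn accepts at most 1024 input tokens, which long clinical notes can exceed. A minimal sketch of such chunking follows, using whitespace-separated words as a rough stand-in for tokens. The function name `chunk_note`, the 400-word window, and the 50-word overlap are illustrative assumptions, not the project's actual settings.

```python
def chunk_note(text, max_words=400, overlap=50):
    """Split a note into overlapping word windows.

    Word counts only approximate tokenizer token counts, so max_words
    should be chosen with some margin below the model's input limit.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the note.
        if start + max_words >= len(words):
            break
    return chunks


# Synthetic 1000-word note, for illustration only.
note = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_note(note)
```

The overlap keeps a finding that straddles a chunk boundary visible in full in at least one chunk, at the cost of some repeated text.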

Progress and Next Steps

  1. Deidentified clinical notes were obtained and manually summarized for 112 patients; a total of 500 are planned.
  2. Fine-tuned facebook/bart-large-cnn on these summaries to generate structured outputs across 56 comorbidity fields.
  3. Generated CSV outputs from model summaries and created a dashboard to visualize cohort-level patterns.
  4. Currently working on fine-tuning larger models and expanding the dataset.
  5. Next steps include completing 500 patient summaries, comparing model performance, and deploying the tool for use in 3D Slicer.
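
The step from model summaries to a CSV (item 3) can be sketched as below. This assumes the model emits one `field: value` line per comorbidity, which is a hypothetical output format. The names in `FIELDS` are placeholders and are not the project's 56 criteria.

```python
import csv
import io

# Placeholder field names; the real list of 56 criteria is not shown here.
FIELDS = ["headache", "sleep_disorder", "anxiety"]


def parse_summary(summary, fields=FIELDS):
    """Turn 'Field: value' lines into a dict; unknown fields are ignored."""
    row = {field: "" for field in fields}
    for line in summary.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower().replace(" ", "_")
        if sep and key in row:
            row[key] = value.strip()
    return row


def summaries_to_csv(summaries, fields=FIELDS):
    """Build CSV text with one row per patient from {patient_id: summary}."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["patient_id", *fields])
    writer.writeheader()
    for patient_id, summary in summaries.items():
        writer.writerow({"patient_id": patient_id, **parse_summary(summary, fields)})
    return buffer.getvalue()


# Made-up example summary, not real patient data.
summaries = {
    "P001": "Headache: yes, chronic\nSleep disorder: none reported\nOther: ignored",
}
csv_text = summaries_to_csv(summaries)
```

Fields that the model does not mention are left empty rather than guessed, so gaps stay visible in the resulting table.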

Illustrations

Dashboard summary from first 112 cases

Background and References