NA-MIC Project Weeks

ChatIDC: Navigating DICOM and IDC using Natural Language

Key Investigators

Presenter location: In-person

Project Description

ChatIDC is a natural language interface for exploring DICOM tags and the Imaging Data Commons (IDC). It is intended to help users filter and download highly specific cohorts of imaging data and find relevant information about the DICOM standard, the IDC documentation, and data described by DICOM tags.


The goal of this project is to lower some of the technical barriers that clinical researchers face when filtering and downloading highly specific cohorts of imaging data. This should make data retrieval more efficient and encourage wider adoption of the platforms into which ChatIDC is integrated.

IDC currently lets you filter cohorts by some of the most common tags using sliders and buttons. This approach reaches its limit when a researcher needs data that is highly tailored to their use case, which may require a highly compositional query that uses more esoteric DICOM tags. When the number of filter parameters is too large, manual selection and query construction may become infeasible for anyone who is not an expert in both DICOM and SQL.

Approach and Plan

We will prepare a list of queries to motivate and test the development of the project. Each entry will contain a free-text request and the matching SQL query. We will work with IDC/SQL domain experts to confirm that the SQL queries on this list are both syntactically and semantically correct. This list will be shared at the end of the project week.
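As an illustration of what one entry on such a list could look like, here is a minimal Python sketch that pairs a free-text request with a reference SQL query and runs a cheap sanity check on it. The names `QueryCase`, `basic_sanity_problems`, and `EXAMPLE_CASES` are hypothetical, and the table name and column names in the example query are assumptions that should be confirmed against the current IDC documentation. The check is only a pre-filter for obvious mistakes, not a SQL parser, and does not replace expert review.

```python
from dataclasses import dataclass

# Assumed table name, for illustration only; confirm against IDC documentation.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


@dataclass(frozen=True)
class QueryCase:
    """One entry of the test list: a free-text request and its reference SQL."""
    request: str
    sql: str


def basic_sanity_problems(case: QueryCase) -> list[str]:
    """Return obvious problems with an entry; an empty list means none found.

    This is a cheap pre-filter, not a syntax or semantics check.
    """
    problems = []
    sql = case.sql.strip()
    if not case.request.strip():
        problems.append("empty request")
    if not sql.upper().startswith(("SELECT", "WITH")):
        problems.append("query does not start with SELECT or WITH")
    if sql.count("(") != sql.count(")"):
        problems.append("unbalanced parentheses")
    if sql.count("`") % 2 != 0:
        problems.append("unbalanced backticks")
    return problems


# Hypothetical example entry; the column names are assumptions.
EXAMPLE_CASES = [
    QueryCase(
        request="Find all chest CT series.",
        sql=(
            "SELECT DISTINCT SeriesInstanceUID\n"
            f"FROM `{IDC_TABLE}`\n"
            "WHERE Modality = 'CT' AND BodyPartExamined = 'CHEST'"
        ),
    ),
]

if __name__ == "__main__":
    for case in EXAMPLE_CASES:
        found = basic_sanity_problems(case)
        print(case.request, "->", found or "no obvious problems")
```

Keeping the request and the SQL together in one record makes it straightforward to reuse the same list both as prompt examples and as a test set for generated queries.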

We will implement semantic search for DICOM tags based on the user’s input; the retrieved tags are then used as the pretext for the language model. We will work with IDC/DICOM experts to confirm that this curated list of tags is meaningful and comprehensive. This list will be shared at the end of the project week.
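The retrieval step described above can be sketched roughly as follows. This is not the project's implementation: a real semantic search would most likely compare embedding vectors, whereas this self-contained sketch substitutes plain bag-of-words cosine similarity so that it runs without any external model. The tag descriptions are short paraphrases written for illustration, and `TAGS`, `top_tags`, and `build_pretext` are hypothetical names.

```python
import math
import re
from collections import Counter

# Tiny hand-written tag list for illustration; descriptions are paraphrased,
# not quoted from the DICOM standard.
TAGS = {
    "Modality": "type of equipment that acquired the images such as CT MR or PT",
    "BodyPartExamined": "body part or anatomic region examined such as chest or brain",
    "SliceThickness": "nominal slice thickness in millimeters",
    "PatientSex": "sex of the patient",
    "Manufacturer": "manufacturer of the scanner equipment",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_tags(request: str, k: int = 2) -> list[str]:
    """Return up to k tag names most similar to the request, best first."""
    query = tokenize(request)
    scored = [
        (cosine(query, tokenize(f"{name} {desc}")), name)
        for name, desc in TAGS.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:k] if score > 0]


def build_pretext(request: str, k: int = 2) -> str:
    """Prepend the retrieved tags to the request as context for the model."""
    lines = [f"- {name}: {TAGS[name]}" for name in top_tags(request, k)]
    return "Relevant DICOM tags:\n" + "\n".join(lines) + f"\n\nRequest: {request}"


if __name__ == "__main__":
    print(build_pretext("chest scans with slice thickness below 2 millimeters"))
```

Passing only the few best-matching tags, rather than the whole data dictionary, keeps the pretext short while still giving the model the tag names it needs to write the query.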

We plan to document our current experience and recommend prompts that users can use to improve the quality of the responses generated by existing LLM interfaces. We will also document our observations on the syntactic accuracy of generated queries to motivate future development (i.e., what worked, what didn’t work, what can be fixed with refinements to the prompt, and what can be improved with the approach used in the text2cohort project).

We would like to interview the AI developers attending project week to gather requests and ideas for queries that users would like to see addressed.

Progress and Next Steps

No response


Background and References

No response