NA-MIC Project Weeks

Exploration of foundation models and their embeddings for other tasks using the cloud

Key Investigators

Deepa Krishnaswamy (Brigham and Women's Hospital, USA)
Andrey Fedorov (Brigham and Women's Hospital, USA)
Steve Pieper (Isomics, Inc., USA)
Mike Halle (Brigham and Women's Hospital, USA)
Suraj Pai (Brigham and Women's Hospital, USA)

Project Description

The popularity and use of foundation models (FMs) have exploded in recent years. Within the medical imaging field alone, numerous models have been developed to support various downstream tasks, including classification and segmentation.

However, from a user's perspective, the embeddings these models produce are hard to interpret. It is also often unclear 1) which model to use, and 2) which tasks a given model supports.

In this project, we plan to explore how the cloud can help us understand the embeddings produced by various FMs. We recently extracted embeddings of lung cancer tumors in the National Lung Screening Trial (NLST) CT dataset using 9 different models. We will use the latest features of the Google Cloud Platform to explore and understand these embeddings.

We are most interested in: 1) how these embeddings can be visualized and whether clusters are visible, 2) whether these embeddings can be used to find similar patients, and 3) whether they can be used for other tasks.

Possible extensions to this work: 1) We could explore the embeddings that Google has provided from their pathology foundation model here and here. 2) We could extend this exploration to lung ultrasound images, where visualizing the embedding space could help us choose representative and diverse images for expert annotation.

How could it relate to Slicer and Imaging Data Commons (IDC)? Given a sample patient image, we could retrieve the k-closest patients in IDC.

Objective

We will figure out how to store these embeddings in the cloud to enable a quick search and comparison.
Next, we will explore and visualize these embeddings.
Then, we will use these embeddings to perform image retrieval – finding similar patients in the NLST collection.
Lastly, we will show how to use these embeddings for a downstream task.

Approach and Plan

We will test whether Google Cloud Platform BigQuery (BQ) can be used to store the embeddings (see the BigQuery documentation pages vector-search-intro and vector-index).
Next, we will create an interactive plot to explore embeddings and any clustering in a low-dimensional space. We will let the user click on points to open up an OHIF link with the original image data.
We will investigate whether vector search or a similarity search can be performed to find similar patients.
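
The similarity search described in the plan above can be sketched locally as a brute-force lookup. This is a minimal sketch, not the project's implementation: the patient IDs, the 3-dimensional toy vectors, the helper names `cosine_distance` and `k_closest`, and the choice of cosine distance are all assumptions made for illustration. A managed vector search would perform the same kind of ranking over the stored embeddings at scale.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length embedding")
    return 1.0 - dot / (norm_a * norm_b)


def k_closest(query_id, embeddings, k=5):
    """Rank all other patients by distance to the query patient.

    `embeddings` maps a (hypothetical) patient ID to its embedding vector.
    Returns a list of (patient_id, distance) pairs, closest first.
    """
    query = embeddings[query_id]
    scored = [
        (cosine_distance(query, vec), pid)
        for pid, vec in embeddings.items()
        if pid != query_id  # never return the query itself
    ]
    scored.sort()
    return [(pid, dist) for dist, pid in scored[:k]]


# Toy data only: real embeddings have hundreds or thousands of dimensions.
toy_embeddings = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.9, 0.1, 0.0],
    "patient_c": [0.0, 1.0, 0.0],
    "patient_d": [0.7, 0.7, 0.0],
}

matches = k_closest("patient_a", toy_embeddings, k=2)
for pid, dist in matches:
    print(f"{pid}: {dist:.4f}")
```

Brute force scales linearly with the number of stored embeddings per query, which is one reason to consider an indexed vector search or precomputed distances for larger collections.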

Progress and Next Steps

We first explored the embeddings in a low-dimensional space using UMAP, but we did not see any clear clusters.
Then, we used BigQuery vector search from Google Cloud to try content-based image retrieval. This worked, but was not efficient.
Then, we decided to precompute distances between these embeddings.
We used Apache ECharts to show the results of querying a patient and finding the top 5 matches.
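
The precomputation step can be illustrated with a small sketch: compute all pairwise distances once, keep the closest matches per patient, and export them as nodes and links for a graph-style plot. The Euclidean metric, the toy 2-dimensional vectors, the function names, and the `nodes`/`links` layout are assumptions for illustration; the source does not state which metric or data layout the project used.

```python
import json
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def precompute_neighbors(embeddings, k=5):
    """Compute every pairwise distance once and keep the k closest per patient.

    Returns {patient_id: [(other_id, distance), ...]} sorted closest first.
    """
    distances = {pid: [] for pid in embeddings}
    for p, q in combinations(embeddings, 2):
        d = euclidean(embeddings[p], embeddings[q])
        distances[p].append((d, q))  # the distance is symmetric,
        distances[q].append((d, p))  # so store it for both patients
    return {
        pid: [(other, dist) for dist, other in sorted(pairs)[:k]]
        for pid, pairs in distances.items()
    }


def to_graph(neighbors):
    """Convert the neighbor table into node and link records for plotting."""
    nodes = [{"name": pid} for pid in neighbors]
    links = [
        {"source": pid, "target": other, "value": round(dist, 4)}
        for pid, matches in neighbors.items()
        for other, dist in matches
    ]
    return {"nodes": nodes, "links": links}


# Toy data only; IDs and vectors are hypothetical.
toy_embeddings = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 2.0],
    "d": [5.0, 5.0],
}

neighbor_table = precompute_neighbors(toy_embeddings, k=2)
graph = to_graph(neighbor_table)
print(json.dumps(graph, indent=2))
```

With the table computed ahead of time, answering a query becomes a dictionary lookup rather than a search, at the cost of quadratic work and storage during precomputation.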

This project was developed further; see the following:

https://github.com/ImagingDataCommons/nlst-sybil-connectome
https://imagingdatacommons.github.io/nlst-sybil-connectome/

Illustrations

Overview of project:

Sample of connectome plot and showing the query image:

Demonstration of content-based image retrieval:

Background and References

Try this out yourself!