NA-MIC Project Weeks

Comparison of crowd-sourced vs. model-generated accuracy on abdominal ultrasound

Key Investigators

Jacqueline Foody (Centaur Labs/MGB, USA)
Hallee Wong (MIT, USA)
Mike Jin (Centaur Labs/MGB, USA)
Tina Kapur (MGB, USA)

Project Description

Segmenting the small bowel in abdominal ultrasound images is a challenging task, even for highly trained physicians. However, it may be a powerful way to diagnose small bowel obstruction. We used the Centaur AI platform to leverage a crowd of labelers, training them on a dataset of labels generated by a consensus of expert physicians. For this project, we wanted to explore whether a model can perform well when given task-specific context in the form of a few segmented frames.

Objective

Objective A. Implement MultiverSeg for predictions on abdominal ultrasound images.
Objective B. Evaluate the accuracy of the model for generating segmentations relative to the crowd consensus by comparing the resulting bowel diameters.
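Objective B compares segmentations through the bowel diameter they imply. This page does not state how the diameter is measured, so the following is only a minimal sketch under a stated assumption: the diameter is approximated by the tallest column of foreground pixels in a binary mask, which is reasonable only when the bowel lies roughly horizontally in the frame. The helper names `max_vertical_extent` and `relative_diameter_error` are hypothetical and are not part of MultiverSeg or the Centaur platform.

```python
import numpy as np


def max_vertical_extent(mask) -> int:
    """Largest number of foreground pixels found in any single column.

    Hypothetical diameter proxy, in pixels. Assumes a 2D binary mask in
    which the structure of interest runs roughly horizontally.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 2:
        raise ValueError("expected a 2D mask")
    if mask.size == 0:
        return 0
    # Count foreground pixels per column, then take the tallest column.
    return int(mask.sum(axis=0).max())


def relative_diameter_error(pred_mask, ref_mask) -> float:
    """Signed error of the predicted diameter relative to the reference."""
    ref = max_vertical_extent(ref_mask)
    if ref == 0:
        raise ValueError("reference mask has no foreground pixels")
    return (max_vertical_extent(pred_mask) - ref) / ref


if __name__ == "__main__":
    reference = np.zeros((10, 12), dtype=bool)
    reference[3:7, 2:9] = True      # 4 pixels tall
    prediction = np.zeros_like(reference)
    prediction[3:6, 2:9] = True     # 3 pixels tall
    print(max_vertical_extent(reference))
    print(relative_diameter_error(prediction, reference))
```

A negative error means the predicted mask implies a smaller diameter than the reference. Converting pixels to physical units would need the image spacing, which is not given here.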

Approach and Plan

Set up MultiverSeg: https://github.com/halleewong/MultiverSeg
Evaluate how the model performs with an increasing number of context frames drawn from the same patient, and separately with context frames drawn from different patients.
Repeat the evaluation with user input added, in the form of positive and negative clicks alongside the context frames.
Compare the performance of these methods by evaluating the resulting bowel diameter.
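The plan above adds positive and negative clicks to the context frames. This page does not say whether the clicks were placed by hand or simulated. If they were simulated from a reference mask, one simple option is uniform random sampling, sketched below. The function `sample_clicks` is a hypothetical helper, not MultiverSeg's API, and the (row, col) output would still need converting to whatever prompt format the model expects.

```python
import numpy as np


def sample_clicks(mask, n_pos=2, n_neg=2, seed=0):
    """Sample click coordinates from a 2D binary reference mask.

    Returns two integer arrays of shape (n_pos, 2) and (n_neg, 2) holding
    (row, col) positions inside and outside the mask. Uniform sampling is
    an assumption made for illustration only.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 2:
        raise ValueError("expected a 2D mask")
    rng = np.random.default_rng(seed)

    foreground = np.argwhere(mask)    # candidate positive clicks
    background = np.argwhere(~mask)   # candidate negative clicks
    if len(foreground) < n_pos or len(background) < n_neg:
        raise ValueError("not enough pixels to sample the requested clicks")

    # Sample without replacement so no click is repeated.
    pos = foreground[rng.choice(len(foreground), size=n_pos, replace=False)]
    neg = background[rng.choice(len(background), size=n_neg, replace=False)]
    return pos, neg


if __name__ == "__main__":
    reference = np.zeros((8, 8), dtype=bool)
    reference[2:6, 2:6] = True
    positive, negative = sample_clicks(reference, n_pos=2, n_neg=2, seed=0)
    print(positive)
    print(negative)
```

Fixing the seed keeps the simulated clicks reproducible across runs, which makes comparisons between context-frame settings easier to interpret.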

Progress and Next Steps

2-3 context frames from the same patient clip were sufficient to achieve consistent results; adding more didn’t appear to improve them.
Tested up to 30 context frames from a set of 10 randomly selected patients. Although results improved somewhat with more than 15 context frames, the model still struggled to identify the bowel in new patients.
Given 20 context frames from a set of 10 randomly selected patient clips:

Prediction: 262.968; Ground truth: 338.373; ICC(2,1): -0.296; 95% CI: (-0.590, 0.071)

Given 20 context frames from a set of 10 randomly selected patient clips, with 2 positive & 2 negative support points:

Prediction: 280.856; Ground truth: 338.373; ICC(2,1): -0.192; 95% CI: (-0.514, 0.180)

Given 2 context frames from the same clip:

Prediction: 313.465; Ground truth: 338.373; ICC(2,1): 0.748; 95% CI: (0.524, 0.876)

Given 2 context frames from the same clip, with 2 positive & 2 negative support points:

Prediction: 305.765; Ground truth: 338.373; ICC(2,1): 0.784; 95% CI: (0.584, 0.895)
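For readers unfamiliar with ICC(2,1), the sketch below computes it from the standard two-way random-effects, absolute-agreement, single-measurement formula of Shrout and Fleiss. It assumes a table with one row per measured item (for example, one frame or clip) and one column per rater (for example, model prediction and ground truth). The tool actually used to produce the numbers above is not stated on this page, and the confidence interval is not computed here.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_items, k_raters) table with no missing values.
    """
    x = np.asarray(ratings, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2 or x.shape[1] < 2:
        raise ValueError("need at least 2 items and 2 raters")
    n, k = x.shape
    grand_mean = x.mean()

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)  # between items
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)  # between raters
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        return float("nan")  # all ratings identical: ICC is undefined
    return float((ms_rows - ms_error) / denominator)


if __name__ == "__main__":
    # Illustrative values only, not data from this project.
    predicted = [310.0, 295.5, 340.2, 280.1, 325.7]
    reference = [338.0, 300.0, 352.4, 290.3, 330.0]
    print(icc_2_1(np.column_stack([predicted, reference])))
```

Because ICC(2,1) penalizes systematic offsets between raters as well as random disagreement, a model that consistently under-measures can score low even when its predictions track the reference closely. Values can also be negative, as in the cross-patient results above, when the within-item disagreement exceeds the between-item variation.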

Illustrations

Example of Crowd Segmentations:
[Image: crowd_example]
Example of Expert Segmentations Demonstrating Bowel Diameter:
[Image: expert_example]

Background and References

Relevant Publications:

Wong, H.E., Ortiz, J.J.G., Guttag, J., & Dalca, A.V. (2024). MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance. arXiv preprint arXiv:2412.15058.

Wong, H.E., Rakic, M., Guttag, J., & Dalca, A.V. (2024). ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image. In European Conference on Computer Vision.