Comparison of crowd sourced vs. model generated accuracy on abdominal ultrasound
Key Investigators
- Jacqueline Foody (Centaur Labs/MGB, USA)
- Hallee Wong (MIT, USA)
- Mike Jin (Centaur Labs/MGB, USA)
- Tina Kapur (MGB, USA)
Project Description
Segmenting the small bowel in abdominal ultrasound images is a challenging task, even for highly trained physicians. However, it may be a powerful way to diagnose small bowel obstruction. We used the Centaur AI platform to recruit a crowd of labelers, training them on a dataset of labels generated by a consensus of expert physicians. For this project, we wanted to explore whether a model can perform well when given the context of this specific task in the form of a few segmented frames.
Objective
- Objective A. Implement MultiverSeg for predictions on abdominal ultrasound images.
- Objective B. Evaluate the accuracy of the model for generating segmentations relative to the crowd consensus by comparing the resulting bowel diameters.
Approach and Plan
- Set up MultiverSeg: https://github.com/halleewong/MultiverSeg
- Evaluate how the model performs using an increasing number of context frames from the same patient, and separately from different patients
- Repeat both experiments with added user input, in the form of positive and negative clicks supplied alongside the context frames.
- Compare the performance of these methods by evaluating the resulting bowel diameter.
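The two context settings in the plan (frames from the same patient versus frames from different patients) can be sketched as a small sampling helper. This is only an illustration under stated assumptions: the `(clip_id, frame_index)` tuple layout, the function name `sample_context`, and the toy clip names are hypothetical, and the sketch does not call MultiverSeg itself.

```python
import random


def sample_context(frames, target, n_context, same_clip, seed=0):
    """Pick labeled context frames for one target frame.

    frames    -- list of (clip_id, frame_index) tuples (hypothetical layout)
    target    -- the (clip_id, frame_index) tuple being segmented
    same_clip -- True: draw from the target's own clip;
                 False: draw only from other clips (other patients)
    """
    target_clip = target[0]
    if same_clip:
        # Same patient: every labeled frame of the clip except the target.
        pool = [f for f in frames if f[0] == target_clip and f != target]
    else:
        # Different patients: exclude the target's clip entirely.
        pool = [f for f in frames if f[0] != target_clip]
    if n_context > len(pool):
        raise ValueError("not enough labeled frames for the requested context size")
    # A seeded generator keeps the draw reproducible across runs.
    return random.Random(seed).sample(pool, n_context)


# Toy data: 3 clips with 5 labeled frames each (made-up identifiers).
all_frames = [(clip, i) for clip in ("clip_a", "clip_b", "clip_c") for i in range(5)]
target_frame = ("clip_a", 2)

same_patient_context = sample_context(all_frames, target_frame, 2, same_clip=True)
cross_patient_context = sample_context(all_frames, target_frame, 4, same_clip=False)
```

Keeping the target frame out of its own context set avoids leaking the ground-truth label into the same-patient setting.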
Progress and Next Steps
- 2-3 frames from the same patient clip were sufficient context to achieve consistent results; adding more did not appear to improve them.
- Tested up to 30 context frames from a set of 10 randomly selected patients. While there was some improvement with more than 15 context frames, the model struggled to identify the bowel in new patients.
Given 20 context frames from a set of 10 randomly selected patient clips
Prediction: 262.968
Ground truth: 338.373
ICC(2,1): -0.296
95% CI: (-0.590, 0.071)

Given 20 context frames from a set of 10 randomly selected patient clips, with 2 positive & 2 negative support points
Prediction: 280.856
Ground truth: 338.373
ICC(2,1): -0.192
95% CI: (-0.514, 0.180)

Given 2 context frames from the same clip
Prediction: 313.465
Ground truth: 338.373
ICC(2,1): 0.748
95% CI: (0.524, 0.876)

Given 2 context frames from the same clip, with 2 positive & 2 negative support points
Prediction: 305.765
Ground truth: 338.373
ICC(2,1): 0.784
95% CI: (0.584, 0.895)
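
The ICC(2,1) values above are the two-way random-effects, absolute-agreement, single-measurement form of the intraclass correlation (Shrout and Fleiss). As a rough illustration of how such a value can be computed from paired diameters, here is a minimal pure-Python sketch. The function name `icc_2_1` and the example numbers are made up for illustration; this is not the project's evaluation code or data, and the confidence interval is not computed here.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of n subjects (rows) by k raters (columns).

    For this project, a row could hold the predicted and ground-truth
    diameter of one frame (an assumption about how the data are paired).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msr - mse) / denom


# Made-up paired measurements: [prediction, ground truth] per frame.
perfect = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

print(icc_2_1(perfect))  # 1.0: identical ratings agree perfectly
print(icc_2_1(offset))   # about 0.667: a constant offset lowers absolute agreement
```

Because ICC(2,1) measures absolute agreement, a systematic under- or over-estimate of the diameter reduces the score even when predictions track the ground truth closely, and the estimate can be negative when the disagreement within a frame exceeds the variation between frames.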

Illustrations
Example of Crowd Segmentations:
Example of Expert Segmentations Demonstrating Bowel Diameter:

Background and References
Relevant Publications:
Wong, H.E., Ortiz, J.J.G., Guttag, J. & Dalca, A.V., (2024). MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance. arXiv preprint arXiv:2412.15058.
Wong, H.E., Rakic, M., Guttag, J., & Dalca, A.V., (2024). ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image. In European Conference on Computer Vision.