
NA-MIC Project Weeks


Comparison of crowd sourced vs. model generated accuracy on abdominal ultrasound

Key Investigators

Project Description

Segmenting the small bowel in abdominal ultrasound images is a challenging task, even for highly trained physicians. However, it may be a powerful way to diagnose small bowel obstruction. We used the Centaur AI platform to leverage a crowd of labelers, training them on a dataset of labels generated by a consensus of expert physicians. For this project, we wanted to explore whether a model that is given task-specific context, in the form of a few segmented frames, can perform well.


  1. Objective A. Implement MultiverSeg for predictions on abdominal ultrasound images.
  2. Objective B. Evaluate the accuracy of the model for generating segmentations relative to the crowd consensus by comparing the resulting bowel diameters.
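Objective B compares bowel diameters rather than overlap scores, so each segmentation has to be reduced to a single number. This page does not state how the diameter was derived from a mask, so the following is only a minimal sketch under a strong assumption: that the bowel runs roughly horizontally in the frame and the diameter can be approximated by the tallest column of foreground pixels. The function name `max_column_extent` is hypothetical and not part of MultiverSeg or the Centaur AI platform.

```python
import numpy as np


def max_column_extent(mask) -> int:
    """Return the largest number of foreground pixels in any one column.

    ASSUMPTION: this is a stand-in for a diameter measurement, valid only
    when the structure is roughly horizontal. The result is in pixels.
    """
    mask = np.asarray(mask).astype(bool)
    if mask.ndim != 2:
        raise ValueError("expected a 2D binary mask")
    if mask.size == 0:
        return 0
    # Count foreground pixels per column, then take the tallest column.
    return int(mask.sum(axis=0).max())


# Toy example: a 5-pixel-tall horizontal band in a 20x30 frame.
toy = np.zeros((20, 30), dtype=np.uint8)
toy[8:13, 5:25] = 1
toy_diameter = max_column_extent(toy)
```

Applying the same measurement to the predicted mask and the reference mask gives one pair of values per frame, which is the form of input an agreement statistic needs.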

Approach and Plan

  1. Set up MultiverSeg: https://github.com/halleewong/MultiverSeg
  2. Evaluate how the model performs with an increasing number of context frames from the same patient, and separately with context frames from different patients.
  3. Repeat the evaluation with user input added, in the form of positive and negative clicks, alongside the context frames.
  4. Compare the performance of these methods by evaluating the resulting bowel diameter.
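The two context conditions in step 2 differ only in which frames are eligible to serve as context. A minimal sketch of that selection is shown here, assuming frames are identified by hypothetical `(patient_id, frame_id)` tuples; the function name `sample_context` and its parameters are illustrative and do not come from MultiverSeg.

```python
import random


def sample_context(frames, target, n_context, same_patient, seed=0):
    """Pick context frames for one target frame.

    frames: list of (patient_id, frame_id) tuples (hypothetical layout).
    target: the (patient_id, frame_id) tuple being segmented.
    same_patient: True to draw context from the target's patient only,
                  False to draw from all other patients.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    patient = target[0]
    if same_patient:
        # Never let the target frame leak into its own context set.
        pool = [f for f in frames if f[0] == patient and f != target]
    else:
        pool = [f for f in frames if f[0] != patient]
    if n_context > len(pool):
        raise ValueError("not enough eligible frames for the requested context size")
    return rng.sample(pool, n_context)


# Toy data: 3 patients with 5 frames each.
frames = [(p, i) for p in ("A", "B", "C") for i in range(5)]
same_ctx = sample_context(frames, ("A", 0), n_context=2, same_patient=True)
cross_ctx = sample_context(frames, ("A", 0), n_context=4, same_patient=False)
```

Excluding the target frame from the same-patient pool matters: if it were allowed in, the model would see the answer for the frame it is asked to segment.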

Progress and Next Steps

  1. Two to three frames from the same patient clip were sufficient context to achieve consistent results, and adding more did not appear to improve them.
  2. Tested up to 30 context frames from a set of 10 randomly selected patients. While there was some improvement with more than 15 context frames, the model struggled to identify the bowel in new patients.

    Given 20 context frames from a set of 10 randomly selected patient clips

    Prediction: 262.968; Ground truth: 338.373; ICC(2,1): -0.296; 95% CI: (-0.590, 0.071)

    Given 20 context frames from a set of 10 randomly selected patient clips, with 2 positive & 2 negative support points

    Prediction: 280.856; Ground truth: 338.373; ICC(2,1): -0.192; 95% CI: (-0.514, 0.180)

    Given 2 context frames from the same clip

    Prediction: 313.465; Ground truth: 338.373; ICC(2,1): 0.748; 95% CI: (0.524, 0.876)

    Given 2 context frames from the same clip, with 2 positive & 2 negative support points

    Prediction: 305.765; Ground truth: 338.373; ICC(2,1): 0.784; 95% CI: (0.584, 0.895)
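The agreement figures above are reported as ICC(2,1), the two-way random-effects, absolute-agreement, single-measure intraclass correlation of Shrout and Fleiss. For readers who want to see how such a value is obtained from paired diameter measurements, here is a minimal NumPy sketch of the standard formula. It is not the code used to produce the numbers on this page, the function name `icc2_1` is illustrative, and the confidence interval is not computed.

```python
import numpy as np


def icc2_1(ratings) -> float:
    """ICC(2,1) for an (n_subjects, k_raters) array of measurements.

    Here a "subject" would be a frame or clip and the two "raters"
    would be the model prediction and the reference measurement.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy example: two raters who agree closely on five subjects.
toy_ratings = np.array([
    [10.0, 10.5],
    [12.0, 11.5],
    [15.0, 15.5],
    [20.0, 19.0],
    [25.0, 25.5],
])
toy_icc = icc2_1(toy_ratings)
```

The estimate becomes negative whenever the residual mean square exceeds the between-subject mean square, which is how the cross-patient conditions above can yield values below zero even though a true correlation of this kind cannot be negative.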


Example of Crowd Segmentations:
Example of Expert Segmentations Demonstrating Bowel Diameter:

Background and References

Relevant Publications:

Wong, H.E., Ortiz, J.J.G., Guttag, J., & Dalca, A.V. (2024). MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance. arXiv preprint arXiv:2412.15058.

Wong, H.E., Rakic, M., Guttag, J., & Dalca, A.V. (2024). ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image. In European Conference on Computer Vision.