NA-MIC Project Weeks

This project builds on work carried out during PW43 and on the preprint “In Search of Truth: Evaluating Concordance of AI-Based Anatomy Segmentation Models”.
Our overall goal is to enrich images available in Imaging Data Commons with segmentations and quantitative features.
In this work, we developed a practical workflow to compare AI-based anatomy segmentation models in the absence of ground-truth annotations. Segmentation outputs from different models were harmonized into a standardized representation, enabling structure-wise comparison and efficient visual review. Using this framework, we evaluated six open-source segmentation models (TotalSegmentator 1.5, TotalSegmentator 2.6, Auto3DSeg, MOOSE, MultiTalent, and CADS) on 18 CT scans from the NLST dataset hosted by the Imaging Data Commons. While agreement varied across anatomical structures, MOOSE and CADS showed consistent results across all evaluated structures and no visible segmentation errors during visual comparison. In contrast, the other four models produced visible segmentation errors or deficiencies in rib and vertebral structures.
The goal of this Project Week is to select a representative subset of the NLST dataset, run the MOOSE segmentation model on it, and use radiomic features to identify and visually inspect potential segmentation outliers in order to confirm the robustness of the model. A stretch goal is to process all of the CT scans in NLST (or even beyond NLST) with MOOSE to generate segmentations and radiomics features for subsequent ingestion into the Imaging Data Commons.
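As a rough illustration of the planned outlier screening, the sketch below computes per-structure shape features with pyradiomics and flags cases that deviate strongly from the cohort. The `cases` list (CT path, segmentation path, structure label value, and an ID per case) and the |z| > 3 rule are illustrative assumptions, not the project's final pipeline.

```python
# Minimal sketch of radiomics-based outlier screening (assumptions noted above).
import pandas as pd
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("shape")  # shape features react strongly to segmentation errors

rows = {}
for case in cases:  # assumed list of dicts: {"id", "ct_path", "seg_path", "label"}
    features = extractor.execute(case["ct_path"], case["seg_path"], label=case["label"])
    rows[case["id"]] = {
        k: float(v) for k, v in features.items() if k.startswith("original_shape_")
    }

df = pd.DataFrame.from_dict(rows, orient="index")
z = (df - df.mean()) / df.std(ddof=0)
outlier_ids = df.index[(z.abs() > 3).any(axis=1)]  # candidates for visual inspection
print(outlier_ids.tolist())
```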
In addition, the CrossSegmentationExplorer module described in the preprint should be completed and published as a 3D Slicer extension.
The representative subset selection was guided by an initial sampling strategy developed together with Claude AI. Based on this plan, a Python notebook was generated and iteratively refined. The notebook is available in the nlst-exploration repository.
Dataset filtering: We excluded all CT series without existing TotalSegmentator segmentations, as these had already been filtered out previously due to problematic acquisition parameters (e.g., invalid pixel spacing).
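A minimal sketch of this filtering step, assuming two series-level metadata tables exported from IDC (one for the NLST CT series and one for the existing TotalSegmentator SEG objects); the file names and column names are assumptions for illustration.

```python
# Keep only CT series that are referenced by at least one TotalSegmentator SEG,
# i.e. series that already passed the earlier acquisition-parameter screening.
import pandas as pd

ct = pd.read_csv("nlst_ct_series.csv")          # one row per CT series (assumed export)
seg = pd.read_csv("totalsegmentator_seg.csv")   # one row per SEG series (assumed export)

has_seg = ct["SeriesInstanceUID"].isin(seg["ReferencedSeriesInstanceUID"])
ct_filtered = ct[has_seg].copy()
print(f"kept {len(ct_filtered)} of {len(ct)} CT series")
```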
Selection of relevant DICOM attributes: Together with Claude AI, we discussed and defined a set of DICOM attributes to be considered in order to capture relevant variability in acquisition, reconstruction, and scanner hardware. The following attribute groups were selected:
- Spatial: SliceThickness, PixelSpacing, SpacingBetweenSlices
- Exposure: KVP, Exposure, CTDIvol
- Reconstruction: ConvolutionKernel
- Hardware: Manufacturer, ManufacturerModelName
- Geometry: PatientPosition, GantryDetectorTilt, SpiralPitchFactor
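A sketch of how these attributes could be collected with pydicom, reading one representative file per series; `series_paths` (a mapping from SeriesInstanceUID to one DICOM file of that series) is an assumed input.

```python
# Collect the selected acquisition/reconstruction attributes into a metadata table.
import pandas as pd
import pydicom

ATTRIBUTE_GROUPS = {
    "Spatial": ["SliceThickness", "PixelSpacing", "SpacingBetweenSlices"],
    "Exposure": ["KVP", "Exposure", "CTDIvol"],
    "Reconstruction": ["ConvolutionKernel"],
    "Hardware": ["Manufacturer", "ManufacturerModelName"],
    "Geometry": ["PatientPosition", "GantryDetectorTilt", "SpiralPitchFactor"],
}
ATTRIBUTES = [a for group in ATTRIBUTE_GROUPS.values() for a in group]

records = []
for series_uid, path in series_paths.items():  # assumed mapping UID -> file path
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    row = {"SeriesInstanceUID": series_uid}
    for attr in ATTRIBUTES:
        value = getattr(ds, attr, None)
        if attr == "PixelSpacing" and value is not None:
            value = float(value[0])  # (row, column) spacing; keep a single value
        row[attr] = value
    records.append(row)

metadata = pd.DataFrame(records)
```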
The following attributes were found to be constant or empty across the dataset and were therefore excluded from further analysis: SpacingBetweenSlices, CTDIvol, PatientPosition, and GantryDetectorTilt.
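Continuing from the `metadata` table in the sketch above, constant or empty attributes can be identified and dropped as follows (illustrative code, not the notebook's exact implementation):

```python
# Drop attributes that are empty or take a single constant value across the dataset.
drop_cols = [
    col for col in ATTRIBUTES
    if metadata[col].dropna().astype(str).nunique() <= 1
]
metadata = metadata.drop(columns=drop_cols)
print("dropped:", drop_cols)
```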
At this stage, the selection focuses exclusively on series-level acquisition and reconstruction parameters. Patient-related attributes (e.g. age, sex, or other clinical metadata) are not yet included in the sampling strategy and will be incorporated in a future iteration.
Data reduction: To reduce the combinatorial complexity of the parameter space, the continuous attributes were rounded to a small number of representative values (a sketch of this step is shown below).
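A sketch of the rounding step; the precisions below are illustrative assumptions, as the exact values used in the notebook are not reproduced here.

```python
# Illustrative rounding precisions (assumed, not the notebook's exact values).
ROUNDING_DECIMALS = {
    "SliceThickness": 1,     # mm
    "PixelSpacing": 2,       # mm
    "KVP": 0,                # kV
    "Exposure": -1,          # mAs, rounded to the nearest 10
    "SpiralPitchFactor": 2,
}
for col, decimals in ROUNDING_DECIMALS.items():
    metadata[col] = metadata[col].astype(float).round(decimals)
```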
Dataset statistics (after filtering and data reduction):
| Numerical Attribute | Count | Unique | Min | Q25 | Median | Mean | Q75 | Max | Std |
|---|---|---|---|---|---|---|---|---|---|
| SliceThickness (mm) | 133273 | 13 | 0.60 | 2.00 | 2.50 | 2.47 | 2.50 | 6.50 | 0.90 |
| PixelSpacing (mm) | 133273 | 7 | 0.40 | 0.60 | 0.70 | 0.66 | 0.70 | 1.00 | 0.07 |
| KVP (kV) | 133273 | 7 | 80 | 120 | 120 | 121 | 120 | 140 | 5 |
| Exposure (mAs) | 133256 | 62 | 0 | 0 | 100 | 504 | 1000 | 9000 | 668 |
| SpiralPitchFactor | 38629 | 9 | 0.75 | 1.38 | 1.50 | 1.47 | 1.50 | 1.75 | 0.15 |
| Categorical Attribute | Count | Unique | Most Frequent Value | Most Frequent (%) | Top 3 Values |
|---|---|---|---|---|---|
| Manufacturer | 133273 | 4 | GE MEDICAL SYSTEMS | 45.4 % | GE MEDICAL SYSTEMS, SIEMENS, Philips |
| ManufacturerModelName | 133273 | 23 | Volume Zoom | 20.8 % | Volume Zoom, Sensation 16, LightSpeed QX/i |
| ConvolutionKernel | 133273 | 36 | STANDARD | 29.9 % | STANDARD, B30f, B50f |
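For reference, summaries like the two tables above can be generated directly with pandas from the reduced metadata table (a sketch, continuing from `metadata` in the earlier snippets):

```python
# Summary statistics per numerical and categorical attribute.
numerical = ["SliceThickness", "PixelSpacing", "KVP", "Exposure", "SpiralPitchFactor"]
categorical = ["Manufacturer", "ManufacturerModelName", "ConvolutionKernel"]

# Count, min, quartiles, mean, max, and std for numerical attributes.
num_stats = metadata[numerical].describe(percentiles=[0.25, 0.5, 0.75]).T
num_stats["unique"] = metadata[numerical].nunique()

# Count, unique values, most frequent value, and its relative frequency.
cat_stats = metadata[categorical].astype(str).describe().T
cat_stats["top_pct"] = 100 * cat_stats["freq"] / cat_stats["count"]

print(num_stats)
print(cat_stats)
```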
Clustering and sampling: Based on the normalized attributes, Claude AI proposed a clustering strategy to group series with similar acquisition and reconstruction characteristics. The dataset was clustered into 14 distinct clusters, and 3 representative CT series were selected from each cluster, resulting in a total of 48 CT series. Post-hoc verification using summary statistics confirmed that the selected subset largely reflects the global parameter distributions of the filtered NLST dataset, with most acquisition parameters covering the interquartile range (Q25–Q75) of the full dataset; a sketch of the clustering-and-sampling step is shown after the table below.
| Numerical Attribute | Min | Median | Max |
|---|---|---|---|
| SliceThickness (mm) | 1.0 | 2.5 | 5.0 |
| PixelSpacing (mm) | 0.64 | 0.665 | 0.72 |
| KVP (kV) | 120 | 120 | 140 |
| Exposure (mAs) | 0 | 100 | 3000 |
| SpiralPitchFactor | 0.75 | 1.5 | 1.5 |
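A minimal sketch of the clustering-and-sampling step, assuming the rounded metadata table from the sketches above; the scaling, one-hot encoding, and the random per-cluster pick are illustrative choices, not necessarily those of the notebook.

```python
# Cluster series by acquisition/reconstruction characteristics and sample per cluster.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = pd.get_dummies(metadata[numerical + categorical], columns=categorical)
X = StandardScaler().fit_transform(features.fillna(features.median(numeric_only=True)))

metadata["cluster"] = KMeans(n_clusters=14, random_state=0, n_init=10).fit_predict(X)

# Pick 3 representative series per cluster (random draw here; the notebook may
# use a different criterion, e.g. proximity to the cluster centroid).
subset = (
    metadata.groupby("cluster", group_keys=False)
    .apply(lambda g: g.sample(n=min(3, len(g)), random_state=0))
)
subset.to_csv("representative_subset.csv", index=False)
```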
A CSV file listing all selected representative CT series is available in the nlst-exploration repository.
Segmentation generation: Segmentation was initially planned with the MOOSE model only. However, based on the results of the prior comparative analysis on the NLST dataset, CADS segmentations were generated as well. Both models had previously shown the most consistent performance and no visible segmentation errors across the evaluated anatomical structures. Since it was not possible to determine which of the two models performs better on this dataset, both MOOSE and CADS were included in the analysis.
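As a sketch, the MOOSE runs could be scripted around the moosez command-line interface. The directory layout, the `-d`/`-m` flags, and the `clin_ct_organs` model name are assumptions based on our reading of the moosez documentation and should be verified against the installed version; CADS would be run analogously with its own tooling.

```python
# Batch MOOSE segmentation via the moosez CLI (flags and model name are
# assumptions; verify against the installed moosez version).
import subprocess
from pathlib import Path

data_root = Path("nlst_subset")  # assumed layout: one sub-folder per subject containing the CT
subprocess.run(
    ["moosez", "-d", str(data_root), "-m", "clin_ct_organs"],
    check=True,
)
```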
Next Steps (Future Work)
A pull request to include CrossSegmentationExplorer as a Tier 1 3D Slicer extension has been created: https://github.com/Slicer/ExtensionsIndex/pull/2310