Discuss our experiences and thoughts on the DICOM SEG standard.
Compare notes, benchmarks, and experiences regarding the interoperability and performance of DICOM SEG instances across platforms. Evaluate the extent to which any observed performance issues are inherent in the format or simply the result of inefficient implementations. Consider proposals to improve the standard to address any inherent issues.
The DICOM SEG standard has been around for several years and has been implemented as part of several tools in various languages:
While interoperability has generally been good, the performance of these SEG implementations has often been orders of magnitude slower than that of research formats (e.g. nii.gz, nrrd, or seg.nrrd) for segmentation use cases such as feeding segmentation data into machine learning. For example, this notebook shows that decoding a TotalSegmentator result with approximately 100 segments from DICOM SEG can take several minutes and consume a very large amount of memory, whereas the same segmentation takes less than a second to read from a research format.
Poor performance is due to at least two factors:
We are interested in how the benefits of DICOM (standardized encoding, rich metadata, coded concepts, etc.) can coexist with efficient read and write performance for real-world use cases.
A DICOM SEG may contain many segments (elsewhere known as “classes” or “labels”). However, each segment is stored in its own set of frames as a binary mask (0 or 1 at every pixel). This contrasts with many other formats, which use a “label map” style encoding in which a single array holds all segments and the pixel value indicates segment membership (e.g. pixel value 1 for segment 1, pixel value 2 for segment 2). Using separate frames does confer two important advantages over the label map approach:
However, this comes at a steep cost for what is arguably the overwhelmingly common use case: non-overlapping, non-fractional, multi-segment segmentations. Especially when there are many segments (as in the TotalSegmentator example above), this leads to a very large number of frames and makes memory and storage use much higher than a “label map” style encoding would require. For instance segmentation of cells in a whole slide image, the approach becomes completely untenable.
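To make the storage difference concrete, here is a minimal NumPy sketch (not taken from any SEG implementation) that expands a small synthetic label map into one binary mask per segment, bit-packs the masks at 1 bit per pixel as a BINARY SEG does, and compares the byte counts with an 8-bit label map. The function names, array shape, and segment count are illustrative assumptions, and the estimate ignores that empty frames can be omitted from a SEG and that either layout may be compressed.

```python
import numpy as np

def labelmap_to_binary_frames(labelmap: np.ndarray, n_segments: int) -> np.ndarray:
    """Hypothetical helper: expand a (slices, rows, cols) label map into masks.

    Returns a bool array of shape (n_segments, slices, rows, cols) where
    entry [s - 1] is True wherever the label map equals segment number s.
    Segment numbers start at 1; 0 is background.
    """
    segment_numbers = np.arange(1, n_segments + 1).reshape(-1, 1, 1, 1)
    return labelmap[np.newaxis] == segment_numbers

def binary_frames_to_labelmap(masks: np.ndarray) -> np.ndarray:
    """Hypothetical helper: collapse non-overlapping masks into a uint8 label map."""
    labelmap = np.zeros(masks.shape[1:], dtype=np.uint8)
    for index, mask in enumerate(masks):
        labelmap[mask] = index + 1
    return labelmap

def packed_size_bytes(masks: np.ndarray) -> int:
    """Bytes needed to store all masks at 1 bit per pixel (flattened, packed)."""
    return np.packbits(masks, bitorder="little").nbytes

rng = np.random.default_rng(0)
n_segments = 100
labelmap = rng.integers(0, n_segments + 1, size=(4, 64, 64), dtype=np.uint8)
masks = labelmap_to_binary_frames(labelmap, n_segments)

assert np.array_equal(binary_frames_to_labelmap(masks), labelmap)
print(labelmap.nbytes)           # 16384   (label map, 8 bits per pixel)
print(packed_size_bytes(masks))  # 204800  (100 packed binary masks)
print(masks.nbytes)              # 1638400 (same masks unpacked as NumPy bools)
```

With 100 segments the packed binary layout is 12.5 times larger than the 8-bit label map, and a reader that unpacks every frame into a boolean array needs 100 times the memory, which is one way the decoding cost described above can arise.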
It has been proposed that this could be solved relatively simply by adding a new Segmentation Type (e.g. “LABELMAP”) alongside the existing “BINARY” and “FRACTIONAL” types. This is not yet a formal proposal.
There is a highdicom draft implementation of what this could look like.
One issue is that SEG images are currently limited to 8 bits per pixel, which would limit the number of segments representable in a “LABELMAP” style encoding to 255 (with pixel value 0 reserved for background). This may not be enough for some applications (e.g. instance segmentation). A “label map” encoding proposal should consider whether this limitation should be relaxed.
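The arithmetic behind the 255-segment limit can be sketched in a few lines. The helper names below are hypothetical; the sketch assumes label 0 is background and picks the smallest unsigned NumPy integer type that can hold the largest segment number, showing where an instance segmentation would exceed 8 bits.

```python
import numpy as np

def max_segments(bits_allocated: int) -> int:
    """Largest segment number storable when 0 is reserved for background."""
    return 2 ** bits_allocated - 1

def smallest_label_dtype(n_segments: int) -> np.dtype:
    """Hypothetical helper: smallest unsigned dtype holding labels 0..n_segments."""
    if n_segments < 0:
        raise ValueError("n_segments must be non-negative")
    return np.min_scalar_type(n_segments)

print(max_segments(8))   # 255
print(max_segments(16))  # 65535
for n in (100, 255, 256, 5000):
    print(n, smallest_label_dtype(n))
# 100 uint8
# 255 uint8
# 256 uint16
# 5000 uint16
```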
Fractional SEGs are quantized and stored as integers. As mentioned above, Bits Allocated is currently limited to a maximum of 8, so fractional segmentations are quantized to at most 256 distinct values. This is lower precision than users would generally expect.
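A minimal sketch of what 8-bit quantization does to a probability map, assuming stored values run from 0 to a maximum fractional value of 255 and are divided by that maximum on read (the DICOM attribute is Maximum Fractional Value; the helper names here are hypothetical). Rounding to the nearest level bounds the error at half a quantization step, about 0.002.

```python
import numpy as np

# Assumed: the largest value representable with 8 bits allocated.
MAX_FRACTIONAL_VALUE = 255

def quantize(probabilities: np.ndarray, max_value: int = MAX_FRACTIONAL_VALUE) -> np.ndarray:
    """Map probabilities in [0, 1] to integers 0..max_value, rounding to nearest.

    The uint8 cast is only valid for max_value <= 255.
    """
    clipped = np.clip(probabilities, 0.0, 1.0)
    return np.rint(clipped * max_value).astype(np.uint8)

def dequantize(stored: np.ndarray, max_value: int = MAX_FRACTIONAL_VALUE) -> np.ndarray:
    """Recover approximate probabilities from the stored integers."""
    return stored.astype(np.float32) / max_value

probs = np.linspace(0.0, 1.0, 10_001)
stored = quantize(probs)
restored = dequantize(stored)
max_error = np.abs(restored - probs).max()

print(len(np.unique(stored)))               # 256
print(max_error <= 0.5 / 255 + 1e-6)        # True
```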
There have been repeated reports of interoperability problems when segmentations created with highdicom are viewed in OHIF. See this issue.
Multiple users of highdicom have asked for support for 2D+T (two-dimensional plus time) files. This is possible but not straightforward, because it requires a dimension organization that includes time as a dimension. Due to limited developer time, this has not been a priority for highdicom, but it remains an open issue. See
A broader question is whether viewing software would understand such files unless the dimension organization is standardized to some extent.
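One way to picture a 2D+T dimension organization is that each frame is indexed by a (segment number, temporal position index) pair, ordered so a reader can locate any frame from those two values. The sketch below is hypothetical: it is not highdicom's API and builds only the index tuples, not a DICOM dataset. Referenced Segment Number and Temporal Position Index are existing DICOM attributes, but whether a standardized SEG dimension organization would use exactly this pair and ordering is the open question raised above.

```python
from itertools import product
from typing import Dict, List, Tuple

def frame_dimension_indices(n_segments: int, n_timepoints: int) -> List[Tuple[int, int]]:
    """Hypothetical: (segment number, temporal position index) for each frame.

    Both indices are 1-based, as DICOM index values are. Frames are ordered
    segment-major: all time points of segment 1, then segment 2, and so on.
    """
    return list(product(range(1, n_segments + 1), range(1, n_timepoints + 1)))

def frame_lookup(indices: List[Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Hypothetical: map each (segment, time) pair to its 1-based frame number."""
    return {pair: frame_number for frame_number, pair in enumerate(indices, start=1)}

indices = frame_dimension_indices(n_segments=2, n_timepoints=3)
print(indices)                        # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
print(frame_lookup(indices)[(2, 1)])  # 4
```

A viewer could only rely on such a lookup if the ordering and the meaning of each index were agreed upon, which is why some degree of standardization seems necessary.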