NA-MIC Project Weeks

Evaluation of imi-bigpicture/wsidicomizer as a tool for conversion into DICOM whole slide imaging format

Key Investigators

Project Description

The DICOM standard is gaining acceptance in digital pathology imaging. Converting slide images into DICOM format can make the data more FAIR, improve the quality and comprehensiveness of the associated metadata, and improve interoperability with the commercial and open source tools that implement the standard.

The DICOM format is used for slide microscopy images available in NCI Imaging Data Commons (IDC). Images submitted to IDC in vendor-specific formats must be converted into a DICOM representation, which is currently done using the PixelMed Toolkit-based scripts available at https://github.com/ImagingDataCommons/idc-wsi-conversion.

Our goal is to migrate the DICOM WSI conversion to community-supported open source tools. Based on our current assessment and experience, imi-bigpicture/wsidicomizer is the most promising tool available for this task. In this project we will work on evaluating this tool.

Objective

  1. Assemble an inventory of publicly available test samples representative of the variety of data encountered by IDC, and perhaps outside of IDC.
  2. Document requirements for the conversion tool based on the needs of IDC.
  3. Complete evaluation of wsidicomizer and document the results (in terms of the features and performance of the conversion process).
  4. Document results and identified gaps to help with the next steps.

Approach and Plan

  1. Select representative source images in the original format, together with the results of their conversion to DICOM available in IDC (as converted using PixelMed Toolkit) as a reference. Assemble information about the characteristics of those samples in a document (vendor, compression, …). Include the accompanying tabulated metadata that is needed for converting each particular sample.
  2. Document requirements: initialization of metadata, standard compliance of the result, transfer of the ICC profile, acceptable performance, … (DICOM-TIFF dual personality is intentionally not a requirement at this point).
  3. Create a publicly available script/notebook that performs conversion.
  4. Evaluate the results and summarize in a publicly available document.
  5. Document any identified problems by opening issues in the wsidicomizer repo.
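As a starting point for the conversion script in step 3, here is a minimal sketch. It assumes the `WsiDicomizer.convert(filepath, output_path)` class method as shown in the wsidicomizer README; the exact signature and return value may differ between versions. The helper names `output_dir_for` and `convert_slide` and the one-folder-per-slide layout are hypothetical choices for illustration, not part of the IDC scripts.

```python
from pathlib import Path


def output_dir_for(source: Path, output_root: Path) -> Path:
    """Return a per-slide output folder named after the source file (hypothetical layout)."""
    return output_root / source.stem


def convert_slide(source: Path, output_root: Path) -> list:
    """Convert one vendor-format slide to DICOM and return the created file paths.

    Assumption: WsiDicomizer.convert(filepath, output_path) exists with this
    positional order, as in the wsidicomizer README; check the installed version.
    """
    # Imported lazily so the path helper above is usable without wsidicomizer installed.
    from wsidicomizer import WsiDicomizer

    out_dir = output_dir_for(source, output_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = WsiDicomizer.convert(str(source), str(out_dir))
    return [str(path) for path in created]
```

Metadata initialization (one of the requirements in step 2) is deliberately left out here, since how it is passed depends on the wsidicomizer version in use.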

Progress and Next Steps

  1. Set up conversion code in Python (simple) and confirmed that the conversion approach is consistent between what we use in IDC and what Max is using in Kaapana (the wsidicomizer Python function, not the command line tool).
  2. Prepared queries for selecting test images from IDC. Mapping to the source file in vendor format is stored in a private tag (0009,1001) (source non-DICOM files are in private buckets in IDC).
  3. Identified problems in selecting samples based on TransferSyntaxUID: we did not initially realize that it can vary across instances within the same series!
  4. Identified numerous very strange images in IDC - will need to investigate this further.
  5. Started testing wsidicomizer, tested with JPEG and uncompressed samples.
  6. Identified and reported converter issues, several of which have already been resolved (kudos to Erik Gabrielsson, wsidicomizer maintainer!):
    • https://github.com/imi-bigpicture/wsidicomizer/issues/117
    • https://github.com/imi-bigpicture/wsidicomizer/issues/118
    • https://github.com/imi-bigpicture/wsidicomizer/issues/123
  7. Discussed various issues related to conversion and shared experience; reached agreement that wsidicomizer is the best choice, given our combined experience and the very good support from Erik.
  8. Identified issues building dicom3tools in a Colab VM; these were fixed by David Clunie (notebook).
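The TransferSyntaxUID problem from item 3 can be checked for directly. The following is a minimal sketch, assuming instance-level records are available as dictionaries with `SeriesInstanceUID` and `TransferSyntaxUID` keys (for example, rows exported from the IDC metadata table). The function name and the UIDs in the usage example are hypothetical.

```python
from collections import defaultdict


def series_with_mixed_transfer_syntaxes(instances):
    """Map each SeriesInstanceUID to its transfer syntaxes, keeping only series with more than one."""
    seen = defaultdict(set)
    for instance in instances:
        seen[instance["SeriesInstanceUID"]].add(instance["TransferSyntaxUID"])
    return {series: syntaxes for series, syntaxes in seen.items() if len(syntaxes) > 1}


# Hypothetical records: series "1.2.3.1" mixes uncompressed and JPEG baseline instances.
example = [
    {"SeriesInstanceUID": "1.2.3.1", "TransferSyntaxUID": "1.2.840.10008.1.2.1"},
    {"SeriesInstanceUID": "1.2.3.1", "TransferSyntaxUID": "1.2.840.10008.1.2.4.50"},
    {"SeriesInstanceUID": "1.2.3.2", "TransferSyntaxUID": "1.2.840.10008.1.2.4.50"},
]
mixed = series_with_mixed_transfer_syntaxes(example)
```

Series flagged this way cannot be classified by a single transfer syntax, which is why sample selection needs to look at one specific instance per series, such as the base layer of the pyramid.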

Query for selecting samples from IDC based on TransferSyntaxUID applied to the base layer of the image pyramid:

WITH
  RankedRows AS (
  SELECT
    SeriesInstanceUID,
    StudyInstanceUID,
    TotalPixelMatrixColumns*TotalPixelMatrixRows AS totalPixels,
    TransferSyntaxUID,
    ROW_NUMBER() OVER (PARTITION BY SeriesInstanceUID ORDER BY TotalPixelMatrixColumns*TotalPixelMatrixRows DESC) AS rn
  FROM
    `bigquery-public-data.idc_current.dicom_all`
  WHERE
    Modality = "SM"
    AND collection_id NOT LIKE "%htan%")
SELECT
  TransferSyntaxUID,
  StudyInstanceUID,
  SeriesInstanceUID,
  totalPixels,
  CONCAT("https://viewer.imaging.datacommons.cancer.gov/slim/studies/", StudyInstanceUID, "/series/", SeriesInstanceUID) AS viewer_url
FROM
  RankedRows
WHERE
  rn = 1
  # Explicit VR Little Endian
  AND TransferSyntaxUID = "1.2.840.10008.1.2.1"

ORDER BY
  totalPixels ASC
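
The selection logic of the query can also be sketched in plain Python, which may help when checking query results against locally exported rows. This is an illustration under stated assumptions, not the code used in IDC: rows are dictionaries keyed by the column names from the query, the function names are hypothetical, and ties between equally sized instances are resolved by keeping the first one seen.

```python
SLIM_STUDIES_URL = "https://viewer.imaging.datacommons.cancer.gov/slim/studies/"


def base_layers(rows):
    """Keep, per series, the instance with the largest total pixel matrix (the base layer)."""
    best = {}
    for row in rows:
        pixels = row["TotalPixelMatrixColumns"] * row["TotalPixelMatrixRows"]
        current = best.get(row["SeriesInstanceUID"])
        if current is None or pixels > current["totalPixels"]:
            best[row["SeriesInstanceUID"]] = {**row, "totalPixels": pixels}
    return best


def select_samples(rows, transfer_syntax_uid):
    """Return base layers with the given transfer syntax, smallest images first."""
    selected = [
        row for row in base_layers(rows).values()
        if row["TransferSyntaxUID"] == transfer_syntax_uid
    ]
    return sorted(selected, key=lambda row: row["totalPixels"])


def viewer_url(row):
    """Build the Slim viewer link the same way the query concatenates it."""
    return f"{SLIM_STUDIES_URL}{row['StudyInstanceUID']}/series/{row['SeriesInstanceUID']}"
```

Filtering on the transfer syntax only after the base layer has been picked mirrors the `rn = 1` condition in the query, so a series is matched by its largest instance rather than by any lower-resolution level.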

Illustrations

Background and References

Background reading:

Other related materials: