Conversion of bone marrow smear dataset from MIRAX format into DICOM
Key Investigators
- Daniela Schacherer (Fraunhofer MEVIS)
- David Clunie (PixelMed, USA)
- Andrey Fedorov (BWH, USA)
Project Description
As the DICOM standard is increasingly used in digital pathology imaging, conversion of available datasets from proprietary formats into DICOM format can make the data more FAIR and improve transparency and reproducibility of research conducted with these data. For this reason, the NCI Imaging Data Commons (IDC) hosts all its data in DICOM format.
A set of bone marrow smear whole-slide images (WSI) available in MIRAX (.mrxs) format is to be ingested into the IDC. For that purpose, the images need to be converted into DICOM (.dcm) along with all available image and clinical metadata.
In addition, this dataset contains extensive deep-learning-generated nuclei annotations (bounding boxes) that should also be converted into DICOM in a suitable way.
Objective
- Objective A: Have a working script for the conversion of the complete set of bone marrow smear WSI into DICOM format based on wsidicomizer.
- Objective B: Include clinical metadata in an IDC-conformant way.
- Objective C (optional): Have a script that converts the nuclei annotations into DICOM. Consider this issue: https://github.com/imi-bigpicture/wsidicomizer/issues/56
Approach and Plan
Objective A
- Implement and verify code for basic conversion of the .mrxs files as is into .dcm.
- Investigate the automatically filled metadata (including pixel spacing). wsidicomizer’s default data can be found here; an overview of the attributes of the VL Whole Slide Microscopy IOD is here.
- Add code to ingest metadata that cannot be obtained from the .mrxs files and to correct any incorrectly estimated metadata (via the wsidicom API or a JSON file).
- Verify correct conversion with dciodvfy on every file and dcentvfy on every set of files in a series.
- Have a few successfully converted samples and be ready to run code on complete collection.
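The verification step above could be scripted roughly as follows. This is a minimal sketch, not the project's actual code: it assumes that dciodvfy and dcentvfy (from dicom3tools) are on the PATH and that each immediate subdirectory of the output folder holds the .dcm files of exactly one series. The directory layout and the function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_validation_commands(root: Path) -> list[list[str]]:
    """Return dciodvfy/dcentvfy command lines for every series folder under root.

    Assumption: each immediate subdirectory of `root` contains the .dcm files
    of exactly one series (hypothetical layout, adjust to the real output tree).
    """
    commands: list[list[str]] = []
    for series_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(str(f) for f in series_dir.glob("*.dcm"))
        if not files:
            continue
        # dciodvfy is run once per file.
        commands.extend(["dciodvfy", f] for f in files)
        # dcentvfy is run once per series, on all files of that series together.
        commands.append(["dcentvfy", *files])
    return commands


def run_validation(root: Path) -> dict[str, str]:
    """Run every command and collect its output, keyed by the command line."""
    report: dict[str, str] = {}
    for cmd in build_validation_commands(root):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        # Keep both streams so that no diagnostic message is lost.
        report[" ".join(cmd)] = result.stdout + result.stderr
    return report
```

Separating command construction from execution makes it possible to inspect or log the planned checks before running them on the complete collection.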
Objective B
- Prepare additional clinical and lab data as a table so that they can be ingested into IDC as a BigQuery table.
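One small part of this preparation is making the column headers safe for BigQuery. The sketch below restricts names to letters, digits and underscores and ensures they do not start with a digit, which is a conservative subset of what BigQuery accepts. The function names and the example headers in the usage note are hypothetical, and the actual IDC ingestion requirements may include more than this.

```python
import csv
import io
import re


def to_bigquery_column(name: str) -> str:
    """Map a free-text header to a conservative BigQuery column name."""
    # Collapse every run of other characters into a single underscore.
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_").lower()
    if not cleaned:
        cleaned = "column"
    # Column names must not start with a digit.
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def normalize_table(csv_text: str) -> str:
    """Rewrite the header row of a CSV table and return the new CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [to_bigquery_column(h) for h in rows[0]]
    if len(set(header)) != len(header):
        raise ValueError("column names collide after normalization")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])  # data rows are passed through unchanged
    return out.getvalue()
```

For example, `to_bigquery_column("Age (years)")` yields `age_years`. The check for colliding names is there because two distinct source headers can map to the same cleaned name.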
Objective C (optional)
- Discuss and decide how the available annotations can best be encoded in DICOM.
- Implement an annotation conversion pipeline based on the IDC annotation conversion code by Chris Bridge.
Progress and Next Steps
Objective A:
- We successfully wrote a conversion pipeline from .mrxs to .dcm using wsidicomizer and have converted a couple of files. A few issues were identified along the way and reported to wsidicomizer; most have already been fixed by Erik Gabrielsson.
Objective B:
Objective C:
- We discussed and decided that the best way to encode available annotations is in DICOM Microscopy Bulk Simple Annotations.
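In Microscopy Bulk Simple Annotations, the coordinates of all annotations in a group are stored together in one flat array of (x, y) values rather than as one item per annotation. For bounding boxes, the RECTANGLE graphic type uses four points per annotation. The sketch below only illustrates that packing step with the standard library. The input box format (x_min, y_min, x_max, y_max) in pixel coordinates is an assumption about the source annotations, and the vertex order used here (top-left, top-right, bottom-right, bottom-left) should be checked against the standard. Writing an actual DICOM object would need a library such as highdicom.

```python
import struct


def boxes_to_point_data(boxes):
    """Pack bounding boxes as little-endian float32 (x, y) pairs.

    Assumption: each box is (x_min, y_min, x_max, y_max) in pixel coordinates.
    Every box contributes four vertices, so no per-annotation index is needed.
    """
    coords = []
    for x_min, y_min, x_max, y_max in boxes:
        if x_max < x_min or y_max < y_min:
            raise ValueError("box corners are not ordered as (min, max)")
        coords += [
            x_min, y_min,  # top-left
            x_max, y_min,  # top-right
            x_max, y_max,  # bottom-right
            x_min, y_max,  # bottom-left
        ]
    return struct.pack(f"<{len(coords)}f", *coords)
```

Each box therefore occupies 32 bytes (eight float32 values), which keeps the encoding compact for the large number of nuclei per slide.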
Next steps:
- Run the conversion script on the whole dataset.
- Do Objective B: Prepare additional clinical and lab data as a table so that they can be ingested into IDC as a BigQuery table.
- Finish and run the annotation conversion pipeline.
Illustrations
Example image of bone marrow smears. Taken from: https://doi.org/10.1177/1040638712452731.
Background and References
Background reading:
- Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., Iafrate, A. J., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018). http://dx.doi.org/10.4103/jpi.jpi_42_18
- Clunie, D. A. DICOM format and protocol standardization-A core requirement for digital pathology success. Toxicol. Pathol. 49, 738–749 (2021). http://dx.doi.org/10.1177/0192623320965893
Further resources: