
NA-MIC Project Weeks


DICOM metadata databases

Key Investigators

Project Description

Medical imaging applications and systems that manage large collections of DICOM images usually need some kind of database to allow for browsing and selecting images or image collections, to support curation and control of ML training tasks, batch analysis, etc. The goal of the project is to investigate existing and new approaches to handling the metadata of large image collections for different purposes, to create experimental setups, and to report on the results.

Objective

  1. Objective A: a report on the experiments and their results.

Approach and Plan

  1. Put DICOM JSON into JSON columns of SQLite or PostgreSQL; test jsonpath and similar query mechanisms
  2. Create FHIR ImagingStudy resources; put them to a FHIR endpoint or other databases
  3. Connect with out-of-the-box visualization solutions, e.g. for JSON documents
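For step 2, a minimal FHIR ImagingStudy resource could be hand-built from a few DICOM attributes before being put to a FHIR endpoint. The sketch below uses FHIR R4 field names; the UIDs are made-up placeholders:

```python
import json

# Assumed inputs: a few DICOM attributes as Python values (hypothetical UIDs)
study_uid = "1.2.840.99999.1"      # StudyInstanceUID (0020,000D), made up
series_uid = "1.2.840.99999.1.1"   # SeriesInstanceUID (0020,000E), made up
modality = "CT"                    # Modality (0008,0060)

# Minimal FHIR R4 ImagingStudy; 'status' is the only required top-level element
imaging_study = {
    "resourceType": "ImagingStudy",
    "status": "available",
    # The DICOM study UID travels as an identifier with a urn:oid value
    "identifier": [{"system": "urn:dicom:uid", "value": f"urn:oid:{study_uid}"}],
    "series": [{
        "uid": series_uid,
        "modality": {
            "system": "http://dicom.nema.org/resources/ontology/DCM",
            "code": modality,
        },
    }],
}

print(json.dumps(imaging_study, indent=2))
```

A real setup would then POST/PUT this JSON to the server's `ImagingStudy` endpoint.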

Progress and Next Steps

  1. Generate DICOM JSON representation of some TCIA datasets using pydicom
  2. Read about JSON and JSONB columns in SQLite
  3. Try different queries
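For reference, pydicom's `to_json_dict` emits the DICOM PS3.18 JSON model: attributes are keyed by their eight-digit hex tag and carry a `vr` plus (usually) a `Value` array. A hand-written example of that shape, with made-up values:

```python
import json

# DICOM JSON model shape for two attributes:
# Modality (0008,0060) and StudyInstanceUID (0020,000D); values are made up
dataset_json = {
    "00080060": {"vr": "CS", "Value": ["CT"]},
    "0020000D": {"vr": "UI", "Value": ["1.2.840.99999.1"]},  # hypothetical UID
}

print(json.dumps(dataset_json))
```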

> [!CAUTION]
> SQLite’s arrow operator `->` seems to behave differently if the key consists solely of digits. This can lead to surprises when working with DICOM tags, whose hexadecimal representation sometimes contains letters but sometimes only digits …
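A defensive workaround is to spell out a full JSON path and put the key in double quotes inside the path, so an all-digit tag cannot be misread. A small sketch against an in-memory database (the JSON is stored as plain text here for brevity; `json_extract` reads text and JSONB alike):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dicom_files (filename TEXT PRIMARY KEY, jsonb_data TEXT)")
con.execute(
    "INSERT INTO dicom_files VALUES (?, ?)",
    ("a.dcm", '{"00080060": {"vr": "CS", "Value": ["CT"]}}'),
)

# Double-quoting the all-digit key inside the path makes the intent unambiguous
row = con.execute(
    """SELECT json_extract(jsonb_data, '$."00080060".Value[0]') FROM dicom_files"""
).fetchone()
print(row[0])  # CT
```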

Count images per modality

```sql
SELECT
  json_extract(jsonb_data, '$.00080060.Value') AS modality,
  COUNT(*) AS num_records
FROM dicom_files
GROUP BY modality;
```

It’s also possible to create an index on fields, e.g. for modality

```sql
CREATE INDEX modality ON dicom_files (json_extract(jsonb_data, '$.00080060.Value'));
```
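To check whether such an expression index is actually used, `EXPLAIN QUERY PLAN` helps; note that the query has to repeat the indexed expression verbatim. A sketch against an in-memory database (JSON stored as text for simplicity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dicom_files (filename TEXT PRIMARY KEY, jsonb_data TEXT)")
con.execute(
    "INSERT INTO dicom_files VALUES (?, ?)",
    ("a.dcm", '{"00080060": {"Value": ["CT"]}}'),
)

# Expression index on the modality, as in the snippet above
con.execute(
    "CREATE INDEX modality ON dicom_files "
    "(json_extract(jsonb_data, '$.00080060.Value'))"
)

# The WHERE clause must use the exact indexed expression for the index to apply;
# json_extract of '$...Value' returns the whole array, serialized as text
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT filename FROM dicom_files
    WHERE json_extract(jsonb_data, '$.00080060.Value') = '["CT"]'
""").fetchall()
print(plan)
```

The plan's detail column should report a `SEARCH ... USING INDEX modality` instead of a full table scan.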

Count number of series per modality

```sql
-- Count number of series per modality (SQLite)
-- Note: SeriesInstanceUID is tag (0020,000E); (0020,000D) is the StudyInstanceUID
SELECT
  json_extract(jsonb_data, '$.00080060.Value') AS modality,
  COUNT(DISTINCT json_extract(jsonb_data, '$.0020000E.Value')) AS num_series
FROM dicom_files
GROUP BY modality
ORDER BY num_series DESC;
```

```sql
-- Count number of series per modality (IDC BigQuery)
SELECT
  Modality,
  COUNT(DISTINCT SeriesInstanceUID) AS num_series
FROM
  `bigquery-public-data.idc_current.dicom_all`
GROUP BY Modality
ORDER BY num_series DESC;
```
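The SQLite variant can be tried end-to-end on a toy database. The sketch below inserts three fake instances (all UIDs made up, JSON stored as text for brevity) and runs the aggregation, extracting `Value[0]` to get plain strings:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dicom_files (filename TEXT PRIMARY KEY, jsonb_data TEXT)")

# Two CT files in two different series, one MR file (hypothetical series UIDs)
rows = [
    ("ct1.dcm", '{"00080060": {"Value": ["CT"]}, "0020000E": {"Value": ["1.2.3.1"]}}'),
    ("ct2.dcm", '{"00080060": {"Value": ["CT"]}, "0020000E": {"Value": ["1.2.3.2"]}}'),
    ("mr1.dcm", '{"00080060": {"Value": ["MR"]}, "0020000E": {"Value": ["1.2.3.3"]}}'),
]
con.executemany("INSERT INTO dicom_files VALUES (?, ?)", rows)

result = con.execute("""
    SELECT
      json_extract(jsonb_data, '$.00080060.Value[0]') AS modality,
      COUNT(DISTINCT json_extract(jsonb_data, '$.0020000E.Value[0]')) AS num_series
    FROM dicom_files
    GROUP BY modality
    ORDER BY num_series DESC
""").fetchall()
print(result)  # [('CT', 2), ('MR', 1)]
```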

Find all series which contain a LOCALIZER ImageType

```sql
-- Find all series which contain a LOCALIZER ImageType (SQLite JSON)
-- ImageType is tag (0008,0008); SeriesInstanceUID is (0020,000E)
SELECT DISTINCT
  json_extract(jsonb_data, '$.0020000E.Value') AS series_uid
FROM
  dicom_files,
  json_each(json_extract(jsonb_data, '$.00080008.Value'))
WHERE
  json_each.value = 'LOCALIZER';
```

```sql
-- Find all series which contain a LOCALIZER ImageType (IDC BigQuery)
WITH
  ImageTypeAgg AS (
    SELECT
      ARRAY_TO_STRING(ImageType, '/') AS image_type_str,
      SeriesInstanceUID
    FROM
      `bigquery-public-data.idc_current.dicom_all` )
SELECT
  SeriesInstanceUID,
  image_type_str
FROM
  ImageTypeAgg
WHERE
  image_type_str LIKE '%LOCALIZER%';
```
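The `json_each` pattern can likewise be checked on toy data (UIDs made up, JSON stored as text):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dicom_files (filename TEXT PRIMARY KEY, jsonb_data TEXT)")

# One localizer instance and one regular axial instance (hypothetical UIDs)
rows = [
    ("loc.dcm",
     '{"00080008": {"Value": ["ORIGINAL", "PRIMARY", "LOCALIZER"]},'
     ' "0020000E": {"Value": ["1.2.3.1"]}}'),
    ("ax.dcm",
     '{"00080008": {"Value": ["ORIGINAL", "PRIMARY", "AXIAL"]},'
     ' "0020000E": {"Value": ["1.2.3.2"]}}'),
]
con.executemany("INSERT INTO dicom_files VALUES (?, ?)", rows)

# json_each expands the ImageType array into one row per entry
series = con.execute("""
    SELECT DISTINCT json_extract(jsonb_data, '$.0020000E.Value[0]')
    FROM dicom_files, json_each(json_extract(jsonb_data, '$.00080008.Value'))
    WHERE json_each.value = 'LOCALIZER'
""").fetchall()
print(series)  # [('1.2.3.1',)]
```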

Findings and future experiments

Ideas for future experiments:

Sample code to populate the SQLite database could look like this:

```python
#!/usr/bin/env python

"""Populate SQLite with DICOM metadata.

This code recursively crawls a directory and populates a SQLite database with
JSON representations of the DICOM files found.

Usage: <script> <db_path> <crawl_directory>

Warning: highly experimental code, use at your own risk!
"""

import json
import os
import sys
import sqlite3

import pydicom


def bulk_data_handler(data_element):
    # we are not interested in any bulk data
    return None


db_path = sys.argv[1]
crawl_directory = sys.argv[2]

# Connect to the database
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Create the table if it doesn't exist
cursor.execute('''
CREATE TABLE IF NOT EXISTS dicom_files (
    filename TEXT PRIMARY KEY NOT NULL,
    jsonb_data BLOB
)
''')
conn.commit()

# Iterate over the crawl directory and insert JSON dumps of all DICOM files
# into the database
for root, dirs, files in os.walk(crawl_directory):
    for file_name in files:
        dicom_file_path = os.path.join(root, file_name)

        # skip files that are already in the database
        cursor.execute("SELECT filename FROM dicom_files WHERE filename = ?",
                       (dicom_file_path,))
        if cursor.fetchone():
            continue

        try:
            ds = pydicom.dcmread(dicom_file_path)
            json_data = ds.to_json_dict(bulk_data_element_handler=bulk_data_handler)
        except pydicom.errors.InvalidDicomError:
            print(f"Skipped {dicom_file_path}")
            continue

        cursor.execute('''
        INSERT INTO dicom_files (filename, jsonb_data)
        VALUES (?, jsonb(?))
        ''', (dicom_file_path, json.dumps(json_data)))
        conn.commit()

        print(f"Inserted DICOM file: {dicom_file_path}")

conn.close()
```

https://github.com/nolden/namic-pw24/

Illustrations


Background and References