Location Sensitive Hashing for Web-Scale Medical Image Indexing
- Sandy Wells (BWH (NAC), USA)
- Steve Pieper (Isomics (NAC), USA)
A hash of data is a short description that is, for practical purposes, unique to that data, but most hashes are essentially random mappings, so similar inputs receive unrelated hashes.
A Location Sensitive Hash (LSH), more commonly called a locality-sensitive hash, is one where the hash keys preserve some meaning, such that similar hashes
indicate meaningful relationships among the original data elements.
In this work we hope to show that hashes based on image features can be used to group images in useful ways.
In particular, we’d like to show that the hashes can be used to predict human-assigned labels of images.
We will test this on large public data sets.
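The contrast between a conventional hash and a locality-preserving one can be illustrated with a small sketch. It is not the project's scheme: `grid_hash`, its `cell_width`, and the three feature vectors are hypothetical, chosen only to show that SHA-256 scatters near-identical inputs while a simple quantizing key keeps them together.

```python
import hashlib

def sha256_bits(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 256-character bit string."""
    digest = hashlib.sha256(data).digest()
    return "".join(f"{byte:08b}" for byte in digest)

def hamming(a: str, b: str) -> int:
    """Count the positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def grid_hash(vector, cell_width=4.0):
    """Toy locality-sensitive key: the grid cell each coordinate falls in."""
    return tuple(int(v // cell_width) for v in vector)

# Hypothetical feature vectors: b is a small perturbation of a, c is far away.
a = [10.0, 21.0, 33.0]
b = [10.5, 21.2, 33.1]
c = [50.0, 2.0, 90.0]

# Conventional hash: the two near-identical inputs get unrelated digests.
bits_differing = hamming(sha256_bits(repr(a).encode()),
                         sha256_bits(repr(b).encode()))

# Toy locality-sensitive key: a and b share a key, c does not.
same_cell = grid_hash(a) == grid_hash(b)
far_cell = grid_hash(c)
```

A plain grid like this splits neighbors that straddle a cell boundary, which is one reason practical schemes use several randomized projections instead of a single fixed grid.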
- Develop hashing scheme based on SIFT-RANK features
- Apply technique to sample datasets from TCIA
- Test ability to predict labels based on feature-based hashes
- Evaluate approach in the context of content-based image retrieval
Approach and Plan
- Write Python code to implement LSH-SIFT-RANK
- Test on subset of labeled TCIA datasets from 5 collections (Anti-PD-1_Lung, CPTAC-PDA, CPTAC-UCEC, NSCLC Radiogenomics, TCGA-UCEC)
- Develop plans for larger TCIA dataset (17,000+ volumes from 78 labeled collections)
- Explore other variables that could be correlated with these hashes
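One possible way to test label prediction from hashes is a bucket vote: each indexed feature contributes its image's label to its hash bucket, and a query image takes the majority label over the buckets its own features land in. The sketch below assumes that setup. `build_index`, `predict_label`, the tuple keys, and the labels are all hypothetical and do not come from the project's code or from TCIA metadata.

```python
from collections import Counter, defaultdict

def build_index(hashed_features):
    """Map each hash key to a count of the labels seen in that bucket.

    hashed_features: iterable of (hash_key, label) pairs, one per feature.
    """
    index = defaultdict(Counter)
    for key, label in hashed_features:
        index[key][label] += 1
    return index

def predict_label(index, query_keys):
    """Sum label counts over the buckets hit by the query's features.

    Returns the majority label, or None if no query key is in the index.
    """
    votes = Counter()
    for key in query_keys:
        votes.update(index.get(key, Counter()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Made-up training features: (hash key, label of the source image).
training = [
    ((1, 2), "lung"),
    ((1, 2), "lung"),
    ((3, 4), "pancreas"),
    ((5, 6), "lung"),
]
index = build_index(training)
prediction = predict_label(index, [(1, 2), (3, 4)])
```

Exact-key voting is the simplest variant; weighting votes by bucket size or probing neighboring buckets would be natural refinements to compare against it.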
Progress and Next Steps
- SIFT-RANK descriptors: 64 dimensions, same length, positive orthant
- Strategy: use Euclidean LSH in the subspace orthogonal to [1, 1, 1, …]
- Initial Python implementation of Euclidean LSH for 3D SIFT-RANK
- Initial evaluation on 400K features from 250 3D scans from TCIA
- Approximate nearest neighbor search working: retrieved matches rank in the closest 0.2% of features (0 would be perfect)
- Datar M, Immorlica N, Indyk P, Mirrokni VS. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, 2004 Jun 8 (pp. 253-262).
- Features extracted for 17k TCIA volumes (~1 TB of volumes => 20 GB of features, a 98% data reduction)
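The strategy in the list above can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes the p-stable form from Datar et al., h(v) = floor((a . v + b) / w), with Gaussian directions a that are mean-centered so they lie in the subspace orthogonal to [1, 1, 1, …]. The class name `EuclideanLSH`, the number of projections, the bucket width, and the random rank-permutation descriptor are all hypothetical. `percentile_rank` is one plausible reading of the percentile figure quoted above.

```python
import math
import random

DIM = 64  # descriptor length stated in the notes above

def center(vector):
    """Remove the mean, i.e. project onto the subspace orthogonal to [1, ..., 1]."""
    mean = sum(vector) / len(vector)
    return [x - mean for x in vector]

class EuclideanLSH:
    """Hash h(v) = floor((a . v + b) / w) for several Gaussian directions a."""

    def __init__(self, dim=DIM, num_projections=8, bucket_width=40.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width  # w: an untuned, illustrative value
        # Centering a Gaussian direction keeps it inside the subspace
        # orthogonal to the all-ones vector.
        self.directions = [
            center([rng.gauss(0.0, 1.0) for _ in range(dim)])
            for _ in range(num_projections)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width)
                        for _ in range(num_projections)]

    def key(self, vector):
        """Return one integer bucket index per projection."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, vector)) + offset)
                       / self.bucket_width)
            for direction, offset in zip(self.directions, self.offsets)
        )

def percentile_rank(query, retrieved, candidates):
    """Percent of candidates strictly closer to query than retrieved (0 is perfect)."""
    retrieved_distance = math.dist(query, retrieved)
    closer = sum(1 for c in candidates if math.dist(query, c) < retrieved_distance)
    return 100.0 * closer / len(candidates)

# Stand-in descriptor: a random permutation of the ranks 0..63 (an assumption
# about what a rank descriptor looks like, not data from the project).
rng = random.Random(1)
descriptor = rng.sample(range(DIM), DIM)
lsh = EuclideanLSH()
bucket_key = lsh.key(descriptor)
```

Because every direction sums to zero (up to rounding), adding a constant to all 64 entries of a descriptor leaves its projections essentially unchanged, so the hash only responds to differences within the subspace where the descriptors actually vary.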
Background and References