Repository of DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology, accepted at MICCAI 2024. DinoBloom builds on DINOv2, and this code is adapted from the original DINOv2 GitHub repository. DinoBloom is a family of vision transformers (ViTs) trained on a large cohort of 13 diverse, publicly available datasets of single cells in peripheral blood and bone marrow. The trained models can be downloaded from Zenodo in four variants: DinoBloom-S, DinoBloom-B, DinoBloom-L and DinoBloom-G. We show that our models outperform existing medical and non-medical vision models by a large margin in (i) linear probing and k-nearest neighbor evaluations for cell-type classification on peripheral blood and bone marrow smears and (ii) weakly supervised multiple instance learning for acute myeloid leukemia subtyping.
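A k-nearest neighbor evaluation on frozen embeddings can be sketched as follows. This is an illustrative sketch, not the evaluation code used in the paper: it assumes cosine similarity and an unweighted majority vote, and the function name `knn_predict` and the default `k` are hypothetical choices.

```python
import numpy as np


def knn_predict(train_feats, train_labels, test_feats, k=20):
    """Classify test embeddings by an unweighted majority vote among their
    k most similar training embeddings (cosine similarity).

    Illustrative sketch: k and the voting rule are assumptions, not the
    paper's exact protocol.
    """
    # L2-normalise so that a dot product equals cosine similarity
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                          # (n_test, n_train)
    k = min(k, train.shape[0])
    # indices of the k most similar training samples for each test sample
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    nn_labels = np.asarray(train_labels)[nn_idx]   # (n_test, k)
    preds = []
    for row in nn_labels:
        values, counts = np.unique(row, return_counts=True)
        # np.unique sorts values, so ties resolve to the smallest label
        preds.append(values[np.argmax(counts)])
    return np.array(preds)
```

`train_feats` and `test_feats` would be embeddings produced by one of the DinoBloom variants (e.g. 384-dimensional for DinoBloom-S), one row per cell image. Labels may be integers or strings such as the cell-type codes listed in the dataset table.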
Model | Feature dim | #params | Weights |
---|---|---|---|
DinoBloom-S | 384 | 22M | Download |
DinoBloom-B | 768 | 86M | Download |
DinoBloom-L | 1024 | 304M | Download |
DinoBloom-G | 1536 | 1136M | Download |
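The feature dimensions in the table match the DINOv2 ViT-S/14, ViT-B/14, ViT-L/14 and ViT-g/14 backbones. The sketch below shows one way to strip a DINOv2-style training checkpoint down to its backbone weights. It rests on two assumptions. First, a downloaded DinoBloom `.pth` file is assumed to follow the DINOv2 teacher-checkpoint layout, with weights under a `"teacher"` key, backbone parameters prefixed with `backbone.`, and projection-head parameters such as `dino_head.` that feature extraction does not need. Second, the `HUB_ARCH` mapping is a hypothetical convenience table inferred only from the matching feature dimensions. Neither is confirmed here.

```python
from collections import OrderedDict

# Hypothetical mapping from DinoBloom variant to a DINOv2 torch.hub model name
# and embedding width, inferred only from the feature dimensions in the table.
HUB_ARCH = {
    "DinoBloom-S": ("dinov2_vits14", 384),
    "DinoBloom-B": ("dinov2_vitb14", 768),
    "DinoBloom-L": ("dinov2_vitl14", 1024),
    "DinoBloom-G": ("dinov2_vitg14", 1536),
}


def extract_backbone_state(checkpoint, prefix="backbone."):
    """Return backbone-only weights from a DINOv2-style checkpoint dict.

    Assumes the weights live under checkpoint["teacher"]; keys starting with
    `prefix` are kept with the prefix removed, all other keys (e.g. the
    "dino_head." projection head) are dropped.
    """
    state = OrderedDict()
    for key, value in checkpoint["teacher"].items():
        if key.startswith(prefix):
            state[key[len(prefix):]] = value
    return state
```

With PyTorch installed, the returned dictionary could be passed to `load_state_dict` of the matching model from `torch.hub.load("facebookresearch/dinov2", ...)`. If strict loading fails on `pos_embed`, the checkpoint was likely trained at a different input resolution than the hub model, and the positional embedding has to be adapted before loading.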
To train the model, specify the folder containing the .txt files that list the paths of your training images in `dinov2/configs/train/custom.yaml`. To train on a single GPU, run:

```shell
python dinov2/train/train.py --config-file dinov2/configs/train/custom.yaml
```

To train on multiple GPUs on one node, run the following, replacing `#num_gpus` with the number of GPUs to use:

```shell
torchrun --nproc_per_node=#num_gpus dinov2/train/train.py --config-file dinov2/configs/train/custom.yaml
```
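The keys that `custom.yaml` expects are not described here, so the sketch below only covers preparing the .txt files themselves. It assumes one list file per dataset sub-folder, containing one absolute image path per line. The function name `write_image_lists`, the folder layout and the set of image suffixes are all illustrative assumptions.

```python
from pathlib import Path

# Illustrative set of image extensions; adjust to the datasets you use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def write_image_lists(dataset_root, out_dir):
    """Write one <dataset>.txt per sub-folder of dataset_root, listing the
    absolute path of every image found (recursively) under that sub-folder,
    one path per line. Returns {dataset_name: number_of_images}."""
    dataset_root, out_dir = Path(dataset_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for dataset_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        paths = sorted(
            str(p.resolve())
            for p in dataset_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if not paths:
            continue  # skip folders without images
        (out_dir / f"{dataset_dir.name}.txt").write_text("\n".join(paths) + "\n")
        written[dataset_dir.name] = len(paths)
    return written
```

The resulting `out_dir` is the folder to reference in the training config. Keeping one list per dataset makes it easy to include or exclude individual datasets without re-scanning the image folders.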
We provide a sample Google Colab notebook that demonstrates feature extraction and PCA visualization.
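The notebook's code is not reproduced here. The following is a sketch of the common DINOv2-style PCA visualization. It assumes the patch tokens for one image have already been extracted, for example from the `x_norm_patchtokens` entry that DINOv2 backbones return from `forward_features`. A 224×224 crop with patch size 14 gives a 16×16 grid of patches. The function name `pca_rgb` is hypothetical.

```python
import numpy as np


def pca_rgb(patch_tokens, grid_size):
    """Project patch embeddings onto their first three principal components
    and min-max scale each component to [0, 1], giving an RGB image.

    patch_tokens: array of shape (grid_h * grid_w, dim)
    grid_size:    (grid_h, grid_w), e.g. (16, 16) for a 224x224 crop with 14px patches
    """
    grid_h, grid_w = grid_size
    x = patch_tokens - patch_tokens.mean(axis=0, keepdims=True)  # centre features
    # rows of vt are the principal directions, ordered by singular value
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[:3].T                                          # (n_patches, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-8)               # per-channel min-max
    return proj.reshape(grid_h, grid_w, 3)
```

The returned array can be displayed directly, e.g. with `matplotlib.pyplot.imshow`. Patches with similar embeddings receive similar colors, so cell compartments such as nucleus and cytoplasm tend to show up as coherent regions. Fitting the PCA jointly on the patches of several images instead of one keeps the colors comparable across images.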
If you find this repository useful, please consider citing our work:
```bibtex
@misc{koch2024dinobloom,
  title={DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology},
  author={Valentin Koch and Sophia J. Wagner and Salome Kazeminia and Ece Sancar and Matthias Hehr and Julia Schnabel and Tingying Peng and Carsten Marr},
  year={2024},
  eprint={2404.05022},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
Dataset | Modality | #Images | Patient Labels | Cell/Image Labels | Comment | Source Link | Publication Link |
---|---|---|---|---|---|---|---|
BMC | Bone marrow | 171,373 | - | 21: ABE (Abnormal eosinophils), ART (Artefacts), BAS (Basophils), BLA (Blasts), EBO (Erythroblasts), EOS (Eosinophils), FGC (Faggot cells), HAC (Hairy cells), KSC (Smudge cells), LYI (Immature lymphocytes), LYT (Lymphocytes), MMZ (Metamyelocytes), MON (Monocytes), MYB (Myelocytes), NGB (Band neutrophils), NGS (Segmented neutrophils), NIF (Not identifiable), OTH (Other cells), PEB (Proerythroblasts), PLM (Plasma cells), PMO (Promyelocytes) | - | Link | Link |
AML Hehr | Blood | 101,949 | 4: PML::RARA, NPM1, CBFB::MYH11, RUNX1::RUNX1T1 | - | - | Link | Link |
AML Matek | Blood | 18,365 | - | 15: BAS (Basophil), EBO (Erythroblast), EOS (Eosinophil), KSC (Smudge cell), LYA (Lymphocyte (atypical)), LYT (Lymphocyte (typical)), MMZ (Metamyelocyte), MOB (Monoblast), MON (Monocyte), MYB (Myelocyte), MYO (Myeloblast), NGB (Neutrophil (band)), NGS (Neutrophil (segmented)), PMB (Promyelocyte (bilobed)), PMO (Promyelocyte) | - | Link | Link |
Acevedo | Blood | 17,092 | - | 10: basophil, eosinophil, erythroblast, lymphocyte_typical, metamyelocyte, monocyte, myelocyte, neutrophil_band, neutrophil_segmented, promyelocyte | - | Link | Link |
Raabin WBC | Blood | 10,175 | - | 5: Eosinophil, Lymphocyte, Monocyte, Neutrophil, Basophil | - | Link | Link |
NuClick | Blood | 2,933 | - | - | Segmentation | Link | Link |
Warty pig | Blood | 2,871 | - | 4: Basophil, Eosinophil, Monocyte, Neutrophil | 667 raw images, 1464 augmented images, and 1408 cropped, classified images | Link | Link |
LISC | Blood | 2,263 | - | 5: Basophil, Eosinophil, Monocyte, Neutrophil, Lymphocyte | Segmentation | Link | Rezatofighi, S. H. & Soltanian-Zadeh, H. Automatic recognition of five types of white blood cells in peripheral blood. Comput. Med. Imaging Graph. 35, 333–343 (2011). |
KRD-WBC | Blood | 601 | - | 5: Eosinophil, Lymphocyte, Monocyte, Neutrophil, Basophil | Segmentation | Link | Taha, H., Alizadeh, F. & Mohammad, N. Creating a white blood cell dataset for segmentation. Mendeley Data, V2 (2023). doi: 10.17632/jzdj6h7gms.2 |
SSL Seg | Blood | 400 | - | - | Segmentation | Link | Zheng, X., Wang, Y., Wang, G. & Liu, J. Fast and robust segmentation of white blood cell images by self-supervised learning. Micron 107, 55–71 (2018). |
BCCD | Blood | 364 | - | 3: WBC, RBC, Platelet | Detection | Link | Mohamed, M., Far, B. & Guaily, A. An efficient technique for white blood cells nuclei automatic segmentation. in 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 220–225 (2012). |
Aslan | Blood | 100 | - | 2: WBC, RBC | Detection | Link | - |
Raabin Leukemia | Blood | ? | 4: Acute Lymphoblastic Leukemia, Acute Myeloblastic Leukemia, Chronic Lymphocytic Leukemia, Chronic Myelogenous Leukemia | - | - | Link | - |
APL_AML | Blood | 25,915 | 2: APL / non-APL AML | 21: Artifact, Band neutrophils, Basophil, Blast (no lineage spec), Eosinophils, Erythroblast, Giant thrombocyte, Lymphocyte, Lymphocyte (variant), Metamyelocyte, Monocyte, Myelocyte, Plasma cells, Prolymphocyte, Promonocyte, Promyelocyte, Segmented neutrophils, Smudge cells, Thrombocyte aggregation, Unidentified, Young Unidentified | - | Link | Link |
White-Blood-Cell-dataset | Blood | 376 | - | - | Segmentation | Link | Mohamed, M. M. A. & Far, B. H. An enhanced threshold based technique for white blood cells nuclei automatic segmentation. in Healthcom 202–207 (IEEE, 2012). |