Imaging data sets (artificial intelligence)
The aggregation of an imaging data set is a critical step in building artificial intelligence (AI) for radiology. Imaging data sets are used in various ways, including training and/or testing algorithms. Many data sets for building convolutional neural networks for image identification contain at least thousands of images, but smaller data sets are useful for texture analysis, transfer learning, and other applications.
Many commercial AI products are built on proprietary data sets or hospital-specific data sets that are not publicly available because of patient privacy concerns. There are, however, several data sets of radiological images and/or reports publicly available at the following websites:
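As an illustration of the training and testing use mentioned above, the following is a minimal sketch of dividing a data set into two non-overlapping subsets. The function name `split_data_set`, the 20% test fraction, and the file names are hypothetical choices made for this example and do not come from any particular data set or library.

```python
import random


def split_data_set(image_ids, test_fraction=0.2, seed=0):
    """Shuffle image identifiers and split them into training and testing lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ids = list(image_ids)
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    # The first n_test shuffled items form the test set, the rest the training set.
    return ids[n_test:], ids[:n_test]


# Hypothetical file names standing in for a real imaging data set.
image_ids = [f"image_{i:04d}.png" for i in range(1000)]
train_ids, test_ids = split_data_set(image_ids)
print(len(train_ids), len(test_ids))
```

In practice, when a data set contains several images from the same patient, the split is usually made by patient rather than by image, so that no patient appears in both subsets.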
- 1000 Functional Connectomes Project: over 1000 functional MRI exams collected from sites across the globe
- ACR Data Science: list of ~20 data sets
- CheXpert: 224,316 chest radiographs
- Computed Tomography Emphysema Database: small images specifically for texture analysis
- COVID-19 Open Annotated Radiology Database (RICORD): expert-annotated COVID-19 imaging data set with 1000 chest x-rays and 240 thoracic CT exams
- Johns Hopkins University Data Archive: contains a data set of head CT scans
- The Medical Image Bank of Valencia
- MD.ai: a collection of public projects
- OpenI - The Open Access Biomedical Image Search Engine: data set search engine with an API (application programming interface) to create customized data sets available at MedPix
- OpenNeuro: list of over 200 neuro data sets
- OASIS: open access neuro data sets
- Spineweb: 16 spinal imaging data sets
- UCLH Stroke EIT Dataset
- MRNet: 1,370 annotated knee MRI examinations
- MURA: a large dataset of musculoskeletal radiographs
- MIMIC-CXR Database: 377,110 chest radiographs with free-text radiology reports
- PADCHEST: 160,000 chest X-rays with multiple labels on images
- RSNA Pulmonary Embolism CT (RSPECT) dataset: 12,000 CT studies
- TB Portals
- UC Irvine Machine Learning Repository: various radiological and nuclear medicine data sets among other types of data sets
- York Cardiac MRI Dataset: cardiac MRIs
- Zenodo: searchable projects
Additionally, The Cancer Imaging Archive contains links to many open radiology data sets including the following:
- 4D-Lung
- ACRIN-FLT-Breast
- ACRIN-FMISO-Brain
- ACRIN-NSCLC-FDG-PET
- Anti-PD-1 Immunotherapy Lung (Anti-PD-1_Lung)
- Anti-PD-1 Immunotherapy Melanoma (Anti-PD-1_MELANOMA)
- APOLLO-1-VA
- APOLLO2
- Brain-Tumor-Progression
- BREAST-DIAGNOSIS
- Breast-MRI-NACT-Pilot
- CBIS-DDSM
- CPTAC-AML
- CPTAC-CCRCC
- CPTAC-CM
- CPTAC-GBM
- CPTAC-HNSCC
- CPTAC-LSCC
- CPTAC-LUAD
- CPTAC-PDA
- CPTAC-SAR
- CPTAC-UCEC
- Credence Cartridge Radiomics Phantom CT Scans
- Credence Cartridge Radiomics Phantom CT Scans with Controlled Scanning Approach (CC-Radiomics-Phantom-2)
- CT COLONOGRAPHY
- CT Lymph Nodes
- Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT)
- Head-Neck Cetuximab
- Head-Neck-PET-CT
- ISPY1
- Ivy GAP
- LGG-1p19qDeletion
- LIDC-IDRI
- LungCT-Diagnosis
- Lung CT Segmentation Challenge 2017
- Lung Phantom
- Mouse-Astrocytoma
- Mouse-Mammary
- NaF Prostate
- NRG-1308
- NSCLC-Cetuximab
- NSCLC Radiogenomics
- NSCLC-Radiomics
- NSCLC-Radiomics-Genomics
- Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment
- Pancreas-CT
- Phantom FDA
- Prostate-3T
- PROSTATE-DIAGNOSIS
- Prostate Fused-MRI-Pathology
- PROSTATE-MRI
- QIBA CT-1C
- QIN-BRAIN-DSC-MRI
- QIN-Breast
- QIN Breast DCE-MRI
- QIN GBM Treatment Response
- QIN-HEADNECK
- QIN LUNG CT
- QIN PET Phantom
- QIN PROSTATE
- QIN-PROSTATE-Repeatability
- QIN-SARCOMA
- Quantitative Imaging Network Collections
- REMBRANDT
- RIDER Breast MRI
- RIDER Collections
- RIDER Lung CT
- RIDER Lung PET-CT
- RIDER NEURO MRI
- RIDER PHANTOM MRI
- RIDER Phantom PET-CT
- Soft-tissue-Sarcoma
- SPIE-AAPM Lung CT Challenge
- SPIE-AAPM-NCI PROSTATEx Challenges
- Synthetic and Phantom MR Images for Determining Deformable Image Registration Accuracy (MRI-DIR)
- TCGA-BLCA
- TCGA-BRCA
- TCGA-CESC
- TCGA-COAD
- TCGA-ESCA
- TCGA-GBM
- TCGA-HNSC
- TCGA-KICH
- TCGA-KIRC
- TCGA-KIRP
- TCGA-LGG
- TCGA-LIHC
- TCGA-LUAD
- TCGA-LUSC
- TCGA-OV
- TCGA-PRAD
- TCGA-READ
- TCGA-SARC
- TCGA-STAD
- TCGA-THCA
- TCGA-UCEC