Awesome Cytodata

A curated list of awesome cytodata resources.

Cytodata refers to a community of researchers and resources involved in the image-based profiling of biological phenotypes. These biological phenotypes are typically induced by genetic or chemical perturbations and often represent disease states. Image-based profiling characterizes these phenotypes to uncover biological insights, such as the impact of genetic alterations and the mechanism of action of compounds.

This page represents a curated list of software, datasets, landmark publications, and image-based profiling methods. Our goal is to provide researchers, both new and established, a place to discover and document awesome Cytodata resources.



Datasets

Annotated datasets, including raw images and processed profiles, for image-based profiling of chemical and genetic perturbations.

Raw Images

  • Broad Bioimage Benchmark Collection - The Broad Bioimage Benchmark Collection (BBBC) is a collection of freely downloadable microscopy image sets. In addition to the images themselves, each set includes a description of the biological application and some type of "ground truth" (expected results).
  • Image Data Resource - Public repository of image datasets from published scientific studies.
  • RxRx1 - RxRx1 is a set of 125,514 high-resolution, 512×512-pixel, 6-channel fluorescence microscopy images of human cells under 1,108 genetic perturbations in 51 experimental batches across four cell types. The images were produced by Recursion Pharmaceuticals in their labs in Salt Lake City, Utah. Researchers can use this dataset to study and benchmark methods for handling biological batch effects, as well as machine-learning problems such as domain adaptation, transfer learning, and k-shot learning.
  • RxRx19 - RxRx19 is the first morphological dataset to demonstrate rescue of the morphological effects of COVID-19.
  • Human Protein Atlas - Among other assays, the Human Protein Atlas (HPA) used confocal imaging to show the subcellular location of more than two-thirds of human proteins in cell lines. Raw images or inferred protein subcellular locations can be downloaded.
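RxRx1 is intended as a benchmark for handling batch effects. For orientation, one of the simplest baselines is to rescale every feature within its own experimental batch. The sketch below assumes per-image feature values have already been extracted into plain Python dicts (a hypothetical layout, not the RxRx1 file format) and applies a robust z-score, (x − median) / (1.4826 × MAD), separately in each batch. It is an illustrative baseline only, not a method published with RxRx1; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def robust_standardize_by_batch(rows, batch_key="batch"):
    """Return copies of `rows` with every feature scaled to a robust z-score
    computed within that row's batch.

    `rows` is a list of dicts holding a batch label plus numeric features
    (hypothetical layout chosen for illustration).
    """
    # Collect each feature's values separately for every batch.
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for feature, value in row.items():
            if feature != batch_key:
                values[row[batch_key]][feature].append(value)

    # Per batch and feature: median and scaled MAD (approximates SD for normal data).
    stats = {}
    for batch, features in values.items():
        for feature, vals in features.items():
            med = median(vals)
            mad = median([abs(v - med) for v in vals])
            # Fall back to 1.0 so constant features map to 0 instead of dividing by 0.
            stats[batch, feature] = (med, 1.4826 * mad or 1.0)

    # Rescale each row, preserving the input order.
    out = []
    for row in rows:
        batch = row[batch_key]
        new_row = {batch_key: batch}
        for feature, value in row.items():
            if feature != batch_key:
                med, scale = stats[batch, feature]
                new_row[feature] = (value - med) / scale
        out.append(new_row)
    return out
```

Because the median and MAD are computed per batch, a feature that is uniformly shifted or scaled in one batch ends up on the same scale as in the others; real batch effects are often more complex than this.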

Chemical Perturbations

  • Gustafsdottir et al. 2013 - Cell painting profiles from 1,600 bioactive compounds in U2OS cells (Access from public S3 bucket: s3://cytodata/datasets/Bioactives-BBBC022-Gustafsdottir/profiles/Bioactives-BBBC022-Gustafsdottir/).
  • Wawer et al. 2014 - Cell painting profiles from 31,770 compounds in U2OS cells (Direct download).
  • Bray et al. 2017 - Cell painting profiles from 30,616 compounds in U2OS cells (Center Driven Research Project, CDRP) (Download from GigaDB | Access from public S3 bucket: s3://cytodata/datasets/CDRPBIO-BBBC036-Bray/profiles_cp/CDRPBIO-BBBC036-Bray/).
  • Haghighi et al. 2021 - Cell painting profiles matched to L1000 gene-expression profiles in 4 experiments, including compound and genetic screens (Details on GitHub).
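Compound profiles like these are commonly used to infer a compound's mechanism of action (MOA) by matching it to annotated reference compounds. A minimal sketch of that idea, assuming each well-level profile is already a normalized feature vector of equal length: rank the annotated references by Pearson correlation with the query profile and return the label of the best match. The names `pearson` and `predict_moa` and the toy labels are hypothetical; real analyses may add steps such as replicate aggregation and significance testing.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length feature vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict_moa(query, references):
    """Assign `query` the MOA label of its most correlated reference profile.

    `references` maps a compound name to a (moa_label, profile) pair.
    """
    best_name = max(references, key=lambda name: pearson(query, references[name][1]))
    return references[best_name][0]
```

Usage with toy data: `predict_moa([1.1, 2.0, 2.9, 4.2], {"cpd_1": ("MOA-A", [1.0, 2.0, 3.0, 4.0]), "cpd_2": ("MOA-B", [4.0, 3.0, 2.0, 1.0])})` returns `"MOA-A"`, because the query rises across features like `cpd_1`.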

Genetic Perturbations

  • Singh et al. 2015 - 3,072 cell painting profiles from 41 genes knocked down with RNA interference (RNAi) in U2OS cells (Access from GitHub).
  • Rohban et al. 2017 - Cell painting data from 220 overexpressed genes in U2OS cells (Access from public S3 bucket: s3://cytodata/datasets/TA-ORF-BBBC037-Rohban/profiles_cp/TA-ORF-BBBC037-Rohban/).
  • Unpublished - Cell painting profiles of 596 overexpressed alleles from 53 genes in A549 cells (Access from public S3 bucket: s3://cytodata/datasets/LUAD-BBBC043-Caicedo/profiles_cp/LUAD-BBBC043-Caicedo/).
  • Unpublished - 3,456 cell painting profiles from CRISPR experiments knocking down 59 genes in A549, ES2, and HCC44 cells (Access from GitHub).
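Several of the profile datasets above live under the public `s3://cytodata/` bucket. The helper below is a sketch that splits such a URI into bucket and key prefix and builds an S3 ListObjectsV2 request URL (query parameters `list-type=2` and `prefix`) against the virtual-hosted endpoint `https://<bucket>.s3.amazonaws.com`. It assumes the bucket still permits anonymous listing; the function names are hypothetical.

```python
from urllib.parse import urlencode, urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_objects_url(uri):
    """Build an anonymous ListObjectsV2 request URL for an S3 prefix.

    Assumes the bucket is reachable via https://<bucket>.s3.amazonaws.com
    and allows public listing (not guaranteed for any given bucket).
    """
    bucket, prefix = split_s3_uri(uri)
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"
```

With the AWS CLI installed, the same prefixes can be listed without credentials using its `--no-sign-request` option, for example `aws s3 ls --no-sign-request s3://cytodata/datasets/`.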


Software

Open-source software packages for image-based profiling of biological phenotypes.

  • Advanced Cell Classifier - A software package for exploration, annotation and classification of cells within large datasets using machine learning.
  • CellProfiler - CellProfiler is free, open-source software for measuring and analyzing cell images.
  • CellProfiler Analyst - Interactive data exploration, analysis, and classification of large biological image sets.
  • Cytominer - Methods for image-based cell profiling in R.
  • EBImage - Image processing toolbox for R.
  • HTSvis - A web app for exploratory data analysis and visualization of arrayed high-throughput screens.
  • BioProfiling.jl - Toolkit for filtering and curation of morphological profiles in Julia.
  • PyCytominer - Methods for image-based cell profiling in Python.
  • ImJoy - A platform that brings together deep-learning-based image analysis tools behind a graphical user interface (GUI).
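Profiling packages such as Cytominer and PyCytominer include methods for steps such as aggregating single-cell measurements and selecting informative features. To illustrate those two steps without depending on either package, the sketch below collapses single-cell measurements into one median profile per well and then drops features that are (near-)constant across wells. It works on plain dicts with hypothetical field names (`well` plus feature columns) and is not the API of Cytominer or PyCytominer.

```python
from collections import defaultdict
from statistics import median, pvariance


def aggregate_by_well(cells, well_key="well"):
    """Collapse single-cell measurements into one median profile per well."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[well_key]].append(cell)
    profiles = {}
    for well, members in groups.items():
        features = [k for k in members[0] if k != well_key]
        # The median is robust to a few outlier cells (e.g. segmentation errors).
        profiles[well] = {f: median([m[f] for m in members]) for f in features}
    return profiles


def drop_low_variance(profiles, threshold=1e-8):
    """Remove features whose population variance across wells is <= threshold."""
    features = list(next(iter(profiles.values())))
    keep = [f for f in features
            if pvariance([p[f] for p in profiles.values()]) > threshold]
    return {well: {f: p[f] for f in keep} for well, p in profiles.items()}
```

A constant feature carries no information for telling perturbations apart, which is why a variance filter is a common first feature-selection step; the threshold here is an arbitrary illustrative value.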


Publications

Publications related to image-based profiling.



  • Deep learning in microscopy - A collection of review and research articles published in Nature Methods related to multiple use cases of deep learning, including noise reduction, segmentation, tracking and representation learning.
  • High-Content Imaging and Informatics - A collection of high-content imaging method and application articles published in SLAS Discovery.




Contributions welcome! Read the contribution guidelines first.

