Cancer Digital Slide Archive: an informatics resource to support integrated in silico analysis of TCGA pathology data

September 26, 2013

Gutman DA, Cobb J, Somanna D, Park Y, Wang F, Kurc T, Saltz JH, Brat DJ, Cooper LA.

J Am Med Inform Assoc. 2013 Jul 25. doi: 10.1136/amiajnl-2012-001469. [Epub ahead of print]


Background: The integration and visualization of multimodal datasets is a common challenge in biomedical informatics. Several recent studies of The Cancer Genome Atlas (TCGA) data have illustrated important relationships between morphology observed in whole-slide images, outcome, and genetic events. The pairing of genomics and rich clinical descriptions with whole-slide imaging provided by TCGA presents a unique opportunity to perform these correlative studies. However, better tools are needed to integrate the vast and disparate data types.

Objective: To build an integrated web-based platform supporting whole-slide pathology image visualization and data integration.

Materials and methods: All images and genomic data were directly obtained from the TCGA and National Cancer Institute (NCI) websites.

Results: The resulting Cancer Digital Slide Archive (CDSA) is publicly accessible and currently hosts more than 20 000 whole-slide images from 22 cancer types.

Discussion: The capabilities of CDSA are demonstrated using TCGA datasets to integrate pathology imaging with associated clinical, genomic and MRI measurements in glioblastomas, and the approach can be extended to other tumor types. CDSA also allows URL-based sharing of whole-slide images and has preliminary support for directly sharing regions of interest and other annotations. Images can also be selected on the basis of other metadata, such as mutational profile, patient age, and other relevant characteristics.
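To make the idea of metadata-based image selection concrete, here is a minimal sketch that filters slide records by tumor type, mutation, and patient age. It assumes a hypothetical in-memory record type (`SlideRecord`) and helper (`select_slides`). Neither is part of the CDSA, and the slide IDs and data are toy values, not TCGA data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SlideRecord:
    """Hypothetical per-slide metadata record (illustrative only, not the CDSA schema)."""
    slide_id: str
    tumor_type: str
    patient_age: int
    mutations: frozenset[str]


def select_slides(records, tumor_type, mutation=None, min_age=None, max_age=None):
    """Return IDs of slides matching the tumor type and every optional criterion given."""
    selected = []
    for r in records:
        if r.tumor_type != tumor_type:
            continue
        # Optional filters are skipped when left as None.
        if mutation is not None and mutation not in r.mutations:
            continue
        if min_age is not None and r.patient_age < min_age:
            continue
        if max_age is not None and r.patient_age > max_age:
            continue
        selected.append(r.slide_id)
    return selected


# Toy records (hypothetical IDs and values).
records = [
    SlideRecord("slide-001", "GBM", 45, frozenset({"IDH1"})),
    SlideRecord("slide-002", "GBM", 67, frozenset({"EGFR"})),
    SlideRecord("slide-003", "LGG", 38, frozenset({"IDH1"})),
    SlideRecord("slide-004", "GBM", 52, frozenset({"IDH1", "TP53"})),
]

print(select_slides(records, "GBM", mutation="IDH1"))  # ['slide-001', 'slide-004']
print(select_slides(records, "GBM", min_age=50))       # ['slide-002', 'slide-004']
```

In a real archive the same filtering would more likely run as a database or web-service query against the stored metadata. The loop here only illustrates the selection logic.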

Conclusions: With the increasing availability of whole-slide scanners, analysis of digitized pathology images will become increasingly important in linking morphologic observations with genomic and clinical endpoints.