CASE STUDY: The Cancer Digital Slide Archive: A Web Platform for Accessing TCGA Data

May 7, 2014

Pathology slides are thin slices of tissue from a sample, prepared and stained on a piece of glass for examination under a microscope. Reviewing and analyzing these slides is a skilled task, often performed by several pathologists to reach consensus on a diagnosis. In the past, the original slides were mailed to the other pathologists, risking damage to the delicate glass and tissue. In the digital age, slides can be scanned to create image files, but this has presented a different set of problems: the massive image files may be several hundred megabytes in size and take several hours to download.

These were the challenges David Gutman, M.D., Ph.D., faced when he started at Emory University's Center for Comprehensive Informatics in 2009. Dr. Gutman and his colleague, Lee Cooper, Ph.D., were examining whole-slide images of glioblastoma multiforme (GBM) samples from The Cancer Genome Atlas (TCGA). Their goal was to better understand this deadly brain cancer through quantitative analysis of the slide images. Finding, downloading and annotating the slides were, in Dr. Gutman's words, "a real challenge." Each image is large, difficult to interpret, and not digitally linked to other data, such as radiology reports and genomic information. To facilitate their research project, Dr. Gutman drew on one of his hobbies: programming. "I have been a computer nerd, for lack of a better word, my whole life. My dad was an electrical engineer and we built computers together." He continues, "I'm largely self-taught, but I've been playing around with websites for years."

To reach their main research goal more quickly, Dr. Gutman and Dr. Cooper, along with their colleagues Dhananjaya Somanna, Ph.D., and Jake Cobb, M.S., developed a website that allows quick online access to slides without needing to download them. "We realized [that] we did all the work setting up the site for GBM – we can basically roll it out for all the tumor types without that much added work," says Dr. Gutman. With Dr. Gutman's medical background and Dr. Cooper's skill in developing computer algorithms, they knew researchers all over Emory and could work in concert with them to expand the website to the other cancer types selected for study by TCGA. One of the benefits of collaboration, Dr. Gutman says, is that "we didn't have to develop our tools in isolation, which can be a huge problem."

With the foundation laid for what has become The Cancer Digital Slide Archive (CDSA), Dr. Gutman saw the underlying problems it might solve, particularly in team science, the term for multidisciplinary groups across institutions cooperating to answer the same research question. He explains, "One of the challenges I often face is that if you want to collaborate with people and you want to get quick feedback, the web seems to me the best way to do that.... The way we would do it before is to send a hard drive with a couple gigabytes of images or make someone download them." This is problematic not only because of the large file sizes, but also because of the need to install companion software. Installing software can be especially difficult because of the stringent security programs on computers owned and managed by medical centers, where collaborating pathology specialists often work.

"You need to be able to look at stuff on the web to get people's feedback. You shouldn't have to install or download anything," says Dr. Gutman. "Otherwise, you don't get feedback."

In its current form, the CDSA is a user-friendly tool for navigating TCGA data, using pathology slides as the starting point. From there, a user is presented with many ways to investigate, annotate and share the data. Each slide is linked to TCGA's open-access data, so from within the website a user can navigate to genomic, clinical and radiology data, and more. Each slide can also be digitally annotated. In addition to panning and zooming, a user can "mask" part of a slide, meaning that the section is excluded from algorithmic analysis. While touring the slide images, Dr. Gutman comes across an abnormal slide with a black smear and a red mark and comments, "This razor blade had some shmutz on it, and here's a pen artifact." He continues, "We just tell the computer, 'Don't look here [during analysis]'... If you don't have the ability to quickly do these things, it becomes a real hassle."
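As a rough illustration of what masking means for an analysis, the Python sketch below averages pixel intensities while skipping any pixel that falls inside a masked rectangle. The rectangle-based `Rect` type, the `masked_mean` function and the use of a plain nested list as an image are assumptions made for illustration only; the source does not describe how the CDSA stores masks or runs its algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangular mask region in pixel coordinates (x1, y1 exclusive)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def masked_mean(image: list[list[float]], masks: list[Rect]) -> float:
    """Average pixel intensity, ignoring every pixel inside a masked region."""
    total = 0.0
    count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if any(m.contains(x, y) for m in masks):
                continue  # "Don't look here": masked pixels never reach the analysis
            total += value
            count += 1
    if count == 0:
        raise ValueError("every pixel is masked")
    return total / count


# Hypothetical 2-row image whose right-hand column is a bright pen mark.
image = [[10, 10, 255], [10, 10, 255]]
print(masked_mean(image, [Rect(2, 0, 3, 2)]))  # pen mark masked out
print(masked_mean(image, []))                  # pen mark included
```

Masking the right-hand column gives a mean of 10.0, while leaving it in pulls the mean up to about 91.7, which is the kind of distortion a razor-blade smear or pen artifact could introduce into an automated measurement.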

One of the most useful features of the CDSA is its "deep linking technology." A user can post a comment on an area of a slide, or even on a specific cell, then create a link to the comment and image and send it to a colleague. "It's like Google Maps," says Dr. Gutman. "You can send someone a latitude and a longitude on one of the slides, they click the link, and it goes right to the spot the other person was looking at."
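For readers curious how such a link can work, here is a minimal Python sketch that treats a position on a slide the way a map treats latitude and longitude: the slide, the coordinates and the zoom level are packed into a URL and unpacked again when a colleague opens it. The viewer address, the parameter names (`slide`, `x`, `y`, `zoom`) and the `ViewState` type are hypothetical; the source does not describe the CDSA's actual link format.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical viewer address; the real CDSA URL scheme is not given in the source.
VIEWER_URL = "https://example.org/cdsa/viewer"


@dataclass(frozen=True)
class ViewState:
    slide_id: str  # which slide to open (hypothetical identifier)
    x: int         # horizontal position in full-resolution pixels
    y: int         # vertical position in full-resolution pixels
    zoom: float    # magnification level to restore


def make_deep_link(state: ViewState) -> str:
    """Encode a view position into a shareable URL, like dropping a map pin."""
    query = urlencode({
        "slide": state.slide_id,
        "x": state.x,
        "y": state.y,
        "zoom": state.zoom,
    })
    return f"{VIEWER_URL}?{query}"


def parse_deep_link(url: str) -> ViewState:
    """Recover the view position when a colleague clicks the link."""
    params = parse_qs(urlparse(url).query)
    return ViewState(
        slide_id=params["slide"][0],
        x=int(params["x"][0]),
        y=int(params["y"][0]),
        zoom=float(params["zoom"][0]),
    )


state = ViewState("slide-001", 48210, 30115, 20.0)
link = make_deep_link(state)
print(link)
print(parse_deep_link(link) == state)
```

Here the link comes out as `https://example.org/cdsa/viewer?slide=slide-001&x=48210&y=30115&zoom=20.0`, and parsing it returns the same view state, so the recipient lands on exactly the spot the sender was looking at. A comment identifier could be carried the same way as one more query parameter.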

Dr. Gutman found out for himself how useful deep linking is. He says that while browsing pathology slides of lung cancer samples, "I found all of these interesting nuclei in the bottom right corner of a slide; I was excited!" After annotating the slide and sending the CDSA link to a colleague, Dr. Gutman was told that the nuclei were just run-of-the-mill lymphocytes, a type of white blood cell and a normal part of the tissue. "I did go to medical school, but I don't think pathology was one of my strongest subjects. Fifteen years after I learned what a lymphocyte looks like, apparently, I'd forgot[ten]!"

The CDSA website is simple and intuitive to use, thanks to its easy-to-navigate interface. Dr. Gutman was surprised and pleased to hear that the pathologist who trained his mentor navigated the website without instruction. "To get someone who didn't grow up in the digital age, like Lee and me, to use the website is pretty cool," says Dr. Gutman. The pathologist was also impressed by the image quality. Dr. Gutman adds, "From someone who’s been using expensive glass microscopes and doing this for 40 years! That was the thing that made me most happy."

The ease of data access is especially important for TCGA, which makes data available to the entire cancer research community as they are generated. Dr. Cooper says, "It's important to expose [the data] to as broad an audience as possible." He continues, "The CDSA puts this information at people's fingertips. There's no barrier to getting in and looking at the data." Dr. Gutman adds, "Making as much information available to people as easily as possible is how team science needs to be done and how consensus is built."

Considering potential applications for the CDSA, Dr. Gutman and Dr. Cooper realize that they have built a technology platform that can be used to view slides from any source, not just human tumors. Dr. Gutman collaborates with a veterinary pathologist at the University of Georgia. "He wanted to scan a bunch of canine slides, look at them on the web, and use them as a teaching resource for his vet students," says Dr. Gutman. Dr. Gutman was able to rework a version of the CDSA to suit his colleague's goals. Dr. Cooper adds that the software's flexibility means it could be of great value in other fields, such as Parkinson's and Alzheimer's disease research, where, Dr. Gutman notes, pathology may play an important role.

Looking ahead to the future of team science and similar partnerships, Dr. Gutman summarizes his goal in a single sentence: “The idea is everything should be able to talk to each other and should make as much data available to as many people with as few barriers as possible.”


Gutman, D.A., Cobb, J., Somanna, D., Park, Y., Wang, F., Kurc, T., Saltz, J.H., Brat, D.J. and Cooper, L.A. (2013) Cancer Digital Slide Archive: an informatics resource to support integrated in silico analysis of TCGA pathology data. J Am Med Inform Assoc. doi: 10.1136/amiajnl-2012-001469.