
CASE STUDY: The Cancer Digital Slide Archive: A Web Platform for Accessing TCGA Data

Pathology slides are thin slices of tissue from a sample, prepared and stained on a piece of glass for examination under a microscope. Reviewing and analyzing these slides is a skilled task, often performed by several pathologists to develop consensus around a diagnosis. In the past, the original slides were mailed to the other pathologists, risking damage to the delicate pieces of glass and tissue. In the digital age, slides can be scanned to create image files, but this has presented a different set of problems. The massive image files may be several hundred megabytes in size and take several hours to download.

These were the challenges David Gutman, M.D., Ph.D., faced when he started at Emory University's Center for Comprehensive Informatics in 2009. Dr. Gutman and his colleague, Lee Cooper, Ph.D., were examining whole-slide images from The Cancer Genome Atlas's (TCGA) glioblastoma multiforme (GBM) samples. Their goal was to better understand this deadly brain cancer by performing quantitative analysis on the slide images. Finding, downloading and annotating the slides were, in Dr. Gutman's words, "a real challenge." Each image is large, difficult to interpret, and not digitally connected to other data, such as radiology reports and genomic information. To facilitate their research project, Dr. Gutman applied his expertise from one of his hobbies – programming. "I have been a computer nerd – for lack of a better word – my whole life. My dad was an electrical engineer and we built computers together." He continues, "I'm largely self-taught, but I've been playing around with websites for years."


To achieve their main research goal more quickly, Dr. Gutman and Dr. Cooper, along with their colleagues Dhanajaya Somanna, Ph.D., and Jake Cobb, M.S., developed a website to allow quick online access to slides, without needing to download them. "We realized [that] we did all the work setting up the site for GBM – we can basically roll it out for all the tumor types without that much added work," says Dr. Gutman. Between Dr. Gutman's medical background and Dr. Cooper's skill in developing computer algorithms, they knew researchers all over Emory and could work in concert with them to expand the website to the other cancer types selected for study by TCGA. One of the benefits of collaboration, Dr. Gutman says, is that "we didn't have to develop our tools in isolation, which can be a huge problem."


With the foundation laid for what has become The Cancer Digital Slide Archive (CDSA), Dr. Gutman saw the underlying problems that it might solve, particularly related to team science, the term for the cooperation of multidisciplinary groups across institutions to answer the same research question. He explains, "One of the challenges I often face is that if you want to collaborate with people and you want to get quick feedback, the web seems to me the best way to do that.... The way we would do it before is to send a hard drive with a couple gigabytes of images or make someone download them." This is problematic not only because of the large file size, but also because companion software must be installed. Installation can be especially difficult given the stringent security programs on computers owned and managed by a medical center, where pathology specialist collaborators often work.

"You need to be able to look at stuff on the web to get people's feedback. You shouldn't have to install or download anything," says Dr. Gutman. "Otherwise, you don't get feedback."


In its current form, the CDSA is a user-friendly tool for navigating TCGA data, using pathology slides as the starting point. From there, a user is presented with a multitude of ways to investigate, annotate and share the data. Each slide is linked to TCGA's open access data. Within the website, a user can navigate to the genomics, clinical data, radiology information and more. Each slide can be digitally annotated. In addition to panning and zooming, a user can "mask" part of the slide, meaning that the section will be excluded from algorithmic analysis. Touring the slide images and coming across an abnormal slide with a black smear and a red mark, Dr. Gutman comments, "This razor blade had some shmutz on it, and here's a pen artifact." He continues, "We just tell the computer, 'Don't look here [during analysis]'... If you don't have the ability to quickly do these things, it becomes a real hassle."
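To make the idea of masking concrete, here is a minimal sketch of how an analysis step might skip a masked region. It is an illustration only, not the CDSA's actual code: the function name `mean_unmasked`, the tiny grid of pixel intensities, and the choice of a simple average as the "analysis" are all assumptions made for this example.

```python
def mean_unmasked(image, mask):
    """Average pixel intensity over pixels that are not masked out.

    image: 2D list of intensities (hypothetical toy data).
    mask:  2D list of booleans with the same shape; True means "don't look here".
    """
    total = 0
    count = 0
    for image_row, mask_row in zip(image, mask):
        for value, masked in zip(image_row, mask_row):
            if not masked:
                total += value
                count += 1
    if count == 0:
        raise ValueError("every pixel is masked")
    return total / count


# Toy slide: the right-hand column stands in for a bright pen artifact.
image = [
    [10, 12, 250],
    [11, 13, 255],
]
# Mask the artifact column so it does not skew the result.
mask = [
    [False, False, True],
    [False, False, True],
]

result = mean_unmasked(image, mask)
```

With the artifact column masked, only the four tissue pixels contribute to the average. Without the mask, the two bright artifact pixels would dominate it.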


One of the most useful features of CDSA is the "deep linking technology." A user can post a comment on an area of the slide or even a specific cell, create a link to the comment and image, and send it to a colleague. "It's like Google Maps," says Dr. Gutman. "You can send someone a latitude and a longitude on one of the slides, they click the link, and it goes right to the spot the other person was looking at."
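The Google Maps comparison can be sketched in a few lines of code: a deep link is simply a URL that carries the slide identifier and the viewing position. The example below is a hedged illustration, not the CDSA's real URL scheme. The base address, the parameter names (`slide`, `x`, `y`, `zoom`), and the slide ID are all hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical viewer address; the real CDSA URL scheme may differ.
BASE_URL = "https://example.org/viewer"


def make_deep_link(slide_id, x, y, zoom):
    """Encode a slide ID and a view position into a shareable URL."""
    query = urlencode({"slide": slide_id, "x": x, "y": y, "zoom": zoom})
    return f"{BASE_URL}?{query}"


def parse_deep_link(url):
    """Recover the slide ID and view position from a deep link."""
    params = parse_qs(urlparse(url).query)
    return {
        "slide": params["slide"][0],
        "x": int(params["x"][0]),
        "y": int(params["y"][0]),
        "zoom": int(params["zoom"][0]),
    }


# One user creates the link; a colleague's viewer decodes it to the same spot.
link = make_deep_link("EXAMPLE-SLIDE-01", x=48210, y=31975, zoom=20)
view = parse_deep_link(link)
```

Because the position travels inside the link itself, the recipient needs nothing but a browser, which matches the "no install, no download" goal described above.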


Dr. Gutman found out for himself how useful deep linking technology is. He says that, while browsing the pathology slides of lung cancer samples, "I found all of these interesting nuclei in the bottom right corner of a slide – I was excited!" After annotating the slide and sending the CDSA link to a colleague, Dr. Gutman was told that the nuclei were just run-of-the-mill lymphocytes, a type of white blood cell and a normal part of the tissue. "I did go to medical school, but I don't think pathology was one of my strongest subjects. Fifteen years after I learned what a lymphocyte looks like – apparently, I'd forgot[ten]!"


The CDSA website is simple to use and intuitive, thanks to its easy-to-navigate interface. Dr. Gutman was surprised and pleased to hear that the pathologist who trained his mentor navigated the website without instruction. "To get someone who didn't grow up in the digital age, like Lee and me, to use the website is pretty cool," says Dr. Gutman. Additionally, the pathologist was impressed by the image quality. "From someone who’s been using expensive glass microscopes and doing this for 40 years! That was the thing that made me most happy."


Easy access to data is especially important for TCGA, which makes data available to the entire cancer research community as they are generated. Dr. Cooper says, "It's important to expose [the data] to as broad an audience as possible." He continues, "The CDSA puts this information at people's fingertips. There's no barrier to getting in and looking at the data." Dr. Gutman adds, "Making as much information available to people as easily as possible is how team science needs to be done and how consensus is built."


Considering potential applications for the CDSA, Dr. Gutman and Dr. Cooper realize that they have built a technology structure that can be used to view slides from any source, not just human tumors. Dr. Gutman collaborates with a veterinary pathologist at the University of Georgia. Dr. Gutman says, "He wanted to scan a bunch of canine slides, look at them on the web, and use them as a teaching resource for his vet students." Dr. Gutman was able to rework a version of CDSA to suit his colleague's goals. Dr. Cooper adds that the flexibility of the software means it could be of great value in other fields, such as Parkinson's and Alzheimer's research, where, Dr. Gutman notes, pathology may play an important role.


In considering the future of team science and other such partnerships, Dr. Gutman summarizes his goal in a single sentence: “The idea is everything should be able to talk to each other and should make as much data available to as many people with as few barriers as possible.”

 

Gutman, D.A., Cobb, J., Somanna, D., Park, Y., Wang, F., Kurc, T., Saltz, J.H., Brat, D.J. and Cooper, L.A. (2013) Cancer Digital Slide Archive: an informatics resource to support integrated in silico analysis of TCGA pathology data. J Am Med Inform Assoc. doi: 10.1136/amiajnl-2012-001469.



