To enable gene-level searches of data from more than 10,000 cancer patient transcriptome sequences and proteomics data from 2,000 patients, researchers have developed an easy-to-use web platform for cancer data analysis called UALCAN. A new study that analyzed protein levels in 2,002 primary tumors from 14 tissue-based cancer types identified 11 distinct molecular subtypes, yielding systematic knowledge that vastly expands a searchable online database that has become a go-to platform for cancer data analysis by users worldwide.
The University of Alabama at Birmingham Cancer data analytics portal, or UALCAN, was developed in 2017 and released for public use as an easy-to-use portal for pan-cancer omics data analysis, including transcriptomics, epigenetics, and proteomics. UALCAN has had nearly 920,000 site visits from researchers in more than 100 countries and has been cited more than 2,750 times.
“UALCAN is an effort to disseminate comprehensive cancer data to researchers and clinicians in an easy-to-use format to make discoveries and find needles in a haystack,” said Sooryanarayana Varambally, Ph.D., professor in the UAB Department of Pathology Division of Molecular and Cellular Pathology and director of UAB’s Translational Oncologic Pathology Research program. “Cancer detection, diagnosis, treatment, cure and research require a global team effort, and understanding the sheer volume of data requires a way to analyze and interpret this data.”
Cancer is a complex disease, and its initiation, progression and metastasis, the spread to distant organs, involve dynamic molecular changes in every type of cancer. Individual cancer patients show variations aside from some of the common genomic events.
In the new study, Varambally worked with longtime collaborator Chad Creighton, Ph.D., of Baylor College of Medicine, Houston, Texas. Creighton led the proteomic study, published in Nature Communications, “Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways.” This expands on two earlier proteomics studies published in 2019 and 2021.
Previously, the team conducted analyses of RNA transcripts, providing the data to researchers through UALCAN, to determine which pathways the myriad cancers use to promote growth, spread and aggressiveness. With this recent study, the team conducted and processed large-scale proteomics analyses. The data and results provide new ideas for further research and possible therapeutic interventions.
A proteome is the complement of proteins expressed in a cell or tissue, and these can be measured quantitatively through recent technological advances in mass spectrometry. In cells, DNA makes mRNA and mRNA makes proteins, processes known as the central dogma of molecular biology. Proteins are important functional parts of cells, crucial in cell metabolism, structure, growth, signaling and movement.
The cancer types represented in UALCAN’s proteomic dataset include breast, colorectal, gastric, glioblastoma, head and neck, liver, lung adenocarcinoma, lung squamous, ovarian, pancreatic, pediatric brain, prostate, kidney and uterine cancers. The number of tumors in each cancer type in the study ranged from 76 to 230, with an average of 143. Intriguingly, the pan-cancer, proteome-based subtypes found in the current study cut across tumor lineages.
The compendium proteomic dataset came from 17 individual studies. Corresponding multi-omics data were available for most of these tumors, including mRNA levels, somatic DNA mutations and insertions/deletions, and changes in somatic DNA copy number.
Overall, the researchers found the protein expression of genes in tumors to be broadly correlated with corresponding mRNA levels or changes in copy number. However, there were some notable exceptions.
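As a rough illustration of what "broadly correlated" means here, the sketch below computes a per-gene correlation between mRNA and protein levels across tumors. It is not the study's method: the article does not say which statistic the researchers used, so Pearson correlation is an assumption, and the gene names and values are hypothetical placeholders.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only; one entry per tumor.
mrna = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0, 5.0],
}
protein = {
    "GENE_A": [1.1, 1.9, 3.2, 3.8, 5.1],  # tracks its mRNA closely
    "GENE_B": [3.0, 1.0, 4.0, 1.5, 2.5],  # an "exception": little relation to mRNA
}

# One correlation per gene, computed across the same set of tumors.
per_gene_r = {gene: pearson(mrna[gene], protein[gene]) for gene in mrna}
for gene, r in per_gene_r.items():
    print(f"{gene}: r = {r:.2f}")
```

In this toy example, the first gene's protein level follows its mRNA level while the second does not, which is the kind of exception the researchers describe.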
They identified 11 different proteome-based pan-cancer subtypes, called s1 through s11, that may provide insight into the deregulated pathways and processes in tumors that make them cancerous. Each subtype included multiple tissue-based cancer types, although subtype s11 was specific for brain tumors, including glioblastomas and pediatric brain tumors.
Each subtype expressed specific gene categories, some previously seen in an earlier, less extensive proteomic study. Three subtypes showed new gene categories: subtype s7 with genes for “axon guidance” and “frizzled binding”, subtype s10 with genes for “DNA repair” and “chromatin organization”, and subtype s11 with “synapse”, “dendrite” and “axon” genes.
At the DNA level, the study detailed differences among the proteome-based subtypes in overall gene copy number changes, and it linked somatic mutations within subtypes to higher pathway activity, as inferred from proteome or transcriptome data.
“Our research results provide a framework for understanding the molecular landscape of cancers at the proteome level in order to integrate the data and compare it with other molecular correlates of cancers,” Varambally said. “The associated gene-level datasets and associations provide a resource to the research community, including helping to identify gene candidates for functional studies and further developing candidates as diagnostic markers or therapeutic targets for a specific subset of cancers.
“In addition, this study reinforces the notion that cancers need to be extensively investigated at the protein level, although expression profiling on tumors has historically been mostly limited to the RNA transcript level. Many of the analyses in this ever-evolving cancer data analysis platform are based on requests from users or experts, and the team is deeply indebted for the support and encouragement of the researchers using this platform to make discoveries that are making a difference in cancer research.”
Some of the large datasets for the UAB site are generated by consortia such as The Cancer Genome Atlas, or TCGA, and the Clinical Proteomic Tumor Analysis Consortium, or CPTAC, from the National Cancer Institute. As the researchers also strive to address cancer health inequalities, UALCAN provides an option to analyze the data based on the patient’s race or ethnicity, where available.
Precise targeting of cancer requires the identification of individual or subclass-specific genomic and molecular changes. To help cancer researchers conduct various data analyses to better understand these large datasets, Darshan Shimoga Chandrashekar, Ph.D., led the development of the UALCAN portal under Varambally’s mentorship. Updates to this continuously evolving portal were recently published in the journal Neoplasia.
The UALCAN initiative and its ongoing development include contributions from a team of experts including bioinformaticians, computer scientists, statisticians, cancer biologists, pathologists and oncologists. “It’s a team science approach to empowering the global cancer research team to tackle cancer,” Varambally said.
Support came from National Institutes of Health grants CA125123 and U54-CA118948 and the US Department of Defense grant W81XWH-19-1-0588.
Co-first authors of this study are Yiqun Zhang and Fengju Chen, Baylor College of Medicine, and Chandrashekar, UAB Department of Pathology Division of Molecular and Cellular Pathology.
Pathology is a department in the Marnix E. Heersink School of Medicine at UAB. Varambally is a senior scientist in the O’Neal Comprehensive Cancer Center and the Institute of Informatics at UAB and is co-director of the Cancer Biology Theme in Graduate Biomedical Sciences at UAB. He holds an adjunct position at the Michigan Center for Translational Pathology, University of Michigan, Ann Arbor.