How do cancer cells differ from healthy cells? A new machine learning algorithm called “ikarus” has the answer, a team led by MDC bioinformatician Altuna Akalin reports in the journal Genome Biology. The AI program has found a gene signature that is characteristic of tumors.
When it comes to identifying patterns in mountains of data, humans are no match for artificial intelligence (AI). In particular, a branch of AI called machine learning is often used to find regularities in datasets – be it stock analysis, image and speech recognition, or the classification of cells. To reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus.” The program found a pattern in tumor cells common to various cancers, consisting of a distinctive combination of genes. According to the team’s article in the journal Genome Biology, the algorithm also detected types of genes in the pattern that had never been clearly linked to cancer before.
Machine learning essentially means that an algorithm uses training data to learn how to answer certain questions on its own. It does this by looking for patterns in the data that help it solve problems. After the training phase, the system can generalize from what it has learned to evaluate unknown data.
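The train-then-generalize idea described above can be sketched in a few lines of Python. This is only a toy illustration and not the ikarus algorithm: it uses a simple nearest-centroid classifier, and the function names, feature values and labels are all invented for the example.

```python
from statistics import mean


def train(samples, labels):
    """Learn one centroid (per-feature mean) for each label from the training data."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Made-up training data: two numeric features per sample, labelled by hand.
train_x = [[1.0, 0.9], [1.2, 1.1], [4.0, 4.2], [3.8, 4.1]]
train_y = ["healthy", "healthy", "tumor", "tumor"]

model = train(train_x, train_y)

# A sample the model has never seen is evaluated using what was learned.
prediction = predict(model, [3.9, 3.7])
print(prediction)
```

The training phase only ever sees the labelled samples; the final call shows the learned pattern being applied to data that was not part of the training set.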
“It was a big challenge to get suitable training data in which experts had already made a clear distinction between ‘healthy’ and ‘cancerous’ cells.”
Jan Dohmen, first author of the article
A surprisingly high success rate
In addition, single-cell sequencing data sets are often noisy. That means the information they contain about the molecular characteristics of individual cells isn’t very accurate, perhaps because a different number of genes is detected in each cell, or because the samples aren’t always processed in the same way. Dohmen and his colleague Dr. Vedran Franke, co-head of the study, report that they searched numerous publications and contacted many research groups to obtain adequate data sets. The team eventually used data from lung and colorectal cancer cells to train the algorithm before applying it to data sets from other types of tumors.
In the training phase, ikarus had to find a list of characteristic genes that it then used to categorize the cells. “We’ve tried and refined different approaches,” Dohmen says. It was time-consuming work, all three scientists say. “The key was that ikarus would end up using two lists: one for cancer genes and one for genes from other cells,” explains Franke. After the learning phase, the algorithm was also able to reliably distinguish between healthy and tumor cells in other cancer types, such as tissue samples from liver cancer or neuroblastoma patients. The success rate was usually extremely high, which surprised even the research group. “We didn’t expect there to be a common signature that defined the tumor cells of different cancers so precisely,” Akalin says. “But we still can’t say whether the method works for all types of cancer,” adds Dohmen. To make ikarus a reliable tool for cancer diagnosis, the researchers now want to test it on other types of tumors.
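The two-list idea Franke describes can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the team’s implementation: the gene names are placeholders rather than the published signature, the expression values are made up, and scoring by mean expression followed by a direct comparison is an assumed stand-in for whatever scoring and classification steps ikarus actually uses.

```python
from statistics import mean

# Hypothetical gene lists; placeholders, not the signature reported in the paper.
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2", "GENE_N3"]


def signature_score(expression, genes):
    """Mean expression over the listed genes; genes not detected in the cell count as 0."""
    return mean(expression.get(gene, 0.0) for gene in genes)


def classify_cell(expression):
    """Label a cell 'tumor' if its tumor-list score exceeds its normal-list score."""
    tumor = signature_score(expression, TUMOR_GENES)
    normal = signature_score(expression, NORMAL_GENES)
    return "tumor" if tumor > normal else "normal"


# Made-up expression values for two cells (gene name -> expression level).
cell_a = {"GENE_T1": 5.0, "GENE_T2": 4.0, "GENE_N1": 1.0}
cell_b = {"GENE_T1": 0.5, "GENE_N1": 3.0, "GENE_N2": 4.0, "GENE_N3": 2.0}

labels = [classify_cell(cell_a), classify_cell(cell_b)]
print(labels)
```

Comparing two scores rather than thresholding a single one means a cell is judged relative to its own overall expression level, which is one plausible reason a pair of lists could be more robust to noisy single-cell data than a single list.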
AI as a fully automated diagnostic tool
The project aims to go far beyond the classification of “healthy” versus “cancerous” cells. Ikarus has already shown in initial tests that the method can also distinguish other types (and certain subtypes) of cells from tumor cells. “We want to expand the approach,” Akalin says, “by developing it further so that it can distinguish between all possible cell types in a biopsy.”
In hospitals, pathologists tend to examine tissue samples from tumors only under the microscope to identify the different cell types. It is labor-intensive, time-consuming work. With ikarus, this step could one day become a fully automated process. In addition, Akalin notes, the data can be used to draw conclusions about the tumor’s immediate environment. That could help doctors choose the best therapy, because the composition of the cancer tissue and its microenvironment often indicates whether a certain treatment or medication will be successful. AI can also be useful in developing new drugs. “Ikarus allows us to identify genes that are potential drivers of cancer,” Akalin says. New therapeutic agents could then be used to target these molecular structures.
What is striking about the publication is that it came about entirely during the COVID pandemic. At the Berlin Institute for Medical Systems Biology (BIMSB), part of the MDC, not everyone involved sat at their usual desks. Instead, they were in home offices, communicating with each other only digitally. In Franke’s view: “The project therefore shows that a digital structure can be created to facilitate scientific work under these circumstances.”
Dohmen, J., et al. (2022) Identifying tumor cells at the single-cell level using machine learning. Genome Biology. doi.org/10.1186/s13059-022-02683-1