In a recent study published in Nature Biotechnology, researchers investigated the causes of cancer by mapping somatic mutation rates in the human genome.
To understand cancer, it is essential to identify the mutations that cause it. Although extensive research is under way to find them, most studies focus on protein-coding sequences and specific noncoding elements because somatic mutation rates are difficult to model across different tumor genomes.
About the study
In the current study, researchers described a genome-wide mutation rate model called Dig, which allowed rapid testing for the presence of selected driver mutations in a genome.
The team designed Dig to model genome-wide somatic mutation rates for a given cancer type, so that excess mutations anywhere in the genome could be evaluated quickly. The model estimates how neutral mutations are expected to be distributed across any set of genomic positions in a cohort of tumors of that cancer type.
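The idea of testing for excess mutations can be illustrated with a small sketch. The example below is not Dig's actual test: it swaps in a plain Poisson model as a simplified stand-in, and the function name `poisson_upper_tail` and the counts used are hypothetical. It asks how likely it would be to observe at least a given number of mutations in an element if only the expected neutral count were at work.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    Simplified stand-in for a burden test; not the model used by Dig.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical numbers: 2 neutral mutations expected, 10 observed.
expected_neutral = 2.0
observed_count = 10
p_value = poisson_upper_tail(observed_count, expected_neutral)
print(f"p-value for excess mutations: {p_value:.2e}")
```

A small p-value here would flag the element as carrying more mutations than the neutral expectation predicts, which is the kind of signal a driver search looks for.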
The model used a probabilistic deep learning approach that captured two central determinants of variability in somatic mutation rates: (1) kilobase-scale variation, which is driven by epigenomic properties, including chromatin accessibility and replication timing, that influence the efficacy of deoxyribonucleic acid (DNA) repair, and (2) base-pair-scale variation, which is driven by the sequence context biases of the processes that generate somatic mutations, including apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC)-driven cytidine deamination as well as ultraviolet (UV) light exposure.
The team then constructed maps of mutation rates and inferred nucleotide mutation biases for a total of 37 cancer types from somatic mutations recorded in the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset. The maps drew on 723 chromatin marks in 111 tissues, as recorded in the Roadmap Epigenomics project. The accuracy of the estimated somatic mutation rates was then benchmarked using the proportion of variance explained.
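As a rough illustration of a variance-explained benchmark, the sketch below computes the coefficient of determination between observed and predicted mutation counts per region. This is one common definition and an assumption here; the study's exact metric may differ, and the function name and sample counts are hypothetical.

```python
def variance_explained(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed counts around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have no variance")
    # Variation left unexplained by the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical mutation counts in five genomic regions.
observed_counts = [12, 30, 7, 21, 15]
predicted_counts = [10, 28, 9, 22, 16]
score = variance_explained(observed_counts, predicted_counts)
print(f"proportion of variance explained: {score:.3f}")
```

A value near 1 means the predicted rates track the observed counts closely, while a value near 0 means they do no better than predicting the average everywhere.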
The team also applied the Dig model to quantify how far cryptic splice SNVs exceeded the number expected from the neutral mutation rate and assessed their role as cancer-causing mutations. To assess the impact of insertions and deletions (indels) on gene expression through the disruption of transcription factor binding motifs, the team searched promoters in the PCAWG dataset for an excess of indels.
The study results showed that the Dig model explained a median of 77.3% of the variance in single nucleotide variant (SNV) rates in 10 kb regions and 94.6% in 1 Mb regions across a total of 16 cancer types. Compared with existing approaches, Dig explained the most variance in SNVs in 10 kb regions in 14 of the 16 cancer cohorts, in non-synonymous SNVs in all 16 cohorts, and in non-coding ribonucleic acid (RNA) SNVs in 15.
In addition, the Dig model matched or even surpassed the performance of other methods tuned to particular classes of elements in whole-genome samples. Dig had the highest F1 score in 24 of the 32 PCAWG cohorts tested and was the most powerful method in 14 of the cohorts for burden-based driver gene detection. The team also noted that Dig identified potential driver elements one to five times faster than traditional methods for each element and cohort tested.
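For readers unfamiliar with the F1 score, the sketch below shows how it balances precision and recall when a method's predicted driver genes are compared with a reference list. The function name and the gene sets are hypothetical examples, not data from the study.

```python
def f1_score(predicted: set, known: set) -> float:
    """Return the harmonic mean of precision and recall for two sets."""
    true_positives = len(predicted & known)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of calls that are correct
    recall = true_positives / len(known)         # share of known drivers recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three predicted genes, four reference drivers.
predicted_drivers = {"TP53", "KRAS", "GENE_X"}
reference_drivers = {"TP53", "KRAS", "PTEN", "APC"}
score = f1_score(predicted_drivers, reference_drivers)
print(f"F1 score: {score:.3f}")
```

Because the F1 score drops when either precision or recall is low, a method cannot score well simply by calling very few or very many genes.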
Reducing the size of the elements analyzed to tens to hundreds of positions resulted in an almost 20% increase in the power to identify driver mutations present in fewer than 1% of the samples tested. The team also found that cryptic splice SNVs in the tumor suppressor genes (TSGs) recorded in the Cancer Gene Census (CGC) were more common than expected under neutral conditions. The cryptic splice SNVs were enriched in introns and tended to occur at sites with a high predicted impact on splicing. Overall, the intronic splice SNVs accounted for approximately 4.5% of the excess SNVs found in the TSGs. The team also noted that the TP53 promoter was the only element that showed a genome-wide significant burden of indels.
Overall, the study results emphasized the utility of Dig as a tool for in vivo and in vitro studies because it can prioritize precise groups of mutations that are potential drivers in the coding and noncoding genome. The researchers believe that the deep learning approach used in the current study could enhance the experimental, computational, and clinical utility of cancer genome sequencing data.