Open Source Program IDs Synthetic, Naturally Occurring Gene Sequences

Rice University computer scientists and their collaborators have developed SeqScreen, a program to screen short DNA sequences, synthetic or natural, to determine their toxicity. Credit: Treangen Lab/Rice University

It’s a given that certain bacteria and viruses can cause illness and disease, but the real culprits are the sequences of concern that lie within the genomes of these microbes.

It is getting easier to find them.

Years of work by computer scientists at Rice University and their colleagues have produced an improved platform for screening DNA and characterizing pathogenic sequences, whether naturally occurring or synthetic, before they have a chance to affect public health.

Computer scientist Todd Treangen of Rice’s George R. Brown School of Engineering and genome specialist Krista Ternus of Signature Science LLC led the study that produced SeqScreen, a program to accurately characterize short DNA sequences, often called oligonucleotides.

Treangen said SeqScreen aims to improve the detection and tracking of a wide variety of pathogenic sequences.

“SeqScreen is the first open-source software toolkit available for synthetic DNA screening,” Treangen said. “Our program improves on the previous state of the art for companies, individuals and government agencies in their DNA screening practices.”

The study, which began as a high-risk, high-payoff research project, appears in the journal Genome Biology.

SeqScreen draws on work by partners at Signature Science, an Austin, Texas-based company, who developed a database of thousands of gene sequences representing 32 types of virulence functions. “This curated database has taken years of biocuration and assessment to develop, and is at the heart of the training data of SeqScreen’s machine learning algorithm,” Treangen said.

The company teamed up with Treangen last year to find SARS-CoV-2 mutations that may have made the Omicron variant more resistant to antibodies, including those from vaccines. “SeqScreen came first and some of its ideas were transferred to the COVID project,” he said. “But SeqScreen is much broader in scope.”

“We focus on identifying functions of sequences of concern, which we call FunSoCs, while previous screening approaches have focused more on looking at ‘are you this bacterium?’ or ‘are you this virus?’” Treangen said. “SeqScreen doesn’t focus on the names of bacteria or viruses in your sample. Rather, we want to know if there are sequences in that sample that could be harmful, such as toxins that can destroy human cells.”

Focusing on functions of concern is important, he said, because bacteria readily exchange DNA via horizontal gene transfer.

“We’ve cited examples in the publication of bacteria whose genomes are essentially identical, except one has a function of concern, like a toxin, that the other doesn’t,” Treangen said. “SeqScreen really addresses the presence or absence of functions that represent virulence factors.”

He said SeqScreen will also aid in the detection of new or emerging pathogens from the environment.

More information:
Advait Balaji et al., SeqScreen: Accurate and sensitive functional screening of pathogenic sequences via ensemble learning, Genome Biology (2022). DOI: 10.1186/s13059-022-02695-x

Supplied by
Rice University

Citation: Open source program IDs synthetic, naturally occurring gene sequences (2022, June 21) retrieved June 21, 2022 from html

This document is copyrighted. Other than fair dealing for personal study or research, nothing may be reproduced without written permission. The content is provided for informational purposes only.
