Reducing the Space of Degenerate Patterns in Protein Remote Homology Detection

VERZOTTO D
2013-01-01

Abstract

In biology, the notion of a degenerate pattern plays a central role in describing various phenomena. For example, protein active site patterns, such as those in the PROSITE database (e.g. [FY]DPC[LIM][ASG]C[ASG]), are generally represented as degenerate patterns with character classes. Several approaches for discovering degenerate patterns have been developed over the years. Although these methods have been extensively and successfully tested on genomes and proteins, their output often far exceeds the size of the original input, which makes it hard to manage and to interpret through refined analysis that requires manual inspection. In this article we discuss a characterization of degenerate patterns with character classes and introduce the concept of pattern priority, which allows different patterns without gaps to be compared and ranked. We also introduce the class of underlying patterns, which makes it possible to filter any set of degenerate patterns into a new set whose size is linear in the size of the input sequence. We present preliminary results on the detection of subtle signals in protein sequences with remote homologies. The results show that our approach drastically reduces the number of patterns output by a tool for protein sequence analysis, while retaining the functional ones.
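To make the notion of a degenerate pattern with character classes concrete, the following minimal sketch locates occurrences of the PROSITE-style pattern quoted above in a sequence. It is only an illustration, not the method of the paper: it relies on the fact that this bracket notation coincides with regular-expression character classes, it assumes the first class is [FY] with no space, and the function names and the toy sequence are hypothetical. Pattern priority and underlying patterns are not defined in the abstract, so they are not modeled here.

```python
import re

# Pattern quoted in the abstract; the first class is assumed to be [FY].
PATTERN = "[FY]DPC[LIM][ASG]C[ASG]"


def find_occurrences(pattern, sequence):
    """Return the 0-based start positions of all matches, overlaps included."""
    # A lookahead keeps each match zero-width, so overlapping
    # occurrences of the pattern are reported as well.
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() for m in regex.finditer(sequence)]


def pattern_length(pattern):
    """Count pattern positions; each character class counts as one position."""
    return len(re.findall(r"\[[A-Z]+\]|[A-Z]", pattern))


# Hypothetical toy sequence, not real protein data.
toy_sequence = "MKFDPCLACSTTYDPCMGCGQQ"

print(find_occurrences(PATTERN, toy_sequence))  # [2, 12]
print(pattern_length(PATTERN))                  # 8
```

Both reported positions match different concrete strings (FDPCLACS and YDPCMGCG), which is what makes the pattern degenerate: one pattern stands for every string obtained by choosing one symbol from each class.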
Year: 2013
ISBN: 9781479921386
Keywords: Combinatorics; Protein sequences; Degenerate motif analysis
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14252/1329