DocumentCode :
1407427
Title :
Semantics and Ambiguity of Stochastic RNA Family Models
Author :
Giegerich, Robert ; Siederdissen, Christian Höner zu
Author_Institution :
Center of Biotechnology, Bielefeld University, Bielefeld, Germany
Volume :
8
Issue :
2
fYear :
2011
Firstpage :
499
Lastpage :
516
Abstract :
Stochastic models, such as hidden Markov models or stochastic context-free grammars (SCFGs), can fail to return the correct, maximum likelihood solution in the case of semantic ambiguity. This problem arises when the algorithm implementing the model inspects the same solution in different guises. It is a difficult problem in the sense that proving semantic nonambiguity has been shown to be algorithmically undecidable, while compensating for it (by coalescing scores of equivalent solutions) has been shown to be NP-hard. For stochastic context-free grammars modeling RNA secondary structure, it has been shown that the distortion of results can be quite severe. Much less is known about the case when stochastic context-free grammars model the matching of a query sequence to an implicit consensus structure for an RNA family. We find that three different, meaningful semantics can be associated with the matching of a query against the model: a structural, an alignment, and a trace semantics. Rfam models correctly implement the alignment semantics, and are ambiguous with respect to the other two semantics, which are more abstract. We show how provably correct models can be generated for the trace semantics. For approaches where such a proof is not possible, we present an automated pipeline to check post factum for ambiguity of the generated models. We propose that both the structure and the trace semantics are worthwhile concepts for further study, possibly better suited to capture remotely related family members.
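The effect of semantic ambiguity described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or pipeline: the derivation names, dot-bracket structures, and probabilities below are invented for illustration, and the derivations are listed explicitly, which is only feasible for a toy case (the abstract notes that coalescing scores is NP-hard in general). The sketch shows that when two derivations denote the same structure, the single most probable derivation can differ from the most probable structure.

```python
from collections import defaultdict

# Hypothetical toy data (assumption, not from the paper): each derivation of
# one input has a name, the structure it denotes, and a probability.
# d2 and d3 are the same structure "in different guises".
derivations = [
    ("d1", "((..))", 0.30),
    ("d2", "(....)", 0.25),
    ("d3", "(....)", 0.25),
    ("d4", "......", 0.20),
]


def best_derivation(derivs):
    """Return the single most probable derivation (Viterbi-style answer)."""
    return max(derivs, key=lambda d: d[2])


def best_structure(derivs):
    """Sum the probabilities of derivations that denote the same structure,
    then return the (structure, total probability) pair with the largest sum."""
    totals = defaultdict(float)
    for _, structure, prob in derivs:
        totals[structure] += prob
    return max(totals.items(), key=lambda kv: kv[1])


viterbi = best_derivation(derivations)
coalesced = best_structure(derivations)

print("most probable derivation:", viterbi)
print("most probable structure: ", coalesced)
```

With these made-up numbers, the best single derivation denotes one structure, while a different structure carries more total probability once its two derivations are summed. Under a semantically nonambiguous model the two answers would coincide.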
Keywords :
bioinformatics; context-free grammars; hidden Markov models; maximum likelihood estimation; molecular biophysics; organic compounds; stochastic systems; NP-hard problem; Rfam model; SCFG; alignment semantics; hidden Markov models; maximum likelihood solution; query sequence; semantics ambiguity; stochastic RNA family models; stochastic context-free grammars; structural semantics; trace semantics; Biological system modeling; Computational biology; Context modeling; Hidden Markov models; Pipelines; Proteins; RNA; Sequences; Stochastic processes; Terminology; RNA family models; RNA secondary structure; covariance models; semantic ambiguity; Algorithms; Nucleic Acid Conformation; RNA; Semantics; Sequence Alignment; Sequence Analysis, RNA;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2010.12
Filename :
5408365