Regional Information Center for Science and Technology - New data structures for analyzing frequent factors in strings

DocumentCode :

2915215

Title :

New data structures for analyzing frequent factors in strings

Author :

Baena-García, Manuel ; Morales-Bueno, Rafael

Author_Institution :

Dipt. Lenguajes y Cienc. de la Comput., Univ. de Malaga, Malaga, Spain

fYear :

2011

fDate :

22-24 Nov. 2011

Firstpage :

900

Lastpage :

905

Abstract :

Discovering frequent factors in long strings is an important problem in many applications, such as biosequence mining. Classical approaches process a vast database of short strings; in this paper, by contrast, we analyze a small database of long strings. The main difference lies in the much larger number of patterns to analyze. To tackle the problem, we have developed a new algorithm for discovering frequent factors in long strings. This algorithm uses a new data structure to arrange the nodes of a trie. We define the positioning matrix as a new positioning strategy. Using positioning matrices, we can apply advanced pruning heuristics to a trie at minimal computational cost. Positioning matrices also let us process strings containing Short Tandem Repeats and compute different interestingness measures efficiently. The algorithm has been successfully applied in natural language and biological sequence contexts.
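The abstract does not define the positioning matrix, so the following sketch does not reproduce the authors' data structure. It is only a minimal illustration, under stated assumptions, of the general task: a breadth-first, trie-style expansion of factors (substrings) of one long string, where each trie node keeps the start positions of its occurrences and is pruned as soon as its occurrence count falls below a minimum support. Pruning is safe because every occurrence of a factor extended by one character is also an occurrence of the shorter factor at the same start position, so extending a factor can never raise its count. Overlapping occurrences are counted, which matters for repetitive regions such as tandem repeats; how the paper itself treats those is not stated here. The function name `frequent_factors` and the parameters `min_support` and `max_len` are hypothetical.

```python
from collections import defaultdict


def frequent_factors(text, min_support, max_len=None):
    """Hypothetical sketch: return {factor: count} for every factor of `text`
    that occurs at least `min_support` times (overlapping occurrences count).
    Not the positioning-matrix structure from the paper."""
    results = {}
    # Each frontier entry is one trie node: (factor, start positions of its occurrences).
    # The root is the empty factor, which "starts" at every position.
    frontier = [("", list(range(len(text))))]
    while frontier:
        next_frontier = []
        for factor, starts in frontier:
            k = len(factor)
            # Group occurrences by the character that follows them: one trie level.
            children = defaultdict(list)
            for s in starts:
                if s + k < len(text):
                    children[text[s + k]].append(s)
            for ch, child_starts in children.items():
                if len(child_starts) < min_support:
                    continue  # prune: no extension of this factor can be frequent
                child = factor + ch
                results[child] = len(child_starts)
                if max_len is None or len(child) < max_len:
                    next_frontier.append((child, child_starts))
        frontier = next_frontier
    return results


if __name__ == "__main__":
    print(frequent_factors("abracadabra", 2))
```

For example, on "abracadabra" with a minimum support of 2, the factors that survive are a, b, r, ab, br, ra, abr, bra and abra; "c", "d" and "abrac" are pruned as soon as their counts drop to 1.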

Keywords :

data mining; data structures; matrix algebra; string matching; biological sequence context; biosequence mining; database; frequent factor analysis; frequent factor discovery; natural language context; positioning matrix; prune heuristics; short tandem repeats; strings; Arrays; Bioinformatics; Complexity theory; Databases; Genomics; Organizations; frequent factors; string mining; trie data structures

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on

Conference_Location :

Cordoba

ISSN :

2164-7143

Print_ISBN :

978-1-4577-1676-8

Type :

conf

DOI :

10.1109/ISDA.2011.6121772

Filename :

6121772

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2915215