Title :
From Sequences to Papers: An Information Retrieval Exercise
Author :
Gonçalves, Célia Talma ; Camacho, Rui ; Oliveira, Eugénio
Author_Institution :
Inst. Super. de Contabilidade e Administração do Porto, Univ. do Porto, Porto, Portugal
Abstract :
Whenever new DNA or protein sequences are decoded, it is almost compulsory to look at similar sequences and at the papers describing them, both to collect information about the function and activity of the new sequences and to find out what is already known about similar sequences that might help explain the function or activity of the newly discovered ones. Current web sites and databases of sequences usually link a set of paper references to each sequence. Those links are very useful because the papers describe useful information concerning the sequences. They are, therefore, a good starting point for finding relevant information related to a set of sequences. One way to implement such an approach is to run a BLAST search with the newly decoded sequences and collect similar sequences. Then one looks at the papers linked to the similar sequences. Most often the number of retrieved papers is small and one has to search large databases for relevant papers. In this paper we propose a process that generates a classifier from the initial set of relevant papers directly linked to the retrieved similar sequences, and then uses that classifier to automatically enlarge the set of relevant papers by searching MEDLINE. We have empirically evaluated our proposal and report very promising results.
Keywords :
DNA; Web sites; bioinformatics; information retrieval; pattern classification; proteins; DNA sequences; MEDLINE; automatically constructed classifier; paper retrieval; protein sequences; sequence retrieval; Abstracts; Databases; Dictionaries; Machine learning; Machine learning algorithms; classification; information retrieval system
Conference_Title :
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-0005-6
DOI :
10.1109/ICDMW.2011.184
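
Illustrative sketch :
The process outlined in the abstract (seed papers linked to similar sequences, a classifier built from them, then a wider search) might be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the abstract does not say which learning algorithm was used, so a small multinomial Naive Bayes model is swapped in here; the BLAST and MEDLINE steps are replaced by hard-coded toy strings; and every function name and example text below is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(relevant_docs, background_docs):
    """Fit a two-class multinomial Naive Bayes model (assumed classifier)."""
    counts = {"relevant": Counter(), "background": Counter()}
    for doc in relevant_docs:
        counts["relevant"].update(tokenize(doc))
    for doc in background_docs:
        counts["background"].update(tokenize(doc))
    vocab = set(counts["relevant"]) | set(counts["background"])
    n_docs = len(relevant_docs) + len(background_docs)
    priors = {
        "relevant": len(relevant_docs) / n_docs,
        "background": len(background_docs) / n_docs,
    }
    return {"counts": counts, "vocab": vocab, "priors": priors}


def relevance_score(model, text):
    """Return the log-odds of 'relevant' vs 'background' (add-one smoothing)."""
    vocab_size = len(model["vocab"])
    scores = {}
    for label in ("relevant", "background"):
        total = sum(model["counts"][label].values())
        score = math.log(model["priors"][label])
        for tok in tokenize(text):
            if tok in model["vocab"]:  # tokens unseen in training are ignored
                count = model["counts"][label][tok]
                score += math.log((count + 1) / (total + vocab_size))
        scores[label] = score
    return scores["relevant"] - scores["background"]


def enlarge_relevant_set(model, candidates, threshold=0.0):
    """Keep the candidate abstracts whose log-odds exceed the threshold."""
    return [doc for doc in candidates if relevance_score(model, doc) > threshold]


# Hypothetical stand-ins for abstracts linked to BLAST hits (seed set),
# for unrelated abstracts (negatives), and for a MEDLINE candidate pool.
seed_papers = [
    "kinase domain phosphorylation of the protein substrate",
    "protein kinase activity and ATP binding",
]
background = [
    "clinical trial of blood pressure medication in adults",
    "survey of hospital nursing staff workload",
]
candidates = [
    "ATP binding in a novel protein kinase",
    "nursing staff workload in a clinical trial",
]

model = train_naive_bayes(seed_papers, background)
selected = enlarge_relevant_set(model, candidates)
print(selected)
```

In a real setting the seed set would come from the references attached to the similar sequences and the candidate pool from a MEDLINE query; how negative examples are chosen is a further design decision that the abstract leaves open.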