DocumentCode :
397291
Title :
Text pattern visualization for analysis of biology full text and captions
Author :
Grimes, Andrea Elaina ; Futrelle, Robert P.
Author_Institution :
Biol. Knowledge Lab., Northeastern Univ., Boston, MA, USA
fYear :
2003
fDate :
11-14 Aug. 2003
Firstpage :
648
Lastpage :
651
Abstract :
Large textbanks comprised of thousands of full-text biology papers are rapidly becoming available. We describe an approach to characterize all major language patterns in biology text in terms of frameworks. Frameworks are "containers" made up of common phrases surrounding specific informational items such as gene and protein names. A framework viewer has been developed that shows similar text frameworks aligned on the screen much as biosequence visualization tools do. Using the viewer, it is evident that frameworks have the power to find the types of structures needed to develop useful information retrieval systems. As a simple example, one framework was able to concisely select 45,000 nouns from a corpus of 5 million words without error. This work points the way to highly automated systems that will be able to extract and index information in biology textbanks. Work in progress includes extensions to characterize recursive structures in text, subsystems to retrieve figures in papers, and the discovery of semantic relations to aid concept-based retrieval.
Keywords :
biology computing; indexing; information retrieval systems; natural languages; proteins; text analysis; word processing; 45,000 nouns; 5 million words; biology captions; biology text; biology textbanks; biosequence visualization tools; concept-based retrieval; containers; corpus; extract information; frameworks; gene names; highly automated systems; index information; informational items; language patterns; protein names; recursive structures; semantic relations; text pattern visualization; Data mining; Educational institutions; Information retrieval; Laboratories; Pattern analysis; Systems biology; Visualization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
Type :
conf
DOI :
10.1109/CSB.2003.1227434
Filename :
1227434