• DocumentCode
    397291
  • Title
    Text pattern visualization for analysis of biology full text and captions
  • Author
    Grimes, Andrea Elaina; Futrelle, Robert P.
  • Author_Institution
    Biol. Knowledge Lab., Northeastern Univ., Boston, MA, USA
  • fYear
    2003
  • fDate
    11-14 Aug. 2003
  • Firstpage
    648
  • Lastpage
    651
  • Abstract
    Large textbanks comprised of thousands of full-text biology papers are rapidly becoming available. We describe an approach to characterize all major language patterns in biology text in terms of frameworks. Frameworks are "containers" made up of common phrases surrounding specific informational items such as gene and protein names. A framework viewer has been developed that shows similar text frameworks aligned on the screen much as biosequence visualization tools do. Using the viewer, it is evident that frameworks have the power to find the types of structures needed to develop useful information retrieval systems. As a simple example, one framework was able to concisely select 45,000 nouns from a corpus of 5 million words without error. This work points the way to highly automated systems that will be able to extract and index information in biology textbanks. Work in progress includes extensions to characterize recursive structures in text, subsystems to retrieve figures in papers, and the discovery of semantic relations to aid concept-based retrieval.
  • Keywords
    biology computing; indexing; information retrieval systems; natural languages; proteins; text analysis; word processing; 45,000 nouns; 5 million words; biology captions; biology text; biology textbanks; biosequence visualization tools; concept-based retrieval; containers; corpus; extract information; frameworks; gene names; highly automated systems; index information; informational items; language patterns; protein names; recursive structures; semantic relations; text pattern visualization; Biology computing; Containers; Data mining; Educational institutions; Information retrieval; Laboratories; Pattern analysis; Proteins; Systems biology; Visualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
  • Print_ISBN
    0-7695-2000-6
  • Type
    conf
  • DOI
    10.1109/CSB.2003.1227434
  • Filename
    1227434
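
The abstract describes a framework as a "container" of common phrases surrounding an informational item, with matching frameworks aligned on screen. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a framework can be approximated by a fixed left phrase, a one-word slot, and a fixed right phrase. The function names `find_slot_fillers` and `align`, the example framework "the ___ gene", and the toy sentences are all hypothetical and chosen only for illustration.

```python
import re


def find_slot_fillers(sentences, left, right):
    """Collect the single words that fill the slot between `left` and `right`.

    Assumption: a framework is modelled as 'left <one word> right'.
    """
    pattern = re.compile(
        r"\b" + re.escape(left) + r"\s+([\w-]+)\s+" + re.escape(right) + r"\b"
    )
    fillers = []
    for sentence in sentences:
        # findall returns the captured slot word for every match in the sentence
        fillers.extend(pattern.findall(sentence))
    return fillers


def align(fillers, left, right):
    """Pad slot fillers to a common width so the framework words line up."""
    width = max((len(f) for f in fillers), default=0)
    return [f"{left} {f.ljust(width)} {right}" for f in fillers]


# Toy sentences invented for this sketch; they are not from the paper's corpus.
toy_sentences = [
    "Expression of the lacZ gene was reduced.",
    "We cloned the recA gene from E. coli.",
    "Mutations in the BRCA1 gene are common.",
    "The protein binds DNA.",
]

fillers = find_slot_fillers(toy_sentences, "the", "gene")
for line in align(fillers, "the", "gene"):
    print(line)
```

Printing the aligned lines places the fixed framework words in matching columns, loosely in the spirit of the sequence-alignment style display the abstract mentions. A real system would need multi-word slots and a way to discover frameworks rather than supply them by hand.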