• DocumentCode
    2909609
  • Title
    ODIN: A Model for Adapting and Enriching Legacy Infrastructure
  • Author
    Lewis, William D.
  • Author_Institution
    University of Washington/CSU Fresno, USA
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    137
  • Lastpage
    137
  • Abstract
    The Online Database of Interlinear Text (ODIN) is a database of interlinear text "snippets", harvested mostly from scholarly documents posted to the Web. Although large amounts of language data are posted to the Web as part of scholarly discourse, making the existing "e-Linguistic infrastructure" surprisingly rich, most linguistic data available on the Web exists in legacy formats, is highly display-centric, and is often difficult to locate or interoperate over. ODIN seeks to leverage this existing infrastructure into a rich, searchable, and interoperable resource by converting readily available semi-structured data to content-centric, searchable formats. To do this, ODIN mines scholarly papers and webpages for instances of linguistic data, focusing mostly on interlinear texts, extracts them, identifies source languages, and makes the instances available to search. Through ODIN's standard search feature, users can locate data by language name or Ethnologue code, and display lists of data by document for languages of interest. The newer Advanced Search feature allows users to locate instances by the grammatical markup used (e.g., NOM, ACC, ERG, PST, 3SG), and by linguistic constructions (e.g., passives, conditionals, possessives, raising constructions, etc.). The latter are made possible through additional enrichment of discovered data using automated statistical taggers and parsers.
  • Keywords
    Best practices; Code standards; Data mining; Databases; Displays; Gold; Prototypes; Software standards; Standards development; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Science and Grid Computing, 2006. e-Science '06. Second IEEE International Conference on
  • Conference_Location
    Amsterdam, The Netherlands
  • Print_ISBN
    0-7695-2734-5
  • Type
    conf
  • DOI
    10.1109/E-SCIENCE.2006.261070
  • Filename
    4031110