• DocumentCode
    2967796
  • Title

    Integrating Web resources and lexicons into a natural language query system

  • Author

    Katz, Boris ; Yuret, Deniz ; Lin, Jimmy ; Felshin, Sue ; Schulman, Rebecca ; Ilik, Adnan ; Ibrahim, Ali ; Osafo-Kwaako, Philip

  • Author_Institution
    Artificial Intelligence Lab., MIT, Cambridge, MA, USA
  • Volume
    2
  • fYear
    1999
  • fDate
    36342
  • Firstpage
    255
  • Abstract
    The START system responds to natural language queries with answers in text, pictures, and other media. START's sentence-level natural language parsing relies on a number of mechanisms to help it process the huge, diverse resources available on the World Wide Web. Blitz, a hybrid heuristic- and corpus-based natural language preprocessor, enables START to integrate a large and ever-changing lexicon of proper names by using heuristic rules and precompiled tables of symbols to preprocess various highly regular and fixed expressions into lexical tokens. LaMeTH, a content-based system for extracting information from HTML documents, assists START by providing a uniform method of accessing information on the Web in real time. These mechanisms have considerably improved START's ability to analyze real-world sentences and answer queries through expansion of its lexicon and integration of Web resources.
  • Keywords
    Internet; content-based retrieval; hypermedia markup languages; information resources; multimedia databases; natural language interfaces; real-time systems; Blitz; HTML documents; LaMeTH; START system; Web resources; World Wide Web; content-based system; heuristic rules; lexicon; natural language parsing; natural language query system; pictures; real time; symbols; text; Artificial intelligence; Data mining; HTML; Humans; Information analysis; Internet; Laboratories; Natural languages; Real time systems; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Computing and Systems, 1999. IEEE International Conference on
  • Conference_Location
    Florence
  • Print_ISBN
    0-7695-0253-9
  • Type

    conf

  • DOI
    10.1109/MMCS.1999.778343
  • Filename
    778343