• DocumentCode
    454716
  • Title

    Strategies for Language Model Web-Data Collection

  • Author

    Wan, Vincent ; Hain, Thomas

  • Author_Institution
    Dept. of Comput. Sci., Sheffield Univ.
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    This paper presents an analysis of the use of textual information collected from the Internet via a search engine for the purpose of building domain-specific language models. A framework is developed to analyse the effect of search query formulation on the performance of the resulting Web-data language model in an evaluation. The framework gives rise to improved methods of selecting n-gram search engine queries, which return documents that make better domain-specific language models.
  • Keywords
    Internet; natural languages; query formulation; building domain specific language models; language model Web-data collection; search engine; search query formulation; textual information; Automatic speech recognition; Computer science; Domain specific languages; History; Information analysis; Performance analysis; Search engines; System performance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type

    conf

  • DOI
    10.1109/ICASSP.2006.1660209
  • Filename
    1660209