• DocumentCode
    2765052
  • Title
    Deducing linguistic structure from the statistics of large corpora
  • Author
    Brill, Eric ; Magerman, David ; Marcus, Mitchell ; Santorini, Beatrice
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Pennsylvania Univ., Philadelphia, PA, USA
  • fYear
    1990
  • fDate
    22-25 Oct 1990
  • Firstpage
    380
  • Lastpage
    389
  • Abstract
    Two experiments are described that strongly suggest that largely distributional techniques might be developed to automatically provide both a set of part-of-speech tags for English and a skeletal parse of free English text. In one experiment, the authors developed a constituent boundary parsing algorithm that derives an (unlabeled) bracketing, given text annotated for part of speech as input. In the other experiment, the authors investigated whether a distributional analysis can discover a part-of-speech tag set that might prove adequate to support such experiments. The state of a tagged natural language corpus intended to aid such experiments is also summarized.
  • Keywords
    computational linguistics; grammars; linguistics; natural languages; English text; boundary parsing algorithm; distributional analysis; large corpora; linguistic structure; skeletal parsing; speech tags; tagged natural language corpus; Data mining; Distributed computing; Error analysis; Information analysis; Mutual information; Natural languages; Speech analysis; Statistical distributions; Statistics; Stochastic processes;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Technology, 1990. 'Next Decade in Information Technology', Proceedings of the 5th Jerusalem Conference on (Cat. No.90TH0326-9)
  • Conference_Location
    Jerusalem
  • Print_ISBN
    0-8186-2078-1
  • Type
    conf
  • DOI
    10.1109/JCIT.1990.128309
  • Filename
    128309