DocumentCode
2765052
Title
Deducing linguistic structure from the statistics of large corpora
Author
Brill, Eric ; Magerman, David ; Marcus, Mitchell ; Santorini, Beatrice
Author_Institution
Dept. of Comput. & Inf. Sci., Pennsylvania Univ., Philadelphia, PA, USA
fYear
1990
fDate
22-25 Oct 1990
Firstpage
380
Lastpage
389
Abstract
Two experiments are described that strongly suggest that largely distributional techniques might be developed to automatically provide both a set of part-of-speech tags for English and a skeletal parsing of free English text. In one experiment, the authors developed a constituent boundary parsing algorithm that derives an (unlabeled) bracketing, given text annotated for part of speech as input. In the other experiment, the authors investigated whether a distributional analysis can discover a part-of-speech tag set that might prove adequate to support experiments. The state of a tagged natural language corpus to aid such experiments is summarized.
Keywords
computational linguistics; grammars; linguistics; natural languages; English text; boundary parsing algorithm; distributional analysis; large corpora; linguistic structure; skeletal parsing; speech tags; tagged natural language corpus; Data mining; Distributed computing; Error analysis; Information analysis; Mutual information; Natural languages; Speech analysis; Statistical distributions; Statistics; Stochastic processes
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology, 1990. 'Next Decade in Information Technology', Proceedings of the 5th Jerusalem Conference on (Cat. No.90TH0326-9)
Conference_Location
Jerusalem
Print_ISBN
0-8186-2078-1
Type
conf
DOI
10.1109/JCIT.1990.128309
Filename
128309