DocumentCode :
538055
Title :
Tools for syntactic concordancing
Author :
Seretan, Violeta ; Wehrli, Eric
Author_Institution :
Dept. of Linguistics, Univ. of Geneva, Geneva, Switzerland
fYear :
2010
fDate :
18-20 Oct. 2010
Firstpage :
493
Lastpage :
500
Abstract :
Concordancers are tools that display the immediate context for the occurrences of a given word in a corpus. Also called KWIC (Key Word in Context) tools, they are essential in the work of lexicographers, corpus linguists, and translators alike. We present an enhanced type of concordancer, which relies on a syntactic parser and on statistical association measures to detect those words in the context that are syntactically related to the sought word and are the most relevant for it, because together they may participate in multi-word expressions (MWEs). Our syntax-based concordancer highlights the MWEs in a corpus, groups them into syntactically homogeneous classes (e.g., verb-object, adjective-noun), ranks MWEs according to the strength of association with the given word, and for each MWE occurrence displays the whole source sentence as a context. In addition, parallel sentence alignment and MWE translation techniques are used to display the translation of the source sentence in another language, and to automatically find a translation for the identified MWEs. The tool also offers functionalities for building an MWE database, and is available both off-line and online for a number of languages (including English, French, Spanish, Italian, German, Greek and Romanian).
Keywords :
natural language processing; word processing; KWIC; Key Word in Context tools; MWE translation techniques; multiword expressions; parallel sentence alignment; statistical association measures; syntactic concordancing; syntactic parser; syntactically homogeneous classes; syntax based concordancer; Context; Data mining; Databases; Engines; Grammar; Joining processes; Syntactics
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
Conference_Location :
Wisla
ISSN :
2157-5525
Print_ISBN :
978-1-4244-6432-6
Type :
conf
DOI :
10.1109/IMCSIT.2010.5679742
Filename :
5679742