Author :
Askarian, Narjes ; Fazly, Afsaneh ; Hamzeh, Ali
Author_Institution :
Dept. of Comput. Eng., Univ. of Shiraz, Shiraz, Iran
Abstract :
A multiword expression (MWE) is a combination of words whose meaning goes beyond the compositional combination of the meanings of its parts. Light verb constructions (LVCs) are a type of MWE widely used in many languages, including English, Spanish, French, Japanese, Chinese, Urdu, and Persian. An LVC consists of a semantically light basic verb, such as take in English or gozâshtan (meaning 'to put') in Persian, combined with another word that can be an adjective, a prepositional phrase, or a noun. Examples of LVCs are take a walk in English and ehteram gozâshtan in Persian (lit. 'put respect', meaning 'to respect'). In Persian, most verbs take the form of LVCs, and many linguistic studies have therefore examined their properties. There is, however, little computational work on the automatic identification and processing of Persian LVCs, despite its importance for natural language processing systems such as summarization and machine translation. In this study, we focus on the most common form of Persian LVC, in which a noun combines with one of five commonly used light verbs. We use two standard measures of association, together with several linguistically informed measures, as features of the candidates. We also propose a position-based fixedness measure and several translation-based measures that exploit special properties of Persian LVCs and their translations into English. Our results show that these measures perform well at identifying Persian LVCs.
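The abstract does not name the two standard association measures, so the following is only an illustrative sketch of one widely used association measure, pointwise mutual information (PMI), applied to noun and light-verb co-occurrence counts. The function name `pmi` and all counts are hypothetical and made up for illustration; this is not presented as the authors' method or data.

```python
import math


def pmi(pair_counts, noun, verb):
    """Pointwise mutual information (in bits) of a (noun, verb) pair.

    pair_counts maps (noun, verb) tuples to co-occurrence counts.
    PMI = log2( P(noun, verb) / (P(noun) * P(verb)) ), with marginals
    estimated from the same pair counts.
    """
    total = sum(pair_counts.values())
    joint = pair_counts.get((noun, verb), 0)
    if joint == 0:
        raise ValueError("pair was never observed; PMI is undefined")
    # Marginal counts: how often the noun / verb appears in any pair.
    noun_count = sum(c for (n, _), c in pair_counts.items() if n == noun)
    verb_count = sum(c for (_, v), c in pair_counts.items() if v == verb)
    return math.log2(joint * total / (noun_count * verb_count))


# Hypothetical toy counts over transliterated tokens (made up, not corpus data).
counts = {
    ("ehteram", "gozashtan"): 8,
    ("ketab", "gozashtan"): 2,
    ("ketab", "kardan"): 10,
}

print(pmi(counts, "ehteram", "gozashtan"))  # 1.0
print(pmi(counts, "ketab", "gozashtan"))    # negative: weaker association
```

A higher score means the noun and verb co-occur more often than their individual frequencies would predict. That is the general intuition behind using association scores as features for LVC candidates.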
Keywords :
language translation; natural language processing; statistical analysis; Chinese; English; French; Japanese; LVC; MWE; Persian; Spanish; Urdu; automatic Persian light verb construction identification; ehteram gozashtan; linguistically-informed measures; machine translation; multiword expression; natural language processing systems; position-based fixedness measure; semantically-light basic verb; statistical measures; take a walk; translation-based measures; Computational linguistics; Conferences; Educational institutions; Frequency measurement; Gravity; Pragmatics; Syntactics; Corpus-based statistical measures; Multiword expressions; Natural language processing; Persian Light verb constructions;