Title :
Mining bilingual linguistic patterns with aligned and parsed bilingual corpus
Author :
Wang, Bo ; Meng, Fanqi ; Hou, Yuexian
Author_Institution :
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
Abstract :
Classical grammars for natural languages, as defined by linguists, are widely used in many natural language processing (NLP) tasks, such as information extraction, machine translation and parsing. A classical grammar is well defined, but it is context free and does not include complex patterns that contain multiple linguistic units. Conversely, many simple patterns that are useful in NLP tasks are also missing from the classical grammar. Recognizing such special linguistic patterns in natural language is therefore an important step in various NLP systems. We propose an unsupervised method to automatically discover complex monolingual linguistic patterns from a classically parsed and aligned bilingual corpus, in which every pattern in one language is qualified by the other, parallel language. A specialized and efficient algorithm mines the frequent bilingual subtrees in the resulting forest, and the subtrees found are formalized as linguistic patterns.
Keywords :
data mining; grammars; linguistics; natural language processing; program compilers; trees (mathematics); NLP systems; aligned bilingual corpus; bilingual linguistic pattern mining; classical grammar; complex monolingual linguistic patterns; context free; forest subtrees; found subtrees; frequent bilingual subtrees; information extraction; machine translation; natural languages processing; parallel language; parsed bilingual corpus; unsupervised method; Data mining; Pragmatics; alignment; linguistic patterns; parsing; subtree mining;
Conference_Titel :
Information Science and Digital Content Technology (ICIDT), 2012 8th International Conference on
Conference_Location :
Jeju
Print_ISBN :
978-1-4673-1288-2