Title :
Learning word representations for Turkish
Author :
Sen, Mehmet Umut ; Erdogan, H.
Author_Institution :
Dept. of Electron. Eng., Sabanci Univ., Istanbul, Turkey
Abstract :
High-quality word representations have been very successful in recent years at improving performance across a variety of NLP tasks. These representations map each word in the vocabulary to a real-valued vector in Euclidean space. Besides yielding high performance on specific tasks, learned word representations have been shown to capture linear relationships among words. The recently introduced skip-gram model improved the unsupervised learning of word embeddings that contain rich syntactic and semantic word relations, in terms of both accuracy and speed. Word embeddings have been used frequently for English but have not yet been applied to Turkish. In this paper, we apply the skip-gram model to a large Turkish text corpus and measure the quality of the learned embeddings quantitatively with "question" sets that we generated. The learned word embeddings and the question sets are publicly available at our website.
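The "linear relationships" and "question" sets mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a question has the form "a is to b as c is to ?" and is answered by the word whose vector is closest, by cosine similarity, to b - a + c; this is the usual analogy test for skip-gram embeddings and may differ from the exact procedure in the paper. The 2-D vectors, the word list, and the function names are invented for illustration and are not the published embeddings or question sets.

```python
import math

# Toy 2-D vectors, invented for illustration only (not the paper's embeddings).
# Axis 0 loosely encodes "which country", axis 1 encodes "is a capital city".
EMBEDDINGS = {
    "Fransa": [0.9, 0.1],   # France
    "Paris": [0.9, 0.8],
    "Almanya": [0.1, 0.1],  # Germany
    "Berlin": [0.1, 0.8],
    "elma": [-0.7, 0.3],    # apple, a distractor word
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings=EMBEDDINGS):
    """Answer 'a is to b as c is to ?' using the offset vector b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


def accuracy(questions, embeddings=EMBEDDINGS):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        1 for a, b, c, expected in questions
        if solve_analogy(a, b, c, embeddings) == expected
    )
    return correct / len(questions)


# Hypothetical question set in the assumed 4-tuple format.
QUESTIONS = [
    ("Fransa", "Paris", "Almanya", "Berlin"),
    ("Paris", "Fransa", "Berlin", "Almanya"),
]
```

Calling accuracy(QUESTIONS) scores the toy vectors against the toy question set; with real embeddings the same loop would run over the full vocabulary, normally with vectorized similarity computations rather than pure Python.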
Keywords :
learning (artificial intelligence); natural language processing; text analysis; English language; Euclidean space; NLP tasks; Turkish text corpus; high-quality word representations; learned word embeddings; learned word representations; linear relationships; question sets; skip-gram model; unsupervised learning; word embeddings; Conferences; Natural language processing; Probabilistic logic; Recurrent neural networks; Signal processing; Vectors; Deep Learning; Natural Language Processing; Word embeddings;
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2014 22nd
Conference_Location :
Trabzon
DOI :
10.1109/SIU.2014.6830586