Regional Information Center for Science and Technology - Behavior extraction from tweets using character N-gram models

DocumentCode :

226936

Title :

Behavior extraction from tweets using character N-gram models

Author :

Yano, Yuji ; Hashiyama, Tomonori ; Ichino, Junko ; Tano, Shun'ichi

Author_Institution :

Dept. of Human Media Syst., Univ. of Electro-Commun., Chofu, Japan

fYear :

2014

fDate :

6-11 July 2014

Firstpage :

1273

Lastpage :

1280

Abstract :

Human daily activities are now recorded by ICT devices in many kinds of data representations, called lifelogs. Because these raw data are hard to handle, there is strong demand for retrieving useful information from lifelogs. Extracting human activities from these logs promises to enrich our lives, and context-aware services can be provided based on the user activities extracted from them. Recently, many people post messages called tweets on Twitter to show what they are doing, thinking, feeling, and so on. Because many people post tweets frequently every day, tweets have the potential to record human activities. This paper focuses on retrieving human behavior from tweets. Tweets are limited to short sentences, which causes some difficulties: users rely on domain-specific terms and post grammatically incorrect sentences to fit within the length constraint. This makes tweets hard to analyze with grammatical methods or with dictionaries. To tackle these problems, we apply character n-gram tokenization and a naive Bayes classifier to extract appropriate behavioral information from tweets. With an n-gram tokenizer, domain-specific words can be identified and incorrect grammar can be handled. Our approach is examined using real tweets in Japanese. The precision, recall, and F-measure indices show promising results. Some experiments have been carried out to show the feasibility of our approach. At this point our system has been applied only to Japanese tweets, but it is applicable to any other language.
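
The pipeline named in the abstract (character n-gram tokenization, a naive Bayes classifier, and evaluation by precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the bigram size, the add-one smoothing, the class and function names, and the toy English training texts are all assumptions made for illustration, and the record does not state which settings the paper used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (no word segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over character n-grams (add-one smoothing assumed)."""

    def __init__(self, n=2):
        self.n = n
        self.class_counts = Counter()             # training texts per label
        self.gram_counts = defaultdict(Counter)   # n-gram counts per label
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for gram in char_ngrams(text, self.n):
                self.gram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(doc_count / total_docs)
            denom = sum(self.gram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(text, self.n):
                score += math.log((self.gram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-measure for one label treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data invented for this sketch; the paper used real Japanese tweets.
texts = [
    "eating ramen for lunch",
    "ate sushi w/ friends",
    "on the train to work",
    "train delayed again...",
]
labels = ["eating", "eating", "commuting", "commuting"]

model = NaiveBayes(n=2).fit(texts, labels)
print(model.predict("on the train to work"))
```

Because the tokenizer slides over raw characters, it needs neither a dictionary nor a grammatical parse, which is the property the abstract relies on for short, informal tweets and for unsegmented Japanese text.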

Keywords :

Bayes methods; behavioural sciences computing; information retrieval; natural language processing; pattern classification; social networking (online); F-measure index; ICT devices; Japanese tweets; Twitter; behavior extraction; character N-gram models; character n-gram tokenization; context-awareness services; data representations; human behavior retrieval; human daily activities; information retrieval; n-gram tokenizer; naive Bayes classifier; precision index; recall index; tweet message; Data mining; Dictionaries; Feature extraction; Grammar; Training; Training data; Twitter;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4799-2073-0

Type :

conf

DOI :

10.1109/FUZZ-IEEE.2014.6891784

Filename :

6891784

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=226936