Title :
Search result clustering for Thai Twitter based on Suffix Tree Clustering
Author :
Thaiprayoon, Santipong ; Kongthon, Alisa ; Palingoon, Pornpimon ; Haruechaiyasak, Choochart
Author_Institution :
Speech & Audio Technol. Lab. (SPT), Nat. Electron. & Comput. Technol. Center (NECTEC), Pathumthani, Thailand
Abstract :
Today Twitter has become a popular online medium for posting and sharing news and events. Many Twitter posts, or “tweets”, refer to the same topics or events, so a search on Twitter can return a long list of results. To address this problem, we propose an approach for clustering Twitter search results based on the Suffix Tree Clustering (STC) algorithm. The original STC, however, has two main drawbacks: some of the returned cluster labels are not meaningful, and it cannot create a hierarchical structure. In this paper, we present a new approach called Suffix Tree Clustering with Label Merging (STC-LM). The key idea of STC-LM is to merge partially overlapping cluster labels and then create a two-level label structure. We performed experiments on Thai Twitter posts from 12 topics, such as flooding, traffic, and entertainment. The approach achieves an F1 measure of 70%, an improvement of 9% over the baseline method.
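The label-merging idea can be illustrated with a minimal sketch. The abstract does not give the exact merge rule, so the code below assumes one plausible reading: two labels "partially overlap" when a word-level suffix of one equals a prefix of the other, and the merged label becomes the top-level entry with the original labels beneath it. The function names `merge_overlapping` and `build_two_level` are hypothetical, and the English labels stand in for Thai text that has already been segmented into words.

```python
def merge_overlapping(a, b):
    """Merge word lists a and b if a suffix of a equals a prefix of b.

    Returns the merged word list, or None when there is no overlap.
    This overlap rule is an assumption, not the rule from the paper.
    """
    for k in range(min(len(a), len(b)), 0, -1):  # prefer the longest overlap
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


def build_two_level(labels):
    """Build a hypothetical two-level structure: merged label -> original labels.

    Labels that overlap with no other label stay at the top level
    with an empty list of children.
    """
    tokens = [label.split() for label in labels]
    used = set()
    hierarchy = {}
    for i, a in enumerate(tokens):
        for j, b in enumerate(tokens):
            if i == j or i in used or j in used:
                continue
            merged = merge_overlapping(a, b)
            if merged is not None:
                hierarchy[" ".join(merged)] = [labels[i], labels[j]]
                used.update((i, j))
    for i, label in enumerate(labels):
        if i not in used:
            hierarchy[label] = []
    return hierarchy


result = build_two_level(["heavy flood", "flood warning", "traffic jam"])
print(result)
```

In this sketch each label is merged at most once, which keeps the structure to two levels; a real system would also need a policy for labels that overlap with several others.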
Keywords :
information retrieval; pattern clustering; social networking (online); trees (mathematics); F1 measure; STC algorithm; STC-LM; Thai Twitter; Twitter posts; baseline method; cluster labels; hierarchical structure; search result clustering; suffix tree clustering with label merging; tweets; two-level label structure; Clustering algorithms; Filtering; Floods; Merging; Organizations; Search problems; Twitter; Suffix tree clustering
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012 9th International Conference on
Conference_Location :
Phetchaburi
Print_ISBN :
978-1-4673-2026-9
DOI :
10.1109/ECTICon.2012.6254293