Author_Institution :
Software Eng. Inst., East China Normal Univ., Shanghai, China
Abstract :
With the rapid growth of social media, more and more users generate data on social application platforms such as Facebook, Twitter and Sina Weibo (weibo.com). Current platforms, however, provide only a keyword-based search function on social data, which is far from enough to satisfy users' query requirements in terms of both structure and content. The traditional structural join algorithms, which obtain results by matching both structure and content, do not work very well for social data. The challenges are that (1) the size of social data is huge and (2) online social applications require real-time response. It is therefore necessary to study structural queries on social data in order to meet these requirements. This paper proposes Post Dewey, a new numbering schema that acts as a structural summary of an element tag in order to reduce the search space. A novel structural join algorithm, Post Structure Join (PSJ), is presented as a supplementary strategy for structural joins that addresses the limitations of stack-based algorithms. PSJ improves overall performance by reducing the input size, at the cost of some join efficiency. The approach is validated on a real dataset crawled and extracted from Sina Weibo. The experimental results demonstrate the effectiveness of PSJ in comparison with state-of-the-art structural join algorithms.
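For readers unfamiliar with the background, the following is a minimal sketch of a structural join over classic Dewey labels, where a node's label is the path of child positions from the root and an ancestor's label is a proper prefix of its descendant's label. It is not the Post Dewey schema or the PSJ algorithm, whose details the abstract does not give. The function names, the nested-loop strategy, and the sample labels are assumptions made purely for illustration.

```python
from typing import List, Tuple

# A classic Dewey label: the path of child positions from the root.
# (Hypothetical representation; the paper's Post Dewey schema may differ.)
Label = Tuple[int, ...]


def is_ancestor(a: Label, d: Label) -> bool:
    """True if `a` is a proper prefix of `d`, i.e. a is an ancestor of d."""
    return len(a) < len(d) and d[:len(a)] == a


def naive_structural_join(
    ancestors: List[Label], descendants: List[Label]
) -> List[Tuple[Label, Label]]:
    """Nested-loop ancestor-descendant join, shown only as a baseline.

    Stack-based algorithms avoid this quadratic scan by exploiting
    document order; this sketch deliberately does not.
    """
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


# Illustrative labels only: two "post" nodes and four "reply" nodes.
posts = [(1,), (2,)]
replies = [(1, 1), (1, 2, 1), (2, 3), (3, 1)]
pairs = naive_structural_join(posts, replies)
```

The baseline compares every pair of labels, which illustrates why reducing the input size before the join, as the abstract describes for PSJ, can matter for large social datasets.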
Keywords :
query processing; social networking (online); Facebook; PSJ algorithm; Post Dewey numbering schema; Sina Weibo; Twitter; join efficiency; keyword-based search function; post structure join algorithm; social data; social media; stack based algorithm; structural join algorithm; structural query evaluation; user content aspect; user query requirement; user structure aspect; Earthquakes; Indexes; Query processing; TV; XML; Post Structure Join; Social Network; XML Data Management;