Title :
Authorship attribution for Chinese text based on sentence rhythm features
Author :
Wang, Shaokang ; Yan, Baoping
Author_Institution :
Comput. Network Inf. Center, Chinese Acad. of Sci., Beijing, China
Abstract :
Authorship attribution, i.e., identifying the author of a disputed piece of text, is an important problem because of growing concern over copyright violations. Although various authorship attribution algorithms have been proposed, they fail in several situations. This paper proposes a new authorship attribution algorithm for Chinese text that uses the sentence rhythm features of articles. The algorithm introduces a rhythm feature matrix to describe the sentence rhythm of Chinese text. To measure the similarity of rhythm feature matrices, we compare two definitions of similarity, one based on Euclidean distance and one based on an improved Kullback-Leibler divergence. Experimental results show that our algorithm achieves a success rate of 80%.
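The abstract does not define the rhythm feature matrix or the improved divergence, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "rhythm" can be approximated by how clause lengths (in characters) follow one another within a sentence, and it substitutes a smoothed, symmetrised Kullback-Leibler divergence for the unspecified improved variant. The names `rhythm_matrix`, `euclidean_distance`, `symmetric_kl`, and the constants `MAX_LEN` and `eps` are all hypothetical.

```python
import math
import re

MAX_LEN = 8  # assumed cap: clauses longer than this share the last bucket


def rhythm_matrix(text, max_len=MAX_LEN):
    """Normalised max_len x max_len matrix of clause-length transitions.

    Entry [i][j] is the share of adjacent clause pairs (within one sentence)
    whose lengths fall in buckets i+1 and j+1. This definition is an
    assumption made for illustration only.
    """
    counts = [[0.0] * max_len for _ in range(max_len)]
    for sentence in re.split(r"[。！？!?]", text):
        # Split the sentence into clauses and measure each in characters.
        raw = [len(c.strip()) for c in re.split(r"[，、；：,;:]", sentence)]
        lengths = [min(n, max_len) for n in raw if n > 0]
        for a, b in zip(lengths, lengths[1:]):
            counts[a - 1][b - 1] += 1.0
    total = sum(map(sum, counts))
    if total == 0:
        return counts  # no clause pairs found; return the all-zero matrix
    return [[v / total for v in row] for row in counts]


def euclidean_distance(p, q):
    """Euclidean distance between two matrices of equal shape."""
    return math.sqrt(
        sum((a - b) ** 2 for rp, rq in zip(p, q) for a, b in zip(rp, rq))
    )


def symmetric_kl(p, q, eps=1e-6):
    """Smoothed, symmetrised KL divergence (a stand-in, not the paper's variant)."""
    fp = [v + eps for row in p for v in row]  # additive smoothing avoids log(0)
    fq = [v + eps for row in q for v in row]
    sp, sq = sum(fp), sum(fq)
    fp = [v / sp for v in fp]
    fq = [v / sq for v in fq]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(fp, fq))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(fp, fq))
    return kl_pq + kl_qp
```

Under these assumptions, a disputed text would be attributed to the candidate author whose reference matrix lies closest under the chosen measure. Smaller values mean more similar rhythm for both functions.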
Keywords :
copyright; literature; text analysis; Chinese text; Euclidean distance; Kullback-Leibler divergence; authorship attribution; authorship attribution algorithm; copyright violations; rhythm feature matrix; sentence rhythm features; Algorithm design and analysis; Databases; Measurement; Probability distribution; Rhythm; Software algorithms; Writing; multi-dimensional matrix; rhythm feature; text similarity
Conference_Title :
Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8883-4
DOI :
10.1109/YCICT.2010.5713152