Title :
Authorship analysis based on metrics
Author :
Yuntao Zhang ; Ling Gong ; Yongcheng Wang
Author_Institution :
Shanghai Jiaotong University
Abstract :
Summary form only given, as follows. Authorship analysis identifies the authors of texts by genre, attributions, features and traits that are unique to a particular author. A related problem is to discriminate between two authors by their distinguishing characteristics. The computational-linguistic analysis is divided into two layers. The bottom layer concerns lexical information, stylistics and terminology in the text, and the upper layer concerns the structure and layout of the text. Stylistic features and terminology statistics are concerned with words and their patterns in a particular corpus. The linguistic measures of the bottom layer include morphemes, average word length distribution, vocabulary distribution, word frequency, word order, average sentence length and sentence structure. The upper-layer analysis does not treat the text only as a "bag of words" or "set of words". It covers not only the structure and layout of the text but also the use and distribution of the various punctuation marks. The measures of text structure include the average paragraph length, the average section and chapter length, and the use and distribution of subtitles. Vocabulary richness of a text is measured by the word spectrum of the text and the weighted use of each vocabulary item.
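A few of the measures named in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `style_metrics`, the regular-expression tokenization, the blank-line paragraph rule, and the reading of "word spectrum" as the frequency spectrum (how many distinct words occur exactly m times) are all assumptions made here for the example.

```python
import re
from collections import Counter


def style_metrics(text):
    """Compute a handful of simple stylometric measures for one text.

    Assumptions (not from the abstract): words are runs of ASCII letters,
    sentences end at '.', '!' or '?', and paragraphs are separated by
    blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z]+", text.lower())

    word_freq = Counter(words)                      # word frequency
    punctuation = Counter(ch for ch in text if ch in ".,;:!?")
    # Frequency spectrum: number of distinct words occurring exactly m times.
    spectrum = Counter(word_freq.values())

    n_words = len(words)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "avg_paragraph_length": n_words / len(paragraphs) if paragraphs else 0.0,
        "word_frequency": word_freq,
        "punctuation": punctuation,
        "word_spectrum": dict(spectrum),
        # Type-token ratio, used here as one simple vocabulary-richness proxy.
        "type_token_ratio": len(word_freq) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran!\n\nA bird flew, high."
    for name, value in style_metrics(sample).items():
        print(name, value)
```

Two authors could then be compared by computing these values for texts of known authorship and examining where the measures differ; the abstract does not specify a comparison procedure, so that step is left open here.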
Keywords :
Computational linguistics; Frequency measurement; Iterative algorithms; Length measurement; Sections; Sliding mode control; Statistical distributions; Terminology; Text categorization; Vocabulary;
Conference_Title :
Control and Automation, 2002. ICCA. Final Program and Book of Abstracts. The 2002 International Conference on
Conference_Location :
Xiamen, Fujian Province, China
Print_ISBN :
0-7803-7412-6
DOI :
10.1109/ICCA.2002.1229659