Title :
Automatic authorship classification of two ancient books: Quran and Hadith
Author_Institution :
Fac. of Electron. & Inf., USTHB Univ., Algiers, Algeria
Abstract :
The need for a rigorous, scientific tool for automatic authorship classification has become increasingly important, especially for authenticating ancient documents such as religious or historical books. In this paper, we conduct authorship classification experiments on the Quran and the Hadith to determine whether they could have the same author (i.e., was the Quran written by the Prophet, or only sent down to him, as claimed?). This task, commonly called authorship discrimination, is an important application of authorship classification. It consists of checking whether two texts were written by the same author, using artificial intelligence (AI) and text mining (TM) techniques. Two main investigations are conducted and presented. In the first, the two books are analyzed as a whole. In the second, the two books are divided into 25 text segments: 14 extracted from the Quran and 11 from the Hadith. The segments are of roughly equal size, with approximately 2080 tokens each. Several classifiers are employed: SMO-based Support Vector Machines (SVM), Multilayer Perceptron (MLP), and Linear Regression (LR). This work has yielded interesting information on the origins of these ancient books.
Keywords :
data mining; history; multilayer perceptrons; pattern classification; regression analysis; support vector machines; text analysis; AI; Hadith; LR; MLP; Quran; SMO-based support vector machines; SVM; TM; ancient books; ancient documents authentication; artificial intelligence; authorship discrimination; automatic authorship classification; historical books; linear regression; multilayer perceptron; religious books; text mining; Context; Testing; Vocabulary; Authorship classification; Computational linguistics; Origin of religious books; Segmental analysis
Conference_Title :
2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)
DOI :
10.1109/AICCSA.2014.7073263