Title :
Document verification using n-grams and histograms of words
Author :
Abdulwahed Almarimi;Gabriela Andrejková;Peter Sedmák
Author_Institution :
Faculty of Science, P. J. Šafárik University in Košice, Institute of Computer Science
Abstract :
This paper analyzes and compares the results of methods for document verification based on n-grams and on local histograms built over a document's words, for English and Arabic. English and Arabic texts were analyzed with respect to many statistical characteristics. Some statistical differences between the two languages were found, and n-gram analysis and local histograms were applied to detect dissimilarities between parts of a text. For each text, the results can reveal such dissimilarities and flag the text for attention (or not), depending on whether its parts appear to have been written by the same author. Whether a text is flagged depends on the parameters selected in the experiments.
Keywords :
"Histograms","Vocabulary","Standards","Computer science","Electronic mail","Plagiarism","Statistical analysis"
Conference_Title :
Scientific Conference on Informatics, 2015 IEEE 13th International
Print_ISBN :
978-1-4673-9867-1
DOI :
10.1109/Informatics.2015.7377801