Title :
Multilingual plagiarism detection corpus
Author :
Vedran Juričić;Vanja Štefanec;Siniša Bosanac
Author_Institution :
Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, Ivana Luč
fDate :
5/1/2012
Abstract :
The paper presents a system for generating multilingual corpora that can be used to evaluate the performance of plagiarism detection systems. The implemented method uses parallel language corpora and, because of its scalability, can be applied to any language. The authors have collected data from five parallel corpora and enabled corpus generation for Croatian, French, German, Spanish, and Italian.
Keywords :
"Plagiarism","Databases","Testing","Publishing","Detection algorithms","Semantics","Parallel languages"
Conference_Titel :
MIPRO, 2012 Proceedings of the 35th International Convention
Print_ISBN :
978-1-4673-2577-6