DocumentCode :
1791770
Title :
Scaling historical text re-use
Author :
Büchler, Marco ; Franzini, Greta ; Franzini, Emily ; Moritz, Maria
Author_Institution :
Göttingen Centre for Digital Humanities, Georg-August-Univ. Göttingen, Göttingen, Germany
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
23
Lastpage :
31
Abstract :
Text re-use describes the spoken and written repetition of information. Historical text re-use, with its longer time span, embraces a larger set of morphological, linguistic, syntactic, semantic and copying variations, thus adding complication to text-reuse detection. Furthermore, it increases the chances of redundancy in a digital library. In Natural Language Processing it is crucial to remove these redundancies before we can apply any kind of machine learning techniques to the text. In Humanities, these redundancies foreground textual criticism and allow scholars to identify lines of transmission. Identification can be accomplished by way of automatic or semi-automatic methods. Text re-use algorithms, however, are of squared complexity and call for higher computational power. The present paper addresses this issue of complexity, with a particular focus on its algorithmic implications and solutions.
Keywords :
computational complexity; digital libraries; humanities; learning (artificial intelligence); natural language processing; text analysis; computational power; copying variations; digital library; historical text reuse scaling; humanities; machine learning techniques; natural language processing; semiautomatic methods; squared complexity; text-reuse detection; textual criticism; Big data; Complexity theory; Equations; Force; Joining processes; Libraries; Pragmatics; performance; scalability; text re-use;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004449
Filename :
7004449