Title :
The impact of sections headings on the document retrieval
Author :
Abdelli, Belkacem ; Pinon, Jean-Marie ; Kazar, Okba
Author_Institution :
Univ. of Biskra, Biskra, Algeria
Date :
Sept. 29 2014-Oct. 1 2014
Abstract :
With online publishing, the Web has become the largest source of digital documents, often stored as HTML, XML, PDF or DOC. A notable feature of these documents is their logical structure, which describes their components: chapters, sections and paragraphs, along with the document title, chapter titles and section titles. Section headings are meaningful; they are a good indicator of the content of the paragraphs they introduce. For this reason we pay particular attention to these titles during indexing and retrieval. Our objective is to provide relevant access to digital documents by processing all section titles, mining them to take advantage of their importance in the retrieval process. Experiments on a large corpus, INEX 2009, show the effectiveness of our proposal: an improvement in the precision of the IR results.
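The idea of giving section titles extra weight at indexing time can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not detail; it is only one plausible reading, under stated assumptions. The element names `section`, `title` and `p`, the boost value `HEADING_WEIGHT`, and the helper functions are all hypothetical and chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical boost for heading terms; not a value taken from the paper.
HEADING_WEIGHT = 3.0

# Hypothetical sample document; element names are assumptions.
SAMPLE = (
    "<article>"
    "<section><title>Indexing</title>"
    "<p>We describe the retrieval pipeline.</p></section>"
    "<section><title>Evaluation</title>"
    "<p>Indexing was fast.</p></section>"
    "</article>"
)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_sections(xml_text):
    """Return a (title, weighted term counts) pair for each <section>."""
    root = ET.fromstring(xml_text)
    vectors = []
    for section in root.iter("section"):
        weights = Counter()
        title = section.findtext("title", default="")
        # Terms in the heading count more than terms in the body.
        for term in tokenize(title):
            weights[term] += HEADING_WEIGHT
        # Body terms from the section's direct <p> children count once each.
        for para in section.findall("p"):
            for term in tokenize("".join(para.itertext())):
                weights[term] += 1.0
        vectors.append((title, weights))
    return vectors


def search(vectors, query):
    """Rank section titles by the summed weight of the query terms."""
    terms = tokenize(query)
    scored = [(sum(w[t] for t in terms), title) for title, w in vectors]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [title for score, title in ranked if score > 0]


if __name__ == "__main__":
    print(search(index_sections(SAMPLE), "indexing"))
```

In this sketch a section whose heading contains the query term ranks above a section that only mentions it in a paragraph, which is the intuition behind treating headings as indicators of paragraph content.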
Keywords :
XML; electronic publishing; indexing; information retrieval; text analysis; DOC; HTML; INEX 2009; PDF; chapter titles; corpus; digital documents; document mining; document retrieval; document titles; indexing process; logical structure; online publications; section headings; Abstracts; Prototypes; Sections; metadata; mining
Conference_Title :
Digital Information Management (ICDIM), 2014 Ninth International Conference on
Conference_Location :
Phitsanulok
DOI :
10.1109/ICDIM.2014.6991398