Title :
2T: two-term indexing of documents using syntactic and semantic constraints
Author :
Saarikoski, Harri M T
Author_Institution :
Helsinki Univ., Finland
Abstract :
The purpose of an index is to give the user an intuitive navigational structure into the subject matter. This paper specifies a novel, untrained method (2T) for automatically producing a two-level, semi-formal, concept-based index from textual documents; the index consists of topics rather than keywords. Using syntactic and semantic constraints and a domain ontology containing the relevant terms, we obtain high accuracy (high 80s to low 90s). Using concept categories (or semantic roles) to validate whether an index term makes sense is a novel approach in semantic indexing. The resulting low-cost navigational structure can add value to businesses that rely on accurate automatic document indexing (e.g. mobile news providers) or that face a critical learn-to-do requirement for their employees or clients (e.g. airplane maintenance). It can be implemented as a post-processing stage of full-text indexes or as a readable and editable index to documentation, either printed or online.
Keywords :
database indexing; ontologies (artificial intelligence); text analysis; automatic document indexing; concept-based index; domain ontology; full-text index; semantic constraint; syntactic constraints; textual document; two-term document indexing; Aircraft navigation; Airplanes; Application software; Databases; Documentation; Frequency; Indexing; Information retrieval; Ontologies; Search engines;
Conference_Title :
Database and Expert Systems Applications, 2005. Proceedings. Sixteenth International Workshop on
Print_ISBN :
0-7695-2424-9
DOI :
10.1109/DEXA.2005.5