Title :
Exploiting textual descriptions and dependency graph for searching mathematical expressions in scientific papers
Author :
Kristianto, Giovanni Yoko ; Topic, Goran ; Aizawa, Akiko
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Date :
Sept. 29 2014-Oct. 1 2014
Abstract :
Mathematical expressions are important for communicating scientific information, for instance, to explain or define concepts written in natural language. Despite their importance, conventional search systems cannot provide access to the mathematical expressions contained in a scientific paper. The major focus of current development of mathematical search systems is mathematical tree structure indexing, but utilizing the textual information surrounding the expressions in these systems is also important. We examine how textual information contributes to a mathematical search system, primarily in the ranking process. We investigate the impact of two types of textual information on the ranking performance of a mathematical search system: words in context windows (baseline), which are easily extracted from the sentence tokenization results, and descriptions, which are extracted using a machine learning method. We also examine the improvement in ranking obtained by utilizing the dependency graph of mathematical expressions. The experimental results show that using descriptions together with the dependency graph delivers better ranking performance than using context windows or no textual information. The results also show that the dependency graph is crucial for increasing the number of mathematical expressions that are assigned descriptions, and thus using it together with descriptions yields higher ranking performance than using descriptions alone. This study suggests that descriptions represent mathematical expressions better (more precisely) than context windows, and that even descriptions from child (indirect) expressions represent the target expression better than the context of the target expression itself.
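The two kinds of textual information contrasted in the abstract, and the way a dependency graph can pass descriptions from child expressions to a target expression, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (context_window, build_dependency_graph, collect_descriptions), the window size, the substring test used to decide that one expression contains another, and the hand-written descriptions standing in for the output of the machine learning extractor.

```python
from collections import deque


def context_window(tokens, index, size=3):
    """Return up to `size` words on each side of the token at `index`.

    Assumption: the expression occupies exactly one token of the sentence.
    """
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def build_dependency_graph(expressions):
    """Map each expression id to the ids of the expressions it contains.

    Assumption: containment is tested by plain substring matching on the
    expression text, a stand-in for real structural matching.
    """
    graph = {}
    for parent_id, parent in expressions.items():
        children = [
            child_id
            for child_id, child in expressions.items()
            if child_id != parent_id and child in parent
        ]
        if children:
            graph[parent_id] = children
    return graph


def collect_descriptions(expr_id, own_descriptions, graph):
    """Gather the target's own descriptions, then those of its descendants.

    Breadth-first traversal; the `visited` set guards against cycles that
    substring matching could create for identical expression texts.
    """
    result = []
    visited = {expr_id}
    queue = deque([expr_id])
    while queue:
        current = queue.popleft()
        result.extend(own_descriptions.get(current, []))
        for child in graph.get(current, []):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return result


# Hypothetical toy data: e1 has no description of its own.
expressions = {"e1": "E = m c^2", "e2": "m", "e3": "c"}
own_descriptions = {"e2": ["the rest mass"], "e3": ["the speed of light"]}

graph = build_dependency_graph(expressions)
inherited = collect_descriptions("e1", own_descriptions, graph)

tokens = "where m denotes the rest mass of the particle".split()
window = context_window(tokens, index=1, size=2)
```

In this toy data the expression e1 has no description of its own, yet it still receives "the rest mass" and "the speed of light" through its child expressions, which mirrors the abstract's point that the dependency graph increases the number of expressions that are assigned descriptions. The context window for m, by comparison, is just the neighbouring words "where", "denotes", and "the".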
Keywords :
directed graphs; feature extraction; indexing; learning (artificial intelligence); natural language processing; text analysis; tree data structures; word processing; dependency graph; machine learning method; mathematical search system; mathematical tree structure indexing; natural language; scientific papers; sentence tokenization; textual descriptions; word extraction; Context; Data mining; Indexing; Measurement; Natural languages; Search engines; Semantics
Conference_Title :
Digital Information Management (ICDIM), 2014 Ninth International Conference on
Conference_Location :
Phitsanulok
DOI :
10.1109/ICDIM.2014.6991403