Title :
Automatic categorization of figures in scientific documents
Author :
Lu, Xiaonan ; Mitra, Prasenjit ; Wang, James Z. ; Giles, C. Lee
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA
Abstract :
Figures are an important source of non-textual information in scientific documents. Current digital libraries do not provide users with tools to retrieve documents based on the information available within figures. We propose an architecture for retrieving documents by integrating figures and other information. The first step in enabling integrated document search is to categorize figures into a set of predefined types. We propose several categories of figures based on their functions in scholarly articles. We have developed a machine-learning-based approach for automatic categorization of figures. Both global features, such as texture, and part features, such as lines, are used in the architecture to discriminate among figure categories. The proposed approach has been evaluated on a testbed document set collected from the CiteSeer scientific literature digital library. Experimental evaluation has demonstrated that our algorithms can produce acceptable results for real-world use. Our tools can be integrated into a scientific-document digital library.
Keywords :
classification; digital libraries; information retrieval; learning (artificial intelligence); automatic categorization; digital library; machine-learning; nontextual information; scientific document retrieval; Computer science; Databases; Design engineering; Educational institutions; Flowcharts; Information retrieval; Permission; Search engines; Software libraries; Testing; documents; feature extraction; figures; machine learning; scientific literature
Conference_Title :
Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06), 2006
Conference_Location :
Chapel Hill, NC
Print_ISBN :
1-59593-354-9
DOI :
10.1145/1141753.1141778