DocumentCode :
3539812
Title :
Document analysis based automatic concept map generation for enterprises
Author :
Karannagoda, E.L. ; Herath, H.M.T.C. ; Fernando, K.N.J. ; Karunarathne, M.W.I.D. ; de Silva, N.H.N.D. ; Perera, A.S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Moratuwa, Moratuwa, Sri Lanka
fYear :
2013
fDate :
11-15 Dec. 2013
Firstpage :
154
Lastpage :
159
Abstract :
The ever-growing knowledge bases of enterprises make it challenging to organize information so that related and intended information can be retrieved quickly. Enterprise document repositories consist of large collections of documents of varying size, format and writing style. This diverse and unstructured nature of documents restricts the possibilities of developing uniform techniques for extracting important concepts and relationships for summarization, structured representation and fast retrieval. The documented textual content is used as the input for the construction of a concept map. A rule-based approach is used to extract concepts and the relationships among them. Sentence-level breakdown enables these rules to identify those concepts and relationships. The rules are based on elements in the phrase structure tree of a sentence. To improve the accuracy and relevance of the extracted concepts and relationships, special features such as titles, bold text and upper-case text are used. This paper discusses how to overcome the above-mentioned challenges by utilizing high-level natural language processing techniques and document pre-processing techniques, and by developing an easily understandable and extractable compact representation of concept maps. Each document in the repository is converted to a concept map representation to capture the concepts, and the relationships among them, described in that document. This organization represents a summary of the document. These individual concept maps are used to generate concept maps that represent sections of the repository or the entire document repository. The paper also discusses how statistical techniques are used to calculate metrics that support requirements of the solution. Principal component analysis is used to rank the documents by importance. The concept map is visualized using force-directed graphs, which represent concepts as nodes and relationships as edges.
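The step in which individual concept maps are combined into a repository-level map might be sketched as follows. This is only an illustration, not the authors' implementation: the triple representation, the function names `merge_concept_maps` and `to_graph`, the document-count edge weight and the sample data are all assumptions introduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed representation: a concept map is a list of
# (concept, relationship, concept) triples. The abstract does not
# specify the actual data structure used in the paper.
Triple = Tuple[str, str, str]


def merge_concept_maps(doc_maps: Dict[str, List[Triple]]) -> Counter:
    """Combine per-document concept maps into one weighted map.

    The weight of a triple is the number of documents containing it
    (a hypothetical choice made for this sketch).
    """
    merged: Counter = Counter()
    for triples in doc_maps.values():
        # set() ensures a triple repeated within one document counts once
        merged.update(set(triples))
    return merged


def to_graph(merged: Counter):
    """Return (nodes, edges) suitable for a node-link graph layout."""
    nodes = sorted({c for s, _, o in merged for c in (s, o)})
    edges = [(s, o, rel, w) for (s, rel, o), w in sorted(merged.items())]
    return nodes, edges


# Hypothetical sample data, not taken from the paper.
doc_maps = {
    "policy.txt": [
        ("employee", "submits", "leave request"),
        ("manager", "approves", "leave request"),
    ],
    "handbook.txt": [
        ("manager", "approves", "leave request"),
        ("employee", "reads", "handbook"),
    ],
}

merged = merge_concept_maps(doc_maps)
nodes, edges = to_graph(merged)
```

Passing only a subset of the documents to `merge_concept_maps` would give a map for one section of the repository rather than the whole of it.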
Keywords :
document handling; graph theory; information retrieval; natural language processing; principal component analysis; automatic concept map generation; concept extraction; concept map construction; document analysis; document pre-processing techniques; document repositories; document summary; documented textual content; enterprises; fast information retrieval; force directed type graphs; graph edge; graph node; information organization; natural language processing techniques; phase structure tree; principle component analysis; rule based approach; sentence level breakdown; structured representation; summarization; Data mining; Feature extraction; Java; Natural language processing; Optimization; Sun; Concept Map; Concepts/Relationships Extraction; Natural Language Processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in ICT for Emerging Regions (ICTer), 2013 International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4799-1275-9
Type :
conf
DOI :
10.1109/ICTer.2013.6761171
Filename :
6761171