DocumentCode
3723123
Title
Advancing the Terminological Classification of Semi-structured Documents
Author
Georgios Stratogiannis;Georgios Siolas;Georgios Stamou;Andreas Stafylopatis;Alexandros Chortaras;Athanasios Tagaris
Author_Institution
Dept. of Electr. &
fYear
2015
Firstpage
333
Lastpage
339
Abstract
Documents are usually given in textual form, accompanied by a set of terminological classifications (metadata) based on vocabularies of domain ontologies. This paper presents a novel method for advancing this classification by extracting more properties of the analyzed documents. We first extract additional roles from the textual part and, together with roles extracted from the ontology statements, construct an extended document vector representation. We then introduce a pruning algorithm that, for a given document collection, merges concepts of the ontology to produce classes with a sufficient number of corresponding instances. We then classify the documents into ontology classes using the Stanford linear classifier. Finally, we propose an algorithm that assigns additional concept labels to documents, using the output of the classifier. Our system is evaluated on a set of real data and ontological descriptions, and its performance, measured in terms of various accuracy and specificity measures, indicates that the proposed approach to document classification produces correct labels for the majority of items.
Keywords
"Ontologies","Semantics","Natural languages","Feature extraction","Data mining","Training","Clothing"
Publisher
ieee
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
ISSN
1082-3409
Type
conf
DOI
10.1109/ICTAI.2015.58
Filename
7372154
Link To Document