DocumentCode :
1808575
Title :
AHA: Asset Harvester Assistant
Author :
Mukherjee, Debdoot ; Mani, Senthil ; Sinha, Vibha Singhal ; Ananthanarayanan, Rema ; Srivastava, Biplav ; Dhoolia, Pankaj ; Chowdhury, Prahlad
Author_Institution :
IBM Research India, New Delhi, India
fYear :
2010
fDate :
5-10 July 2010
Firstpage :
425
Lastpage :
432
Abstract :
Information assets in service enterprises are typically available as unstructured documents. There is an increasing need for unraveling information from these documents into a structured and semantic format. Structured data can be more effectively queried, which increases information reuse from asset repositories. This paper addresses the problem of extracting XML models, which follow a given target schema, from enterprise documents. We discuss why existing approaches for information extraction do not suffice for the enterprise documents created during service delivery. To address this limitation, we present the Asset Harvester Assistant (AHA), a tool that automatically extracts structured models from MS-Word documents, and supports manual refinement of the extracted models within an interactive environment. We present the results of empirical studies conducted using business-process documents from real service-delivery engagements. Our results indicate that the AHA approach can be effective in extracting accurate models from unstructured documents and improving user productivity.
Keywords :
XML; data structures; document handling; ontologies (artificial intelligence); AHA; XML models; asset harvester assistant; enterprise documents; information assets; information extraction; semantic format; service enterprises; Business; Data mining; Ontologies; Pediatrics; Semantics; Web pages; documents; enterprise; harvesting; services
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Services Computing (SCC), 2010 IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-8147-7
Electronic_ISBN :
978-0-7695-4126-6
Type :
conf
DOI :
10.1109/SCC.2010.55
Filename :
5557199