Title :
Conceptual schema extraction using POS annotations and weighted edit distance algorithm
Author :
Rohit Shinde;Rohini Kulkarni;Manasi Patwardhan;Suresh Sarda;Pooja Mantri
Author_Institution :
Computer Engineering, VIT, Pune, India
Abstract :
Database design involves analyzing system requirements described in natural language and manually extracting a conceptual schema from them. This process is tedious and prone to human error. Earlier approaches to automating it used a finite set of rules, context-free grammars (CFGs), or semantic understanding. Rule-based and CFG-based approaches are not robust enough to cover all possible scenarios, whereas semantic approaches are not generic and have domain dependencies. We define an approach in which the sequence of part-of-speech (POS) tags of a sentence is annotated with entity-relationship (ER) components, and a set of such annotated sentences serves as our corpus. Using POS tags instead of the actual terms in a sentence makes the approach more robust, generic, and domain-independent. We also define our own algorithm, which takes an input sentence and, if no perfect match is found, uses an extension of the edit distance technique to find similar matches for the sentence's POS sequence, each with an associated cost. The accuracy of our current system is 54%. Feedback provided by the user is used to update the underlying model, making the approach more interactive and improving the accuracy of future predictions.
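The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual cost scheme, corpus format, or update rule, so everything named here is an assumption made for illustration: the per-tag weights in `TAG_WEIGHT`, the Penn Treebank style tags, the ER label strings, and the helper names `weighted_edit_distance`, `best_match`, and `record_feedback`. The sketch uses a standard dynamic-programming edit distance in which each insertion, deletion, or substitution is charged by the weight of the tag involved, and it should not be read as the authors' algorithm.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights: editing a noun or verb tag is assumed to cost more
# than editing a determiner, since nouns and verbs are more likely to map
# to ER components. The paper's real costs are not stated in the abstract.
TAG_WEIGHT: Dict[str, float] = {
    "NN": 2.0, "NNS": 2.0, "VB": 2.0, "VBZ": 2.0,
    "JJ": 1.0, "IN": 1.0, "DT": 0.5,
}
DEFAULT_WEIGHT = 1.0

# A corpus entry pairs a POS sequence with one ER label per tag (assumed format).
Entry = Tuple[List[str], List[str]]


def weight(tag: str) -> float:
    """Cost of inserting or deleting a single tag."""
    return TAG_WEIGHT.get(tag, DEFAULT_WEIGHT)


def weighted_edit_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance between two tag sequences with per-tag costs."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):          # delete every tag of a
        dist[i][0] = dist[i - 1][0] + weight(a[i - 1])
    for j in range(1, cols):          # insert every tag of b
        dist[0][j] = dist[0][j - 1] + weight(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            substitute = 0.0 if same else max(weight(a[i - 1]), weight(b[j - 1]))
            dist[i][j] = min(
                dist[i - 1][j] + weight(a[i - 1]),      # deletion
                dist[i][j - 1] + weight(b[j - 1]),      # insertion
                dist[i - 1][j - 1] + substitute,        # match or substitution
            )
    return dist[-1][-1]


def best_match(query: Sequence[str], corpus: Sequence[Entry]) -> Tuple[float, Entry]:
    """Return the lowest-cost corpus entry; cost 0.0 means a perfect match."""
    return min(
        ((weighted_edit_distance(query, seq), (seq, labels)) for seq, labels in corpus),
        key=lambda pair: pair[0],
    )


def record_feedback(corpus: List[Entry], tags: List[str], labels: List[str]) -> None:
    """Store a user-corrected annotation so later queries can match it exactly."""
    corpus.append((list(tags), list(labels)))


corpus: List[Entry] = [
    (["DT", "NN", "VBZ", "DT", "NN"], ["-", "ENTITY", "RELATIONSHIP", "-", "ENTITY"]),
    (["NN", "IN", "NN"], ["ATTRIBUTE", "-", "ENTITY"]),
]
query = ["NN", "VBZ", "DT", "NN"]
cost, (matched_tags, matched_labels) = best_match(query, corpus)
```

In this sketch a query that has no exact counterpart in the corpus is mapped to the nearest annotated sequence together with its cost, and a user correction is simply appended to the corpus. Whether the original system stores feedback this way is not stated in the abstract.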
Keywords :
"Data models","Erbium","Databases","XML","Context","Computers","Semantics"
Conference_Title :
Information Processing (ICIP), 2015 International Conference on
DOI :
10.1109/INFOP.2015.7489476