• DocumentCode
    637053
  • Title
    Unstructured data extraction in distributed NoSQL
  • Author
    Lomotey, Richard K. ; Deters, Ralph
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Canada
  • fYear
    2013
  • fDate
    24-26 July 2013
  • Firstpage
    160
  • Lastpage
    165
  • Abstract
    While “Big data” has brought good tidings in terms of easy access to voluminous data, it has brought challenges too. The existing Knowledge Discovery in Database (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable since today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on the Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances the initially deployed version by introducing a re-usable dictionary and association rules to improve the quality of the extracted terms. Also, the tool in its present stage adapts better to user searches based on the most frequently searched term.
  • Keywords
    SQL; data mining; hidden Markov models; Big data; KDD; NoSQL databases; TouchR; association rules; distributed NoSQL; hidden Markov model; knowledge discovery in database; reusable dictionary; schema-oriented data sources; unstructured data extraction; Databases; Dictionaries; Feature extraction; Thesauri; Hidden Markov Model (HMM); NoSQL; Re-usable dictionary; Unstructured data; terms extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Ecosystems and Technologies (DEST), 2013 7th IEEE International Conference on
  • Conference_Location
    Menlo Park, CA
  • ISSN
    2150-4938
  • Print_ISBN
    978-1-4799-0784-7
  • Type
    conf
  • DOI
    10.1109/DEST.2013.6611347
  • Filename
    6611347