• DocumentCode
    652257
  • Title
    Real-Time Effective Framework for Unstructured Data Mining
  • Author
    Lomotey, Richard K.; Deters, Ralph
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Canada
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    1081
  • Lastpage
    1088
  • Abstract
    Today, the enterprise landscape faces voluminous amounts of data. The information gathered from these data sources is useful for improving product and service delivery. However, it is challenging to perform knowledge discovery in databases (KDD) activities on these data sources because of their unstructured nature. Previous studies have proposed the hierarchical clustering methodology because it enhances human readability and provides a clear dependency structure through topic, term, and document organization. Yet the methodology can be resource-intensive and time-consuming. To improve the term extraction process, we propose a tool called RSenter that searches through interconnected hyperlinks and a NoSQL database (specifically, CouchDB). We evaluate the tool based on search algorithms such as parallelization, random walk (or linear search), pessimistic search, and optimistic search. The tool shows high accuracy and optimality with respect to search time.
  • Keywords
    SQL; data mining; database management systems; information retrieval; pattern clustering; search problems; KDD activities; NoSQL database; RSenter; data sources; dependency structure; document organization; enterprise landscape; hierarchical clustering methodology; human readability; interconnected Hyperlinks; knowledge discovery in database activities; optimistic search; pessimistic search; product delivery; random walk; real-time effective framework; search algorithms; service delivery; term extraction process; unstructured data mining; Clustering algorithms; Communities; Data mining; Databases; Information retrieval; Organizations; Thesauri; big data; data mining; hierarchical clustering; information extraction; terms; topics; unstructured data;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/TrustCom.2013.131
  • Filename
    6680952