• DocumentCode
    2516587
  • Title

    Dissimilarity algorithm on conceptual graphs to mine text outliers

  • Author

    Kamaruddin, Siti Sakira ; Hamdan, Abdul Razak ; Bakar, Afarulrazi Abu ; Nor, Fauzias Mat

  • Author_Institution
    Fac. of Inf. Sci. & Technol., Univ. Kebangsaan Malaysia, Bangi, Malaysia
  • fYear
    2009
  • fDate
    27-28 Oct. 2009
  • Firstpage
    46
  • Lastpage
    52
  • Abstract
    Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are a preferred text representation for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between the knowledge represented in the CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of text represented in the Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method by analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. In comparison to other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and detects outliers conveniently through a simple dissimilarity function. Furthermore, the complexity of the proposed algorithm remains linear as the number of CGs increases.
  • Keywords
    data mining; document handling; graph theory; information retrieval; natural language processing; optimisation; NP-complete problem; conceptual graph interchange format; conceptual graphs; dice-coefficient similarity function; dissimilarity algorithm; graph matching; graphical text representation; text mining; text outliers; Character generation; Humans; Information science; Optimization methods; Performance analysis; outlier detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining and Optimization, 2009. DMO '09. 2nd Conference on
  • Conference_Location
    Kajang
  • Print_ISBN
    978-1-4244-4944-6
  • Type

    conf

  • DOI
    10.1109/DMO.2009.5341910
  • Filename
    5341910
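
The abstract describes comparing every CG against one standard CG instead of matching graphs pairwise, which keeps the cost linear in the number of CGs. The record does not give the authors' dissimilarity function, so the sketch below is only an illustration under stated assumptions: each CG is reduced to a set of (concept, relation, concept) triples, dissimilarity is taken as one minus the Dice coefficient (the baseline measure the abstract mentions, not the proposed one), and the threshold, the function names and the sample triples are all hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a conceptual graph is simplified to a set of
# (concept, relation, concept) triples. Real CGIF carries more structure.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def dice_similarity(a: Graph, b: Graph) -> float:
    """Dice coefficient between two triple sets: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def dissimilarity(graph: Graph, standard: Graph) -> float:
    """Illustrative dissimilarity: 1 - Dice. Not the paper's function."""
    return 1.0 - dice_similarity(graph, standard)


def find_outliers(graphs: Sequence[Graph], standard: Graph,
                  threshold: float = 0.5) -> List[int]:
    """Return indices of graphs whose dissimilarity exceeds the threshold.

    Each graph is compared once against the standard graph, so the number
    of comparisons grows linearly with the number of graphs.
    """
    return [i for i, g in enumerate(graphs)
            if dissimilarity(g, standard) > threshold]


# Hypothetical standard CG and three hypothetical document CGs.
standard: Graph = frozenset({
    ("Profit", "attr", "Increase"),
    ("Company", "agent", "Report"),
    ("Revenue", "attr", "Increase"),
})
graphs: List[Graph] = [
    standard,
    frozenset({
        ("Profit", "attr", "Increase"),
        ("Company", "agent", "Report"),
        ("Revenue", "attr", "Decrease"),
    }),
    frozenset({
        ("Loss", "attr", "Increase"),
        ("Auditor", "agent", "Resign"),
    }),
]

outliers = find_outliers(graphs, standard)
print(outliers)
```

With these sample sets, only the third graph shares no triples with the standard CG, so it is the only one flagged at the assumed threshold of 0.5.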