• DocumentCode
    3544239
  • Title

    Measuring semantic similarity between digital forensics terminologies using web search engines

  • Author

    Karie, Nickson M. ; Venter, Hein S.

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Pretoria, Pretoria, South Africa
  • fYear
    2012
  • fDate
    15-17 Aug. 2012
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    Semantic similarity between different terminologies is becoming a generic problem that extends across numerous domains, touching applications developed for computational linguistics, artificial intelligence, cognitive science and, in the case of this paper, digital forensics. Despite the usefulness of semantic similarity measures in different domains, accurately measuring semantic similarity between any two terms remains a challenging task. The main difficulty lies in developing a computational method that generates results close to how human beings perceive these terminologies, especially when used in their domain of expertise. This paper presents a novel approach that uses the Web to measure semantic similarity between two terms x and y in the digital forensics domain. The proposed approach is based on the Euclidean distance, a mathematical concept used to calculate the distance between two points. This paper also shows how the absolute value of the difference of the logarithms of the hit count percentages of any given terms x and y relates to the computed Euclidean distance of x and y. Percentages are computed from the total number of hit counts reported by any Web search engine for the search terms x, y and the logical x AND y together. Finally, these concepts are used to deduce a formula to automatically calculate a semantic similarity measure coined the Digital Forensic Absolute Semantic Similarity Value of the terms x and y, denoted as DFASSV(x, y). Experiments conducted using the proposed DFASSV method focus on the digital forensics domain. A comparison of the DFASSV approach with previously proposed Web-based semantic similarity measures shows that this approach is well suited for digital forensics domain terminologies. In the authors' opinion, however, the DFASSV approach can be applied in other domains as well because it does not require any human-annotated knowledge. DFASSV is a novel approach to semantic similarity measurement and constitutes the main contribution of this paper.
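    The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DFASSV formula, which the abstract does not state. It only computes the hit count percentages and the absolute difference of their logarithms, and checks that this value equals the one-dimensional Euclidean distance between the two logarithms. The function names, the base-10 logarithm, the choice of the sum of the three hit counts as the total, and the sample hit counts are all assumptions made for illustration.

```python
import math


def hit_percentages(hits_x, hits_y, hits_xy):
    """Express each hit count as a percentage of the combined total.

    Assumption: the "total" is the sum of the hit counts for x, y and
    the query "x AND y"; the abstract does not spell this out.
    """
    counts = (hits_x, hits_y, hits_xy)
    if any(h <= 0 for h in counts):
        raise ValueError("hit counts must be positive to take logarithms")
    total = sum(counts)
    return tuple(100.0 * h / total for h in counts)


def abs_log_difference(p_x, p_y):
    """Absolute value of the difference of the logarithms (base 10 assumed)."""
    return abs(math.log10(p_x) - math.log10(p_y))


def euclidean_distance_1d(a, b):
    """Euclidean distance between two points on a line."""
    return math.sqrt((a - b) ** 2)


# Hypothetical hit counts, not taken from any real search engine.
p_x, p_y, p_xy = hit_percentages(600, 300, 100)
log_gap = abs_log_difference(p_x, p_y)
distance = euclidean_distance_1d(math.log10(p_x), math.log10(p_y))
print(p_x, p_y, p_xy)
print(math.isclose(log_gap, distance))
```

    In one dimension the Euclidean distance reduces to an absolute difference, which is why the two values agree. How the paper combines these quantities into DFASSV(x, y) is left open here.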
  • Keywords
    Internet; computer forensics; search engines; DFASSV x, y; Euclidean distance; Web search engines; artificial intelligence; cognitive science; computational linguistics; digital forensic absolute semantic similarity value terms x and y; digital forensics terminologies; logical x AND y; mathematical concept; semantic similarity measurement; Digital forensics; Engines; Equations; Euclidean distance; Mathematical model; Semantics; Web search; Euclidean distance; Semantic similarity; Web; Web search engines; absolute value; digital forensic domain terminologies; digital forensics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Security for South Africa (ISSA), 2012
  • Conference_Location
    Johannesburg, Gauteng
  • Print_ISBN
    978-1-4673-2160-0
  • Type

    conf

  • DOI
    10.1109/ISSA.2012.6320448
  • Filename
    6320448