  • DocumentCode
    120709
  • Title
    An analysis into using unstructured non-expert text in the illicit drug domain
  • Author
    Carter, Bill; Hofmann, Martin
  • Author_Institution
    Garda Siochana Analysis Service, An Garda Siochana, Dublin, Ireland
  • fYear
    2014
  • fDate
    21-22 Feb. 2014
  • Firstpage
    651
  • Lastpage
    657
  • Abstract
    The Pillreports.com database was mined to determine whether its free-text fields could help differentiate regular pills from adulterated ones, i.e. pills that contain ingredients not comparable to MDMA. The data were downloaded and extracted using RapidMiner and XPath queries. Naive Bayes and SVM binary classification models were created. Pre-processing techniques (tokenisation, n-gram creation, stop-word removal and stemming), together with feature selection by weights, were applied to the data, resulting in a 15-point improvement in model performance. In addition, a comprehensive cluster analysis is reported. Frequent terms and differences between clusters were visualised using word clouds, and clusters were compared with the values contained in nominal fields. Model results and their interpretation are provided at various pre-processing stages. Key phrase extraction is identified as an area for possible future work.
  • Keywords
    Bayes methods; database management systems; drugs; feature selection; natural language processing; pattern classification; pattern clustering; query processing; support vector machines; text analysis; word processing; MDMA; Naive Bayes classification model; RapidMiner queries; SVM binary classification model; Xpath queries; cluster analysis; database mining; free-text fields; illicit drug domain; key phrase extraction; n-gram creation; pill differentiation; preprocessing techniques; stop-word removal; tokenisation; unstructured nonexpert text analysis; word clouds; Accuracy; Computational modeling; Data mining; Databases; Support vector machines; Tag clouds; Vectors; Text analysis; classification; web content mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE International Advance Computing Conference (IACC)
  • Conference_Location
    Gurgaon
  • Print_ISBN
    978-1-4799-2571-1
  • Type
    conf
  • DOI
    10.1109/IAdCC.2014.6779401
  • Filename
    6779401