• DocumentCode
    3717271
  • Title
    Macau: Large-scale skill sense disambiguation in the online recruitment domain
  • Author
    Qinlong Luo;Meng Zhao;Faizan Javed;Ferosh Jacob
  • Author_Institution
    Data Science R & D, 5550-A Peachtree Parkway, Norcross, GA 30092, USA
  • fYear
    2015
  • Firstpage
    1324
  • Lastpage
    1329
  • Abstract
    Named entity sense disambiguation is a problem with important natural language processing applications. In the online recruitment industry, normalization and recognition of occupational skills play a key role in linking the right candidate with the right job, and disambiguating multi-sense skills helps improve this normalization and recognition process. In this paper we discuss an automatic large-scale system to identify and disambiguate multi-sense skills. The system comprises: (1) Feature selection: employing word embeddings to represent skills and their contexts as vectors; (2) Clustering: applying Markov Chain Monte Carlo (MCMC) methods to aggregate vectors into clusters that represent the respective senses; (3) Large-scale processing: implementing parallelization to process text blobs at scale; (4) Pruning: cleaning clusters by analyzing intra-cluster cosine similarities. In experiments on sample datasets, the MCMC-based clustering algorithm outperforms other clustering algorithms on the disambiguation problem. In data-driven in-house evaluations, our disambiguation system achieves 84% precision.
  • Keywords
    "Context","Clustering algorithms","Servers","Recruitment","Resumes","Big data","Mathematical model"
  • Publisher
    ieee
  • Conference_Titel
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363890
  • Filename
    7363890
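
The pruning step named in the abstract (cluster cleaning by analyzing intra-cluster cosine similarities) could look roughly like the sketch below. The abstract does not give the actual rule, so everything here is an assumption made for illustration: the function names `cosine` and `prune_cluster`, the criterion (mean similarity to the other cluster members), and the `threshold` value of 0.5 are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prune_cluster(vectors, threshold=0.5):
    """Keep members whose mean cosine similarity to the other members
    is at least `threshold` (hypothetical criterion, not the paper's)."""
    if len(vectors) < 2:
        return list(vectors)  # nothing to compare against
    kept = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        if sum(sims) / len(sims) >= threshold:
            kept.append(v)
    return kept


# Toy cluster: three context vectors pointing the same way plus one outlier.
cluster = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0)]
pruned = prune_cluster(cluster, threshold=0.5)
```

In this toy cluster the last vector is nearly orthogonal to the other three, so its mean similarity falls below the threshold and it is dropped. A real system would operate on the embedding vectors of skill contexts and would need a threshold tuned on labeled data.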