• DocumentCode
    75092
  • Title
    Overcoming Asymmetry in Entity Graphs
  • Author
    Taesung Lee ; Young-rok Cha ; Seung-won Hwang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., POSTECH, Pohang, South Korea
  • Volume
    26
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 1, 2014
  • Firstpage
    3051
  • Lastpage
    3063
  • Abstract
    This paper studies the problem of mining named entity translations by aligning comparable corpora. Current state-of-the-art approaches mine a translation pair by aligning an entity graph in one language to an entity graph in another, based on node similarity or the propagated similarity of related entities. However, because they build on an assumption of “symmetry”, they quickly deteriorate on “weakly” comparable corpora that exhibit some asymmetry. In this paper, we pursue two directions for overcoming relation asymmetry and entity asymmetry, respectively. The first approach starts from weakly comparable corpora (for high recall), then ensures precision by propagating selectively, only to entities of symmetric relations. The second approach starts from parallel corpora (for high precision), then enhances recall by extending the translation matrix based on node similarity and contextual similarity. Our experimental results on English-Chinese corpora show that both approaches are effective and complementary. Our combined approach outperforms the best-performing baseline in terms of F1-score by up to 0.28.
  • Keywords
    data mining; entity-relationship modelling; graph theory; knowledge engineering; English-Chinese corpora; F1-score; contextual similarity; entity graphs; knowledge engineering methodologies; node similarity; parallel corpora; translation matrix; Context modeling; Electronic publishing; Internet; Semantics; Knowledge modeling; entity translation
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2316799
  • Filename
    6787004