• DocumentCode
    3105798
  • Title
    Dirichlet Aspect Weighting: A Generalized EM Algorithm for Integrating External Data Fields with Semantically Structured Queries by Using Gradient Projection Method
  • Author
    Velivelli, Atulya; Huang, Thomas S.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    633
  • Lastpage
    644
  • Abstract
    In this paper we address the problem of document retrieval with semantically structured queries, that is, queries in which each term has a tagged field label. We introduce the Dirichlet Aspect Weighting model, which integrates terms from external databases into the query language model in a Bayesian learning framework. In this model, the Dirichlet prior distribution is governed by parameters that depend on the number of fields in the external databases. The model requires the semantically structured query to be augmented with additional examples, which are obtained using pseudo relevance feedback. We formulate a log-likelihood function for the Dirichlet Aspect Weighting model and maximize it using a novel Generalized EM algorithm. On the TREC 2005 Genomics Track dataset, the Dirichlet Aspect Weighting model shows an improvement over baseline methods that use pseudo relevance feedback while incorporating terms from external databases.
  • Keywords
    SQL; belief networks; expectation-maximisation algorithm; information retrieval; query processing; relevance feedback; TREC 2005 Genomics Track dataset; Bayesian learning; Dirichlet aspect weighting; document retrieval; external data fields; external databases; generalized EM algorithm; gradient projection method; log-likelihood function; pseudo relevance feedback; query language model; semantically structured queries; Bayesian methods; Bioinformatics; Data engineering; Data mining; Database languages; Feedback; Genetic mutations; Genomics; Information retrieval; Natural languages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2006. ICDM '06. Sixth International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2701-7
  • Type
    conf
  • DOI
    10.1109/ICDM.2006.55
  • Filename
    4053089