Abstract:
Summary form only given. A number of problems in information retrieval (IR) and natural language processing (NLP) can be approached using graph theory. Representative examples in IR include Brin and Page's PageRank and Kleinberg's HITS, which rank documents using graph-based random walk models. In NLP, one could mention Pang and Lee's work on sentiment analysis using graph min-cuts, Mihalcea's work on word sense disambiguation, Zhu et al.'s label propagation algorithms, Toutanova et al.'s prepositional attachment algorithm, and McDonald et al.'s dependency parsing algorithm using minimum spanning trees. In this talk I will briefly summarize three graph-based algorithms developed recently at the University of Michigan: (a) LexRank, a method for multidocument summarization based on random walks on lexical centrality graphs; (b) TUMBL, a generic method for semi-supervised learning using bipartite graphs; and (c) biased LexRank, a semi-supervised passage ranking technique for information retrieval. I will then discuss the applicability of such techniques to other problems in NLP and IR.
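To make the "random walks on lexical centrality graphs" idea concrete, here is a minimal, hypothetical sketch of a LexRank-style centrality computation: sentences are represented as toy bag-of-words vectors, a cosine-similarity graph is built over them, and PageRank-style power iteration is run on the row-normalized similarity matrix. The function name, the damping value, and the toy data are illustrative assumptions, not taken from the talk.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse bag-of-words vectors (dicts).
    dot = sum(u[w] * v.get(w, 0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def lexrank_scores(vectors, damping=0.85, iters=50):
    # Hypothetical LexRank-style sketch: cosine-similarity graph over
    # sentences, then power iteration with uniform teleportation.
    n = len(vectors)
    sim = [[0.0 if i == j else cosine(vectors[i], vectors[j])
            for j in range(n)] for i in range(n)]  # no self-loops
    for i in range(n):
        s = sum(sim[i])
        # Row-normalize into a stochastic transition matrix;
        # a sentence similar to nothing transitions uniformly.
        sim[i] = [x / s for x in sim[i]] if s else [1.0 / n] * n
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [(1 - damping) / n
             + damping * sum(p[i] * sim[i][j] for i in range(n))
             for j in range(n)]
    return p

# Toy "sentences": the first three overlap lexically, the last is an outlier.
sentences = [
    {"graph": 1, "random": 1, "walk": 1},
    {"graph": 1, "centrality": 1},
    {"random": 1, "walk": 1, "graph": 1},
    {"cat": 1},
]
scores = lexrank_scores(sentences)
```

In a summarizer, the highest-scoring sentences would be extracted; here the lexically central sentences score highest and the outlier lowest.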
Keywords:
graph theory; information retrieval; learning (artificial intelligence); natural language processing; bipartite graphs; graph-based random walk models; lexical centrality graphs; minimum spanning trees; multidocument summarization; semi-supervised learning; algorithm design and analysis; tree graphs;