Title :
Deriving Link Context through Dependency Analysis
Author :
Jing, Tao ; Peng, Tao ; Zuo, Wanli
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
Abstract :
Link context is a beneficial complement to the anchor text when predicting the topic of the target Web page. In this paper, we define the link context of an anchor text as the set of words that have a dependency relationship with it, and we propose an effective method for extracting it. First, we decompose the whole sentence into sub-clauses through dependency analysis; each sub-clause represents a semantic group. Second, we find the sub-clause set of each anchor. Finally, from that set we choose the one sub-clause that contains the anchor text and meets the selection rule, and take it as the link context of the anchor. To the best of our knowledge, this is the first time link context has been derived with a natural language processing (NLP) technique, namely dependency analysis of the sentence. Preliminary results show that this method significantly improves the quality of the extracted link context and, in terms of text analysis, can compensate for the deficiencies of heuristic methods based on the HTML structure of the Web page.
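The three steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither a parser nor a label set, so the parse is assumed to arrive as head indices with Universal Dependencies style labels; the clause-boundary labels in `CLAUSE_LABELS` are an assumption; and the selection rule used here (pick the sub-clause that holds the anchor's syntactic head) is hypothetical, because the abstract does not state the paper's actual rule. All names (`Token`, `decompose`, `subclause_set`, `link_context`) are illustrative.

```python
from dataclasses import dataclass

# Assumption: UD-style labels that open a new sub-clause.
CLAUSE_LABELS = {"ccomp", "advcl", "acl:relcl", "parataxis"}


@dataclass(frozen=True)
class Token:
    word: str
    head: int    # index of the governing token, -1 for the root
    deprel: str  # dependency label, e.g. "nsubj", "obj"


def is_clause_head(tokens, i):
    """The root and any token attached by a clause-level label start a sub-clause."""
    return tokens[i].head == -1 or tokens[i].deprel in CLAUSE_LABELS


def clause_of(tokens, i):
    """Walk up the head chain until reaching the nearest clause head."""
    while not is_clause_head(tokens, i):
        i = tokens[i].head
    return i


def decompose(tokens):
    """Step 1: partition the sentence into sub-clauses keyed by clause-head index."""
    clauses = {}
    for i in range(len(tokens)):
        clauses.setdefault(clause_of(tokens, i), []).append(i)
    return clauses


def subclause_set(tokens, anchor):
    """Step 2: every sub-clause that contains at least one anchor token."""
    return {clause_of(tokens, i) for i in anchor}


def anchor_head(tokens, anchor):
    """The anchor token whose head lies outside the anchor span (or is the root)."""
    span = set(anchor)
    for i in anchor:
        if tokens[i].head not in span:
            return i
    return anchor[0]


def link_context(tokens, anchor):
    """Step 3 (hypothetical rule): from the anchor's sub-clause set, choose the
    sub-clause holding the anchor's head; return its words without punctuation."""
    chosen = clause_of(tokens, anchor_head(tokens, anchor))
    return [tokens[i].word for i in decompose(tokens)[chosen]
            if tokens[i].deprel != "punct"]


# Hand-built parse of "Download the free parser that the Stanford group released yesterday."
SENTENCE = [
    Token("Download", -1, "root"),
    Token("the", 3, "det"),
    Token("free", 3, "amod"),
    Token("parser", 0, "obj"),
    Token("that", 8, "obj"),
    Token("the", 7, "det"),
    Token("Stanford", 7, "compound"),
    Token("group", 8, "nsubj"),
    Token("released", 3, "acl:relcl"),
    Token("yesterday", 8, "obl:tmod"),
    Token(".", 0, "punct"),
]

if __name__ == "__main__":
    print(link_context(SENTENCE, [6, 7]))
    # ['that', 'the', 'Stanford', 'group', 'released', 'yesterday']
    print(sorted(subclause_set(SENTENCE, list(range(3, 10)))))
    # [0, 8]
    print(link_context(SENTENCE, list(range(3, 10))))
    # ['Download', 'the', 'free', 'parser']
```

In the usage above, the anchor "parser that the Stanford group released" spans two sub-clauses, so its sub-clause set has two members; the assumed rule resolves the choice by following the anchor's syntactic head ("parser") into the main clause. Walking up the head chain gives every token exactly one sub-clause, which matches the abstract's view of sub-clauses as disjoint semantic groups.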
Keywords :
Internet; hypermedia markup languages; natural language processing; text analysis; HTML structure; World Wide Web; anchor text; link context; sentence dependency analysis; target Web page; Analytical models; Computer science; Computer science education; Educational institutions; Educational technology; HTML; Web mining; Web pages; Web sites; parser;
Conference_Title :
Education Technology and Computer, 2009. ICETC '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3609-5
DOI :
10.1109/ICETC.2009.66