Topological models of document-query sets in retrieval for Enterprise Information Management

Author

Deolalikar, Vinay

fYear

2014

fDate

27-30 Oct. 2014

Firstpage

18

Lastpage

23

Abstract

The tasks, challenges, and techniques of Information Retrieval (IR) should reflect the structure of the underlying document-query sets and the needs of the domain. Are document-query sets obtained from the enterprise domain fundamentally different from standard research corpora gathered from the web? To identify, understand, and characterize such structural differences, we build a framework that uses point-set topology to analyze document-query sets. Our framework tailors topological notions such as subbasis, cover, and compactness to IR. Unlike previous topological approaches, we use the reverse of the relevance map to topologize the set of queries, not the set of documents. We show that the topological approach exposes sharp differences between enterprise document-query sets and standard research document-query sets collected from the web. These differences readily motivate research into new retrieval tasks that are of commercial importance in Enterprise Information Management (EIM).
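
As a rough illustration of the idea in the abstract, the sketch below shows one plausible reading of "using the reverse of the relevance map to topologize the set of queries": each document is sent to the set of queries it is relevant to, and the resulting family of query sets is treated as a candidate subbasis. The abstract does not give the paper's definitions, so the function names (`inverse_relevance`, `is_cover`, `greedy_subcover`), the dictionary representation of the relevance map, and the greedy subcover heuristic are all assumptions made for illustration only.

```python
from collections import defaultdict


def inverse_relevance(relevance):
    """Send each document to the set of queries it is relevant to.

    `relevance` maps a query id to the set of document ids judged relevant
    (a hypothetical representation, not taken from the paper).
    """
    inverse = defaultdict(set)
    for query, docs in relevance.items():
        for doc in docs:
            inverse[doc].add(query)
    return dict(inverse)


def is_cover(subbasis, queries):
    """Check whether the query sets in `subbasis` jointly contain every query."""
    covered = set().union(*subbasis.values())
    return covered >= set(queries)


def greedy_subcover(subbasis, queries):
    """Pick documents whose query sets cover all queries (greedy heuristic).

    Returns a list of document ids, or None when no cover exists. The greedy
    rule is an assumed stand-in and does not guarantee a minimum subcover.
    """
    remaining = set(queries)
    chosen = []
    while remaining:
        best = max(subbasis, key=lambda d: len(subbasis[d] & remaining), default=None)
        if best is None or not (subbasis[best] & remaining):
            return None  # some query is relevant to no document
        chosen.append(best)
        remaining -= subbasis[best]
    return chosen


# Toy relevance judgments (made-up data for illustration).
relevance = {"q1": {"d1", "d2"}, "q2": {"d2"}, "q3": {"d3"}}
subbasis = inverse_relevance(relevance)
subcover = greedy_subcover(subbasis, relevance.keys())
```

In this toy setting, the size of a small subcover gives a crude, finite analogue of the covering notions the abstract mentions; how the paper itself defines and measures them is not stated in this record.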

Keywords

Internet; document handling; information management; query processing; topology; EIM; IR; Web-collected standard research document-query sets; enterprise information management; information retrieval; relevance map; topological models; topological notions; Benchmark testing; Indexes; Law; Standards

fLanguage

English

Publisher

IEEE

Conference_Title

2014 IEEE International Conference on Big Data (Big Data)

Conference_Location

Washington, DC

Type

conf

DOI

10.1109/BigData.2014.7004426

Filename

7004426