Abstract:
The tasks, challenges, and techniques of Information Retrieval (IR) should reflect both the structure of the underlying document-query sets and the needs of the domain. Are document-query sets obtained from the enterprise domain fundamentally different from the standard research corpora gathered from the web? To identify, understand, and characterize such structural differences, we build a framework that uses point-set topology to analyze document-query sets. Our framework tailors topological notions such as subbasis, cover, and compactness to IR. Unlike previous topological approaches, we use the reverse of the relevance map to topologize the set of queries rather than the set of documents. We show that the topological approach exposes sharp differences between enterprise document-query sets and standard research document-query sets collected from the web. These differences motivate research into new retrieval tasks of commercial importance in Enterprise Information Management (EIM).
Keywords:
Internet; document handling; information management; query processing; topology; EIM; IR; Web-collected standard research document-query sets; enterprise information management; information retrieval; relevance map; topological models; topological notions; Benchmark testing; Indexes; Law; Standards