Abstract :
Full-text databases are tightly linked to the application layer. Currently, IR projects must be integrated in the back end using, at best, a general-purpose, language-independent API. This architecture hinders rapid prototyping. In this paper we present a new approach, based on a very simple architecture, to the development of a general-purpose full-text database. We implemented a standard inverted file index and extended it with several extra capabilities. To each stored document we simply added a set of qualifiers (MD5 hashes and keywords) that are algorithmically unrelated to the document content. This makes it possible to control access to documents hierarchically, to improve document categorization iteratively, and to add and delete annotations and document versions. All transactions are performed through a standard Web service interface, which facilitates system integration and testing. We describe a set of applications where our concept can be useful; they span the areas where document annotations are relevant. Once stored and annotated with qualifiers, documents can be retrieved by a combination of qualifiers and document content. Additionally, we show our prototype in action and explain how it can be extended to support the retrieval and storage models that have recently appeared on some popular sites.
Keywords :
full-text databases; indexing; information retrieval; MD5 hashes; annotation-driven information retrieval; document access control; document categorization; standard inverted file index; universal full-text index; Access control; Computer architecture; Content based retrieval; Databases; Prototypes; Software prototyping; System testing; Web search; Web services
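
Illustrative sketch (not part of the original abstract): the abstract describes an inverted file index in which each document also carries qualifiers, and retrieval combines qualifiers with content terms. The following minimal Python sketch shows one way such a structure could look. It is not the authors' implementation. The class `AnnotatedIndex`, the helper `md5_qualifier`, the qualifier labels, and the in-memory storage are all assumptions made for illustration, and the Web service interface is left out entirely.

```python
import hashlib
import re
from collections import defaultdict


def md5_qualifier(label: str) -> str:
    """Hypothetical helper: derive an MD5-hash qualifier from an arbitrary label."""
    return hashlib.md5(label.encode("utf-8")).hexdigest()


class AnnotatedIndex:
    """Toy inverted index with per-document qualifiers (assumed design)."""

    def __init__(self):
        self.postings = defaultdict(set)    # content term -> document ids
        self.qualifiers = defaultdict(set)  # qualifier -> document ids

    def add(self, doc_id, text, qualifiers=()):
        # Index the content terms, then attach the qualifiers separately;
        # qualifiers are never derived from the document text.
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)
        for qualifier in qualifiers:
            self.annotate(doc_id, qualifier)

    def annotate(self, doc_id, qualifier):
        # Add an annotation without touching the content postings.
        self.qualifiers[qualifier].add(doc_id)

    def unannotate(self, doc_id, qualifier):
        # Delete an annotation; the document itself stays indexed.
        self.qualifiers[qualifier].discard(doc_id)

    def search(self, terms=(), qualifiers=()):
        # A document matches only if it contains every term
        # and carries every requested qualifier.
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        sets += [self.qualifiers.get(q, set()) for q in qualifiers]
        if not sets:
            return set()
        return set.intersection(*sets)


index = AnnotatedIndex()
team = md5_qualifier("group:research")  # assumed access-control label
index.add("doc1", "Inverted file index for full-text retrieval", ["draft", team])
index.add("doc2", "Full-text retrieval over a web service", ["draft"])
index.annotate("doc2", "reviewed")
print(index.search(terms=["retrieval"], qualifiers=[team]))  # {'doc1'}
```

Because qualifiers live in their own mapping, annotations can be added or deleted without re-indexing the document content, which is the property the abstract relies on for iterative categorization and access control.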