Title :
Identifying result subdocuments of XML search conditions
Author :
Kinutani, Hiroko ; Yoshikawa, Masatoshi ; Uemura, Shunsuke
Author_Institution :
Graduate Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
Abstract :
XML is becoming widely used as a standard data format on the Web. Unlike SGML documents, XML documents are not required to have schemas. Since designing schemas for XML documents is not an easy task, a significant number of XML documents will be simply well-formed. We focus on miscellaneous well-formed XML documents with no common schemas. We articulate issues related to query processing of those well-formed XML documents using standard data formats or vocabularies as namespaces. We believe that an end-user's typical queries against XML databases will be very terse, like those found in current HTML search engines. However, unlike HTML search engines, XML database systems should return appropriate XML subdocuments as the granule of query results. We formulate a class of queries that is a counterpart of a simple class of queries in current HTML search engines. We then define a new function that serves as a basis for identifying appropriate XML subdocuments as results of such queries, and we introduce indices in order to process such queries efficiently.
Keywords :
electronic data interchange; hypermedia markup languages; information resources; information retrieval; naming services; search engines; HTML search engines; Web; XML database systems; XML databases; XML documents; XML search conditions; XML subdocuments; current HTML searching engines; namespaces; query processing; query results; result subdocuments; standard data format; standard data formats; typical queries; vocabularies; Books; Content based retrieval; Database systems; HTML; Information science; Query processing; SGML; Search engines; Vocabulary; XML;
Conference_Title :
Digital Libraries: Research and Practice, 2000 Kyoto, International Conference on.
Conference_Location :
Kyoto
Print_ISBN :
0-7695-1022-1
DOI :
10.1109/DLRP.2000.942182