Abstract:
This research introduces a new paradigm for managing documents: dynamic document organization (DDO). DDO manages documents automatically under the assumption that the topic structure of a document collection is always variable and temporary. This contrasts with static document organization (SDO), the paradigm currently in use, which assumes that the topic structure is fixed and permanent. In this work, we consider two DDO-based scenarios for managing documents: a three-phase scenario and a four-phase scenario. In both, text clustering, cluster identification, and document classification are integrated into a cycle. The four-phase scenario adds one more phase, classifier training, between cluster identification and document classification. The goal of this research is to evaluate the two proposed scenarios and to compare the better of the two with its best SDO counterpart. We show that the four-phase DDO scenario is more reliable than the three-phase DDO scenario and that it generally outperforms the best SDO scenario.
Keywords:
document handling; cluster identification; document classification; document collections; dynamic document organization; four-phase scenarios; static document organization; text clustering; three-phase scenarios; clustering algorithms; content management; maintenance; management training; prototypes; support vector machines; switches; text categorization; text mining
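To make the structure of the DDO cycle concrete, here is a minimal, stdlib-only Python sketch of one pass through it. It rests on assumptions that the abstract does not specify: raw term-count vectors, cosine k-means for text clustering, labeling each cluster by its most frequent terms for cluster identification, multinomial Naive Bayes as the classifier in the four-phase scenario's training phase, and nearest-centroid assignment when that phase is skipped (the three-phase scenario). All function names are hypothetical. The sketch illustrates how the phases connect, not the authors' implementation.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag of words on lowercased whitespace tokens; a real system would also
    # strip punctuation, drop stop words, and weight terms (e.g. TF-IDF).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, k, iterations=10):
    """Phase 1, text clustering: cosine k-means with farthest-point seeding."""
    seeds = [0]
    while len(seeds) < k:
        # Next seed: the document least similar to every seed chosen so far.
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [Counter(vectors[s]) for s in seeds]
    assignment = []
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                # Summed term counts; cosine similarity ignores vector length.
                centroids[c] = sum(members, Counter())
    return assignment, centroids


def identify(centroids, top_n=2):
    """Phase 2, cluster identification: label each cluster by its top terms."""
    return ["/".join(t for t, _ in c.most_common(top_n)) for c in centroids]


def train_naive_bayes(vectors, labels):
    """Classifier training (four-phase scenario only): multinomial Naive Bayes
    trained on documents labeled with the cluster they fell into."""
    vocab = {t for v in vectors for t in v}
    model = {}
    for label in set(labels):
        docs = [v for v, lab in zip(vectors, labels) if lab == label]
        counts = sum(docs, Counter())
        prior = math.log(len(docs) / len(vectors))
        model[label] = (prior, counts, sum(counts.values()))
    return model, vocab


def classify_nb(model, vocab, vector):
    """Document classification with the trained classifier."""
    def score(label):
        prior, counts, total = model[label]
        # Laplace smoothing; terms never seen in training are ignored.
        return prior + sum(n * math.log((counts[t] + 1) / (total + len(vocab)))
                           for t, n in vector.items() if t in vocab)
    return max(model, key=score)


def classify_centroid(centroids, vector):
    """Document classification without a trained classifier: nearest centroid."""
    return max(range(len(centroids)), key=lambda c: cosine(vector, centroids[c]))


def ddo_cycle(collection, incoming, k, four_phase=True):
    vectors = [vectorize(d) for d in collection]
    assignment, centroids = cluster(vectors, k)
    labels = identify(centroids)
    if four_phase:
        model, vocab = train_naive_bayes(vectors, assignment)
        predicted = [classify_nb(model, vocab, vectorize(d)) for d in incoming]
    else:
        predicted = [classify_centroid(centroids, vectorize(d)) for d in incoming]
    # Classified documents join the collection, and the next cycle re-clusters
    # everything, so no topic structure is treated as permanent.
    return [labels[p] for p in predicted], collection + incoming
```

For example, with `collection = ["goal match striker goal", "match referee goal penalty", "election vote senate vote", "senate election campaign vote"]` and `incoming = ["striker goal penalty", "vote campaign senate"]`, calling `ddo_cycle(collection, incoming, k=2)` assigns the two incoming documents to the topics labeled `goal/match` and `vote/election`. Feeding the returned collection into the next call is what distinguishes DDO from SDO in this sketch: the clusters, and therefore the topic labels and the classifier, are rebuilt on every pass instead of being fixed once.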