Author/Authors :
Veronica L. Policicchio, Adriana Pietramala, Pasquale Rullo
Abstract :
While rule-based text classifiers have a long history, to the best of our knowledge no M-of-N-based approach to text categorization has so far been proposed. In this paper we argue that M-of-N hypotheses are particularly suitable for modeling the text classification task because of the so-called “family resemblance” metaphor: “the members (i.e., documents) of a family (i.e., category) share some small number of features, yet there is no common feature among all of them. Nevertheless, they resemble each other”. Starting from this conjecture, we provide a sound extension of the M-of-N approach with negation and disjunction, called M-of-N^{¬,∨}, which makes it possible to best fit the true structure of the data. Based on a thorough theoretical study, we show that the M-of-N^{¬,∨} hypothesis space has two partial orders that form complete lattices.
GAMoN is the task-specific Genetic Algorithm (GA) that, by exploiting the lattice-based structure of the hypothesis space, efficiently induces accurate M-of-N^{¬,∨} hypotheses.
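As an illustration of the "family resemblance" idea, the following minimal Python sketch evaluates a plain M-of-N rule: a document is assigned to a category when at least M of N chosen terms occur in it. It is not the authors' implementation and does not model the negation and disjunction extension or the genetic search; the function name `m_of_n`, the term set, and the sample documents are all hypothetical.

```python
def m_of_n(document_terms, feature_terms, m):
    """Return True if at least m of the feature terms occur in the document."""
    matches = sum(1 for term in feature_terms if term in document_terms)
    return matches >= m


# Hypothetical category "sport" described by N = 5 terms, with threshold M = 2.
features = {"goal", "match", "team", "league", "referee"}

# Hypothetical documents, each represented as a set of terms.
doc_a = {"the", "team", "won", "match"}
doc_b = {"referee", "league", "goal"}
doc_c = {"market", "team", "shares"}

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b), ("doc_c", doc_c)]:
    print(name, m_of_n(doc, features, 2))
```

Here doc_a and doc_b both satisfy the 2-of-5 rule although they have no feature term in common, which is the situation the metaphor describes; doc_c matches only one term and is rejected.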