International Journal of Computational Intelligence Research (IJCIR)

Volume 3, Number 1 (2007)

 


RVM ensemble for text classification



Silva Catarina
School of Technology and Management of the Polytechnic Institute of Leiria, Morro do Lena - Alto do Vieiro, P-2411-901 Leiria, Portugal.

Ribeiro Bernardete
Department of Informatics Engineering, Center for Informatics and Systems, University of Coimbra, Polo II, P-3030-290 Coimbra, Portugal.

 

Abstract
Automated classification of texts by their likeness or affinity has greatly eased the management and processing of the massive volumes of information we face every day. Although Support Vector Machines (SVM) provide a state-of-the-art technique to tackle this problem, Relevance Vector Machines (RVM), which rely on Bayesian inference learning, offer advantages such as their capacity to find sparser and probabilistic solutions. A known problem with Bayesian approaches, however, is their relative inability to scale to larger problems, where millions of documents are involved and user requests must be answered in real time.

We propose an ensemble strategy to circumvent the RVM scalability problem by applying a divide-and-conquer technique to handle the overload of available data: the training documents are divided amongst small RVM classifiers, and the ensemble then combines their individual contributions. The resulting solution keeps a sparse decision function and is computationally efficient. Results on Reuters-21578 clearly demonstrate that the proposed strategy can surpass other techniques in terms of both classification performance and response time.
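
The divide-and-conquer idea can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a simple nearest-centroid classifier standing in for an RVM, the round-robin split and the majority-vote combination rule are assumptions (the abstract does not specify either), and all function names are hypothetical.

```python
from collections import Counter

def train_centroid(X, y):
    """Stand-in base learner (NOT an RVM): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(model[c], x))
    return min(model, key=dist)

def train_ensemble(X, y, n_parts):
    """Divide step: split the training set round-robin into n_parts chunks
    and train one small base classifier per chunk (assumed split scheme)."""
    idx = list(range(len(X)))
    chunks = [idx[k::n_parts] for k in range(n_parts)]
    return [train_centroid([X[i] for i in ch], [y[i] for i in ch])
            for ch in chunks if ch]

def predict_ensemble(models, x):
    """Combine step: simple majority vote over the base predictions
    (assumed combination rule)."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy two-feature "documents": class 0 near the origin, class 1 near (1, 1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
     [1.0, 0.9], [0.9, 1.0], [1.1, 1.0], [1.0, 1.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = train_ensemble(X, y, n_parts=3)
label_a = predict_ensemble(models, [0.05, 0.05])
label_b = predict_ensemble(models, [0.95, 1.00])
```

Each base classifier sees only a fraction of the training documents, so the cost of training any single model stays small; in the setting described above, an RVM would take the place of the nearest-centroid learner.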

Keywords
Text classification, Relevance Vector Machines, Ensembles, Scaling Machine Learning Algorithms.
