DIGITAL LIBRARY ARCHIVE
Optimal Selection of Classifier Ensemble Using Genetic Algorithms
Myung-Jong Kim (Department of Business, Pusan National University)
Vol. 16, No. 4, pp. 99~112
Keywords
Neural Networks, Ensemble, Genetic Algorithms
Abstract
Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in machine learning and artificial intelligence because of its remarkable performance improvements and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and support vector machines (SVM). In this body of research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown improvements as remarkable as those of DT ensembles. Recently, several works have reported that ensemble performance can degrade when the classifiers in an ensemble are highly correlated with one another, producing a multicollinearity problem, and have proposed differentiated learning strategies to cope with this degradation. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that it contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms, but does not yield remarkable improvement for stable ones. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers. Ensembles of unstable learning algorithms can therefore guarantee some diversity among the classifiers.
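The combining step described above can be sketched with simple majority voting. This is a minimal illustrative example, not the paper's setup: the three threshold "classifiers" and the function names are hypothetical stand-ins for moderately accurate base learners.

```python
# Minimal sketch of combining weak classifiers by majority vote:
# the ensemble's prediction is the label chosen by most members.
from collections import Counter

def majority_vote(classifiers, x):
    """Combine base classifiers on input x by simple majority voting."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three illustrative weak classifiers on a toy one-dimensional input.
clf_a = lambda x: 1 if x > 0.3 else 0
clf_b = lambda x: 1 if x > 0.5 else 0
clf_c = lambda x: 1 if x > 0.7 else 0

print(majority_vote([clf_a, clf_b, clf_c], 0.6))  # two of three vote 1 -> 1
```

Each member can err on different inputs, so the vote is correct whenever a majority is correct, which is the intuition behind the diversity requirement discussed above.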
By contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, so the correlation among the resulting classifiers is very high. This high correlation produces a multicollinearity problem, which degrades ensemble performance. Kim (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM for bankruptcy prediction on Korean firms, reporting that the stable NN and SVM have higher predictability than the unstable DT, whereas in ensemble learning the DT ensemble shows a greater performance improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically showed that the ensembles' performance degradation is due to multicollinearity, and proposed that ensemble optimization is needed to cope with this problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) to improve NN ensemble performance. Coverage optimization chooses a sub-ensemble from an original ensemble so as to guarantee the diversity of its classifiers. CO-NN uses a genetic algorithm (GA), which has been widely applied to various optimization problems, to solve the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates whether an individual classifier is included. The fitness function is defined as the maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the standard measures of multicollinearity, is added to ensure classifier diversity by removing highly correlated classifiers. We use Microsoft Excel and the GA software package Evolver.
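The encoding and fitness scheme described above can be sketched as follows. The paper itself uses the commercial Evolver package, so everything below is an illustrative reimplementation under assumed choices: the toy data, the VIF cutoff of 10 (a common rule of thumb), the penalty weight, and the GA parameters are all hypothetical, not the authors' settings.

```python
# Hypothetical sketch of GA-based coverage optimization: a binary
# chromosome selects a sub-ensemble; fitness rewards ensemble accuracy
# and penalizes selected members whose VIF exceeds a cutoff.
import random
import numpy as np

rng = np.random.default_rng(0)
random.seed(0)

# Toy data: outputs of 8 correlated base classifiers on 200 cases,
# each a noisy copy of the true labels y (25% of bits flipped).
y = rng.integers(0, 2, 200)
preds = np.array([y ^ (rng.random(200) < 0.25).astype(int)
                  for _ in range(8)])

def vif(outputs):
    """VIF of each member: regress its outputs on the other members'."""
    vifs = []
    for i in range(len(outputs)):
        target = outputs[i]
        others = np.delete(outputs, i, axis=0).T
        X = np.column_stack([np.ones(len(target)), others])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        r2 = 1 - (target - X @ beta).var() / target.var()
        vifs.append(1.0 / max(1 - r2, 1e-9))
    return np.array(vifs)

def fitness(bits, vif_cut=10.0):
    idx = np.flatnonzero(bits)
    if len(idx) < 2:          # a sub-ensemble needs at least 2 members
        return 0.0
    vote = (preds[idx].mean(axis=0) > 0.5).astype(int)
    acc = (vote == y).mean()
    penalty = (vif(preds[idx]) > vif_cut).sum()   # VIF constraint
    return acc - 0.1 * penalty

# Plain generational GA: elitism, tournament-style parent choice,
# one-point crossover, bit-flip mutation.
pop = [rng.integers(0, 2, 8) for _ in range(20)]
for _ in range(30):
    scored = sorted(pop, key=fitness, reverse=True)
    nxt = scored[:2]                              # keep the two best
    while len(nxt) < len(pop):
        a, b = random.sample(scored[:10], 2)
        cut = random.randrange(1, 8)
        child = np.concatenate([a[:cut], b[cut:]])
        child = child ^ (rng.random(8) < 0.05).astype(int)
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print("selected classifiers:", np.flatnonzero(best))
```

The penalty term turns the VIF constraint into a soft constraint; a hard constraint (fitness of zero when any VIF exceeds the cutoff) is an equally plausible reading of the abstract.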
Experiments on company failure prediction show that CO-NN stably enhances the performance of NN ensembles by choosing classifiers with the correlations within the ensemble taken into account. Classifiers with potential multicollinearity problems are removed by the coverage optimization process, and CO-NN accordingly outperforms a single NN classifier and the NN ensemble at the 1% significance level, and the DT ensemble at the 5% significance level. Several research issues remain. First, a decision optimization process to find the optimal combination function should be considered in further research. Second, various learning strategies for dealing with data noise should be introduced in future work.
Detailed Information in Korean (translated)
Optimal Selection of Classifier Ensemble Using Genetic Algorithms
Myung-Jong Kim (Department of Business, Pusan National University)
Abstract
Ensemble learning is a machine learning technique proposed to improve the performance of classification and prediction algorithms. However, it has been pointed out that when the base classifiers lack diversity, a multicollinearity problem can make the performance gain marginal or even degrade performance. This study proposes a genetic-algorithm-based coverage optimization technique to secure diversity among the base classifiers and to strengthen the performance improvement of ensemble learning. Applying the proposed optimization technique to an artificial neural network ensemble for corporate failure prediction showed that base-classifier diversity was secured and the performance of the neural network ensemble improved significantly.
Cite this article
JIIS Style
Kim, M.-J., "Optimal Selection of Classifier Ensemble Using Genetic Algorithms", Journal of Intelligence and Information Systems, Vol. 16, No. 4 (2010), 99~112.

IEEE Style
Myung-Jong Kim, "Optimal Selection of Classifier Ensemble Using Genetic Algorithms", Journal of Intelligence and Information Systems, vol. 16, no. 4, pp. 99~112, 2010.

ACM Style
Kim, M.-J., 2010. Optimal Selection of Classifier Ensemble Using Genetic Algorithms. Journal of Intelligence and Information Systems. 16, 4, 99--112.
Export Formats: BibTeX, EndNote
@article{Kim:JIIS:2010:422,
author = {Kim, Myung-Jong},
title = {Optimal Selection of Classifier Ensemble Using Genetic Algorithms},
journal = {Journal of Intelligence and Information Systems},
issue_date = {December 2010},
volume = {16},
number = {4},
month = dec,
year = {2010},
issn = {2288-4866},
pages = {99--112},
url = {},
doi = {},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {Neural Networks, Ensemble, Genetic Algorithms},
}
%0 Journal Article
%1 422
%A Myung-Jong Kim
%T Optimal Selection of Classifier Ensemble Using Genetic Algorithms
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 16
%N 4
%P 99-112
%D 2010
%R
%I Korea Intelligent Information System Society