A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification

Document Type

Article

Publication Date

4-15-2021

Abstract

A stacked ensemble, which uses a meta-learner to combine (stack) the predictions of multiple base classifiers, performs suboptimally on imbalanced classification. To improve the classification performance of stacked ensembles on imbalanced datasets, we propose a method named Neighborhood Undersampling Stacked Ensemble (NUS-SE). NUS-SE consists of two proposed components: an undersampling-based stacked ensemble framework (US-SE) and an undersampling technique. The metadata generation step of a stacked ensemble commonly uses a cross-validation-like procedure (CV-prediction). However, when undersampling is performed within a stacked ensemble that uses CV-prediction for metadata generation, the resulting metadata is incomplete, with missing prediction values. In the proposed US-SE component, we therefore replace the standard CV-prediction procedure with our proposed Subset and Out-of-Subset (S-OOS) prediction procedure as the metadata generation method. The S-OOS prediction procedure generates metadata without missing prediction values, thus enabling the integration of undersampling within a stacked ensemble. With undersampling integrated, multiple undersampled data subsets are used to train the base learners of US-SE. For the undersampling component, we further propose a novel undersampling technique, Neighborhood Undersampling (NUS), which selects majority instances based on their local neighborhood information. The performance of NUS-SE is evaluated against non-resampling-based stacked ensembles as baseline methods. The experiments demonstrate that the proposed NUS-SE, an undersampling-based stacked ensemble, achieves better performance than the non-resampling-based stacked ensembles.
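The abstract does not spell out how S-OOS prediction works, so the following is only a minimal sketch of one plausible reading: instances inside an undersampled subset receive out-of-fold predictions, and instances outside the subset are scored by a model trained on the whole subset. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the toy nearest-centroid base learner, the synthetic data, and the use of random undersampling as a stand-in for NUS.

```python
import random

def fit_centroids(X, y):
    """Toy base learner (assumption): store the mean feature vector per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def score(cents, x):
    """Score in [0, 1]; closer to the class-1 centroid gives a higher value."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    return d0 / (d0 + d1 + 1e-12)

def undersample(y, rng):
    """Random undersampling, a stand-in for NUS: keep every minority
    instance (label 1) and an equal number of majority instances (label 0)."""
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    return sorted(minority + rng.sample(majority, len(minority)))

def cv_predict_on_subset(X, y, subset, k, rng):
    """Stratified k-fold out-of-fold scores, for instances in `subset` only."""
    pos = [i for i in subset if y[i] == 1]
    neg = [i for i in subset if y[i] == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [pos[f::k] + neg[f::k] for f in range(k)]
    out = {}
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in folds[f]:
            out[i] = score(model, X[i])
    return out

def s_oos_column(X, y, k, rng):
    """One metadata column under the assumed S-OOS reading."""
    subset = undersample(y, rng)
    col = cv_predict_on_subset(X, y, subset, k, rng)  # in-subset instances
    model = fit_centroids([X[i] for i in subset], [y[i] for i in subset])
    for i in range(len(y)):
        if i not in col:  # out-of-subset instances
            col[i] = score(model, X[i])
    return [col[i] for i in range(len(y))]

# Synthetic imbalanced data: 90 majority (label 0), 10 minority (label 1).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

# CV-prediction restricted to an undersampled subset covers only that subset.
naive = cv_predict_on_subset(X, y, undersample(y, rng), 5, rng)

# Three base learners, each with its own undersampled subset; every column
# covers all instances, so the metadata has no missing values.
metadata = [s_oos_column(X, y, 5, rng) for _ in range(3)]
```

In this sketch, `naive` holds predictions only for the instances retained by undersampling, which is the incomplete-metadata problem the abstract describes, while each column in `metadata` has one value per training instance and could be passed to a meta-learner.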

Keywords

Imbalanced classification, Class imbalance, Stacked generalization, Stacking, Super learning, Stacked ensemble

Divisions

ai,Computer

Publication Title

Expert Systems with Applications

Volume

168

Publisher

Elsevier

Publisher Location

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, England
