
Soft estimation by hierarchical classification and regression

Published: 19 April 2017
Abstract

    Classification and numeric estimation are the two most common data mining tasks. The goal of classification is to predict discrete output values, whereas estimation aims at predicting continuous output values. Predictive data mining is generally achieved by using a single statistical or machine learning technique to construct a prediction model. Related studies have shown that the prediction performance of such single flat models can be improved by exploiting hierarchical structures. Hierarchical estimation approaches, usually combinations of multiple estimation models, have been proposed for specific domain problems. However, the literature offers no generic hierarchical approach to estimation and no hybrid solution that combines classification and estimation techniques hierarchically. We therefore introduce a generic hierarchical architecture, namely hierarchical classification and regression (HCR), suitable for various estimation problems. In brief, the first level of HCR pre-processes a given training set by classifying it into k classes, yielding k subsets; three approaches are used to perform this task in this study: hard classification (HC), fuzzy c-means (FCM), and genetic algorithms (GA). The training data, each instance labeled with its assigned class, are then used to train a support vector machine (SVM) classifier. At the second level of HCR, k regression (or estimation) models are trained on their corresponding subsets for the final prediction.
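The two-level pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: k-means stands in for the hard-classification (HC) partitioning step, the data are synthetic, and all SVM/SVR hyperparameters are scikit-learn defaults.

```python
# Level 1: partition the training set into k groups and train an SVM
# classifier to route unseen inputs; Level 2: fit one regressor per group.
import numpy as np
from sklearn.cluster import KMeans   # stand-in for the HC partitioning step
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]   # synthetic continuous target

k = 4
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

router = SVC(kernel="rbf").fit(X, labels)   # level-1 SVM classifier
regressors = {c: SVR().fit(X[labels == c], y[labels == c]) for c in range(k)}

def hcr_predict(X_new):
    """Route each sample to its predicted class, then apply that class's regressor."""
    routes = router.predict(X_new)
    return np.array([regressors[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(routes, X_new)])

preds = hcr_predict(X[:5])
```

Replacing `KMeans` with FCM or a GA-driven grouping, and `SVR` with another estimator such as an MLP regressor, recovers the other HCR variants the abstract mentions.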
Experiments on eight UCI datasets show that most hierarchical prediction models built with the HCR architecture significantly outperform three well-known single flat prediction models, i.e., linear regression (LR), multilayer perceptron (MLP) neural networks, and support vector regression (SVR), in terms of mean absolute percentage error (MAPE) and root mean squared error (RMSE). In addition, classifying the training set into four subsets with the GA-based data pre-processing approach is found to be the best setting (i.e., k = 4), and the 4-class SVM+MLP model outperforms three baseline hierarchical regression models.
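The two evaluation metrics used above are standard; as a quick reference, they can be computed as:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(mape([100, 200], [110, 180]))  # ≈ 10.0
print(rmse([1.0, 2.0], [1.0, 4.0]))  # ≈ 1.414
```

Lower values are better for both; MAPE is scale-free while RMSE is in the units of the target.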


    Cited By

    • (2021) Hierarchical classification with multi-path selection based on granular computing, Artificial Intelligence Review 54(3), 2067-2089. DOI: 10.1007/s10462-020-09899-2. Online publication date: 1-Mar-2021.
    • (2020) Real-time evaluation of driver cognitive loads based on multivariate biosignal analysis, 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 623-628. DOI: 10.1109/SMC42975.2020.9282983. Online publication date: 11-Oct-2020.
    • (2019) A Dustbin Category Based Feedback Incremental Learning Strategy for Hierarchical Image Classification, Pattern Recognition and Computer Vision, 480-491. DOI: 10.1007/978-3-030-31654-9_41. Online publication date: 8-Nov-2019.


          Published In

          Neurocomputing  Volume 234, Issue C
          April 2017
          214 pages

          Publisher

          Elsevier Science Publishers B. V.

          Netherlands


          Author Tags

          1. Classification
          2. Data mining
          3. Hierarchical estimation
          4. Prediction
          5. Regression

          Qualifiers

          • Research-article
