Abstract
Uplift modeling estimates the incremental impact (i.e., uplift) of a marketing campaign on customer outcomes, and such models are central to banks' direct marketing efforts. However, bank data are often high-dimensional, containing hundreds to thousands of customer features, and retaining irrelevant or redundant features in an uplift model is computationally inefficient and can degrade model performance. Banks therefore need to narrow the feature set used for uplift modeling, yet the feature selection literature has rarely addressed uplift models. This paper proposes several two-step feature selection approaches for uplift modeling, designed to select highly relevant, minimally redundant feature subsets from high-dimensional banking data. Empirical experiments show that with a much smaller selected subset (20 of 180 features), 68.6% of the resulting uplift models perform as well as or better than models built on the complete feature set.
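For intuition, the snippet below is a minimal sketch of the generic two-step pattern the paper builds on: first score features by relevance to the uplift signal, then prune redundant (highly correlated) features before fitting an uplift model. The scoring rule, column names, and correlation threshold are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of two-step (relevance -> redundancy) feature selection
# for uplift modeling. Hypothetical scoring rule and threshold; this is
# not the paper's exact method, only the generic pattern it refines.
import numpy as np
import pandas as pd

def relevance_scores(X: pd.DataFrame, y: pd.Series, t: pd.Series) -> pd.Series:
    """Score each feature by how differently it correlates with the outcome
    in the treatment vs. control group (a simple proxy for uplift relevance)."""
    treated, control = (t == 1), (t == 0)
    corr_t = X[treated].corrwith(y[treated]).abs()
    corr_c = X[control].corrwith(y[control]).abs()
    return (corr_t - corr_c).abs().fillna(0.0)

def drop_redundant(X: pd.DataFrame, ranked: list, threshold: float = 0.9) -> list:
    """Greedily keep the most relevant feature from each highly correlated group."""
    kept = []
    for f in ranked:  # ranked = feature names sorted by descending relevance
        if all(abs(X[f].corr(X[g])) < threshold for g in kept):
            kept.append(f)
    return kept

def select_features(X: pd.DataFrame, y: pd.Series, t: pd.Series,
                    k: int = 20, threshold: float = 0.9) -> list:
    # Step 1: rank features by uplift relevance; Step 2: remove redundancy.
    ranked = relevance_scores(X, y, t).sort_values(ascending=False).index.tolist()
    return drop_redundant(X, ranked, threshold)[:k]

# Example usage on a synthetic 180-feature frame:
# X = pd.DataFrame(np.random.randn(5000, 180), columns=[f"x{i}" for i in range(180)])
# t = pd.Series(np.random.binomial(1, 0.5, 5000))
# y = pd.Series((0.3 * X["x0"] * t + np.random.randn(5000) > 0).astype(int))
# print(select_features(X, y, t, k=20))
```

The selected subset can then be fed to any uplift learner (e.g., a two-model or tree-based approach); the point of the two-step filter is that relevance and redundancy are handled separately, which keeps the procedure tractable on high-dimensional banking data.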
Acknowledgements
Jinping acknowledges the financial support of the China Scholarship Council.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Hu, J. Customer feature selection from high-dimensional bank direct marketing data for uplift modeling. J Market Anal 11, 160–171 (2023). https://doi.org/10.1057/s41270-022-00160-z