Abstract
Most classification approaches aim at achieving high prediction accuracy on a given dataset. In most practical cases, however, some action such as mailing an offer or treating a patient is to be taken on the classified objects, and we should model not the class probabilities themselves but the change in class probabilities caused by the action. The action should then be performed on those objects for which it will be most profitable. This problem is known as uplift modeling, differential response analysis, or true lift modeling, but it has received very little attention in the machine learning literature. An important extension of the problem involves several possible actions, where for each object the model must also decide which action to apply in order to maximize profit. In this paper, we present tree-based classifiers designed for uplift modeling in both the single and multiple treatment cases. To this end, we design new splitting criteria and pruning methods. The experiments confirm the usefulness of the proposed approaches and show significant improvement over previous uplift modeling techniques.
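To make the problem setting concrete, the sketch below illustrates the quantity being modeled, the uplift P(y=1 | x, treatment) - P(y=1 | x, control), using the standard "two separate models" baseline that the uplift literature contrasts with dedicated methods such as the trees proposed in this paper. It is a minimal illustration, not the authors' tree-based method; scikit-learn, and the arrays X, y, t (features, outcome, treatment indicator) are assumptions made for the example.

```python
# Minimal sketch of the "two-model" uplift baseline (not the tree-based
# method proposed in the paper): fit one classifier on the treated group,
# another on the control group, and score each object by the difference
# in predicted positive-class probabilities. Assumes numpy and scikit-learn;
# X, y, t are hypothetical arrays (features, outcome, treatment flag).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_two_model_uplift(X, y, t):
    """Fit separate outcome models on treated (t == 1) and control (t == 0) rows."""
    model_t = RandomForestClassifier(n_estimators=200, random_state=0)
    model_c = RandomForestClassifier(n_estimators=200, random_state=0)
    model_t.fit(X[t == 1], y[t == 1])
    model_c.fit(X[t == 0], y[t == 0])
    return model_t, model_c

def predict_uplift(model_t, model_c, X_new):
    """Estimated uplift = P(y=1 | x, treatment) - P(y=1 | x, control)."""
    p_t = model_t.predict_proba(X_new)[:, 1]
    p_c = model_c.predict_proba(X_new)[:, 1]
    return p_t - p_c

# Usage on synthetic data: act only on the objects with the highest
# predicted uplift, i.e., those for which the action is most profitable.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    t = rng.integers(0, 2, size=1000)
    # Outcome depends on the first feature only when the object is treated.
    y = (rng.random(1000) < 0.2 + 0.3 * t * (X[:, 0] > 0)).astype(int)
    m_t, m_c = fit_two_model_uplift(X, y, t)
    uplift = predict_uplift(m_t, m_c, X[:100])
    targets = np.argsort(-uplift)[:10]  # top-10 objects to target
    print("Top uplift scores:", np.round(uplift[targets], 3))
```

In the multiple treatment case described in the abstract, one would analogously compare the predicted outcome under each candidate action against the control and select, per object, the action with the largest estimated gain.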
Acknowledgments
This work was supported by Research Grant no. N N516 414938 of the Polish Ministry of Science and Higher Education (Ministerstwo Nauki i Szkolnictwa Wyższego) from research funds for the period 2010–2012.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Rzepakowski, P., Jaroszewicz, S. Decision trees for uplift modeling with single and multiple treatments. Knowl Inf Syst 32, 303–327 (2012). https://doi.org/10.1007/s10115-011-0434-0