In recent years, microeconometrics experienced the ‘credibility revolution’, culminating in the 2021 Nobel Prizes for David Card, Josh Angrist, and Guido Imbens. This ‘revolution’ in how to do empirical work led to more reliable empirical knowledge of the causal effects of certain public policies. In parallel, computer science, and to some extent also statistics, developed powerful (so-called Machine Learning) algorithms that are very successful in prediction tasks. The new literature on Causal Machine Learning unites these developments by using algorithms originating in Machine Learning for improved causal analysis. In this non-technical overview, I review some of these approaches. Subsequently, I use an empirical example from the field of active labour market programme evaluation to showcase how Causal Machine Learning can be applied to improve the usefulness of such studies. I conclude with some considerations about shortcomings and possible future developments of these methods a...
There is great demand for inferring causal effect heterogeneity and for open-source statistical software that is readily available to practitioners. The mcf package is an open-source Python package that implements the Modified Causal Forest (mcf), a causal machine learner. We replicate three well-known studies in the fields of epidemiology, medicine, and labor economics to demonstrate that our mcf package produces aggregate treatment effects that align with previous results and, in addition, provides novel insights on causal effect heterogeneity. The mcf package provides inference for every resolution of treatment effect estimation that can be identified. We conclude that the mcf constitutes a practical and extensive tool for a modern analysis of heterogeneous causal effects.
The order of actions in contests may have a significant effect on performance. In this study, we examine the role of the schedule in round-robin tournaments with sequential games between three and four contestants. Our propensity-score matching estimation, based on soccer FIFA World Cups, UEFA European Championships and Olympic wrestling events, reveals that there is a substantial advantage to the contestant who competes in the first and third matches, which is in line with game-theoretical predictions. Our finding implies that the round-robin structure with sequential games is endogenously unfair, since it systematically favours one of the contestants.
This paper considers the practically important case of nonparametrically estimating heterogeneous average treatment effects that vary with a limited number of discrete and continuous covariates in a selection-on-observables framework where the number of possible confounders is very large. We propose a two-step estimator for which the first step is estimated by machine learning. We show that this estimator has desirable statistical properties like consistency, asymptotic normality and rate double robustness. In particular, we derive the coupled convergence conditions between the nonparametric and the machine learning steps. We also show that estimating population average treatment effects by averaging the estimated heterogeneous effects is semi-parametrically efficient. The new estimator is illustrated with an empirical example on the effects of mothers' smoking during pregnancy on the resulting birth weight.
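The abstract does not spell out the form of the two-step estimator. As a rough sketch only, one common way to build such an estimator uses the doubly robust (AIPW) score; the notation below ($Y$ outcome, $D$ binary treatment, $X$ high-dimensional confounders, $Z$ the low-dimensional heterogeneity variables, $\mu_d$, $p$, $\psi$) is an assumption introduced for illustration and is not taken from the paper.

```latex
% Step 1 (machine learning): estimate the nuisance functions
%   \mu_d(x) = E[Y \mid D = d, X = x],  p(x) = P(D = 1 \mid X = x).
% Assumed doubly robust score for unit i:
\hat{\psi}_i
  = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
  + \frac{D_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{p}(X_i)}
  - \frac{(1 - D_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{p}(X_i)}

% Step 2 (nonparametric): regress the score on the heterogeneity variables
\hat{\tau}(z) = \widehat{E}\bigl[\hat{\psi}_i \mid Z_i = z\bigr]

% Population average effect obtained by averaging the heterogeneous effects
\widehat{ATE} = \frac{1}{N} \sum_{i=1}^{N} \hat{\tau}(Z_i)
```

Under this reading, the "coupled convergence conditions" mentioned in the abstract would link how fast the first-step estimates of $\mu_d$ and $p$ converge to the smoothing used in the second step.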
In this note, we show that the OLS and fixed-effects (FE) estimators of the popular difference-in-differences model may deviate when there is time-varying panel non-response. If such non-response does not affect the common-trend assumption, then OLS and FE are consistent, but OLS is more precise. However, if non-response affects the common-trend assumption, then FE estimation may still be consistent, while OLS will be inconsistent. We provide simulation as well as empirical evidence that this phenomenon occurs. We conclude that, in the case of unbalanced panels, any evidence of deviating OLS and FE estimates should be considered as evidence that non-response is not ignorable for the difference-in-differences estimation.
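The divergence between OLS and FE can be illustrated with a small simulation. The sketch below is not the note's own simulation design: the data-generating process, the non-response rule and the function name `simulate_did` are all assumptions chosen for illustration. Treated units with a large time-invariant individual effect drop out in the post period, so the composition of the treated group changes over time.

```python
import numpy as np


def simulate_did(n=20_000, tau=1.0, seed=0):
    """Two-period DiD with selective non-response (hypothetical design)."""
    rng = np.random.default_rng(seed)
    d = rng.integers(0, 2, size=n)             # treatment-group indicator
    a = rng.normal(size=n)                     # time-invariant individual effect
    y_pre = a + rng.normal(scale=0.5, size=n)
    y_post = a + 0.5 + tau * d + rng.normal(scale=0.5, size=n)

    # Assumed non-response rule: treated units with a large individual
    # effect are not observed in the post period.
    seen_post = ~((d == 1) & (a > 0.5))

    treated, control = d == 1, d == 0

    # Pooled OLS with group, period and interaction dummies is saturated,
    # so its DiD coefficient equals the difference of the four cell means
    # computed from all available observations.
    ols = (y_post[treated & seen_post].mean() - y_pre[treated].mean()) - (
        y_post[control & seen_post].mean() - y_pre[control].mean()
    )

    # With two periods, FE reduces to first differences on the units
    # observed in both periods, which removes the individual effect.
    dy = y_post - y_pre
    fe = dy[treated & seen_post].mean() - dy[control & seen_post].mean()
    return ols, fe


ols_est, fe_est = simulate_did()
print(f"true effect 1.00 | pooled OLS {ols_est:.2f} | FE {fe_est:.2f}")
```

In this design the FE estimate stays close to the true effect, while the pooled OLS estimate is pulled away from it, because the drop-outs change the average individual effect among the observed treated units.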
This paper models payment evasion as a source of profit by letting the firm choose the price charged to paying consumers and the fine collected from detected payment evaders. The consumers choose whether to purchase, evade payment, or refrain from consumption. We show that payment evasion allows the firm to charge a higher price to paying consumers and to generate a higher profit. We also show that higher fines do not necessarily reduce payment evasion. Finally, we provide empirical evidence that is consistent with our theoretical analysis, using comprehensive micro data on fare dodging on the Zurich Transport Network.
Switzerland is a multi-lingual developed country that provides an attractive stage to test ingroup favoritism that is driven by linguistic differences. To that end, we utilize data from soccer games in the top two Swiss divisions between the seasons 2005/06 and 2017/18. In these games, the referee was from the same linguistic area as one team, whereas the other team was from a different linguistic area. Using very rich data on teams’ and games’ characteristics, our causal forest-based estimator reveals that referees assign significantly more penalties in the form of yellow and red cards to teams from a different linguistic area. This form of ingroup favoritism is large enough that it is likely to affect the outcome of the game. As evidence, we find that the difference in points in favor of the home team increases significantly when a referee is from the same linguistic area.
The home advantage phenomenon is a well-established feature in sports competitions. In this paper, we examine data from 1,908 soccer matches played in the German Bundesliga during the seasons from 2007-08 to 2015-16. Using a very rich data set, our econometric analysis, based on matching methods, reveals that the usual home advantage disappears when the game is played midweek rather than on the weekend. Our results indicate that, since the midweek matches are unevenly allocated among teams, the actual schedules of the Bundesliga favour teams with fewer home games in midweek. The paper also shows that these soccer-specific findings have some implications for the design of contests in general.
We reassess the effects of natural resources on economic development and conflict, applying a causal forest estimator and data from 3,800 Sub-Saharan African districts. We find that, on average, mining activities and higher world market prices of locally mined minerals both increase economic development and conflict. Consistent with the previous literature, mining activities have more positive effects on economic development and weaker effects on conflict in places with low ethnic diversity and high institutional quality. In contrast, the effects of changes in mineral prices vary little with ethnic diversity and institutional quality, but are non-linear and largest at relatively high prices.
We investigate heterogeneous employment effects of Flemish training programmes. Based on administrative individual data, we analyse programme effects at various aggregation levels using Modified Causal Forests (MCF), a causal machine learning estimator for multiple programmes. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and types of unemployed. Simulations show that assigning the unemployed to programmes that maximise individual gains as identified in our estimation can considerably improve effectiveness. Simplified rules, such as one giving priority to unemployed with low employability, mostly recent migrants, lead to about half of the gains obtained by more sophisticated rules.
Uncovering the heterogeneity of causal effects of policies and business decisions at various levels of granularity provides substantial value to decision makers. This paper develops new estimation and inference procedures for multiple treatment models in a selection-on-observables framework by modifying the Causal Forest approach suggested by Wager and Athey (2018) in several dimensions. The new estimators have desirable theoretical, computational and practical properties for various aggregation levels of the causal effects. While an Empirical Monte Carlo study suggests that they outperform previously suggested estimators, an application to the evaluation of an active labour market programme shows the value of the new methods for applied research.
Papers by Michael Lechner SEW