Abstract
Driven by recent advances in technology, tracking devices make it possible to collect high-frequency data on the positions of players in (association) football matches and in many other sports. Although such data sets are available to every professional team, most teams still rely on time-consuming video analysis when analysing future opponents, for example with regard to how goals were scored or to a team’s general style of play. In this contribution, we provide a data-driven approach for the automated classification of tactics in football. For that purpose, we use hidden Markov models (HMMs) to analyse high-frequency tracking data, where the underlying states serve as proxies for a team’s tactics. In particular, as space control in football has been considered a major driver of success, we focus on the effective playing space, which is the area of the convex hull spanned by the outfield players, i.e. all players excluding the goalkeeper. This quantity relates to both playing style and team behaviour. Using copula-based HMMs, we jointly model the effective playing space of both teams to account for the competitive nature of the game. Our model thus provides an estimate of a team’s playing style at each time point, which can be beneficial for team managers and is also of considerable interest to football fans.
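To make the central quantity concrete, the following is a minimal sketch of how the effective playing space could be computed for one team at a single time point. It is not the authors' implementation: the function names (`convex_hull`, `polygon_area`, `effective_playing_space`) are hypothetical, pitch coordinates are assumed to be given in metres, and the hull is found with Andrew's monotone chain algorithm, chosen here only because it needs no external libraries.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pitch coordinates, assumed to be in metres


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def effective_playing_space(outfield_positions: List[Point]) -> float:
    """Convex hull area of the outfield players (goalkeeper already excluded)."""
    return polygon_area(convex_hull(outfield_positions))


# Toy example (made-up positions): four players on the corners of a
# 10 m x 10 m square and one player inside it.
toy_positions = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0)]
print(effective_playing_space(toy_positions))  # 100.0
```

Applying such a function to every time stamp of the tracking data for both teams would yield a bivariate time series of areas, which is the kind of input a joint (e.g. copula-based) HMM would then be fitted to.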
Acknowledgements
The authors would like to thank the Associate Editor and the referees for comments that helped improve the paper. Marius Ötting received financial support from the German Research Foundation (DFG), grant number 431536450, which is gratefully acknowledged.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ötting, M., Karlis, D. Football tracking data: a copula-based hidden Markov model for classification of tactics in football. Ann Oper Res 325, 167–183 (2023). https://doi.org/10.1007/s10479-022-04660-0
DOI: https://doi.org/10.1007/s10479-022-04660-0