Abstract
In this paper, we review the literature on Sports Analytics (SA) and predict football players’ scoring performance. Using previous seasons’ performance, we predict the number of goals that players scored during the 2021–22 season. To this end, we collected advanced statistics for players from five major European leagues (the English Premier League, the Spanish La Liga, the German Bundesliga, the French Ligue 1 and the Italian Serie A) for seasons 2017–18 through 2021–22. We built one-season lag features and benchmarked three supervised Machine Learning (ML) algorithms: Linear Regression (LR), Random Forest (RF) and Multilayer Perceptron (MLP). All three models produced promising and comparable results. LR performed best, with Mean Absolute Error (MAE) 1.60, Mean Squared Error (MSE) 7.06 and Root Mean Squared Error (RMSE) 2.66. Feature importance analysis indicates that a player’s upcoming scoring performance is strongly associated with the previous season’s goals (Gls) and expected goals (xG).
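The one-season lag features and the three error metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout (`player`, `season`, `gls`, `xg`), the function names, and the toy numbers are all assumptions made for illustration only.

```python
import math


def add_lag_features(rows):
    """Pair each player-season with that player's previous-season stats.

    `rows` is a list of dicts with hypothetical keys:
    player (str), season (int, start year), gls (int), xg (float).
    Player-seasons without a recorded previous season are skipped.
    """
    by_key = {(r["player"], r["season"]): r for r in rows}
    lagged = []
    for r in rows:
        prev = by_key.get((r["player"], r["season"] - 1))
        if prev is None:
            continue  # no previous season on record, so no lag feature
        lagged.append({
            "player": r["player"],
            "season": r["season"],
            "prev_gls": prev["gls"],  # one-season lag of goals
            "prev_xg": prev["xg"],    # one-season lag of expected goals
            "gls": r["gls"],          # prediction target
        })
    return lagged


def regression_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Toy, made-up records (not data from the paper).
toy_rows = [
    {"player": "A", "season": 2020, "gls": 10, "xg": 9.0},
    {"player": "A", "season": 2021, "gls": 12, "xg": 11.0},
    {"player": "B", "season": 2021, "gls": 3, "xg": 2.5},
]
toy_lagged = add_lag_features(toy_rows)
toy_errors = regression_errors([12, 3], [10, 4])
```

Because RMSE is the square root of MSE, the reported values are mutually consistent: the square root of 7.06 is approximately 2.66.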
Acknowledgments
This research is co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning 2014–2020» in the context of the project “Support for International Actions of the International Hellenic University” (MIS 5154651).
Appendix
We include here both basic and advanced features for forecasting experiments.
Copyright information
© 2023 IFIP International Federation for Information Processing
About this paper
Cite this paper
Giannakoulas, N., Papageorgiou, G., Tjortjis, C. (2023). Forecasting Goal Performance for Top League Football Players: A Comparative Study. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 676. Springer, Cham. https://doi.org/10.1007/978-3-031-34107-6_24
DOI: https://doi.org/10.1007/978-3-031-34107-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34106-9
Online ISBN: 978-3-031-34107-6
eBook Packages: Computer Science, Computer Science (R0)