Maths IA First Draft
Candidate Name:
Candidate Number:
Table of Contents
Introduction
Brief Introduction of xG
Method of Data Collection
Research Question
Results
  The Relationship Between xG and GF
    Scatter Plot of xG and GF
    Pearson Test on xG and GF
    Spearman's Ranking on xG and GF
    Summary
  The Relationship Between xG and GA
    Scatter Plot of xG and GA
    Pearson Test on xG and GA
    Spearman's Ranking on xG and GA
    Summary
  The Relationship Between xG and W
    Scatter Plot of xG and W
    Pearson Test on xG and W
    Spearman's Ranking on xG and W
    Summary
  The Relationship Between xG and L
    Scatter Plot of xG and L
    Pearson Test on xG and L
    Spearman's Ranking on xG and L
    Summary
  The Relationship Between xG and Pts
    Scatter Plot of xG and Pts
    Pearson Test on xG and Pts
    Spearman's Ranking on xG and Pts
    Summary
Conclusion
Reflection
Bibliography
Brief Introduction of xG
xG is a metric created in 2012 by Opta’s Sam Green to measure how good a scoring chance
is. It does this by calculating the likelihood that a player will score, based on similar
information from past seasons. During the 22/23 season, Darwin Núñez was initially predicted
to have an xG of 0.72 per game (Sky Sports Premier League 2023). This was due to his strong
performance in the 21/22 season, in which he had an xG of 0.87 (FootyStats 2024). Erling
Haaland, by contrast, was initially predicted to have an xG of 0.70 for the 22/23 season,
meaning he was predicted to score less than Darwin Núñez (Sky Sports Premier League 2023).
However, by the end of the 22/23 season Haaland had an xG of 0.82, whereas Núñez had an
xG of 0.44 (FootyStats 2024). This implies that Haaland exceeded the average number of
goals he was expected to score, whereas Núñez failed to meet the prediction for the average
number of goals he was expected to score in the 22/23 season (Sky Sports Premier League
2023). This shows that expected goals may not always be a reliable indicator, owing to
external factors that affect team performance in a match.
The accuracy of expected goals (xG) as a team performance indicator has been
questioned, particularly its ability to give an overview of how teams will perform within their
respective leagues. On one hand, xG can give reasonably accurate predictions of match-day
performance: it is reported to be 66% accurate for home game results and 58% accurate for
away game results, showing some reliability in predicting team performance (Football XG
2024). On the other hand, xG struggles to maintain its accuracy consistently. For example, in
the 19/20 season, based on xG, Manchester City was expected to win the Premier League by
13 points. Despite this, Liverpool won the Premier League title, in spite of having been
predicted, on the basis of the previous two seasons, to score fewer than 39 points (Macinnes
2020). This shows the limitations of xG as a reliable measure of team performance, as it
cannot account for other factors of team performance, such as the quality of players.
Introduction
Team performance is commonly measured using five statistics: GF (Goals For), GA
(Goals Against), W (Wins), L (Losses) and Pts (Points). Goals For is the number of goals a
team scores against its opponents in a season. Goals Against is the number of goals a team
concedes to its opponents in a season. Wins are the number of matches a team wins in a
season, and Losses are the number of matches it loses. Lastly, points accumulate over a
season and determine a team's league placement, where a win = 3 points, a draw = 1 point
and a loss = 0 points. xG only became mainstream in 2017 (Willams 2020).
Pearson's correlation coefficient and Spearman's rank correlation coefficient will be used to
measure the relationship between xG and each of these statistics. The test results for each
factor of team performance will then be compared with xG, and the data is presented as
scatter plots.
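The points rule above (win = 3, draw = 1, loss = 0) can be written as Pts = 3W + D. The minimal Python sketch below states it directly; the function name and the season record are hypothetical illustrations, not data from this IA.

```python
def league_points(wins, draws, losses):
    """League points under the 3-1-0 system: Pts = 3W + 1D + 0L."""
    return 3 * wins + 1 * draws + 0 * losses

# Hypothetical record over a 38-match season (not real data)
w, d, l = 28, 5, 5
assert w + d + l == 38  # every match is a win, a draw or a loss
print(league_points(w, d, l))  # → 89
```

Because draws also earn points, Pts is not fixed by W and L alone, which is one reason the W and Pts results can differ.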
There is a debate over whether xG can give reliable, accurate predictions that football fans
can use to assess how well their teams will do in their respective leagues, with critics
arguing that it is not a reliable indicator of team performance. It is nevertheless used as a
marker despite its inaccuracies. Hence, this IA aims to help settle the debate by examining
the validity of xG in football analysis.
The data is from Opta, a company that collects accurate and reliable football data.
I chose xG, GF, GA, W, L and Pts because they are obvious determiners of team
performance. The data was collated into an Excel spreadsheet. I grouped the data so that xG
was compared with each of GF, GA, W, L and Pts, since these factors of team performance
also influence a team's league standing. Comparing each factor with xG would therefore help
show whether these factors are good for determining team performance.
Step 1: Input the data sets for xG and the team performance indicators into the GDC.
Step 2: Label the x-axis and y-axis, then run a linear regression calculation on the scatter
graph.
Step 3: Go to Menu --> Stat Calculation --> Linear Regression (mx+b).
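The GDC's Linear Regression (mx+b) calculation can also be reproduced by hand. The sketch below is a minimal Python version, assuming the standard least-squares formulas; the function name `linear_regression` and the sample values are hypothetical, not the IA's data set. It returns the slope m, the intercept b, Pearson's r and r².

```python
from math import sqrt

def linear_regression(x, y):
    """Least-squares line y = m*x + b, plus Pearson's r and r^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-deviations from the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx                # slope
    b = mean_y - m * mean_x      # intercept
    r = sxy / sqrt(sxx * syy)    # Pearson's product-moment correlation
    return m, b, r, r ** 2

# Hypothetical (xG, GF) pairs for illustration only
xg = [40.0, 50.0, 60.0, 70.0]
gf = [38.0, 52.0, 59.0, 75.0]
m, b, r, r2 = linear_regression(xg, gf)
print(f"m = {m:.4f}, b = {b:.4f}, r = {r:.4f}, r^2 = {r2:.4f}")
```

For a straight-line fit, r² is simply the square of r, so the two values reported for each factor should agree with each other.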
Spearman’s Rank Correlation Calculation
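Spearman's rank correlation coefficient is Pearson's r calculated on the ranks of the data. When no values are tied it reduces to r_s = 1 − 6Σd²/(n(n² − 1)), where d is the difference between a team's two ranks and n is the number of teams. The minimal Python sketch below assumes the usual convention that tied values receive the average of the ranks they span; the function names and values are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's r_s: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_no_ties(x, y):
    """Shortcut formula, valid only when neither list contains ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical xG and points for five teams (illustration only)
xg = [35.2, 48.9, 41.0, 62.3, 55.7]
pts = [31, 45, 52, 80, 71]
print(spearman(xg, pts), spearman_no_ties(xg, pts))
```

Here two teams swap places between the two rankings, so Σd² = 2 and both versions give r_s = 1 − 12/120 = 0.9.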
Research Question
Does xG determine team performance as measured by the five factors that influence it?
Hypothesis: xG is a key indicator for all five statistics used to determine team performance.
Results
The Relationship Between xG and GF
The results suggest that xG and GF have a strong positive linear relationship (r = 0.9095).
b = 0.1722, r = 0.9991
The r value (0.988) is close to 1, implying that as xG increases, GF also increases. Further,
the r² value shows a 99% fit to the data.
Summary
The relationship between xG and GF shows a strong positive correlation. This
suggests that xG is an accurate determiner of team performance, as a higher xG means more
goals scored, which reflects better team performance.
Across the top three leagues in Europe, there seems to be a correlation between xG,
goals scored and team performance, but there are some exceptions.
Of the three teams with the highest xG, Inter and Napoli scored the most goals, which
ranked them second and first respectively. This suggests that xG is a good determiner of GF
and therefore of team performance. However, despite being among the top three teams for
xG, Milan is not in the top three for league standing or GF, which suggests that other factors
may affect team performance.
therefore team performance. The teams with the three highest xG were placed in the top
three for league standing and GF. This suggests that the results in La Liga strongly support
Ranking   xG                     GF               League Standing
2         Brighton (73.3)        Arsenal (88)     Arsenal
3         Newcastle Utd (71.9)   Liverpool (75)   Manchester Utd
Table 3: Comparison of the Top Three Rankings of xG, GF and League Standing in the Premier
League
The results in the EPL indicate a weak correlation. Of the three teams with the highest
xG, only Manchester City was placed in the top three for GF and for league standing. This
suggests that xG can predict team performance to some extent: the higher the xG, the more
goals scored and thus the higher the league standing. However, despite being in the top three
for xG, Brighton and Newcastle Utd are not in the top three for league standing or GF,
indicating that xG is not an accurate measure of team performance.
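The top-three comparisons in these tables can be summarised as the fraction of teams shared by two top-three lists. The minimal Python sketch below does this for the Premier League, assuming from the surrounding text that Manchester City is in the top three for xG, GF and league standing (row 1 of Table 3 is not shown); the helper name `top_three_overlap` is hypothetical.

```python
def top_three_overlap(list_a, list_b):
    """Fraction of list_a's teams that also appear in list_b (order ignored)."""
    shared = set(list_a) & set(list_b)
    return len(shared) / len(list_a), sorted(shared)

# Premier League 22/23 top three, taken from Table 3 and the surrounding text
top_xg = ["Manchester City", "Brighton", "Newcastle Utd"]
top_gf = ["Manchester City", "Arsenal", "Liverpool"]
top_standing = ["Manchester City", "Arsenal", "Manchester Utd"]

print(top_three_overlap(top_xg, top_gf))        # → (0.3333333333333333, ['Manchester City'])
print(top_three_overlap(top_xg, top_standing))  # → (0.3333333333333333, ['Manchester City'])
```

An overlap of 1/3 corresponds to xG identifying only one of the Premier League's top three teams.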
performance within La Liga, as the more goals a team scores, the more likely it is to perform
well. This is seen in Barcelona and Atlético Madrid being joint second for the most goals
scored, which helped place them in the top three in La Liga. This could imply that La Liga
teams are more expansive in their playstyle, placing emphasis on attacking tactics to improve
their performance within the league. However, for the Premier League and Serie A, the
number of goals a team scores does not seem to correlate strongly with its league standing,
implying that factors other than GF influence team performance.
The Relationship Between xG and GA
The graph appears to show a moderate negative correlation between xG and GA, suggesting
that as xG increases, GA decreases. Furthermore, the r² value shows a 35% fit, suggesting
that the data moderately
b = 0.00212, r = 0.9991
The r value (0.997) between xG and GA implies a strong positive linear relationship. This
implies that when xG is high, the GA rank increases, which means that the number of goals
conceded decreases. Furthermore, the r² value suggests that the
Summary
The relationship between xG and GA shows a moderate negative correlation. This
implies that xG and GA are somewhat related, implying that xG is not a good
There seems to be a negative correlation, suggesting that a higher xG means a lower
GA. However, there may be exceptions to this when looking at the top three
Ranking   xG                     GA                 League Standing
1         Sampdoria (34.1)       Sampdoria (71)     Sampdoria
2         Hellas Verona (35.8)   Cremonese (69)     Cremonese
3         Lecce (36.1)           Salernitana (62)   Hellas Verona
Table 4: Comparison of the Top Three Lowest Rankings of xG, GA and League Standing in Serie A
representing team performance. This can be seen with Lecce, which had the highest xG of
the three lowest-ranked teams in terms of xG, leading to it not being in the top three for GA.
This suggests that a higher xG means a lower GA, translating to better team performance.
This shows that xG is a determiner of team performance for
Table 6: Comparison of the Top Three Lowest Ranking Teams in terms of xG, GA and League
Standing in La Liga
performance in La Liga. This can be seen with Elche, which had the highest xG of the three
lowest-xG teams; Elche ended up ranked with the second lowest GA and placed within the
three lowest-ranked teams. This could imply that xG is a flawed indicator of team
performance.
The results show some negative correlation between xG and GA, in showing that they
having the second lowest xG led it to be within the three lowest-ranked teams and to have
the highest GA. However, there are outliers: Wolves had the lowest xG, yet they did not have
the highest GA, nor were they within the three lowest-ranked teams. This suggests that xG is
unable to accurately assess team performance in terms of GA, rejecting the idea that having
the highest xG will give a team the lowest GA that would
Overall, this implies that GA and xG play some role in measuring team performance. In
Serie A, Lecce had the highest xG of the three lowest-ranked teams, which led to them not
being in the top three for GA. This could suggest the importance of defensive tactics within
the league, inferring their importance to maintaining
within La Liga and the Premier League, does not seem to have played much of a role in
judging team performance. The data for La Liga is consistent with the null hypothesis of no
correlation, whereas the Premier League shows a negative correlation, suggesting the
importance of other factors
relationship with each other. This indicates that xG somewhat determines the number of
The Relationship Between xG and W
The r value (0.84) suggests a strong positive linear relationship between xG and wins. This
means that the higher the xG, the more wins a team would get, thus affecting team
performance. The r² value (0.84) suggests that the data gathered strongly supports this.
b = 0.00212, r = 0.9991
The r value (0.996) implies a strong positive linear relationship between xG and wins. This
could suggest that the higher the xG, the better a team's chances of winning its matches.
Furthermore, the r² value shows a good fit of 99%, strongly supporting this.
Summary
The relationship between xG and W shows a moderate positive correlation. This
indicates that xG somewhat determines the number of wins a team would get, implying
This shows a positive correlation, suggesting that xG and the number of wins a team
gets can determine team performance. This can be seen in Inter and Napoli having the
highest xG, leading them to be in the top three for the number of wins. However, Milan is an
outlier, as it is not in the top three for league standing or for the number of wins. This
suggests the presence of other factors, such as draws, which could have influenced the
league standing of a team by a one-point difference that could have impacted team
This shows a positive correlation, suggesting that xG can predict how well a team will
do based on the number of wins. This is exemplified by Atlético Madrid, Real Madrid and
Barcelona being placed within the top three for xG and league standing. This implies that xG
is an accurate measure of team performance, as the higher the xG, the more wins a team
would get. Consequently, this supports the hypothesis that xG can predict team performance
in La Liga based on the number of wins, which would allow teams to be in the top three in
the league standing.
This shows some positive correlation between xG and the number of wins in
determining league standing. For example, Brighton had the second highest xG, yet they
were not second in the league and did not have the most wins. However, Manchester City
had the highest xG, which led to wins and to them finishing 1st in the league. This could
suggest that xG can be inconsistent as a measure of team performance, which can make it
unreliable. As a result, this makes team performance inapplicable to real
Overall, Serie A and the Premier League suggest that winning is a more important
factor than xG, as factors such as draws can influence team performance by one point in the
league standings. However, La Liga seems to be the outlier, where xG seems to be more
important than wins for placing higher in the league standing. This suggests that wins within
the top three European leagues are mostly a key indicator for team
The Relationship Between xG and L
The graph shows a negative correlation, implying that the higher the xG, the lower the
number of losses.
The r value (-0.7835) shows a strong negative linear relationship between xG and L. This
could imply that xG is a good determiner of the number of losses a team would make.
However, the r² value is (-0.7835)² ≈ 0.61, a 61% fit, which only moderately supports the
conclusion that a higher xG means fewer losses.
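Since r² for a straight-line fit is the square of Pearson's r, it can be checked directly from a reported r value. The minimal Python sketch below uses a hypothetical helper, `r_squared`, and the r values quoted in this IA for L and for Pts.

```python
def r_squared(r):
    """Coefficient of determination for a simple linear fit: the square of Pearson's r."""
    if not -1 <= r <= 1:
        raise ValueError("Pearson's r must lie between -1 and 1")
    return r ** 2

for label, r in [("xG vs L", -0.7835), ("xG vs Pts", 0.84)]:
    print(f"{label}: r = {r}, r^2 = {r_squared(r):.4f}")
```

This gives about 0.614 for L and about 0.706 for Pts, the latter agreeing with the r² of 0.71 reported for Pts.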
b = 0.06809, r = 0.997
The r value (0.997) of the data set suggests a strong negative linear relationship, as an
increase in xG would mean a decrease in the number of losses a team faces. Further, the r²
value suggests that the data set strongly supports this, as it has a 99% fit.
Summary
The relationship between xG and L (the losses a team makes during a football
season) shows a strong negative correlation. This suggests that xG is a reliable indicator of
the number of losses a team would make during a football season.
There seems to be a negative correlation, suggesting that xG does determine team
performance in terms of the number of losses a team would get. For example, Sampdoria,
ranked lowest for xG, had the highest number of losses and was one of the bottom-performing
teams. This suggests that the lower the xG, the more losses a team would get, as this
The results show some correlation between xG and L that does not consistently reflect
team performance. This can be seen with Elche, which had the highest xG among the bottom
three teams; despite this, it was placed last in La Liga and ranked high for the most losses.
This implies that having a higher xG would not guarantee that a team does better within its
league.
Ranking   xG                   L                     League Standing
1         Wolves (36.8)        Southampton (25)      Leicester City
2         Southampton (37.8)   Leicester City (22)   Leeds United
3         Bournemouth (38.5)   Bournemouth (21)      Southampton
Table 12: Comparison of the Top Three Bottom Teams in terms of rankings based on xG, L and
League Standing in the Premier League
The findings suggest little correlation between xG and L. This is exemplified by
Bournemouth, which had the highest xG among the three bottom-ranked teams for xG; it was
in the top three for most losses but was not in the bottom three for league standing.
However, Leeds and Leicester City performed poorly despite not being ranked in the top three
for lowest xG or in the top three for the highest number of
determine team performance within the Premier League, thus rejecting hypothesis 3.
depending on the contexts of the top three European leagues. However, Serie A indicates
that losses play a role in team performance: the higher the xG, the more likely a team is to
perform better, which translates to fewer losses within Serie A. Despite this, the Premier
League and La Liga suggest that xG and L do not play a role in deciding team performance,
which suggests the limitations of these measures for determining team performance. In La
Liga, for instance, losses appear insignificant because other factors, such as wins and draws,
could have influenced team performance and league standing.
The Relationship Between xG and Pts
The graph shows a moderate positive correlation between xG and Pts, implying that
the higher the xG, the more points a team would get. This suggests that xG can influence how
many points a football team gets, which determines its success in its national league.
The r value (0.84) shows a strong positive linear relationship between xG and Pts. This could
suggest that xG can influence the number of points teams get in a season. Further, the r²
value of 0.71 suggests that the data supports this, with the linear model accounting for about
71% of the variation in Pts.
b = 0.0024579, r = 0.999555
The r value (0.99) implies a strong positive linear relationship, where an increase in xG
would mean an increase in points. Further, the r² value of 0.999 suggests that the data
Summary
The relationship between expected goals and Pts shows a moderate positive
correlation. This implies that xG would influence how many points a football team gets,
which would determine team performance based on whether they get relegated, have
performance based on Pts. This is shown by Inter and Napoli, which had some of the highest
xG and were placed in the top three for the highest number of points and in their league
standings. The exception is Milan: despite being in the top three for the highest xG, it does
not have the highest number of points, nor is it in the top three in the league standing. This
leads to hypothesis 3 being accepted, meaning that having more points would
The results indicate a strong positive correlation between xG and Pts. This is shown by
Barcelona, Real Madrid and Atlético Madrid being placed within the top three for xG, making
them more likely to earn the most points and be in the top three in the league standing. This
supports hypothesis 3, of xG being a key indicator for the most
Table 15: Comparison of the Top Three Rankings Based on xG, Pts and League Standing in the
Premier League
for the number of points that a team would get to determine its team performance. This is
evident from Newcastle and Brighton: despite having some of the highest xG, they do not
have the highest number of points, nor are they in the top three in the league standing.
However, there is an exception where xG can determine team performance. This can be seen
in Manchester City being placed first for the highest xG, leading to them being first for the
most points and first in the league standing. This signifies that xG can be an indicator of
team performance based on
Overall, xG and Pts can play a role in judging team performance. La Liga in particular
shows a strong positive correlation, suggesting a strong relationship between xG and Pts as
accurate measures of team performance. However, the Premier League and Serie A do not
show strong positive correlations suggesting that xG can determine team performance.
Instead, Pts seems to be the sole determiner of team performance, as exemplified by Napoli
and Manchester City having the most Pts, which led them to be placed in the top three in the
league standing. This implies a lack of generalisability and validity of xG and Pts in
determining team performance across the top three European leagues.
Conclusion
In conclusion, the results suggest that xG can determine team performance, but with
varying results. For instance, xG works within La Liga, where it can influence team
performance based on GF, W and Pts. This implies that La Liga depends on more expansive
playstyles for a better league standing and team performance. However, xG cannot be
generalised as a measure of team performance across the top three European leagues. This
is because, in Serie A, xG predicted only 2/3 of the top teams for W, Pts and GF, which breaks
the stereotype of Serie A's dependence on defence and goalkeeping. This is similarly shown
in the Premier League, where xG predicted only 1/3 of the top teams in these three factors,
implying a more well-rounded approach to determining team performance that emphasises
Reflection
Spearman's rank correlation was used to investigate whether xG shows a positive or
negative correlation with the measures of team performance, whereas Pearson's test was
used to measure the strength of the linear relationship between xG and the team
performance indicators, to recognise whether
However, Spearman's ranking can be unreliable because it only measures monotonic
relationships (where one variable consistently increases, or consistently decreases, as the
other increases); it is not suitable for relationships that are non-monotonic. Hence, if any of
the relationships studied are non-monotonic, the results gathered from Spearman's ranking
would not be reliable. Furthermore, a limitation of the Pearson coefficient is the possibility of
a spurious correlation, which can make two factors appear related when they are not,
making the findings unreliable (Ghouse et al. 2024).
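The difference between the two coefficients can be illustrated with synthetic data (not the IA's data). In the minimal Python sketch below, y = x³ is non-linear but monotonic, so Spearman's coefficient is exactly 1 while Pearson's r is below 1; y = (x − 5.5)² is non-monotonic and symmetric, so both coefficients are 0 even though y clearly depends on x. The helper functions are hypothetical and assume average ranks for tied values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values get the average of the ranks they occupy."""
    sorted_vals = sorted(values)
    result = []
    for v in values:
        first = sorted_vals.index(v) + 1         # first rank position of v
        count = sorted_vals.count(v)             # how many times v occurs
        result.append(first + (count - 1) / 2)   # mean of the tied positions
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
cubic = [v ** 3 for v in x]           # monotonic but not linear
bowl = [(v - 5.5) ** 2 for v in x]    # not monotonic

print("y = x^3:       Pearson", round(pearson(x, cubic), 4), "Spearman", round(spearman(x, cubic), 4))
print("y = (x-5.5)^2: Pearson", round(pearson(x, bowl), 4), "Spearman", round(spearman(x, bowl), 4))
```

The second case shows why neither coefficient alone can confirm the absence of a relationship; a scatter plot should always be checked alongside them.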
are needed to accurately assess team performance based on xG. This could be done by
including two more seasons and more leagues, such as the Bundesliga and Ligue 1. Another
way to improve the study would be to avoid data affected by external issues. For example,
COVID-19 led to football matches being suspended on 13th March 2020, which would make
data on team performance from that period unreliable for this IA (Premier League 2020).
Lastly, an alternative statistical test such as ANOVA would allow hypothesis testing between
different group means, to determine whether there is a significant difference in how well xG
determines a specific team performance indicator in a specific league (Bevans 2024).
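As an illustration of the suggested extension, a one-way ANOVA compares the variation between group means with the variation within groups, giving F = (SSB/(k − 1)) / (SSW/(N − k)) for k groups and N values in total. The minimal Python sketch below computes F by hand; the grouping (for example, one group of values per league) and the numbers are hypothetical. A p-value would then come from the F distribution, for instance via SciPy's `scipy.stats.f_oneway`, which is not used here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: how far each group mean lies from the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    if ssw == 0:
        raise ValueError("no variation within groups, so F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return f_stat, df_between, df_within

# Hypothetical values for three leagues (illustration only)
league_a = [1.0, 2.0, 3.0]
league_b = [4.0, 5.0, 6.0]
league_c = [7.0, 8.0, 9.0]
print(one_way_anova_f([league_a, league_b, league_c]))  # → (27.0, 2, 6)
```

A large F (relative to the F distribution with those degrees of freedom) indicates that at least one group mean differs from the others.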
Bibliography
Bevans, R. (2024). One-way Anova test | when and how to use it (with examples) [online].
Available from: https://www.scribbr.com/statistics/one-way-anova/#:~:text=The%20null%20hypothesis%20(H0,use%20a%20t%20test%20instead
[accessed 26 July 2024].
FBref.com. (2023c). 2022-2023 Serie A stats [online]. Available from:
https://fbref.com/en/comps/11/2022-2023/2022-2023-Serie-A-Stats [accessed 16 April 2024].
Footy Stats. (2024). Darwin Nunez stats – Goals, xG, assists & career Stats | FootyStats
[online]. Available from: https://footystats.org/players/uruguay/darwin-nunez [accessed 21
July 2024].
Footy Stats. (2024). Erling Haaland stats – Goals, xG, assists & career stats | FootyStats
[online]. Available from: https://footystats.org/players/norway/erling-haaland [accessed 21
July 2024].
Football XG. (2024). What are expected Goals (xG)? [online]. Available from:
https://footballxg.com/what_are_expected_goals/#:~:text=23%2B00%3A00-,So%20how%20much%20better%20is%20expected%20goals%3F,worse%20on%20the%20home%20results
[accessed 21 July 2024].
Ghouse, G., Rehman, A.U. & Bhatti, M.I. (2024). Understanding of causes of spurious
associations: Problems and prospects. J Stat Theory Appl 23, 44–66.
https://doi.org/10.1007/s44199-024-00072-0
Macinnes, P. (2020). ‘It is beyond the model’: Have Liverpool exposed the limits of xG?
[online]. Available from: https://www.theguardian.com/football/2020/aug/09/liverpool-xg-jurgen-klopp
[accessed 21 July 2024].
Premier League. (2020). How has the COVID-19 Pandemic affected premier league matches?
[online]. Available from: https://www.premierleague.com/news/1682374 [accessed 26 July
2024].
Sky Sports Premier League. (2023). Darwin Nunez tops the Premier League’s average XG
per game this season! [online]. Available from:
https://x.com/SkySportsPL/status/1633482957385023491?lang=en [accessed 21 July 2024].
https://thesefootballtimes.co/2020/04/08/the-roots-of-expected-goals-xg-and-its-journey-from-nerd-nonsense-to-the-mainstream/