IA Math HL
INTERNATIONAL BACCALAUREATE
IB Math AA HL IA
4 April 2023
Exploring Poisson distribution and its relation with the binomial and
exponential distribution through FC Barcelona’s goals conceded in
La Liga Santander 2022-23 season
1. Introduction
Football is said to be one of the most unpredictable sports because it is a low-scoring game. This
implies that if the supposedly ‘better’ team misses its few chances in a match, it might tie
or even lose. Therefore, the outcome of the game depends heavily on the player with the
opportunity to score. This exploration aims to analyze FC Barcelona’s games using the Poisson
distribution. The model will then be extended to the exponential distribution to determine the
probabilities of conceding goals with respect to time throughout the remainder of the season.
Despite the unpredictability of human behavior, FCB’s defense has been outstanding this season, with
16 clean sheets (0 goals against) in 21 games. Therefore, I believe that the consistency of this
defense might allow the Poisson distribution to predict variables such as the time between goals
conceded and the probability of a goal being scored against Barça with respect to time. Although I
did not know about the Poisson distribution beforehand, I was interested in exploring the concept
of probability in sports. After researching and reflecting with my supervisor, I concluded that
the Poisson distribution would be the best approach to determine the probability of conceding
goals. This investigation will explore the Poisson distribution through Barça’s defense and
determine whether it has been consistent enough for goals conceded to be considered independent
events, meaning that the probability of conceding a goal does not depend on time. Although many
may argue that players can get injured, which would undermine the probabilities calculated in this
investigation, Barça seems to have found a sweet spot, with strong reserves such as Jordi Alba,
Marcos Alonso, Eric Garcia, Iñaki Peña, and Sergi Roberto ready to demonstrate their worth.
Additionally, players like Ronald Araujo and
Kounde can adapt to different positions and roles to help the team perform, increasing Barça’s
The Poisson distribution was invented to predict the number “of events occurring in the future”
(Aerin). For Poisson to be applicable, events must occur independently, and the probability of an
event occurring in a given length of time must not change through time. Although goals are not
independent events, Barça’s defensive consistency has encouraged me to assume so and analyze
whether goals conceded can be modeled using the Poisson distribution. One of the Poisson
distribution’s characteristics is its asymmetry: it is skewed to the right, because “it is
inhibited by the zero occurrence barrier” (Aerin). It is derived from the binomial probability
mass function, a discrete function that gives the probability of having r successes in n
trials. Each binomial trial has only two possible outcomes, success (no goal conceded) or
failure (at least one goal conceded), and the probability of success is constant for
every trial and does not change with respect to time. The derivation is the following:
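Writing the binomial probability of x events in n trials with p = λ/n, so that the mean np = λ
stays fixed, and letting n grow without bound:
$$P(x) = \lim_{n\to\infty} \binom{n}{x}\left(\frac{\lambda}{n}\right)^{x}\left(1-\frac{\lambda}{n}\right)^{n-x} = \frac{\lambda^{x}}{x!}\,\lim_{n\to\infty}\frac{n!}{(n-x)!\,n^{x}}\left(1-\frac{\lambda}{n}\right)^{-x}\left(1-\frac{\lambda}{n}\right)^{n}$$
The first two factors inside the limit tend to 1, so only $L = \lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^{n}$
remains. Taking logarithms gives $\ln L = \lim_{n\to\infty} \frac{\ln\left(1-\lambda/n\right)}{1/n}$, which has the indeterminate form 0/0.
To use L’Hôpital’s rule, the derivative for both the denominator and the numerator must be
found.
$$\frac{d}{dn}\ln\left(1-\frac{\lambda}{n}\right) = \frac{\lambda/n^{2}}{1-\lambda/n}, \qquad \frac{d}{dn}\left(\frac{1}{n}\right) = -\frac{1}{n^{2}}$$
Therefore
$$\ln L = \lim_{n\to\infty}\frac{-\lambda}{1-\lambda/n} = -\lambda \quad\Longrightarrow\quad L = e^{-\lambda}$$
Finally,
$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$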
where P(x) is the probability that Barça concedes x goals, x stands for the number of goals
conceded in a single game, and lambda (λ) is the expected number of goals per game conceded
by FCB (Barcelona). It is a discrete function, as goals conceded can only take non-negative
integer values, meaning that Barça cannot concede 1.5 goals. It is important to note that the
greater the value of lambda (λ), the more events are likely to occur in a given interval, so the
probability is spread over more values of x. In other words, as λ increases, each individual
probability value decreases, so the Poisson distribution becomes less skewed to the right and
looks more similar to a normal distribution.
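The link between the binomial and Poisson distributions described above can be checked numerically. The following is a minimal sketch rather than part of the original investigation: it assumes λ = 7/21 (7 goals conceded in 21 games, as quoted elsewhere in this exploration), and the helper names `binomial_pmf`, `poisson_pmf`, and `max_gap` are hypothetical.

```python
from math import comb, exp, factorial

# Assumption: lambda is the season average of 7 goals conceded in 21 games.
LAM = 7 / 21


def binomial_pmf(x, n, p):
    """Probability of exactly x events in n independent trials, each with probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number of events is lam."""
    return lam**x * exp(-lam) / factorial(x)


def max_gap(n, lam=LAM, max_goals=5):
    """Largest difference between Binomial(n, lam/n) and Poisson(lam) for x = 0..max_goals."""
    return max(
        abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam))
        for x in range(max_goals + 1)
    )


# The gap shrinks as the number of trials n grows while n * p = lambda stays fixed.
for n in (10, 100, 10_000):
    print(f"n = {n:>6}: largest gap = {max_gap(n):.6f}")
```

The shrinking gap illustrates why the Poisson formula can be treated as the limiting case of the binomial when the number of trials is large and the probability per trial is small.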
The expected goals against (xGA) will be represented using λ_A, the goal difference (GD) will be
represented with λ_D, while FCB’s expected goals (xG) will be represented using λ_S. These three
variables are all calculated with the same formula, which gives the average number of goals per
game. The first step is to find the team’s expected goals against (xGA) per game, which will be
calculated using the following formula: $\lambda_A = \frac{\text{goals conceded}}{\text{games played}} = \frac{7}{21} \approx 0.333$, using the data as of the 12th of
February 2023. It is important to note that only this season’s results will be considered and only
La Liga games count towards calculating λ. A data table with the basic information of
The value for Barça’s expected goals will not be used because the team has been irregular in
attack. The GD cannot be modeled using Poisson because a Poisson variable only takes non-negative
values, so it cannot assign a probability to a negative goal difference. Thus, it would not
accurately illustrate the probability of Barça losing a game. Theoretically, in any given
Barcelona game in La Liga, the most likely result is a 2-0 win for the Catalan club. However,
Barça has won only one of their 21 games with that exact score, further illustrating the
unpredictability of football, and with it, of Barça’s offensive record. This is the reason why the
investigation will focus on goals conceded, the aspect of Barça’s game that has been consistent
enough to be modeled with the Poisson distribution.
To make sure that the Poisson model can accurately depict Barça’s defensive results, one must
start by comparing the values in a data table. The Actual xGA represents the number of games in
which Barça has conceded x goals. These values are then divided by 21 (the total number of
games played) to convert them into observed probabilities, which can be compared with the Poisson
probabilities.
Goals conceded (x)   Actual xGA (games)   Actual probability   Poisson probability
0                    16                   0.762                0.719
1                    4                    0.190                0.237
2                    0                    0.000                0.039
3                    1                    0.048                0.004
4                    0                    0.000                0.000
5                    0                    0.000                0.000
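The comparison in the table above can be reproduced with a short script. This is a sketch under one stated assumption: the tabulated Poisson column matches λ rounded to 0.33 (rather than the unrounded 7/21), so that rounding is applied here. The names `games_by_goals` and `rows` are hypothetical.

```python
from math import exp, factorial

# Number of league games in which Barça conceded x goals (from the table above).
games_by_goals = {0: 16, 1: 4, 2: 0, 3: 1, 4: 0, 5: 0}

total_games = sum(games_by_goals.values())                   # 21 games
total_goals = sum(x * n for x, n in games_by_goals.items())  # 7 goals

# Assumption: lambda rounded to two decimals (0.33), which reproduces the tabulated values.
lam = round(total_goals / total_games, 2)


def poisson_pmf(x, lam):
    """Probability of conceding exactly x goals in one game under a Poisson model."""
    return lam**x * exp(-lam) / factorial(x)


# Each row: (goals, games observed, observed probability, Poisson probability).
rows = [
    (x, n, round(n / total_games, 3), round(poisson_pmf(x, lam), 3))
    for x, n in games_by_goals.items()
]

for x, n, actual, model in rows:
    print(f"{x}  {n:>2}  {actual:.3f}  {model:.3f}")
```

The largest disagreement is at x = 3, where a single game (the one with three goals conceded) carries an observed probability of 0.048 against a modeled 0.004.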
As previously mentioned, Graph 1 is discrete because there can only be an integer number of
goals. Despite not being continuous, the trendlines are included to portray the closeness between
conceding goals. Thus, if Barça maintains its defensive consistency, it should end up closer to the
Poisson distribution than it already is. A Poisson distribution in football looks exactly like the red
Now that the similarity between the actual goals conceded by Barça and the Poisson distribution
has been demonstrated, I can analyze in more depth the relationship between time and goals
conceded by FC Barcelona. Additionally, the derivation of the cumulative distribution
function (CDF) and the probability density function (PDF) will be explained. Both of these
functions describe the exponential distribution, as they are continuous functions that “measure the
the probability of no goals being scored against Barça in one unit of time (one full game).
Assuming that conceding a goal is an independent event, meaning that the probability of
conceding a goal does not change with time, both sides of the equation above can be raised to the
power of t (the time in games), giving $e^{-\lambda_A t}$ as the probability of conceding no goals
in t games. For example, the probability of conceding no goals in 2 straight games is
$e^{-2\lambda_A} = e^{-2/3} \approx 0.51$, meaning there is a probability of roughly one half
(50%) that Barça concedes no goals in the given time interval. Although this might not seem
impressive, football fans will understand that a clean sheet secures at least a point (draw). When
combining this defense with Barça’s offense, which might be unpredictable but is effective, as it
has scored at least one goal in 20 of the 21 La Liga games so far, FCB has a real chance of
getting the maximum number of points possible in two straight games (6 points). Moreover, the
probability of at least one goal in t units of time (f(t)) is the complement: $f(t) = 1 - e^{-\lambda_A t}$, thus,
$f(t) < 1$ for every finite t, which indicates that there is no absolute certainty that Barça will
concede another league goal in the remaining games of the season. In other words, although it is
very likely that Barça concedes one or more goals in the 17 games left in the La Liga season,
$f(17) = 1 - e^{-17/3} \approx 0.997$ (99.7%), it is not 100% certain that Barça will concede one. The
y-intercept is at (0,0), as there is a probability of zero that a goal will be conceded by Barça at
t = 0.
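The clean-sheet and at-least-one-goal probabilities discussed above can be sketched as follows. This is an illustration under the document's independence assumption, with λ_A taken as 7/21 goals per game; the function names are hypothetical.

```python
from math import exp

# Assumption: constant rate of 7 goals conceded in 21 games, with t measured in games.
LAMBDA_A = 7 / 21


def prob_no_goals(t, lam=LAMBDA_A):
    """Probability that no goal is conceded in t games: e^(-lam * t)."""
    return exp(-lam * t)


def prob_at_least_one_goal(t, lam=LAMBDA_A):
    """Exponential CDF f(t): complement of conceding no goals in t games."""
    return 1 - prob_no_goals(t, lam)


print(f"No goals in 2 games:            {prob_no_goals(2):.3f}")            # about 0.513
print(f"At least one goal in 17 games:  {prob_at_least_one_goal(17):.3f}")  # about 0.997
print(f"At least one goal at t = 0:     {prob_at_least_one_goal(0):.3f}")   # 0.000
```

Because e^(-λt) is positive for every finite t, the second function never reaches exactly 1, which is the numerical counterpart of the statement that a future goal is very likely but never certain.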
The graph has a domain of 0 ≤ t ≤ 17, because there cannot be a negative number of games played,
and there are only 17 games left for Barça to play in the 2022-23 edition of La Liga. The graph
shows that the probability of conceding at least one goal increases with the number of games
played (t), although not proportionally, since the curve flattens as it approaches 1. Graph 2 is a
cumulative distribution function (CDF) because, as the name implies, the probability of a random
event occurring is accumulated as time passes. Graph 2 is a continuous function, as there is a
probability value for every value of t.
By definition, the cumulative distribution function (CDF) is the integral of a probability density
function (PDF), which makes sense when considering that the CDF is the accumulation of all
PDF values. Since Graph 2 is the cumulative distribution function, to get the probability density
function, one must do the reverse of integration: differentiation. The derivative of f(t) can be
found with the chain rule: $g(t) = f'(t) = \lambda_A e^{-\lambda_A t}$.
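The relationship between the CDF and the PDF described above can be checked numerically. The sketch below assumes f(t) = 1 - e^(-λt) and g(t) = λe^(-λt) with λ = 7/21; the helpers `central_difference` and `midpoint_integral` are hypothetical names introduced only for this illustration.

```python
from math import exp

# Assumption: constant rate of 7 goals conceded in 21 games.
LAMBDA_A = 7 / 21


def cdf(t, lam=LAMBDA_A):
    """f(t): probability of at least one goal conceded within t games."""
    return 1 - exp(-lam * t)


def pdf(t, lam=LAMBDA_A):
    """g(t): derivative of f(t), the exponential probability density."""
    return lam * exp(-lam * t)


def central_difference(func, t, h=1e-6):
    """Numerical derivative of func at t."""
    return (func(t + h) - func(t - h)) / (2 * h)


def midpoint_integral(func, a, b, steps=1700):
    """Numerical integral of func over [a, b] using the midpoint rule."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width


# Differentiating the CDF recovers the PDF, and integrating the PDF recovers the CDF.
print(central_difference(cdf, 3.0), pdf(3.0))
print(midpoint_integral(pdf, 0, 17), cdf(17))
```

Both printed pairs should agree to several decimal places, and g(0) equals λ, the rate of goals conceded per game.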
Similarly to Graph 2, Graph 3 also has a domain of 0 ≤ t ≤ 17, for the same reasons. The
secondly, g(t) ≥ 0, stating that the probability density at any time (t) cannot be negative.
Graph 3 resembles Graph 2 in that it is a continuous function, as g(t) has a value for every
value of t. As seen in Graph 3, the y-intercept of g(t)
the time taken (t) for the probability that Barça concedes at least one goal to be f(t). The time
For example, another value that can be calculated is the probability that Barça concedes 7 goals
in 21 games (so far): $P(7) = \frac{(21\lambda_A)^{7} e^{-21\lambda_A}}{7!} = \frac{7^{7} e^{-7}}{7!} \approx 0.149$. If one were to remove the game against Real
Madrid, in which Barça conceded 3 goals, it is clear that Barça’s defensive record, 4 goals
conceded in 20 games (averaging 1 goal every 5 games), is outstanding to say the least. The
If these probabilities were taken and compared to other teams’ defenses, Barça would
undoubtedly be on top. The team with the second-fewest goals conceded in the top 5 European
leagues is Napoli with 15 goals in 22 games (0.682 goals conceded per game), which is more than
twice the number of goals conceded by Barça (0.333 per game). These numbers can be extremely
helpful to Barça, as they provide an insight into what the team needs to reach the next level.
For instance, the team needs to improve its offensive consistency by buying solid strikers in
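The figures compared in this section can be sketched in a few lines. This is an illustration only: it assumes that the total number of goals over t games follows a Poisson distribution with mean λt, and that each team's rate is its goals conceded divided by games played. The variable names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x, mean):
    """Probability of exactly x goals when the expected number of goals is mean."""
    return mean**x * exp(-mean) / factorial(x)


# Goals conceded per game, from the totals quoted in the text.
barca_rate = 7 / 21     # about 0.333
napoli_rate = 15 / 22   # about 0.682

# Assumption: goals over 21 games are Poisson with mean 21 * rate.
p_seven_in_21 = poisson_pmf(7, 21 * barca_rate)

# Probability of a clean sheet in a single game for each team: e^(-rate).
barca_clean_sheet = exp(-barca_rate)
napoli_clean_sheet = exp(-napoli_rate)

print(f"P(7 goals in 21 games): {p_seven_in_21:.3f}")
print(f"Clean sheet per game, Barça:  {barca_clean_sheet:.3f}")
print(f"Clean sheet per game, Napoli: {napoli_clean_sheet:.3f}")
```

Even the most likely season total has a modest probability on its own, since the probability mass is spread over many possible totals.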
4. Conclusion
Altogether, this investigation has demonstrated the relationship between the Poisson, binomial,
and exponential distributions, while providing interesting data on Barça’s defense.
Despite the similarity, displayed in Graph 1, between the actual results and the Poisson
distribution for expected goals against, they are not identical, because real matches involve
human error that the model does not capture. By assuming that Barça’s goals conceded are
independent events, the three distributions can be applied to any given time interval. Thus, one
can analyze Barça’s defensive performance throughout the season
and understand the probability of other events that have already occurred. For example, Barça’s
longest run without conceding a goal was 6 La Liga games in a row, equivalent to 540 minutes of
game time. The probability of this occurring is $e^{-6\lambda_A} = e^{-2} \approx 0.135$ (13.5%). Not only this, but
Barça has conceded only 1 goal in the last 6 games. Therefore, if it weren’t for Kounde’s own
goal, a mistake which he made at the very end of the match, Barça would have had two
six-game streaks without conceding a single goal, which is even more unlikely. This adds to the
point that football players are not machines, so momentum and motivation can single-handedly
conceded almost half of its goals in a single league game against Real Madrid. Therefore, the
such as the offensive ability of the opponent. Nevertheless, as seen in Graph 1, both functions
have similar probability values on the expected goals conceded per game. Football is guided by
passion, but motivation and recent form are equally if not more important in deciding the
outcome of a match. The challenge of applying mathematics in football is deciding where to start
counting: from the start of the season, solely using last month's results, or only the past 5 games.
This is a significant dilemma since it radically alters the statistics and probability computed.
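The effect of the counting window on the estimated rate can be sketched as follows. The window totals are the ones quoted in this exploration (7 goals in 21 games for the season, 1 goal in the past five league games); the function names are hypothetical, and the calculation assumes the same Poisson model used throughout.

```python
from math import exp


def rate(goals, games):
    """Average goals conceded per game over the chosen window."""
    return goals / games


def prob_concede_in_game(lam):
    """Probability of conceding at least one goal in a single game: 1 - e^(-lam)."""
    return 1 - exp(-lam)


season_rate = rate(7, 21)    # whole season so far
recent_rate = rate(1, 5)     # past five league games only

print(f"Season window: lambda = {season_rate:.3f}, "
      f"P(concede) = {prob_concede_in_game(season_rate):.3f}")
print(f"Last 5 games:  lambda = {recent_rate:.3f}, "
      f"P(concede) = {prob_concede_in_game(recent_rate):.3f}")
```

The same model gives noticeably different probabilities depending on the window, which is why the choice of starting point has to be stated alongside any prediction.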
Using the previous example, if we just utilized Barça's past five games, the likelihood of
allowing a goal would be substantially lower since they only conceded one goal in the past five
league games. Other factors not taken into account, such as home and away games or the
opponent Barça faces (e.g. Real Madrid), can also affect the probability of FC Barcelona
conceding goals. This would also affect the closeness between Poisson and actual xGA, thus,
season because of their consistency, and the chance to achieve a historic season. Consistency is
used here as a proxy for independence: because Barça’s defense has been consistent, goals conceded
behave more like random events occurring at a constant rate, which enabled me to use the Poisson
distribution to analyze Barça’s defense. If Barça maintains its form, it should concede a total of
about 13 goals in the entire league campaign (0.342 per game), beating the record held by
Mourinho’s Chelsea, which conceded only 15
Additionally, and despite the fact that goals conceded theoretically cannot be considered
independent events, as they depend on many factors, the results from this exploration indicate
that they could be treated as such when taking into account a full season. In other words, I
believe that the longer the time period used to assess a team’s performance, the closer the goals
conceded will be to the Poisson distribution, since the players’ ‘ups and downs’ will balance out.
Therefore, the longer the time period, the more the goals conceded by Barça will resemble an
independent event. Finally, an extension to this investigation could be to explore the connection
between the exponential and the geometric distribution, which would allow for a deeper
distributions. Moreover, if one were provided with additional data on Barça’s defensive and
offensive records this season, it would enable one to make conclusions and predictions on future
games. Therefore, I hope that this investigation encourages others to expand on the data found
Aerin. “Poisson Distribution Intuition (and Derivation).” Medium, Towards Data Science, 18
Sept. 2022,
https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d.
Ciliar, Nick. “4.1) PDF, Mean, & Variance.” Introduction to Engineering Statistics, 2023,
http://matcmath.org/textbooks/engineeringstats/pdf-mean-variance/.
FootyStats Editors. “FC Barcelona Stats, Form & XG.” 2022/23 FC Barcelona Statistics, 12 Feb.
2023, https://footystats.org/clubs/fc-barcelona-83.
Kissell, Robert, and James Poserina. Optimal Sports Math, Statistics, and Fantasy. Academic
Press, 2017.