
IA Math HL



INTERNATIONAL BACCALAUREATE
IB Math AA HL IA

4 April 2023

Session: May 2023


Personal Code: kdg038

Exploring Poisson distribution and its relation with the binomial and
exponential distribution through FC Barcelona’s goals conceded in
La Liga Santander 2022-23 season

Page Count: 14.



1. Introduction

Football is often said to be one of the most unpredictable sports because it is a low-scoring game. This implies that if the supposedly ‘better’ team misses its few chances in a match, it might draw or even lose. The outcome of a game therefore depends heavily on the players who get the opportunity to score. This exploration aims to analyze FC Barcelona’s games using the Poisson distribution, in order to further understand the interconnectedness between different probability distributions. The model will then be extended to the exponential distribution to determine the probabilities of conceding goals with respect to time throughout the remainder of the season.

Despite the unpredictability of human behavior, FCB’s defense has been on fire this season, with 16 clean sheets (0 goals against) in 21 games. I therefore believe that the consistency of this defense might enable the use of the Poisson distribution to predict variables such as the time between goals conceded and the probability of a goal being scored against Barça with respect to time. Although I did not know about the Poisson distribution beforehand, I was interested in exploring the concept of probability in sports; after researching and reflecting with my supervisor, I concluded that the Poisson distribution would be the best approach to determine the probability of conceding goals. This investigation will explore the Poisson distribution through Barça’s defense and determine whether it has been consistent enough for goals conceded to be treated as independent events, meaning that the probability of conceding a goal does not depend on time. Although many may argue that players can get injured, undermining the mathematics and probabilities calculated in this investigation, Barça seems to have found a sweet spot, with excellent reserves such as Jordi Alba, Marcos Alonso, Eric Garcia, Iñaki Peña, and Sergi Roberto ready to demonstrate their worth. Additionally, players like Ronald Araujo and Kounde can adapt to different positions and roles to help the team perform, increasing Barça’s adaptability to any injury or setback in defense.

2. Poisson Distribution & Derivation from the Binomial Distribution

The Poisson distribution was developed to predict the number “of events occurring in the future” (Aerin). For the Poisson distribution to be applicable, events must occur independently, and the probability of an event occurring in a given length of time must not change through time. Although goals are not strictly independent events, Barça’s defensive consistency has encouraged me to assume so and to analyze whether goals conceded can be modeled using the Poisson distribution. One of the Poisson distribution’s characteristics is its asymmetry: it is skewed to the right because “it is inhibited by the zero occurrence barrier” (Aerin). It can be derived from the binomial probability mass function, a discrete function that gives the probability of having r successes in n trials. Each binomial trial has only two possible outcomes: if a game is divided into n short intervals, each interval is either a success (a goal is conceded) or a failure (no goal is conceded), where the probability of success is constant for every trial and does not change with respect to time. The derivation is the following:

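$$P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r},$$

substituting $p = \frac{\lambda}{n}$, with λ being the expected number of successes and n being the number of trials. The combination $^nC_r = \binom{n}{r}$ is expanded to be $\frac{n!}{r!\,(n - r)!}$, as indicated in the formula booklet (Topic 1 - HL only section).

$$P(X = r) = \lim_{n \to \infty} \frac{n!}{r!\,(n - r)!} \left(\frac{\lambda}{n}\right)^r \left(1 - \frac{\lambda}{n}\right)^{n - r} = \frac{\lambda^r}{r!} \lim_{n \to \infty} \frac{n!}{(n - r)!\, n^r} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-r}.$$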

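To evaluate $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n$, let $y = \left(1 - \frac{\lambda}{n}\right)^n$, so that $\ln y = n \ln\left(1 - \frac{\lambda}{n}\right) = \frac{\ln\left(1 - \lambda / n\right)}{1 / n}$, which has the indeterminate form $\frac{0}{0}$ as $n \to \infty$. To use L’Hôpital’s rule, the derivatives of both the numerator and the denominator must be found:

$$\frac{d}{dn} \ln\left(1 - \frac{\lambda}{n}\right) = \frac{\lambda / n^2}{1 - \lambda / n}, \qquad \frac{d}{dn}\left(\frac{1}{n}\right) = -\frac{1}{n^2},$$

$$\lim_{n \to \infty} \ln y = \lim_{n \to \infty} \frac{-\lambda}{1 - \lambda / n} = -\lambda, \quad \text{so} \quad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}.$$

On the other hand,

$$\lim_{n \to \infty} \frac{n!}{(n - r)!\, n^r} = \lim_{n \to \infty} \frac{n (n - 1) \cdots (n - r + 1)}{n^r} = 1 \quad \text{and} \quad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-r} = 1.$$

Therefore

$$\lim_{n \to \infty} \frac{n!}{(n - r)!\, n^r} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-r} = 1 \cdot e^{-\lambda} \cdot 1 = e^{-\lambda}.$$

Finally,

$$P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}.$$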

Thus, the Poisson distribution is expressed by the following formula: $P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, where P(x) is the probability that Barça concedes x goals, x stands for the number of goals conceded in a single game, and lambda (λ) is the expected number of goals per game conceded by FCB (Barcelona). It is a discrete function, as goals conceded can only take non-negative integer values, meaning that Barça cannot concede 1.5 goals. It is important to note that the greater the value of lambda (λ), the more events are likely to occur in a given interval, so the probability is spread over more values of x. In other words, as λ increases, the individual probability values decrease, and the Poisson distribution looks more similar to a normal distribution as it becomes less skewed to the right.
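The limit behind the derivation can be checked numerically. The following is a minimal sketch, assuming λ = 1/3 (7 goals conceded in 21 games) and using hypothetical helper names `binomial_pmf` and `poisson_pmf` that are not part of the original exploration. It compares the binomial probability with p = λ/n against the Poisson probability as n grows.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 1 / 3   # assumed goals conceded per game (7 goals in 21 games)
x = 1         # number of goals whose probability is compared

# As n grows with p = lam / n, the binomial value approaches the Poisson value.
for n in (10, 100, 1000, 10000):
    b = binomial_pmf(x, n, lam / n)
    print(f"n = {n:>5}: binomial = {b:.6f}, Poisson = {poisson_pmf(x, lam):.6f}")
```

The gap between the two columns should shrink as n increases, which is what the limit n → ∞ in the derivation expresses.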

The expected goals against (xGA) will be represented using λA, the goal difference (GD) will be represented with λD, while FCB’s expected goals (xG) will be represented using λS. These three variables are all calculated in the same way, as an average per game. The first step is to find the team’s expected goals against (xGA) per game, which is calculated using the following formula: $\lambda_A = \frac{\text{goals conceded}}{\text{games played}} = \frac{7}{21} \approx 0.333$, using the data as of the 12th of February 2023. It is important to note that only this season’s results will be considered and only La Liga games count towards calculating λ. A data table with the basic information of Barcelona’s expected goals for and against is shown below:

FC Barcelona    xG (λS)    xGA (λA)    GD (λD)
2022-2023       2.050      0.333       1.710



The value for Barça’s expected goals will not be used due to the team’s offensive irregularity. The GD cannot be modeled using Poisson because the Poisson distribution is only defined for non-negative integers, so it cannot give the probability of a negative goal difference. Thus, it would not accurately illustrate the probability of Barça losing a game. Theoretically, in any given Barcelona game in La Liga, the most likely result is a 2-0 win for the Catalan club. However, Barça has won only one of its 21 games with that exact score, which further illustrates the unpredictability of football and, with it, of Barça’s offensive record. This is the reason why the investigation will focus on goals conceded, as the Poisson distribution is not suitable for modeling the other aspects of Barça’s game considered here.
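The claim that 2-0 is the most likely scoreline can be illustrated with a short sketch. It assumes that goals scored and goals conceded are independent Poisson variables with the averages from the table (λS = 2.050 and λA = 0.333); that independence assumption and the helper name `poisson_pmf` are introduced here for illustration only.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

LAMBDA_S = 2.050   # average goals scored per game (from the table)
LAMBDA_A = 0.333   # average goals conceded per game (from the table)

# Assumption: goals for and against are independent, so the probability of
# a scoreline (s, a) is the product of the two Poisson probabilities.
scorelines = {
    (s, a): poisson_pmf(s, LAMBDA_S) * poisson_pmf(a, LAMBDA_A)
    for s in range(7)
    for a in range(7)
}

best = max(scorelines, key=scorelines.get)
print(f"most likely scoreline: {best[0]}-{best[1]}")
print(f"probability: {scorelines[best]:.3f}")
```

Under these assumptions the model gives 2-0 a probability of roughly 0.19, or about four games out of 21, whereas only one 2-0 win was observed. This is consistent with the decision not to model the attack.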

To check that the Poisson model can accurately depict Barça’s defensive results, one must start by comparing the values in a data table. The Actual xGA column represents the number of games in which Barça has conceded x goals. These values are then divided by 21 (the total number of games played) to convert them into actual probabilities (relative frequencies), which can be compared with the Poisson values calculated using P(x).

Table 1. FC Barcelona’s actual against Poisson distribution number of goals conceded.

x    Actual xGA    Actual Probability    Poisson Probability (λA)
0    16            0.762                 0.719
1    4             0.190                 0.237
2    0             0.000                 0.039
3    1             0.048                 0.004
4    0             0.000                 0.000
5    0             0.000                 0.000
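The columns of Table 1 can be reproduced with a few lines of code. This is a sketch under the assumption that λA is the plain average of goals conceded per game; the names `games_by_goals` and `poisson_pmf` are illustrative. With the unrounded value λA = 1/3 the Poisson column comes out as 0.717, 0.239, 0.040, 0.004, so the table's values appear to correspond to λA rounded to 0.33.

```python
import math

# Number of league games in which Barça conceded x goals (Table 1).
games_by_goals = {0: 16, 1: 4, 2: 0, 3: 1, 4: 0, 5: 0}

total_games = sum(games_by_goals.values())                    # games played
total_goals = sum(x * n for x, n in games_by_goals.items())   # goals conceded
lam = total_goals / total_games                               # lambda_A

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

print(f"games = {total_games}, goals = {total_goals}, lambda_A = {lam:.3f}")
for x, n in games_by_goals.items():
    actual = n / total_games   # relative frequency of conceding x goals
    print(f"{x}  {n:>2}  {actual:.3f}  {poisson_pmf(x, lam):.3f}")
```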

When graphing Table 1, the following results can be seen:



As previously mentioned, Graph 1 is discrete because there can only be an integer number of goals. Although the data are not continuous, trendlines are included to show the closeness between both graphs, which demonstrates Barça’s regularity and, therefore, predictability when it comes to conceding goals. Thus, if Barça maintains its defensive consistency, its actual results should end up even closer to the Poisson distribution than they already are. A Poisson distribution in football looks like the red line, in which the probability of 0 events (goals) occurring cannot be 0.

3. xGA Analysis using Poisson and Exponential Distribution

Now that the similarity between the actual goals conceded by Barça and the Poisson distribution has been demonstrated, I can go deeper into the analysis of the relationship between time and goals conceded by FC Barcelona. Additionally, the derivation of the cumulative distribution function (CDF) and the probability density function (PDF) will be explained. Both of these functions describe the exponential distribution, as they are continuous functions that “measure the expected time for an event to occur” (Kissell). To start with, $P(0) = \frac{\lambda_A^0 e^{-\lambda_A}}{0!} = e^{-\lambda_A}$ is the probability of no goals being scored against Barça in one unit of time (one full game).

Assuming that conceding a goal is an independent event, meaning that the probability of conceding a goal does not change with time, both sides of the equation above can be raised to the power of t (number of games). Thus, $\left(e^{-\lambda_A}\right)^t = e^{-\lambda_A t}$ is the probability of no goals in t units of time (t games). For example, the probability of conceding no goals in 2 straight games is $e^{-2\lambda_A} = e^{-2/3} \approx 0.51$, meaning that there is a probability slightly above 1 in 2 (50%) that Barça concedes no goals in the given time interval. Although this might not seem impressive, football fans will understand that a clean sheet secures at least a point (draw). When combining this defense with Barça’s attack, which might be unpredictable but is effective, as it has scored at least one goal in 20 of the 21 La Liga games so far, FCB has a real chance of getting the maximum number of points possible in two straight games (6 points). Moreover, the probability of at least one goal in t units of time, f(t), is the complement: $f(t) = 1 - P(\text{no goals in } t \text{ games})$, thus, $f(t) = 1 - e^{-\lambda_A t}$. The function f(t) has a horizontal asymptote at $y = 1$, which indicates that there is no absolute certainty that Barça will concede even a single additional league goal in the remaining games of the season. In other words, although it is very likely that Barça concedes one or more goals in the 17 games left in the La Liga season, $f(17) = 1 - e^{-17\lambda_A} \approx 0.997$ (99.7%), it is not 100% certain that Barça will concede one. The graph passes through the origin (0, 0), as there is a probability of zero that a goal will be conceded by Barça at t = 0 units of time (0 games played).
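The two functions described above, the probability of no goals in t games and its complement f(t) = 1 − e^(−λA·t), can be evaluated directly. The sketch below assumes λA = 1/3 (7 goals in 21 games); the function names `p_no_goal` and `f` are illustrative choices.

```python
import math

LAMBDA_A = 1 / 3   # assumed goals conceded per game (7 goals in 21 games)

def p_no_goal(t, lam=LAMBDA_A):
    """Probability of conceding no goals in t games: e^(-lam * t)."""
    return math.exp(-lam * t)

def f(t, lam=LAMBDA_A):
    """Probability of conceding at least one goal in t games."""
    return 1 - p_no_goal(t, lam)

print(f"no goals in 2 straight games:  {p_no_goal(2):.3f}")
print(f"at least one goal in 17 games: {f(17):.3f}")
print(f"f(0) = {f(0):.3f}")
```

Evaluating f for larger and larger t gives values that approach 1, which matches the asymptotic behavior described in the text.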



The graph is restricted to the domain 0 ≤ t ≤ 17, because there cannot be a negative number of games played, and there are only 17 games left for Barça to play in the 2022-23 edition of La Liga. The graph clearly shows that the probability of conceding at least one goal increases with the number of games played (t), although the relation is not proportional: the curve rises quickly at first and then flattens as it approaches 1. Graph 2 is a cumulative distribution function (CDF) because, as the name implies, the probability of a random event occurring accumulates as time passes. Graph 2 is a continuous function, as there is a probability value for every t; for example, at t = 0.3 games, which is equal to around 27 minutes ($0.3 \times 90$), the probability that Barça concedes one or more goals is $f(0.3) = 1 - e^{-0.3\lambda_A} \approx 0.095$.
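Because f(t) is continuous, it can be evaluated for fractions of a game. The sketch below converts minutes into games, assuming a game lasts 90 minutes (stoppage time is ignored) and λA = 1/3; the helper `prob_concede_within` is a hypothetical name.

```python
import math

LAMBDA_A = 1 / 3        # assumed goals conceded per game
MINUTES_PER_GAME = 90   # assumption: stoppage time is ignored

def prob_concede_within(minutes, lam=LAMBDA_A):
    """Probability of at least one goal conceded within `minutes` of play."""
    t = minutes / MINUTES_PER_GAME   # convert minutes to games
    return 1 - math.exp(-lam * t)

for minutes in (27, 45, 90):
    print(f"{minutes:>2} min: {prob_concede_within(minutes):.3f}")
```

Here 27 minutes corresponds to t = 0.3 games, the example used in the text.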

By definition, the cumulative distribution function (CDF) is the integral of a probability density function (PDF), which makes sense when considering that the CDF is the accumulation of all PDF values. Since Graph 2 is the cumulative distribution function, to get the probability density function, one must do the reverse of integration: differentiation. The derivative of f(t) can be calculated as $g(t) = f'(t) = \frac{d}{dt}\left(1 - e^{-\lambda_A t}\right) = \lambda_A e^{-\lambda_A t}$. Function g(t) is an exponential PDF because it follows the formula $\lambda e^{-\lambda t}$.

Similarly to Graph 2, Graph 3 is also restricted to 0 ≤ t ≤ 17, for the same reasons. The function g(t) is a valid PDF because it meets two requirements: firstly, $\int_0^\infty g(t)\,dt = 1$, and secondly, g(t) ≥ 0, stating that the probability density at any time (t) cannot be negative. Graph 3 resembles Graph 2 in that it is a continuous function, as g(t) has a value for every value of t. As seen in Graph 3, the y-intercept of g(t) is λA, the expected number of goals conceded by FC Barcelona in a single game.
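The relationship between the CDF and the PDF can be checked numerically. This sketch assumes f(t) = 1 − e^(−λA·t) and g(t) = λA·e^(−λA·t) with λA = 1/3; the finite-difference step `h` and the midpoint-rule helper `integrate_g` are illustrative choices rather than part of the original exploration.

```python
import math

LAMBDA_A = 1 / 3   # assumed goals conceded per game

def f(t, lam=LAMBDA_A):
    """Exponential CDF: probability of at least one goal within t games."""
    return 1 - math.exp(-lam * t)

def g(t, lam=LAMBDA_A):
    """Exponential PDF: derivative of f with respect to t."""
    return lam * math.exp(-lam * t)

# Central-difference approximation of f'(t) compared with g(t).
h = 1e-6
for t in (0.0, 1.0, 5.0):
    slope = (f(t + h) - f(t - h)) / (2 * h)
    print(f"t = {t}: f'(t) = {slope:.6f}, g(t) = {g(t):.6f}")

def integrate_g(T, steps=100_000):
    """Midpoint-rule integral of g over [0, T]; it should match f(T)."""
    dt = T / steps
    return sum(g((i + 0.5) * dt) for i in range(steps)) * dt

print(f"integral of g on [0, 17] = {integrate_g(17):.6f}, f(17) = {f(17):.6f}")
```

On the restricted domain 0 ≤ t ≤ 17 the area under g(t) equals f(17), which is slightly below 1; the area reaches exactly 1 only in the limit as t tends to infinity.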


After deriving both exponential functions (f(t) and g(t)), certain values can be calculated, such as the time taken (t) for the probability that Barça concedes at least one goal to reach a given value f(t). The time taken turns out to be expressed as a logarithmic function: $t = -\frac{\ln(1 - f(t))}{\lambda_A}$. Another value that can be calculated is the probability that Barça concedes 7 goals in 21 games (so far): [equation missing in source]. If one were to remove the game against Real Madrid, in which Barça conceded 3 goals, it is clear that Barça’s defensive record, 4 goals conceded in 20 games (averaging 1 goal every 5 games), is outstanding to say the least. The probability of this occurring is [equation missing in source].
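The logarithmic relation follows from solving f(t) = 1 − e^(−λA·t) for t. The sketch below implements that inversion, and also shows one possible way to compute the probability of exactly 7 goals in 21 games, under the assumption that goals over t games follow a Poisson distribution with mean λA·t. The values originally given in the text at this point are not available, so the printed number is an illustration of the method and not necessarily the author's figure. The names `games_until` and `poisson_pmf` are hypothetical.

```python
import math

LAMBDA_A = 1 / 3   # assumed goals conceded per game (7 goals in 21 games)

def games_until(prob, lam=LAMBDA_A):
    """Invert f(t) = 1 - e^(-lam * t): t = -ln(1 - prob) / lam."""
    return -math.log(1 - prob) / lam

def poisson_pmf(x, mean):
    """P(X = x) for X ~ Po(mean)."""
    return mean**x * math.exp(-mean) / math.factorial(x)

print(f"games until f(t) reaches 0.5: {games_until(0.5):.2f}")

# Assumption: the number of goals over t games is Poisson with mean lam * t.
print(f"P(exactly 7 goals in 21 games) = {poisson_pmf(7, LAMBDA_A * 21):.3f}")
```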

If these probabilities were compared to those of other teams’ defenses, Barça would undoubtedly be on top. The team with the second-fewest goals conceded in the top 5 European leagues is Napoli with 15 goals in 22 games (0.682 goals conceded per game), which is more than twice the rate of goals conceded by Barça (0.333 per game). These numbers can be extremely helpful to Barça, as they provide an insight into what the team needs to reach the next level. For instance, the team needs to improve its offensive consistency by buying solid strikers in order to capitalize on its monstrous defense.

4. Conclusion
Altogether, this investigation has demonstrated the relationship between the Poisson, binomial, and exponential distributions, while providing interesting data on Barça’s defense. Despite the similarity, displayed in Graph 1, between real life and the Poisson distribution for expected goals against, they are not identical, because human error exists. By assuming that Barça’s goals conceded are independent events, the distributions can be applied to any given time interval. Thus, one can analyze Barça’s defensive performance throughout the season and understand the probability of other events that have already occurred. For example, Barça’s longest run without conceding a goal was 6 La Liga games in a row, equivalent to 540 minutes of game time. The probability of this occurring is $e^{-6\lambda_A} = e^{-2} \approx 0.135$. Not only this, but Barça has only conceded 1 goal in the last 6 games. Therefore, if it weren’t for Kounde’s own goal, a mistake which he made at the very end of the match, Barça would have had two six-game streaks without conceding a single goal, which is even more unlikely. This adds to the point that football players are not machines, so momentum and motivation can affect the exploration’s mathematics. Additionally, although Barça’s defense is consistent, it conceded almost half of its goals (3 of 7) in a single league game against Real Madrid. Therefore, conceding a goal is not truly an independent event, as its probability is affected by several variables such as the offensive ability of the opponent. Nevertheless, as seen in Graph 1, both functions have similar probability values for the goals conceded per game.

Football is guided by passion, but motivation and recent form are equally if not more important in deciding the outcome of a match. The challenge of applying mathematics to football is deciding where to start counting: from the start of the season, solely using last month's results, or only the past 5 games. This is a significant dilemma since it radically alters the statistics and probabilities computed. Using the previous example, if we just utilized Barça's past five games, the likelihood of conceding a goal would be substantially lower since they only conceded one goal in the past five league games. Other factors not taken into account, such as home and away games or the opponent Barça faces (e.g. Real Madrid), can also affect the probability of FC Barcelona conceding goals. This would also affect the closeness between the Poisson and actual xGA values, potentially changing the conclusions.

Nonetheless, I decided to investigate FC Barcelona’s defensive record this season because of the team’s consistency and its chance to achieve a historic season. Consistency is closely related to independence here: because of Barça’s consistency, goals conceded are more likely to occur randomly, which enabled me to use the Poisson distribution to analyze Barça’s defense. If Barça maintains its form, it should concede a total of about 13 goals in the entire league campaign (0.342 per game), beating the record held by Mourinho’s Chelsea, which conceded only 15 goals in the 2004/05 Premier League season (0.395 per game).
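The season projection can be reproduced with simple arithmetic. This sketch assumes Barça keeps conceding at the current rate of 7 goals in 21 games over the 17 remaining games, and that both league seasons consist of 38 games; the variable names are illustrative.

```python
GAMES_PLAYED = 21
GOALS_CONCEDED = 7
GAMES_LEFT = 17
SEASON_GAMES = GAMES_PLAYED + GAMES_LEFT   # 38 league games in total

# Assumption: the current rate of goals conceded is maintained.
rate = GOALS_CONCEDED / GAMES_PLAYED              # goals conceded per game so far
projected = GOALS_CONCEDED + rate * GAMES_LEFT    # expected season total

print(f"projected total: {projected:.2f} (about {round(projected)} goals)")
print(f"Barça pace: {round(projected) / SEASON_GAMES:.3f} per game")
print(f"Chelsea 2004/05: {15 / SEASON_GAMES:.3f} per game")
```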

Additionally, and despite the fact that goals conceded theoretically cannot be considered independent events, as they depend on many factors, the results from this exploration indicate that they could be treated as such when taking into account a full season. In other words, I believe that the longer the time period used to assess a team’s performance, the closer the goals conceded will be to the Poisson distribution, since the players’ ‘ups and downs’ will balance out. Therefore, the longer the time period, the more the goals conceded by Barça will resemble independent events. Finally, an extension to this investigation could be to explore the connection between the exponential and the geometric distribution, which would allow for a deeper understanding of Barça’s goals conceded and of the interconnectedness between probability distributions. Moreover, additional data on Barça’s defensive and offensive records this season would make it possible to draw conclusions and make predictions about future games. Therefore, I hope that this investigation encourages others to expand on the data found and to try to predict future events using probability.


Works Cited

Aerin. “Poisson Distribution Intuition (and Derivation).” Medium, Towards Data Science, 18 Sept. 2022, https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d.

Ciliar, Nick. “4.1) PDF, Mean, & Variance.” Introduction to Engineering Statistics, 2023, http://matcmath.org/textbooks/engineeringstats/pdf-mean-variance/.

FootyStats Editors. “FC Barcelona Stats, Form & XG.” 2022/23 FC Barcelona Statistics, 12 Feb. 2023, https://footystats.org/clubs/fc-barcelona-83.

International Baccalaureate Organization (IBO). “Mathematics: analysis and approaches formula booklet” (Version 1.3), 2019.

Kissell, Robert, and James Poserina. Optimal Sports Math, Statistics, and Fantasy. Academic Press, 2017.
