This paper introduces a new unsupervised method based on Latent Dirichlet Allocation to
automatically detect the playing styles of soccer teams and players. In this approach, a model is
trained on Opta’s F9 data to learn a set of underlying playing styles that best describe differences
in the training dataset. Once fit, the model can describe the playing style of teams and players at
scale with minimal human intervention. The method proves effective at aligning data-driven style
detection with grounded applications that practitioners such as clubs and coaches can immediately
incorporate into their day-to-day operations.
1. Introduction
Soccer data analytics lags far behind its counterparts in the major North American sports for widely
cited reasons: soccer is a complex, dynamic and fluid invasion sport, with highly interdependent
events occurring simultaneously and continuously. On top of this, soccer is an extremely low-scoring
sport, and the majority of events on the pitch do not influence the final result directly (with a
goal, for example); not only is the data complex, but the signal from which we want to discern
structure is infrequent. As such, traditional supervised approaches to soccer gain little traction
over short time-frames and are largely reserved for stakeholders, such as betting syndicates, that
can afford to hedge on long-term odds and outcomes. Moreover, the insight from this type of research
is often abstract, visible only as small differences in model parameters that increase or decrease a
team’s chances of winning a league; it rarely trickles down in a descriptive, applicable or
accessible way to short-term stakeholders like clubs and coaches interested in winning the next game
or planning their tactical system.
In response to the circumstances outlined above, this paper presents a novel unsupervised approach
aimed at extracting robust ‘stylistic’ insight from aggregate feature data on a short, match-by-match
timescale. Most importantly, our methodology aims to reconcile mathematical robustness and
scalability with descriptiveness and applicability, so that hands-on practitioners such as coaches or
journalists can immediately leverage the results in their daily work.
The methodology presented here is inspired by Natural Language Processing (NLP), more specifically
by Topic Extraction, which is concerned with automatically sorting text documents into the semantic
topics that constitute them. In the information age, automatic and scalable methods to classify
documents semantically are incredibly practical. Burgeoning fields such as digital marketing and
sentiment analysis of social media content rely heavily on the scalability of text mining: most
humans can classify tweets about a brand into different ‘sentiment categories’, but the manpower
needed to do this across the vast quantities of data available is prohibitive.
The document is organised as follows: In Section 2 we introduce what an LDA model is and how we
can train such a model, explaining how we have re-conceptualised and repurposed the training
algorithm to deal with learning styles of teams and players rather than topics of documents. In
Section 3 we provide more details on the data used and how it was pre-processed to train the LDA
style model, as well as exploring the fitted models to understand what the learned styles mean in a
soccer context. Section 4 speaks to practitioners by providing an in-depth account of applied
possibilities for the method’s results. Finally, Section 5 closes with some conclusions and proposes
lines of future work that this research opens up.
2. Latent Dirichlet Allocation
Introduced by Blei, Ng and Jordan (2003) in their seminal paper, LDA is primarily used for document
classification tasks in which the content of a document is explained by unobserved groups of latent
topics. In theory, an LDA model is a generative model that conceptualises a document as a mixture of
unobserved latent topics sampled from a sparse Dirichlet distribution, where a topic is a probability
distribution over the dictionary (the set of words). In a hypothetical example, this means that
latent topics such as religion or politics assign probabilities to certain words (religion will
assign higher probability to words like “god” or “virtue”, while politics will assign higher
probability to words like “law” or “constituency”); and the mixture of a document over these latent
topics (e.g. 65% religion and 35% politics) will determine the probability of different words
appearing in it with different frequencies.
FIGURE 1: Generative scheme of the LDA model: each document d in the corpus has a sparse Dirichlet
distribution θ_d ~ Dir(α) over the latent topics, and the topic of each word in d is sampled from
Multinomial(θ_d). In parallel, each topic k has a sparse Dirichlet prior over the dictionary,
φ_k ~ Dir(β_k), from which the words assigned to topic k are sampled.
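For completeness, the generative process sketched in Figure 1 can be written out fully as follows (a standard statement of LDA; z_{d,i} denotes the latent topic assignment of the i-th word in document d, a step the caption glosses over):

\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta_k) && \text{topic } k\text{'s distribution over the dictionary} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{document } d\text{'s mixture over topics} \\
z_{d,i} &\sim \mathrm{Multinomial}(\theta_d) && \text{topic assignment of word } i \text{ in document } d \\
w_{d,i} &\sim \mathrm{Multinomial}(\phi_{z_{d,i}}) && \text{the observed word itself}
\end{align*}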
In practice, however, NLP researchers will not realistically have access to a full representation of
all possible latent topics and their probability distributions over all existing words. In light of this, an
unsupervised implementation using Bayesian inference techniques, which fits an LDA model and learns
the latent topics and their distributions over the dictionary from a corpus of documents, is where
LDA has become the celebrated champion of unobserved topic inference (Newman, Smyth, Welling and
Asuncion, 2008). This point is worth emphasising and reflecting on for a moment: there is absolutely
no supervision involved in LDA inference, meaning that the algorithm has no knowledge at all of the
topics of human language or of the words associated with them. It simply fits a model that helps
explain the frequencies of the different words (words, for the algorithm, are abstract signals
holding no semantic meaning at all) in a collection of documents. It is the actual structure of
language and semantics that causes the algorithm to learn meaningful topics (by ‘meaningful’ we mean
semantic topics in the set of documents). The fact that we will need no supervision to learn
meaningful styles of teams and players is truly valuable: in exploring the results, readers with a
knowledge of European soccer will realise that the learned styles correspond to empirical styles
commonly discussed in the game.
A fitted LDA model outputs a set of n unlabelled topics (where n is passed as a parameter) along with
each topic’s probability distribution over the dictionary. Even if the researcher does not know the
topics in the training set beforehand, it is usually not difficult to infer them by empirically
inspecting the words with the highest probabilities for each topic. Additionally, a fitted LDA model
allows observations (documents) to be expressed as a mixture of the learned topics using maximum
likelihood methods (e.g. a given document is 35% topic 1, 20% topic 2, etc.).
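As an illustrative sketch of this workflow (not the pipeline used in this paper; the corpus, number of topics and settings below are toy placeholders), scikit-learn’s LatentDirichletAllocation covers both steps, learning topics from a Term Frequency Matrix and expressing documents as mixtures of them:

# Minimal topic-extraction sketch with scikit-learn (illustrative only; toy corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "god virtue prayer church faith",
    "law constituency parliament vote policy",
    "god law virtue policy faith vote",
]  # toy documents

# Term Frequency Matrix: rows = documents, columns = words, entry (i, j) = count of word j in document i
vectorizer = CountVectorizer()
tf = vectorizer.fit_transform(docs)

# Fit an LDA model with n unlabelled topics (n is passed as a parameter)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic_mix = lda.fit_transform(tf)  # each row is a document's mixture over the topics

# Label the topics empirically by inspecting their most probable words
words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[::-1][:3]]
    print(f"topic {k}: {top_words}")

print(doc_topic_mix[0])  # e.g. roughly [0.65, 0.35]: 65% topic 0, 35% topic 1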
In the end, a fitted LDA model is no more than a dimensionality reduction method on new
observations; but in contrast with other methods of non-descriptive dimensionality reduction like
Principal Component Analysis, LDA’s strength lies precisely in the rich descriptiveness of its
components. Its explanatory nature as a generative model for frequencies of discrete features from
different unobserved types has made it appealing and successful in information retrieval problems
other than text mining. It has even made its way into soccer data analytics, although with a
different approach: in a fascinating and highly recommended application, Wang et al. (2015) apply a
variation of LDA inference to passing-combination frequencies, fitting a model to what the authors
coin “patterns of play”.
The traditional input of an LDA inference algorithm is a Term Frequency Matrix: a matrix whose rows
correspond to documents and whose columns correspond to words, so that entry (i, j) is the number of
times word j appears in document i. A reasonable parallel can be drawn between this sort of matrix
and aggregate metric data matrices (where each row corresponds to a team’s performance in a match and
columns correspond to metrics such as ‘clearance with head’), in which entry (i, j) is the number of
times feature j was performed in match i. The figure below illustrates this conceptual analogy.
The interpretation of the analogy we are making between topics in the setting of document
classification and team/player style or persona in the setting of soccer data is natural: just as a
document’s mixture of topics will determine with what frequency different words will appear, a
team/player’s mixture of styles or personas can be thought of as latent characteristics that
determine the frequency with which they perform certain actions on the pitch. As an example, for a
team that employs the notorious tiki-taka style, the most likely features will probably be something
along the lines of ‘successful passes’, ‘touches’ and ‘accurate short pass’; while a long-ball counter-
attacking team will assign more probability to ‘long balls into opposition half’, ‘fast-break’ and ‘flick-
ons’.
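To make the analogy concrete, the soccer analogue of a Term Frequency Matrix is simply a team-match by feature count table. A minimal sketch, assuming a hypothetical long-format table of per-match event counts (the column and feature names are our own, not Opta’s):

# Sketch: building the soccer analogue of a Term Frequency Matrix (hypothetical column names).
import pandas as pd

# Hypothetical long-format aggregate data: one row per (match, team, feature) count
raw = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2, 2],
    "team":     ["A", "A", "B", "A", "B", "B"],
    "feature":  ["accurate_short_pass", "touches", "long_balls", "touches", "flick_ons", "fast_breaks"],
    "count":    [310, 542, 12, 498, 9, 7],
})

# Rows = a team's performance in a match, columns = features, entry (i, j) = number of times
# feature j was performed in match i, mirroring documents x words in the NLP setting.
term_frequency_like = raw.pivot_table(
    index=["match_id", "team"], columns="feature", values="count", fill_value=0
)
print(term_frequency_like)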
With this framework in mind, our research consists of two stages. First, we fit an LDA model using
historical match aggregate data. In this stage the model learns, in an unsupervised way, the
underlying styles (along with their probability distributions over the set of actions) that best
represent the differences in teams’ and players’ action frequencies, and the emerging styles are
labelled empirically by inspecting the associated distributions.
In the second stage, we transform match observations under the fitted models to express each
observation as a mixture of the styles that were learned in the first stage. The descriptiveness of the
style components will allow us to tell a story and create a mental picture of what type of
performance a team/player produced in a match. However, it is also worth remembering that the
transformed observations are vectorised in an n-dimensional space (where n is the number of
learned topics), and as such we can make full use of these vectorisations to produce a rich dossier of
applications. For example, although the match by match mixture of styles of a team or player can be
highly contextual to the specific match’s circumstances, average team/player vectors throughout a
whole season can produce rich and robust insight into their underlying style or intent of play.
Section 4 will provide a general overview of the applications.
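Continuing the toy sketch above (illustrative only; the real models are fit on Opta F9 aggregates, and the number of styles is a modelling choice discussed later), the two stages could look roughly as follows:

# Sketch of the two-stage style pipeline, reusing term_frequency_like from the sketch above.
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation

X = term_frequency_like.to_numpy()
feature_names = term_frequency_like.columns

# Stage 1: learn the latent styles and their distributions over the features
style_model = LatentDirichletAllocation(n_components=2, random_state=0)
style_model.fit(X)
for k, style in enumerate(style_model.components_):
    top_features = [feature_names[i] for i in style.argsort()[::-1][:3]]
    print(f"style {k}: {top_features}")  # inspect to label each style empirically

# Stage 2: express each observation as a mixture of the learned styles
match_mix = pd.DataFrame(style_model.transform(X), index=term_frequency_like.index)

# Average a team's matches for a season-level style profile
season_profile = match_mix.groupby(level="team").mean()
print(season_profile)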
3. Data and Model Training
We used the complete 2016/17 season, plus any matches completed before December 2017 in the 2017/18
season (for the English Premier League, for example, that is 14 matches per team). The model was
trained on what are known as ‘the big 5’ leagues: the English Premier League, Spanish La Liga, German
Bundesliga, French Ligue 1 and Italian Serie A. Once fit, the model was deployed to transform
observations from these same leagues with the addition of the Turkish Süper Lig, Portuguese Primeira
Liga and Dutch Eredivisie.
In text applications, raw term frequencies are typically weighted with tf-idf (term frequency-inverse
document frequency) before training. The purpose of tf-idf is to represent the relative importance of
a word in a document by weighting the frequency of the word in the document, offset by the number of
documents in which it appears. Its design ensures that common stop-words such as ‘the’ are weighted
lowly even if they appear with high frequency, as they will appear in most if not all documents,
while highly frequent words which do not appear in most other documents receive a higher weighting.
Tf-idf has become one of the most popular definitions of relative term importance in NLP (Beel, Gipp,
Langer and Breitinger, 2016); but for our soccer data methodology we cannot implement a completely
analogous statistic, since our problem has a small dictionary (the number of features collected in
the dataset, which is much lower than the number of words found in a corpus of documents) and most
features appear in almost every match (every match has passes, shots, head clearances, etc.). As a
relative importance weighting, we therefore break away from traditional text-based LDA and simply use
a standard z-score scaler per feature. In other words, instead of feeding the raw features to the LDA
inference algorithm (83 passes, for example), we feed the z-score over the training dataset (2.5
standard deviations above the mean for passes).
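A minimal sketch of this weighting step is given below. Note that standard LDA implementations expect non-negative inputs; the paper does not detail how negative z-scores are reconciled with this requirement, so the clipping shown is purely our own assumption:

# Sketch: per-feature z-score weighting before LDA inference (illustrative).
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
Z = scaler.fit_transform(X)  # X: raw feature counts per team-match, as in the sketches above
# e.g. a match with 83 passes might become roughly 2.5 (standard deviations above the training mean)

# Assumption: one simple way of keeping the inputs non-negative for the LDA algorithm
Z_nonneg = np.clip(Z, 0.0, None)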
TEAM MODEL
Topic Label: Most Probable Features
Stern Defending: effective clearance, total clearance, effective head clearance, blocked cross, effective blocked cross, high claims, good high claims, lost corners, punches, outfielder block, attempts conceded out of box, possession won in defensive third, clean sheet, total launches
Conceding Chances: saves, diving saves, saved in box, attempts conceded in box, saved out of box, attempts conceded out of box, challenge lost, interceptions in box, outfielder block, error lead to goal, attempted tackle foul, lost corners, free kick given, yellow cards
Crosses into Box: total cross, accurate cross, cross not from corner, corners into box, won corners, crosses behind 18 yards, crosses after 18 yards, penalty area entries, accurate crosses not from corners, shot off target, missed headed attempt, total headed attempts, missed attempt in box
Fast Breaks and Playing Behind Defenders: attempted fast breaks, shot from fastbreak, total fast breaks, big chance created, one-on-one attempt, big chance scored, big chance missed, attempt from centre of box, close miss, miss in box, shot off target, accurate through ball, on target scoring attempt
Long Balls and Launches: possession lost, possession lost control, total long balls, accurate launches, long pass from own half into opposition’s, total launches, total flick ons, aerial won, aerial lost, accurate flick on, ball recovery, unsuccessful touch, duel won, duel lost, possession won middle third
Many Shots and Attempts: on target scoring attempt, accurate pull back, attempt on target right foot, total pull back, attempt on target in box, accurate through ball, total through ball, attempt saved in low centre, big chance created, attempt on target left foot, one-on-one attempt, attempt on target out of box, big chance scored, big chance missed, attempt from open play
In light of the above discussion, we decided to train three separate models for defenders,
midfielders and forwards respectively (these positional labels are available in the dataset). This
choice eliminates comparability between the different categories of players, since observations
transformed under different models live in unrelated dimensionality reductions; but it has the added
advantage that the learned styles for each category are perhaps less obvious and less widely
documented, so that the scalability of the method provides true value.
For consistency of the theoretical framework, despite training three essentially unrelated models, we
trained each one with 7 topics/styles. Again, this choice is discussed in more detail in Section 5,
but for now the tables below present the 7 learned styles for each player model.
DEFENDER MODEL
Topic Label: Most Probable Features
Passing - Forward Areas: passes left, rightside pass, accurate forward zone pass, touches, total forward zone pass, successful final third passes, open play pass, total pass, successful open play pass, accurate pass, final third entries, forward pass, backward pass, clean sheet, possession won middle third
Passing in the Back: accurate back zone pass, total back zone pass, leftside pass, accurate pass, successful open play pass, total pass, open play pass, successful long passes from own half into opposition’s, touches, forward pass, rightside pass, passes right, accurate forward zone pass, possession won defensive third, offside provoked, accurate long balls, clean sheet, head pass, ball recovery
Gritty Defending: interceptions in box, interceptions, offside provoked, yellow card, total tackle, won tackle, possession won defensive third, outfielder block, attempted tackle foul, fouls, challenge lost, duel won, ball recovery, was fouled, successful put through, head pass, possession won middle third, aerial lost
Stern Defending: effective clearance, effective head clearance, total clearance, outfielder block, offside provoked, aerial won, clean sheet, head pass, aerial lost, possession won defensive third, duel won, total launches, accurate launches, yellow card, total back zone pass, accurate back zone pass, long passes from own half into opposition’s
Defending on the Touchline: effective blocked cross, blocked cross, blocked pass, total tackle, won tackle, put through, possession won defensive third, duel won, attempted tackle foul
Crossing into the Box: crosses after 18 yards, total crosses not from corners, accurate cross not from corner, crosses behind 18 yards, total cross, penalty area entries, passes right, off target shot assist, possession lost, won corner, attempted assist from open play, total attempted assist, total final third passes, final third entries
Long Balls and Launches: total chipped pass, accurate chipped pass, long pass from own half into opposition’s, offside provoked, accurate long balls, successful long balls, forward pass, possession won defensive third, total launches, outfielder block, accurate launches, aerial won, total clearances
MIDFIELDER MODEL
Topic Label: Most Probable Features
Goal Attempts: attempt to the centre from out of box, missed out of box attempt, blocked out of box attempt, total attempts with right foot, blocked scoring attempt, attempt open play, total scoring attempt, shot off target, total attempts left foot, on target attempts right foot, on target scoring attempts, missed attempt in box, won corners, touches in opposition box
Defensive Work: total tackle, won tackle, attempted tackle foul, challenge lost, interception, interception won, yellow card, fouls, possession won middle third, ball recovery, possession won defensive third, outfielder block, duel won, successful put through, duel lost, clean sheet, blocked pass, chipped passes
Dominate Passing and Possession: total pass, accurate pass, open play pass, successful open play pass, accurate forward zone pass, touches, leftside pass, rightside pass, final third entries, accurate chipped pass, accurate back zone pass, total back zone pass, successful long passes from own half into opposition’s, successful final third passes, forward pass, possession won middle third, accurate long balls
Aerial Game: accurate flick ons, aerial won, head pass, total flick on, aerial lost, effective head clearance, head clearance, duel won, duel lost, was fouled, fouls, attempt from centre of box
High Risk/High Reward: turnover, unsuccessful touch, overrun, total contest, won contest, blocked pass, put through, dispossessed, successful put through, duel lost, fouled final third, was fouled, possession lost, possession lost control, duel won, challenge lost, fouls, possession won attacking third, aerial lost
Creating Chances and Playing within Lines: accurate layoffs, total layoffs, attempts assisted open play, on target attempts assisted, total attempts assisted, successful final third passes, total final third passes, backward pass, accurate forward zone pass, possession won attacking third, passes left, passes right, touches in opposition box, fouled final third, successful open play pass, penalty area entries
Crosses into Box: total crosses, accurate crosses, penalty area entries, accurate cross not from corner, crosses behind 18 yards, crosses after 18 yards, off target attempts assisted, total attempts assisted, won corner, possession lost control, on target attempt assisted, attempted assists open play, total forward zone passes, total final third passes, blocked pass, total contest
FORWARD MODEL
Topic Label: Most Probable Features
High-Risk/High Reward: unsuccessful touch, turnover, dispossessed, overrun, duel lost, fouled final third, total contest, was fouled, blocked pass, put through, won contest, fouls, successful put through, possession won attacking third, duel won, challenge lost
Blocked and Missed Attempts: missed attempt in box, shot off target, attempt missed to the left, attempt missed to the right, missed headed attempt, total headed attempts, high missed attempt, total scoring attempts, big chance missed, attempt in box blocked, blocked scoring attempt, total offside
Aerial Target Man: total flick on, accurate flick on, aerial lost, aerial won, head pass, duel lost, total offside, duel won, total layoffs, unsuccessful touch, turnover, accurate layoffs, fouls, dispossessed, total headed attempts, was fouled, possession lost control, effective head clearance
Goal-Scoring: goals, goal inside box, goals open play, big chance scored, goal right foot, on target scoring attempt, attempt from centre of box, touches in opposition box, total scoring attempts, total offside, total attempts left foot, total headed attempts, total layoffs
Creating Chances and Playing within Lines: accurate layoffs, total layoffs, total final third passes, successful final third passes, backward pass, accurate forward zone pass, total forward zone pass, attempt assisted open play, on target attempt assisted, total attempt assisted, passes left, rightside pass, won contest, touches in opposition box, off target attempt assisted, penalty area entries, touches
Crosses into Box: big chance created, accurate cross not from a corner, total attempted assist, accurate cross, crosses behind 18 yards, total crosses not from a corner, total cross, attempt assisted open play, off target attempt assisted, on target attempt assisted, penalty area entries, crosses after 18 yards
Before immersing ourselves in the practical applications of the results, a note on readily available
labels of player styles/roles: the dataset also contains slightly more granular labels such as
Central Attacking Midfielder (CAM), Defensive Midfielder (DM), Full Back (FB), etc. A first glance at
the projection of the results for defenders and midfielders onto their first two principal components
shows that this unsupervised method can successfully differentiate between the styles of these
different positions. We build on this promising taste of the results in Section 4.
FIGURE 3: Projection onto the first two principal components of the Defender model (left) and
Midfielder model (right). Observations are coloured by the granular role labels available in the data.
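A plot in the spirit of Figure 3 can be produced directly from the transformed observations. The sketch below uses randomly generated mixtures and role labels as hypothetical stand-ins for the transformed defender data:

# Sketch: 2-D PCA projection of style mixtures, coloured by granular role label (stand-in data).
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
player_mix = rng.dirichlet(np.ones(7), size=200)  # 200 player-seasons x 7 style components
roles = rng.choice(["FB", "CB"], size=200)        # hypothetical granular role labels

coords = PCA(n_components=2).fit_transform(player_mix)
for role in np.unique(roles):
    mask = roles == role
    plt.scatter(coords[mask, 0], coords[mask, 1], label=role, alpha=0.6)
plt.legend()
plt.title("Defender style mixtures: first two principal components")
plt.show()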
4. Applications
REMARK: Radar charts are a common feature in soccer data analytics, but in a very different use case,
which might make our visualisations ‘miss the point’ if the main difference is not explained: in
traditional uses of radar charts, each axis is an independent metric, so a team/player’s chart can
grow indefinitely large in all directions. In our visualisations, by contrast, the axes are components
of a mixture that sums to 1, so a larger share of one style necessarily comes at the expense of the
others.
League Styles
Let us begin with the Team model. A first application is identifying stylistic variation across
competitions:
Average ‘league styles’ can be computed by averaging the style mixtures of the matches played in each
league. As can be seen, there are significant differences in the types of matches that tend to occur
across the ‘big 5’ leagues. There are more matches, relatively speaking, characterised by a
large volume of shots and attempts in the French Ligue 1. Strikingly, and in line with general
opinion, matches in the German Bundesliga are more likely to fall under the ‘fast breaks and playing
behind defenders’ category, as well as being a ‘high-energy contest’ (it should be noted that we
believe there may be some idiosyncrasy to the coding of F9 data across these competitions,
especially for the German Bundesliga). The English Premier League, again typifying its stereotype,
contains a disproportionate amount of matches falling under the topic of ‘stern defending’. Games
in the Italian Serie A are most likely to be characterised by ‘crosses into box’ or domination of
‘passing and possession’. They also feature many conceded chances. Somewhat surprisingly, the
Spanish La Liga features a relatively high number of matches falling under the ‘long balls and
launches’ topic, as well as games in which many chances are conceded.
The examples above plot the average team profiles over our whole sample of data, but they are also
available on a match-by-match basis to tell a story of team performances in an individual match.
This concept can be demonstrated initially with the stylistic effect of Lionel Messi, widely
considered the best player in the world and one of the all-time greats, on his club FC Barcelona:
Indeed, the stylistic impact of the world’s best players on the teams they play for is substantial.
Since Paul Pogba arrived at Manchester United for an at-the-time world record fee at the beginning of
last season, he has become invaluable to their style of play:
The impact of some players on their team’s tactical intentions becomes readily visible through our
methodology and radar visualisations. Roberto Firmino, who plays as a striker for Liverpool, fills a
unique role in facilitating their goal-scoring wingers, Mohamed Salah and Sadio Mane. This requires an
unusual sort of striker play that makes Firmino extremely valuable to Liverpool:
In addition to quantifying the stylistic effect of current first-team players, we can attempt to
anticipate the effects of a retiring star. Francesco Totti, AS Roma’s talismanic player, retired at the
end of last season after more than 25 years as a central player at the club. Below we can see the
dramatic influence that his absence creates in terms of style.
Lastly, this simple framework can also be used in day-to-day decision making at clubs, such as lineup
choices. As an example, we look at two pairs of players that are ‘substitutes’ for each other at their
clubs: at Liverpool, James Milner and Alberto Moreno compete for the left-back spot; at Tottenham,
Danny Rose and Ben Davies grapple for the same position.
In the similar personnel choice at Tottenham, Ben Davies seems to provide added competence and
efficacy in possession. The team also has fewer matches characterised by ‘conceding chances’ when he
is playing.
Team Similarity
Another way in which we can leverage the results comes from the fact that the model is a
dimensionality reduction onto stylistically relevant components. This means we can trust the metric of
the transformed space to serve as a robust proxy for similarity, letting us rank teams by how
stylistically alike they are. In the example explored here, Arsenal, Sevilla, and Paris Saint-Germain
come out as the most similar teams to the queried side.
REMARK: Since the 8 components onto which the method projects form a mixture model (i.e. they must add
up to 1 per observation), the projected transformations actually live on a 7-dimensional hyperplane,
meaning that the covariance matrix is singular. To be able to compute the Mahalanobis distance, we
need to drop one of the components to obtain a non-singular covariance matrix.
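A minimal sketch of this similarity computation, using randomly generated mixtures as a stand-in for the team style vectors, might look as follows:

# Sketch: Mahalanobis distance on the style space, dropping one component so that the
# covariance matrix is non-singular (team_mix is a stand-in for the real team mixtures).
import numpy as np
from scipy.spatial.distance import mahalanobis

team_mix = np.random.default_rng(1).dirichlet(np.ones(8), size=50)  # 50 teams x 8 components
X = team_mix[:, :-1]  # keep 7 of the 8 mixture components

VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance of the style vectors
distance = mahalanobis(X[0], X[1], VI)       # stylistic distance between two teams
print(distance)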
The most obvious application is comparing types of players within the same overarching ‘position’
(that is, within the ‘defender’, ‘midfielder’ and ‘forward’ labels to which each of the three player
models corresponds).
Player Styles
We can begin by looking at the projection under the ‘Forwards’ model of some of the world’s
foremost forwards:
Looking at different forwards within the Premier League, perhaps with less glamorous reputations, we
can still appreciate how the method helps us quickly understand what type of player somebody is.
Contrasting Lukaku’s playstyle at Manchester United and Everton offers valuable insight into the
effects of team quality on a striker’s output:
At Manchester United, Lukaku plays with better attackers who can supply him with chances. This is
evident in the increased prominence of the ‘Attempts on Target’, ‘Goal-Scoring’ and ‘Blocked and
Missed Attempts’ topics. That he has to play less like an ‘Aerial Target Man’ is a function of the
difference in styles between Manchester United and Everton.
Player Similarity
Once again using the Mahalanobis metric on the observation space as a proxy for similarity, we can
provide similarity scores for players. The examples below provide the 10 most similar players to
Lionel Messi and Paul Pogba.
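Ranking a player’s nearest neighbours under this metric is then straightforward; a sketch assuming a hypothetical table of season-average player style mixtures (player identifiers and data are placeholders):

# Sketch: the 10 most similar players to a query player under the Mahalanobis metric.
import numpy as np
import pandas as pd
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(2)
player_profiles = pd.DataFrame(
    rng.dirichlet(np.ones(7), size=100)[:, :-1],  # drop one component (see the remark above)
    index=[f"player_{i}" for i in range(100)],    # hypothetical player identifiers
)
VI = np.linalg.inv(np.cov(player_profiles.to_numpy(), rowvar=False))

query = player_profiles.loc["player_0"]
distances = player_profiles.apply(lambda row: mahalanobis(query, row, VI), axis=1)
print(distances.drop("player_0").nsmallest(10))   # the 10 most stylistically similar players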
5. Conclusions and Future Work
It is worth noting that the applications we have explored depend solely on the fact that the results
are deployed as a Bayesian mixture model, essentially in a dimensionality reduction setting; they are
largely independent of the actual assumptions of LDA’s underlying generative model (i.e. any
dimensionality reduction technique could, in principle, be exploited in the same way).
Additionally, the process of selecting the number of topics highlighted an interesting research
question which further adds to the potential lines of future work. To select this parameter in our
research, we kept track of the top features representing each topic/style for each choice of n using
different random seeds, and made an empirical judgement as to when the learned styles ceased to be
interpretable and descriptive in a natural way. In general, however, the question of the optimum
number of topics remains an open problem in NLP research (Greene, O’Callaghan and Cunningham, 2014).
Our experience in reviewing the feature distributions of the emerging topics revealed that broad
topics either survive from one choice of n to the next or divide into two distinct topics, although
this process is muddled by the variability of the training random state. This line of thought suggests
a methodology based on Monte Carlo simulations: tracking topic persistence with a measure of
statistical distance between the feature distributions of the learned styles for different choices of
n could provide a principled framework for choosing the optimum number of topics. It would
additionally remove the empirical element from evaluating the learned styles, with the added bonus
that emerging styles which are not recognised empirically could still be found and studied.
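One way to operationalise this persistence tracking (our own sketch, not a procedure taken from the paper) is to match each learned style from one run to its closest counterpart in another run using, for example, the Jensen-Shannon distance between their feature distributions:

# Sketch: measuring style persistence across runs (different n or random seed) via
# Jensen-Shannon distance between topic-feature distributions (illustrative only).
import numpy as np
from scipy.spatial.distance import jensenshannon

def topic_distributions(lda_model):
    """Normalise the rows of components_ into probability distributions over the features."""
    comp = lda_model.components_
    return comp / comp.sum(axis=1, keepdims=True)

def persistence(model_a, model_b):
    """For each style of model_a, the JS distance to its closest style in model_b."""
    A, B = topic_distributions(model_a), topic_distributions(model_b)
    return np.array([min(jensenshannon(a, b) for b in B) for a in A])

# Styles with consistently small distances across seeds and choices of n are 'persistent';
# averaging persistence over Monte Carlo repetitions gives a criterion for choosing n.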
Another interesting question which our research has surfaced is how to effectively use a metric on the
transformed observation space as a proxy for similarity. This is not an uncommon idea in soccer data
analytics (Meza, 2017; Gyarmati, Kwak and Rodriguez, 2014; Peña and Navarro, 2015), but it requires
some design to ensure that the chosen metric is appropriate to the structure of the transformed
entries. In this document we decided to use the Mahalanobis metric over the Euclidean one, since the
different stylistic components are not identically distributed; but the Mahalanobis metric also has
deficiencies, given the collinearities between the different components. What type of metric to use
given the structure of the problem is an interesting question in its own right, and its result (a
robust measure of team/player style similarity) is definitely appealing for the sport.
Finally, another compelling line of research which this document opens up for future researchers is
structuring how players in a team contribute to a team’s style mixture. The features with which the
model is trained for teams are mostly the sum of features performed by their players (i.e. the
number of shots, passes or interceptions that a team performs is the sum of those performed by its
players). This opens the door to quantifying a team’s style mixture into the smaller constituent
contributions of its different players, in a similar flavour to the applications showcased in Section
4, but in a more robust, comparable and scalable way.
References
[1] Blei, D.M., Ng, A.Y. and Jordan, M.I., 2003. Latent Dirichlet allocation. Journal of Machine
Learning Research, 3(Jan), pp. 993-1022.
[2] Greene, D., O’Callaghan, D. and Cunningham, P., 2014. How many topics? Stability analysis for
topic models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases
(pp. 498-513). Springer, Berlin, Heidelberg.
[3] Gyarmati, L., Kwak, H. and Rodriguez, P., 2014. Searching for a unique style in soccer. arXiv preprint
arXiv:1409.0308.
[4] Meza, D.A.P., 2017. Flow network motifs applied to soccer passing data. In Proceedings of
MathSport International 2017 Conference (p. 305).
[5] Newman, D., Smyth, P., Welling, M. and Asuncion, A.U., 2008. Distributed inference for latent
Dirichlet allocation. In Advances in Neural Information Processing Systems (pp. 1081-1088).
[6] Peña, J.L. and Navarro, R.S., 2015. Who can replace Xavi? A passing motif analysis of football
players. arXiv preprint arXiv:1506.07768.
[7] Wang, Q., Zhu, H., Hu, W., Shen, Z. and Yao, Y., 2015. Discerning tactical patterns for
professional soccer teams: an enhanced topic model with applications. In Proceedings of the 21st ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2197-2206). ACM.