MINI PROJECT REPORT
on
“IPL WIN PROBABILITY PREDICTOR”
By
BHAVESH SONJE – 119A3010
GAURAV SUVARNA – 119A3018
UDAYAN IYER – 119A3023
UNDER THE GUIDANCE OF
Mrs. Seema Redekar
SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF

BACHELOR OF ENGINEERING
In
INFORMATION TECHNOLOGY
DEPARTMENT OF INFORMATION TECHNOLOGY
SIES GRADUATE SCHOOL OF TECHNOLOGY
NERUL, NAVI MUMBAI – 400706
ACADEMIC YEAR
2021–2022
CERTIFICATE
This is to certify that this is a bonafide record of the mini project titled “IPL WIN PROBABILITY PREDICTOR” carried out by the following second-year students of Information Technology.
Sr. No. Name Roll No.
1. BHAVESH SONJE 119A3010
2. GAURAV SUVARNA 119A3018
3. UDAYAN IYER 119A3023
The report is submitted in partial fulfillment of the degree course of Bachelor of Engineering in Information Technology of the University of Mumbai during the academic year 2021-22.
Internal Guide Head of Department Principal
We have examined this report as per University requirements at SIES Graduate School of
Technology, Nerul (E), Navi Mumbai on ____________.
Name of External Examiner: ___________________________________________
Signature with Date: _____________________________
Name of Internal Examiner: ___________________________________________
Signature with Date: _____________________________
ACKNOWLEDGEMENT
We wish to express our deep sense of gratitude to our project guide, Mrs. Seema Redekar, for her timely assistance with our queries and for her guidance. We take this opportunity to thank our HOD, Dr. K. Lakshmisudha, and our Principal, Dr. Atul Kemkar, for their valuable guidance and immense support in providing all the necessary facilities.
We would also like to thank the entire faculty of the IT Department for their valuable ideas and timely assistance in this project. Last but not least, we would like to thank the teaching and non-teaching staff members of our college for their support in facilitating the timely completion of the mini project.
Project Team
BHAVESH SONJE – 119A3010
GAURAV SUVARNA – 119A3018
UDAYAN IYER – 119A3023
CONTENTS
Sr. No. Topic
I Abstract
1 Introduction
1.1 Motivation
1.2 Problem Statement & Objectives
2 Literature Survey
2.1 Survey of Existing System
2.2 Limitations of Existing System or Research Gap
3 Proposed System
3.1 Introduction
3.2 Architecture / Framework
3.3 Details of hardware and software
3.4 Experiment Results
4 Machine Learning
5 Code
6 Future Enhancements
7 Conclusion
8 References
I. ABSTRACT
Cricket is India's most popular sport and is played all over the nation in different formats such as T20, ODI and Test. The Indian Premier League (IPL) is a professional T20 cricket league in which players are drawn from regional teams of India, the national team and international teams. Factors such as live streaming and radio and TV broadcasts have made this league popular among cricket fans. Predicting the outcome of IPL matches is very important for online traders and sponsors. The outcome of a match between two teams can be predicted from various factors, such as team composition, the batting and bowling averages of each player in the team and the team's success in previous matches, in addition to traditional factors such as the toss, the venue, day/night conditions and the probability of winning by batting first at a given venue against a given team. In this project, we propose a model for predicting the outcome of IPL matches using the machine learning algorithms Random Forest Classifier (RFC) and Logistic Regression.
1. INTRODUCTION
Cricket is an outdoor bat-and-ball game played between two teams of 11 players each. It is a team game, played mainly in three formats, and is ranked second in the list of the world's most popular sports. As in any sport, many factors play an important role in deciding the winner of a match. Team selection is based on player performance and other considerations such as the pitch, team size and venue. These many variables and constraints make the analysis of a cricket match difficult. The three formats of cricket are Test, Twenty20 (T20) and One Day International (ODI). Cricket is played both domestically and internationally, and every ball is crucial because a single ball can change the course of the whole match [15, 16]. The Indian Premier League (IPL) is a professional T20 cricket league in which players are drawn from regional teams of India, the national team and international teams. Its franchises are owned by celebrities, businesspeople and others, and the league as a whole is controlled by the Board of Control for Cricket in India (BCCI). In the current season (2021) there are eight teams in the IPL: Royal Challengers Bangalore (RCB), Rajasthan Royals (RR), Chennai Super Kings (CSK), Mumbai Indians (MI), Kolkata Knight Riders (KKR), Delhi Capitals (DC), Punjab Kings (PK) and SunRisers Hyderabad (SRH). This work is motivated by the following questions: “What is the probability of winning the game at a particular venue based on the decision to field or bat first after winning the toss?”, “Which bowler takes the most dismissals in a match?” and “Does the home ground have any effect on the result of the game?” In this project, we try to predict the winner of an IPL match from the venue and the toss decision using machine learning techniques such as Random Forest and Logistic Regression.
1.1 Motivation
Whether it is Gen-Z or Millennials, and no matter how wide the generation gap, if there is one thing all generations have in common, it is love and passion for the game of cricket. Our project is a small but effective effort to let people know whether their favourite IPL team is likely to win a particular match. Not everybody can watch every match, and the bare minimum one can do is find out which way it is going. The sudden rise of betting apps makes such a project even more relevant. We have built a system that predicts the outcome of an ongoing IPL match based on the current match situation and the teams' previous performances. Our application gives anxious fans a quick estimate of which way the match is heading. Keeping all these factors in mind, our group was motivated to build this project.
1.2 Problem Statement & Objectives
Given IPL datasets from the past 9 years, the main objective of this paper is to predict the outcome of an IPL match between two teams by analysing previously stored data with machine learning algorithms. The data is first analysed and preprocessed, and the preprocessed data is then used to train different models that predict the outcome. Key variables such as strike rate and bowler economy are derived from the datasets and fed as input to the algorithms to obtain the probable outcome of a match.
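Bowler economy, one of the key variables named above, is the number of runs conceded per over of six legal deliveries. The pandas sketch below is illustrative only: the ball-by-ball rows are made up, the column names (bowler, total_runs, wide_runs, noball_runs) are assumed to follow the Deliveries dataset, and, as a simplification, every run on a delivery is charged to the bowler even though byes and leg-byes officially are not.

```python
import pandas as pd

# Made-up ball-by-ball rows; column names are assumed to follow the Deliveries dataset.
balls = pd.DataFrame({
    "bowler":      ["A"] * 7 + ["B"] * 6,
    "total_runs":  [1, 0, 4, 1, 0, 6, 1, 0, 0, 2, 1, 0, 1],
    "wide_runs":   [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "noball_runs": [0] * 13,
})

# A delivery counts towards the over only if it is neither a wide nor a no-ball.
balls["legal"] = (balls["wide_runs"] == 0) & (balls["noball_runs"] == 0)

summary = balls.groupby("bowler").agg(
    runs=("total_runs", "sum"),       # simplification: byes/leg-byes are not excluded
    legal_balls=("legal", "sum"),     # number of legal deliveries bowled
)

# Economy rate = runs conceded per over = runs * 6 / legal balls.
summary["economy"] = summary["runs"] * 6 / summary["legal_balls"]
print(summary)
```

Here bowler A concedes 13 runs in one legal over (the wide does not count as a ball), giving an economy of 13.0, while bowler B concedes 4 runs for an economy of 4.0.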
2. LITERATURE SURVEY
2.1 Survey of Existing System
1) Ahmad et al. [1] predicted emerging players among batsmen as well as bowlers using machine learning techniques. Song et al. [2] estimated the location of a mobile node in a Cricket sensor network. Roy et al. [3] proposed a ranking system based on social network factors, evaluated with a composite distributed framework in which the Hadoop framework and the MapReduce programming model are used to process the data. Priyanka et al. [4] predicted the outcome of IPL 2020 from the 2008–2019 IPL datasets using data mining algorithms, with an accuracy of 82.73%.
2) Kaluarachchi et al. predicted match outcomes from the home ground, the time of the match, the match type, and winning the toss and batting first, using a Naïve Bayes classifier. Passi et al. predicted the performance of players in terms of runs scored and wickets taken. Both problems are treated as classification problems in which the runs and wickets are grouped into ranges and classified with machine learning algorithms; the Random Forest algorithm performed better than the other algorithms. Nigel Rodrigues et al. predicted the values of the batsmen's and bowlers' traits in the current match using Multiple Random Forest Regression on each player's past performances against a specific opposition team, which helps in selecting players for upcoming matches.
3) Kansal et al. [5] evaluated players in the IPL using a data mining technique. Their approach uses player statistics to assess a player's performance and determine his base price, and addresses how to select players in the IPL based on each player's performance history using algorithms such as decision tree, Naïve Bayes and multilayer perceptron (MLP); the MLP performed better than the other algorithms. Agrawal et al. [6] used Support Vector Machine (SVM), CTree and Naïve Bayes classifiers, with accuracies of 95.96%, 97.97% and 98.98% respectively, to predict the winner of matches. Barot et al. [7] predicted the match outcome based on the toss and venue.
2.2 Limitations of Existing System or Research Gap
Wright scheduled the fixtures for a cricket season in a fair and efficient manner, taking into account the venues, the teams and the number of rest days between matches. A metaheuristic procedure, Subcost-Guided Simulated Annealing (SGSA), is used to progress from a basic initial solution to a complex final solution. Maduranga et al. predicted the outcome of cricket matches using data mining algorithms and proposed solutions to the approaches used by other authors. Shetty et al. predicted the capabilities of each player depending on various factors such as the ground, pitch type and opposition team, using machine learning techniques. Their Random Forest model achieved accuracies of 76%, 67% and 96% for batsmen, bowlers and all-rounders respectively, and helped them select the best players for a game and predict match outcomes.
3. PROPOSED SYSTEM
3.1 Introduction
Step 1: Dataset: The first step in the model's architecture is to collect datasets from various sources. The data fed into the model determines how the model behaves: if the data is accurate and up to date, the predictions will be correspondingly accurate. We therefore collected 6 datasets from Kaggle.
Dataset 1: Teamwise Home and Away dataset. The Teamwise Home and Away dataset contains 6 columns: home_wins, away_wins, home_matches, away_matches, home_win_percentage and away_win_percentage. It describes each team's performance in home and away conditions along with its win percentage. Table 1 shows the dataset and its description.
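The two percentage columns follow directly from the count columns (wins / matches * 100). A small pandas sketch with made-up counts for two placeholder teams (Team A, Team B); the home_advantage column is an illustrative addition, not part of the dataset, showing one way to approach the Introduction's question of whether the home ground affects results:

```python
import pandas as pd

# Made-up counts for two placeholder teams, using the Table 1 column names.
df = pd.DataFrame({
    "team": ["Team A", "Team B"],
    "home_wins": [30, 25],
    "away_wins": [20, 22],
    "home_matches": [50, 50],
    "away_matches": [50, 55],
}).set_index("team")

# Win percentage = wins / matches * 100, computed separately for home and away games.
df["home_win_percentage"] = 100 * df["home_wins"] / df["home_matches"]
df["away_win_percentage"] = 100 * df["away_wins"] / df["away_matches"]

# Hypothetical derived column: a positive gap means the team wins more often at home.
df["home_advantage"] = df["home_win_percentage"] - df["away_win_percentage"]
print(df[["home_win_percentage", "away_win_percentage", "home_advantage"]])
```

With these numbers Team A wins 60% at home and 40% away (a 20-point gap), while Team B wins 50% at home and 40% away.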
Dataset 2: Matches dataset. The Matches dataset contains 16 columns: season, city, date, team1, team2, toss_winner, toss_decision, result, dl_applied, winner, win_by_runs, win_by_wickets, player_of_the_match, venue, umpire1 and umpire2. This dataset records the matches played between two teams, the winner of each match and the toss decision taken. Table 2 shows the dataset columns and their descriptions.
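One of the questions from the Introduction, the probability of winning at a venue given the toss decision, reduces to a grouped mean over this dataset. A pandas sketch on seven made-up matches (venues V1/V2 and teams A/B/C are placeholders, not real data); with real data, matches without a result (missing winner) would count as the toss winner not winning, so they may need filtering first:

```python
import pandas as pd

# Made-up rows using Matches-dataset column names (Table 2).
matches = pd.DataFrame({
    "venue": ["V1", "V1", "V1", "V1", "V2", "V2", "V2"],
    "toss_winner": ["A", "B", "A", "C", "B", "C", "A"],
    "toss_decision": ["field", "field", "bat", "bat", "field", "bat", "field"],
    "winner": ["A", "B", "C", "C", "B", "A", "A"],
})

# 1 if the side that won the toss also won the match, else 0.
matches["toss_winner_won"] = (matches["toss_winner"] == matches["winner"]).astype(int)

# Empirical P(toss winner wins | venue, toss decision) = mean of the 0/1 flag per group.
rates = matches.groupby(["venue", "toss_decision"])["toss_winner_won"].mean()
print(rates)
```

On this toy data, toss winners who chose to field at V1 won both games (rate 1.0), while those who chose to bat at V1 won one of two (rate 0.5). Real estimates need far more matches per group to be meaningful.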
Dataset 3: Players dataset. The Players dataset contains 5 columns: Player_Name, DOB, Batting_Hand, Bowling_Skill and Country. It describes each player and his batting and bowling style. Table 3 shows the dataset columns and their descriptions.
Dataset 4: Teams dataset. The Teams dataset contains a single column, team1, which lists the IPL teams. Table 4 shows the dataset column and its description.
Dataset 5: Deliveries dataset. The Deliveries dataset contains 20 columns: inning, batting_team, bowling_team, over, ball, batsman, non_striker, bowler, is_super_over, wide_runs, bye_runs, legbye_runs, noball_runs, penalty_runs, batsman_runs, extra_runs, total_runs, player_dismissed, dismissal_kind and fielder. Table 5 shows the dataset columns and their descriptions.
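The app in Section 5 feeds the model one match-state row (runs_left, balls_left, wickets, crr, rrr). A sketch of how such rows could be derived from second-innings deliveries, under stated assumptions: overs are numbered from 1, every delivery counts as one of six balls in the over (extras that add balls are ignored), the target of 10 runs is made up, and the helper columns current_score and wickets_out are hypothetical names:

```python
import pandas as pd

# Made-up second-innings deliveries for one match; the target is assumed to be 10.
target = 10
inning2 = pd.DataFrame({
    "over": [1, 1, 1, 1, 1, 1],
    "ball": [1, 2, 3, 4, 5, 6],
    "total_runs": [1, 4, 0, 1, 0, 2],
    "player_dismissed": [None, None, "X", None, None, None],
})

# Running score and wickets lost after each delivery (hypothetical helper columns).
inning2["current_score"] = inning2["total_runs"].cumsum()
inning2["wickets_out"] = inning2["player_dismissed"].notna().cumsum()

# Match-state features with the same names the app passes to the model.
inning2["runs_left"] = target - inning2["current_score"]
inning2["balls_left"] = 120 - ((inning2["over"] - 1) * 6 + inning2["ball"])
inning2["wickets"] = 10 - inning2["wickets_out"]            # wickets in hand
balls_bowled = 120 - inning2["balls_left"]
inning2["crr"] = inning2["current_score"] * 6 / balls_bowled  # current run rate
inning2["rrr"] = inning2["runs_left"] * 6 / inning2["balls_left"]  # required run rate
print(inning2[["runs_left", "balls_left", "wickets", "crr", "rrr"]])
```

After the first over the chasing side has 8 runs for 1 wicket, so it needs 2 more runs from 114 balls with 9 wickets in hand, at a current run rate of 8.0.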
Dataset 6: most_runs_average_strikerate dataset. The most_runs_average_strikerate dataset contains 6 columns: batsman, total_runs, out, numberofballs, average and strikerate. It summarises each player's batting statistics. Table 6 shows the dataset columns and their descriptions.
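The average and strikerate columns correspond to the standard cricket formulas: batting average = runs / dismissals, and strike rate = 100 * runs / balls faced. A pandas sketch with two made-up batsmen; leaving the average undefined (NaN) for a batsman who was never out is a choice made here, not something the dataset documents:

```python
import pandas as pd

# Made-up rows using the Dataset 6 column names.
bat = pd.DataFrame({
    "batsman": ["P1", "P2"],
    "total_runs": [500, 300],
    "out": [10, 0],
    "numberofballs": [400, 200],
})

# Batting average = runs per dismissal; NaN when the batsman was never dismissed.
bat["average"] = bat["total_runs"] / bat["out"].where(bat["out"] > 0)

# Strike rate = runs scored per 100 balls faced.
bat["strikerate"] = 100 * bat["total_runs"] / bat["numberofballs"]
print(bat)
```

P1 averages 50.0 at a strike rate of 125.0; P2 has no average because he was never out, and a strike rate of 150.0.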
Step 2: Splitting data
In this step the dataset is split into two groups, one for training and one for testing, in the ratio 70:30 (training:testing). The training data is used to train the machine learning algorithms using supervised learning techniques, and the trained model is then evaluated on the testing data.
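With scikit-learn's train_test_split, the 70:30 ratio corresponds to test_size=0.3. A minimal sketch on random placeholder arrays (the real features and labels would come from the datasets above; the seed values are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows of 3 features and a binary label (1 = batting team won).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# test_size=0.3 gives the 70:30 training:testing ratio;
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)
```

Because consecutive deliveries from the same match are highly correlated, splitting by match (for example with scikit-learn's GroupShuffleSplit, using the match as the group) would likely give a more honest test score than splitting individual rows.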
Step 3: Training the model
Training is the most important stage in machine learning. In this step the model is trained on the training data to find the patterns that let it make predictions, so that it learns from the dataset how to accomplish the given task.
Step 4: Testing the model
After training, the performance of the model is checked by testing it on previously unseen data, called the testing dataset. In this step the model is given the unseen testing data and its predictions are compared with the actual results.
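Steps 2 to 4 can be sketched end to end with scikit-learn. This is a hypothetical reconstruction, not the authors' code: the report does not show how pipe.pkl was built, so the synthetic data, the made-up labelling rule, the one-hot encoding and scaling, and the LogisticRegression settings are all assumptions. The feature names match the DataFrame built in app.py, and the pickle round trip mirrors how the app loads a trained pipeline:

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the match-state training table.
rng = np.random.default_rng(1)
n = 600
teams = ["Mumbai Indians", "Chennai Super Kings", "Rajasthan Royals"]
target = rng.integers(120, 220, n)       # total_runs_x: the chasing side's target
score = rng.integers(0, target)          # current score, always below the target
balls_left = rng.integers(1, 120, n)     # 1..119 balls remaining
df = pd.DataFrame({
    "batting_team": rng.choice(teams, n),
    "bowling_team": rng.choice(teams, n),
    "city": rng.choice(["Mumbai", "Chennai", "Jaipur"], n),
    "runs_left": target - score,
    "balls_left": balls_left,
    "wickets": rng.integers(1, 11, n),   # wickets in hand
    "total_runs_x": target,
})
df["crr"] = score * 6 / (120 - balls_left)
df["rrr"] = df["runs_left"] * 6 / df["balls_left"]

# Made-up label: chases tend to succeed when the required rate is low
# and many wickets remain (1 = batting team won).
y = ((df["rrr"] - 0.3 * df["wickets"] + rng.normal(0, 1, n)) < 8).astype(int)

categorical = ["batting_team", "bowling_team", "city"]
numeric = ["runs_left", "balls_left", "wickets", "total_runs_x", "crr", "rrr"]
pre = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("numeric", StandardScaler(), numeric),
])
pipe = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

# Step 2: 70:30 split.
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.3, random_state=42)

# Step 3: training on the 70% split.
pipe.fit(X_train, y_train)

# Step 4: testing on the unseen 30% split.
proba = pipe.predict_proba(X_test)   # columns: [P(class 0 = loss), P(class 1 = win)]
acc = accuracy_score(y_test, pipe.predict(X_test))
print(f"test accuracy: {acc:.2f}")

# Serialise and reload the fitted pipeline, as app.py does with pipe.pkl.
restored = pickle.loads(pickle.dumps(pipe))
print(np.allclose(restored.predict_proba(X_test), proba))
```

Swapping LogisticRegression for RandomForestClassifier in the "clf" step would train the report's other model on the same preprocessing.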
3.2 Architecture / Framework
3.3 Details of hardware and software
HARDWARE REQUIREMENTS:
Windows (64-bit)
Intel Core i3 (8th generation)
SOFTWARE REQUIREMENTS:
Google Colab
LIBRARIES:
NumPy, Pandas, Streamlit and Matplotlib.
3.4 Experiment Results
4. MACHINE LEARNING
Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values. They are mathematical modelling methods used to learn or uncover the patterns embedded in the data. Machine learning comprises a group of computational algorithms that can perform pattern recognition, classification and prediction on data by learning from existing data (the training set).
We have used the following machine learning algorithms:
1. Logistic Regression
2. Random Forest Classifier
1. Logistic Regression
Logistic regression models the probability of a discrete outcome given a set of input variables. The most common form models a binary outcome, i.e. something that can take two values such as true/false or yes/no. Multinomial logistic regression can model scenarios with more than two possible discrete outcomes.
Logistic regression is a useful method for classification problems, where the goal is to determine which category a new sample fits best. Predicting whether the batting team will win or lose a match is such a binary classification problem, which makes logistic regression a natural candidate here.
Logistic regression is simple and efficient for binary classification problems with a linear decision boundary. It is easy to implement, performs well when the classes are (close to) linearly separable, and is widely used for classification in industry.
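Concretely, logistic regression passes a weighted sum of the features through the sigmoid function: P(win | x) = 1 / (1 + exp(-(w · x + b))). A NumPy sketch of that prediction step; the two features (required run rate, wickets in hand) and the weights are made-up values for illustration, not parameters fitted in this project:

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score z into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_win_probability(x, w, b):
    """Logistic regression prediction: P(win | x) = sigmoid(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical parameters for two features: [required run rate, wickets in hand].
# A higher required rate lowers the score; more wickets in hand raise it.
w = np.array([-0.8, 0.4])
b = 2.0

even_chase = np.array([5.0, 5.0])   # w . x + b = -4 + 2 + 2 = 0, so P(win) = 0.5
hard_chase = np.array([10.0, 2.0])  # w . x + b = -8 + 0.8 + 2 = -5.2, so P(win) is about 0.0055

for name, x in [("even", even_chase), ("hard", hard_chase)]:
    p = predict_win_probability(x, w, b)
    # Classify as a win when the probability is at least 0.5.
    print(name, round(float(p), 4), "win" if p >= 0.5 else "loss")
```

Training consists of choosing w and b to maximise the likelihood of the observed win/loss labels; the fitted model then reports the sigmoid output directly as the win probability.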
2. Random Forest Classifier
Random Forest is a supervised learning algorithm that uses an ensemble learning method. Ensemble learning combines the predictions of multiple models to make a more accurate prediction than a single model could. A random forest trains many decision trees, each on a bootstrap sample of the training data with a random subset of features considered at each split, and combines their outputs, typically by majority vote or by averaging the trees' class probabilities.
A Random Forest model is powerful and accurate. It usually performs well on many problems, including those with non-linear relationships between the features and the outcome. Its disadvantages include: it is much harder to interpret than a single decision tree, it can still overfit noisy data, and the number of trees to include in the model must be chosen.
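In scikit-learn's RandomForestClassifier (assumed here as the implementation; the report does not name its library), predict_proba is documented as the mean of the individual trees' predicted class probabilities. The sketch below checks this on synthetic data from make_classification; n_estimators=50 and the seeds are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data standing in for win/loss labels.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Each of the 50 trees is fitted on a bootstrap sample with random feature subsets.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# The forest's probability is the mean of the individual trees' probabilities.
forest_proba = forest.predict_proba(X[:5])
tree_mean = np.mean([tree.predict_proba(X[:5]) for tree in forest.estimators_], axis=0)
print(np.allclose(forest_proba, tree_mean))
```

This averaging is what makes the forest's win probability smoother than that of any single tree, whose leaf probabilities can jump abruptly between nearby match states.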
5. CODE
app.py
import pickle

import pandas as pd
import streamlit as st

# Team names must match the names used when the model was trained.
teams = ['Sunrisers Hyderabad',
         'Mumbai Indians',
         'Royal Challengers Bangalore',
         'Kolkata Knight Riders',
         'Kings XI Punjab',
         'Chennai Super Kings',
         'Rajasthan Royals',
         'Delhi Capitals']

# Host cities present in the training data.
cities = ['Hyderabad', 'Bangalore', 'Mumbai', 'Indore', 'Kolkata', 'Delhi',
          'Chandigarh', 'Jaipur', 'Chennai', 'Cape Town', 'Port Elizabeth',
          'Durban', 'Centurion', 'East London', 'Johannesburg', 'Kimberley',
          'Bloemfontein', 'Ahmedabad', 'Cuttack', 'Nagpur', 'Dharamsala',
          'Visakhapatnam', 'Pune', 'Raipur', 'Ranchi', 'Abu Dhabi',
          'Sharjah', 'Mohali', 'Bengaluru']

# Load the trained pipeline (preprocessing + classifier) saved with pickle.
with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

st.title('IPL Win Predictor')

# st.columns replaces the deprecated st.beta_columns.
col1, col2 = st.columns(2)
with col1:
    batting_team = st.selectbox('Select the batting team', sorted(teams))
with col2:
    bowling_team = st.selectbox('Select the bowling team', sorted(teams))

selected_city = st.selectbox('Select host city', sorted(cities))
target = st.number_input('Target', min_value=0, step=1)

col3, col4, col5 = st.columns(3)
with col3:
    score = st.number_input('Score', min_value=0, step=1)
with col4:
    overs = st.number_input('Overs completed', min_value=0.0, max_value=20.0)
with col5:
    wickets_out = st.number_input('Wickets out', min_value=0, max_value=10, step=1)

if st.button('Predict Probability'):
    runs_left = target - score
    balls_left = 120 - (overs * 6)       # a T20 innings has 20 overs = 120 balls
    wickets_left = 10 - wickets_out
    if overs == 0 or balls_left <= 0:
        # Avoid dividing by zero in the run-rate calculations below.
        st.error('Overs completed must be greater than 0 and less than 20.')
    else:
        crr = score / overs                  # current run rate
        rrr = (runs_left * 6) / balls_left   # required run rate
        input_df = pd.DataFrame({'batting_team': [batting_team],
                                 'bowling_team': [bowling_team],
                                 'city': [selected_city],
                                 'runs_left': [runs_left],
                                 'balls_left': [balls_left],
                                 'wickets': [wickets_left],
                                 'total_runs_x': [target],
                                 'crr': [crr],
                                 'rrr': [rrr]})
        # Assuming class 0 = loss and class 1 = win for the batting team.
        result = pipe.predict_proba(input_df)
        loss = result[0][0]
        win = result[0][1]
        st.header(batting_team + ' - ' + str(round(win * 100)) + '%')
        st.header(bowling_team + ' - ' + str(round(loss * 100)) + '%')
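Setup.sh
# Create the Streamlit configuration directory.
mkdir -p ~/.streamlit/

# Write the server settings; $PORT is supplied by the hosting platform.
# A here-document avoids relying on echo interpreting "\n" escapes.
cat > ~/.streamlit/config.toml <<EOF
[server]
port = $PORT
enableCORS = false
headless = true
EOF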
6. FUTURE ENHANCEMENTS
In future, with minor changes, the model could also be made to work with ODI and Test matches. International matches could be analysed in a similar way, and more visualizations could be added. The system could also be adapted to accept more data file formats, allowing better analysis of the varied forms in which data is collected. The focus could also shift to each player's performance, evaluated regularly over the season, and players' batting and bowling ratings could be predicted. Finally, the system could predict the player of the match for the two teams.
7. CONCLUSION
Predicting the winner in a sport such as cricket is especially challenging and involves very complex processes, but machine learning makes the task considerably more tractable. In this paper, we have identified various factors that contribute to the results of Indian Premier League matches. Factors with a major impact on the outcome of an IPL match include the teams playing, the venue, the city, the toss winner and the toss decision. We have analysed IPL datasets and predicted match results based on player performance. The methods used in this work are Logistic Regression and the Random Forest Classifier; of the two, the Random Forest Classifier performed better.
8. REFERENCES
1) Haseeb Ahmad, Ali Daud, Licheng Wang, Haibo Hong, Hussain Dawood and Yixian Yang, Prediction of Rising Stars in the Game of Cricket, IEEE Access, Volume 5, pp. 4104–4124, 14 March 2017.
2) Haryong Song, Vladimir Shin and Moongu Jeon, Mobile Node Localization Using Fusion Prediction-Based Interacting Multiple Model in Cricket Sensor Network, IEEE Transactions on Industrial Electronics, Volume 59, Issue 11, November 2012.
3) Sarbani Roy, Paramita Dey and Debajyoti Kundu, Social Network Analysis of Cricket Community Using a Composite Distributed Framework: From Implementation Viewpoint, IEEE Transactions on Computational Social Systems, Volume 5, Issue 1, pp. 64–81, March 2018.
4) Priyanka S, Vysali K and K B Priya Iyer, Score Prediction of Indian Premier League – IPL 2020 using Data Mining Algorithms, International Journal for Research in Applied Science & Engineering Technology (IJRASET), Volume 8, Issue II, pp. 790–795.
5) Prince Kansal, Pankaj Kumar, Himanshu Arya and Aditya Methaila, Player Valuation in Indian Premier League Auction using Data Mining Technique, International Conference on Contemporary Computing and Informatics (IC3I), 27–29 November 2014.
6) Shilpi Agrawal, Suraj Pal Singh and Jayash Kumar Sharma, Predicting Results of IPL T-20 Match using Machine Learning, 2018 8th International Conference on Communication Systems and Network Technologies (CSNT), 24–26 November 2018.
7) Harshit Barot, Arya Kothari, Pramod Bide, Bhavya Ahir and Romit Kankaria, Analysis and Prediction of Indian Premier League, 2020 International Conference for Emerging Technology (INCET), 5–7 June 2020.
