Estimating the IPL Winner using Machine Learning - Ijaresm

 
International Journal of All Research Education and Scientific Methods (IJARESM), ISSN: 2455-6211
 Volume 9, Issue 6, June -2021, Impact Factor: 7.429, Available online at: www.ijaresm.com

 Estimating the IPL Winner using Machine Learning
Dr. Vanitha.K1, Bhargav Reddy.K2, Govardhan Reddy.N3, Jameel Basha.S4, Chinmay Sai.Y5
 1 Assistant Professor, Department of Computer Science and Engineering, Madanapalle Institute of Technology and
 Science, Madanapalle
 2,3,4,5 B. Tech IV Year, Department of Computer Science and Engineering, Madanapalle Institute of Technology and
 Science, Madanapalle

-----------------------------------------------------------------*****************----------------------------------------------------------------

 ABSTRACT
Cricket is a popular sport around the world, notably in India. Cricket tournaments and competitions such as the
Indian Premier League (IPL) require a large amount of resources, effort, and time to run. As a result, players,
coaches, and club management are under tremendous pressure to perform well, given the high stakes. In this
research we therefore developed a victory prediction system that estimates the probability of a team winning a
match from various parameters (features) derived from its past matches. The input parameters are varied so that
the best probability of winning the match can be attained. This helps team captains, coaches, and management
settle on the constraints (players) for that match that increase their win probability. In addition, the strengths
and weaknesses of the team's bowling and batting orders are identified in order to improve team performance.
Predictive analytics is a domain of machine learning in which the probability of a particular outcome is forecast
from historical data. Before making any predictions, the data must be thoroughly investigated and examined.

Keywords: Machine Learning, IPL, Data Analysis, Model Classifiers, Prediction, Prediction Models

 INTRODUCTION

Machine learning is a subfield of artificial intelligence that is applied to real-world engineering problems. It does
not require the rules to be programmed explicitly; instead, the machine learns from past data and predicts the result
accordingly. Machine learning approaches draw on decision trees, heuristic learning, knowledge acquisition, and
mathematical models. The IPL is a Twenty20 cricket league played in India that aims to inspire young and dynamic
players. Because technology is improving rapidly, and because there is a vast betting market and great demand for
cricket, the general public has been drawn to using machine learning to predict the outcomes of cricket matches.
Machine learning and data science can help in many ways; for example, forecasting the outcome before a match can
assist players and coaches in identifying weak areas. Machine learning has strong ties to mathematical optimization,
which supplies it with methods, theory, and application areas. Machine learning and data mining are sometimes
confused; however, the latter focuses more on exploratory data analysis and is associated with unsupervised learning.
Advances in technology, and more recently in sports analytics, have made predicting the outcome of a match far more
practical. To train the algorithms, we use career statistics as well as team performance in areas such as batting and
bowling. Because the outcomes of past matches are known, we use supervised learning algorithms to forecast the result
of a match.

Machine learning offers a variety of approaches; each is suited to particular datasets and parameters and predicts
outcomes accordingly. The 756 match records were taken into account and fitted with the modelling techniques
appropriate to them. Noisy data is separated out, and the data is pre-processed before the models are trained. Part of
the data forms the training set on which the models are trained, while the remaining data is used to test them.
Accuracy is one metric that can be used to evaluate whether a model has produced effective predictions.
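
The split-and-score procedure described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the 80/20 split ratio, the seed, and the function names are assumptions, since the paper does not state how the 756 records were divided.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 756 placeholder records standing in for the match data.
records = list(range(756))
train, test = train_test_split(records, test_fraction=0.2)
```

With these assumed settings, 605 records go to training and 151 to testing; the accuracy function is then applied to the predictions made on the held-out part only.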

 LITERATURE SURVEY

Because the Indian Premier League is hugely popular, a considerable amount of related work has been done on
estimating match outcomes. Random Forest Classifier, Support Vector Machine, KNN, Logistic Regression, Gaussian NB,
Gradient Boosting Classifier, Decision Tree Classifier, and other models have been used in these articles. Although
there are various research papers on the IPL, a precise outcome was often not produced because of discrepancies in
the data.

In the paper [2], the overall weight of a team is measured from each player's performance. Seven types of machine
learning models were trained and used to predict the result. Among them, Decision Tree Classifier and Random Forest
gave the highest accuracy. The paper focuses on analysing and predicting the winner using machine learning
techniques. In [3], existing data mining algorithms are used to estimate the outcome of an IPL match on both balanced
and imbalanced datasets. For the imbalanced datasets, an oversampling technique is applied before the algorithms are
run. Precision of the predicted outcome is used as the performance metric. Previous IPL information [3] is collected,
analysed, and classified accordingly. Using a larger dataset can increase model efficiency. The win probabilities from
the last few years of matches are used to determine which team is likely to win the upcoming match. Seven variables
from the datasets were used to fit the model, and results were predicted accordingly. Different machine learning
models are used in this paper. Here the research is based on historical information taken from Kaggle. Decision Tree
Classifier and Random Forest gave the most accurate values in this research.

Problem Statement
A cricket match has two outcomes for a team: it either wins or loses. However, focusing solely on winning or losing
does not provide an accurate assessment. Other factors to consider include home ground, venue, toss decision, toss
winner, city, and so on. Considering these factors helps determine the predicted outcome of the match, as well as the
strength of the evidence supporting that prediction.

In general, a T20 match has a variety of characteristics that influence the outcome; in this project we focus on the
characteristics that could become the deciding feature of a match, and by including them we increase the efficiency
of our analysis. If there are two teams, X and Y, the output is not simply that X or Y will win; the analysis provides
the predicted winner together with an accuracy figure that indicates the strength of the prediction.

 Process Flow
The process flow contains the following steps:
 Data Collection
 Data Cleaning
 Data Visualization


 Fig 1: Process of Predicting the Winning Team

 Data Collection
The data from the years 2008-2017 was considered for analysis, and the variables were selected from it. The data was
taken from the Kaggle repository. The pandas library is used to transform the data into numerical form for prediction.
A total of 756 match records were considered for analysing and estimating the result of a match. The collected data
contains some attributes that are irrelevant to predicting the outcome of a match. These attributes must be removed
before prediction in order to maximize performance.
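
As a rough illustration of dropping irrelevant attributes, the sketch below keeps only the six predictors named later in the paper plus the winner. It uses Python's standard csv module rather than pandas so that it runs without third-party packages. The column names, the placeholder rows, and the helper name load_relevant are assumptions made for the example; they are not the authors' data or code.

```python
import csv
import io

# Hypothetical excerpt in the style of a matches file. Column names and
# rows are illustrative placeholders, not real match records.
RAW_CSV = """id,season,city,team1,team2,toss_winner,toss_decision,winner,venue,umpire1
1,2017,City X,Team A,Team B,Team B,field,Team A,Stadium X,Umpire 1
2,2017,City Y,Team C,Team A,Team A,bat,Team A,Stadium Y,Umpire 2
"""

# The six predictors named in the paper, plus the target column.
FEATURES = ["team1", "team2", "city", "toss_decision", "toss_winner", "venue"]
TARGET = "winner"

def load_relevant(csv_text):
    """Parse CSV text and keep only the feature and target columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = FEATURES + [TARGET]
    return [{col: row[col] for col in keep} for row in reader]

matches = load_relevant(RAW_CSV)
```

Attributes such as the match id, season, or umpire are discarded here simply because they are not in the list of predictors; which attributes count as irrelevant is a modelling decision.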

 Fig 2: Collection of data with various attributes for prediction


Data Cleaning
Data cleaning is the practice of removing inconsistencies and replacing them with valid values. The collected
datasets contain noisy data, with null values and irrelevant values in some rows, that must be dealt with. Null values
are replaced with 0 and irrelevant values with appropriate values so that the analysis can be carried out efficiently.
Removing null values, replacing them with correct values, and removing irrelevant attributes improves the precision
of the predicted match outcome.

Data Cleaning Steps
 Removing Unwanted Observations
 Missing Data Handling
 Fixing structural errors
 Managing outliers
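
The first three cleaning steps can be sketched as follows. This is a simplified plain-Python illustration under stated assumptions: records are dictionaries of strings, "structural errors" are taken to mean stray whitespace and inconsistent spellings, and the alias table is hypothetical. Outlier handling is left out because the features used in this paper are categorical.

```python
def clean(rows, aliases=None):
    """Apply duplicate removal, null handling and spelling fixes to dict records."""
    aliases = aliases or {}
    cleaned, seen = [], set()
    for row in rows:
        fixed = {}
        for key, value in row.items():
            value = (value or "").strip()         # structural: stray whitespace
            value = aliases.get(value, value)     # structural: inconsistent spellings
            fixed[key] = value if value else 0    # missing data: null -> 0
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint in seen:                   # unwanted: exact duplicates
            continue
        seen.add(fingerprint)
        cleaned.append(fixed)
    return cleaned

# Hypothetical alias table mapping a variant spelling to one canonical name.
ALIASES = {"Rising Pune Supergiant": "Rising Pune Supergiants"}

raw = [
    {"city": " Pune ", "winner": "Rising Pune Supergiant"},
    {"city": "Pune", "winner": "Rising Pune Supergiants"},
    {"city": None, "winner": ""},
]
tidy = clean(raw, ALIASES)
```

After normalisation the first two records become identical, so only one of them is kept; the third record has both of its empty fields replaced with 0.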

 Fig 3: Removal of Null values from collected Data

 Fig 4: Removal of Irrelevant attributes for prediction (figure shows relevant Attributes)


Data Visualization
The collected data is visualized for a better understanding of the information. Python's Matplotlib library is used
to plot the graphs.

 Fig 5: Team winning the toss and winning the match


The graph above shows that a team that wins the toss is more likely to win the match.
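
The quantity behind such a plot can be computed directly. The sketch below is an illustration with made-up placeholder matches; the field names toss_winner and winner are assumptions about the dataset layout, and the resulting number is not a result from the paper.

```python
def toss_win_rate(matches):
    """Share of decided matches in which the toss winner also won the match."""
    decided = [m for m in matches if m["winner"]]
    if not decided:
        return 0.0
    same = sum(1 for m in decided if m["toss_winner"] == m["winner"])
    return same / len(decided)

# Placeholder records: the toss winner also wins three of the four matches.
toy_matches = [
    {"toss_winner": "Team A", "winner": "Team A"},
    {"toss_winner": "Team B", "winner": "Team B"},
    {"toss_winner": "Team A", "winner": "Team B"},
    {"toss_winner": "Team C", "winner": "Team C"},
]
rate = toss_win_rate(toy_matches)
```

A rate above 0.5 on the real data would correspond to the pattern the figure describes.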

 Fig 6: Distribution of Runs

The graph shows that there were more than 120 instances in which a team won the match by a margin of fewer than
about 15 runs.
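
A distribution of this kind can be tabulated by binning the win margins. The sketch below is illustrative only: the bin width of 15 runs, the placeholder margins, and the convention that a margin of 0 means the match was not won by runs are all assumptions.

```python
from collections import Counter

def margin_histogram(margins, bin_width=15):
    """Count win margins (in runs) per bin; each key is the bin's lower edge."""
    counts = Counter((m // bin_width) * bin_width for m in margins if m > 0)
    return dict(sorted(counts.items()))

# Placeholder margins; 0 is assumed to mark a match not won by runs.
toy_margins = [4, 10, 14, 15, 22, 48, 0]
histogram = margin_histogram(toy_margins)
```

Here the key 0 covers margins of 1 to 14 runs, the key 15 covers 15 to 29 runs, and so on.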

 Fig 7: Team batting First and Winning the Match


 Fig 8: Team batting Second and Winning the Match

Predicting the results with the help of models
The outcome of a match is predicted after fitting the appropriate data to the models used for prediction.
The models used here are Random Forest, Support Vector Machine, Decision Tree Classifier, KNN, and Naive Bayes; of
these, Random Forest and Decision Tree Classifier gave better results than the Support Vector Machine.

Random Forest Classifier:
Random Forest is a supervised learning procedure, usable for both regression and classification, in which the model
learns from past information and predicts the outcome of the match. The Random Forest Classifier builds decision
trees on samples of the data and combines their predictions, taking the majority vote as the final result. In this
project, Random Forest gave the best accuracy for the variables considered.
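
The ensemble idea (bootstrap samples plus a majority vote) can be sketched in plain Python. This is a deliberately reduced illustration, not the classifier used in the paper: each "tree" here is a one-level stump on a single randomly chosen feature, whereas a real random forest grows deeper trees and samples features at every split. The toy records, the seed, and all function names are assumptions.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label; ties resolve to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """One-level tree: majority label for each value of a single feature."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(label)
    table = {value: majority(ls) for value, ls in by_value.items()}
    return feature, table, majority(labels)

def predict_stump(stump, row):
    """Look up the row's feature value; fall back to the sample majority."""
    feature, table, default = stump
    return table.get(row[feature], default)

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    features = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, rng.choice(features)))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    return majority(predict_stump(stump, row) for stump in forest)

# Toy data in which both features happen to determine the winner.
rows = ([{"toss_winner": "Team A", "venue": "Stadium X"}] * 4
        + [{"toss_winner": "Team B", "venue": "Stadium Y"}] * 4)
labels = ["Team A"] * 4 + ["Team B"] * 4
forest = fit_forest(rows, labels)
```

Individual stumps may be trained on unrepresentative samples, but the vote across many of them is what makes the ensemble's prediction stable.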

Support Vector Machine
The Support Vector Machine (SVM) is a supervised learning algorithm that can handle both regression and
classification problems. In machine learning it is mostly used for classification. In the SVM algorithm, every data
item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature
being the value of a coordinate. Classification is then performed by finding the hyperplane that best separates the
two classes.
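
One simple way to search for a separating hyperplane is sub-gradient descent on the hinge loss of a linear SVM. The sketch below assumes two numeric features, labels coded as +1 and -1, and arbitrarily chosen values for the learning rate, penalty, and number of epochs. It is an illustration of the idea, not the solver or the settings used in the paper.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimise hinge loss plus an L2 penalty by sub-gradient descent.

    labels must be +1 or -1. Returns (weights, bias).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty always shrinks w; the hinge term only acts on
            # points that are misclassified or inside the margin.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def svm_predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which the point x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two well-separated toy clusters.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

The learned weights define the hyperplane; a new point is classified by the sign of w.x + b.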

Decision Tree Classifier
The Decision Tree is a supervised learning technique applied to classification and regression problems, though it is
most commonly used for classification. It is a tree-structured predictor in which interior nodes represent features of
the data set, branches represent decision rules, and each leaf node represents an outcome.
A decision tree has two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make
decisions and have multiple branches, while leaf nodes are the results of those decisions and have no further
branches.
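
How a decision node picks the feature to branch on can be illustrated with Gini impurity. The paper does not state which splitting criterion was used, so Gini is an assumption here, as are the toy records and function names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after branching on every value of a feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())

def best_feature(rows, labels):
    """Feature whose split leaves the purest child nodes."""
    return min(sorted(rows[0]), key=lambda f: split_impurity(rows, labels, f))

# Toy records: the toss winner always wins; the toss decision is uninformative.
rows = [
    {"toss_winner": "A", "toss_decision": "bat"},
    {"toss_winner": "A", "toss_decision": "field"},
    {"toss_winner": "B", "toss_decision": "bat"},
    {"toss_winner": "B", "toss_decision": "field"},
]
labels = ["A", "A", "B", "B"]
root = best_feature(rows, labels)
```

In this toy case the root decision node branches on toss_winner, because each resulting child contains a single class and can become a leaf node.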

KNN
The model representation for KNN is the full training dataset. No training is required, since KNN has no model other
than the stored dataset.

To make lookup and matching of patterns during prediction efficient, implementations may store the data in advanced
data structures such as k-d trees.


Because the entire training dataset is stored, the consistency of the training samples should be considered
carefully. Curating the data, updating it regularly as new data arrives, and removing erroneous and anomalous records
may be a good idea.
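
Because KNN only stores the data, prediction amounts to a search and a vote. The sketch below uses a brute-force search and a Hamming distance (the number of features that differ), which suits categorical features; the paper does not state its distance measure or value of k, so both are assumptions, as are the toy records.

```python
from collections import Counter

def hamming(a, b, features):
    """Number of features on which two records disagree."""
    return sum(1 for f in features if a[f] != b[f])

def knn_predict(train_rows, train_labels, query, k=3):
    """Vote among the k stored records closest to the query."""
    features = sorted(query)
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: hamming(pair[0], query, features),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# The stored records are the whole "model".
train_rows = [
    {"team1": "A", "team2": "B", "venue": "X"},
    {"team1": "A", "team2": "B", "venue": "X"},
    {"team1": "A", "team2": "B", "venue": "Y"},
    {"team1": "C", "team2": "D", "venue": "Z"},
]
train_labels = ["A", "A", "B", "C"]
prediction = knn_predict(train_rows, train_labels,
                         {"team1": "A", "team2": "B", "venue": "X"})
```

Every prediction scans all stored records, which is why larger datasets benefit from index structures such as the k-d trees mentioned above.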

Naive Bayes
The Naive Bayes model is easy to build and is especially useful for very large data sets. Despite its simplicity,
Naive Bayes is known to outperform even highly sophisticated classification methods in some cases.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c), as given by
the equation below:
$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$
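
As a small worked example of Bayes' theorem, the sketch below computes a posterior from a prior, a likelihood, and the evidence. The probabilities are made-up numbers chosen for illustration, with c read as "the team wins the match" and x as "the team won the toss"; they are not estimates from the IPL data.

```python
def posterior(prior_c, likelihood_x_given_c, evidence_x):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    if evidence_x <= 0:
        raise ValueError("P(x) must be positive")
    return likelihood_x_given_c * prior_c / evidence_x

# Assumed values: P(c) = 0.5, P(x|c) = 0.6, P(x) = 0.5.
p_win_given_toss = posterior(prior_c=0.5, likelihood_x_given_c=0.6, evidence_x=0.5)
```

With these assumed inputs the posterior is 0.6 * 0.5 / 0.5 = 0.6. A Naive Bayes classifier applies the same rule with P(x|c) taken as a product of per-feature likelihoods, on the assumption that the features are independent given the class.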

 Fig 9: Comparison of Naive Bayes vs SVC vs Random Forest vs KNN vs Decision Tree

 RESULTS

The algorithms we used are Random Forest, Support Vector Machine, Naive Bayes, KNN, and Decision Tree. Among them,
Random Forest and Decision Tree gave the best results for the parameters considered.
Random Forest gave an accuracy of 86.640%, the Support Vector Machine 86.508%, KNN 66.66%, and Decision Tree
86.640%. The variables used are team1, team2, city, toss decision, toss winner, and venue.

Id for Each Team    Team Name                      Short Form
1                   Mumbai Indians                 MI
2                   Kolkata Knight Riders          KKR
3                   Royal Challengers Bangalore    RCB
4                   Deccan Chargers                DC
5                   Chennai Super Kings            CSK
6                   Rajasthan Royals               RR
7                   Delhi Daredevils               DD
8                   Gujarat Lions                  GL
9                   Kings XI Punjab                KXIP
10                  Sunrisers Hyderabad            SRH
11                  Rising Pune Supergiants        RPS
12                  Kochi Tuskers Kerala           KTK
13                  Pune Warriors                  PW
14                  Draw

Here each team name, a categorical value, is encoded as a numeric value for easier processing; the teams are listed in
the table above. The venue parameter was also encoded in numerical format, with each venue assigned a different value.
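
This kind of encoding can be sketched as a simple lookup table built in first-seen order. The function name and the choice to start IDs at 1 are assumptions made to mirror the table; the same helper could be applied to venue or city values.

```python
def build_encoder(names):
    """Map each distinct name to an integer ID, starting at 1, in first-seen order."""
    encoder = {}
    for name in names:
        if name not in encoder:
            encoder[name] = len(encoder) + 1
    return encoder

# First three teams from the table, in table order.
teams = ["Mumbai Indians", "Kolkata Knight Riders", "Royal Challengers Bangalore"]
team_ids = build_encoder(teams)
encoded_match = [team_ids["Mumbai Indians"], team_ids["Royal Challengers Bangalore"]]
```

Integer codes of this kind carry no natural order, which is worth keeping in mind for distance-based models such as KNN and SVM.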

 Fig 10: The data encoded in the numerical Format

 Fig 11: Result of Actual Winner vs Predicted Winner

 CONCLUSION

The result of a match depends mainly on team selection and on player performance in the match. It also depends on
other factors such as toss winner, toss decision, venue, team1, team2, and the city where the match is played.
Predicting the IPL winner is not easy because the game depends on so many factors. The main aim of this paper is to
predict the winner from past data covering 2008 to 2017. Five types of classification algorithms were used to predict
the results. The implementation uses the Python programming language. Among the classification algorithms, random
forest gave the highest accuracy of 86.640%, followed by the support vector machine with an accuracy of 66.667%.
This information can be used in future winner prediction and team selection to improve the chance of winning the
next match.


 REFERENCES

[1] R. P. Schumaker, O. K. Solieman and H. Chen, "Predictive Modeling for Sports and Gaming," in Sports Data
 Mining, vol. 26, Boston, Massachusetts: Springer, 2010.
[2] Bunker, Rory and Thabtah, Fadi, "A Machine Learning Framework for Sport Result Prediction," Applied
 Computing and Informatics, 15, 2017, doi:10.1016/j.aci.2017.09.005. [6] Ramon Diaz-Uriarte and Sara, "Gene
 selection and classification of microarray data using random forest," BMC Bioinformatics,
 doi:10.1186/1471-2105-7-3.
[3] Akhil Nimmagadda et al., "Cricket score and winning prediction using data mining," IJARnD, Vol. 3, Issue 3.
[4] A. L. Samuel, "Some studies in machine learning using the game of checkers. II-Recent progress," in
 Computer Games I, pp. 366–400, Springer, 1988.
[5] S. Kampakis and W. Thomas, "Using machine learning to predict the outcome of English county twenty
 over cricket matches," arXiv preprint arXiv:1511.05837, 2015.
[6] Bandulasiri, "Predicting the winner in one day international cricket," Journal of Mathematical Sciences &
 Mathematics Education, vol. 3, no. 1, pp. 6–17, 2008.
[7] Vistro, Daniel, Leo Gertrude David, "The cricket winner prediction with application of machine learning and
 data analytics," International Journal of Scientific and Technology Research, Volume 8, Issue 09, 2019.
[8] Jhanwar, Vikram, "Predicting the Outcome of ODI Cricket Matches: A Team Composition Based Approach,"
 International Institution of Information Technology, Hyderabad, 2016.
[9] Lokhande, Chawan, Pramila, "Prediction of Live Cricket Score and Winning," International Journal of Trend in
 Research and Development, Volume 5(1), 2018.
[10] Jaishankar, Rajkumar, "A review paper on cricket predictions using various machine learning algorithms and
 comparisons among them," International Journal for Research in Applied Science and Engineering Technology,
 2018.
[11] Rory, Fadi, "A Machine Learning Framework for Sport Result Prediction," Applied Computing and
 Informatics, Volume 15, Issue 1, 2017.
[12] Jayanth, Sandesh Bananki, Akas Anthony, Gududuru Abhilasha, Noorni, Gowri Srinivasa, "A team
 recommendation system and outcome prediction for the game of cricket," Journal of Sports Analytics, vol. 4, pp.
 263-273, 2018.
[13] Akhil, Venkata, Venkatesh, Sai, Chavali, "Cricket score and winning prediction using data mining,"
 International Journal of Advance Research, Ideas and Innovations in Technology, 2018.
[14] Tejinder, Vishal, Parteek, "Score and Winning Prediction in Cricket through Data Mining," International
 Conference on Soft Computing Techniques and Implementations, 2015.
[15] Stylianos, William, "Using machine learning to predict the outcome of English county twenty over cricket
 matches," Cornell University, 2015.
