IPL CLONE - A Journal of Composition Theory

JAC : A JOURNAL OF COMPOSITION THEORY                                                                                   ISSN : 0731-6755

                                                           IPL CLONE
                               Pandit Samuel1, Kavya.V2, Dhanunjay.Ch3, Ramesh.G4, Manish.G5

                                                Department of Information Technology

                                 Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam

           Abstract:
           The main purpose of this project is to predict the outcome of IPL matches. The Indian Premier League (IPL) is
           one of the most popular cricket tournaments: its finances rise each season, its viewership and marketing
           reach have increased, and the betting market for IPL matches grows every year. Given this popularity, it is
           worthwhile to examine the possible predictors that affect the overall results of the matches. The IPL is one
           of the most appreciated and awaited cricket leagues. Cricket, especially the Twenty20 format, carries
           maximum uncertainty, since a single over can completely change the momentum of the game.
           With so many people following the IPL, developing a model for predicting the outcome of its matches is a
           real-world problem. This paper applies machine learning to the problem of predicting match results based on
           previous match data from the IPL seasons. A match depends on various factors, and every player's performance
           in the match is taken into account to determine the overall strength of the teams. The prediction can be
           made fifteen minutes before play begins, immediately after the toss, using the Random Forest algorithm,
           since it performs better in terms of precision, accuracy and recall than the other models. On the
           toss-featured subset in particular, none of the machine learning algorithms performed well in generating
           accurate predictive models.

           Introduction:
           Sports have gained much importance at both the national and international level. Cricket is one such game,
           regarded as one of the most prominent sports in the world. The Indian Premier League is a professional
           Twenty20 cricket league in India. T20 is one of the forms of cricket recognized by the International
           Cricket Council (ICC). Owing to its short duration and the excitement it generates, T20 has become a huge
           success. The T20 format gave a productive platform to the IPL, which is now regarded as the biggest
           revolution in the field of cricket.

           It is contested by teams consisting of players from around the world. It was started after an altercation
           between the BCCI and the Indian Cricket League. The IPL is an annual tournament usually played in the months
           of April and May. Each team in the IPL represents a state or a part of the nation of India. The IPL has
           taken T20 cricket's popularity to new heights. It is the most attended cricket league in the world, and in
           2010 the IPL became the first sporting event to be broadcast live on YouTube. To date, the IPL has
           successfully completed 11 seasons since its inauguration. Currently, there are 8 teams that compete with
           one another in a round-robin fashion during the league stage. After the league stage, the top 4 teams in the
           points table qualify for the playoffs. In the playoffs, the winner of the match between the 1st and 2nd
           teams qualifies for the final, and the loser gets another opportunity to qualify for the final by playing
           against the winner of the match between the 3rd and 4th teams. In the end, the two qualified teams play
           against each other for the IPL title. Notably, the IPL employs television timeouts, so there is no time
           limit within which teams must complete their innings. The game is highly unpredictable because at each phase
           the momentum shifts to one of the two teams. Many times the result is decided on the last ball of the match,
           when the game gets really close. Considering all these aspects, there is immense interest among viewers in
           making predictions either at the start of the match or during it. IPL games can be predicted by making use
           of statistics and the teams' past match data.

Volume XIII, Issue VI, JUNE 2020                                                                                                  Page No: 16

           The goal of our project is to develop a model that predicts the likelihood of a team winning a match. To do
           this, we evaluate players' performance in previous matches by analysing their characteristics and
           statistics using supervised machine learning algorithms. We predict batsmen's and bowlers' performance
           separately: how many runs a batsman will score and, likewise, how many wickets a bowler will take in a
           given match.

           The literature survey concluded that there was a need for a machine learning model, namely the Random Forest
           algorithm, to predict the outcome of an IPL match before the game begins. Among all formats of cricket, the
           Twenty20 format sees the most turnarounds in the momentum of the game; a single over can completely change
           a game. Hence, predicting the outcome of a Twenty20 game is quite a challenging task. Besides, developing a
           prediction model for a league that is wholly based on auctions is another hurdle. IPL matches cannot be
           predicted using statistics over historical data alone. Because players go through auctions, they are bound
           to change teams, which is why the ongoing performance of each player must be taken into consideration while
           developing a prediction model.

           The contributions of this paper are as follows:
           1. To prepare a statistical analysis of players based on different characteristics.
           2. To predict the performance of a team depending on individual player statistics.
           3. To successfully predict the result of an IPL match.

           Literature Review:
           With the evolution of cricket, it has become a very hot topic for sports analysts. A lot of research has
           been done on cricket, but owing to inconsistent and complicated data sets, researchers have not achieved a
           breakthrough in predicting the match winner accurately. Many techniques have been used to predict the match
           winner, such as KNN, Logistic Regression, SVM and Naïve Bayes, but none has achieved the desired accuracy.

           Prince Kansal et al. [1] built several prediction models for predicting the selection of a player in the IPL
           based on each player's past performance. Various data mining algorithms, namely Decision Tree, Naïve Bayes
           and Multilayer Perceptron (MLP), were applied to the dataset to meet the objective. MLP gave the best
           accuracy among all the algorithms.

           Rabindra Lamsal et al. [2] proposed a linear-regression-based solution to calculate the weightage of a team
           based on the past performance of its players who have appeared most for the team, using two machine
           learning algorithms: multivariate analysis and Random Forest. The classification results are satisfactory.

           A N Wickramasinghe et al. [3] created a model to predict the match outcome using machine learning
           algorithms such as SVM, Logistic Regression, Naïve Bayes and Random Forest. The final results indicated that
           the Twitter-based model performs better than the natural-parameter-based model.

           Tejinder Singh et al. [4] created a model that predicts the score of the first innings and the outcome of
           the match in the second innings. The implementation uses regression and Naïve Bayes. It was found that the
           accuracy of Naïve Bayes in predicting the match outcome is higher.

           Shimona S and Nivetha S [5] analyse IPL match results from a dataset collected over 2008-2016. The work
           focuses on measuring the result of IPL matches by applying existing data mining algorithms to balanced as
           well as imbalanced datasets. An oversampling technique is applied to the imbalanced dataset before the
           algorithm is run. Accuracy is used as the performance metric and evaluation criterion, and its percentage
           varies according to the algorithm used.

           Kalpdrum Passi et al. [6] attempted to predict the performance of players. They used Naïve Bayes, Random
           Forest, Multiclass SVM and Decision Tree classifiers to generate the prediction models for the problem. The
           Random Forest classifier was found to be the most accurate.

           Shimona S et al. [7] aim at analysing IPL cricket match results from the collected dataset by applying
           existing data mining algorithms to both balanced and imbalanced datasets. The model was built successfully
           with an accuracy rate of 97% for the balanced dataset, and the error rate was found to be higher for the
           imbalanced dataset than for the balanced dataset.

           Ahmed & Nazir [8] implemented different statistical approaches for the formation of datasets and tried
           various classification techniques to predict the winner of a One Day (50-over) cricket match. They
           predicted the winner with 80% accuracy.

           Jhawar et al. [9] researched predicting the winner of the match at the end of each over, using players'
           recent and past performance and other statistics necessary for predicting the winner. The first challenge
           is to estimate the score that the first team will reach at the end of the first innings. Among the feature
           combinations used to predict the match outcome, the relative strength of Team B divided by the relative
           strength of Team A is successful in measuring and comparing the strength of the playing teams. Using a
           Random Forest classifier (RFC), an accuracy of 84% was achieved.

           Yasir et al. [10] predicted the outcome of cricket matches. They proposed a method for predicting team
           results and elaborated its working, which uses dynamic team properties such as players' history, weather
           conditions, ground history and winning percentage for the winner's prediction. They applied this technique
           to 100 matches and achieved 85% prediction accuracy.

           Data Set and Methodology:
           Data Set:
           We collected data from Kaggle which contains details about 636 matches with 21 attributes.

           The data set contains two files: deliveries.csv and matches.csv.

           1. deliveries.csv: This data set contains ball-by-ball data of all the IPL cricket matches for all seasons,
           including the batting team, bowling team, batsman, bowler, non-striker, runs scored, etc.

           2. matches.csv: This data set contains details related to each match, such as location, contesting teams,
           umpires, results, etc.

           Run Rate = Total number of runs scored / Number of overs bowled

           1) Required Run Rate: It is the number of runs per over the batting side must score in order to win the
           current match.

           Required Run Rate = Total runs required to win / Total overs left

           2) Batsman Strike Rate: It is the average number of runs scored per 100 balls faced.

           Batsman Strike Rate = (Runs scored / Total balls faced) * 100

           3) Bowler Average: It is the number of runs conceded by a bowler per wicket taken.

           Bowler Average = Runs conceded / Wickets taken
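
           The four statistics above can be sketched as small Python functions. This is only an illustration: the
           function names, the balls-to-overs helper and the choice to return None for a bowler without wickets are
           assumptions made here, not part of the original implementation.

```python
from typing import Optional


def overs_from_balls(balls_bowled: int) -> float:
    """Convert a ball count to fractional overs (6 legal balls per over)."""
    return balls_bowled / 6


def run_rate(runs_scored: int, overs_bowled: float) -> float:
    """Run Rate = total runs scored / number of overs bowled."""
    return runs_scored / overs_bowled


def required_run_rate(runs_required: int, overs_left: float) -> float:
    """Required Run Rate = total runs required to win / total overs left."""
    return runs_required / overs_left


def batsman_strike_rate(runs_scored: int, balls_faced: int) -> float:
    """Batsman Strike Rate = (runs scored / total balls faced) * 100."""
    return runs_scored / balls_faced * 100


def bowler_average(runs_conceded: int, wickets_taken: int) -> Optional[float]:
    """Bowler Average = runs conceded / wickets taken.

    Returns None when no wicket has been taken, because the average is
    undefined in that case (an assumption of this sketch).
    """
    if wickets_taken == 0:
        return None
    return runs_conceded / wickets_taken


if __name__ == "__main__":
    print(run_rate(160, 20))             # 160 runs in 20 overs
    print(required_run_rate(45, 5))      # 45 runs needed from 5 overs
    print(batsman_strike_rate(45, 30))   # 45 runs from 30 balls
    print(bowler_average(120, 5))        # 120 runs conceded for 5 wickets
```

           Note that a scorecard entry such as 10.3 overs means 10 overs and 3 balls, so it should be converted with
           overs_from_balls(63) rather than used directly as a decimal number.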

           Description of the attributes for deliveries.csv:
            Attributes                 Description
            Match id                   Number assigned to each match
            Inning                     Division of a match
            Batting team               The team which is currently batting
            Bowling team               The team which is currently bowling
            Over                       The number of overs bowled at a particular stage of the batting team's innings
            Batsman                    Person who is batting
            Bowler                     Person who is bowling
            Total runs                 Final score of the team
            Current Run Rate           Number of runs that a team scores in one over
            Required Run Rate          Number of runs per over the batting side must score in order to win the current match
            Batsman strike rate        Average number of runs scored per 100 balls faced
            Bowler Average             Number of runs conceded by a bowler per wicket taken

           Description of the attributes for matches.csv:

            Attributes                           Data Type                             Description and Values
            Id                                   Numeric                               Unique identifier of match
            Season                               Numeric                               Season of the match
            City                                 String                                City of match
            Date                                 Date                                  Date of match
            Team1                                String                                Bat first team
            Team2                                String                                Bowl first team
            Toss winner                          String                                Winner of toss
            Toss result                          String                                Values: Bat, Bowl
            Result                               String                                Values: Win, Lose, No result
            Dl_applied                           Boolean                               Duckworth-Lewis method applied
            Winner                               String                                Winner of the match
            Win_by_runs                          Numeric                               Winning margin in runs
            Win_by_wickets                       Numeric                               Winning margin in wickets
            Player_of_match                      String                                Player of the match
            Venue                                String                                Venue where the match was held
            Umpire1                              String                                Umpire 1
            Umpire2                              String                                Umpire 2

           Methodology:
           1. Data Collection:

           The data consist of historical records of previous IPL matches. The Indian Premier League's official website
           is the principal source of data for this project; the data were web-scraped from the website. The dataset
           has columns for the match number, the IPL season year, the place where the match was held and the stadium
           name, the match winner, the participating teams, the margin of victory, the umpires and the player of the
           match. The Indian Premier League was only 11 years old, which is why, after pre-processing, only 634 matches
           were available. Some of the columns may contain null values, and a few of the attributes may not be required
           for match winner prediction, as discussed under data pre-processing.

           2. Data Pre-processing:

           The collected dataset has some noisy data, so the data is first pre-processed. Pre-processing includes
           filling in missing values, scaling values and encoding categorical data. In this step we explored the
           dataset further to find any anomalies present. Every dataset may have certain defects that need to be
           corrected to bring it into a standard form for performing calculations. Such defects include null or empty
           values in certain required attributes. This step gives us an in-depth understanding of the dataset and
           presents it in a structured format that is easy to process.
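
           As a rough illustration of this exploration step, the sketch below counts the missing values per column and
           applies min-max scaling to a numeric column with pandas. The miniature table and its values are made up for
           the example, and min-max scaling is an assumed choice, since the paper does not state which scaling method
           was used.

```python
import pandas as pd

# Hypothetical miniature of matches.csv; the values are illustrative only.
matches = pd.DataFrame({
    "city": ["Mumbai", None, "Delhi", "Mumbai"],
    "venue": ["Wankhede Stadium", "M. A. Chidambaram Stadium", None, None],
    "win_by_runs": [10, 0, 35, 0],
})

# Explore: how many values are missing in each column?
missing_per_column = matches.isnull().sum()

# Explore: which rows contain at least one missing value?
rows_with_gaps = matches[matches.isnull().any(axis=1)]

# Scale a numeric column to the range [0, 1] (min-max scaling).
runs = matches["win_by_runs"]
matches["win_by_runs_scaled"] = (runs - runs.min()) / (runs.max() - runs.min())

print(missing_per_column)
print(matches[["win_by_runs", "win_by_runs_scaled"]])
```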

           3. Data cleaning:

           There are some null values in the dataset in columns such as winner, city, venue, etc. Because of these
           null values, the classification cannot be done accurately. So, we replaced the null values in several
           columns with dummy values.
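
           A minimal sketch of this cleaning step, assuming pandas, is given below. The placeholder strings are
           hypothetical; the paper does not say which dummy values were actually used.

```python
import pandas as pd

# Hypothetical rows with gaps in the winner, city and venue columns.
matches = pd.DataFrame({
    "winner": ["MI", None, "DC"],
    "city": ["Mumbai", "Chennai", None],
    "venue": ["Wankhede Stadium", None, "Feroz Shah Kotla"],
})

# Assumed placeholder per column (not taken from the paper).
dummy_values = {"winner": "No Result", "city": "Unknown", "venue": "Unknown"}

# fillna with a dict fills each named column with its own placeholder.
cleaned = matches.fillna(value=dummy_values)

print(cleaned)
```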

           4. Choosing Required Attributes:

           This step is the main part, where we eliminate the columns of the dataset that are not useful for
           estimating the match-winning team. Usefulness can be estimated using feature importance. The considered
           attributes have the following feature importance.

           Random Forest:

           Random Forest is a supervised learning algorithm used for both classification and regression. However, it
           is mainly used for classification problems. A forest is made up of trees, and more trees mean a more robust
           forest. Similarly, the random forest algorithm creates decision trees on data samples, gets the prediction
           from each of them and finally selects the best solution by means of voting. It is an ensemble method that
           is better than a single decision tree because it reduces overfitting by averaging the result.

           Working:
           1. First, start with the selection of random samples from a given data set.
           2. Next, the algorithm constructs a decision tree for every sample. It then gets the prediction result
           from every decision tree.
           3. Then voting is performed for every predicted result.
           4. At last, select the most voted prediction result as the final prediction result.
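
           The four steps above can be sketched by hand with bootstrap samples, scikit-learn decision trees and a
           majority vote. This is an illustrative reconstruction rather than the paper's code: the synthetic data set,
           the number of trees and the helper name forest_predict are assumptions.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the encoded match features (an assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)

rng = np.random.default_rng(0)
n_trees = 25
trees = []
for _ in range(n_trees):
    # Step 1: draw a random (bootstrap) sample of the rows, with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: build a decision tree on that sample; each split considers
    # only a random subset of the features.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(0, 1_000_000)))
    tree.fit(X[idx], y[idx])
    trees.append(tree)


def forest_predict(fitted_trees, X_new):
    """Collect every tree's prediction and return the majority vote per row."""
    # Shape (n_trees, n_rows): one row of predictions per tree.
    all_votes = np.array([t.predict(X_new) for t in fitted_trees])
    # Steps 3 and 4: count the votes for each row and keep the most voted class.
    return np.array([Counter(col).most_common(1)[0][0] for col in all_votes.T])


predictions = forest_predict(trees, X)
train_accuracy = (predictions == y).mean()
print("Training accuracy of the hand-built forest:", train_accuracy)
```

           In practice scikit-learn's RandomForestClassifier would be used instead; it combines the trees by averaging
           their predicted class probabilities rather than counting hard votes, which usually gives the same answer.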

           Classification problems always have a discrete value as the output, with classes that are completely
           distinct from one another. The main strategy behind random forest is that it divides the overall strategy
           into multiple trees, leading to various solutions, with the most prominent tree path giving the final
           result. This helps many classification algorithms classify various objects depending on their behaviour.
           Here the expected prediction error is calculated each time; this error is also known as the test error.

           The above steps are applied to the dataset using Random Forest as follows.

           The input dataset contains details such as Team, Venue, Toss Winner, City and Toss Decision. It has some
           missing and noisy data, so it was pre-processed. The missing values are filled with the average of the
           remaining rows of the same column. Categorical data is encoded by replacing each category with a numeric
           value.
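
           The two operations just described, mean filling and value encoding, might look as follows in pandas. The
           team-to-integer mapping below is hypothetical; the actual codes used by the authors are not given in the
           paper.

```python
import pandas as pd

# Hypothetical mapping from team abbreviation to an integer code.
team_codes = {"MI": 1, "CSK": 2, "DC": 3}

matches = pd.DataFrame({
    "team1": ["MI", "CSK", "DC"],
    "toss_winner": ["MI", "DC", "DC"],
    "win_by_runs": [12.0, None, 30.0],
})

# Fill a missing numeric value with the mean of the remaining rows
# of the same column (mean() skips missing values by default).
column_mean = matches["win_by_runs"].mean()
matches["win_by_runs"] = matches["win_by_runs"].fillna(column_mean)

# Encode categorical columns by replacing each label with its integer code.
for column in ["team1", "toss_winner"]:
    matches[column] = matches[column].map(team_codes)

print(matches)
```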

           The code below shows the implementation of the Random Forest algorithm in Python.

           import numpy as np
           import pandas as pd
           from sklearn import metrics
           from sklearn.ensemble import RandomForestClassifier
           from sklearn.model_selection import KFold

           def classification_model(model, data, predictors, outcome):
               # Fit on the full data and report the training accuracy
               model.fit(data[predictors], data[outcome])
               predictions = model.predict(data[predictors])
               accuracy = metrics.accuracy_score(data[outcome], predictions)
               print('Accuracy : {0:.3%}'.format(accuracy))
               # 7-fold cross-validation
               kf = KFold(n_splits=7)
               scores = []
               for train, test in kf.split(data):
                   train_predictors = data[predictors].iloc[train, :]
                   train_target = data[outcome].iloc[train]
                   model.fit(train_predictors, train_target)
                   scores.append(model.score(data[predictors].iloc[test, :], data[outcome].iloc[test]))
               print('Cross-Validation Score : {0:.3%}'.format(np.mean(scores)))
               # Refit on all the data so that the model can be used for prediction
               model.fit(data[predictors], data[outcome])

           # df is the pre-processed matches DataFrame with encoded columns;
           # dicVal is the dictionary mapping team abbreviations to their numeric codes
           model = RandomForestClassifier(n_estimators=100)
           outcome_var = 'winner'
           predictor_var = ['team1', 'team2', 'venue', 'toss_winner', 'city', 'toss_decision']
           classification_model(model, df, predictor_var, outcome_var)
           df.head(7)

           team1 = 'DC'
           team2 = 'MI'
           toss_winner = 'DC'
           # 14, 2 and 1 are the encoded venue, city and toss decision
           sample = pd.DataFrame([[dicVal[team1], dicVal[team2], 14, dicVal[toss_winner], 2, 1]],
                                 columns=predictor_var)
           output = model.predict(sample)
           print(list(dicVal.keys())[list(dicVal.values()).index(output[0])])  # find key by value
           imp_input = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
           print(imp_input)

           Graphs:

           Figure 1: Ratio of teams winning the toss and the match to teams losing the toss but winning the match

           The graph depicts the ratio of teams that won both the toss and the match to teams that lost the toss but
           won the match.
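
           The ratio plotted in Figure 1 can be computed from the toss winner and winner columns of matches.csv. The
           sketch below uses a few made-up rows to show the calculation; it does not reproduce the paper's figures.

```python
import pandas as pd

# Made-up rows; in the real data these columns come from matches.csv.
matches = pd.DataFrame({
    "toss_winner": ["MI", "CSK", "DC", "MI", "CSK"],
    "winner": ["MI", "DC", "DC", "CSK", "CSK"],
})

# True where the team that won the toss also won the match.
won_toss_and_match = matches["toss_winner"] == matches["winner"]

# Share of matches won by the toss winner, and by the toss loser.
toss_win_ratio = won_toss_and_match.mean()
toss_loss_ratio = 1 - toss_win_ratio

print("Won toss and match:", toss_win_ratio)
print("Lost toss, won match:", toss_loss_ratio)
```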

           Figure 2: Comparison between the teams that won the toss and those that won the game

           Comparison of the number of times each team has won the toss with the number of times it has ended up
           winning the match.

           Figure 3: Results between Mumbai Indians and Delhi Capitals at each venue

           The graph shows that Delhi Capitals have a significant win advantage at the Feroz Shah Kotla, while
           Mumbai Indians have a significant win advantage at the Wankhede Stadium.

           Figure 4: Number of matches won by each team

           The graph presents the number of matches won by individual teams. According to the graph, Mumbai Indians
           have won the highest number of matches of all the teams.

           Accuracy:

           Our model achieved 86% accuracy. This means that when we predict the outcome of a match, for example one in
           which Chennai Super Kings is playing in the IPL, the probability that the prediction is correct is 0.86.
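
           To make the metric concrete, the sketch below computes accuracy, together with the precision and recall
           mentioned in the abstract, using scikit-learn on a small set of made-up labels. The labels are purely
           illustrative and are unrelated to the 86% figure reported here.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 1 = the chosen team won, 0 = it lost.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Accuracy: share of all predictions that are correct.
accuracy = accuracy_score(y_true, y_pred)
# Precision: share of predicted wins that were actual wins.
precision = precision_score(y_true, y_pred)
# Recall: share of actual wins that were predicted as wins.
recall = recall_score(y_true, y_pred)

print(accuracy, precision, recall)
```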

           Results:

           As stated in the methodology, the historical data of matches is taken into consideration when predicting
           the outcome of a match. The Random Forest classifier, a simple and easy-to-interpret classification
           algorithm, is applied to the data sets to check the accuracy. The model achieved an accuracy of 86%.

           Conclusion:
           In the proposed work, the Random Forest algorithm is applied to data collected from different sources.
           Predicting the winner in sports, and in cricket specifically, is a challenging and extremely complex task,
           but incorporating machine learning can make it much simpler and easier. In this study, the various factors
           that influence the result of an Indian Premier League match were identified. The factors that significantly
           influence the result of an IPL match include the playing teams, match venue, city, the toss winner and the
           toss decision. A generic function for the classifier model was designed to measure the points earned by
           each team based on its past performances, using team1, team2, venue of the match, toss winner, city and
           toss decision. Different classification-based machine learning algorithms were trained on the IPL dataset
           developed for this work. The methods employed in our work for the final evaluation are Logistic Regression,
           Decision Trees, Random Forest and K-nearest neighbours. Among these techniques, the Random Forest
           classifier and the Decision Tree provided the highest accuracy of 86%. For future work, we plan to extend
           our work using more attributes, such as the previous match scores of the chosen team and the opponent team,
           the number of skilled batsmen in the opponent team, and more. The machine learning methods used in our
           research can also be used to predict results in other outdoor sports such as football, baseball and more.

           References

           [1] Prince Kansal, Pankaj Kumar, Himanshu Arya and Aditya Methaila, “Player Valuation in Indian Premier
           League Auction using Data Mining Technique”, IEEE, 2014.
           [2] Rabindra Lamsal and Ayesha Choudhary, “Predicting Outcome of Indian Premier League (IPL) Matches
           using Machine Learning”.
           [3] A N Wickramasinghe and Roshan D Yapa, “Cricket Match Outcome Prediction using Tweets and
           Prediction of Man of the Match using Social Networks Analysis: Case Study using IPL Data”, IEEE, 2018.
           [4] Tejinder Singh, Vishal Singha and Parteek Bhatia, “Score and Winning Prediction in Cricket through Data
           Mining”, IEEE, 2015.
           [5] Shimona S and Nivetha S, “Analyzing IPL Match Results using Data Mining Algorithms”.
           [6] Kalpdrum Passi and Niravkumar Pandey, “Increased Prediction Accuracy in the Game of Cricket using
           Machine Learning”, International Journal of Data Mining and Knowledge Management Process (IJDKP), Vol. 8,
           No. 2, March 2018.
           [7] Shimona S, Nivetha S and Yuvarani P, “Analyzing IPL Match Results using Data Mining Algorithms”,
           International Journal of Scientific and Engineering Research, Volume 9, Issue 3, March 2018.
           [8] Ahmed, W. & Nazir, K., 2015. A Multivariate Data Mining Approach to Predict Match Outcome in One-Day
           International Cricket. 10.13140/RG.2.2.30683.46880.
           [9] Jhawar, M. G., Viswanadha, S., Sivalenka, K. & Pudi, V., 2017. Dynamic Winner Prediction in Twenty20
           Cricket: Based on Relative Team Strengths. Conference: Machine Learning For Sports Analytics at ECML-PKDD.
           [10] Yasir, M. et al., 2017. Ongoing Match Prediction in T20 International. IJCSNS International Journal of
           Computer Science and Network Security.
