Evaluation of soccer team defense based on prediction models of ball recovery and being attacked

Kosuke Toda¹, Masakiyo Teranishi², Keisuke Kushiro¹, Keisuke Fujii²,³,⁴ *

¹ Graduate School of Human and Environmental Studies, Kyoto University, Kyoto, Kyoto, Japan.
² Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan.
³ RIKEN Center for Advanced Intelligence Project, Fukuoka, Fukuoka, Japan.
⁴ PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama, Japan.

* fujii@i.nagoya-u.ac.jp

arXiv:2103.09627v2 [cs.AI] 19 Mar 2021

                                                      Abstract
                                                      With the development of measurement technology, data on the movements of actual
                                                      games in various sports are available and are expected to be used for planning and
evaluating tactics and strategy. In particular, defense in team sports is generally
difficult to evaluate because few statistics are available for it. Conventional evaluation
methods based on the prediction of scoring are considered unreliable because they predict
events that are rare over an entire game, and they make it difficult to evaluate the
various plays leading up to a score. On the other hand, evaluation methods based on
certain plays that lead to scoring or on dominant regions are sometimes unsuitable for
evaluating the performance (e.g., goals scored) of players and teams. In this study, we propose a method to evaluate
                                                      team defense from a comprehensive perspective related to team performance based on
                                                      the prediction of ball recovery and being attacked, which occur more frequently than
                                                      goals, using player actions and positional data of all players and the ball. Using data
                                                      from 45 soccer matches, we examined the relationship between the proposed index and
                                                      team performance in actual matches and throughout a season. Results show that the
                                                      proposed classifiers more accurately predicted the true events than the existing
classifiers, which were based on rare events (i.e., goals). Also, the proposed index had a
moderate correlation with the long-term outcomes of the season. These results suggest
that the proposed index might be a more reliable indicator than winning or losing,
which are affected by accidental factors.

                                                      Introduction
                                                      The development of measurement technology has allowed for the generation of data on
                                                      the movements in various sports games for use in planning and evaluating the tactics
                                                      and strategy. For example, tracking data during a game of soccer, including the
                                                      positional data of the players and ball, is commonly used for players’ conditioning (e.g.,
running distance or the number of sprints) [1, 2]. However, during a soccer match, all 22
players and the ball interact in complex ways as each team tries to score goals and to
prevent goals from being scored against it (i.e., conceding). Hence, it is necessary to
evaluate the performance of not only individuals but also the entire team [3]. Defensive
tactics are considered particularly difficult to evaluate because few statistics are
available for them, whereas attacks can be assessed with statistics such as goals scored.

                                     March 22, 2021                                                                                        1/12
There are three main approaches to quantitatively evaluate teams and players in
soccer, mainly from an attacking perspective. The first approach is based on scoring
prediction: plays are evaluated by the changes they cause in the expected values of goals
scored and conceded, with scoring predicted from tracking data [4–7] (reviewed in [8]),
from action data such as dribbling and passing [9], or by other rule-based methods (e.g.,
[10]). The second approach evaluates specific plays, such as passes and effective attacks
that lead to shots. For example, a previous study evaluated passes based on their
relationship to the expected score and on the difficulty of completing them [11]. An
effective attack can be defined as a play that is likely to lead to a score [12]. Previous
studies have analyzed pass networks [13] and three-player interactions [14, 15], as well
as pass reception [16] and the related defensive weaknesses [17]. For defense, researchers
have evaluated interceptions [18] and the effectiveness of defensive play in terms of the
expected value of goal-scoring opportunities conceded [19]. In the third approach, the
spatial positioning of the players is evaluated by calculating their dominant regions
with a Voronoi diagram [20] or a Gaussian distribution [21]. Recent research has also
evaluated movements that create space for teammates [22, 23].
    However, these approaches have several limitations. Evaluation based on the
prediction of scoring (i.e., the first approach) is unreliable because it predicts events
that are rare over an entire game, and the process leading up to goals is sometimes
difficult to evaluate. Furthermore, with the second approach, which evaluates specific
plays that lead to goals, and the third approach, which concerns positioning, it is
difficult to relate the evaluation to overall performance (such as wins and losses).
Also, because many studies using the first and second approaches have used only the
actions and coordinates of players around the ball, it is difficult to evaluate players
far from the ball or the team as a whole.
                     To address these issues, we propose a method called Valuating Defense by Estimating
                 Probabilities (VDEP), which utilizes the actions and positional data of all players and
the ball. The main contributions of this work are as follows: (i) the proposed method is
based on the prediction of ball recovery and effective attacks, which occur more
frequently than goals; (ii) we evaluate team defense from a comprehensive perspective
related to team outcomes. Methodologically, we modified an existing method called VAEP
(Valuating Actions by Estimating Probabilities) [9], which is based on classifiers that
predict scoring and conceding, by applying its approach to ball recovery and being
attacked so that the defensive process can be evaluated. We validated the classifiers of
the proposed and existing methods and show that the proposed classifiers predicted the
true events more accurately than the existing classifiers. Moreover, we examined the
relationship between VDEP and team performance in actual matches and throughout the
season, in comparison with VAEP. We also present examples of evaluating a game and a
complete season of a specific team.

                 Materials and methods
                 Dataset
                 In this study, we used event data (i.e., labels of actions, such as passing and shooting,
                 recorded at 30 Hz and the simultaneous xy coordinates of the ball) and tracking data
                 (i.e., xy coordinates of all players and the ball recorded at 25 Hz) of a total of 45 games
                 from week 30 to week 34 of the Meiji Yasuda Seimei J1 League 2019 season provided by
                 Data Stadium, Inc. (Tokyo, Japan; for the details, see Acknowledgments). In all 45
                 games, there were 106 goals scored, 1,174 shots, 3,701 effective attacks, and 9,408 ball
recoveries (all based on the provided event data). An effective attack is defined as an
event that ends in a shot or enters the penalty area. A ball recovery is defined as a
change of the attacking team from before to after a play due to a factor other than an
effective attack. From the defender's perspective, an effective attack by the opponent is
referred to in this study as being attacked. To calculate the VDEP and VAEP values for
all games, we used a cross-validation procedure [24–26]: classifiers were trained on the
data from four weeks (36 games) and used to predict the data from the remaining week
(9 games), and this was repeated five times so that the data from all five weeks were
eventually predicted and evaluated.
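The week-wise cross-validation described above can be sketched as follows. This is a minimal illustration, assuming only that each game is tagged with the week (30 to 34) in which it was played; the helper `week_folds` and the game IDs are hypothetical, and classifier training is omitted.

```python
from typing import Dict, Iterator, List, Tuple


def week_folds(game_weeks: Dict[str, int]) -> Iterator[Tuple[int, List[str], List[str]]]:
    """Yield (held-out week, training game IDs, test game IDs), one fold per week.

    Hypothetical helper: each fold trains on the other weeks and predicts the
    held-out week, so every game is predicted exactly once across the folds.
    """
    for held_out in sorted(set(game_weeks.values())):
        train = [g for g, w in game_weeks.items() if w != held_out]
        test = [g for g, w in game_weeks.items() if w == held_out]
        yield held_out, train, test


# Hypothetical game IDs: 9 games in each of weeks 30-34 (45 games in total).
games = {f"week{w}_game{j}": w for w in range(30, 35) for j in range(9)}
for week, train_ids, test_ids in week_folds(games):
    print(week, len(train_ids), len(test_ids))  # 36 training games, 9 test games per fold
```

Grouping folds by week, rather than shuffling individual events, keeps all events of a game in the same fold, so a classifier is never evaluated on a game it was partly trained on.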

                 Proposed Method
The ultimate goal of defense in soccer is to prevent the opposing team from scoring a
goal. However, because goals are rare events, a classifier trained to predict them may be
trained ineffectively and may evaluate events unreliably (the validation results of the
VAEP method [9] are presented later). Therefore, to reasonably evaluate the defense of a
team, we propose the VDEP method, which evaluates factors that are important for
preventing goals. The VDEP method evaluates the potential increase in the number of ball
recoveries and the potential decrease in the number of effective attacks. The number of
effective attacks was chosen instead of the number of shots because it also captures
defensive failures in which an attacker chooses to pass the ball rather than shoot.
Therefore, in this study, we evaluate the process of defense based on the expected values
computed by classifiers that predict ball recovery and being attacked, analogously to the
VAEP method [9], which is based on the prediction of scoring and conceding.
    Suppose that the state of the game is given by S = [s1, ..., sN] in chronological
order. We consider si = [ai, oi], whereas previous studies [9, 19] used only ai, which
contains the ith action involving the ball and its coordinates. The proposed method uses
classifiers trained with the state si, which also includes the feature oi of the players
far from the ball (off-ball) at the time of the action. Because all defensive and
offensive actions in this study are evaluated from the defender's point of view, the time
index i below refers to the ith event.
    Given the game state Si of a certain interval, we define the probability of future ball
recovery Precoveries(Si) and the probability of being attacked Pattacked(Si) in the state
Si at event i, based on classifiers trained on the data. Defensive players are assumed to
act so that Precoveries(Si) increases or Pattacked(Si) decreases. Therefore, the value of
defense in the proposed method, Vvdep, is defined as follows:

$$V_{\mathrm{vdep}}(S_i) = P_{\mathrm{recoveries}}(S_i) - C \cdot P_{\mathrm{attacked}}(S_i), \tag{1}$$

where C is a parameter that balances the values of ball recoveries and effective attacks.
In this study, we set C based on the frequency of each event in the training data: as
described below, ball recoveries and effective attacks occur at a ratio of approximately
3:1, so we determined C ≈ 3 (the exact value differs for each fold of the 5-fold
cross-validation). Since the main aim of this study is to evaluate teams, we define the
evaluation value per game for team p as follows:

$$R_{\mathrm{vdep}}(p) = \frac{1}{M} \sum_{S_i \in S_M^p} V_{\mathrm{vdep}}(S_i), \tag{2}$$

where M is the number of events for team p in a match and $S_M^p$ is the set of states S
of team p up to the Mth event. Similarly, the corresponding per-game averages computed
using only Precoveries and Pattacked are defined as Rrecoveries(p) and Rattacked(p),
respectively. For the VAEP method [9], the previous study used values normalized by each
player's playing time. However, because every team played for almost the same amount of
time, in this study each team is evaluated by the sum Svaep(p) of its VAEP values. In
addition, Sscores(p) and Sconcedes(p), the components of the VAEP value [9] based on the
predictions of goals scored and conceded, respectively, are analyzed as separate
evaluation values.
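Equations (1) and (2), together with a frequency-based choice of C, can be sketched in a few lines. This is a minimal sketch, assuming the classifier outputs for one team's M events in a match are already available as lists of probabilities; the function names are hypothetical, and taking C as the plain ratio of positive labels is one reading of "based on the frequency of each event", not a confirmed detail.

```python
from typing import Sequence


def frequency_ratio_c(recovery_labels: Sequence[int], attacked_labels: Sequence[int]) -> float:
    """Hypothetical choice of C: positive ball-recovery labels per positive being-attacked label."""
    return sum(recovery_labels) / sum(attacked_labels)


def v_vdep(p_recoveries: float, p_attacked: float, c: float) -> float:
    """Eq. (1): value of the defensive state at one event."""
    return p_recoveries - c * p_attacked


def r_vdep(p_recoveries: Sequence[float], p_attacked: Sequence[float], c: float) -> float:
    """Eq. (2): mean of V_vdep over the M events of team p in one match."""
    m = len(p_recoveries)
    return sum(v_vdep(pr, pa, c) for pr, pa in zip(p_recoveries, p_attacked)) / m


# Pooled label counts reported for k = 5 (35,286 vs 13,353 positives) give C of about 2.64.
c = frequency_ratio_c([1] * 35_286, [1] * 13_353)
print(round(c, 2))
print(r_vdep([0.4, 0.3], [0.1, 0.2], c=3.0))  # mean of 0.1 and -0.3, about -0.1
```

Because Eq. (2) is a mean of the linear Eq. (1), Rvdep(p) also equals Rrecoveries(p) − C × Rattacked(p) computed from the per-game averages.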

                 Procedures
The on-ball feature ai in this study was constructed from the action and tracking data
following the previous study [9]. Specifically, we used the event types of the previous
study [9] (19 types, including pass, shot, and tackle), the start and end times of the
event, the displacement during the event, the elapsed time from the start to the end of
the event, the distance and angle between the ball and the goal, and whether possession
changed from the previous event (73 dimensions in total). Moreover, in this study, the
off-ball feature oi at the time of the event was included in the state si. Specifically,
for each team, we used the x and y coordinates of all players and the distance of each
player from the ball, sorted from closest to farthest from the ball (137 dimensions in
total). As in the previous study [9], we used XGBoost (eXtreme Gradient Boosting) [27] as
the classifier to predict ball recoveries and being attacked. Gradient boosting methods
are known to perform well on a variety of learning problems with heterogeneous features,
noisy data, and complex dependencies. In the previous study [9], the input Si to the
classifier covered the ith, (i − 1)th, and (i − 2)th actions. Because si−2 had little
effect on prediction performance in preliminary experiments, we used s = [a, o] for the
ith and (i − 1)th actions only.
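The off-ball part oi and the two-event input can be illustrated with a small sketch that, for each team, lists every player's x and y coordinates and distance to the ball, nearest first. This is hypothetical: the (team, x, y) tuples, the team order, and the per-player (x, y, distance) layout are assumptions, and the sketch does not reproduce the exact 137-dimensional layout used here.

```python
import math
from typing import List, Sequence, Tuple

Player = Tuple[str, float, float]  # (team, x, y); hypothetical tracking record


def off_ball_features(players: Sequence[Player], ball: Tuple[float, float],
                      teams: Tuple[str, str]) -> List[float]:
    """For each team in `teams`, append (x, y, distance to ball) per player, nearest first."""
    bx, by = ball
    features: List[float] = []
    for team in teams:
        rows = sorted((math.hypot(x - bx, y - by), x, y) for t, x, y in players if t == team)
        for dist, x, y in rows:
            features.extend([x, y, dist])
    return features


def classifier_input(s_prev: Sequence[float], s_cur: Sequence[float]) -> List[float]:
    """Concatenate the states of the (i-1)th and ith events, i.e. s_{i-1} followed by s_i."""
    return list(s_prev) + list(s_cur)


players = [("defending", 3.0, 4.0), ("defending", 1.0, 0.0), ("attacking", 0.0, 2.0)]
print(off_ball_features(players, ball=(0.0, 0.0), teams=("defending", "attacking")))
# [1.0, 0.0, 1.0, 3.0, 4.0, 5.0, 0.0, 2.0, 2.0]
```

Sorting by distance gives each feature slot a consistent meaning ("nearest defender", "second-nearest defender", and so on) regardless of which individual player occupies it.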
    In the first classification, for estimating Precoveries(Si), we assigned a positive label
(= 1) to the game state Si if the defending team in the state Si recovered the ball within
the subsequent k actions, and a negative label (= 0) otherwise. Similarly, in the second
classification, for estimating Pattacked(Si), we assigned a positive label (= 1) to the
game state Si when an effective attack was made within the subsequent k actions. In both
classifications, k is a user-defined parameter. A small k yields short-term, more reliable
predictions, whereas a large k yields longer-term predictions that include many more
factors. In this study, we set k = 5 based on the results of preliminary validation.
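The labeling rule can be sketched as follows. This is a minimal sketch under stated assumptions: the `Event` record and its fields are hypothetical, "the subsequent k actions" is taken to mean events i + 1 to i + k, and being attacked means an effective attack by the team opposing the defender at event i.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Event:
    defending_team: str                        # team without the ball at this event
    recovery_by: Optional[str] = None          # team that recovered the ball here, if any
    effective_attack_by: Optional[str] = None  # team that made an effective attack here, if any


def label_states(events: Sequence[Event], k: int = 5) -> Tuple[List[int], List[int]]:
    """Return (recovery labels, attacked labels), one pair of 0/1 labels per state S_i."""
    recovery, attacked = [], []
    for i, event in enumerate(events):
        window = events[i + 1 : i + 1 + k]  # the subsequent k events (assumption)
        defender = event.defending_team
        recovery.append(int(any(e.recovery_by == defender for e in window)))
        attacked.append(int(any(e.effective_attack_by not in (None, defender) for e in window)))
    return recovery, attacked


events = [
    Event("B"),
    Event("B"),
    Event("B", effective_attack_by="A"),
    Event("A", recovery_by="B"),  # B wins the ball, so A now defends
    Event("A"),
]
print(label_states(events, k=2))  # ([0, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```

Both labels are computed from the perspective of the team defending at event i, matching the convention that all events are evaluated from the defender's point of view.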
    With the labels defined above (k = 5), the data contained a total of 97,335 events for
all teams, including 35,286 positive cases of ball recovery and 13,353 positive cases of
being attacked. For the goals scored and conceded used to calculate the VAEP value [9],
there were 753 positive cases of goals scored and 227 positive cases of goals conceded
(the total number of events was the same, but we set k = 10 in accordance with the
previous study [9]). These numbers indicate that goals scored and conceded are rare events
compared with ball recoveries and being attacked. Therefore, classifiers predicting goals
scored and conceded may not be properly evaluated by the area under the receiver
operating characteristic curve (AUC) and the Brier score used in the previous study [9].
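The degree of imbalance follows directly from the counts above. The short calculation below is plain arithmetic on the reported numbers: roughly 36% and 14% of the states are positive for ball recovery and being attacked, compared with under 1% for goals scored and conceded.

```python
# Event and positive-case counts reported in the text.
TOTAL_EVENTS = 97_335
POSITIVES = {
    "ball recovery (k = 5)": 35_286,
    "being attacked (k = 5)": 13_353,
    "goal scored (k = 10)": 753,
    "goal conceded (k = 10)": 227,
}

rates = {label: count / TOTAL_EVENTS for label, count in POSITIVES.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.2%} positive")
```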

                 Evaluation and Statistical Analysis
To validate the classifiers, we used the F1 score in addition to the AUC and Brier score
used in the previous study [9]. AUC is the area under the receiver operating
characteristic (ROC) curve, which plots the true positive rate against the false positive
rate as the decision threshold varies. AUC is 0.5 for random prediction and 1 for perfect
prediction. The Brier score is the mean squared error between the predicted probability
and the actual outcome, where a smaller value indicates a more accurate prediction.
However, these measures may be misleading when negative cases vastly outnumber positive
cases, as in this and previous studies (for example, for data with only 10% positive
cases, the AUC and Brier score can look good even when every case is classified as
negative). Therefore, we also used the F1 score to evaluate whether the true positives can
be identified, independently of the true negatives. The F1 score is expressed as
F1 = (2 × Precision × Recall) / (Precision + Recall), where Recall (the true positive
rate) is the ratio of true positives to the sum of true positives and false negatives,
and Precision is the ratio of true positives to the sum of true positives and false
positives. Thus, true negatives do not enter this index. To test our hypothesis, we
compared the F1 scores among the classifiers (AUC and Brier scores are shown only for
reference). Because Levene's test did not reject the hypothesis of homogeneity of
variances between methods, a one-way analysis of variance (ANOVA) was performed. As a
post-hoc comparison, Tukey's test was applied to factors for which the one-way ANOVA
showed a significant effect. Furthermore, the contribution of the input variables to the
predictions of the VDEP method was calculated by SHAP (SHapley Additive exPlanations)
[28], which uses an interpretable approximate model of the original nonlinear prediction
model.
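The F1 and Brier computations, and the imbalance problem that motivates F1, can be illustrated with a small pure-Python sketch. It assumes a decision threshold of 0.5 for turning probabilities into labels and sets F1 to 0 when there are no true positives; both are assumptions, not details given in the text. The example reproduces the 10%-positive case: predicting a probability of 0 for every case yields a Brier score of 0.1 but an F1 score of 0.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], p_pred: Sequence[float], threshold: float = 0.5) -> float:
    """F1 = 2PR / (P + R), with Precision = TP/(TP+FP) and Recall = TP/(TP+FN)."""
    y_hat = [int(p >= threshold) for p in p_pred]  # assumed 0.5 decision threshold
    tp = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_hat) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 taken as 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 10% positives; a classifier that predicts probability 0 for every case.
y = [1] * 10 + [0] * 90
all_negative = [0.0] * 100
print(brier_score(y, all_negative))  # 0.1
print(f1_score(y, all_negative))     # 0.0
```

True negatives never appear in `f1_score`, which is why it exposes a classifier that never predicts the rare positive class, as with the 0.000 F1 score of Sconcedes.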
    For the evaluation of defense using the VDEP and VAEP values [9], we first present
examples that quantitatively and qualitatively evaluate a game and a season of a specific
team. Next, we examined the relationships between these values and the outcomes of actual
games (goals scored, goals conceded, and winning points, where a win, draw, and loss were
assigned 3, 1, and 0 points, respectively) and the team results throughout the season,
using Pearson's correlation coefficient across all 18 teams.
    For all statistical analyses, p < 0.05 was considered significant. However, because the
sample size in the correlation analysis was small (N = 18), the r value, which indicates
the magnitude of the correlation, was also used as an effect size. Following a previous
study [29], correlation coefficients below 0.20 were interpreted as slight, almost
negligible relationships; 0.20 to 0.40 as low correlation; 0.40 to 0.70 as moderate
correlation; 0.70 to 0.90 as high correlation; and above 0.90 as very high correlation.
The correlation coefficients were rounded to three decimal places for interpretation. All
statistical analyses were performed using the SciPy library for Python.
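The winning-points scheme, Pearson's r, and the interpretation bands of [29] can be sketched in pure Python. This is a minimal sketch: treating each band's lower bound as inclusive (e.g., exactly 0.40 counts as moderate) is an assumption, since the text does not specify boundary handling, and p-values are omitted (SciPy's `scipy.stats.pearsonr` returns both r and p).

```python
import math
from typing import Sequence

POINTS = {"win": 3, "draw": 1, "loss": 0}  # winning points per result


def winning_points(results: Sequence[str]) -> int:
    """Total winning points for a sequence of 'win' / 'draw' / 'loss' results."""
    return sum(POINTS[r] for r in results)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interpret(r: float) -> str:
    """Label |r| (rounded to three decimals) with the bands of [29]; lower bounds inclusive."""
    magnitude = abs(round(r, 3))
    if magnitude < 0.20:
        return "slight, almost negligible"
    if magnitude < 0.40:
        return "low"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "high"
    return "very high"


for r in (0.464, 0.392, 0.830, 0.953, -0.040, -0.245):
    print(r, interpret(r))
```

Applied to the coefficients reported in the Results, this yields the same labels used there.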

                 Results
                 Validation of Classifiers
To validate the VDEP and VAEP [9] methods, we first investigated the prediction
performance of their classifiers. As shown in Table 1, the classifiers of VDEP made more
accurate predictions than those of VAEP [9] (note that the predicted outputs and the
numbers of their occurrences differ between the methods). The AUCs of Rrecoveries and
Rattacked in VDEP were better than those of Sscores and Sconcedes in VAEP, whereas the
VAEP classifiers had better (lower) Brier scores. However, as noted above, these indices
may not be valid here because they include a large number of true negatives (thus, we
did not perform statistical analysis on these variables). Instead, the F1 score was
calculated, and the statistical analysis identified a significant main effect among
Rrecoveries, Rattacked, and Sscores (F = 144.40, p < 1.0 × 10⁻⁶; Sconcedes was excluded
because its average F1 score was zero). The post-hoc analysis showed that the F1 scores of
VDEP (Rrecoveries, Rattacked) were significantly higher than that of Sscores (ps < 0.002).
This indicates that the VDEP classifiers predicted true positives considerably better than
the VAEP classifiers, which largely failed to do so.
    Next, the contribution of the input variables to the predictions of the VDEP method
was calculated by SHAP [28]. For Rrecoveries (Fig 1), the distance to the ball of the
defender closest to the ball had the highest contribution, followed by whether possession
had changed immediately beforehand. For Rattacked (Fig 2), the x-coordinate (in the
direction of the goal) of the attacker closest to the ball and the attacker's displacement
from the beginning to the end of the action had the largest contributions, followed by
the distance to the ball of the defender closest to the ball.

Table 1. Evaluation of classifiers for the proposed and conventional methods.
Classifier       AUC             Brier score     F1 score
Rrecoveries      0.770 ± 0.014   0.184 ± 0.009   0.522 ± 0.036
Rattacked        0.862 ± 0.003   0.079 ± 0.003   0.484 ± 0.038
Sscores [9]      0.698 ± 0.066   0.007 ± 0.002   0.201 ± 0.021
Sconcedes [9]    0.701 ± 0.040   0.003 ± 0.001   0.000 ± 0.000

Fig 1. Contribution of the input variables to the prediction of Precoveries. The input
variables related to the prediction of Precoveries are presented in order of their
contribution; of the top 20 features, those at the top contributed more than those at
the bottom. Each dot represents one event. The color represents the value of the feature
(blue and red indicate low and high, respectively). The horizontal axis shows the impact
on the prediction (strongly positive and negative impacts are plotted to the right and
left, respectively). For example, when the value of type foul a1 is 1, the prediction is
likely to be zero.

                                 Examples of Team Defense Evaluation
                                 Evaluation of a defensive play
An advantage of the VDEP method is its ability to show how effective the formation of the
defending team is against the attacking team at a particular moment in the game. For
example, applying VDEP to the sequence of events leading to a conceded goal makes it easy
to identify where in that sequence the cause of the goal lies. As an example, consider
the first goal in the match between Yokohama F. Marinos and FC Tokyo, shown in Fig 3. A
positive VDEP value can be interpreted as a good defense and a negative value as a bad
defense. In this example, the VDEP values were positive for all events, indicating that
the defense was not particularly poor even though the goal was conceded. More precisely,
however, the VDEP values decreased between Matsubara's pass and Erik's trap, and between
Erik and Wada's trap and pass, suggesting that the goal was conceded because of a forward
pass or because the ball holder was left free.

Fig 2. Contribution of the input variables to the prediction of Pattacked. The input
variables related to the prediction of Pattacked are shown in order of their
contribution. The configuration is the same as in Fig 1. For example, large positive
(negative) values of offense x1 a0 tend to have large positive (negative) impacts on the
prediction.
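Locating where VDEP falls along a sequence, as done for the goal above, amounts to scanning consecutive values for decreases. The sketch below is hypothetical: the event labels and VDEP values are invented for illustration and are not the values shown in Fig 3, and `vdep_drops` is an assumed helper name.

```python
from typing import List, Sequence, Tuple


def vdep_drops(events: Sequence[str], values: Sequence[float]) -> List[Tuple[str, str, float]]:
    """List (from_event, to_event, change) for each consecutive decrease in VDEP, largest drop first."""
    drops = [
        (events[i], events[i + 1], values[i + 1] - values[i])
        for i in range(len(values) - 1)
        if values[i + 1] < values[i]
    ]
    return sorted(drops, key=lambda d: d[2])


# Hypothetical sequence: every value is positive, yet two transitions lower the VDEP value.
events = ["pass (player A)", "trap (player B)", "pass (player B)", "shot (player C)"]
values = [0.30, 0.10, 0.15, 0.05]
for start, end, change in vdep_drops(events, values):
    print(f"{start} -> {end}: {change:+.2f}")
```

Ranking transitions by the size of the drop, rather than by the sign of individual values, is what reveals the weak moments in a sequence whose values are all positive.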

                 Evaluation of a game
Because goals are hard to score in soccer, the team that dominates a game does not always
win. Therefore, to strengthen a team continuously, games need to be analyzed regardless of
their immediate outcomes. The VDEP method is expected to serve as a more stable evaluation
index than wins and losses, which are affected by chance.
    For example, in the match between Yokohama F. Marinos and FC Tokyo, Yokohama won by a
score of 3 to 0. We examined the reasons for this unexpectedly large gap in a matchup of
two top teams (both teams took the same number of shots in the game). Although
Rrecoveries for Yokohama (0.371) was better than that for Tokyo (0.348), Rattacked and
Rvdep for Tokyo (0.116 and 0.049) were better than those for Yokohama (0.159 and −0.040).
These values indicate that Tokyo's defense made it difficult for Yokohama to score goals.
As in this game, a good defensive evaluation does not always match the game outcome,
because of the quality of the shots taken by the attackers (note that the proposed method
does not reflect how likely an effective attack is to result in a goal). Thus, using the
VDEP method to quantitatively evaluate the defense in each match allows a more detailed
analysis than wins and scores alone.
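Because Eq. (2) averages the linear Eq. (1), the per-game indices satisfy Rvdep = Rrecoveries − C × Rattacked for a given C. As a consistency check, the values reported above can be solved for C; the sketch below is plain arithmetic on those rounded numbers, and `implied_c` is a hypothetical helper. Both teams give C ≈ 2.58, consistent with the roughly 3:1 frequency ratio used to set C (the exact C differs by fold, and the reported values are rounded).

```python
# Per-game values reported above for the Yokohama F. Marinos vs FC Tokyo match.
reported = {
    "Yokohama": {"recoveries": 0.371, "attacked": 0.159, "vdep": -0.040},
    "Tokyo": {"recoveries": 0.348, "attacked": 0.116, "vdep": 0.049},
}


def implied_c(r_recoveries: float, r_attacked: float, r_vdep: float) -> float:
    """Solve R_vdep = R_recoveries - C * R_attacked for C."""
    return (r_recoveries - r_vdep) / r_attacked


for team, r in reported.items():
    print(team, round(implied_c(r["recoveries"], r["attacked"], r["vdep"]), 2))
```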
                 Fig 3. Example of defensive play analysis. (A) The VDEP value for each event is
                 shown, together with the type of action and the player who performed it. (B) The
                 positions of all players at the moment of the shot are visualized (red: defending team;
                 blue: attacking team), along with the flow of events involving the ball. In this scene,
                 the VDEP values were positive for all events, suggesting that the defense was not poor
                 overall even though a goal was conceded. The VDEP values decreased between Matsubara's
                 pass and Erik's trap, and between Erik and Wada's trap and pass, suggesting that the
                 goal was conceded because of a forward pass or because the ball holder was left free.

                    Statistically, a correlation analysis was performed between the game outcomes and
                 the proposed and existing indices (the analyzed data are given in S1 Data). $R_{vdep}$
                 had a moderate positive correlation with winning points ($r_{16} = 0.464$, $p = 0.050$)
                 and a low positive correlation with goals scored ($r_{16} = 0.392$, $p = 0.106$).
                 $S_{vaep}$ had a high positive correlation with winning points ($r_{16} = 0.830$,
                 $p < 0.001$) and a very high positive correlation with goals scored ($r_{16} = 0.953$,
                 $p < 0.001$). It is unsurprising that $S_{vaep}$ closely tracks the number of goals
                 scored in a match, because it is based on the prediction of scoring. Interestingly,
                 even though $S_{vaep}$ is also based on the prediction of conceding, it had only a
                 slight, almost negligible relationship with goals conceded ($r_{16} = -0.040$,
                 $p > 0.05$). In contrast, $R_{vdep}$ had a low negative correlation with goals conceded
                 in the game ($r_{16} = -0.245$, $p > 0.05$).
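The subscript in $r_{16}$ is the degrees of freedom of a Pearson correlation test (df = n - 2, i.e., 18 paired observations), and the verbal labels (slight, low, moderate, high, very high) follow Guilford's rule of thumb [29]. A minimal standard-library sketch of the test and the labelling follows. The cut-offs 0.2/0.4/0.7/0.9 are the commonly cited Guilford thresholds and are assumed here rather than stated in the text, and the p-value is obtained by numerically integrating the Student t density instead of calling a statistics library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0 with t = r*sqrt(df)/sqrt(1 - r^2).

    P(0 <= T <= |t|) is computed with Simpson's rule (steps must be even).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 2 * (0.5 - area)


def guilford_label(r):
    """Verbal strength of |r| using the assumed Guilford cut-offs."""
    a = abs(r)
    if a < 0.2:
        return "slight, almost negligible"
    if a < 0.4:
        return "low"
    if a < 0.7:
        return "moderate"
    if a < 0.9:
        return "high"
    return "very high"


# Game-level coefficients reported in the text (n = 18, so df = 16).
for r in (0.464, 0.392, 0.830, 0.953, -0.040, -0.245):
    print(f"r_16 = {r:+.3f}: {guilford_label(r)}, p = {two_sided_p(r, n=18):.3f}")
```

The recomputed p-values should be close to, but not necessarily identical to, the reported ones, because the published coefficients are rounded to three decimals.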

                 Defensive evaluation of teams in multiple games
                 It is also possible to characterize and evaluate team defenses throughout a season using
                 VDEP values from multiple games. Fig 4 shows the average VDEP values for each team.
                 For example, Yokohama defended with a high probability of recovering the ball, which
                 may be related to its high number of goals scored (see S1 Data). On the other hand, its
                 probability of being attacked was also high, suggesting that the team adopted a
                 high-risk, high-return defensive tactic. Meanwhile, Hiroshima, which conceded the fewest
                 goals in the league (see S1 Data), showed a high probability of ball recovery and a low
                 probability of being attacked, suggesting that these properties contributed to its
                 small number of goals conceded.
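Fig 4 summarizes each team by averaging its indices over the season. A minimal sketch of that aggregation, assuming per-game records of ($R_{recoveries}$, $R_{attacked}$) for each team and equal weight per game (weighting by the number of events in each game would be an equally plausible choice). The team names and numbers are hypothetical placeholders, not values from S1 Data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-game records: (team, R_recoveries, R_attacked).
# Placeholder values for illustration only; not taken from S1 Data.
games = [
    ("Team A", 0.37, 0.16),
    ("Team A", 0.35, 0.14),
    ("Team B", 0.34, 0.11),
    ("Team B", 0.36, 0.12),
]


def season_averages(records):
    """Average each team's per-game indices, giving every game equal weight."""
    by_team = defaultdict(list)
    for team, rec, att in records:
        by_team[team].append((rec, att))
    return {
        team: (mean(r for r, _ in rows), mean(a for _, a in rows))
        for team, rows in by_team.items()
    }


for team, (rec, att) in season_averages(games).items():
    print(f"{team}: R_recoveries = {rec:.3f}, R_attacked = {att:.3f}")
```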
                     Statistically, we performed a correlation analysis between each team's performance
                 over the whole season and the evaluation indices (the data are shown in S1 Data).
                 $R_{vdep}$ had a moderate positive correlation with winning points ($r_{16} = 0.397$,
                 $p = 0.103$) and low correlations with goals scored ($r_{16} = 0.342$, $p = 0.162$) and
                 goals conceded ($r_{16} = -0.291$, $p = 0.239$). Meanwhile, $S_{vaep}$ had a moderate
                 positive correlation with goals scored ($r_{16} = 0.497$, $p = 0.034$) but only slight,
                 almost negligible relationships with winning points ($r_{16} = 0.177$, $p > 0.05$) and
                 goals conceded ($r_{16} = -0.098$, $p > 0.05$). For VDEP, the correlation coefficients
                 with single-game performance and with season-long performance were similar, whereas
                 for VAEP they differed substantially.
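The statement that VDEP's coefficients are stable between single games and the whole season, while VAEP's are not, can be quantified from the coefficients reported above as the mean absolute shift |r_game - r_season| over the three outcomes. The sketch reads the game-level $r_{16} = -0.245$ for $R_{vdep}$ as its correlation with goals conceded (the pairing of sentences in the text implies this); all other values are quoted directly. The summary statistic itself is our choice, not the authors'.

```python
# Correlation coefficients reported in the text: game level vs whole season.
# The game-level R_vdep / goals-conceded value (-0.245) follows our reading of the text.
game = {
    "R_vdep": {"winning points": 0.464, "goals scored": 0.392, "goals conceded": -0.245},
    "S_vaep": {"winning points": 0.830, "goals scored": 0.953, "goals conceded": -0.040},
}
season = {
    "R_vdep": {"winning points": 0.397, "goals scored": 0.342, "goals conceded": -0.291},
    "S_vaep": {"winning points": 0.177, "goals scored": 0.497, "goals conceded": -0.098},
}


def mean_abs_shift(index):
    """Mean |r_game - r_season| over the three outcomes for one index."""
    outcomes = game[index]
    total = sum(abs(game[index][k] - season[index][k]) for k in outcomes)
    return total / len(outcomes)


for index in game:
    print(f"{index}: mean |r_game - r_season| = {mean_abs_shift(index):.3f}")
```

It prints a mean shift of 0.054 for $R_{vdep}$ and 0.389 for $S_{vaep}$, matching the qualitative statement in the text.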

                 Fig 4. Defensive evaluation of teams in multiple games. The vertical axis is
                 $R_{attacked}$ and the horizontal axis is $R_{recoveries}$. The vertical and horizontal
                 black lines indicate the league averages (across all teams) of $R_{recoveries}$ and
                 $R_{attacked}$, respectively. Points further to the right indicate defenses more likely
                 to recover the ball, and points further down indicate defenses less likely to be
                 attacked. For example, Yokohama defended with a high probability of recovering the
                 ball. On the other hand, its probability of being attacked was also high, suggesting
                 that the team adopted a high-risk, high-return defensive tactic.
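Reading Fig 4 amounts to placing each team in one of four quadrants defined by the league-average lines. The sketch below performs that classification, taking the league average as the unweighted mean over teams. The four teams and their values are hypothetical placeholders, chosen so that Team A resembles the high-risk, high-return profile described for Yokohama and Team B the profile described for Hiroshima.

```python
from statistics import mean

# Hypothetical season averages (R_recoveries, R_attacked) per team.
# Placeholders only; not values from Fig 4 or S1 Data.
teams = {
    "Team A": (0.37, 0.16),
    "Team B": (0.36, 0.10),
    "Team C": (0.33, 0.15),
    "Team D": (0.32, 0.11),
}


def quadrant(rec, att, rec_avg, att_avg):
    """Label a team relative to the league-average lines of Fig 4."""
    recovers = "high recovery" if rec > rec_avg else "low recovery"
    exposed = "high risk of being attacked" if att > att_avg else "low risk of being attacked"
    return f"{recovers}, {exposed}"


# League averages across all teams (the black lines in Fig 4).
rec_avg = mean(r for r, _ in teams.values())
att_avg = mean(a for _, a in teams.values())
for name, (rec, att) in teams.items():
    print(name, "->", quadrant(rec, att, rec_avg, att_avg))
```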

                 Discussion
                 In this study, we proposed a method to comprehensively evaluate a team's defense in
                 relation to its performance, based on the prediction of ball recovery and of being
                 attacked, which occur more frequently than goals, using player actions and the
                 positional data of all players and the ball. Below, we first discuss the verification
                 of the proposed and existing indices based on their prediction performance, then the
                 quantitative defensive evaluations obtained with both methods, and finally the
                 limitations of the proposed method and future perspectives.
                     Both the proposed VDEP and the existing VAEP [9] evaluate players and teams under
                 the assumption that the underlying predictions are accurate. To validate the
                 classifiers, the previous study [9] used AUC and Brier scores. However, as mentioned
                 above, these indices may not evaluate the classifiers reliably here, because the data
                 contain a large number of true negatives. We therefore computed the F1 score, and the
                 results showed that the VDEP method identified true positives, whereas VAEP did not.
                 This suggests that VDEP can evaluate defensive performance based on accurate
                 predictions.
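The argument for F1 over AUC and Brier scores rests on class imbalance: when most game states end in neither event, true negatives dominate the aggregate scores. The toy example below uses hypothetical data and assumes a 0.5 decision threshold (the threshold used for F1 is not stated in this excerpt). It shows a model that never predicts the event yet still earns a small Brier score, while its F1 score is zero.

```python
# Toy illustration with hypothetical data: rare positives, many true negatives.


def brier(y_true, p):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)


def f1(y_true, p, threshold=0.5):
    """F1 score of thresholded predictions (0.0 when there are no true positives)."""
    pred = [pi >= threshold for pi in p]
    tp = sum(1 for yi, q in zip(y_true, pred) if yi == 1 and q)
    fp = sum(1 for yi, q in zip(y_true, pred) if yi == 0 and q)
    fn = sum(1 for yi, q in zip(y_true, pred) if yi == 1 and not q)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 1000 hypothetical game states, only 20 of which end in the rare event.
y = [1] * 20 + [0] * 980

# Model 1 always outputs the base rate, so no positive is ever flagged.
constant = [0.02] * 1000

# Model 2 flags 15 of the 20 positives but also 10 negatives.
informative = [0.8] * 15 + [0.1] * 5 + [0.7] * 10 + [0.05] * 970

for name, p in (("constant", constant), ("informative", informative)):
    print(f"{name}: Brier = {brier(y, p):.4f}, F1 = {f1(y, p):.3f}")
```

The constant model gets a Brier score of 0.0196 with an F1 of 0.000, and the informative model 0.0120 with an F1 of 0.667. Both Brier scores look small because of the 980 negatives, whereas F1 separates the two models clearly.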
                     Regarding the team evaluations using the proposed and existing indices, the
                 correlation analysis revealed a moderate positive correlation between the season
                 outcome (winning points) and the proposed VDEP value, whereas the existing VAEP value
                 [9] was strongly and positively correlated with the game outcome (winning points and
                 goals scored). Furthermore, for the VDEP value, the correlation coefficients with
                 single-game performance and with season-long performance were similar overall,
                 whereas those for the VAEP value differed substantially. These results suggest that
                 $R_{vdep}$ could be a well-balanced indicator for evaluating both attack (after ball
                 recovery) and defense itself (preventing the opponent's attacks and recovering the
                 ball). In contrast, the VAEP method [9] is based on the prediction of offensive play
                 and showed almost no correlation with goals conceded. We expect that using VDEP
                 alongside the indicators already in use can support the continuous strengthening of a
                 team, independently of immediate wins and losses, which may be influenced by chance
                 factors.
                     We have several recommendations for future studies. First, the number of analyzed
                 games should be increased so that longer-term performance can be predicted more
                 accurately. Second, the weighting constant C in Equation 1, which balances ball
                 recovery against being attacked, should be determined more carefully: this study set C
                 from the numbers of occurrences of the two events, but it should ideally reflect their
                 practical value in soccer. Finally, individual players should be evaluated. Because
                 VDEP evaluates team defense, it is difficult to assess individual performance with it;
                 for example, future studies could compute the change in VDEP when a player moves in
                 different directions.
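The text states only that C was determined from the numbers of occurrences of the two events. One plausible reading, used here purely as an assumption, is C = N_recoveries / N_attacked, so that the rarer "being attacked" probability is scaled to a magnitude comparable with the recovery probability; the combination P_recovery - C * P_attacked is likewise our assumed form of Equation 1. All counts and probabilities below are hypothetical. Keeping the weighting rule separate from the score makes it straightforward to substitute a more soccer-specific choice of C, as recommended above.

```python
# ASSUMPTION: C = N_recoveries / N_attacked, and VDEP = P_recovery - C * P_attacked.
# Neither formula is quoted from the paper; counts and probabilities are hypothetical.


def weight_from_counts(n_recoveries: int, n_attacked: int) -> float:
    """Weighting constant C from how often each event occurs (assumed rule)."""
    if n_attacked <= 0:
        raise ValueError("n_attacked must be positive")
    return n_recoveries / n_attacked


def vdep(p_recovery: float, p_attacked: float, c: float) -> float:
    """Assumed form of Equation 1: reward recovery, penalize being attacked."""
    return p_recovery - c * p_attacked


c = weight_from_counts(n_recoveries=5200, n_attacked=2000)  # hypothetical counts
print(f"C = {c:.2f}")
print(f"VDEP = {vdep(0.35, 0.12, c):+.3f}")  # hypothetical state probabilities
```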

                 Supporting information
                 S1 Data.     Analyzed data.

                 Acknowledgments
                 The data used in this research was provided by the Research Center for Medical and
                 Health Data Science in the Institute of Statistical Mathematics and Data Stadium Inc.
                 We would like to thank Atom Scott and Masaki Onishi for their valuable comments on
                 this work. This work was supported by JSPS KAKENHI (Grant Numbers 19H04941,
                 20H04075, 20H04087) and JST PRESTO (Grant Number JPMJPR20CA).

                 References
                    1. Andrzejewski M, Chmura J, Pluta B, Strzelczyk R, Kasprzak A. Analysis of
                       sprinting activities of professional soccer players. The Journal of Strength &
                       Conditioning Research. 2013;27(8):2134–2140.
                    2. Andrzejewski M, Chmura J, Pluta B, Konarski JM. Sprinting activities and
                       distance covered by top level Europa league soccer players. International Journal
                       of Sports Science & Coaching. 2015;10(1):39–50.
                    3. Fujii K. Data-driven Analysis for Understanding Team Sports Behaviors. arXiv
                       preprint arXiv:2102.07545. 2021.
                    4. McHale I, Scarf P. Modelling soccer matches using bivariate discrete distributions
                       with general dependence structure. Statistica Neerlandica. 2007;61(4):432–445.
                    5. McHale IG, Scarf PA, Folker DE. On the development of a soccer player
                       performance rating system for the English Premier League. Interfaces.
                       2012;42(4):339–351.

6. Pappalardo L, Cintia P, Ferragina P, Massucco E, Pedreschi D, Giannotti F.
                     PlayeRank: data-driven performance evaluation and player ranking in soccer via
                     a machine learning approach. ACM Transactions on Intelligent Systems and
                     Technology (TIST). 2019;10(5):1–27.
                  7. Decroos T, Davis J. Player vectors: Characterizing soccer players’ playing style
                     from match event streams. In: Joint European Conference on Machine Learning
                     and Knowledge Discovery in Databases. Springer; 2019. p. 569–584.
                  8. Van Roy M, Robberechts P, Decroos T, Davis J. Valuing on-the-ball actions in
                     soccer: a critical comparison of XT and VAEP. In: Proceedings of the AAAI-20
                     Workshop on Artificial Intelligence in Team Sports. AI in Team Sports Organising
                     Committee; 2020.
                  9. Decroos T, Bransen L, Van Haaren J, Davis J. Actions speak louder than goals:
                     Valuing player actions in soccer. In: Proceedings of the 25th ACM SIGKDD
                     International Conference on Knowledge Discovery & Data Mining; 2019. p.
                     1851–1861.
                 10. Teranishi M, Fujii K, Takeda K. Trajectory prediction with imitation learning
                     reflecting defensive evaluation in team sports. In: IEEE 9th Global Conference
                     on Consumer Electronics (GCCE 2020); 2020. p. 124–125.
                 11. Power P, Ruiz H, Wei X, Lucey P. Not all passes are created equal: Objectively
                     measuring the risk and reward of passes in soccer from tracking data. In:
                     Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
                     Discovery and Data Mining; 2017. p. 1605–1613.
                 12. Ueda F, Honda M, Horino H. The Causal Relationship between Dominant
                     Region and Offense-Defense Performance - Focusing on the Time of Ball
                     Acquisition. In: Football Science. vol. 11; 2014. p. 1–17.
                 13. Yamamoto Y, Yokoyama K. Common and unique network dynamics in football
                     games. PLoS One. 2011;6(12):e29638.
                 14. Yokoyama K, Yamamoto Y. Three people can synchronize as coupled oscillators
                     during sports activities. PLoS Comput Biol. 2011;7(10):e1002181.
                 15. Yokoyama K, Shima H, Fujii K, Tabuchi N, Yamamoto Y. Social forces for team
                     coordination in ball possession game. Physical Review E. 2018;97(2):022410.
                 16. Fujii K, Yoshihara Y, Matsumoto Y, Tose K, Takeuchi H, Isobe M, et al.
                     Cognition and interpersonal coordination of patients with schizophrenia who have
                     sports habits. PLoS One. 2020;15(11):e0241863.
                 17. Llana S, Madrero P, Fernández J, Barcelona F. The right place at the right time:
                     Advanced off-ball metrics for exploiting an opponent’s spatial weaknesses in
                     soccer. In: Proceedings of the 14th MIT Sloan Sports Analytics Conference; 2020.
                 18. Piersma JPT. Valuing Defensive Performances of Football Players [Master's
                     thesis]. Erasmus School of Economics; 2020.
                 19. Robberechts P. Valuing the Art of Pressing. In: StatsBomb Innovation in
                     Football Conference; 2019.
                 20. Taki T, Hasegawa JI. Visualization of Dominant Region in Team Games and Its
                     Application to Teamwork Analysis. In: Proceedings of Computer Graphics
                     International; 2000. p. 227–235.

21. Kijima A, Yokoyama K, Shima H, Yamamoto Y. Emergence of self-similarity in
                     football dynamics. The European Physical Journal B. 2014;87(2):41.

                 22. Spearman W. Beyond expected goals. In: Proceedings of the 12th MIT Sloan
                     Sports Analytics Conference; 2018. p. 1–17.
                 23. Fernandez J, Bornn L. Wide open spaces: A statistical technique for measuring
                     space creation in professional soccer. In: Proceedings of the 12th MIT Sloan
                     Sports Analytics Conference; 2018.

                 24. Fujii K, Takeishi N, Hojo M, Inaba Y, Kawahara Y. Physically-interpretable
                     classification of network dynamics for complex collective motions. Scientific
                     Reports. 2020;10:3005.
                 25. Fujii K, Inaba Y, Kawahara Y. Koopman spectral kernels for comparing complex
                     dynamics: Application to multiagent sport plays. In: European Conference on
                     Machine Learning and Knowledge Discovery in Databases (ECML-PKDD’17).
                     Springer; 2017. p. 127–139.
                 26. Fujii K, Kawasaki T, Inaba Y, Kawahara Y. Prediction and classification in
                     equation-free collective motion dynamics. PLoS Computational Biology.
                     2018;14(11):e1006545.

                 27. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings
                     of the ACM SIGKDD International Conference on Knowledge Discovery & Data
                     Mining; 2016. p. 785–794.
                 28. Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. In:
                     Advances in Neural Information Processing Systems 30; 2017. p. 4765–4774.

                 29. Guilford JP. Fundamental statistics in psychology and education. McGraw-Hill;
                     1950.
