Using Experts' Opinions in Machine Learning Tasks - arXiv

Page created by Brent Marquez

Sports

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Using Experts' Opinions in Machine Learning Tasks

Amir Fazelinia*, Issa Annamoradnejad*, Jafar Habibi*^
*
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
^
Email: jhabibi@sharif.edu

Abstract— In machine learning tasks, especially in the tasks of This being said, the problems of data collection are out of the
prediction, scientists tend to rely solely on available historical data scope of this paper, and our focus is to present a framework for
and disregard unproven insights, such as experts’ opinions, polls, building strong ML models using experts’ opinions. The main
and betting odds. In this paper, we propose a general three-step contributions of this study are: 1) to propose a general three-step
framework for utilizing experts’ insights in machine learning tasks framework to employ experts’ viewpoints, 2) apply that
and build four concrete models for a sports game prediction case framework in a case study and build models exclusively based
study. For the case study, we have chosen the task of predicting on experts’ opinions, and 3) compare the accuracy of the
NCAA Men's Basketball games, which has been the focus of a proposed models with some baseline models.
group of Kaggle competitions in recent years. Results highly
suggest that the good performance and high scores of the past For the case study analysis, we selected the task of predicting
models are a result of chance, and not because of a good- results of NCAA Men's Division I Basketball Tournament,
performing and stable model. Furthermore, our proposed models widely known as NCAA March Madness. Started in 1939, it is
can achieve more steady results with lower log loss average (best one of the biggest annual sporting events in the world, consisting
at 0.489) compared to the top solutions of the 2019 competition of 67 single-elimination games across the United States [6]. The
(>0.503), and reach the top 1%, 10% and 1% in the 2017, 2018 and main reason for selecting this was the instability of the
2019 leaderboards, respectively. previously proposed machine learning models.
Keywords— Experts’ opinions; machine learning; sports game The structure of this article is as follows: Section 2 describes
prediction; march madness the related topics of this paper and dives into the past works of
using experts’ opinions. Section 3 explains the general
I. INTRODUCTION framework of the methodology with its concrete applications to
the selected case study. In Section 4, we discuss our
People are engaged in some sort of method to predict the experiments, data, baseline models, and results. Section 5 is the
future for meeting with it more successfully [1], and become concluding remarks.
experts in a subject by possessing sufficient knowledge, skill,
experience, training or education. Many examples from the daily
world events have demonstrated the high accuracy of experts’ II. BACKGROUND
opinions in various aspects of life, from weather forecasts and In this section, first, we will review the literature on using
sport result predictions to market and political analysis. expert insight, and then, we will move on to discussing the case
Using experts’ opinions in Machine Learning (ML) tasks study.
have been neglected by researchers, especially in the tasks of
prediction. This is largely due to a lack of aggregated expert A. Experts’ Opinions
opinions in an objective format (e.g. numerical rather than texts). The best-known method for using experts’ opinions is the
To overcome this, data scientists usually have to make sense of Delphi Technique. Developed for the US Army, the Delphi
textual comments aggregated from different sources. However, Technique is an iterative process to collect and distill
with the latest advancements in Natural Language Processing anonymous judgments of respondents within their domain of
(NLP) and classification algorithms, and the ever-increasing expertise interspersed with feedback [7], [8]. The main
existence of online historical data, it is becoming easier to advantage of the Delphi is reported to be the achievement of
overcome this challenge. For example, it is possible to crawl consensus in a given area of uncertainty or lack of empirical
film critics reviews, and accurately classify these subjective evidence [9]. There is a large corpus of research papers
texts with little training and model adjustments using pre-trained incorporating the Delphi method, mostly in research areas that
models ([2], [3]), or sentence embedding ([4], [5]). appreciate experts’ insights because of uncertainty, such as

healthcare [10]–[12], business [13], [14], law [15], and reflect the predicted margin of victory in a game played on a
education [16], [17]. A recent literature survey [18] shows the neutral court [6].
vast impact of the Delphi Technique by identifying more than
2600 published scholarly papers that applied the technique. The earliest works in predicting the NCAA results (such as
[26], [27]) tried to find a linear relationship between the rank
Experts’ opinions have been partially applied or differences and the probability to win a game. The winner team
acknowledged in the field of machine learning. For example, of the 2014’s Kaggle’s March Madness competition used team-
[19] used opinions of more than 100 experts to determine the based possession metrics to a logistic regression model [28].
most suitable method for nine machine learning tasks. Another [29] applied a classical soccer-forecasting model to make
study [20] considered using the opinions of multiple experts for predictions for the college basketball tournament.
classifying clinical data using machine learning. They used three
expert reviewers and one meta-reviewer to build a consensus [30] designed a new predictive model using matrix
model, which showed improved learning. completion to predict the results of the NCAA tournament. Their
method is based on formulating performance details from
regular season games of the same year into matrix form and
B. Case study: NCAA March Madness Games applying matrix completion to forecast the potential
The NCAA (National Collegiate Athletic Association) hosts performance accomplishments by teams in tournament games.
a college basketball tournament each year, which is the focus of They utilized their work in the 2016 Kaggle competition and
many betting websites. Started in 1939, it is one of the biggest reached 341th out of 596 teams in the leaderboard [31].
annual sporting events in the world. Completing the tournament
requires 67 single-elimination games to be played at 14 different [6] based their work on two famous ranking systems and
sites spread out across the United States, all of which take place proposed a combined model of two methods: the least squares
over a two-and-a-half-week span between mid-March and early pairwise comparison model and isotonic regression. They were
April [21]. able to reach the top 5% in 2016 and 2017 Kaggle competitions.
Other related prediction research papers include [32] that
Traditionally, the tournament included 64 teams to compete applied three variations of linear/polynomial models on
in five rounds of games, however, in 2011; the NCAA expanded Pomeroy ratings to see their accuracy in 2016-2018
it from 64 to 68 teams where the last eight teams play for four tournaments, and [33] which used a dual-proportion probability
spots making the field into 64 before the first round [22]. In the model on a ranking system to produce the bracket predictions.
current format, the so-called march madness tournament holds
67 single-elimination games: 4 in first-four round; 32 in round Finally, some studies focused on the problem from a
one; 16 in round two; 8 in round three; 4 in round four; 2 in round different angle by trying to predict the first round upsets. [24]
five and 1 for the championship. used a Balance Optimization Subset Selection model to
determine games that are statistically similar to historical upsets.
These games are preceded by more than four months (133 They applied their model and achieved a 38.4% accuracy. [34]
days) of regular season games that are played between more than is another study that tried to analyze the apparent anomalies of
330 teams of Division I to determine the winners of the 32 the previous tournaments.
conferences, other qualified teams and their seeding for the next
rounds. The team’s seed (from 1 to 16; 1 being the strongest) is III. METHODOLOGY
assigned by the selection committee and has been considered as
a major predictor of success by past researchers [6]. In this section, we will explain the conceptual framework for
using experts’ opinions, alongside a concrete application to a
Several ranking and rating systems present weekly case study.
estimations for the NCAA basketball league, by taking the latest
performance of teams into account. They are monitored and A. Conceptual framework
presented by experts in the field and are widely considered in
people’s bracket predictions. Large news agencies, such as Our framework to use experts’ opinions is composed of three
Associated Press, USA Today, NCAA, and ESPN display a general steps:
designed or supported ranking system. 1. The first step is to rank experts based on the accuracy of
Pomeroy system is the most popular ranking system for their previous predictions or predictions based on their
NCAA tournaments and is based on some efficiency metrics opinions/ranking/rating. By doing this, we reach a new
calculated per possession [23], [24]. Pomeroy rates both offense ordered list of experts, ER1 , … , ERi , … , ERn, where n is the
and defense on each possession, using efficiency metrics (such number of experts, and i : 1

on the accuracy of expert merging in previous average with exponentially decreasing weights to merge
predictions. This step is formulated as Formula 1: predictions of experts.
E3 model: In this model, we rank experts based on the
(1) = 1≤ ≤ _ ( [ER1 , . . ER ])
accuracy of their predictions in regular season games of the
, where MERGE is a function to build a novel prediction current year. There are around 130 days of regular season games
by combining multiple models, and ACC_PRE is a when the ranking systems update their rankings in a weekly
function to evaluate the accuracy of a prediction in a manner. To be exact, we use the experts’ rankings around the
previous round of predictions (such as year/season). 100th day to predict the remaining games of the regular season.
By evaluating the accuracy of each system, we reach a new
3. The final step is to meld predictions of top-N experts for ranking for experts. Using this new ranking, we select the top N
the target games (final rounds), and evaluate the experts and merge them using a simple average method.
resulting predictions. It can be formulated as Formula 2:
E4 model: This is very similar to the E3 model, except that
(2) = _ ( [ER1 , . . ER ]) it uses a weighted average with exponentially decreasing
weights to merge experts’ predictions.
, in which ACC_CUR is a function to assess the overall
accuracy of a prediction for target/current time range IV. EXPERIMENTS
(year/season). This means that the output of this step can
show the success of the whole method. It should be We selected Kaggle’s 2017 to 2019 March Madness
noted that it is not possible to perform ACC_CUR for a competitions1 as the basis for our evaluation. These
future or ongoing event, as the ground truth have not competitions have attracted hundreds of data science teams since
been revealed; however, the output of MERGE function 2014, with an aim to predict the win probability of every
possible game between the teams of the final rounds. In the
can be used for generating final predictions. current version of the tournament, 68 teams will continue to the
final rounds, which creates 2278 unique possible games.
B. Application to case study
In this section, first, we take a brief look at the data structure
In this section, we present an application of the framework
and evaluation metric. Then, we introduce the baseline models
to predict the outcomes of the NCAA tournaments in recent
used for comparison and discuss the results of our experiments.
years. In total, we propose four models that use experts’ opinions
in the form of ranking/rating systems to predict the outcome of
games of final rounds, commonly known as March Madness. A. Data
Each of these four models follows the three general steps of the We use data provided by Kaggle for the machine learning
conceptual framework: competitions. These data are separated into six sections with
more than 20 tables. The first section provides basic metadata
1. Rank experts based on previous predictions. about teams, such as team names, seeds, and final scores of all
2. Choose optimal N based on previous results. games since 1984-85. The second section of data includes game-
by-game detailed statistics (such as free throws attempted,
3. Merge predictions of the top N experts for the current defensive rebounds, turnovers, etc.) at a team level for all regular
season. season, conference tournaments, and NCAA® tournament
As we briefly noted in Section 2, the NCAA tournament has games since the 2002-03 season. The third section is about
two stages: The first stage is the regular season and the second geographical data and locations of previous games since the
stage is the final knockout rounds. While the first two models 2009-10 season. The fourth section, which is the focus of this
rank experts based on the results of previous season, the third article, provides weekly team rankings for dozens of top rating
and fourth models use regular season results of the same year as systems since the 2002-2003 season. Section five includes
a basis for ranking experts. detailed events of games alongside game-player relations since
2009-10 season, and finally, section six provides supporting data
Here are the descriptions of the four proposed models for the and metadata, such as coaches, conference affiliations, and
case study: bracket structure [35].
E1 model: In this model, we rank experts based on the While competitors build models using tables from all
accuracy of their predictions in the final rounds of the previous sections of the dataset and are encouraged to use external
year tournament. For step three, we merge the predictions of the datasets, our proposed models will only use the data table from
top experts using a simple equal-weight average. Therefore, the section four that contains rankings of teams based on different
optimal N is selected in a way that the arithmetic mean of the ranking systems, such as Pomeroy, Sagarin, RPI, ESPN, etc..
top N experts’ predictions in the previous year achieve the
highest accuracy in that year. The dataset for section four contains seven columns:
Season/year, team ID, system name, ranking day number
E2 model: The model is similar to the E1 model, including relative to start of the season (from 0 to 133, the final pre-
the first step; therefore, the ranking of experts remains the same. tournament rankings), and the overall ranking of the target team
The difference is in the third step, where we use a weighted

1
The official title has changed a few times over the years.

in the underlying system. Most systems provide a complete Table 1. Results for evaluation of the proposed models on three
ranking from #1 through #350+, but sometimes they include ties most recent competitions and a comparison with the baseline
and provide rankings for a smaller set of teams, as with the models.
Associated Press's top 25 [35]. Notes: Each cell in 2019, 2018, and 2017 columns contains log loss
value of the prediction (in the first line) and the rank of the
B. Evaluation prediction in Kaggle’s March Madness competition leaderboard in
the second line.
In Kaggle’s March Madness competitions, each team can
submit up to two files, judged using the Logarithmic loss
NAME DESCRIPTION 2019 2018 2017 MEAN
function (the predictive binomial deviance):
GROUP 1: BASELINE MODELS - KAGGLE 2019’S TOP RANKED TEAMS

1 B1 2019 first rank 0.41477 0.60213 0.49082 0.503
(3) = − ∑( ( ) + (1 − ) (1 − )) 1st/866 349th/934 88th/441
B2 2019 second rank 0.42012 0.85021 1.40911 0.886
=1
2nd/866 808th/934 435th/441
, where is the number of games, is the winning B3 2019 third rank 0.42698 0.60685 0.50452 0.511
probability of team 1 playing against team 2, and 3rd/866 395th/934 150th/441
equals 1 if team 1 wins over team 2 and 0 otherwise. A B1+B2 Ensemble of top three 0.48548 0.58755 0.59708 0.556
smaller value of indicates better performance of +B3 models of 2019 265th/866 154th/934 346th/441
the predictive model [36]. GROUP 2: PROPOSED MODELS SOLELY BASED ON EXPERTS’ OPINIONS
Contrary to several previous research articles on predicting E1 Mean of top experts 0.49388 0.58088 0.49734 0.523
NCAA games that were evaluated based on one or two years of based on previous 329th/866 83rd/934 121th/441
season results
competitions, we will evaluate our proposed method on three
E2 Weighted mean of top 0.47376 0.58750 0.48506 0.515
most recent consecutive seasons of competitions2. This step is
experts based on 191th/866 155th/934 61st/441
taken to prevent over-fit or by-chance models, as it was the case
previous season results
for many if not all winning solutions of Kaggle’s March
E3 Mean of top experts 0.45355 0.58894 0.46441 0.502
Madness competitions. For example, a submission exactly based based on regular season 56th/866 172nd/934 6th/441
on the published model of the first rank solution of the 2019 games of the same year
competition would rank as 349th out of 934 participants and 88th E4 Weighted mean of top 0.45334 0.57293 0.44496 0.491
out of 441 participants in the 2018 and 2017 competitions, experts based on 55th/866 32nd/934 2nd/441
respectively. regular season games of
the same year
C. Baselines GROUP 3: BLEND OF THE TOP MODELS OF THE FIRST TWO GROUPS
In order to evaluate the efficiency and stability of our B1+E2 Ensemble of 2019’s first 0.43231 0.58858 0.47956 0.500
proposed models, we will compare them with the top three rank with E2 7th/866 171th/934 46th/441
models of Kaggle’s 2019 March Madness competition. The B1+E4 Ensemble of 2019’s first 0.41622 0.58546 0.46505 0.489
winner of the competition achieved log loss of 0.41477, gaining rank with E4 2th/866 130th/934 7th/441
proposed models of ours that only depend on the expert’s
the lowest value among 866 teams of data scientists. The next
opinions, as described in Section III.B, and 3) Two ensemble
two teams in line scored 0.42012 and 0.42698 log loss values,
models, as a result of merging models of the first two groups.
respectively (Table 1).
We observe from the results of the baseline models that the
Since the winners of the competition were obliged to share
good performance and high scores of the competition winners
their winning solutions, we used their code to create submissions
are most likely a result of chance, and not of a good-performing
for the 2017 and 2018 competitions. In addition, we used
and stable model. For example, the first baseline model (B1)
ensemble learning to blend the three models to generate new
which has the lowest mean among the four baseline models,
submissions. As a result, the ensemble submission of baselines
achieved much higher log loss values for 2017 and 2018 seasons
for 2018 was stronger compared to the original baseline models.
compared to that of 2019 competition. In addition, no member
Finally, we used a Kaggle feature that allows users to submit and
of the top-10 teams of 2019 have not been awarded any Kaggle’s
evaluate their codes after the competition deadline.
competition medals before; a strange and uncommon incident
that strengthens the premise of chance involvement.
D. Results and discussion
In Table 1, we display the log loss for each method in the The second group models show promising results in sense of
three most recent seasons of NCAA college basketball games. stability and cohesion over the years. We can observe that the
Furthermore, we included the leaderboard rank for participating log loss scores have little variance compared to the baseline
in the corresponding Kaggle’s competition. The table shows models. Even in the case of 2018 season that involved a few
three groups of methods: 1) The first group is for baseline upsets, the loss value for the proposed models are closer to the
models, containing the top-3 models of 2019 Kaggle rest. The stability of the proposed models, as evident in Figure
competition and an ensemble of the three models, 2) The four 1, is a great outcome considering that the scores of the previous

2
Due to the spread of COVID-19, the NCAA canceled the 2020
tournament.

Figure 1. Results of evaluation for the three groups of models

methods vary from season to season, meaning that some seasons Linguistics: Human Language Technologies - Proceedings of the
are more predictable than others [6]. Conference, 2019.
Since the first group methods used other data than experts’ [3] I. Annamoradnejad, M. Fazli, and J. Habibi, “Predicting Subjective
opinions and showed a great loss in some cases, we considered Features from Questions on QA Websites using BERT,” in 2020 6th
merging them with the models of the second group. This step is International Conference on Web Research, ICWR 2020, 2020, doi:
taken, first, to keep the stability of the second group models, and 10.1109/icwr49608.2020.9122318.
second, to use the knowledge from other data sources. To this
[4] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence
end, we used the first baseline model since it scored better than
embeddings using siamese BERT-networks,” in EMNLP-IJCNLP
the rest in all three years, and merged it with one model using
the previous season results and one model using the regular 2019 - 2019 Conference on Empirical Methods in Natural
season games. The idea of blending was a success and we Language Processing and 9th International Joint Conference on
achieved lower loss values by keeping some stability over the Natural Language Processing, Proceedings of the Conference,
years. For example, ensemble of 2019’s first rank model with 2020, doi: 10.18653/v1/d19-1410.
the fourth proposed model would rank as 2nd, 130th and 7th in the [5] Issa Annamoradnejad, “ColBERT: Using BERT Sentence
2019, 2018 and 2017 Kaggle competitions, respectively. Embedding for Humor Detection,” arXiv 2004.12765, 2020.
This experiment shows that by incorporating experts’ [6] A. Neudorfer and S. Rosset, “Predicting the NCAA basketball
opinions, whether by ranking experts based on the previous tournament using isotonic least squares pairwise comparison
season results or the regular season of the same year, we can model,” J. Quant. Anal. Sport., vol. 14, no. 4, pp. 173–183, 2018,
improve the predictive accuracy of the baseline models. This doi: 10.1515/jqas-2018-0039.
result can be applied to other areas of machine learning, where
previous knowledge of experts exists. [7] N. Rescher, Predicting the Future: An Introduction to the Theory of
Forecasting. 1998.

V. CONCLUSION [8] C. C. Hsu and B. A. Sandford, “The Delphi technique: Making
sense of consensus,” Pract. Assessment, Res. Eval., vol. 12, no. 10,
The theoretical framework established in the current study
pp. 1–8, 2007.
can be used for the tasks of machine learning to build stable and
good performing models based on experts’ viewpoints. Our [9] P. Catherine, “The Delphi technique: myths and realities.,” J. Adv.
experimental results on the selected case study show that by Nurs., vol. 41, no. 4, pp. 376–382, 2003, [Online]. Available:
incorporating experts’ opinions in the specified methods, we can http://www.embase.com/search/results?subaction=viewrecord&fro
improve the predictive accuracy of existing models. The m=export&id=L36478234.
framework can be applied to all areas of machine learning, [10] K. Fujimoto, M. Cao, L. M. Kuhns, D. Li, and J. A. Schneider,
where previous knowledge of experts exists.
“Statistical adjustment of network degree in respondent-driven
sampling estimators: Venue attendance as a proxy for network size
among young MSM,” Social Networks, vol. 54. pp. 118–131, 2018,
REFERENCES doi: 10.1016/j.socnet.2018.01.003.
[11] Y. Geng, L. Zhao, Y. Wang, Y. Jiang, K. Meng, and D. Zheng,
[1] P. Twitchell, The Eck-Vidya: Ancient Science of Prophecy. “Competency model for dentists in China: Results of a Delphi
Illuminated Way Press, 1972. study,” PLoS One, 2018, doi: 10.1371/journal.pone.0194411.
[2] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre- [12] A. Bessa et al., “Consensus in Bladder Cancer Research Priorities
training of deep bidirectional transformers for language Between Patients and Healthcare Professionals Using a Four-stage
understanding,” in NAACL HLT 2019 - 2019 Conference of the Modified Delphi Method,” European Urology. 2019, doi:
North American Chapter of the Association for Computational 10.1016/j.eururo.2019.01.031.

[13] E. Hurmekoski, M. Lovrić, N. Lovrić, L. Hetemäki, and G. Winkel, tournament upsets using Balance Optimization Subset Selection,” J.
 “Frontiers of the forest-based bioeconomy – A European Delphi Quant. Anal. Sport., vol. 13, no. 2, pp. 79–93, 2017, doi:
 study,” For. Policy Econ., 2019, doi: 10.1016/j.forpol.2019.03.008. 10.1515/jqas-2016-0062.
[14] P. Rikkonen, P. Tapio, and H. Rintamäki, “Visions for small-scale [25] J. Sagarin, “Jeff Sagarin Ratings,” USA Today, 2020.
 renewable energy production on Finnish farms – A Delphi study on https://www.usatoday.com/sports/ncaab/sagarin/ (accessed Aug. 10,
 the opportunities for new business,” Energy Policy, 2019, doi: 2020).
 10.1016/j.enpol.2019.03.004. [26] B. P. Carlin, “Improved NCAA Basketball Tournament Modeling
[15] R. Bernal, L. San-Jose, and J. L. Retolaza, “Improvement actions via Point Spread and Team Strength Information,” Am. Stat., 1996,
 for a more social and sustainable public procurement: A Delphi doi: 10.2307/2685042.
 analysis,” Sustain., vol. 11, no. 15, pp. 1–15, 2019, doi: [27] N. C. Schwertman, K. L. Schenk, and B. C. Holbrook, “More
 10.3390/su11154069. Probability Models for the NCAA Regional Basketball
[16] J. M. Swank and A. Houseknecht, “Teaching Competencies in Tournaments,” Am. Stat., 1996, doi:
 Counselor Education: A Delphi Study,” Couns. Educ. Superv., 10.1080/00031305.1996.10473539.
 2019, doi: 10.1002/ceas.12148. [28] M. J. Lopez and G. J. Matthews, “Building an NCAA men’s
[17] I. Mat Nashir, A. Yusoff, M. Khairudin, M. R. Idris, and N. N. basketball predictive model and quantifying its success,” J. Quant.
 Imam Ma’arof, “Delphi Method: the Development of Robotic Anal. Sport., 2015, doi: 10.1515/jqas-2014-0058.
 Learning Survey in Tertiary Education,” J. Vocat. Educ. Stud., vol. [29] F. J. R. Ruiz and F. Perez-Cruz, “A generative model for predicting
 2, no. 1, p. 13, 2019, doi: 10.12928/joves.v2i1.761. outcomes in college basketball,” J. Quant. Anal. Sport., 2015, doi:
[18] A. Flostrand, L. Pitt, and S. Bridson, “The Delphi technique in 10.1515/jqas-2014-0055.
 forecasting– A 42-year bibliographic analysis (1975–2017),” [30] H. Ji, E. O. Saben, A. Boudion, and Y. Li, “March Madness
 Technol. Forecast. Soc. Change, vol. 150, no. January 2018, p. Prediction : A Matrix Completion Approach,” 2014.
 119773, 2020, doi: 10.1016/j.techfore.2019.119773.
 [31] H. Ji, E. O. Saben, R. Lambi, and Y. Li, “Matrix Completion Based
[19] V. Moustakis, M. Lehto, and G. Salvendy, “Survey of Expert Model V2 . 0 : Predicting the Winning Probabilities of March
 Opinion: Which Machine Learning Method May Be Used for Madness Matches.”
 Which Task?,” Plast. Rubber Compos. Process. Appl., vol. 8, no. 3, [32] S. Downs, “Using Statistics to Create the Perfect March Madness
 pp. 221–236, 1996, doi: 10.1080/10447319609526150. Bracket,” pp. 0–33, 2019.
[20] H. Valizadegan, Q. Nguyen, and M. Hauskrecht, “Learning [33] A. A. Gupta, “A new approach to bracket prediction in the NCAA
 classification models from multiple experts,” J. Biomed. Inform., Men’s Basketball Tournament based on a dual-proportion
 vol. 46, no. 6, pp. 1125–1135, 2013, doi: 10.1016/j.jbi.2013.08.007. likelihood,” J. Quant. Anal. Sport., vol. 11, no. 1, pp. 53–67, 2015,
[21] T. (Peculiar) economics of ncaa BaskeTBall and T. A. McFall, The doi: 10.1515/jqas-2014-0047.
 (peculiar) Economics of NCAA Basketball. 2014.
 [34] D. L. Zimmerman, N. D. Zimmerman, and J. T. Zimmerman,
[22] D. Wilco, “When did March Madness expand to 68 teams?,” NCAA, “March Madness ‘Anomalies’: Are They Real, and If So, Can They
 2019. https://www.ncaa.com/news/basketball-men/article/2019-01- Be Explained?,” Am. Stat., vol. 0, no. 0, pp. 1–10, 2020, doi:
 09/when-did-march-madness-expand-68-teams (accessed Aug. 10, 10.1080/00031305.2020.1720814.
 2020). [35] Kaggle, “Google Cloud & NCAA® ML Competition 2018-Men’s,”
[23] K. Pomeroy, “KenPom Ratings methodology update,” 2016. 2018. https://www.kaggle.com/c/mens-machine-learning-
 https://kenpom.com/blog/ratings-methodology-update/ (accessed competition-2018/ (accessed Aug. 10, 2020).
 Aug. 10, 2020). [36] “Pattern Recognition and Machine Learning,” J. Electron. Imaging,
[24] S. Dutta, S. H. Jacobson, and J. J. Sauppe, “Identifying NCAA 2007, doi: 10.1117/1.2819119.

You can also read