Machine Learning Based Prediction of League of Legends Matches

Angela Kim, Afsheen Ghorashy, Paul van der Vecht
Systems Design Engineering, University of Waterloo
aylkim@uwaterloo.ca, arghorashy@uwaterloo.ca, pevander@uwaterloo.ca

Abstract – League of Legends is an online video game composed of real-time battles lasting ~30 minutes, played between two teams of 5 players each. Each player controls a single avatar for the duration of the game; these avatars are selected in a 5-minute, turn-based phase that occurs before the battle phase. After all avatars are selected, players have the option to concede the game before the battle phase begins. When deciding whether or not to concede, players need an indication of whether their team composition is favoured to win against the opponent’s. Current solutions either rely on user-submitted rules of thumb or do not fully consider how avatars interact with one another in a game. The proposed system uses game data and ensemble decision tree machine intelligence techniques to estimate the likelihood of a given team winning against another. Experiments showed that ensemble decision trees were more effective for this task than either single decision trees or ensemble support vector machines.

Keywords: League of Legends, MOBA, Team Composition, Machine Intelligence, Ensemble, Decision Tree, Support Vector Machine

1 Introduction

1.1 Background

League of Legends (LoL) is an online video game in the Multiplayer Online Battle Arena (MOBA) genre that was released by developer Riot Games in 2009. The game is extremely popular, with roughly 32 million active players [1]. It is also competitive: $5 million USD was awarded in tournament prizes in 2012 [1], and broadcasts of games between professional teams are watched by tens of thousands of fans [2] each week.

Games are played by two teams of 5 players each, with every player commanding a single avatar for the duration of the game. The game is played in two phases: avatar selection and battle. The battle phase takes place in real time, lasts roughly 30 minutes, and involves complex interactions between the avatars and their environment. It is preceded by avatar selection, a 5-minute process in which the teams take turns choosing their avatars from a pool of 110 unique avatars.

1.2 Problem

Although it is a short phase, strategic avatar selection is essential to a team’s success. The characteristics of the avatars are such that some work well together on a team, some are strong against certain opponents, and some are weak against others. The outcome of the chaotic battle phase is difficult to predict, but it is hypothesized that there is a strong correlation between the composition of a team relative to its opponent’s and its likelihood of winning.

After the avatars have been chosen, players are given the option to concede the game before the battle phase begins. The conceding player is assigned a penalty to their overall rating, but this penalty is smaller than the one incurred by playing the game and losing. Playing a game with small odds of winning can be frustrating and time-consuming, and can result in a larger penalty than conceding. It is therefore desirable for players to know whether they are favoured to win before the battle phase begins.

1.3 Current solutions

Players currently rely on a variety of tools when determining the odds of their team winning. Players frequently copy the compositions of teams chosen by professional players [2], believing these compositions to have high likelihoods of winning. However, doing so does not take into account the opposing team’s composition. Players also consult summary statistic pages, which show the avatars with the highest win percentages [3]. However, this method does not consider an avatar’s interactions with other avatars. One tool [4] compares the attributes of two teams’ compositions, but it is based on subjective weightings and does not indicate whether the matchup is favourable. Another tool [5] lists the avatars that a given avatar is weak against, strong against, and works well with. However, these pairwise assessments do not consider the performance of the team as a whole, and they rely on user voting for data. These tools are useful when selecting avatars, but they do not reliably answer the question of whether a given team composition is likely to beat another.

1.4 Proposed solution

Records of each player’s games, including team composition and outcome data for each game, are available online [3]. The proposed solution applies machine intelligence (MI) techniques to these data. Given the team compositions in a game, it predicts which team is most likely to win.

2 Related Works

Research efforts suggest that MI techniques applied to MOBA avatar selection are still in their infancy. Surprisingly, it has been noted that the most commercially successful computer games have received the least attention from machine learning research and the least benefit from its techniques. This is in contrast to several traditional games, such as Backgammon and Chess, which have been rigorously analyzed; MI techniques have been widely employed to train some of the top players of these games [6]. Modern computer games are growing more complex, driven by the need for games that are more entertaining and challenging. This complexity imposes new technical challenges and constraints that current MI research efforts have not yet addressed [6].

A general investigation into machine learning techniques implemented for computer games revealed that reinforcement learning techniques are popular choices for developing a framework for understanding the optimality of various strategies. Examples include Q-learning and TD-learning, applied to role-playing games as well as Poker and Backgammon [6], [7], [8].

Relevant MI research within the MOBA and massively multiplayer online role-playing game (MMORPG) genres studies specific games with appropriate machine learning techniques [9]. Given the diverse range of game rules and strategies within these genres, translating the results of these previous efforts to LoL is not practical.

3 Proposed Technique

3.1 Problem simplification

LoL has a pool of 110 unique avatars that the 10 players in a game can choose from. No more than one copy of a given avatar can appear in a game. There are therefore $110!/100! \approx 1.7 \times 10^{20}$ possible avatar combinations, which is the number of ways to assign 10 distinct avatars, in order, to the 10 player slots. To reduce the complexity of the problem, each avatar was assigned ratings in 9 categories. These categories were chosen by an expert player (rated in the top 5% of all North American players) to capture the most important attributes of an avatar. The ratings for each avatar in each category were also chosen by the expert player, who attempted to capture the unique characteristics of each avatar. With this avatar attribute rating system, a list of 5 avatars chosen by a team can be converted into a team rating for each of the 9 avatar attribute categories. Each team rating is the sum of that category’s ratings over the selected avatars on the team. This simplified the problem. Instead of comparing the combinations of avatars selected by two teams against each other, the ratings of the two teams in 9 categories could be compared.

Due to the nature of the rating system, the number of inputs to the system needed to be increased to 17 per team (9 categories, each of which may take on a variety of values, with 17 possible values across all categories). This increases the number of inputs to the system from 10 to 34, but it also decreases the problem’s complexity. Multiple team compositions may be assigned very similar ratings, which causes those compositions to cluster together. The number of these clusters in the problem space is much smaller than the number of possible combinations of avatars, which simplifies the problem.

After initial trials with the proposed system, the problem was further simplified through principal component analysis (PCA). Of the 34 eigenvalues (17 inputs per team, 2 teams), the 8 smallest were found to be 5 orders of magnitude smaller than the others. The corresponding components were discarded to reduce the computation time of the algorithms (which, even after the PCA, was considerable).

3.2 Candidate techniques

Q-learning, given its popularity as an MI technique for game analysis, was initially considered for analyzing LoL. However, Q-learning is typically used for selecting strategies, whereas the problem at hand is one of binary classification. Decision trees (DT) and support vector machines (SVM) were therefore considered instead. The lack of existing research in this field offered no compelling reasons to favour or reject these specific techniques. However, decision trees resemble the intuitive, heuristic approach that many expert LoL players use in their own decision-making. It was reasoned that they would therefore suit a formal, computational method intended to replicate that process. SVM was selected based on its reputation as a high-performance binary classifier, with reported “state-of-the-art performances” in a variety of applications [10].

It is difficult to anticipate which technique will perform best. Research comparing SVM to machine learning techniques such as decision trees and naïve Bayes found no statistically significant difference in performance over several tests, using datasets from a variety of applications [11], [12].

To enhance the performance of each technique, a type of ensemble method called bootstrap aggregating (or bagging) was used in the decision tree and SVM
implementations. Bagging, and ensemble methods in general, are well-known ways of substantially reducing classification error in a wide range of applications [13].

4 Experiments and Results

4.1 Data collection

A web scraper was developed to obtain the following game data from the online database:
- the composition of each team in a game;
- which team won;
- the skill ratings of the players on each team.

The scraper obtained these data for 150,000 ranked games. In ranked games, players compete to improve their individual ranking by winning games. Because players typically try their best to win ranked games, these games yield the data that best represent the interactions between two opposing team compositions. The 150,000-game dataset was effectively doubled to 300,000 by also presenting each game from the other team’s perspective: if team A wins against team B, then team B loses against team A.

In an effort to reduce noise, only 20,000 of the 300,000 data points were used for training and testing. The selected games were all played by highly ranked players (rating > 1300). The hypothesis is that the intricacies of the interactions between avatars become more pronounced when the avatars are controlled by highly skilled players, which reduces noise in the data due to player skill. The selected games were also all played on the same version of the game. The game developers release bi-weekly balance updates that adjust the avatars’ abilities and attributes, and these changes can influence the interactions between avatars.

Each of the three MI techniques applied to the problem (DT, ensemble DT, and ensemble SVM) was tested for suitability. Of the 20,000 data points, 15,000 were used for training and 5,000 for testing. This 75% training ratio is similar to those used in other MI applications [14].

4.2 Decision trees

A standard DT implementation yielded the error rates shown in Figure 1, in trials with training datasets of varying size. For each of a series of increasing training-set sizes, 20 decision trees were built and tested against the same 5,000 test points. The graph shows the average error rate over all 20 trees. This test was repeated three times, each time with results similar to those in Figure 1.

In general, the error rate is only marginally below 50% on average. The error rate is always below 50% and always decreases as the dataset grows. This trend is consistent across multiple iterations of the test, each of which averages over multiple trees. The error rates are very close to 50%, but the consistency of the trend suggests that the hypothesis should not be rejected. That hypothesis is a correlation between team composition and the likelihood of a team winning. Nonetheless, a method with results so close to 50% requires more work before its results are truly compelling.

Figure 1: Classification error from trials with decision trees.

4.3 Data dichotomy

Many factors contribute to the triumph of one team over another:
- team composition relative to the opposing team’s composition;
- how familiar players are with the avatars they have selected;
- how well the players on a team cooperate;
- individual player skill and experience;
- player mood and player health;
- player internet connection strength.

It is believed, however, that the nature of the game allows a team to choose a composition with a small probability of winning against a given opposing composition, regardless of the other factors present. Such matchups can be considered “unbalanced” in terms of team composition. In these unbalanced situations, an MI algorithm is expected to yield an error rate much lower than 50%.

There are also situations where team composition plays only a small role in determining which team wins. In these situations, the players on both teams have done a good job of choosing a team that is evenly matched against the opponents’ composition. Such matchups can be considered “balanced” in terms of team composition. Consider that the game data were taken from ranked games, in which a player’s ranking goes up or down depending on whether they win or lose. Players tend to be quite passionate about their ranking. Consequently, the players in the games scraped for this project can reasonably be assumed to have been trying to win. Based on their experience, then, each team will seek a composition that maximizes its chance of winning against the opponent’s. The likely result is that the teams tend to select compositions that are roughly evenly matched. It is therefore proposed that the standard decision tree algorithm be modified to discriminate between compositions that are and are not evenly matched.
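Sections 3.1 and 4.1 describe how each game becomes a training row. Team ratings are per-category sums over the team’s avatars, and every game is duplicated from the losing team’s point of view. The sketch below shows one way to do this under stated assumptions. The avatar names, the three rating categories and their values, and the helpers `team_rating`, `game_row` and `mirrored` are all hypothetical. The paper’s 9 expert categories and its exact 17-input encoding are not reproduced here.

```python
# Hypothetical ratings in 3 made-up categories (the paper uses 9 expert-chosen
# categories, which are not reproduced here).
RATINGS = {
    "avatar_a": [3, 1, 0],
    "avatar_b": [1, 2, 2],
    "avatar_c": [0, 3, 1],
    "avatar_d": [2, 0, 3],
}
N_CATEGORIES = 3


def team_rating(team):
    """Sum each rating category over the avatars on a team."""
    return [sum(values) for values in zip(*(RATINGS[avatar] for avatar in team))]


def game_row(team_a, team_b, a_won):
    """Team A's ratings, then team B's ratings, then the label (1 = team A won)."""
    return team_rating(team_a) + team_rating(team_b) + [1 if a_won else 0]


def mirrored(row):
    """The same game from team B's side: swap the two halves and invert the label."""
    a = row[:N_CATEGORIES]
    b = row[N_CATEGORIES:2 * N_CATEGORIES]
    return b + a + [1 - row[-1]]


# One toy game (teams shortened to two avatars for brevity); team A won.
games = [(["avatar_a", "avatar_b"], ["avatar_c", "avatar_d"], True)]

dataset = []
for team_a, team_b, a_won in games:
    row = game_row(team_a, team_b, a_won)
    dataset.extend([row, mirrored(row)])

print(dataset)
```

Every game contributes one win and one loss, so the augmented labels are split exactly 50/50. A 50% error rate therefore corresponds to chance. A game and its mirror are not independent, so a sketch like this would keep both copies on the same side of any train/test split.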

It should be noted that the standard SVM technique,
much like the standard DT technique, will not be able to
discriminate between balanced and unbalanced
compositions. As a result, it would succumb to the same
problems as DT, and so it was not tested in these
experiments.

4.4 Ensemble algorithms
The proposed algorithm generates multiple decision trees (an ensemble decision tree, or EDT), each trained on a different part of the dataset. For a given test game, each tree predicts whether or not team A is favoured to win. The matchup is considered unbalanced if the proportion of trees predicting the same outcome exceeds 50% plus a “consensus threshold”. For example, with a threshold of 20%, more than 70% of the trees must agree. In such a case, the prediction is more likely to be reliable. It is shown to the users so they know whether they are likely to win or lose. Otherwise, the matchup is likely to be well balanced. In that case, the users are only informed that the matchup is fair. They would probably decide to play, because victory will likely be decided by player skill.

Constructing an EDT requires three parameter values:
- the number of members per ensemble;
- the number of training points per ensemble member;
- the consensus threshold.

It is not clear how to set these parameters optimally, so over 2,000 EDTs were generated using different parameter-value combinations. All of them were then tested on the same test dataset.

The average set of parameters that yielded the smallest error was:
- 31 members per ensemble;
- 51% of the training data per ensemble member;
- a consensus threshold of 20%.

None of these values are surprising, but analyzing the number of members per ensemble and the amount of training data per member gives some insight. A value of 31 is near the upper bound (36) of the range over which the number of ensemble members was varied. This suggests that the larger the ensemble, the better. Providing each ensemble member with only 51% of the training dataset helps keep the members dissimilar. If the members are too similar to each other, the ensemble approximates the behavior of a single tree, which defeats the purpose of having an ensemble. The same approach and parameters used for the EDT algorithm were used to create an ensemble support vector machine (ESVM) algorithm.

For an optimal ensemble, confident win-loss predictions should be delivered often and accurately. To search for the parameters that yield the optimal ensemble, the test results for both EDT and ESVM were plotted in Figure 2. The figure uses the “confident prediction likelihood”, which is how often a confident prediction is delivered. For each confident prediction likelihood, the trial with the lowest error rate was found, and the resulting pairs were plotted.

Figure 2: Error rate for a given confident prediction likelihood.

This graph illustrates a trade-off between a low error rate and a high confident prediction likelihood. This is expected. When confident predictions are given more often, they start to be given even where the outcome cannot be predicted reliably, that is, where the teams are balanced. As a result, a compromise must be struck between error rate and confident prediction likelihood.

Figure 2 also shows that EDT performs consistently better than ESVM, especially when the confident prediction likelihood is small. The EDT method therefore provides a greater ability to tune the error rate by altering the consensus threshold.

Similarly, Figure 3 shows the error rate for a given confidence for both EDT and ESVM. For a single classification, the confidence is given by the proportion of ensemble members that agree. The trend lines in Figure 3 indicate how this measure of confidence is related to the error rate. For ESVM, the variation about the trend line is of roughly the same magnitude as the total range of ESVM’s error rate. For EDT, however, the variation about the trend line is much smaller than the total range of EDT’s error rate. It should therefore be possible to use the EDT trend line to estimate the error rate given the confidence. Further analysis shows that, for EDT, the error rate can be predicted to within ±3 percent, 95% of the time. This result is of particular interest to LoL players who use EDTs to decide whether to resign a game before the battle phase. Such players will want to know
the confidence and likelihood of classification error of the EDT’s suggestion.

Figure 3: Error rate for a given confidence for EDT and ESVM.

5 Conclusion

Playing a game of League of Legends where the odds of winning are small can be frustrating and time-consuming, and can result in a larger penalty than conceding. It is therefore desirable for players to know whether they are favoured to win before the battle phase begins, so that they can decide whether or not to concede.

To this end, the decision tree machine intelligence technique was employed to determine the likelihood of a team winning given the composition of avatars chosen by each team. The development of an avatar attribute rating system simplified the problem. However, the nature of League of Legends is such that single decision trees are not well suited to providing a reliable solution. In some games, the team compositions are so balanced that the avatars chosen by each team play only a small role in determining which team wins. Other games are unbalanced, and team composition is the most important factor in determining which team wins.

To address this dichotomy, ensemble decision tree and ensemble support vector machine approaches were developed. These approaches create ensembles with multiple members, each trained with only a portion of the training data. The members must reach a consensus in order to make a confident classification.

Through experimentation, it was determined that it is possible to predict the outcome of a game of League of Legends given only the team compositions. Of the machine intelligence techniques tested, ensemble decision trees provided the best results.

Future experiments would benefit from a larger dataset, which can be obtained by scraping more match data from LoLKing.net. With a larger dataset, it should be possible to train the system to have a lower error rate. Further study of sources of noise may motivate methods for obtaining a better dataset. Examples include the skill level of the players in the dataset and the patch version of the game. Time constraints prevented two explorations: the effect of dataset size on the error rate of the ensemble methods, and the effect of varying the training ratio. Conducting these experiments and further tuning the three parameters of the ensemble methods are recommended for future work.

6 References

[1] C. MacManus, "League of Legends the world's 'most played video game'," 12 October 2012. [Online]. Available: http://news.cnet.com.
[2] Riot Games, "League of Legends LCS," 2013. [Online]. Available: http://na.lolesports.com/.
[3] ZAM Network, "LoL King," 2013. [Online]. Available: http://www.lolking.net/.
[4] Team Solo Mid, "Team Builder," 2013. [Online]. Available: http://championselect.net/teamBuilder/.
[5] Team Solo Mid, "LoL Counter," 2013. [Online]. Available: http://lolcounter.com/.
[6] M. Bowling, J. Furnkranz, T. Graepel and R. Musick, "Machine learning and games," Machine Learning, pp. 211-215, June 2006.
[7] P. Spronck, M. Ponsen, S.-K. Ida and E. Postma, "Adaptive game AI with dynamic scripting," Machine Learning, pp. 217-248, June 2006.
[8] L. Kocsis and C. Szepesvári, "Universal parameter optimisation in games based on SPSA," Machine Learning, pp. 249-286, June 2006.
[9] M. L. Maher and K. Merrick, "Motivated Reinforcement Learning for Non-Player Characters in Persistent Computer Game Worlds," in Empirical Software Engineering and Measurement, New York, 2007.
[10] M. A. Hearst, "Support vector machines," Intelligent Systems and their Applications, IEEE, pp. 18-28, 1998.
[11] J. Huang, "Comparing naive Bayes, decision trees, and SVM with AUC and accuracy," in Third IEEE International Conference on Data Mining, Melbourne, Florida, 2003.
[12] H. Drucker, D. Wu and V. N. Vapnik, "Support Vector Machines for Spam Categorization," IEEE Transactions on Neural Networks, pp. 1048-1054, Sept. 1999.
[13] T. Hothorn and B. Lausen, "Double-bagging: combining classifiers by bootstrap aggregation," Pattern Recognition, pp. 1303-1309, 2003.
[14] Microsoft, "Partitioning Data into Training and Testing Sets," 2008. [Online]. Available: http://msdn.microsoft.com/en-us/library/bb895173%28v=sql.100%29.aspx. [Accessed April 2013].
[15] F. Pedregosa, et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2826-2830, 2011.
[16] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 27, pp. 1-27, 2011.

ClassifierCouncil.py
import numpy
from numpy import zeros, mean, std

# Helper module from the original project that provides samplelist();
# it is not included in this appendix.
import tools

# Class used for managing ensembles of MI objects.
# This class is an interface that must be inherited to be used.
# The inheriting class must override train_function and test_function
# with functions containing instructions for training and testing a given
# MI method (see CouncilTypes.py below).
class ClassifierCouncil:

    # X, Y are numpy matrices: X is a table of inputs, Y is a table of outputs
    # First dim: data points
    # Second dim: data for each data point
    def __init__(self, X, Y, num_of_members, train_ratio, train_ratio_per_member):
        self.X = X
        self.Y = Y

        # Number of MI objects to be used in the ensemble
        self.num_of_members = num_of_members

        # Ratio of data to be used for training; the rest is used for testing
        self.train_ratio = train_ratio

        # How much of the training data should be used to train each object
        self.train_ratio_per_member = train_ratio_per_member

        # Collection of trained MI objects in the ensemble
        self.classifier_list = []

        # Slice and dice data (assumes Y only has a width of 1)
        data = numpy.concatenate((X, Y), axis=1)
        (train, test) = tools.samplelist(data, self.train_ratio, 0)
        self.trainX = train[:, :-1]
        self.trainY = train[:, -1][:, None]
        self.testX = test[:, :-1]
        self.testY = test[:, -1][:, None]

    # Train as many MI objects as specified above and store them in classifier_list
    def train(self):
        data = numpy.concatenate((self.trainX, self.trainY), axis=1)
        for i in range(self.num_of_members):
            # Draw this member's share of the training data
            (train, test) = tools.samplelist(data, self.train_ratio_per_member, 1)
            trainX = train[:, :-1]
            trainY = train[:, -1]
            m = self.train_function(trainY.tolist(), trainX.tolist())
            self.classifier_list.append(m)

    # Override in an inheriting class with instructions for training a given
    # MI technique. A trained object should be returned.
    def train_function(self, trainY, trainX):
        raise NotImplementedError

    # Test every trained member on the held-out data. Returns a matrix with one
    # row per test point and one column of predicted labels per member.
    def test(self):
        mem_results = zeros(shape=(self.testX.shape[0], self.num_of_members))
        testY = self.testY[:, 0].tolist()
        testX = self.testX.tolist()
        for mem_num, m in enumerate(self.classifier_list):
            mem_results[:, mem_num] = self.test_function(testY, testX, m)
        return mem_results

    # Override in an inheriting class with instructions for testing a given
    # MI technique. The predicted labels should be returned.
    def test_function(self, testY, testX, model):
        raise NotImplementedError

    # Given test results, process them and return detailed information
    # in three dictionaries
    def process_test_results(self, mem_results, confidence_cutoff):

        # Count how many ensemble members said 'yes' (= 'will win') for each data point
        # Note: "Will win" = 1, "Will lose" = 0
        mem_results = numpy.sum(mem_results, axis=1)

        # What ratio of members said 'yes'
        yes_ratio = mem_results / float(self.num_of_members)

        # Majority vote for each data point, as a column vector of 0/1
        raw_outcome = (yes_ratio > 0.5).astype(float)[:, None]

        # In which cases was the majority right?
        right = numpy.logical_not(numpy.logical_xor(raw_outcome, self.testY))

        # Confidence: fraction of members that agree with the majority
        confidence = numpy.maximum(yes_ratio, 1 - yes_ratio)

        # Package all lists of results in a dictionary
        results_lists = {
            'mem_results': mem_results,
            'yes_ratio': yes_ratio,
            'raw_outcome': raw_outcome,
            'right': right,
            'confidence': confidence,
        }

        # Package all givens in a dictionary
        givens = {
            'num_of_mems': self.num_of_members,
            'datapts': self.X.shape[0],
            'train_ratio': self.train_ratio,
            'train_ratio_per_member': self.train_ratio_per_member,
            'confidence_cutoff': confidence_cutoff,
        }

        confidence_mean = mean(confidence)
        confidence_std = std(confidence)

        # Split data points into confidence > 0.5 + confidence_cutoff ("high")
        # and the rest ("low"), counting errors in each group
        confidence_high = []
        confidence_high_err_count = 0
        confidence_low = []
        confidence_low_err_count = 0

        for i in range(len(confidence)):
            if confidence[i] > (0.5 + confidence_cutoff):
                confidence_high.append(confidence[i])
                if not right[i, 0]:
                    confidence_high_err_count += 1
            else:
                confidence_low.append(confidence[i])
                if not right[i, 0]:
                    confidence_low_err_count += 1

        # mean/std of an empty list are NaN (numpy also emits a RuntimeWarning)
        confidence_high_mean = mean(confidence_high)
        confidence_high_std = std(confidence_high)
        confidence_low_mean = mean(confidence_low)
        confidence_low_std = std(confidence_low)

        if len(confidence_high) == 0:
            confidence_high_err_rate = 0
        else:
            confidence_high_err_rate = float(confidence_high_err_count) / len(confidence_high)

        if len(confidence_low) == 0:
            confidence_low_err_rate = 0
        else:
            confidence_low_err_rate = float(confidence_low_err_count) / len(confidence_low)

        # How often is confidence high?
        confidence_high_ratio = float(len(confidence_high)) / len(confidence)

        # Package scalar results in a dictionary
        results_summary = {
            'confidence_mean': confidence_mean,
            'confidence_std': confidence_std,
            'confidence_high_err_count': confidence_high_err_count,
            'confidence_low_err_count': confidence_low_err_count,
            'confidence_high_mean': confidence_high_mean,
            'confidence_high_std': confidence_high_std,
            'confidence_low_mean': confidence_low_mean,
            'confidence_low_std': confidence_low_std,
            'confidence_high_err_rate': confidence_high_err_rate,
            'confidence_low_err_rate': confidence_low_err_rate,
            'confidence_high_ratio': confidence_high_ratio,
        }

        return (results_lists, givens, results_summary)
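The consensus rule in `process_test_results` above, also described in Section 4.4, can be isolated in a few lines of plain Python. This is a simplified sketch rather than the authors’ code. The function name `consensus_summary` and the toy vote matrix are hypothetical, and the sketch reports only the share of confident predictions and their error rate.

```python
def consensus_summary(votes, truth, cutoff):
    """Share of confident predictions and their error rate.

    votes[i][j] is member j's 0/1 prediction for game i and truth[i] is the
    actual 0/1 outcome. A prediction counts as confident when the fraction of
    members agreeing with the majority exceeds 0.5 + cutoff, as with
    confidence_cutoff in process_test_results.
    """
    confident = 0
    confident_errors = 0
    for member_votes, actual in zip(votes, truth):
        yes_ratio = sum(member_votes) / len(member_votes)
        majority = 1 if yes_ratio > 0.5 else 0
        confidence = max(yes_ratio, 1 - yes_ratio)
        if confidence > 0.5 + cutoff:
            confident += 1
            if majority != actual:
                confident_errors += 1
    share = confident / len(votes)
    error_rate = confident_errors / confident if confident else 0.0
    return share, error_rate


# Toy votes from a 5-member ensemble on 4 games (made-up values).
votes = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
truth = [1, 0, 1, 0]

print(consensus_summary(votes, truth, 0.2))  # (0.75, 0.3333333333333333)
print(consensus_summary(votes, truth, 0.0))  # (1.0, 0.25)
```

Raising `cutoff` reduces how often a confident prediction is delivered. In this toy data it also filters out the game where the 3-of-5 majority was right only by a narrow margin. This is the trade-off between error rate and confident prediction likelihood shown in Figure 2.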

CouncilTypes.py

# These classes specialise ClassifierCouncil for specific MI techniques
from ClassifierCouncil import ClassifierCouncil

# LIBSVM Python bindings [16]; the import path depends on where LIBSVM is installed
from libsvm.python.svm import *
from libsvm.python.svmutil import *

# scikit-learn decision trees [15]
from sklearn import tree


# Defines train and test functions for SVM (LIBSVM default options)
class SVMCouncil(ClassifierCouncil):

    def train_function(self, trainY, trainX):
        return svm_train(trainY, trainX)

    def test_function(self, testY, testX, model):
        # svm_predict also returns accuracy and decision values; only the labels are needed
        p_labels, p_acc, p_vals = svm_predict(testY, testX, model)
        return p_labels


# Defines train and test functions for decision trees
class DecisionTreeCouncil(ClassifierCouncil):

    def train_function(self, trainY, trainX):
        return tree.DecisionTreeClassifier().fit(trainX, trainY)

    def test_function(self, testY, testX, model):
        return model.predict(testX)
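`ClassifierCouncil` calls `tools.samplelist`, which the appendix does not include. The block below is a hypothetical stand-in, written only so that the classes above can be exercised. It assumes that `samplelist(data, ratio, flag)` returns a random `(sample, rest)` split of the rows of a NumPy array. The meaning of the third argument (0 or 1 in the appendix) is not documented, so this sketch accepts it and ignores it. It is not the authors’ helper.

```python
import numpy as np


def samplelist(data, ratio, flag, rng=None):
    """Hypothetical stand-in for tools.samplelist (not the authors' code).

    Randomly splits the rows of the 2-D array `data` into (sample, rest), with
    round(ratio * number_of_rows) rows in `sample`. The meaning of `flag`
    (0 or 1 in the appendix) is not documented, so it is accepted but ignored.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_rows = data.shape[0]
    n_sample = int(round(ratio * n_rows))
    order = rng.permutation(n_rows)
    return data[order[:n_sample]], data[order[n_sample:]]


data = np.arange(40).reshape(20, 2)
sample, rest = samplelist(data, 0.75, 0, rng=np.random.default_rng(0))
print(sample.shape, rest.shape)  # (15, 2) (5, 2)
```

If the original flag instead selected sampling with replacement (true bootstrap resampling), this stand-in would not reproduce that behaviour. It only guarantees disjoint, random subsets of the requested size.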

dttest.py (a similar file was used for testing SVM)

# For trying combinations of parameters for ensembles and finding
# good ones.
import pickle
from numpy import genfromtxt
from CouncilTypes import DecisionTreeCouncil

### Get info
my_data = genfromtxt('dat/summary.csv', delimiter=',')

### Discard header
my_data = my_data[1:, :]

### Split into X and Y
X = my_data[:, :34]   # columns 0-33: the 34 inputs (17 per team)
Y = my_data[:, 34]
Y = Y[:, None]        # to make the vector a matrix

result_hold = []

# Vary number of members in ensemble
for num_mem in range(20, 36, 5):
    # Vary ratio of data given to each member (looped in percent)
    for train_ratio_per_mem in range(55, 96, 5):
        train_ratio_per_mem = train_ratio_per_mem / 100.0
        print("Currently working on (num_mem =", num_mem,
              ", train_ratio_per_mem =", train_ratio_per_mem, ")")
        m = DecisionTreeCouncil(X, Y, num_mem, 0.75, train_ratio_per_mem)
        m.train()
        raw_results = m.test()

        # Process test results for various confidence_cutoffs (looped in percent)
        for confidence_cutoff in range(0, 41, 5):
            confidence_cutoff = confidence_cutoff / 100.0
            print("Currently working on (num_mem =", num_mem,
                  ", train_ratio_per_mem =", train_ratio_per_mem,
                  ", confidence_cutoff =", confidence_cutoff, ")")
            (results_lists, givens, results_summary) = m.process_test_results(raw_results, confidence_cutoff)

            important_data = dict(givens)
            important_data.update(results_summary)
            print(important_data)
            result_hold.append(important_data)

        # Save everything gathered so far after each parameter combination
        with open('dat/dttest.struct', 'wb') as fo:
            pickle.dump(result_hold, fo)
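Section 4.4 builds Figure 2 by finding, for each confident prediction likelihood, the trial with the lowest error rate. The sketch below shows one way to make that selection on records shaped like those that dttest.py appends to `result_hold`. With real data, those records could be loaded with `pickle.load` from 'dat/dttest.struct' opened in binary mode. The dictionary keys match those produced by `process_test_results`. The function name `best_per_likelihood`, the 0.05 bin width, and the dummy records are hypothetical, since the paper does not say how nearby likelihood values were grouped.

```python
def best_per_likelihood(results, bin_width=0.05):
    """Keep the lowest-error trial for each binned confident prediction likelihood.

    Each record is a dict like those appended to result_hold in dttest.py:
    'confidence_high_ratio' is how often a confident prediction is delivered and
    'confidence_high_err_rate' is the error rate among those predictions.
    Likelihoods are rounded to the nearest multiple of bin_width (an assumption).
    """
    best = {}
    for record in results:
        likelihood = record['confidence_high_ratio']
        if likelihood == 0:
            continue  # no confident predictions, so there is no error rate to compare
        key = round(round(likelihood / bin_width) * bin_width, 10)
        if key not in best or record['confidence_high_err_rate'] < best[key]['confidence_high_err_rate']:
            best[key] = record
    return dict(sorted(best.items()))


# Dummy records (made-up numbers, not results) in the same format as result_hold.
trials = [
    {'num_of_mems': 30, 'confidence_cutoff': 0.2,
     'confidence_high_ratio': 0.31, 'confidence_high_err_rate': 0.40},
    {'num_of_mems': 25, 'confidence_cutoff': 0.2,
     'confidence_high_ratio': 0.29, 'confidence_high_err_rate': 0.45},
    {'num_of_mems': 35, 'confidence_cutoff': 0.05,
     'confidence_high_ratio': 0.90, 'confidence_high_err_rate': 0.48},
    {'num_of_mems': 20, 'confidence_cutoff': 0.4,
     'confidence_high_ratio': 0.0, 'confidence_high_err_rate': 0},
]

for likelihood, record in best_per_likelihood(trials).items():
    print(likelihood, record['confidence_high_err_rate'], record['num_of_mems'])
```

Plotting each kept error rate against its likelihood bin gives a curve of the kind shown in Figure 2. A smaller `bin_width` gives more points on the curve, but fewer trials compete within each bin.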
