Can a Support Vector Machine identify poor performance of dyslectic children playing a serious game? - Diva-Portal.org

Page created by Jerry Gardner
 
CONTINUE READING
Can a Support Vector Machine identify poor performance of dyslectic children playing a serious game? - Diva-Portal.org
EXAMENSARBETE INOM TEKNIKOMRÅDET
MEDIETEKNIK
OCH HUVUDOMRÅDET
DATALOGI OCH DATATEKNIK,
AVANCERAD NIVÅ, 30 HP
STOCKHOLM, SVERIGE 2021

Can a Support Vector Machine
identify poor performance of
dyslectic children playing a
serious game?

VIKTOR LEMON

KTH
SKOLAN FÖR ELEKTROTEKNIK OCH DATAVETENSKAP
Can a Support Vector Machine identify poor performance of dyslectic children playing a serious game? - Diva-Portal.org
SAMMANFATTNING
Den här uppsatsen har varit en del i utvecklingen av det seriösa spelet Kunna, ett webbaserat spel för barn
diagnostiserade med dyslexi. Spelet består av fem olika övningar som syftar till att öva och utveckla barnens läs- och
skrivförmåga. Då Kunna kan användas var som helst behövs verktyg för att förstå varje individs kapaciteter och
svårigheter. Därför syftar den här uppsatsen till att presentera hur ett seriöst spel och stödvektormaskiner (eng.
support vector machine) kan användas för att identifiera de användare som inte uppnådde prestationskraven. På grund
av den uppblossande coronapandemin kunde dock Kunna enbart testas på barn som inte var diagnostiserade med
dyslexi och därför bör den här uppsatsen ses som en pilotstudie. Inledningsvis identifierades flera variabler för att mäta
prestandan hos barn med dyslexi. Därefter implementerades variablerna i Kunna och testades på 16 spansktalande
barn där resultaten analyserades i syfte att identifiera samband kopplade till svaga prestationer. Slutligen delades
deltagarnas data upp i två grupper, varav en grupp innehöll deltagare med klart högre värden i tid och antal fel.
Uppdelningen gjordes för att träna och utvärdera om en stödvektormaskin kan separera de två grupperna och därav
identifiera de deltagare som inte uppnådde prestationskraven. De slutliga resultaten indikerar dock att en
stödvektormaskin inte är det effektivaste valet för detta ändamål. Istället föreslås att framtida arbeten bör överväga
multiklassificeringsalgoritmer.
Can a Support Vector Machine identify poor performance of
              dyslectic children playing a serious game?
                                                           Viktor Lemón
                                                  KTH Royal Institute of Technology
                                                       Stockholm, Sweden
                                                         vlemon@kth.se

ABSTRACT                                                              word recognition problems, word pronouncing, and reading
                                                                      fluently [2]. Early detection combined with effective preven-
This paper has been a part of developing the serious game             tive practices can minimize the difficulties associated with
Kunna, a web-based game with exercises targeting children             dyslexia [3,10,22]. Beyond traditional treatment-tools, seri-
diagnosed with dyslexia. This game currently consists of five         ous games are commonly recognized as games used for pur-
different exercises aiming to practice reading and writing            poses other than pure entertainment and have the potential
without a therapist or neuropsychologist present. As Kunna            to reach anyone in need. These games exist in several fields
can be used anywhere, tools are needed to understand each             such as education, health, manufacturing, military, and medi-
individual's capacities and difficulties. Hence, this paper aims      cine [6]. Furthermore, serious games have been used to esti-
to present how a serious game and a support vector machine            mate the risk of dyslexia, and several studies [4,5,7,10,12,14]
were used to identify children that performed poorly in               have combined web-based games and machine learning to of-
Kunna’s exercises. Though, due to the current corona pan-             fer screening tests online.
demic, Kunna could only be tested on children not diagnosed
with dyslexia. Therefore, this paper should be seen as a proof        The challenges of dyslexia are highly individual [26], and dig-
of concept. As an initial step, several variables were identified     ital interventions should therefore be personally adapted.
to measure the performance of dyslectic children. Secondly,           Software has the potential to support neuropsychologists and
the variables were implemented into Kunna and tested on 16            therapist’s overall comprehension of a patient, as serious
Spanish-speaking children. The results were analyzed to               games can be used without their presence. As a first step to-
identify how poor performance could be recognized using the           wards this ambition, this study presents how a web-tool was
identified variables. As a final step, the data was divided into      constructed and evaluated in order to identify children strug-
two groups for each exercise, of which one group contained            gling with exercises in a serious game. This by training a sup-
participants who appear to perform poorly. These were par-            port vector machine (SVM), a classifying algorithm that has
ticipants with clearly outlying values in the number of errors        shown promising results in categorizing and differentiating
and duration. Thus, to train and evaluate if a Support Vector         dyslexic from non-dyslexic populations [4,7,10,12,20]. Hence,
Machine (SVM) can separate the two groups and thereby iden-           the hypothesizes is that SVM can identify poor performance
tify the participants who performed poorly. From the discus-          by being trained with two groups of participants. One group
sion followed that the SVM is not the most efficient choice for       who exposes difficulties and one group that performs well
this aim. Instead, it is suggested that future work should con-       within an exercise.
sider multiclassification algorithms.
                                                                      The study was done in collaboration with the Swedish web
CCS CONCEPTS                                                          agency Prototyp and the Spanish mental health clinic Neu-
• Human-centered computing → Field studies; User studies;             rotalentum. These companies have developed the applica-
Empirical studies in accessibility.
                                                                      tion Kunna which exercises aim to practice and measure skills
                                                                      needed to handle cognitive challenges of dyslexia. However,
KEYWORDS                                                              the current circumstances of covid-19 made it only possible
Serious games, Dyslexia, Identifying, Performance, Support            to test the game Kunna on 16 participants without dyslexia.
Vector Machines
                                                                      Reading guide
INTRODUCTION                                                          This paper consists of three steps: 1. Find relevant variables
Dyslexia is a reading disability disorder diagnosed in child-         to measure the performance of children diagnosed with dys-
hood, with a prevalence of 5 to 10 percent globally                   lexia. 2. Evaluate how these variables can indicate poor per-
[1,4,7,10,12]. It is characterized by poor spelling abilities,        formance in the context of Kunna’s exercises. 3. Evaluate if the
SVM can identify participants that appear to be struggling         Find the syllable
with an exercise. Thus, three methods (Finding variables,          Like the rhyming exercise, the user is instructed to drag a
Characteristics of poor performance and Testing the support        word containing the target syllable. However, each card only
vector machine) are presented with corresponding results.          showed an image, and the user must listen to each card to
                                                                   identify the syllable. The user could here listen to the sentence
Research Question                                                  “Which word contains the syllable …?”. The target syllable is
How can a Support Vector Machine identify poor performance         presented in the green box in the upper right corner. See im-
of dyslectic children playing a serious game?                      age 3 in figure 1.

PRESENTATION OF THE EXERCISES                                      Game Design
                                                                   Each exercise began with a tutorial and instructions were
                                                                   given in text read by an auto-played voice. The participants
                                                                   had to complete a simplified version of the game to enter the
                                                                   exercise to ensure they understood the instructions. Each ex-
                                                                   ercise has several levels, and the content was varying depend-
                                                                   ing on the exercise and the difficulty level. A level was com-
                                                                   pleted when the participant provided a correct answer, indi-
                                                                   cated by a cheerful message.

                                                                    Exercise      Capacities          Interaction   Audio   Levels
                                                                    Name          tested              type
                                                                    Syllable      Syllabic and pho-   Drag-and-     Yes     6
                                                                                  nological aware-    drop
                                                                                  ness
Figure 1. Showing the five different exercises in the web-
                                                                    Sort          Visual discrimi-    Drag-and-     Yes     6
application Kunna. The CPT exercise (1) is not covered in
                                                                                  nation and cate-    drop
this paper.                                                                       gorization

Exercises                                                           Rhyme         Phonological        Drag-and-     Yes     8
                                                                                  awareness           drop
Kunna currently has five exercises: one Continuous perfor-
mance test (CPT) targeting ADHD (not covered in this paper)         Labyrinth     Visual search       Click         No      4
and four exercises targeting dyslexia. The four dyslexia tar-      Table 1. The exercises studied in Kunna
geting exercises were invented and designed by Neurotal-
entum and are presented below and summarized in table 1.           THEORY
                                                                   Dyslexia
Labyrinth
                                                                   In the Diagnostic and statistical manual of mental disorders
In this task, the user must find a path from the starting letter
                                                                   (DSM -5), dyslexia is defined as “an alternative term used to
or word to reach the flag, by clicking on cells. These cells ei-
                                                                   refer to a pattern of learning difficulties characterized by prob-
ther contained the target word/letter or a non-target. See im-
                                                                   lems with accurate or fluent word recognition, poor decoding
age 2 in figure 1.
                                                                   and poor spelling abilities.” [2]. Several studies agree that dys-
                                                                   lexia results in an impaired phonological processing capabil-
Sort long words from short words
                                                                   ity [1, 3, 16, 22, 23], which causes difficulties in processing
The user is given two or three words and instructed to “drag”
                                                                   sounds in a language. For example, distinguish phonemes of a
short and long words to two different fields. Additionally, the
                                                                   word and judge whether two sounds rhyme [16]. Further-
user was able to listen to the words by clicking on the card. A
                                                                   more, Dyslectic children have shown to perform slower dur-
level was completed when both targets were correctly sorted.
                                                                   ing visual search tasks [11, 18] and maintain attention for a
See image 4 in figure 1.
                                                                   shorter period [18, 19]. Children training their phonological
                                                                   awareness are more likely to improve their reading and
Find the rhyme
                                                                   spelling [9,19].
This exercise is about dragging a card containing a word that
rhymes with a target word presented in a green box in the left
                                                                   Support Vector Machines
corner. The user could either read or listen to the sentence
                                                                   Support Vector Machine is a machine learning classifying al-
“Which word rhymes with …?” See image 5 in figure 1.
                                                                   gorithm used by several screening studies [4,7,10,12]. The bi-
                                                                   nary version aims to select a hyperplane that maximizes the
                                                                   distance between the classes. A test sample is then classified
depending on which side of the hyperplane the sample is. SVM         game aimed to differentiate children with and without dys-
supports high dimensional spaces and thus several features.          lexia and found five significant indicators. These were: total
According to Mark Hall [14], features should not be depend-          clicks, time to the first click, hits (number of correct answers),
ent on other features. They should further be correlated with        accuracy, efficiency (time of the last click/ hits). This study
the class yet uncorrelated with each other. Cross-validation is      was later followed up by implementing machine learning (us-
a common technique that splits the data into several sets. One       ing SVM) to detect dyslexic children [10]. The SVM was
set is used for testing and the rest for training. This is used to   trained using the number of clicks per item, number of correct
avoid overfitting – a situation occurring when the boundaries        answers, number of incorrect answers, scores (the sum of
between the classes are closely fitted to the training set, which    correct answers for each stage's problem type), accuracy (the
will not perform well when classifying new samples [13].             number of hits divided by the number of clicks) and miss rate.
                                                                     267 children and adults participated, and the results showed
                                                                     an accuracy of 84.62 percent [10]. Recently, Luz Rello et al. [5]
                                                                     conducted a more extensive study of the independent game
                                                                     where 313 children participated. The features: total clicks,
                                                                     first click, hits, efficiency, duration, fourth click, and average
                                                                     click time was significantly different for Spanish speaking
                                                                     children. This time was Random Forest used which resulted
                                                                     in an accuracy of 69 percent for Spanish speaking children
                                                                     and 75 percent for German.
Figure 2. Showing a plane differentiating the white and
black samples with maximized margin.                                  Variable              Definition               Study
                                                                      Number of clicks                               4, 5*, 6, 7
Several kernels can create a hyperplane that differentiates the       Correct answers                                4, 5*, 6, 7*, 10
samples. Depending on the dataset, the efficiency of achieving        Incorrect answers                              4, 6, 8*, 10, 11
a high classification accuracy varies among the kernels. The          Accuracy              Correct    answers/      3, 4, 7, 10
kernels used within this study are some of the kernels pro-                                 number of clicks
                                                                      Miss rate             The     number     of    4, 8*
vided by the Python package Scipy [28].
                                                                                            misses divided by
                                                                                            the number of clicks
RELATED WORK                                                          Time to the first                              5*, 7*
A few studies [3, 4, 5, 7, 10, 12] have measured the perfor-          click
mance in serious games using machine-data interaction                 Efficiency            Time of the last         5*, 7*
methods. In this paper, these are defined as tools available by                             click/ correct an-
                                                                                            swers
software technologies to measure interactions. A study in It-
                                                                      Number of clicks                               7*, 10
aly [3] developed six games to predict the risk of dyslexia in        per item
preschoolers. The dependent variables were response times,            Score                 The sum of correct       10
scores, and accuracy. However, due to a small population and                                answers all levels
minor application errors, no significant difference between           Duration              The time to complete     5*, 7*, 8*, 11, 18
the variables could be reported. Furthermore, data from an                                  an exercise
earlier phonological testing study was used to detect dyslexia        N:th click            The time to a specific   5*, 7*
using machine learning [20]. The test had four tasks with 56                                click
                                                                      Average click time                             5*
children participating and reached an accuracy of 89 percent
using decision trees. However, the study does not mention            Table 2. All identified features measured in similar
any of the used features. Rello Luz et al. [4,5,7,10] conducted      studies. Variables reported with a significant difference
several experiments intending to predict the risk of dyslexia        are noted with a star*.
using machine learning. They first developed a linguistic web-
game to detect dyslexia using SVM. The game consisted of 17          OVERALL METHOD
exercises and 32 levels and was tested on 243 children and           Since there are no standards for measuring performance
adults (95 diagnosed with dyslexia). Performance was meas-           within these exercises, a combination of research, interviews,
ured by counting the number of clicks, correct answers, incor-       testing, and statistical analysis were used to address the re-
rect answers, accuracy (hits/ number of clicks), and miss rate       search question. As no similar studies were found using these
(the number of misses divided by the number of clicks). How-         exercises, these following questions should first be answered:
ever, no statistical analysis of the variables was presented.
Later on, a language-independent web game (no reading was                 -      How can performance be measured in Kunna?
required) was built with visual and audial-based tasks [7]. To-           -      How are these variables related to poor performance?
tally 178 children with and without dyslexia participated. The
Question one was answered by studying related work to iden-        INTERVIEW RESULTS
tify the state of art of web-based measurements for dyslexia.      Looking at the variables in table 2, are these relevant to esti-
Furthermore, one interview was held to include a neuropsy-         mate cognitive capacities of children with dyslexia?
chologist’s perspectives and validate the relevance of the var-
iables in the context of these exercises. To answer the second         -    “During this concept testing phase are all these vari-
question, the identified variables were first implemented and               ables relevant. Especially the number of mistakes. But
tested on a few participants. The results were analyzed to                  the completion time and “Time to the first response”
evaluate the variables' applicability and effectivity as a foun-            are interesting since it tells the time it took for them
dation for identifying participants with difficulties. Since                to finish and understand a task. The number of audio
these exercises are designed to be challenging for children                 plays may not tell you anything about dyslexia, but it
with dyslexia, poor performance is defined as a high number                 may tell you something about the comprehension ca-
of mistakes in relation to a protruding time duration. This                 pacity which is a variable not too far away from dys-
since slow time response and errors are related to difficulties             lexia. […]. The benefit of having computers is that they
in search tasks, reading, and writing [11,16,18,21]. However,               can measure variables that are hard for a human to
this doesn’t exclude that poor performance can be explained                 keep track of. But they must be compared with the test
by other variables. Therefore, correlation to time and errors               group and the control group and compared within
were calculated. To finally answer the research question, the               age groups”.
participants were divided into two group based on the test re-
sults. This to train and evaluate the SVM.                         Are there any variables missing?
                                                                       -    “It would be interesting to see if a child is guessing
To summarize the overall method, the following steps were                   while doing these exercises. Some participants could
applied:                                                                    e.g., be slow in completing the exercise but have a
    -    Find dependent variables within similar exercises                  higher number of mistakes. But can guesses be seen in
         that are feasible to implement.                                    any other way?”
    -    Interview with a therapist of Neurotalentum.
    -    Develop and implement the variables.                      This question generated the variables Response Frequency
    -    Evaluate the results of each variable to find charac-     and Fast Responses. Response Frequency measure the time be-
         teristics of poor performance.                            tween each response and Fast Responses counts the number
    -    Train and evaluate the SVM.                               of times a response was given in less than one second.
                                                                   With respect to these variables. How do you expect a child with
As this overall approach contains several steps, these are di-     or without dyslexia to perform during these exercises?
vided into three methods with a corresponding result sec-              -    “Theoretically, these tools should put the children into
tion.                                                                       test, and we believe that these exercises should be able
                                                                            to differentiate these two groups. But the outcome
FINDING VARIABLES                                                           can be spread. Some children with a diagnose might
The first question was addressed by researching related stud-               perform similar to the control group. This can and
ies that has used machine learning and serious games to                     will probably happen. To interpret the performance,
measure the performance of dyslexia. Furthermore, a semi-                   we need to look at all the variables being measured to
structured interview was held with one of the neurophysiol-                 estimate the capacities of the child. “
ogists at Neurotalentum. Thus, to validate and identify
measures that they were interested in and further understand       How can these variables help to identify individuals having dif-
how these variables can identify children with difficulties. The   ficulties with an exercise?
interviews were audio-recorded in Google Meet and the fol-
lowing questions were asked:                                           -    First off, every kid is different. Because dyslexia is a
                                                                            word for making a summary of all the problems a kid
    -    Looking at the variables in table 2, are these relevant            has with reading and writing comprehension. Inside
         to estimate cognitive capacities of children with dys-             reading and writing comprehension, there are several
         lexia?                                                             parameters we need to measure, and we need to focus
    -    Are there any variables missing?                                   on the parameters that a kid has a problem with. […]
    -    With respect to these variables. How do you expect a               The challenge is to tell this program if a value is okey.
         child with or without dyslexia to perform during these             One way is to measure the number of mistakes and
         exercises?                                                         then see if they are above or below average compared
    -    How can these variables help to identify individuals               to their age group.
         having difficulties with an exercise?
Type          Dependent measures               Description                              Syllable       Sort         Rhyme         Labyrinth
 Time          Duration [seconds]              The total time to complete a level           X             X             X               X
               Time to first response [sec-                                                 X             X             X               X
               onds]
               Response frequency [seconds]    The time between responses                   X             X             X               X
               Longest pause                   The maximum value of the response            X             X             X               X
                                               frequency
 Scores        Number of incorrect answers                                                  X             X             X               X
 Events        Number of responses             The number of clicks or drag and             X             X             X               X
                                               drops
               Number of regrets               The user has regretted a response                          X                             X
               Number of audio plays           The user clicked to hear an audio file.      X             X             X
               Number of fast responses        A response given in 1 second or less         X             X             X               X
 Quotients     Response       speed     [re-   Number of responses divided by the           X             X             X               X
               sponses/second]                 duration
               Error rate                      Number of wrong answers divided by           X             X             X               X
                                               the minimum number of required re-
                                               sponses
Table 3. All the measurements implemented
                                                                               Incorrect answers
     -    Secondly, we can look at the time and the other mea-
                                                                               Within the labyrinth, the user can only select a neighbor to the
          surements. The next step is to combine the result of all             last selected cell. E.g., a user cannot click on the final cell di-
          exercises in order to estimate the capacities of the kid.            rectly. Therefore, the coordinates of a clicked cell were con-
          Such as reading comprehension and attention capac-
                                                                               trolled in X and Y -direction. This was accomplished by stor-
          ity.
                                                                               ing the previously selected cells' x and y coordinates. Since a
The interview indicates the importance of comparing the val-                   user can either move in a horizontal or vertical direction, the
ues between the two groups with the corresponding age
                                                                               direction must be validated. For a click to be valid in the ver-
group. It also highlights that outcomes of dyslexia are unique,
                                                                               tical or horizontal direction, the absolute difference between
which underlines the importance of forming a system that can                   the previous and current coordinates must be equal to one. A
interpret individual differences.                                              wrong answer was returned if these conditions were false,
                                                                               and the cell showed a red flashing animation. The cell changed
How can performance be measured in Kunna?
                                                                               to a green background if the answer was correct. To validate
Together with the two requested variables from the interview                   a response, every cell was configured with the JavaScript at-
and the research results, the variables presented in Table 3                   tribute onclick. Similarly, the JavaScript attribute on-
were implemented into the exercises. These variables were
                                                                               change was used within the "drag and drop" exercises to com-
selected to fit the given design. The number of correct answers                pare the value-attribute of a given card with the correct an-
was not included since all exercises require a correct answer                  swer. Within the syllable and rhyming exercise, an incorrect
to pass. Similarly, accuracy and efficiency were not qualified
                                                                               card was automatically removed to the initial area.
using the definition in Table 2. Furthermore, the variables
were adapted to fit the exercises. E.g., did several of the re-                Time
lated studies use clicks as a primary interaction type [4, 5, 7,               Currently, JavaScript provides the two inbuilt methods per-
10]. As noted in Table 1, either clicks or drag-and-drops are                  formance.now and Date.now to measure a time interval. Per-
used in Kunna. A more detailed explanation of each variable                    formance.now was selected since it runs independent of the
is described in the implementation section.                                    code and is not dependent on JavaScript event loops [15]. The
                                                                               method's accuracy is estimated to five microseconds [15].
IMPLEMENTATION                                                                 Therefore, performance.now appears to be accurate enough.
This section will cover how the exercises were built and how                   A time interval is measured by calling the method twice. A
the variables were implemented and measured.                                   variable t0 stores the time when an event is initialized, and a
                                                                               variable t1 stores the current time when the event is finished.
Application Setup                                                              The time interval is the difference between t1 and t0 but when
The application was built using the Model View Controller                      should these two variables be initialized? Vue.js includes the
(MVC) framework Sails.js [29] and was connected to a Post-                     function this.$nextTick, which is called when the entire view
gres database. The front-end was built using the JavaScript                    is rendered. Hence, the time variable t0 is initialized to ensure
framework Vue.js [30]. The website's images, sounds and                        the time clock starts when the content is fully displayed. The
texts were stored using the Content Service Manager (CSM)                      duration was calculated immediately after the last cell was
Contentful.
clicked or when a card was dropped by calculating the differ-         strength of the relationship between two variables by return-
ence between t0 and t1.                                               ing one value within the range of -1 to 1. To interpret the co-
                                                                      efficient is r = 0.1 told to indicate a small correlation, 0.30 me-
Response frequency                                                    dium and 0.5 a strong correlation [25]. As duration and errors
Response frequency is defined as the time between each re-            are considered as the most vital variable, these two variables
sponse. This was calculated by taking the time difference be-         are compared to the others in table 3. To support the correla-
tween the previous and the current response. The value was            tions and interpret the spread of the data, box plots are pre-
stored in a list. The purpose of this variable is to see whether
                                                                      sented - A recommended approach to visualize non-paramet-
the user generally responded slowly or fast during a level. The
final value is the mean of the stored values.                         ric data [25]. The box of a boxplot shows 50 percent of the
                                                                      data fall; the horizontal line indicates the median and the X
Time to the first response                                            the mean value. The whisker marks the maximum and mini-
The time to each response from the initial time was calculated        mum value and outliers (values greater than three standard
and stored in a list. Time to first response is the first value in    deviations of the mean) are marked with a ring [25]. The box-
this list.                                                            plots were created in Excel.

Longest pause                                                         Ethics and Recruitment
The longest pause is the maximum value in response fre-               The participants were recruited by Prototyp. The parents re-
quency list. If this value is very high, e.g., a minute or more, it   ceived a consent form to confirm their permission. The con-
may indicate that the user is not paying attention to the exer-       sent form specified the scientific use of the data collection and
cise. It could also indicate that the user is struggling or being     made clear that the children were allowed to quit at any time.
more thoughtful about the next move.                                  The parents received an email with instructions specifically
                                                                      telling the parents not to help solve the exercises. The partic-
Fast responses                                                        ipants were anonymized by a randomized email address and
A fast response is defined as a value in the response frequency       a password to access the website. 16 Spanish-speaking chil-
list that is equal to or less than one second. The purpose of         dren (8 boys and 8 girls) without a diagnose participated in
this variable is to estimate the sureness of the user. Compared       testing Kunna’s exercises. The participants were between 7-8
with the other variables could, e.g., a high number of re-            years old and recruited by Prototyp. They were further re-
sponses and a high number of errors, indicate a guessing ap-          quested to use the web browser Chrome (version 80 or later).
proach. Likewise, a combination of a high number of fast re-          The estimated time to complete all four exercises (excluding
sponse and a low number of errors could indicate certainty.           tutorials) was approximately five minutes.
Quotients
Miss rate [errors/response] is defined as the number of incor-
                                                                      TEST RESULTS
rect answers divided by the total number of responses. Speed          TIME
[responses/second] is calculated as the number of responses                      300
divided by the total time.
                                                                                 250                                       Labyrinth
Database                                                                         200
                                                                                                                           Syllable
                                                                       seconds

The measures from each exercise are saved in separate Post-                      150
gres tables. The results are sent directly after a level is fin-                                                           Rhyme
                                                                                 100
ished to make sure every completed level is stored. Besides
the other measurement, all the user’s answers are stored in a                     50                                       Sort
list together with the time until each response. For example,                      0
this might tell which target is most often mixed up.
                                                                      Figure 3. The total time to complete an exercise
Automated Tests                                                       The mean total time to complete Labyrinth was 133 seconds
To verify that all the measures were implemented correctly,           (sd = 61), the corresponding for Syllable was 69 seconds (sd
automated tests were executed using Mocha.js. The automatic           = 32), Rhyming 73 seconds (sd = 49) and Sort 55 seconds (sd
test consisted of two parts: validate the database storage (by        = 27). It appears as the labyrinth was more spread and gener-
returning a 200 response) and test if the validation functions        ally took the longest time. Compared to the other exercises,
returned the correct values.                                          were no images or sounds implemented. Thus, can the laby-
                                                                      rinth be assumed to be trickier as these children recently have
CHARACTERISTICS OF POOR PERFORMANCE                                   learned to read. Looking at the correlations in table 4, errors
To find characteristics of poor performance, Pearson's corre-         and duration has a medium to strong correlation for all the
lation coefficients were calculated using Excel version 16.45.        exercises.
The aim was to find correlations between errors and time in
relation to the other variables. The coefficient measures the
Exercise                       Duration          TTFR       Errors            RF          FR           AP              LP

           Labyrinth     Duration             -                0,712   0,693            0,844      -0,659          -          0,808
                          Errors            0,693              0,504       -            0,288      -0,173          -          0,359
           Syllable      Duration             -                0,981   0,411            0,377         -          0,764        0,351
                          Errors            0,411              0,252       -            0,851         -          0,275        0,902
            Rhyme        Duration             -                0,990   0,623            0,699         -          0,763        0,428
                          Errors            0,623              0,550       -            0,774         -          0,457        0,725
               Sort      Duration             -                0,947   0,381            0,661      0,319         0,759        0,034
                          Errors            0,381              0,252       -            0,042      0,942        -0,054        0,005
Table 4. Pearson correlation coefficient for duration, errors and the other variables.

MEAN TIME TO FIRST RESPONSE (TTFR)                                     LONGEST PAUSE (LP)
           30                                                                      18
           25                                       Labyrinth                      16
                                                                                   14                                     Labyrinth
           20                                                                      12
 seconds

                                                    Syllable
                                                                                                                          Syllable
                                                                         seconds
           15                                                                      10
                                                    Rhyme                           8
           10                                                                                                             Rhyme
                                                                                    6
               5                                    Sort                            4                                     Sort
               0                                                                    2
                                                                                    0
Figure 4. The mean time to the first response for all levels
Syllable resulted in the slowest Time to first response with a         Figure 6. The mean longest pause
mean time of 10.92 seconds (sd = 4.98). The time for the lab-          The mean longest pause within the labyrinth was 5.9 seconds
yrinth was 9.16 (sd = 4.74) and 8.37 seconds for Rhyme (sd =           (sd = 4.59). The corresponding for Syllable was 0.66 (sd
8.38) and 5.83 for Sort (sd = 3.36). Looking at the correlations       =2.15), Rhyme 0.73 (sd = 2.70) and Sort 2.98 seconds (sd =
in table 4, errors and TTFR has a medium to strong correla-            2.31). Similar to the response frequency, the longest pause is
tion for all the exercises.                                            non-zero for the phonological testing exercises if a mistake
                                                                       was made. This is indicated by the strong correlation between
MEAN RESPONSE FREQUENCY (RF)                                           the longest pause and the number of errors. This variable in-
                                                                       dicates that no user was absent for a longer time as the high-
           6
                                                                       est value is approximately 25 seconds.
           5                                        Labyrinth
           4                                                           Scores
 seconds

                                                    Syllable
           3                                                           TOTAL INCORRECT ANSWERS
                                                    Rhyme
           2                                                            30
           1                                        Sort                25                                                 Labyrinth
           0                                                            20
                                                                                                                           Syllable
Figure 5 The mean response frequency                                    15
                                                                                                                           Rhyme
The labyrinth and Sort resulted in the highest RF with 2.6 sec-         10
onds (sd = 1,2) respectively 2.64 seconds (sd = 1,2). Syllable              5                                              Sort
resulted in 0.58 (sd = 0.95) and Rhyme 0.75 (sd = 1.08) sec-                0
onds. The RF was generally higher within the labyrinth and
Sort. However, RF is calculated after the first given response.        Figure 7. The total number of incorrect answers
Since Syllable and Rhyming require a minimum of one answer             The participants made in average 5.6 incorrect answers (sd =
to complete, RF is zero if a correct answer is given. This indi-       6.5) in Labyrinth, Syllable resulted in 0.625 errors (sd = 0.88),
cates why error and RF are highly correlated for these two ex-         Rhyme 1.0 (sd = 1.67) and Sort 0.937 (sd = 1.06). The laby-
ercises and the strong correlation between LP and the num-             rinth resulted in a higher number of errors, which was ex-
ber of errors in Rhyme and Syllable. Thus, RF is mainly work-          pected as the design allows for more mistakes (e.g., by acci-
ing within the Labyrinth.                                              dently pressing a cell). Furthermore, the number of errors
were slightly higher within Sort, which is expected as the par-   were not required to listen in Rhyme, audio plays could indi-
ticipants were required to give two answers per level com-        cate poor performance.
pared to Rhyme and Syllable. But the correlation between the
errors and duration are less strong in Sort and Syllable com-     How are these variables related to poor performance?
pared to the other exercises.                                     First, we can see obvious correlations between duration,
                                                                  TTFR and longest pause. Regarding the central findings:
FAST RESPONSES (FR)
                                                                       •    In Labyrinth: errors were strongly correlated with du-
 25                                                                         ration and TTFR. The duration was strongly nega-
 20                                               Labyrinth                 tively correlated with fast responses and response fre-
                                                                            quency.
 15                                               Syllable             •    In Syllable: errors were strongly correlated with re-
 10                                               Rhyme                     sponse frequency and longest pause. However, these
                                                                            are invalid due to the dependency with errors. Dura-
  5                                               Sort                      tion was correlated with response frequency and au-
  0                                                                         dio plays.
                                                                       •    In Rhyme: errors were strongly correlated with dura-
Figure 8. The total number of fast responses                                tion, TTFR, response frequency and longest pause.
The labyrinth's mean fast response was 8.06 (sd = 7.1) but                  Similarly, to Syllable, RF and LP are also dependent on
gave no results for the syllable and the rhyme. Sort resulted               the errors. The duration was correlated with audio
in 0.5 (sd = 0.63). The purpose of this variable was to count               plays.
spontaneous answers. However, the results indicate that this           •    In Sort: errors were strongly correlated with fast re-
measure can be interpreted differently for click-based and                  sponses. The duration was strongly correlated with
drag-based exercises. There is a strong negative correlation                audio plays.
with time within the labyrinth but a small correlation be-
tween errors, indicating a rapid but accurate response. This is   Duration and Errors were strongly correlated with TTFR in
the opposite of Sort, which shows that a fast response is         Labyrinth and Rhyme and with Audio plays in Rhyme. How-
strongly correlated with errors and not with response fre-        ever, the audio boxplot shows an outlier which could be ar-
quency. This indicates that the users corrected a wrong an-       gued to affect the correlation. Furthermore, TTFR and Dura-
swer quite fast. However, the number of total fast response       tion appear to be the same value in the drag-and-drop-based
and errors were quite a few within Sort and thus is such a con-   exercise, as the correlation between these two is approxi-
clusion too weak to be seen as a general rule.                    mately one. Duration and Audio plays were strongly corre-
                                                                  lated for all of the drag-and-drop-based exercises but were
TOTAL AUDIO PLAYS (AP)                                            medium to non-correlated with Errors. Audio plays could be
 30                                                               assumed to indicate poor performance in Syllable and Rhyme.
                                                                  Though, the correlation with Duration could partly be ex-
 25                                               Labyrinth
                                                                  plained by the time it takes to listen to the audio file. Further-
 20                                                               more, the longest pause has a medium correlation with errors.
                                                  Syllable
 15                                                               But the boxplots indicate an outlier in incorrect answers. To
                                                  Rhyme           summarize, these results could be argued as too weak to es-
 10
  5                                               Sort            tablish a general conclusion. Therefore, the conclusion is that
                                                                  these results cannot prove strong relationships that indicate
  0
                                                                  poor performance.
Figure 9. The total number of audio plays
The mean number of audio plays was in Syllable totally 6.18
                                                                  TESTING THE SUPPORT VECTOR MACHINE
(sd = 4.95) plays. Corresponding for Rhyme was 5.06 (sd =         Dividing participants into two groups
7.41) Sort and 1.68 (sd = 2.67). (The labyrinth had no audio)     As a proof of concept, two groups (G1 and G2) were hand-
Rhyme and Syllable had a higher number of audio plays,            picked to see whether the system could identify them. The
which was expected as the design required the users to listen.    correlation analysis could not indicate any strong relation-
Time was strongly correlated with the number of audio plays       ships for poor performance characteristics. Hence, the selec-
for all three exercises. However, the correlation between au-     tion was accomplished by looking at the individual perfor-
dio and errors was stronger within Rhyme but medium cor-          mances. Participants with clearly outlying values in duration
related in Syllable and non-correlated in Sort. Since the users   and errors were placed in group 2 (G2). As Mark Hall [14] sug-
                                                                  gests, good features for machine learning should preferably
                                                                  not be dependent on other features, which is the case for
Group         N        Duration        TTFR     Errors          RF         FR               AP         Miss rate       Speed
                                            110,4          8,1      1,7            9,5        9,1                               0,1        1,7
                    G1           9                                                                              -
                                            (58,6)        (5,1)    (2,5)          (5,7)      (8.6)                             (0.2)      (0.8)
 Labyrinth
                                            162,2         10,5     10,7           11,3        6,7                               0,8        1,3
                    G2           7                                                                              -
                                            (56,1)        (4,5)    (6,7)          (3,1)      (5,2)                             (0,4)      (0,3)
                                             60,4          9,9      0,1            0,7        0,0               5,1             0,0        0,9
 Syllable           G1           11
                                            (28,9)        (4,9)    (0,3)          (2,3)       (0)              (4,7)           (0.2)      (0.4)
                                             89,6         13,1      1,8            9,7        0,0               8,6             0,8        0,7
                    G2           5
                                            (32,1)        (5,0)    (0,4)          (6,2)       (0)              (5,2)           (0,2)      (0,2)
                                             55,2          6,7      0,3            2,0        0,0               2,5             0,2        1,6
 Rhyme              G1           12
                                            (25,3)        (2,9)    (0,5)          (4,0)       (0)              (3,4)           (0,2)      (0,6)
                                            126,4         13,5      3,0           18,0        0,0              12,8             1,5        0,9
                    G2           4
                                            (70,6)        (8,5)    (2,4)          (8,6)       (0)             (10,4)           (1,2)      (0,4)
                                             48,0          5,0      0,6           15,4        0,3               1,3             0,1        2,0
 Sort               G1           13
                                            (17,0)        (1,7)    (0,9)          (7,8)      (0,5)             (1,6)           (0,2)      (0,7)
                                             85,3          9,4      2,3           18,0        1,3               3,3             0,5        1,5
                    G2           3
                                            (47,3)        (7,8)    (0,6)          (5,1)      (0,6)             (5,8)           (0,1)      (0,5)
Table 5. The table is showing how the SVM categorized the dataset into two groups using the linear kernel. The mean values
and the standard deviations (in parenthesis) are presented.
                                                                                                     Linear             Rbf            Polynomial
response frequency and the longest pause within the drag-                   Mean accuracy             80%              74%                79%
and-drop-based exercises. Therefore, these features were not                N agreements               52               49                 45
used. The classification was done using Python and the SciPy                N disagreements            12               15                 19
                                                                            p-value                   0.629            0.010             0.0524
SVM-package. The linear, polynomial and the Rbf kernel were
                                                                            Statistic                  7.0              5.0                6.0
tested. Leave-one-out cross-validation was used due to the
small data set, meaning that 15 participants were used for                 Table 7. The mean accuracy, disagreements and p-value
training and one for classification. This was repeated for                 for all exercises and kernel.
every participant and exercise.
                                                                           Table 5 shows how the SVM classified the dataset into two
Evaluating the classification performance                                  groups. The mean errors and duration are slightly higher in
A McNemar test was used to test if the system significantly                group 2 and more audio plays can be seen in Rhyme and Sort.
disagrees between the predicated and the pre-selected partic-              Thus, in this minimized version, the system gives a small indi-
ipants for all exercises. This was accomplished by counting                cation to be working as intended. Though, the standard devi-
the number of true positives (TP), false positives (FP), true              ation indicates an overlapping spread between several varia-
negatives (TN), and false negatives (FN) in all exercises for              bles. According to the classification results in table 6 and 7,
each kernel. Accuracy and F1-scores indicate the efficiency of             the linear kernel was most effective in identifying the selected
the SVM. Accuracy is defined as the number of correct classi-              participants with a mean accuracy of 80 percent. The F1-
fications divided by the total number of classifications.                  scores signals that the Rbf and the polynomial kernel under-
(TP+TN/TP+TN+FP+FN) F1 score is a harmonic mean of pre-                    performed in identifying true positives. i.e., the participants in
cision (TP/TP+TN) and recall (TP/TP+FN) ranging from zero                  group 2. The mean accuracy for the Rbf was 74 percent, but
to one. A higher value indicates a higher classification perfor-           the kernel underperformed in classifying the correct targets
mance [27].                                                                as the F1 score was zero in three of the exercises. The polyno-
                                                                           mial kernel performed slightly better but also showed prob-
CLASSIFICATION RESULTS                                                     lems with classifying the true positives in Syllable. The p-val-
                    N*         Linear     Rbf        Polynomial            ues for the polynomial and the Rbf also indicate a general dis-
 Labyrinth          6           81%       69%           69%                agreement between the selections and the predictions. I.e.,
                               (0.77)    (0.54)        (0.28)              the true positives and false negatives were not the same.
 Syllable            4          69%       62%           75%                However, the linear kernel failed to reject the null hypothesis,
                               (0.28)      (0)           (0)               I.e., that the disagreements are not significant. Still, 20 percent
 Rhyme               4          88%       75%           87%
                                                                           of the participants were misclassified, which could be argued
                               (0.66)      (0)         (0.66)
 Sort                3          81%       81%           87%                as insufficient.
                                (0.4)      (0)          (0.5)
Table 6. Accuracy and F1-score for the exercises and the                   DISCUSSION
kernels *The number of participants selected in group 2                    THE VARIABLES APPLICABILITY TO THE GAME DESIGN
who had a higher number of errors and duration.                            This study aimed to see if a support vector machine could
                                                                           identify dyslectic children who performed poorly within a se-
                                                                           rious game. Several variables were implemented to get a
comprehensive picture of how the participants interacted              machine learning to define clusters, which could support in
with the game and to find poor performance characteristics.           distinguish groups with different behaviors. A second ap-
The variables appeared to be diverse and dependent on the             proach to solve the misclassifications (as mentioned in the in-
exercise, making it complicated to define general rules for dif-      terview) would be to decide thresholds for the variables. This
ficulty characteristics. E.g., Syllable requires the user to listen   would not result in any misclassifications of true positives
to a sentence and thereby TTFR cannot be directly compared            and allows neurophysiologists total control of the system.
to the other exercises. Syllable and Rhyme (which required a          This approach could also be argued as less time-consuming as
minimum of one answer per level) did not result in longest            no selections are required. However, a higher level of com-
pause or response frequency if the user's first answer was cor-       plexity will be added if several variables are being used.
rect. The Labyrinth required several solutions, which resulted
in more responses per level which benefit the use of fast re-         FUTURE WORK
sponses, response frequency and longest pause. Thus, in terms         As this was a simplified case, the game should be tested on
of maximizing these variables' applicability, a game design re-       children within the targeting group. Thus, to compare the
quiring more than one answer is recommended.                          measurements by testing for significant differences and vali-
                                                                      date that the exercises are challenging. Conducting the tests
METHOD CRITIQUE                                                       in a controlled environment would benefit the reliability and
The linear kernel resulted in an accuracy of 80 percent, which        the applicability of these variables to estimate poor perfor-
could be argued as insufficient and vague since a considerable        mance. Furthermore, the variables that correspond to each
portion is misclassified. The selection was based on the defi-        classification group should be statistically compared. As the
nition that outlying values in errors and time mainly charac-         parameters can be assumed to be non-parametric, a Mann-
terize poor performance. This assumption was based on stud-           Whitney U test would be a reasonable choice [25]. From the
ies reporting that these two variables appear to differentiate        discussion followed that this approach of identifying strug-
people with dyslexia from non-dyslectics. Though, the aim             gling is not covering the spectra of different kinds of problem-
was to broaden this definition in the context of Kunna's exer-        solving approaches. However, this paper only used Support
cises using correlation analysis. Of course, such an approach         Vector Machines, and other machine learning techniques like
is limited by not comparing and analyzing the target group.           Random Forest, K-neighbors, or Neural Networks could be
However, the correlation analysis may not be sufficient to find       tested to receive a greater accuracy. Also, these multiclassifi-
poor performance characteristics as the cause and effect are          cation algorithms may also be used to divide the spectra of
not certain. Thus, two variables can be correlated but still be       different problem-solving behaviors.
unrelated. A comparison with the target group and a con-
trolled observing test could complement the correlation anal-         CONCLUSION
ysis to understand better how the users interact with these           This paper aimed to see if a support vector machine could
games.                                                                identify dyslectic children who performed poorly within a se-
                                                                      rious game. Eleven variables were used to measure the per-
INTERPRETING POOR PERFORMANCE                                         formance of 16 Spanish-speaking children. Four measures not
The definition of poor and well performance is floating, and a        found in similar studies were: response frequency, longest
line must be drawn somewhere, and the purpose of Support              pause, fast responses, and audio plays. However, the game de-
Vectors Machines is to find this optimal line. The benefit and        sign limited the applicability of these variables. Therefore, a
perhaps a disadvantage of this approach is that no strict             game design requiring more than one answer that allows sev-
thresholds need to be decided for each variable. However, this        eral independent variables is suggested. However, there was
binary classification causes a diverse mix of problem-solving         not enough evidence to prove poor performance characteris-
behaviors separated into two groups. Of which one group               tics between these variables. Though, the analysis was done
contains participants with outlying values. This causes a bi-         on children not diagnosed with dyslexia, and the game should
ased selection and could result in a non-desirable outcome,           therefore be tested and statistically compared with the target
which emphasizes well-founded selections. But as mentioned            group. Even though the game could never be thoroughly eval-
in the interview, every child is different. E.g., one user may        uated, the highest accuracy received was 80 percent using a
have many errors but a short completion time, another may             linear kernel. However, from the discussion followed that the
have a long duration but a few errors, and a third may have           Support Vector Machine may not be the most efficient choice
many errors and a long duration etc. Thus, the binary SVM re-         for identifying poor performance as the binary SVM merges
quires a selection process that merges the spectra of various         different problem-solving behaviors. Therefore, this paper
problem-solving behaviors. This is going against Mark Hall's          suggests that future work should consider multiclassification
[14] definition of good features as various results reduces the       algorithms.
correlations to the class, which could reduce the classification
performance. Another approach would be unsupervised
ACKNOWLEDGMENTS                                                                    [13] William S. Noble. 2006. What is a support vector machine?. Nature biotech-
                                                                                   nology 24.12 (2006): 1565-1567.
I want to thank Pontus Österberg for his support and the em-                       [14] Mark Andrew Hall. 1999. Correlation-based feature selection for machine
ployees at Prototyp for helping with technical issues. I would                     learning.
also like to thank the people at Neurotalentum and my super-                       [15] Michael Schwarz, Clémentine Maurice, Daniel Gruss, Stefan Mangard.
                                                                                   2017. Fantastic timers and where to find them: High-resolution microarchitec-
visor Patric Dalhqvist.                                                            tural attacks in javascript. International Conference on Financial Cryptography
                                                                                   and Data Security. Springer, Cham.
REFERENCES                                                                         [16] Anthony Jason and David Francis. 2005. Development of phonological
[1] Eva Germanò, Antonella Gagliano and Paolo Curatolo. 2010 Comorbidity of        awareness. Current directions in psychological Science 14.5 (2005): 255-259.
ADHD and dyslexia. Developmental neuropsychology 35, no. 5 (2010): 475-            DOI:https://doi.org/10.1111/j.0963-7214.2005.00376.x
493.                                                                               [17] Claudia Maehler and Kirsten Schuchardt. 2016. Working memory in chil-
[2] American Psychiatric Association. 2013. Diagnostic and statistical manual      dren with specific learning disorders and/or attention deficits. Learning and
of mental disorders (DSM-5®). American Psychiatric Pub.                            Individual Differences 49 (2016): 341-347.
[3] Ombretta Gaggi, Claudio Enrico Palazzi, Matteo Ciman, Giorgia Galiazzo,        [18] Andrea Facoetti, Pierluigi Paganoni, Massimo Turatto. Valentina Marzola
Sandro Franceschini, Milena Ruffino, Simone Gori, and Andrea Facoetti. 2017.       and Gian Gastone Mascetti. 2000. Visual-spatial attention in developmental dys-
Serious Games for Early Identification of Developmental Dyslexia. Comput. En-      lexia." Cortex 36.1 (2000): 109-123. DOI:https://doi.org/10.1016/S0010-
tertain.    15,     2,    Article    4    (Summer       2017),     24     pages.   9452(08)70840-2
DOI:https://doi.org/10.1145/2629558                                                [19] Lynette Bradley and Peter E. Bryant. 1983. Categorizing sounds and learn-
[4] Luz Rello, Miguel Ballesteros, Abdullah Ali, Miquel Serra, Daniela Alarcón     ing to read—a causal connection. Nature 301.5899 (1983): 419-421.
Sánchez, Jeffrey P. Bigham. 2016. Dytective: diagnosing risk of dyslexia with a    [20] Alexandra Poole, Farhana Zulkernine, and Catherine Aylward. 2017. Lexa:
game. In PervasiveHealth (pp. 89-96).                                              A tool for detecting dyslexia through auditory processing. IEEE Symposium Se-
[5] Maria Rauschenberger, Ricardo Baeza-Yates, and Luz Rello. 2020. Screening      ries on Computational Intelligence (SSCI). IEEE,
risk of dyslexia through a web-game using language-independent content and         [21] Christopher Sterling, Marion Farmer, Barbara Riddick, Steven Morgan,
machine learning. In Proceedings of the 17th International Web for All Confer-     Catherine Matthews.1998. Adult dyslexic writing. Dyslexia, 4(1), 1-15.
ence (W4A '20). Association for Computing Machinery, New York, NY, USA, Ar-        [22]. Heikki Lyytinen, Miia Ronimus, Anne Alanko, Anna-Maija Poikkeus & Ma-
ticle 13, 1–12. DOI:https://doi.org/10.1145/3371300.3383342                        ria Taanila. 2007. Early identification of dyslexia and the use of computer game-
[6] Voravika Wattanasoontorn, Imma Boada, Rubén García, Mateu Sbert. 2013.         based practice to support reading acquisition, Nordic Psychology, 59:2, 109-
Serious games for health. Entertainment Computing 4.4 231-247.                     126, DOI: 10.1027/1901-2276.59.2.109
[7] Maria Rauschenberger, Luz Rello, Ricardo Baeza-Yates, and Jeffrey P.           [23] Swan, Denise, and Usha Goswami. 1997. Phonological awareness deficits
Bigham. 2018. Towards Language Independent Detection of Dyslexia with a            in developmental dyslexia and the phonological representations hypothe-
Web-based Game. In Proceedings of the Internet of Accessible Things (W4A           sis." Journal of experimental child psychology 66.1 (1997): 18-41.
'18). Association for Computing Machinery, New York, NY, USA, Article 17, 1–       [24] Richard Boada, Erik G. Willcutt, and Bruce F. Pennington. 2012. Under-
10. DOI:https://doi.org/10.1145/3192714.3192816                                    standing the comorbidity between dyslexia and attention-deficit/hyperactivity
[8] Serrano, Francisca and Sylvia Defior. 2008. Dyslexia speed problems in a       disorder. Topics in Language Disorders 32.3 (2012): 264-284.
transparent orthography. Annals of dyslexia 58.1 (2008): 81.                       [25] Andy Field and Graham Hole. 2003. How to design and report experiments.
[9] Kast, Monika, et al. Computer-based learning of spelling skills in children    SAGE Publications, London, 2003.
with and without dyslexia. Annals of dyslexia 61.2 (2011): 177-200.                [26] Johannes C. Ziegler, Caroline Castel, Catherine Pech-Georgel, Florence
[10] Luz Rello, Enrique Romero, Maria Rauschenberger, Abdullah Ali, Kristin        George, F-Xavier Alario and Conrad Perry. 2008. Developmental dyslexia and
Williams, Jeffrey P. Bigham, and Nancy Cushen White. 2018. Screening Dyslexia      the dual route model of reading: Simulating individual differences and sub-
for English Using HCI Measures and Machine Learning. In Proceedings of the         types. Cognition 107.1 (2008): 151-178.
2018 International Conference on Digital Health (DH '18). Association for Com-     [27] Zachary C Lipton, Charles Elkan, and Balakrishnan Naryanaswamy. 2014.
puting        Machinery,       New       York,       NY,       USA,       80–84.   Optimal thresholding of classifiers to maximize F1 measure. Joint European
DOI:https://doi.org/10.1145/3194658.3194675                                        Conference on Machine Learning and Knowledge Discovery in Databases.
[11] Iles Jo, Vincent Walsh, and Alex Richardson. 2000. Visual search perfor-      Springer, Berlin, Heidelberg.
mance in dyslexia. Dyslexia 6.3 (2000): 163-177.                                   [28] Scikit, 2021, Sklearn.svm.SVC. Retrieved from https://scikit-
                                                                                   learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
[12] Aleksandar Tenev, Silvana Markovska-Simoska, Ljupco Kocarev, Jordan
                                                                                   [29] Sails.js, 2021, Retrieved from: https://sailsjs.com/
Pop-Jordanov, Andreas MüllerGian and Candrian Tenev. 2014. Machine learn-
                                                                                   [30] Vue.js, 2021, Retrieved from: https://vuejs.org/
ing approach for classification of ADHD adults. International Journal of Psycho-
physiology 93.1. 162-166.
TRITA-EECS-EX-2021:116

www.kth.se
You can also read