Masaryk University
Faculty of Informatics

Toward an Automated Feedback System in Educational Cybersecurity Games

 Ph.D. Thesis Proposal

 Mgr. Valdemar Švábenský

 Advisor: doc. Ing. Pavel Čeleda, Ph.D.

Brno, January 2019
 Signature of Thesis Advisor
Declaration
Hereby I declare that this paper is my original authorial work, which
I have worked out on my own. All sources, references, and literature
used or excerpted during elaboration of this work are properly cited
and listed in complete reference to the due source.

 Mgr. Valdemar Švábenský

Advisor: doc. Ing. Pavel Čeleda, Ph.D.

Acknowledgements
Most of the research and the published papers were supported by the
Security Research Programme of the Czech Republic 2015–2020 (BV III
/ 1 – VS) granted by the Ministry of the Interior of the Czech Republic
under No. VI20162019014 – Simulation, detection, and mitigation of cy-
ber threats endangering critical infrastructure. Computational resources
were provided by the European Regional Development Fund Project
CERIT Scientific Cloud (No. CZ.02.1.01/0.0/0.0/16_013/0001802).

I’d like to sincerely thank the whole CSIRT-MU group. The team leader
and my advisor, Pavel Čeleda, has always found time to help me with
any question about my Ph.D. studies. His dedication and hard work have
motivated me to do my best. My consultant, Jan Vykopal, has drawn
me into the area of cybersecurity education and has been a great “partner
in crime” ever since. All the other members of the team are really good
colleagues as well, and I appreciate working with them.
 Another big thank you goes to Petr Jirásek and AFCEA, who made
it possible for me to attend the European Cyber Security Challenge 2018
in the role of the coach of the Czech team. It was a great responsibility
and learning experience to face all the practical challenges of a large
international Capture the Flag competition.
 Next, I learned an awful lot from Martin Ukrop and Ondráš Přibyla
as the leaders of the Teaching Lab initiative. Vlasta Šťavová was a
big help with her advice on Ph.D. studies and, along with Tomáš Effenberger,
provided useful comments on some of the preliminary stages of this thesis
proposal. I also thank Peter Hladký for our discussions on (working)
life and for providing another perspective on my ideas.
 Finally, I thank mum and dad, my girlfriend Pavlínka, the FASKOM+
crew, and all my family and friends who have been supporting me and
helping in various ways throughout my journey.
 I dedicate this work to the loving memory of my brother Gabriel.

Abstract
Educational games feature practice tasks with multiple approaches to
the solution. An example of such a game is Capture the Flag, a popular
type of training activity for exercising cybersecurity skills. During the
game, regular feedback is crucial to support learning of novice players.
However, providing feedback manually requires an expert instructor and
is time-consuming. Therefore, our goal is to automate the process by
creating a virtual learning assistant. To achieve this goal, we need to
address three research topics. The first is defining a formal model for
tasks in educational games within the cybersecurity domain. The second
is leveraging the data of players to improve the accuracy of the model.
The third is employing the model to provide personalized feedback.
 To address the first topic, we propose a transition graph describing
how a single player can progress through the game tasks and what actions
(s)he can perform. Next, we will develop data analysis methods for
deriving information about the progress of each player. It is possible to
collect data about players’ in-game actions, such as typed commands or
solution attempts, along with their timing. We will leverage this data to
improve the accuracy of the model for a particular game. Finally, we will
employ the model to provide automated, personalized formative feedback.
The feedback will include hints and debriefing of the player’s strategy to
support learning. To evaluate our research, we will test the hypothesis
that players who receive the feedback will perform significantly better
than others on the same game.
 We aim to contribute both theoretical and applied results. First,
we will develop a general approach to model tasks in cybersecurity
games. Second, we will provide associated methods for data analysis
and explore patterns in solving the game tasks. Third, we will create
software that will assist players by providing feedback. Although there
are automated feedback systems for learning programming, the novelty
of our approach lies in the context of educational games. The data
logged from them are more heterogeneous than data from programming
assignments, and cybersecurity tasks require using domain-specific tools.
Our research will increase the educational impact of the games and
reduce their dependency on human experts. As a result, more students
will be able to learn cybersecurity skills at an individual pace.

Keywords
cybersecurity games, capture the flag, active learning, adult educa-
tion, educational data mining, learning analytics, formative assessment,
intelligent tutoring systems

Contents

1 Introduction
   1.1 Overview of the Research Problem
   1.2 Expected Contributions and Impact
   1.3 Structure of the Thesis Proposal

2 State of the Art
   2.1 Capture the Flag Games
      2.1.1 Games in Education
      2.1.2 Origins and Definition of CTF
      2.1.3 Attack-defense CTF
      2.1.4 Jeopardy CTF
      2.1.5 Technical Infrastructure for CTF
      2.1.6 Advantages and Disadvantages of Competitive CTF
      2.1.7 Towards an Educational Use of CTF
      2.1.8 Using CTF for Cybersecurity Research
   2.2 Computer Science Education Research
      2.2.1 General Approaches
      2.2.2 Formative Feedback
      2.2.3 Intelligent Tutoring Systems
      2.2.4 Challenges to Providing Feedback in Cybersecurity
      2.2.5 Employing Command Line History for Feedback
      2.2.6 Ideal Feedback
      2.2.7 Related Research in Cybersecurity Education
      2.2.8 Education Research in Other Domains

3 Research Aims and Methods
   3.1 Research Questions and Expected Results
   3.2 Research Environment
      3.2.1 Cybersecurity Game Format
      3.2.2 Target Audience of the Games
      3.2.3 Technical Infrastructure
      3.2.4 Data Collection
      3.2.5 Properties of the Data
      3.2.6 User Privacy and Ethical Concerns
   3.3 Research Methods
      3.3.1 Modeling Game Levels (RQ1)
      3.3.2 Exploring Interaction Patterns of Players (RQ2)
      3.3.3 Providing Hints and Feedback (RQ3)
      3.3.4 Evaluation of the Research
      3.3.5 Limitations of the Approaches
   3.4 Schedule of the Research
   3.5 Publication Venues
      3.5.1 Conferences
      3.5.2 Journals

4 Achieved Results
   4.1 Predicting Performance of Players
   4.2 Cybersecurity Course Report and Evaluation
   4.3 Analysis of Game Events
   4.4 Academic Achievements
      4.4.1 Presentations at International Conferences
      4.4.2 Participation in Research Projects
      4.4.3 Teaching and Student Supervision

5 Author’s Publications
   5.1 Accepted and Released Publications
      5.1.1 Challenges Arising from Prerequisite Testing in Cybersecurity Games
      5.1.2 Enhancing Cybersecurity Skills by Creating Serious Games
      5.1.3 Gathering Insights from Teenagers’ Hacking Experience with Authentic Cybersecurity Tools
   5.2 Accepted Publications to Appear
      5.2.1 Reflective Diary for Professional Development of Novice Teachers
      5.2.2 Towards Learning Analytics in Cybersecurity Capture the Flag Games
      5.2.3 Analyzing User Interactions with Cybersecurity Games
1 Introduction
More than 16,500 new security vulnerabilities were discovered in 2018 [1].
To give an example, one of the most prominent exploits targeted Face-
book in September 2018 [2]. Through a combination of software bugs,
attackers were able to obtain an access token for an arbitrary user ac-
count. This breach, which was arguably the largest in Facebook’s history,
exposed personal information of 50 million users. In the light of similar
cyber attacks threatening enterprises all over the globe, it is startling
that 41% of companies leave sensitive data completely unprotected [3].
What is more, by 2021, the annual damages from cybercrime will cost
the world a staggering $6 trillion [4], a huge increase compared to the
$1 trillion in 2012 [5].
 Cybersecurity is defined as “a computing-based discipline involv-
ing technology, people, information, and processes to enable assured
operations in the context of adversaries” [6]. With the globally rising
importance of combating cyber threats, the cybersecurity workforce
shortage is growing as well. It is estimated that by 2022, 1.8 million jobs
that require cybersecurity expertise will be unfilled [7]. Other sources
estimate an even higher demand for 3.5 million experts by 2021 [4]. At
the time of writing this thesis proposal, more than 310,000 of the open
positions are in the USA [8]. Educational institutions, governmental
organizations, and private companies are all aware that in a situation
like this, training more cybersecurity professionals is crucial. As a result,
they are continually developing curricula, courses, and training materials
to fight the skill gap.
 An increasing trend in cybersecurity education is to complement
theoretical knowledge and concepts with their practical applications.
This is done by employing active learning methods [9] such as cyber-
security games. These are software applications that allow learners to
exercise their cybersecurity knowledge and skills by completing training
tasks in a game-like context. The games simulate a broad spectrum of
practical, real-world situations in a controlled environment. The players
can attack and defend computer systems, analyze network traffic, or
disassemble binaries without any negative consequences.
 Employing cybersecurity games in educational settings or com-
petitions carries numerous benefits. The games can engage learners,
spark interest in cybersecurity, and motivate them to explore the field
further [10, 11, 12]. Next, the games allow learners to apply security
tools [12] and practice theoretical concepts [13], thereby increasing com-
petence, creativity, and problem-solving skills [14]. Many games feature
assignments that resemble authentic cybersecurity work tasks, which
might otherwise be problematic to simulate in a classroom [15]. This
especially applies to exercising adversarial thinking. This term refers
to adopting a “hacker perspective” [14] on how to force a computer
system to fail. Such a skill is crucial for cybersecurity professionals,
since it enables them to understand cyber attacks and set up effective
defenses [14, 16]. Finally, apart from their value in promoting learn-
ing [10, 17], ranking well in competitive games often leads to public
recognition, (monetary) prizes, or job opportunities [18].
 Because of their benefits, cybersecurity games have become one of
the most prevalent and attractive methods of hands-on learning and
competing. Games of various difficulty levels and focus grow in numbers
and spread widely [10, 12, 15, 19, 20], from informal online hacking
communities to universities, security conferences, and professional train-
ing events. What is more, the number of participants in cybersecurity
games is rising exponentially [11].

1.1 Overview of the Research Problem
The most popular format of a cybersecurity game is Capture the Flag
(CTF). In a CTF game, the player completes practical security-related
tasks while exercising technical skills and adversarial thinking. Finishing
each task yields a unique textual flag that the player submits to confirm
the solution. If the solution is correct, the player is immediately awarded
points and continues with the next task.
 Although this game format has a vast educational potential for
learners at all skill levels, it is currently employed mostly in competitions
that target advanced players. CTF games usually require a substantial
knowledge of the cybersecurity domain, as well as practical expertise.
Therefore, these games are effective only for already skilled players [11]
and offer little educational value to less experienced learners [18, 20].
Even worse, an unsuccessful attempt in such a game can frustrate
beginners and diminish their motivation to learn [18].

[Figure: three overlapping areas labeled cybersecurity, education, and data analysis.]

Figure 1.1: The research problem is in the intersection of three areas.

 To reduce the participation barrier for novice learners and increase
the educational impact of cybersecurity games, players need to receive
in-depth feedback on their approach. This feedback must be personalized
for each player, explaining whether their approach is correct and why,
what they do well, and what should they improve and how. Providing
such guidance has “more effect on student achievement than any other
single factor” [21, p. 480]. Without it, beginners miss learning goals and
take a longer time to learn [22]. However, to the best of our knowledge, no
CTF to date provides detailed feedback to players, and research of such
methods in the context of cybersecurity games is almost non-existent.
We see this as an open research problem.
 CTF games allow collecting data about the actions of players and the
corresponding metadata, such as the timing of these actions. Although
researchers analyzed such data to study computer security [12, 17, 23],
very few focused on facilitating learning. To address this issue, we want
to develop and evaluate methods for providing players with automated
personalized feedback about their progress. For this, we need to define
a model of the game levels, understand how players interact with the
game and security tools, and create the feedback system. As Figure 1.1
shows, the research problem combines hands-on cybersecurity education
with data analysis techniques.


1.2 Expected Contributions and Impact
Enhancing CTF games with real-time automated personalized feedback
will improve the effectiveness of cybersecurity training on multiple levels.
The feedback system can complement or even partially replace human
teachers. Since automated interventions scale better than manual ones,
more people, especially novice and intermediate learners, will be able to
develop cybersecurity skills. Each learner will proceed at an individual
pace and receive feedback with higher accuracy than from teachers,
who act based on limited data. Moreover, the feedback system will
address the needs of players who may require help, for example, by
providing hints, explanations, or relevant study materials. This would
allow learners to recognize mistakes, learn from them, and then improve
their approach. Since cybersecurity tools are becoming increasingly
complex, relevant feedback will ultimately help learners accomplish
practical work tasks.
 Apart from helping students at all learning levels, our results will
have a broad impact also on cybersecurity instructors, game designers,
and educational researchers. Instructors will gain deeper insight into
the difficulties of the learners, enabling them to support learners more
effectively. Game designers will gather evidence for improving the games
and enhance the experience of future players. Finally, researchers will
explore trends in the game data across groups of players or different
games. What is more, since many other cybersecurity games involve
similar types of player interactions, our methods could be generalized
and applied also in other domains.

1.3 Structure of the Thesis Proposal
This thesis proposal is divided into five chapters. Chapter 2 describes
the current state of the art in cybersecurity education, analysis of
educational data, and related areas. Chapter 3 defines the research
problem, research questions, and methods. It also presents the plan of
the work and lists relevant publication venues. Chapter 4 summarizes the
results already achieved. Finally, Chapter 5 lists my published papers,
three of which are included in the appendix.

2 State of the Art
This chapter provides the necessary background and a survey of related
research findings. Section 2.1 covers in depth the topic of CTF as the core
of this thesis proposal. Section 2.2 deals with approaches to educational
data analysis, the main research area for this work.
 It is important to note that cybersecurity education is a relatively
young field. ACM/IEEE Computer Science Curricula [24] included
Information Assurance and Security as a knowledge area only in 2013.
Research in cybersecurity education is fragmented, and there is no
single comprehensive resource, such as a monograph or journal series.
Therefore, when writing this chapter, I read papers published at the
related conferences and journals (see Section 3.5). I focus especially on
the most recent publications from 2013 to 2018.

2.1 Capture the Flag Games
Since this work is interdisciplinary and overlaps with educational re-
search, this section starts with a broader context of using games in
education. It narrows down to cybersecurity as it continues with a brief
history and definition of CTF, its typology along with examples, and a
discussion of the required technical infrastructure. Finally, the section
examines the use of CTFs for competitions, education, and research.

2.1.1 Games in Education
When it comes to gaming approaches, education can be enhanced by
gamification or serious games. The former is defined as “the use of
game design elements in non-game contexts” [25]. The latter refers to
full-fledged games designed for a primary purpose other than entertain-
ment [26] (in this context, to teach knowledge or skills). Using gamifica-
tion or serious games in education is a form of active learning [27], and
the latter is sometimes called (digital) game-based learning [28].
 Cybersecurity educators applied gamification and serious games
with great success. Enhancing an undergraduate cybersecurity course
with game elements such as storyline, real-time scoring, and badges
deepened student interest and motivation [29]. In another course, gam-
ification reinforced student experience and engagement [30]. Multiple
case studies of serious cybersecurity games report their positive effects
on learning. Apart from CTFs described later, these games include
Netsim, a web-based game to teach network routing [31]; CyberCIEGE,
a game with 3D graphics to simulate cybersecurity decision-making at
a managerial level [32]; and Werewolves, a text-based game to demon-
strate information flow and covert channels [33]. Even board and card
games were developed to teach cybersecurity principles. These games
include Elevation of Privilege, a game to teach threat modeling [34];
Control-Alt-Hack, a game to promote the cybersecurity field [35]; and
[d0x3d!], a game to teach cybersecurity principles [36].
 The benefits of gaming approaches, which were mentioned above
and in Chapter 1, are also supported in pedagogical theory and research.
Studies confirm that students who play serious games exhibit higher
flow and attainment compared to lectures [37, 38]. Generally, games
promote learning and motivation, but their positive effects also depend
on the context and the target audience [39]. For the advantages of
serious games to manifest, elements of story, interactivity, and adequate
delivery of educational content are vital [40].

2.1.2 Origins and Definition of CTF
The term Capture the Flag originally refers to a traditional outdoor
game for two teams. The goal of each team is to steal a physical flag from
the other team’s base while defending their own flag at the same time. This
game format later inspired the organizers of the hacker conference DEF
CON [41] when creating a virtual playground for exercising cybersecurity
skills. In 1996, DEF CON started a tradition of cybersecurity CTFs
that is still evolving today.
 Since then, the label CTF has been used to denote a broad spectrum
of events [23] with a different scope, structure, and variations of rules.
This sometimes led to ambiguous interpretations of the term. Therefore,
based on a thorough literature review below, we propose the following
definition to unify the terminology. CTF is a remote or on-site training
activity in which participants exercise their cybersecurity skills by solving
various technical tasks in a limited time. Completing each task results
in finding (“capturing”) a text string called flag. The flag serves as a
proof of solution that is worth points. Therefore, the flags are usually
long and random to prevent cheating. The team with the most points
at the end of the game wins. CTF games can run in one of three modes,
Attack-defense, Attack-only, or Jeopardy, which are detailed below.

2.1.3 Attack-defense CTF
In an Attack-defense CTF [17, 23], the organizers prepare a (virtual)
network of hosts that run intentionally vulnerable services. Each partic-
ipating team controls an instance of this network with identical services.
The goal is to attack the networks of other teams and defend their own
network at the same time. Attacking involves exploiting the vulnerabil-
ities in the services of other teams, which results in gaining access to
secret flags. Defending involves patching the same services on their own hosts
without breaking their functionality. A scoring bot regularly grades the
teams based on a combination of submitting correct flags, applying de-
fensive countermeasures, and maintaining the availability of the services.
Examples of scoring systems are detailed in [20, 23, 42].
 Attack-defense was the first type of CTF [43]. Since its inception
in 1996, DEF CON CTF [41] has been hosted annually as an on-site
event. Next, iCTF [17] is the largest Attack-defense CTF focused on
cybersecurity education, which has been running online every year since
2003. Apart from the US-based events, the Russian RuCTFE is one of
the biggest online Attack-defense CTFs held annually [16].
 An Attack-only CTF is a subcategory of Attack-defense CTF. The
defensive elements are removed from the game, and the teams focus
only on exploiting services in the given network infrastructure. The
term Defense-only CTF appeared in the literature [23, 44], but is rare
in practice. Instead, Cyber Defense Exercises serve for defense training.
Still, offensive and defensive skills are closely related, often blurring the
line between attacking and defending [45, 46].

2.1.4 Jeopardy CTF
A Jeopardy CTF imitates the format of the popular television game show
“Jeopardy!” [15]. It features an online game board with many different
standalone assignments1 called challenges [13, 47]. The challenges are
divided into categories; the five most common are cryptography, web
security, reverse engineering, forensics, and pwn (a hacker jargon for
gaining ownership of a service) [48]. Each challenge has different difficulty
and a corresponding score value. At any time, each team can choose to
attempt any challenge2, which typically includes downloadable files [49]
and is solved locally. A successful completion yields a flag that is
submitted to a scoring server to confirm the solution.
 Jeopardy CTFs are a part of informal competitions, academic events,
and professional courses. One of the first Jeopardy CTFs arose again
within the DEF CON community [41]. Every year since 2002 [50],
DEF CON CTF Quals has determined advancement to the Attack-
defense finale. This event inspired a multitude of other informal CTFs,
such as Plaid CTF [51] running since 2011. Even Google started its
annual CTF in 2016 [52]. Inter-ACE and C2C target university students,
and extensive experience report from the organizers is available [53].
Next, CSAW CTF [54] is a well-established entry-level CTF hosted
by academics. Since 2007, it has offered challenges for undergraduates
who are beginners in cybersecurity and CTF [55]. Another introductory
CTF is PicoCTF [56] running since 2013 for middle- and high-school
students. Last but not least, private companies, such as SANS, create
Jeopardy CTFs for certified security training [57]. The vast majority of
Jeopardy CTFs, including all those previously named, are held online.

1. Since the assignments are usually of an offensive nature, some authors regard Jeopardy CTFs as a subcategory of Attack-only CTFs [23]. However, the following distinction is more practical: the tasks in Attack-defense and Attack-only CTFs are carried out in an underlying network infrastructure, whereas in Jeopardy CTFs, the tasks are predefined in a web interface or a virtual machine.
2. All challenges are usually released at the start of the game. However, unlocking new ones at a predefined time or based on solved challenges is not uncommon.

2.1.5 Technical Infrastructure for CTF
Regarding platforms for Attack-defense CTFs, iCTF framework [17] is
an open-source tool to build virtual machines (VMs) for the games. It
was later developed into a website that offers on-demand creation of
CTFs [16]. The service runs in a cloud and features a library of vulnera-
bilities that can be included in the VMs. Alternative approaches include
Git-based CTF [58], an open-source Attack-defense CTF platform, or us-
ing Docker application containers to create the game infrastructure [59].


 There are many open-source platforms for Jeopardy CTFs. A well-
established one is CTFd [47], which allows creating and administering
Jeopardy challenges via a web browser. The framework is documented
and customizable with plugins. PicoCTF [56] developed its own platform
similar to CTFd. It was later enhanced with the generation of unique
flags or problem instances [60]. The former serves to prevent and detect
flag sharing, while the latter allows creating practice problems. Finally,
it is possible to share offline VMs with Jeopardy challenges [13].

2.1.6 Advantages and Disadvantages of Competitive CTF
The original purpose of CTF was competitive [47], and most CTFs
remain “highly focused on competition” [12]. Similarly to programming
contests [61], their goal is to showcase and evaluate the performance
of already skilled participants [62]. Competitive CTFs cover many cy-
bersecurity topics [10] and offer recruitment opportunities, reputation
building [10], and enjoyment of competing [13] to the participants. Next,
a competitive setting can motivate and engage students [29, 13], espe-
cially those who are attracted to cybersecurity, have extensive prior
experience, or possess skills required by the competition [11]. By solving
the competition tasks, participants deepen their understanding of cyber-
security [22] and practice creative approaches to both known problems
and those outside the traditional curriculum [56]. Moreover, competi-
tions offer considerable learning benefits also before and after the event.
Preparing for a CTF involves developing new tools, studying vulnerabil-
ities, and discussing strategies [17], which exposes participants to new
skills [15]. After a CTF, the competitors or organizers publish write-
ups: walkthroughs that report solutions and explain the vulnerabilities
involved in the game. Both writing and reading these are beneficial [16].
 Some authors argue that the effectiveness of cybersecurity com-
petitions is not researched thoroughly [11], and that the evidence of
engagement and motivation for learning is often anecdotal [10]. Although
CTFs have vast educational potential, their competitive setting might
discourage or even alienate other students [12, 63], especially begin-
ners [11], for three main reasons. First, the tasks are usually too difficult
for less-experienced participants [20]. Second, some of the tasks are also
intentionally ambiguous, require a lot of guessing, or include artificial
obstacles to make them harder to solve [55]. Third, the participants
receive limited individual feedback about their progress. They are often
unsure if they are on the right track and usually receive only information
about whether the submitted flag was correct or wrong [55].
 While the properties mentioned above are often suitable for compe-
titions, they also create a large barrier to entry. Competitions do not
attract many new participants, possibly leaving many talents uniden-
tified and undeveloped [10]. The features of competitions can even
deter novices from pursuing cybersecurity knowledge [64]. Even if less
experienced players decide to participate, they may quickly become
discouraged [55] or frustrated because of performing poorly [11]. Finally,
although the unguided progress inherent for competitions suits advanced
learners and can lead to creative solutions [65], it is highly ineffective
for beginners [66]. Without guidance, novice students miss essential
learning goals and take longer to learn a concept [22].

2.1.7 Towards an Educational Use of CTF

Only one can win a competition, but everyone can meet a challenge [21].
Therefore, educators can leverage the format of CTF for a self-paced,
individual, hands-on practice of cybersecurity skills without the overly
competitive setting. This would preserve most of the advantages men-
tioned above without alienating beginners [12]. Some educators host
CTFs with simpler tasks that are more suitable for beginners [67, 64].
However, to further unfold the educational potential of CTFs, learners
must receive formative feedback (see Section 2.2.2) in the game.

2.1.8 Using CTF for Cybersecurity Research

Apart from their value in competitions and education, CTFs can generate
realistic datasets for research [68]. This data was employed to study
cybersecurity itself [12, 17, 23] (not cybersecurity education, which is
discussed in Section 2.2). Examples of such cybersecurity research include
measuring network traffic during attacks, exploring the attack mechanics,
or testing prototype tools. Moreover, the iCTF team leveraged this data
to measure the effectiveness of attacks [42] or analyze the risks and
rewards associated with the players’ strategy [69]. The datasets from
iCTF are public [70], as well as from DEF CON CTF [41].


2.2 Computer Science Education Research
This section discusses approaches to educational data mining (EDM) [71]
and learning analytics (LA) [72]. These are applied computer science
disciplines that leverage student data to better understand learning and
teaching, and ultimately, improve it [73]. EDM and LA significantly
overlap, and differences in their philosophies are minor [74]. Both are
interdisciplinary fields that combine educational theory and practice with
inferential statistics, data analysis, and machine learning. They allow
a shift from subjective teaching and learning to evidence-based, data-
driven approaches [75]. We examine motivation, foundations, and recent
findings related to the goals of this thesis. However, EDM/LA research
in the domain of cybersecurity is sparse. Therefore, this section also
mentions works from other domains: most notably programming, which
comprises the majority of EDM/LA studies in computer science [76].

2.2.1 General Approaches
EDM/LA studies usually build a model of student data for further
analysis. These models can be descriptive or predictive [77]. Descriptive
models aim at explaining structure, patterns, and relationships within
the data to address student modeling, support, and feedback. They
usually apply unsupervised learning algorithms (such as clustering),
inferential statistics, association rules, or instances-based learning. Pre-
dictive models aim at estimating “unknown or future values of dependent
variables based on the features of related independent variables” [77] to
address student behavior modeling and assessment. They usually apply
supervised learning algorithms (such as regression and classification),
decision trees, or Bayesian networks.
 Traditionally, EDM/LA studies often relied on collecting additional
information about learners, such as their demographics, previous experi-
ence, or academic performance, via questionnaires [78]. This paradigm is
apparent also in studies that evaluated some aspect of CTFs, such as par-
ticipant learning or engagement. The evaluation was almost exclusively
based on informal and often self-reported participant data [12] from
surveys and focus group interviews, as in [20, 62, 79, 80, 81, 82, 83]. An-
other traditional approach was comparing pre-test and post-test scores,
as in [64, 84]. While both these approaches have merit in educational
research, they also have major shortcomings. Self-reported data can be
inaccurate3, while test scores strongly depend on the test design.

3. CTF participants may report behavior that is not reflected in the game logs and, conversely, deny behavior that the logs do show [85].

 Nowadays, it is becoming increasingly common to examine student
data produced while solving assignments [78], such as program code,
and the corresponding metadata, such as the time spent on a task.
This type of research involves four cyclical steps: collect data from a
learning platform, analyze it, design a learning intervention, and deliver
it within the platform [73]. We will employ a similar approach in a
rigorous analysis of data available from CTFs.

2.2.2 Formative Feedback
Formative feedback (also called formative assessment) is “information
communicated to the learner. . . to modify his or her thinking or behavior
to improve learning” [86]. An example of formative feedback is inform-
ing students who struggle with a task about their misunderstandings
and recommending concrete steps for improvement. Unlike summative
assessment, which refers to evaluating a student’s performance with a
grade or points after finishing a task, formative feedback is useful while
the student is still learning [21, p. 480].
 Although most practical computer science courses involve students in
extensive problem-solving, completing as many assignments as possible
does not necessarily promote learning [87]. Pedagogical theory [88] and
cybersecurity educators [76] agree that formative feedback is another
crucial element. It helps students to deeply engage with the subject [13],
correct misconceptions, and improve understanding. Perhaps surpris-
ingly, students are more motivated to improve their work when they
receive formative feedback without the summative one [89]. Forma-
tive feedback is especially vital in serious games [90], since it deepens
understanding and separates educational games from play [18].
 Nevertheless, assessing student performance on cybersecurity as-
signments is a difficult task [76] that involves collecting and analyzing
evidence about student achievement [91]. Quality feedback requires
domain experts, few of which are available for this task [92]. Even then,
providing feedback manually is laborious, time-consuming [13], and
costly [89], and thus becomes infeasible even in moderately large classes.
If instructors assess learners manually, the feedback is often sparse or
delayed. Therefore, there is a great need to automate the process.

2.2.3 Intelligent Tutoring Systems
An intelligent tutoring system (ITS) provides automated feedback to
learners while they solve computer-based assignments [73]. The system is
based on domain knowledge, since the feedback results from comparing
the learner’s problem-solving process with the expert’s solution. An ITS
aims to automate teaching by replacing instructor feedback [73], and
in STEM4 disciplines, an ITS can be as effective as human tutors [93]
and increase student achievement [94]. However, supplementing the
instructor is not always needed. Instead, having an ITS analyze edu-
cational data can enhance classroom instruction by providing teachers
with greater insight into students’ problem-solving processes [75].

4. STEM is an acronym for Science, Technology, Engineering, and Mathematics.

2.2.4 Challenges to Providing Feedback in Cybersecurity
Cybersecurity games could incorporate an ITS, since they allow gather-
ing rich data that can be automatically analyzed. This data includes
information about the game network (such as the status of the services),
player interactions with the game systems (such as typed shell com-
mands), or generic game events (such as flag submissions). However,
achieving the desired level of feedback automation is extremely com-
plex, because educational game logs consist of vast numbers of possible
actions, observable variables, and their relationships to student perfor-
mance [95]. Another challenge is that the game tasks have multiple
paths to the correct solution [96, 76].
 Since providing detailed feedback is complex, students receive only
summative feedback in CTFs and most cybersecurity exercises. They
are informed whether they reached the correct answer or not, and are
possibly awarded points. Although this feedback is easy to automate,
it is insufficient for educational purposes [76]. It disregards important
information about the process of finding the answer, that is, how the stu-
dent approached a particular task. Without this information, instructors
or computer systems are unable to provide formative feedback. What
is more, negative binary feedback can demotivate beginners, mainly
because it does not explain what was wrong and how to fix it [97].

2.2.5 Employing Command Line History for Feedback
Gathering command-line history of learners solving cybersecurity tasks
is essential to provide automated formative feedback. Collection of
Bash history (including timestamps of commands, their arguments,
and exit status) is implemented, to the best of our knowledge, only in
the EDURange platform for cybersecurity exercises [98]. The platform
can automatically generate an oriented graph that visualizes the Bash
history. The vertices of the graph represent the executed commands.
The edges represent the sequence of commands, that is, an edge from
a command to means that was executed after . Instructors can
use the graphs in real time to check how the students progress, what
mistakes do they make, and whether they need extra guidance. A post-
exercise use case would be to compare the graphs to each other, examine
the pros and cons of different approaches, or compare them to a sample
solution. This helps students to understand what they did well, identify
misconceptions, and discover better approaches.
 Creating the graphs is explained in [76]. Starting from the raw
Bash history log, the instructors identify primary commands essential
to solving the particular exercise (e.g., nmap), along with secondary
commands that are informative about the student’s progress but are
not specific to the exercise (e.g., grep). Then, to reduce the complexity
of the lengthy log, they group commands with similar arguments into
a single vertex in the graph. Therefore, a subgraph can correspond to
a particular high-level task. Lastly, the authors compare the students’
quantitative results (to what extent they reached the solution) with a
qualitative analysis of patterns in the corresponding command graphs.
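
To make the construction concrete, the following sketch builds such a graph from a chronological list of commands. It is illustrative code of our own, not the EDURange implementation, and it groups commands only by name (the first reduction step described above):

    from collections import defaultdict

    def command_graph(history):
        """Build an oriented graph from a chronological list of shell commands."""
        edges = defaultdict(int)
        for earlier, later in zip(history, history[1:]):
            # Group commands by name (the first token) to reduce complexity.
            u, v = earlier.split()[0], later.split()[0]
            edges[(u, v)] += 1  # edge weight: how often v directly followed u
        return edges

    history = ["nmap -sS 172.18.1.14",
               "nmap -sV 172.18.1.14",
               "hydra -l yoda -P pass.txt ssh://172.18.1.14"]
    print(dict(command_graph(history)))
    # {('nmap', 'nmap'): 1, ('nmap', 'hydra'): 1}

Grouping commands with similar arguments, as described above, would refine these vertices further.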

2.2.6 Ideal Feedback
Formative feedback in a cybersecurity game should include a person-
alized breakdown [15] of the player’s actions and an explanation of their
effects. This would allow learners to recognize mistakes and learn from
them. The feedback can also include hints, for example, in the form of
explanations of concepts, links to reference materials, or information
about the flag format [55]. Similarly helpful are indicators that encour-
age the player to continue with a correct approach [55] or prevent him/her
from pursuing a blind path. All these aspects can act as an instructional
scaffolding [98, 22] that guides the player to maximize learning. At the
same time, it is necessary to keep in mind that too much guidance can
degenerate into “cookbook instructions”, which students can blindly follow
without understanding and thus learn nothing [98].

2.2.7 Related Research in Cybersecurity Education
An open research area is exploring tools and methods learners use to
solve CTF tasks [12]. Only a few studies addressed it to date. One study
explored players’ behavioral patterns in a Jeopardy CTF. Participant
success positively correlated with time to solve the challenges and nega-
tively with challenge abandonment, Internet searching, and switching
between tools or challenges [99]. Recognizing this behavior can be a
basis for alerting instructors about students who experience difficulties.

2.2.8 Education Research in Other Domains
The largest body of R&D in computing education is carried out in
the domain of introductory programming. The focus is on summative
assessment; especially automated grading received attention in scientific
studies [100] as well as commercial software [101, 102, 103]. However, on-
line programming tutorials and environments lack personalized formative
feedback [96]. They usually provide only shallow feedback [104] focused
on the program correctness based on automated tests [89, 96, 105, 106].

Exploring Errors and Correct Solutions
Educators call for exploring ways of providing in-depth feedback and
guidance to beginners [104]. A step in this direction is examining stu-
dents’ errors and mistakes. A typology of errors in Java code was used to
analyze student compilations and determine the most common types of er-
rors, their time-to-fix, repetitions, and spread [107]. In [108], the authors
collected multiple correct student solutions to the same programming
problems. They then used thematic analysis to categorize differences in
syntax, structure, and style of the correct solutions. In [95], clustering
identified player solution strategies. It showed actions that contributed
to the solution and also revealed error patterns. If multiple correct solutions
were possible, the authors calculated student preference for particular
solutions.

Generating Hints and Feedback
Misconception-Driven Feedback [92] is a model for which instructors pre-
define common programming misconceptions. These can be discovered
in interviews [109], students’ solutions to assignments [100], or recorded
incremental changes in code [110]. The model then analyzes a student’s
code to detect syntax and semantic errors (based on compiler error
messages) and logical errors (based on output checking by unit tests or
code pattern matching to common errors). For each error, instructors
define feedback messages shown to students to explain where and why
the misconception occurred. In [111], similar feedback messages were
displayed directly in the programming environment.
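
In essence, the model matches a submission against an instructor-defined table of misconceptions. The following sketch only illustrates this idea; it is not the implementation from [92], and the patterns and messages are invented for the example:

    import re

    # Instructor-defined misconceptions: an error pattern and the feedback
    # message explaining where and why the misconception occurred.
    MISCONCEPTIONS = [
        (re.compile(r"cannot find symbol"),
         "You are using a name that was never declared. Check spelling and scope."),
        (re.compile(r'== *".*"'),
         "Strings are compared with equals(), not ==, which compares references."),
    ]

    def feedback_for(compiler_output, source_code):
        """Return feedback messages for all misconceptions detected in a submission."""
        return [message for pattern, message in MISCONCEPTIONS
                if pattern.search(compiler_output) or pattern.search(source_code)]

    print(feedback_for("error: cannot find symbol",
                       'if (name == "yoda") { ... }'))
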
 A related research area is automated hint generation, which was
studied in the domain of introductory programming [112, 113]. A graph
with all solution paths of previous students was created in [114]. A hint
corresponded to the path from a given node towards the goal. What
is more, instructors can annotate hints [110] to provide higher-quality
feedback [92]. However, these data-driven approaches suffer from a
typical slow-start problem [92]. An exception is employing historical
data of previous students: in [96], data from only ten students sufficed
to generate hints for more than 90% of students.
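
To illustrate the path-based idea, the sketch below derives a hint as the next state on a shortest path from the player's current state to the goal. It is a simplified reconstruction under assumed data structures, not the system from [114]:

    from collections import deque

    def next_hint(graph, current, goal):
        """Return the next state on a shortest path from current to goal.

        graph maps each state to the states reachable in one step,
        aggregated from the solution paths of previous students.
        """
        queue = deque([(current, [])])
        visited = {current}
        while queue:
            state, path = queue.popleft()
            if state == goal:
                return path[0] if path else None  # None: already at the goal
            for successor in graph.get(state, []):
                if successor not in visited:
                    visited.add(successor)
                    queue.append((successor, path + [successor]))
        return None  # the goal is unreachable from the current state

    graph = {"start": ["scanned"], "scanned": ["cracked"], "cracked": ["goal"]}
    print(next_hint(graph, "start", "goal"))  # scanned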

3 Research Aims and Methods
The chapter starts with an overview of the research questions and
expected results in Section 3.1. Then, it explains the research environ-
ment in Section 3.2. This establishes the ground for Section 3.3, which
proposes a method for each research question. It also discusses the
evaluation of the results and limitations of the approaches. Section 3.4
presents a time plan for the research steps. Finally, Section 3.5 lists
conferences and journals relevant for publishing the results.

3.1 Research Questions and Expected Results
The research aim of this work is exploring pathways to providing auto-
mated formative feedback to players of cybersecurity games. Specifically,
we want to develop and evaluate a virtual learning assistant for CTF
games. To achieve this goal, we will explore the following three research
questions.

RQ1: How to model paths to the solution of a game level?
RQ2: How to improve the accuracy of the model by extending it with
 interaction patterns of past players?
RQ3: How to employ the model to provide automated formative feedback
 to current players?

 We expect three main contributions from answering these questions.
First, we will describe a general approach to modeling game levels
and apply it in practice. Second, we will propose a taxonomy of errors
and perform an exploratory study to discover common issues that
learners face. Third, we will create a system for providing automated
personalized hints and feedback. These results will improve the players’
learning experience, the instructors’ insight, and the games themselves.
 Although we focus on CTF, logs in other educational games can
be analyzed similarly. Therefore, our results will have a broad impact
in being applicable also in other domains. At the same time, we will
explore the domain-specific question of using cybersecurity tools in the
context of CTF. Since this research is applied, we will identify and
address practical issues and test the results in a realistic use case.


3.2 Research Environment
This section provides a starting point for the research to set the context
for understanding the research methods in Section 3.3.

3.2.1 Cybersecurity Game Format
We develop and run training activities in the form of Attack-only CTFs
for practicing offensive security skills. Each game is played by a single
player who initially controls an attacker machine in a network with
several vulnerable hosts. The player gradually receives security-related
assignments structured into a linear sequence of levels. Each level is
finished by finding the correct flag. There is always only one correct
solution; however, there might be several correct pathways to reaching
it. Finishing a level is awarded by a specified number of points that
contribute to the player’s total score. The game ends when the player
enters the last flag or when (s)he decides to quit.
 The game provides optional scaffolding by offering static hints, which
are predefined by the game’s author. If the player struggles with a level,
(s)he can reveal these hints in exchange for points. It is also possible
to display the complete solution recommended by the game’s author,
in which case the player receives zero points for the level. Since the
game focuses on practicing and learning, the players are usually allowed
to use any materials, and sometimes even discuss their approach with
other players. This setting mimics real-life situations where external
knowledge bases and outside help are available [85].

3.2.2 Target Audience of the Games
Our CTF games are intended for adults with IT background who want to
broaden their technical skills. Specifically, we focus on computer science
university students and cybersecurity professionals. In the vast majority
of cases, we have no additional information about the educational
background or experience of the learners.

3.2.3 Technical Infrastructure
The games are hosted in the KYPO cyber range [115, 116], a virtual
training environment based on computational resources of CERIT Sci-
entific Cloud [117]. KYPO can emulate arbitrary networks of virtual
hosts, each of which can run a broad range of operating systems and
applications [85]. The hosts are sandboxed, which provides an isolated
and controlled environment for safe execution of cyber attacks [118]. The
KYPO environment and the games in it were created by CSIRT-MU, the
Computer Security Incident Response Team of Masaryk University [119].

3.2.4 Data Collection
The uniform game format allows us to collect generic events from the
game portal, regardless of the topic of the particular game. The game
events describe the player’s interaction with the game interface. There
are seven types of game events: starting the game, ending the game,
starting a level, ending a level (by submitting a correct flag), submitting
an incorrect flag (and its content), taking a hint (and its number), and
displaying a solution to the level. Each game event is logged with the
corresponding player ID and a timestamp.
 In the vast majority of games, the players work with command-line
tools, mostly with the penetration testing toolkit in Kali Linux [120]. In
addition to the game events, the KYPO cyber range allows retrieving
command history from sandboxes of individual players. Below is an
example of a command log. The player attempts to use the hydra
tool to crack the password of a user yoda and makes several errors
before reaching the solution. The commands are sequentially executed
“one-liners”, and we can determine the timestamp for each.

 2018-08-20 16:53:02 # hydra -l yda -P pass.txt
 2018-08-20 16:53:15 # hydra -l yoda -P pass.txt
 2018-08-20 16:53:34 # hydra -h
 2018-08-20 16:57:28 # hydra -l yoda -P pass.txt ssh:172.18.1.14
 2018-08-20 16:57:54 # hydra -l yoda -P pass.txt ssh://172.18.1.14
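
For later analysis, such a log must first be parsed into timestamped records. Below is a minimal sketch, assuming exactly the line format shown above (the actual KYPO tooling may differ):

    from datetime import datetime

    def parse_log_line(line):
        """Split one log line into a timestamp and the executed command."""
        stamp, command = line.split(" # ", 1)
        return datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S"), command.strip()

    timestamp, command = parse_log_line(
        "2018-08-20 16:53:02 # hydra -l yda -P pass.txt")
    print(timestamp, command)  # 2018-08-20 16:53:02 hydra -l yda -P pass.txt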

3.2.5 Properties of the Data
A single session of a KYPO CTF game is intended for up to 30 people
due to resource constraints. The games feature tasks whose solution
is often unclear at first and requires completing several sub-tasks. As
a result, we can gather a rich and detailed dataset in each CTF game
that we can explore deeply. Each player interacts with the game for as
long as two hours, generates dozens of game events, and enters up to a
hundred commands.
 Compared to systems for learning basic programming, CTF games
are arguably more heterogeneous. Introductory programming tasks
combine only several syntactic blocks, such as conditionals and loops.
Cybersecurity tasks, on the other hand, require applying a multitude of
specialized tools, several of which may fit the task. Both systems are
similar in that they allow collecting data about solutions and the associated
metadata.

3.2.6 User Privacy and Ethical Concerns
The data we collect neither contain any personally identifiable informa-
tion nor can such information be inferred from them. All player IDs
are either randomly generated or replaced with sequentially increasing
numbers before further analysis. We do not associate any personal data
with these IDs. As a result, player records are completely anonymized,
and tracking the same player across different sessions is impossible.
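
A minimal sketch of the sequential-numbering scheme (illustrative only; the real pipeline may differ):

    def anonymize(events):
        """Replace player IDs with sequentially increasing numbers.

        The mapping is discarded after use, so the records cannot be
        linked back to a person or across game sessions.
        """
        mapping = {}
        for event in events:
            original = event["player_id"]
            event["player_id"] = mapping.setdefault(original, len(mapping) + 1)
        return events

    events = [{"player_id": "a7f3", "event": "game_started"},
              {"player_id": "b2c9", "event": "level_started"},
              {"player_id": "a7f3", "event": "hint_taken"}]
    print(anonymize(events))  # player IDs become 1, 2, 1
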
 We purposefully do not store individual keystrokes, since typing
patterns can identify users [121]. Similarly, we avoid collecting other
sensitive data, such as eye movements [122], audio recordings, or heart
rate [123], all of which were employed in computing education research.

3.3 Research Methods
To address the research questions posed in Section 3.1, we will first define
a model of expected paths to the solution of a game level. Second, we
will update the model with the data about actions of previous players
who completed the level. Third, we will provide dynamic hints and
feedback to future players based on their position in the model. Now,
let us examine the research methods in detail.

3.3.1 Modeling Game Levels (RQ1)
We need to define a comprehensive representation of all the feasible
progressions throughout a game level that lead to the solution. The
model must allow tracking the progress of individual players so that if the
player is stuck, (s)he can receive helpful automated hints. The challenge
is to define a model that is abstract enough to suppress unimportant
details but preserves interesting properties for analysis [124].

Theoretical Specification of the Model
We will start by having a game designer model the solution to a particular
game level. The model will be a transition graph G = (V, E) that is
finite, directed, and acyclic. The vertices represent the current state of
a player. There are two types of states: a knowledge state and an action
state.
 A knowledge state k ∈ K represents the current knowledge of a
player. The first knowledge state k1 represents the information given to
the player at the start of the level. The last knowledge state represents
finding the flag. The ones between represent partial knowledge resulting
from completing the sub-tasks.
 An action state a ∈ A represents a tool that is relevant for completing
a task. The player can use it to advance to another knowledge state. We
will not assume an arbitrary tool, but limit the model to the tools that
are pre-installed in the attacker virtual machine. In the early stages of
the research, we will also constrain ourselves to command-line tools. To
maintain a reasonable complexity of the model, we will not store the
arguments of the commands directly in the model, but instead associate
them with the action state (see later in this section).
 Together, the knowledge and action states form the whole set of
states, that is, V = K ∪ A. They capture situations that the player
can reach during the game. Finally, the edges E represent transitions
between states. A direct transition between two knowledge states is not
permissible, as there must always be at least one action performed.

Example Model
To show an example, assume the following simplified game level. In
the beginning, the player is informed about a server with weak login
credentials, and the objective is to access it. To accomplish this task,
the player must scan the server’s ports to reveal that the TCP port
22 is open on the workstation, running the SSH service. Having a list
of common usernames and passwords, the player must then execute a
dictionary attack, crack the password, and access the workstation.


[Figure: a transition graph from the knowledge state “Knows server” through “Knows SSH port” to “Knows password”; the action states Nmap and Metasploit lead to the second knowledge state, and Hydra, Medusa, and John to the third.]

Figure 3.1: The graph shows an example model for a simplified game
level. The knowledge states are marked with a blue full line. The action
states are marked with a red dashed line.

 The game designer can model this level as shown in Figure 3.1. In
the initial state, the player knows about the existence of the vulnerable
server from the level description. After performing a successful port
scan, for which the player may use Nmap or Metasploit, (s)he discovers
the open ports. Finally, the player executes a dictionary attack using
Hydra, Medusa, or John the Ripper. Accomplishing this task reveals
the password that the player uses to log in.
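
For illustration, this level could be encoded as follows. The sketch is hypothetical; the state names and the plain-set representation are our own choices:

    # Knowledge states (K) and action states (A) of the example level.
    K = {"knows_server", "knows_ssh_port", "knows_password"}
    A = {"nmap", "metasploit", "hydra", "medusa", "john"}
    V = K | A  # the whole set of states

    # Edges: every path alternates knowledge -> action -> knowledge,
    # so two knowledge states are never connected directly.
    E = {
        ("knows_server", "nmap"), ("knows_server", "metasploit"),
        ("nmap", "knows_ssh_port"), ("metasploit", "knows_ssh_port"),
        ("knows_ssh_port", "hydra"), ("knows_ssh_port", "medusa"),
        ("knows_ssh_port", "john"),
        ("hydra", "knows_password"), ("medusa", "knows_password"),
        ("john", "knows_password"),
    }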

Representation of Shell Commands
The executable commands are largely heterogeneous. Therefore, we
need to convert them to a suitable abstract representation. This will
ensure a normalized format that disregards whitespace and the order of
the arguments. For example, a command nmap -sS -T4 123.45.67.89
can be represented in the following JSON structure:

 {
 "command_name" : "nmap",
 "options" : ["sS", "T4"],
 "parameters" : ["123.45.67.89"]
 }


 We will then associate these structures with the corresponding action
states to fully capture the author’s solution.
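
A minimal sketch of such a normalization step (illustrative only; a real implementation would also have to handle quoting, option arguments, and tool-specific syntax):

    def normalize(command_line):
        """Convert a one-liner into a normalized structure that disregards
        whitespace and the order of the arguments."""
        name, *args = command_line.split()
        return {
            "command_name": name,
            "options": sorted(a.lstrip("-") for a in args if a.startswith("-")),
            "parameters": sorted(a for a in args if not a.startswith("-")),
        }

    print(normalize("nmap  -T4 -sS 123.45.67.89"))
    # {'command_name': 'nmap', 'options': ['T4', 'sS'], 'parameters': ['123.45.67.89']}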

3.3.2 Exploring Interaction Patterns of Players (RQ2)
To provide valuable feedback, modeling only the expert’s sample solution
is not enough. The game designer might not capture all the possible
states. Therefore, we need to gradually update the initial model with data
from players’ interactions. After we collect logs of players who completed
a level, we will group them by the player ID, and for each player filter
actions that either contribute to a solution or indicate an error. This
will allow us to discover both new actions (new solution strategies) and
problematic parts. We will then update the model accordingly.

Errors of Players
We are especially interested in errors that players make when using
cybersecurity tools. Our motivation is that observing types of errors
and their repetition is a reliable indicator of performance [125]. Based
on the collected data, we will propose a classification of errors. We will
then statistically analyze and compare the occurrence of errors to create
a knowledge base of common issues associated with action states. This
will help us understand the thinking of learners [126] and thus provide
more relevant feedback on their errors. It might also indicate poorly
designed game levels.
 Although there are many possible errors, describing the most frequent
ones proved to be sufficient [127]. Similarly to [126], we expect that a few
of the most common error types will cover the majority of occurrences. Moreover, we
will examine which errors repeat and how often. One-time errors are
most likely accidental mistakes, whereas repeated errors might indicate
a misconception that needs to be addressed [128]. Repeated errors in a
short time may show guessing or brute-forcing.
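
The following sketch shows one possible heuristic along these lines; the 30-second window is a hypothetical threshold, not an evaluated parameter:

    from datetime import datetime, timedelta

    def classify_error(timestamps, window=timedelta(seconds=30)):
        """Classify timestamped occurrences of one error type by one player."""
        if len(timestamps) == 1:
            return "likely an accidental mistake"
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        if all(gap <= window for gap in gaps):
            return "rapid repetition: guessing or brute-forcing"
        return "repeated error: a possible misconception"

    t = lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    print(classify_error([t("2018-08-20 16:53:02"), t("2018-08-20 16:53:15"),
                          t("2018-08-20 16:53:34")]))
    # rapid repetition: guessing or brute-forcing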

Approaches to Analysis
Although an analysis model exists for programming data [73], most of its
features do not apply in the context of CTF. This is because the model
heavily relies on specifics of programming, such as code editing data,
compilation data, and debugging data. Nevertheless, we will employ
