A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...

Page created by Willard Phillips
A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...
Khraisat and Alazab Cybersecurity     (2021) 4:18

 SURVEY                                                                                                                                                Open Access

A critical review of intrusion detection
systems in the internet of things:
techniques, deployment strategy, validation
strategy, attacks, public datasets and
Ansam Khraisat* and Ammar Alazab*

  The Internet of Things (IoT) has been rapidly evolving towards making a greater impact on everyday life to large
  industrial systems. Unfortunately, this has attracted the attention of cybercriminals who made IoT a target of malicious
  activities, opening the door to a possible attack on the end nodes. To this end, Numerous IoT intrusion detection
  Systems (IDS) have been proposed in the literature to tackle attacks on the IoT ecosystem, which can be broadly
  classified based on detection technique, validation strategy, and deployment strategy. This survey paper presents a
  comprehensive review of contemporary IoT IDS and an overview of techniques, deployment Strategy, validation
  strategy and datasets that are commonly applied for building IDS. We also review how existing IoT IDS detect intrusive
  attacks and secure communications on the IoT. It also presents the classification of IoT attacks and discusses future
  research challenges to counter such IoT attacks to make IoT more secure. These purposes help IoT security researchers
  by uniting, contrasting, and compiling scattered research efforts. Consequently, we provide a unique IoT IDS taxonomy,
  which sheds light on IoT IDS techniques, their advantages and disadvantages, IoT attacks that exploit IoT
  communication systems, corresponding advanced IDS and detection capabilities to detect IoT attacks.
  Keywords: Malware, Intrusion detection system, IoT, Anomaly detection, Machine learning, Deep learning, Internet of
  things, Attacks, IoT security

Introduction                                                                              in increasing attack surface area and probabilities of attacks
Internet of Things (IoT) are interconnected systems                                       will increase. As security will be a vital supporting element
of devices that facilitate seamless information exchange                                  of most IoT applications, IoT intrusion detection systems
between physical devices. These devices could be medical                                  need also be developed to secure communications enabled
and healthcare devices, driverless vehicles, industrial ro-                               by such IoT technologies (Granjal et al., 2015).
bots, smart TVs, wearables and smart city infrastructures;                                  In the last few years, advancement in Artificial Intelli-
and they can be remotely monitored and regulated. IoT                                     gent (AI) such as machine learning and deep learning
devices are expected to become more prevalent than                                        techniques has been used to improve IoT IDS (Intrusion
mobile devices and will have access to the most sensitive                                 Detection System). The current requirement is to do an
information, such as personal information. This will result                               up-to-date, thorough taxonomy and critical review of
                                                                                          this recent work. Numerous related studies applied
* Correspondence: a.khraisat@federation.edu.au; aalazab@mit.edu.au                        different machine learning and deep learning techniques
Federation University Australia, Federation University Australia, Ballarat,               through various datasets to validate the development of

                                          © The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
                                          which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
                                          appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if
                                          changes were made. The images or other third party material in this article are included in the article's Creative Commons
                                          licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons
                                          licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain
                                          permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...
Khraisat and Alazab Cybersecurity   (2021) 4:18                                                                  Page 2 of 27

IoT IDS. But, it’s still not clear that which dataset,            The highly cited survey by Alvarenga et al. (Zarpelao
machine learning or deep learning techniques are more           et al., 2017) reviews the IoT security issues in general.
effective for building an efficient IoT IDS. Secondly, the      Attacks against IoT devices are not discussed in their
time consumed in building and testing IoT IDS is not            studies, such as Denial of Service (DoS) Attack and at-
considered in the evaluation of some IDSs techniques,           tack on RPL (Routing Protocol for Low-Power and Lossy
despite being a critical factor for the effectiveness of ‘on-   Networks). Critical Infrastructure such as power sys-
line’ IDSs (Khraisat et al., 2019a).                            tems, transport, the internet, air traffic control, railways
   This paper provides an up to date taxonomy, to-              and power plants could all be disrupted by an IoT
gether with a critical review of the significant research       attacker. The authors reviewed intrusion detection in
works on IoT IDSs up to the present time; and a                 IoT, and they presented a great taxonomy to classify the
classification of the proposed systems according to             IoT IDSs based on detection method, IDS placement
the taxonomy. It provides a structured and compre-              strategy, and security threat and validation strategy. It
hensive overview of the existing IoT IDSs so that a             was also indicated by Alvarenga et al. (Zarpelao et al.,
researcher can become quickly familiar with the key             2017) in 2017 that intrusion detection for IoT is still in
aspects of IoT IDS. This paper also provides a critical         an initial stage and that the existing IDSs do not enough
review of machine learning and deep learning tech-              for a wide variety of IoT attacks. This paper explored
niques applied to build IoT IDS. The detection tech-            and discussed if the recent IoT IDSs are enough to deal
niques, validation strategies, deployment strategies are        with different IoT attacks.
reviewed, along with several techniques used in each              Existing review articles (e.g., such as (Chaabouni et al.,
method. The complexity of different detection techniques,       2019; da Costa et al., 2019; Buczak & Guven, 2016; Lunt,
intrusion deployment strategy, and their evaluation tech-       1988; Agrawal & Agrawal, 2015)) focus on intrusion de-
niques are discussed, followed by a set of suggestions          tection techniques or dataset issue or type of computer
identifying the best techniques, depending on the nature        attack and IDS evasion. No articles comprehensively
of the IoT IDS. Challenges for the current IoT IDSs are         reviewed IoT IDS, dataset problems, deployment strat-
also discussed. Compared to previous survey publications        egies, IoT Intrusion techniques, and different kinds of
(Khraisat et al., 2019a; Benkhelifa et al., 2018; Chaabouni     attack altogether. In addition, the development of IoT
et al., 2019; Zarpelao et al., 2017; Hindy et al., 2018) this   IDS has been such that several different systems have
paper presents a discussion on IoT techniques, IoT de-          been proposed in the meantime, and so there is a need
ployment strategy and IDS dataset problems which are of         for an up-to-date. The updated survey of the taxonomy
main concern to the research community in the area of           of IoT IDS discipline is presented in this paper further
IoT intrusion detection systems (IDS). Prior studies such       enhances taxonomies given in (Khraisat et al., 2019a;
as (Yang et al., 2017; Yar & Steinmetz, 2019) have not          Benkhelifa et al., 2018; Chaabouni et al., 2019; Liao
completely reviewed IoT IDSs in terms of the datasets,          et al., 2013a).
challenges and techniques. In this paper, we provide a            Given the discussion on prior surveys, this article
structured and contemporary, wide-ranging study on IDS          focuses on the following:
in terms of techniques, IoT attacks and datasets; and also
highlight challenges of the IoT techniques and then make           Classifying various kinds of IoT IDS based on
recommendations.                                                      intrusion techniques, deployment strategy, and
   During the last few years, several surveys on IoT IDS              validation strategy.
have been published. Table 1 shows the IDS techniques                Presenting a recent works effort to improve IoT
and datasets covered by this survey and previous survey               security IDS.
papers. The comparison that in this table discusses the              Taxonomy of IoT attacks.
contributions of each survey related to the develop                  Discussion of available IDS datasets.
intrusion detection system for IoT. The survey on intru-             The challenges of IoT IDS.
sion detection systems and taxonomy by Axelsson
(Axelsson, 2000) classified intrusion detection systems
based on the detection methods. The highly cited survey         Intrusion detection in the internet of things
by Debar et al. (Debar et al., 2000) surveyed detection         In this section, a review of the existing IDS research
methods based on the behaviour and knowledge profiles           for IoT is presented. Each research was categorized
of the attacks. A taxonomy of IoT intrusion systems by          by considering the following characteristics: IDS
Liao et al. (Liao et al., 2013a), has presented a classifica-   placement strategy, detection method, and validation
tion of five subclasses with an in-depth perspective on         strategy. Figure 1 shows the classification of IDS for
their characteristics: Statistics-based, Pattern-based,         IoT networks, while Table 1 provides some recent
Rule-based, State-based and Heuristic-based.                    related research.
A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...
Khraisat and Alazab Cybersecurity

Table 1 Comparison of this survey and similar surveys: (✔: Topic is covered, ✖ the topic is not covered)
Survey                     Intrusion Detection System Techniques                                                                 Attacks   Validation   Deployment   IoT
                                                                                                                                 on IoT    Strategy     Strategy     Dataset
                           SIDS   AIDS                                                                                  Hybrid
                                                                                                                                                                                 (2021) 4:18

                                  Supervised    Unsupervised       Semi-supervised   Ensemble methods   Deep Learning
Lunt (Lunt, 1988)          ✔      ✖             ✖                  ✖                 ✖                  ✖               ✖        ✖         ✖            ✖            ✖
Axelsson                   ✔      ✔             ✖                  ✖                 ✖                  ✖               ✖        ✖         ✖            ✖            ✖
(Axelsson, 2000)
Liao, et al.               ✔      ✔             ✔                  ✖                 ✖                  ✖               ✔        ✖         ✖            ✖            ✖
(Liao et al., 2013b)
Agrawal and                ✔      ✔             ✔                  ✔                 ✔                  ✖               ✔        ✖         ✖            ✖            ✖
Agrawal (Agrawal &
Agrawal, 2015)
Buczak and Guven           ✔      ✔             ✔                  ✖                 ✔                  ✖               ✔        ✔         ✖            ✖            ✖
(Buczak & Guven, 2016)
Zarpelao, et al.           ✖      ✔             ✔                  ✖                 ✖                  ✖               ✖        ✔         ✔            ✔            ✖
(Zarpelao et al., 2017)
Khraisat, et al.           ✔      ✔             ✔                  ✔                 ✔                  ✔               ✖        ✖         ✖            ✖            ✔
(Khraisat et al., 2019a)
This survey                ✔      ✔             ✔                  ✔                 ✔                  ✔               ✔        ✔         ✔            ✔            ✔
                                                                                                                                                                                  Page 3 of 27
A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...
Khraisat and Alazab Cybersecurity        (2021) 4:18                                                            Page 4 of 27

 Fig. 1 Classification of IDSs for IoT

  Figure 1 shows the IDS techniques, deployment strat-         methods are used to find a previous intrusion. In other
egy, validation strategy, attacks on IoT and datasets cov-     words, when an intrusion signature matches the signa-
ered by this paper and previous research papers. The           ture of a previous intrusion that already exists in the sig-
variety in the IoT IDS surveys indicates that a study of       nature database, an alarm signal is triggered. For SIDS,
IDS for IoT must be reviewed. Specifically, none of these      the host’s logs are inspected to find sequences of com-
surveys cover all detection methods of IoT, which is           mands or actions which have previously been identified
considered crucial because of the heterogeneous nature         as malware. SIDS has also been labelled in the literature
of the IoT ecosystem. For that reason, this survey review      as Knowledge-Based Detection or Misuse Detection
IDS for IoT from a broad technological scale.                  (Modi et al., 2013).
                                                                  Figure 2 demonstrates the conceptual working of SIDS
IoT intrusion detection systems methods                        approaches. The main idea is to build a database of in-
IoT Intrusion is defined as an unauthorised action or          trusion signatures and to compare the current set of ac-
activity that harms the IoT ecosystem. In other words,         tivities against the existing signatures and raise the
an attack that results in any kind of damage to the confi-     alarm if a match is found. For example, a rule in the
dentiality, integrity or availability of information is con-   form of “if: antecedent -then: consequent” may lead to
sidered an intrusion. For example, an attack that will         “if (source IP address=destination IP address) then label
make the computer services unavailable to its legitimate       as an attack “.
users is considered an intrusion. An IDS is defined as a          SIDS usually gives an excellent detection accuracy for
software or hardware system that maintains the security        previously known intrusions (Kreibich & Crowcroft,
of the system by identifying malicious activities on the       2004). However, SIDS is unable to detect zero-day at-
computer systems (Liao et al., 2013a). The main aim of         tacks as the database does not contain a matching signa-
IDS is to identify unauthorised computer usage and ma-         ture until the signature of the new attack is extracted
licious network traffic which is not possible while using      and stored. SIDS is employed in a number of common
a traditional firewall. This results in making the com-        tools, such as Snort (Roesch, 1999) and NetSTAT (Vigna
puter systems highly protective against the malicious ac-      & Kemmerer, 1999).
tions that compromise the availability, integrity, or             Traditional methods of SIDS have difficulty in identify-
confidentiality of computer systems. IDS system has two        ing attacks that span multiple packets as they examine
main sub-categories: Signature-based Intrusion Detec-          network packets and perform matching against a data-
tion System (SIDS) and Anomaly-based Intrusion Detec-          base of signatures. With the increased sophistication of
tion System (AIDS).

Signature-based intrusion detection systems (SIDS)
Signature intrusion detection systems (SIDS) utilize pat-
tern matching techniques to find a known attack; these
are also known as Knowledge-based Detection or Misuse
                                                                Fig. 2 Conceptual working of SIDS approaches
Detection (Khraisat et al., 2018). In SIDS, matching
A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...
Khraisat and Alazab Cybersecurity        (2021) 4:18                                                                                   Page 5 of 27

modern malware, extracting signature information from                        based, knowledge-based and machine learning-based
multiple packets may be required. With this, IDS needs                       (Butun et al., 2014).
to bring the contents of earlier packets as well. For                          The main advantage of AIDS is the ability to identify
creating a signature for SIDS, generally, there have been                    zero-day attacks because recognizing the abnormal user
several methods where signatures are created as state                        activity does not rely on a signature database (Alazab
machines (Meiners et al., 2010), formal language string                      et al., 2012). AIDS triggers a danger signal when the
patterns or semantic conditions (Lin et al., 2011).                          examined behavior deviates from normal behavior.
  With the increasing rate of zero-day attacks (Symantec,                    Furthermore, AIDS has a number of benefits. First, they
2017), SIDS techniques have become progressively less                        can discover internal malicious activities. If an intruder
effective because of the absence of signature for any such                   starts making transactions in a stolen account that are
attacks. The other factors such as the polymorphic                           unidentified in the typical user activity, it creates an
variants of the malware and the rising amount of targeted                    alarm. Second, it is challenging for a cybercriminal to
attacks also add up in compromising the adequacy of this                     recognize what is a normal user behavior without produ-
traditional model. A potential solution to this problem                      cing an alert as the system is constructed from custom-
would be to use AIDS techniques. AIDS works by differ-                       ized profiles.
entiating between acceptable and unacceptable behaviour                        Table 2 presents the differences between signature-
rather than profiling what is anomalous, as described in                     based detection and anomaly-based detection. The main
the next section, as described in the next section.                          difference between these two is that AIDS can discover
                                                                             zero-day attacks, whereas SIDS can only detect previ-
Anomaly-based intrusion detection system (AIDS)                              ously known intrusions. However, AIDS can result in a
AIDS has attracted a lot of scholars because of its                          high false-positive rate because anomalies may just be
feature to overcome the limitation of SIDS. In AIDS, a                       new normal activities rather than genuine intrusions.
normal model of the behavior of a computer system is                           Since there is a lack of a taxonomy for anomaly-based
created using machine learning, statistical-based or                         intrusion detection systems, we have identified five
knowledge-based methods. Any significant deviation be-                       subclasses based on their features: Statistics-based,
tween the observed behavior and the model is regarded                        Pattern-based, Rule-based, State-based and Heuristic-
as an anomaly, which can be interpreted as an intrusion.                     based as shown in Table 3.
This kind of technique works on the fact that malicious
behaviour is different from typical user behaviour. The                      Techniques for implementing AIDS
behaviour of abnormal users that differentiates from the                     This section presents an overview of AIDS approaches
standard behaviour is defined as an intrusion. There are                     proposed in recent years for improving detection accur-
two phases in the development of AIDS: the training                          acy and reducing false alarms.
phase and the testing phase. In the training phase, the                        AIDS methods can be categorized into four main
normal traffic profile is used to learn a model of normal                    groups: supervised learning (Chao et al., 2015), unsuper-
behaviour. In the testing phase, a new data set is used to                   vised learning (Elhag et al., 2015; Can & Sahingoz, 2015),
develop the system’s capacity to generalise to previously                    reinforcement learning and deep learning (Buczak &
unseen intrusions. AIDS can be sub-categorized based                         Guven, 2016; Meshram & Haas, 2017). Supervised
on the method used for training, for instance, statistical-                  learning involves collecting and examining every input

Table 2 Comparisons of intrusion detection methodologies
                                    Advantages                                             Disadvantages
Detection methods       SIDS        • Very useful in identifying intrusions with minimum   • It needs to be updated frequently with a new signature.
                                      false alarms (FA).                                   • SIDS is designed to detect attacks for known signatures.
                                    • Promptly identifies the intrusions.                    When a previous intrusion has been altered slightly to a
                                    • Superior for detecting the known attacks.              new variant, then the system would be unable to
                                    • Simple design                                          identify this new deviation of a similar attack.
                                                                                           • Unable to detect the zero-day attack.
                                                                                           • Not suitable for detecting multi-step attacks.
                                                                                           • Little understanding of the insight of the attacks
                        AIDS        • It could be used to detect new attacks.              • AIDS cannot handle encrypted packets, so the attack
                                    • Could be used to create intrusion signature            can stay undetected and can present a threat.
                                                                                           • High false positive alarms.
                                                                                           • Hard to build a normal profile for a very dynamic
                                                                                             computer system.
                                                                                           • Unclassified alerts.
                                                                                           • It needs initial training.
A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public ...
Khraisat and Alazab Cybersecurity           (2021) 4:18                                                                                      Page 6 of 27

Table 3 Detection methodology characteristics for IoT IDS
Detection Methodology                               Examples                                          Characteristics
Statistics based: analyzes the network              Bhuyan, et al. (Bhuyan et al., 2014)              • Needs a large amount of knowledge of statistics
traffic using complex statistical                                                                     • Simple but less accurate
algorithms to process the information.                                                                • Real-time
Pattern-based: identifies the characters,           Liao, et al. (Liao et al., 2013a)                 • Easy to implement
forms, and patterns in the data.                    Riesen and Bunke (Riesen et al., 2008)            • A hash function could be used for identification.
Rule-based: uses an attack “signature”              Hall, et al. (Hall et al., 2009)                  • The computational cost of rule-based systems
to detect a potential attack on the                                                                     could be very high because rules need pattern
suspicious network traffic.                                                                             matching.
                                                                                                      • It is very hard to estimate what actions are going
                                                                                                        to occur and when
                                                                                                      • It requires a large number of rules for determining
                                                                                                        all possible attacks.
                                                                                                      • The low false-positive rate
                                                                                                      • High detection rate
State-based: examines a stream of                   Kenkre, et al. (Kenkre et al., 2015)              • Probabilistic, self-training
events to identify any possible attack.                                                               • Low false positive rate.
Heuristic-based: identifies any                     Abbasi, et al. (Abbasi et al., 2014)              • It needs knowledge and experience
abnormal activity that is out of the                Butun, et al. (Butun et al., 2014)                • Experimental and evolutionary learning
ordinary activity.

variable and an output variable and you use an algorithm                               reinforcement learning methods enable an agent to
to learn the normal user behaviour from the input to the                               learn in an interactive environment by trial and error
output. The objective is to approximate the mapping                                    using feedback from its own actions and experiences.
function so well that when a new input record is                                       In reinforcement learning, the aim is to obtain an
collected that predicts the output variables for that                                  appropriate action model that would maximize the
record. On the other hand, Unsupervised learning                                       total cumulative reward of the agent. Deep learning
tries to identify the requested actions from existing                                  models are based on artificial neural networks, specif-
system data such as protocol specifications and                                        ically convolutional neural networks (CNN)s. These
network traffic instances where you only have input                                    four classes along with examples of their subclasses
data and no corresponding output variables, while                                      are shown in Fig. 3.

 Fig. 3 Classification of AIDS methods
Khraisat and Alazab Cybersecurity    (2021) 4:18                                                                          Page 7 of 27

   Machine learning is the process of extracting know-                learning techniques has been increasing in the last few
ledge from large quantities of data. Machine learning                 years. The main objective of IDS based on machine
models comprise of a set of rules, methods, or complex                learning research is to detect patterns and build an in-
“transfer functions” that can be applied to find interest-            trusion detection system based on the dataset. Generally,
ing data patterns or to recognise or predict behaviour                there are two categories of machine learning methods,
(Dua & Du, 2016). Machine learning techniques have                    supervised and unsupervised.
been applied extensively in the area of AIDS. To extract
the knowledge from intrusion datasets, different algo-                Supervised learning in intrusion detection system
rithms and techniques such as clustering, neural                      This subsection presents various supervised learning
networks, association rules, decision trees, genetic algo-            techniques for IDS. Each technique is presented in de-
rithms, and nearest neighbour methods are utilized.                   tail, and references to important research publications
   Some prior research has examined the use of different              are presented.
techniques to build AIDSs. Chebrolu et al. examined the                 Supervised learning-based IDS techniques detect intru-
performance of two feature selection algorithms involv-               sions by using labeled training data. A supervised learn-
ing Bayesian networks (BN) and Classification Regres-                 ing approach usually consists of two stages, namely,
sion Trees (CRC) and combined these methods for                       training and testing. In the training stage, relevant fea-
higher accuracy (Chebrolu et al., 2005).                              tures and classes are identified and then the algorithm
   Bajaj et al. proposed a technique for feature selection            learns from these data samples. In supervised learning
using a combination of feature selection algorithms such              IDS, each record is a pair, containing a network or host
as Information Gain (IG) and Correlation Attribute                    data source and an associated output value (i.e., label),
evaluation. They tested the performance of the selected               namely intrusion or normal. Next, feature selection can
features by applying different classification algorithms              be applied to eliminating unnecessary features. Using
such as C4.5, naïve Bayes, NB-Tree and Multi-Layer                    the training data for selected features, a supervised
Perceptron (Khraisat et al., 2018; Bajaj & Arora, 2013). A            learning technique is then used to train a classifier to
genetic-fuzzy rule mining method has been used to                     learn the inherent relationship that exists between the
evaluate the importance of IDS features (Elhag et al.,                input data and the labelled output value. A wide variety
2015). Thaseen et al. proposed NIDS by using the                      of supervised learning techniques have been explored in
Random Tree model to improve accuracy and reduce                      the literature, each with its advantages and disadvan-
the false alarm rate (Thaseen & Kumar, 2013). Subrama-                tages. In the testing stage, the trained model is used to
nian et al. proposed classifying the NSL-KDD dataset                  classify the unknown data into intrusion or normal class.
using decision tree algorithms to construct a model for               The resultant classifier then becomes a model that, given
their metric data and studying the performance of                     a set of feature values, predicts the class to which the
decision tree algorithms (Subramanian et al., 2012).                  input data might belong. Figure 5 shows a general
   Various AIDSs have been created based on machine                   approach for applying classification techniques. The
learning techniques as shown in Fig. 4. The main aim of               most existing IDSs proposed are trained in a supervised
using machine learning methods is to create IDS that                  way. It implies that the cybersecurity professional need
requires less human knowledge and improve accuracy.                   to label the network traffic and revise the model manu-
The quantity of AIDS which makes use of machine                       ally from time to time.

 Fig. 4 Conceptual working of AIDS approaches based on machine learning
Khraisat and Alazab Cybersecurity    (2021) 4:18                                                                  Page 8 of 27

 Fig. 5 Classification as the task

  There are many classification methods such as decision         Genetic algorithms (GA) Genetic algorithms are a
trees, rule-based systems, neural networks, support vector       heuristic approach to optimization, based on the princi-
machines, naïve Bayes and k-nearest neighbour. Each tech-        ples of evolution. Each possible solution is represented
nique uses a learning method to build a classification           as a series of bits (genes) or chromosomes, and the
model. However, a suitable classification approach should        quality of the solutions improves over time by the
not only handle the training data, but it should also identify   application of selection and reproduction operators,
accurately the class of records it has not ever seen before.     biased to favour fitter solutions. In applying a genetic
Creating classification models with reliable generalization      algorithm to the intrusion classification problem, there
ability is an important task of the learning algorithm.          are typically two types of chromosome encoding: one is
                                                                 according to clustering to generate binary chromosome
                                                                 coding method; another is specifying the cluster center
Decision trees A decision tree has three basic compo-
                                                                 (clustering prototype matrix) by an integer coding
nents. The first component is a decision node, which is
                                                                 chromosome. Murray et al. have used GA to evolve
used to identify a test attribute. The second is a branch,
                                                                 simple rules for network traffic (Murray et al., 2014).
where each branch represents a possible decision based
                                                                 Every rule is represented by a genome and the primary
on the value of the test attribute. The third is a leaf that
                                                                 population of genomes is a number of random rules.
comprises the class to which the instance belongs
                                                                 Each genome is comprised of different genes that
(Rutkowski et al., 2014). There are many different decision
                                                                 correspond to characteristics such as IP source, IP
tree algorithms, including ID3 (Quinlan, 1986), C4.5
                                                                 destination, port source, port destination and 1 proto-
(Quinlan, 2014) and CART (Breiman, 1996).
                                                                 col type (Hoque & Bikas, 2012).

Naïve Bayes This approach is based on applying Bayes'            Artificial neural network (ANN) ANN is one of the
principle with robust independence assumptions among             most broadly applied machine-learning methods and has
the attributes. Naïve Bayes answers questions such as            been shown to be successful in detecting different mal-
“what is the probability that a particular kind of attack is     ware. The most frequent learning technique employed
occurring, given the observed system activities?” by             for supervised learning is the backpropagation (BP) algo-
applying conditional probability formulae. Naïve Bayes           rithm. The BP algorithm assesses the gradient of the
relies on the features that have different probabilities of      network’s error with respect to its modifiable weights.
occurring in attacks and normal behavior. The naïve              However, for ANN-based IDS, detection precision, par-
Bayes classification model is one of the most prevalent          ticularly for less frequent attacks, and detection accuracy
models in IDS due to its ease of use and calculation             still need to be improved. The training dataset for less-
efficiency, both of which are taken from its conditional         frequent attacks is small compared to that of more-
independence assumption property (Yang & Tian, 2012).            frequent attacks, and this makes it difficult for the ANN
However, the system does not operate well if this inde-          to learn the properties of these attacks correctly. As a re-
pendence assumption is not valid, as was demonstrated            sult, detection accuracy is lower for less frequent attacks.
on the KDD’99 intrusion detection dataset, which has             In the information security area, huge damage can occur
complex attribute dependencies (Koc et al., 2012). The           if low-frequency attacks are not detected. For instance, if
results also reveal that the Naïve Bayes model has               the User to Root (U2R) attacks evade detection, a cyber-
reduced accuracy for large datasets. A further study             criminal can gain the authorization privileges of the root
showed that the more sophisticated Hidden Naïve Bayes            user and thereby carry out malicious activities on the
(HNB) model can be applied to IDS tasks that involve             victim’s computer systems. In addition, less common at-
high dimensionality, extremely interrelated attributes           tacks are often outliers (Wang et al., 2010). ANNs often
and high-speed networks (Koc et al., 2012).                      suffer from local minima and thus learning can become
Khraisat and Alazab Cybersecurity   (2021) 4:18                                                                           Page 9 of 27

very time-consuming. The strength of ANN is that, with          Hidden Markov model (HMM) HMM is a statistical
one or more hidden layers, it can produce highly nonlinear      Markov model in which the system being modeled is as-
models that capture complex relationships between input         sumed to be a Markov process with unseen data. Prior
attributes and classification labels. With the development of   research has shown that HMM analysis can be applied
many variants such as recurrent and convolutional NNs,          to identify particular kinds of malware (Annachhatre
ANNs are powerful tools in many classification tasks            et al., 2015). In this technique, a Hidden Markov Model
including IDS.                                                  is trained against known malware features (e.g., oper-
                                                                ation code sequence) and once the training stage is
Fuzzy logic This technique is based on the degrees of           completed, the trained model is applied to score the in-
uncertainty rather than the typical true or false Boolean       coming traffic. The score is then contrasted to a prede-
logic on which the contemporary PCs are created.                fined threshold, and a score greater than the threshold
Therefore, it presents a straightforward way of arriving        indicates malware. Likewise, if the score is less than the
at a conclusion based upon unclear, ambiguous, noisy,           threshold, the traffic is identified as normal.
inaccurate or missing input data. With a fuzzy domain,             K-Nearest Neighbors (KNN) classifier: The k-Nearest
fuzzy logic permits an instance to belong, possibly par-        Neighbor (k-NN) technique is a typical non-parametric
tially, to multiple classes at the same time. Therefore,        classifier applied in machine learning (Lin et al., 2015).
fuzzy logic is a good classifier for IDS problems as the        The idea of these techniques is to name an unlabelled
security itself includes vagueness, and the borderline          data sample to the class of its k nearest neighbors (where
between the normal and abnormal states is not well              k is an integer defining the number of neighbors to be
identified. In addition, the intrusion detection problem        considered). Figure 6 illustrates a K-Nearest Neighbors
contains various numeric features in the collected data         classifier where k = 5. The point X represents an instance
and several derived statistical metrics. Building IDSs          of unlabelled data that needs to be classified. Amongst
based on numeric data with hard thresholds produces             the five nearest neighbors of X, there are three similar
high false alarms. An activity that deviates only slightly      patterns from the class Intrusion and two from the class
from a model could not be recognized, or a minor                Normal. Taking a majority vote enables the assignment
change in normal activity could produce false alarms.           of X to the Intrusion class.
With fuzzy logic, it is possible to model this minor ab-           k-NN can be appropriately applied as a benchmark for
normality to keep the false rates low. Elhag et al. showed      all the other classifiers because it provides a good classi-
that with fuzzy logic, the false alarm rate in determining      fication performance in most IDSs (Lin et al., 2015).
intrusive actions could be decreased. They outlined a
group of fuzzy rules to describe the normal and abnor-          Ensemble methods Multiple machine learning algo-
mal activities in a computer system, and a fuzzy infer-         rithms can be used to obtain better predictive perform-
ence engine to define intrusions (Elhag et al., 2015).          ance than any of the constituent learning algorithms
                                                                alone (Vasan et al., 2020a). Training several classifiers at
Support vector machines (SVM) SVM is a discrimina-              the same stage to detect different attacks, and then unit-
tive classifier defined by a splitting hyperplane. SVMs         ing their result to increase the detection rate. Typically,
use a kernel function to map the training data into a           the ensemble’s ability is better than a single classifier’s,
higher-dimensioned space so that intrusion is linearly          as it can enhance weak classifiers to produce better
classified. SVMs are well known for their                       results than can a solitary classifier (Aburomman &
generalization capability and are mainly valuable when          Reaz, 2017). Several different ensemble methods have
the number of attributes is large and the number of             been proposed, such as Boosting, Bagging and Stacking.
data points is small. Different types of separating hy-
perplanes can be achieved by applying a kernel, such
as linear, polynomial, Gaussian Radial Basis Function
(RBF), or hyperbolic tangent. In IDS datasets, many
features are redundant or less influential in separating
data points into correct classes. Therefore, feature se-
lection should be considered during SVM training.
SVM can also be used for classification into multiple
classes. In the work by Li et al., an SVM classifier with
an RBF kernel was applied to classify the KDD 1999
dataset into predefined classes (Li et al., 2012). From a
total of 41 attributes, a subset of features was carefully
                                                                 Fig. 6 An example of classification by k-Nearest Neighbour for k = 5
chosen by using a feature selection method.
Khraisat and Alazab Cybersecurity   (2021) 4:18                                                                   Page 10 of 27

Boosting refers to a family of algorithms that can trans-
form weak learners into strong learners. Bagging means
training the same classifier on different subsets of the
same dataset. Stacking combines various classification
via a meta-classifier (Aburomman & Ibne Reaz, 2016).
The base-level models are built based on a whole
training set, and then the meta-model is trained on
the outputs of the base level model as attributes.
   Researchers have revealed that the combination of dif-       Fig. 7 Using Clustering for Intrusion Detection
ferent classifier techniques is an effective way to resolve
the shortcomings traditional IDSs have when they are
applied for IoT. Jabbar et al. proposed an ensemble clas-      separate ‘n’ data objects into ‘k’ clusters in which each
sifier that is built using Random Forest and also the          data object is selected in the cluster with the nearest
Average One-Dependence Estimator (AODE which                   mean. K means it is an iterative clustering algorithm that
solves the attribute dependency problem in the Naïve           aids to obtain the highest value for every iteration. It is a
Bayes classifier. Random Forest (RF) enhances precision        distance-based clustering technique and it does not need
and reduces false alarms (Jabbar et al., 2017). It is com-     to compute the distances between all combinations of
bining both approaches in ensemble results in improved         records. It applies a Euclidean metric as a similarity
accuracy over either technique applied independently.          measure. The number of clusters is determined by the
   More recently, Khraisat, et al. (Khraisat et al., 2019b)    user in advance. Typically, several solutions will be
proposed a stacking ensemble method that combined              tested before accepting the most appropriate one.
the C5 decision tree classifier and one-class support          Annachhatre et al. used the K-means clustering algo-
vector machine. The reported classification accuracy of        rithm to identify different host behaviour profiles
detection of malware is 94% on the IoT intrusion dataset       (Annachhatre et al., 2015). They have proposed new dis-
C5 decision tree classifier, while it is 92.5% in stage two.   tance metrics that can be used in the k-means algorithm
They reported in the stacking ensemble, and the classifi-      to relate the clusters closely. They have clustered data
cation accuracy was 99.97%.                                    into several clusters and associated them with known be-
                                                               haviour for evaluation. Their outcomes have revealed
Unsupervised learning in intrusion detection system            that k-means clustering is a better approach to classify
Unsupervised learning is a kind of machine learning that       the data using unsupervised methods for intrusion de-
makes use of input datasets without class labels to            tection when several kinds of datasets are available.
extract interesting information. The input data points         Clustering could be used in IDS for reducing intrusion
are normally treated as a set of random variables. A joint     signatures, generate a high-quality signature or similar
density model is then created for the data set. In super-      group intrusion.
vised learning, the output labels are given and used to
train the machine to get the required results for an           Probabilistic clustering This technique uses probability
unseen data point. In contrast, in unsupervised learning,      distribution to create the clusters.
no labels are given, and instead, the data is grouped
automatically into various classes through the learning
process. In the context of developing an IDS, unsuper-         Principal component analysis is a common method for
vised learning means, use of a mechanism to identify           obtaining a set of low dimensional features from the
intrusions by using unlabelled data to train the model.        largest set of features.
IoT network traffic is clustered into groups, based on
the similarity of the traffic, without the need to pre-        Hierarchical clustering This is a clustering technique
define these groups.                                           that aims to create a hierarchy of clusters. Approaches
  As shown in Fig. 7, once records are clustered, all of       for hierarchical clustering are normally classified into
the cases that appear in small clusters are labeled as an      two categories:
intrusion because the normal occurrences should pro-
duce sizable clusters compared to the anomalies. In              (i) Agglomerative- bottom-up clustering techniques
addition, malicious intrusions and normal instances are               where clusters have sub-clusters, which in turn
dissimilar, thus they do not fall into an identical cluster.          have sub-clusters and pairs of clusters are combined
                                                                      as one moves up the hierarchy.
K-means The K-means technique is one of the most                 (ii) Divisive - hierarchical clustering algorithms where
prevalent techniques of clustering analysis that aims to              iteratively the cluster with the largest diameter in
Khraisat and Alazab Cybersecurity        (2021) 4:18                                                                                    Page 11 of 27

      feature space is selected and separated into binary                      of the agent is to learn how to interact with its environ-
      sub-clusters with a lower range.                                         ment in such a way that allows it to achieve its goals.
                                                                                 Deep reinforcement learning is the application of
Singular value decomposition It is a method of decom-                          reinforcement learning to train deep neural networks. It
posing a matrix into other matrices as a series of linear ap-                  has an input layer, an output layer, and multiple hidden
proximations that expose the underlying meaning-structure                      layers same as prior deep neural networks. However, our
of the matrix. The goal of Singular Value Decomposition is                     input is the state of the environment. For instance, a bus
to uncover the optimal set of features that best predict the                   is trying to get its passengers to their destination. The
detection.                                                                     inputs are the position, speed, and direction; our output
                                                                               is a series of possible actions like speed up, slow down,
                                                                               turn left, or turn right. In addition, we’re feeding our
Independent component analysis It is used for show-                            rewards signal into the network so that we can learn to
ing hidden factors that underlie sets of random features.                      associate what actions produce positive results given a
  A lot of work has been done in the area of the cyber-                        specific state of the environment.
physical control system (CPCS) with attack detection
and reactive attack mitigation by using unsupervised                           Deep Q-network It is combined reinforcement learning
learning. For example, a redundancy-based resilience                           and deep neural networks at scale. The algorithm was
approach was proposed by Alcara (Alcaraz, 2018). He                            developed by enhancing a classic RL algorithm called
proposed a dedicated network sublayer that can handle                          Q-Learning with deep neural networks.
the context by regularly collecting consensual informa-
tion from the driver nodes controlled in the control net-                      Double Q-learning It is an off-policy reinforcement
work itself, and discriminating view differences through                       learning algorithm that utilises double estimation to
data mining techniques such as k-means and k-nearest                           counteract overestimation problems with traditional
neighbor. Chao Shen et al. proposed Hybrid-Augmented                           Q-learning.
device fingerprinting for IDS in Industrial Control
System Networks. They used different machine learning                          Deep learning
techniques to analyse network packets to filter anom-                          Deep learning is a form of machine learning where a
aly traffic to detect intrusions in ICS networks (Shen                         computer uses a hierarchy of data based on experience
et al., 2018).                                                                 and form multiple layers as an output. Deep learning
  Likewise, Khraisat, et al. (Khraisat et al., 2020) experi-                   can be supervised as well as unsupervised. In the case of
mented with both single and ensemble classifiers com-                          supervised deep learning, data can be classified whereas
posed of the decision tree, and SVM, for classification of                     in the case of unsupervised deep learning data patterns
the NLS KDD intrusion detection evaluation data set.                           are analyzed. Deep learning is directly related to artificial
They found that an ensemble of all three classifiers,                          intelligence where machines will acquire knowledge by
based on majority voting, marginally out-performed all                         learning with experience and will replace human
other classifiers.                                                             intelligence. Deep learning works on the platform of
                                                                               artificial neural networks by studying massive amounts
Reinforcement learning                                                         of data with the help of algorithms prepared by human
Deep Reinforcement learning utilizes deep learning and                         intelligence. It is referred to as ‘deep learning’ as the arti-
reinforcement learning principles for building IDS.                            ficial neural networks possess different deep layers that
Reinforcement learning involves an agent interacting                           enables them to learn. Table 4 shows a Comparison of
with an environment. The agent is trying to achieve a                          Machine learning and deep learning. Table 5 shows a
goal of some kind within the environment. The purpose                          summary of the deep learning model techniques.

Table 4 Comparison of Machine learning and deep learning
GENERAL      MACHINE LEARNING                                                         Deep learning
Network      Features extraction is required from the raw data to conduct a           Features extraction is not necessary and the raw data could be
Feature      classification.                                                          used in a completely autonomous to build IDS.
Number of    Only a part of available data is being utilized for building IDS. The    Processes all of the data, with a large number of features to
Contents     data is scaled into a small vector of features, e.g. statistical         detect the intrusions.
             correlations, it isinevitably throwing away most of the data
Correlations Features selected by a human domain expert                               Using raw data offers the capability to discover non-linear cor-
                                                                                      relations between data that are too complex for a human
Khraisat and Alazab Cybersecurity        (2021) 4:18                                                                               Page 12 of 27

Table 5 Summary of deep learning model techniques
Model                  Learning Model              Input Data               Characteristics
FCNN                   Supervised                  Image, sound, etc.       - No special assumptions needed to be made about the input.
                                                                            -Requires a huge number of connections and network parameters.
RNN                    Supervised                  Serial, time-series      -Processes sequences of data through internal data.
                                                                            -Useful in IDS with time-dependent
GAN                    Semi-supervised             various                  -The GAN sets up a supervised learning problem to do unsupervised
                                                                            -Less connection.
CNN                    Supervised                  Image, sound, etc.       -Need a large training dataset.
Autoencoder            Unsupervised                various                  -It can be trained in an unsupervised manner.
                                                                            -It can be used for intrusion detection in the event of a poor
                                                                            - Generating new content
                                                                            - Filtering out noise

  In neural networks, each neural node of every single                   Discriminator Network. The Generator Network pro-
hidden layer calculates the weighted values receiving                    duces synthetic data, and the Discriminator Network
from the previous layer and passes on the output values                  tries to detect if the data that it’s seeing is real or syn-
to the subsequent layer. The result value of the last layer              thetic. These two networks are adversaries in the sense
can be considered as the final results achieved by the                   that they’re both competing to beat one another.
neural networks from the raw data.
                                                                         Convolutional neural network (CNN) A Convolutional
Fully connected neural networks (FCNN) Fully                             Neural Network is contained of one or more convolu-
Connected Feedforward Neural Networks are the stand-                     tional layers and then linked by one or more fully con-
ard network architecture applied in mainly basic neural                  nected layers as in a standard multilayer neural network
network applications. Fully connected denotes that an                    (Vasan et al., 2020b). A convolutional neural network
individual neuron in the earlier layer is linked to every                contains an input and an output layer, as well as mul-
neuron in the subsequent layer. Feedforward indicates                    tiple hidden layers. The hidden layers of a CNN typically
that neurons in any preceding layer are only ever                        contain a sequence of convolutional layers that convolve
connected to the neurons in a subsequent layer. Fully                    with a multiplication. A CNN receives a 2-D input and
Connected Neural Networks can be used for feature                        abstracts high-level features via a sequence of hidden
extraction (Wang et al., 2020).                                          layers. CNN’s, which better upon the architecture of the
                                                                         common neural networks, benefit from spatial features
Recurrent neural network (RNN) The recurrent neural                      (Vasan et al., 2020c). Spatial features are usually applied
network can function efficiently on a series of data with                types of traffic features in the area of IDS. When apply-
variable input length. This means that RNNs use the in-                  ing spatial features, network traffic is reformed into
formation of its prior state as an input for their current               traffic images; it follows that the image classification
prediction, and we can repeat this process for an arbi-                  technique is used to categorize the traffic images, which
trary number of steps allowing the network to propagate                  also ultimately achieves the objective of detecting the in-
information via its hidden state through time. This is                   trusion traffic. This technique is comparatively recent,
essentially like giving a neural network a short-term                    but then numerous recent research results prove its
memory. This feature makes RNNs very effective for                       great potential. For example, Vasan, et al. (Vasan et al.,
working with sequences of data that occur over time.                     2020b) adopted CNNS techniques and transformed the
Yin, et al. (Yin et al., 2017) proposed a deep learning                  raw malware binary into both grayscale and colour im-
approach for intrusion detection using recurrent neural                  ages and apply the fine-tuned CNN architecture.
networks (RNN-IDS). their experimental results show
that RNN-IDS is very appropriate for creating IDS with                   Autoencoder An autoencoder is trained to restructure
high accuracy and that its performance is superior to                    its inputs. Autoencoders have been used for developing
that of traditional machine learning classification                      online IoT IDS (Mirsky et al., 2018). In general, an auto-
methods in both binary and multiclass classification.                    encoder trained on X gains the capability to restructure
                                                                         unobserved instances from the identical data distribution
Generative adversarial networks (GAN) The Genera-                        as X. If an instance does not be appropriate to the model
tive Adversarial Network is an integration of two deep                   learned from X, then it is expected the restructure to
learning neural networks: Generator Network, and a                       have a high error.
Khraisat and Alazab Cybersecurity   (2021) 4:18                                                                   Page 13 of 27

IoT IDS deployment strategies                                    with HIDS and firewalls, can provide a concrete, resilient,
IDS can also be classified based on the deployment used          and multi-tier protection against both external and insider
to detect IoT attacks. In IDS Deployment strategies, IDS         attacks. Table 6 shows a summary of comparisons be-
can be classified as distributed, centralized or hybrid.         tween IDS deployment strategies.
                                                                   Data source consists of system calls, application pro-
Distributed IDS                                                  gram interfaces, log files, data packets that are extracted
In distributed placement, the IoT devices could be re-           from well-known attacks. These data sources can be use-
sponsible for checking other IoT devices. Distributed            ful to classify intrusion behaviors from abnormal actions.
IDS be made up of several IDS over a big IoT ecosystem,
all of which communicate with each other, or with a              Hierarchical IDS
central server that assists advanced intrusion detection         In Hierarchical IDS, the network is separated into clusters.
systems, packet analysis, and incident response.                 The sensor nodes that are adjacent to each other typically
  Several IDS deploy distributed architectures. This             belong to the same cluster. Each cluster is assigned a
includes a subset of the network checking the other              leader, the so-called cluster head that screens the member
nodes. Distributed IDS offers the incident analyst many          nodes and plays a part in network-wide analyses.
advantages over centralized IDS. The main benefit is the
capability to identify attack forms across a whole IoT           IDS validation strategies
ecosystem. This might increase prompt IoT attack pre-            IDS Validation is the process for determining whether
vention and detection. The additional supported benefit          the IoT IDS model is an accurate enough representation
is to allow early detection of an IoT Botnet creating its        of the system, for detecting IoT attacks. To validate the
way through corporate IoT devices. This data could then          effectiveness of IDSs, researchers have used different
be used to detect and clean systems that have been in-           techniques such as theoretical, empirical, and hypothet-
fected by the IoT Botnet and stop further spread of the          ical strategies for validating their techniques.
Botnet into the IoT ecosystem consequently take down               There are many classification metrics for IDS, some of
any IoT devices damaged that would otherwise have                which are known by multiple names. Table 7 shows the
occurred. Furthermore, the advantage of distributed IDS          confusion matrix for a two-class classifier which can be used
rather than centralized IDS computing resources also             for evaluating the performance of an IDS. Each column of
implies reduced control over those resources.                    the matrix represents the instances in a predicted class,
                                                                 while each row represents the instances in an actual class.
Centralized IDS                                                    IDS are typically evaluated based on the following
In the centralized IDS location, the IDS is placed in cen-       standard performance measures:
tral devices, for instance, in the boundary switch or a
nominated device. All the information that the IoT de-              True Positive Rate (TPR): It is calculated as the ratio
vices collect and then send to the network boundary                   between the number of correctly predicted attacks
switch passes through the boundary switch (Benkhelifa                 and the total number of attacks. If all intrusions are
et al., 2018). Consequently, the IDS positioned in a bound-           detected then the TPR is 1 which is extremely rare
ary switch can check the packets switched between the                 for an IDS. TPR is also called a Detection Rate (DR)
IoT devices and the network. Despite this, checking the               or the Sensitivity. The TPR can be expressed
network packets that pass through the boundary switch is              mathematically as
not adequate to identify anomalies that affect the IoT de-
vices. The network traffic is monitored in centralized IDS.                    TP
                                                                    TPR ¼
This traffic is extracted from the network through differ-                   TP þ FN
ent network data sources such as packet capture, NetFlow,
etc. The computers connected in a network can be moni-
                                                                   False Positive Rate (FPR): It is calculated as the ratio be-
tored by Network-based IDS. Moreover, NIDS is also cap-
                                                                 tween the number of normal instances incorrectly classified
able of monitoring the external malicious activities that
                                                                 as an attack and the total number of normal instances.
could have been commenced from an external threat at
an earlier stage, before these threats expand to other com-                    FP
puter systems. However, NIDS comes with some limita-                FPR ¼
                                                                            FP þ TN
tions such as its restricted ability to inspect the whole data
in a high bandwidth network because of the volume of
data passing through modern high-speed communication
networks (Bhuyan et al., 2014). NIDS deployed at several            False Negative Rate (FNR): False negative means
positions within a particular network topology, together              when a detector fails to identify an anomaly and
Khraisat and Alazab Cybersecurity        (2021) 4:18                                                                                        Page 14 of 27

Table 6 Comparison of IDS deployment strategies based on their positioning
                             Advantages                                       Disadvantages                     Data source
IDS            Distributed   • HIDS can check end-to-end encrypted            • Delays in reporting attacks     • Audits records, log files, Application
deployment     IDS             communications behaviour.                      • Consumes host resources           Program Interface (API), rule patterns,
strategies                   • No extra hardware is required.                 • It needs to be installed on       system calls.
                             • Detects intrusions by checking the host          each host.
                               file system, system calls or network events.   • It can monitor attacks only
                             • Every packet is reassembled                      on the machine where it is
                             • Looks at the entire item, not streams only       installed.
               Centralized   • Do not impose an additional overhead           • IoT can be exposed if the       • Simple Network Management Protocol
               IDS             on the sensor nodes.                             centralized IDS is                (SNMP)
                             • Detects attacks by checking network              compromised.                    • Network packets (TCP/UDP/ICMP),
                               packets.                                       • Challenge is to identify        • Management Information Base (MIB)
                             • Not required to install on each host.            attacks from encrypted          • Router NetFlow records
                             • Can check various hosts in the same              traffic.
                               period.                                        • Dedicated hardware is
                             • Capable of detecting the broadest ranges         required.
                               of network protocols                           • It supports only the
                                                                                identification of network
                                                                              • Difficult to analysis a high-
                                                                                speed network.
                                                                              • The most serious threat is
                                                                                the insider attack.
                                                                              • Not applicable
                                                                              For a large scale IoT
               Hierarchical • It uses NIDS, HIDS and wireless intrusion  • the complexity of the IDS            Various
                              detection system (WIDS) presenting
                              success in interoperability across
                              heterogeneous Network types.
                            • IDS is likely to be extremely deployable
                              across big and heterogeneous IoT networks,

     classifies it as normal. The FNR can be expressed                        for different cut-off points. Each point on the ROC curve
     mathematically as:                                                       represents an FPR and TPR pair corresponding to a cer-
                                                                              tain decision threshold. As the threshold for classifica-
                 FN                                                           tion is varied, a different point on the ROC is selected
    FNR ¼
               FN þ TP                                                        with different False Alarm Rate (FAR) and different
                                                                              TPR. A test with perfect discrimination (no overlap in
   Classification rate (CR) or Accuracy: The CR
                                                                              the two distributions) has a ROC curve that passes
     measures how accurate the IDS is in detecting
                                                                              through the upper left corner (100% sensitivity, 100%
     normal or anomalous traffic behavior. It is described
     as the percentage of all those correctly predicted
     instances to all instances:
                                                                              State-of-the-art intrusion detection in IoT
                        TP þ TN                                               Table 8 shows a summary of the proposed research to
    Accuracy ¼                                                                IDSs for IoT.
                   TP þ TN þ FP þ FN
                                                                                Cho et al. proposed a methodology for checking
                                                                              packets that are passing through the border router for
  Receiver Operating Characteristic (ROC) curve: ROC                          communication between physical and network devices.
has FPR on the x-axis and TPR on the y-axis. In the                           Their methodology is based on the botnet attacks by
ROC curve, the TPR is plotted as a function of the FPR                        checking the packet length (Cho et al., 2009). However,
                                                                              no information is presented about the technique
Table 7 Confusion matrix for IDS system                                       employed to create a normal behaviour profile. It is also
Actual Class     Predicted Class                                              not clear how the proposed IDS techniques would work
                 Class        Normal                    Attack                on resource constraints nodes in the IoT.
                 Normal       True negative (TN)        False Positive (FP)
                                                                                Rathore et al. proposed semi-supervised Fuzzy
                                                                              learning-based distributed attack detection framework
                 Attack       False Negative (FN)       True positive (TP)
                                                                              for IoT (Rathore & Park, 2018). The evaluation was done
You can also read