Active Learning for Network Traffic Classification: A Technical Study - arXiv


IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING                                                                                   1

Active Learning for Network Traffic Classification: A Technical Study
Amin Shahraki, Mahmoud Abbasi, Amir Taherkordi and Anca Delia Jurcut

Amin Shahraki is with the School of Computer Science, University College Dublin, Ireland. Corresponding author e-mail: am.shahraki@ieee.org
Mahmoud Abbasi was with the Department of Computer Sciences, Islamic Azad University, Mashhad, Iran, e-mail: mahmoud.abbasi@ieee.org
Amir Taherkordi is with the Department of Informatics (IFI), University of Oslo, Norway, e-mail: amirhost@ifi.uio.no
Anca Delia Jurcut is with the Department of Computer Sciences, University College Dublin, Dublin, Ireland, e-mail: anca.jurcut@ucd.ie

[Note: This work has been submitted to the IEEE Transactions on Cognitive Communications and Networking journal for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.]

arXiv:2106.06933v2 [cs.NI] 5 Aug 2021

Abstract—Network Traffic Classification (NTC) has become an important feature in various network management operations, e.g., Quality of Service (QoS) provisioning and security services. Machine Learning (ML) algorithms, a popular approach for NTC, can deliver reasonable classification accuracy and can deal with encrypted traffic. However, ML-based NTC techniques suffer from a shortage of labeled traffic data, which is the case in many real-world applications. This study investigates the applicability of an active form of ML, called Active Learning (AL), in NTC. AL reduces the need for a large number of labeled examples by actively choosing the instances that should be labeled. The study first provides an overview of NTC and its fundamental challenges and surveys the literature on ML-based NTC methods. Then, it introduces the concepts of AL, discusses AL in the context of NTC, and reviews the literature in this field. Further, challenges and open issues in AL-based classification of network traffic are discussed. Moreover, as a technical survey, some experiments are conducted to show the broad applicability of AL in NTC. The simulation results show that AL can achieve high accuracy with a small amount of data.

Index Terms—Survey, Network Traffic Classification, Active Learning, Machine Learning, NTMA

I. INTRODUCTION

During the last decades, emerging networking paradigms, such as the Internet of Things (IoT), have introduced various network management challenges. Given the proliferation of IoT devices and the distinguishing characteristics of IoT traffic, such as heterogeneity, spatio-temporal dependencies, dominating uplink traffic, and low duty-cycle traffic patterns, network management and monitoring have become challenging. Gaining deep insight into such complex networks for performance evaluation and network planning purposes is not a trivial task with respect to processing time, human effort, and computational overhead. Understanding network traffic behavior plays a vital role in a wide variety of network management aspects, e.g., fault management, accounting, security, and network performance management [1]. Some general approaches have been introduced to analyze the behavior of networks and maintain their performance, such as Monitor-Analyze-Plan-Execute (MAPE) and Observe-Orient-Decide-Act (OODA) [2].

In networking, the process of analyzing network traffic behavior is mainly known as Network Traffic Monitoring and Analysis (NTMA) [3]. NTMA has attracted much interest in recent years and has become an important research topic in the field of communication systems and networks [4]. The importance of NTMA lies in the properties and challenges of modern networking, e.g., heterogeneity, complexity, and dynamicity, which result in instability in data transmission [5]. NTMA is an essential approach to measure the performance of applications and services and to discover network inefficiencies. Indeed, NTMA sheds light on the functioning of communication systems and helps deal with unexpected events, especially in complex and large-scale networks such as the Internet.

NTMA applications are generally categorized into eight groups: Network Traffic Classification (NTC), traffic prediction, fault management, network security, traffic routing, congestion control, resource management, and Quality of Service (QoS) and Quality of Experience (QoE) management [6]. In this study, we focus on NTC as an important and open issue in NTMA. NTC refers to techniques for categorizing network traffic into different classes based on its properties. The classification of network traffic is highly beneficial in various network services, from QoS (e.g., traffic policing and shaping) and pricing to malware and intrusion detection [7]. NTC provides detailed knowledge of network traffic, which is very useful for those who investigate changes in traffic characteristics and the long-term requirements of networks [8], e.g., Network Management and Orchestration (NMO) tools and performance management models.

NTC techniques can be broadly grouped into three categories: port-based, payload-based, and flow-based methods [9]. Port-based techniques associate a standard port number with a service or application, while payload-based methods inspect the content of the captured packets to classify them. Last but not least, flow-based techniques use network traffic flow characteristics (e.g., round-trip time and inter-arrival times) to associate the produced traffic with its sources. The first two methods either cannot be used in some network types (e.g., Virtual Private Networks (VPNs)) or violate user privacy by accessing personal data. Flow-based techniques are the most common techniques for NTC because, instead of inspecting every packet passing through a given link, they examine network traffic flows or aggregated packet-header information. As a result, the volume of data to be examined is reduced, and
encrypted traffic is no longer a problem. Flow-based techniques assume that each application's traffic has almost unique statistical or time-series features that classifiers can use to categorize both encrypted and unencrypted traffic.

In flow-based methods, the traffic classifier may leverage Machine Learning (ML) algorithms to automate the classification process, discover the different traffic patterns produced by devices, and classify encrypted traffic. Although ML algorithms are powerful techniques for classifying network traffic flows [10], [11], the accuracy of learning-based approaches is limited by their need for a massive number of labeled instances. As the authors in [12] mentioned, most real-world application data is partially labeled or unlabeled. Moreover, the data labeling process for ML tasks can be challenging in terms of human effort and cost [13].

Fortunately, Active Learning (AL), a sub-field of ML, is a promising approach to deal with the need for a huge amount of labeled instances. AL aims to reduce the need for labeled examples by intelligently querying labels during training: the queries target the examples that the AL algorithm believes will help build the best model [14]. Therefore, given the aforementioned challenges, AL can be considered an appropriate and efficient technique for flow-based NTC. A thorough study of the usefulness of AL in NTC, together with a review of the state-of-the-art techniques in this field, can significantly help the network research community adopt AL for classifying network traffic in various domains. To the best of our knowledge, this is the first study that technically reviews the efficiency and importance of AL for NTC along with surveying the literature in this field. In this paper, we study NTC techniques and discuss AL as a useful approach in this field. The main contributions of our work are summarized as follows:

• Discussing NTC techniques and their correlations with ML techniques
• Reviewing existing work on AL-based NTC
• Empirically evaluating the performance of AL for NTC purposes
• Discussing the challenges and future directions in using AL for NTC

The rest of this paper is structured as follows. In Section II, we review existing survey works on traffic classification techniques. In Section III, we provide an overview of the NTC problem and the use of ML techniques. Then, we devote Section IV to discussing the fundamental elements of AL and query strategies. Next, in Section V, we discuss the advantages of using AL for NTC purposes and carry out a literature review on this topic. In Section VI, we evaluate the performance of AL in NTC. In Section VII, we discuss the challenges and future directions in using AL for NTC, and finally we conclude the paper in Section VIII.

LIST OF ABBREVIATIONS

AL      Active Learning
ALBL    AL by learning
ASVM    AL Support Vector Machine
CFS     Correlation based Feature Selection
CNN     Convolutional Neural Network
DDoS    Distributed Denial of Service
DL      Deep Learning
DPI     Deep Packet Inspection
EER     Expected error reduction
GAN     Generative Adversarial Network
i.i.d.  Independent and identically distributed
IDSs    Intrusion Detection Systems
IoT     Internet of Things
LAL     Learning Active Learning
LSTM    Long Short-Term Memory
M2M     Machine-to-Machine
MAPE    Monitor-Analyze-Plan-Execute
ML      Machine Learning
MLP     Multi-layer Perceptron
NMO     Network Management and Orchestration
NTC     Network Traffic Classification
NTMA    Network Traffic Monitoring and Analysis
OODA    Observe-Orient-Decide-Act
P2P     Peer-to-peer
QBC     Query-By-Committee
QoE     Quality of Experience
QoS     Quality of Service
RAE     Relief Attribute Evaluation
RAL     Reinforcement AL
RL      Reinforcement Learning
SDAE    Stacked Denoising Autoencoder
SDN     Software Defined Networking
SFEM    Structural Feature Extraction Methodology
SVDD    Support Vector Data Description
SVM     Support Vector Machine
TLS     Transport Layer Security
UNC     Uncertainty sampling
VAE     Variational Autoencoder
VPN     Virtual Private Network
WSNs    Wireless Sensor Networks

II. RELATED SURVEY ARTICLES

There exist several literature studies reviewing the use of ML techniques in communication systems and wireless networks, e.g., [15], [16]. There are also some surveys that focus on specific ML techniques, e.g., Deep Learning (DL) [17] and Reinforcement Learning (RL) [18], or on specific types of networking, e.g., Software Defined Networking (SDN) [19] and optical networks [20]. Moreover, some survey works compare, evaluate, or review different techniques for NTC, e.g., ML-based techniques, heuristic models, and statistical-based techniques [21]. Considering the volume of survey literature in this field, in this section we focus only on surveys that review NTC or the use of various ML techniques in NTC.

• General literature reviews on NTC: In [22], Dainotti et al. reviewed the issues and future research directions of NTC, especially in terms of applicability, reliability, and privacy. They outlined future research and policy directions for NTC, e.g., validating NTC models, the effect
  of network speed on NTC, and NTC tools. In [23], Finsterbusch et al. reviewed payload-based NTC based on Deep Packet Inspection (DPI). They also experimentally analysed the most significant open-source DPI modules to show their performance in terms of accuracy and requirements. Additionally, they provided a guideline on how to design and implement DPI-based NTC modules. In [24], Velan et al. studied NTC models for encrypted network traffic that measure the traffic and improve security, e.g., by detecting anomalies. They reviewed different types of encrypted traffic and how payload-based and feature-based NTC techniques can classify encrypted network traffic. Zhao et al. [7] reviewed the use of NTC in IoT and Machine-to-Machine (M2M) networks. They reviewed current NTC approaches in the IoT context based on the differences between IoT and non-IoT network traffic. By reviewing the literature, the authors showed that in the IoT research area, most NTC techniques are proposed to solve security challenges. The authors in [25] reviewed NTC techniques, i.e., statistics-based classification, correlation-based classification, behavior-based classification, payload-based classification, and port-based classification. They also quantified classification granularity based on four levels, i.e., application type layer, protocol layer, application layer, and service layer. Last but not least, they classified network traffic features and the existing public datasets that are commonly used in the proposed NTC techniques.
• Literature reviews on the use of ML in NTC: As one of the earliest studies on the use of ML in NTC, Nguyen et al. [26] reviewed the literature between the years 2004 and 2007. They studied how ML models can be employed for NTC in IP networks, e.g., clustering approaches, supervised learning approaches, and hybrid approaches. They also reviewed the literature that compares ML techniques or non-ML techniques for NTC. They mentioned that offline analysis models, e.g., AutoClass, Decision Tree, and Naive Bayes, can achieve an accuracy of about 99%. They also outlined some critical operational requirements for real-time NTC models compared to offline models. In [21], Singh evaluated unsupervised ML techniques, including K-means and the Expectation Maximization algorithm, for NTC. The results show that the accuracy of K-means is better than that of the Expectation Maximization algorithm. In [27], Perera et al. compared six ML algorithms, including Naive Bayes, Bayes Net, Naive Bayes Tree, Random Forest, Decision Tree, and Multi-layer Perceptron, along with two feature selection techniques, i.e., Correlation based Feature Selection (CFS) and Relief Attribute Evaluation (RAE). Their results show that Decision Tree and Random Forest perform better than the other techniques. In [28], Gomez et al. compared seven ensemble ML techniques for NTC, including OneVsRest, OneVsOne, Error-Correcting Output Codes, the AdaBoost classifier, the Bagging algorithm, Random Forest, and Extremely Randomized Trees, which are all based on decision trees. They compared them in terms of model accuracy, latency, and byte accuracy. In [29], Pacheco et al. comprehensively surveyed the use of ML techniques in NTC for different cases, e.g., encrypted network traffic. Building on the challenges of using ML techniques in NTC, they studied reliable label assignment, dynamic feature selection, and the integration of meta-learning processes. They considered these solutions for several issues, including imbalanced network data, the dynamicity of networks, and online strategies for re-training ML models.

In Table I, a summary of the surveys above is provided based on their vision of NTC, the reviewed solutions, the network type, and the practical evaluation of the studied solutions. As indicated in the table, our survey addresses flow-based NTC for Internet communications and specifically considers AL as one of the most important ML-based solutions. To the best of our knowledge, our study is one of the few literature surveys that evaluate such specific ML solutions for NTC, as most existing surveys consider general ML models, e.g., supervised learning solutions for NTC. Studying AL-based solutions distinguishes our work from all existing survey works.

III. OVERVIEW ON NTC AND ML

In NTC, one should clarify the goals of classification based on the intended use, such as accounting, malware detection, intrusion detection, QoS provisioning, and identifying types of applications based on the network traffic (e.g., VPN vs. non-VPN traffic or Tor vs. non-Tor traffic). Indeed, there are different factors that one can use to categorize network traffic, including applications (e.g., Facebook and Hangouts), protocols (e.g., HTTP and BitTorrent), traffic types (e.g., Web Browsing and Chat), browsers (e.g., Firefox and Chrome), operating systems, and websites. Therefore, the purpose is to correctly determine the label of each network flow, e.g., browsing, interactive, or video stream. NTC can be further categorized into online and offline classification. In online NTC, the input traffic needs to be classified in real time or near real time (e.g., for QoS provisioning). On the other hand, offline classification is appropriate for applications such as anomaly detection and billing systems. Despite their importance, existing NTC techniques suffer from general networking challenges, as listed below:

• While the traffic classification literature is mature for old-fashioned networking paradigms, e.g., legacy cellular systems, the dramatic growth and evolution of online applications and services have made traffic classification a non-trivial task. Due to the characteristics of modern network traffic, e.g., large scale, heterogeneity, multimodal data, and big data, emerging NTC methods must meet strict requirements in terms of system performance, accuracy, and robustness. For example, the vast amount of raw data generated by IoT and cellular devices poses severe challenges to ML-based NTC methods, as they need clean and pre-processed data for training.
• NTC is a multi-factor procedure in which an automated program categorizes the network traffic based on the
  network traffic features, e.g., types of network protocols, applications, hosts, etc. As a challenge, NTC techniques need to select the best features to classify network traffic with high accuracy, while a given feature can be efficient in one network and inefficient in another. In other words, feature engineering is a challenge when it comes to using classical ML for traffic classification.
• The recent increase in encrypted network traffic and protocol encapsulation methods limits the effectiveness of many traffic classification techniques, since packet inspection techniques are unable to extract network management information from such traffic. For example, a significant portion of Internet traffic is associated with Peer-to-peer (P2P) applications. However, classification of P2P traffic is a difficult task [7], as many P2P applications, such as online video and P2P downloading, use encryption and obfuscation protocols to circumvent the limitations posed by Internet service providers.

Table I: An overview of existing literature surveys on NTC and ML.

Study       Year   NTC vision                                  Reviewed solution(s)                                 Type of network           Practical evaluation
[26]        2008   Analysing statistical traffic               ML solutions                                         IP networks               No
                   characteristics
[22]        2012   General NTC                                 Not specified                                        TCP networks              No
[23]        2014   Payload-based NTC techniques                DPI-based techniques                                 Internet                  Yes
[21]        2015   Comparative study                           Comparing unsupervised ML techniques                 Internet                  Yes
[24]        2015   Analysing encrypted network traffic by      ML techniques and hybrid techniques                  Not specified             No
                   payload-based and feature-based NTC
                   techniques
[27]        2017   Comparative study                           Comparing six ML solutions                           Communication networks    Yes
[28]        2017   Comparative study                           Comparing decision-tree-based ensemble techniques    Internet                  Yes
[29]        2018   ML-based NTC                                Most existing ML solutions                           IP networks               No
[7]         2020   NTC for M2M network traffic                 Generic solutions                                    IoT                       No
[25]        2021   Reviewing various types of NTC models       Most existing ML solutions                           Internet                  No
Our Study   2021   Flow-based NTC                              Active Learning                                      Internet                  Yes

To overcome the above challenges, various techniques have been introduced, e.g., graphical techniques, statistical methods, and ML-based methods [24]. In the scope of ML, various port-based, payload-based, and flow-based solutions have been proposed as the most promising solutions for NTC [30], [31]. Building an ML-based network traffic classifier requires multiple steps, as presented in [32]. Figure 1 shows a graphical description of all steps. In the rest of this section, we discuss each individual step.

[Figure 1, "Steps towards building a ML-based network traffic classifier": Data gathering (public datasets, exclusive datasets) → Data pre-processing (packet filtering, header removal, data quality assessment) → Feature engineering (time-series features, header-related features, statistical features) → Model selection (header + time series, header + payload, statistical features) → Model evaluation.]
Figure 1: The main steps in building a network traffic classifier.

A. Data gathering

Since ML algorithms learn to classify data based on sample datasets, representative data must be collected in the data gathering step. While a few publicly available network traffic datasets have been released, using them to train a traffic classification model can be difficult [33]. In addition, since network traffic behavior differs from one network to another, it is highly recommended to train the ML algorithm for the target network [2]. Additionally, the number of network traffic classes can be high, and it is rather impractical to cover all classes in one public dataset. Furthermore, the variety of data gathering and labeling techniques leads to different feature sets. Hence, in real-world applications, the goal is to use datasets that are tailored to the intended use of NTC, mainly gathered from the target network.

B. Data pre-processing

After gathering, the data must be pre-processed so that it is represented in a form in which the target ML algorithms can discover different patterns. In traffic classification, header data and payload are the two major data structures. These structures often need to be pre-processed because they contain irrelevant or redundant information, such as network management data that is not needed for traffic classification, e.g., source and destination IP addresses and protocol information. Moreover, changes in the distribution of packet-level features can occur in real-world environments because of unexpected events like the re-transmission of packets. In short, pre-processing steps such as packet filtering, elimination of noisy samples, header removal, and data quality assessment are needed to ease the learning process for the ML algorithms [34].

C. Feature engineering

Conventional classification solutions, e.g., ML- and statistical-based techniques, need to go through a feature engineering procedure, in which domain knowledge is used to extract features or patterns from the raw data [35], [23]. Feature engineering is a crucial step in ML-based NTC methods
because choosing appropriate features can ease the difficulties of the modelling phase, and vice versa [36]. Privacy risks associated with feature engineering and representation procedures are also crucially important, especially in payload-based techniques. Indeed, in many environments there are legal restrictions on using payload-based methods or on recognizing all communication protocols, mainly because of users’ privacy policies, since such methods inspect the content of network packets [37].

Generally, there are four major types of input features for NTC:

• Time series: Time-series features include the maximum packet inter-arrival time, the maximum number of bytes in a packet, and inter-packet timings. According to [38], the length of the time series (i.e., the number of packets within a flow) has a visible effect on classification accuracy and computational overhead: increasing the number of considered packets can improve classification performance, but at the cost of higher computational overhead. In [38], only the first 20 packets of each flow are used in the experiments. The authors in [39] use time-series features of packets, e.g., source and destination ports, payload size, and TCP window size (bytes), as input to a semi-supervised model that classifies the traffic of five Google services: Hangout Chat, Hangout Voice Call, YouTube, File Transfer, and Google Play Music. The simulation results show excellent accuracy despite the limited number of labeled samples, mainly because the authors first pre-trained the model on all unlabeled network flows to learn statistical features and then re-trained it on a small labeled dataset for fine-tuning.
• Header: The header of a network packet contains information related to different layers (e.g., the network layer). Features such as port number and protocol number are widely used as informative features in traffic classification tasks. However, some modern NTC techniques, especially DL-based ones, accept entire packets as the input feature. For example, in [40] the authors used hexadecimal raw packet headers and convolutional networks to classify Tor/non-Tor traffic. To this end, they utilized TCP/IP headers, specifically the first 54 bytes of packets, because TCP is associated with around 90% of all Internet traffic.
• Payload: NTC techniques can also use information from layers above the transport layer to classify network traffic. As a prime example, in [41] the authors utilize BitTorrent handshake packets on layer 4 to classify BitTorrent (BT) traffic, since BT generates the highest amount of P2P traffic. Moreover, some works use packets related to the Transport Layer Security (TLS) handshake process to identify HTTPS services [42].
• Statistical features: Statistical features of network flows, such as the minimum inter-arrival time and the size of IP packets, can be used for NTC [43]. The main idea behind using statistical features is that the statistical features of network flows generated by different services or applications are almost unique. Nevertheless, a major challenge for methods that use statistical features is that they are not suitable for online classification, mainly because a classifier needs to monitor the entire flow, or a significant part of it, in order to extract statistical features.

D. Model selection
Another step towards building a traffic classifier is selecting the right ML model. In the context of ML, choosing a model can carry different meanings, such as selecting hyperparameters and parameters, as well as selecting the algorithm. In NTC, several factors can be involved in selecting the classification model (e.g., model performance, available resources, model complexity, and feature selection). One of the most significant factors is feature selection, because the selected features directly determine the input dimensions of the model and, consequently, its computational and memory complexity, which are crucial factors in NTC. This implies that the dimensions and structure of the training input data should be optimized. Moreover, the selected features directly affect the performance of the final learning task (e.g., classification or regression). Hence, one should consider the right number of informative features. In traffic classification, model performance alone may not be a sufficient criterion for model selection; other criteria, such as training time and model explainability, can also be considered.

E. Model Evaluation
Finally, evaluating the selected model is the last step in building a network traffic classifier. In this step, the performance of the ML model on unseen data is measured. To be useful for the given task, the ML model should give accurate predictions. However, accuracy is not the only evaluation metric for a classification task; other metrics, such as the confusion matrix, F1 score, and recall, should also be considered. Since NTC is a classification task, we use the same metrics to evaluate the performance of the proposed model.

F. Existing Work
Recently, several ML techniques have been proposed for network traffic classification. In this subsection, we categorize the existing work in the literature based on the goals of network traffic classification, including identifying applications (also called apps), cybersecurity purposes, fault detection, website fingerprinting, user activities identification, and operating systems identification. We discuss these goals in more detail in the sequel.
• Mobile apps identification: This goal refers to analyzing and identifying the network traffic related to a particular mobile app. Given the ever-increasing number
of mobile apps, network administrators and telecommunications companies are actively looking for rigorous methods to secure their infrastructure. App identification based on analyzing the network traffic of mobile apps can assist network administrators with resource management and planning, and with app-specific policy establishment (e.g., security policies and access management for a specific app). Furthermore, app identification can help protect smartphone platforms (e.g., Android) against emerging security threats and uncover sensitive apps. It also makes it possible to forbid the use of particular apps (e.g., Google+ and Instagram) in an enterprise network [44]. Several papers have been published on app identification. Ajaeiya et al. [44] present a framework for classifying Android apps. The proposed framework identifies app traffic from a network viewpoint without adding any overhead on users’ mobile phones. Moreover, the authors provide a pre-processing method for traffic flows to extract the most informative features for ML-based techniques. The work in [45] leverages a Variational Autoencoder (VAE) to identify mobile apps. The authors claim that their method can label a massive number of instances and automatically extract features from mobile app traffic. To this end, they first transform mobile app traffic into meaningful images and then use the VAE as a classifier. Similar work was carried out by Wang et al. [46], who design three DL-based models, namely a Stacked Denoising Autoencoder (SDAE), a 1D Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), for mobile app identification. The authors in [47] propose a multi-classification scheme for mobile app traffic; more specifically, they combine the decisions (knowledge) of several mobile traffic classifiers to classify traffic samples.
• Cybersecurity purposes: One of the main goals of traffic classification is detecting security breaches in communication systems, e.g., through intrusion detection, malware detection, anomaly detection, and worm detection. Cybersecurity tools and techniques (e.g., intrusion detection systems) aim to defend communication systems from internal and external threats. Traffic classification methods can be used to assess network traffic behavior by detecting malicious traffic flows or links and then preventing attacks. A large body of work in the literature has focused on ML-based malware and intrusion detection. The authors in [48] propose an intrusion detection approach based on deep neural networks and compare the performance of DL with classical ML classifiers, demonstrating the superiority of DL models. Similarly, in [49], Shone et al. propose a non-symmetric deep auto-encoder-based learning solution for intrusion detection, in which the auto-encoder network learns features in an unsupervised manner and a stacked non-symmetric auto-encoder is then employed as a traffic classifier. In [50], Nguyen et al. propose a federated self-learning method to detect anomalies in IoT systems. Similarly, in [51], Rey et al. utilize federated learning for malware detection in IoT devices through one supervised model (based on a Multi-Layer Perceptron (MLP)) and one unsupervised model (based on an autoencoder). To evaluate the framework, they use the N-BaIoT dataset, which models the traffic of IoT systems impacted by malware. In [52], McLaughlin et al. present a DL-based method for Android malware detection that uses the raw opcode sequence as the input of a CNN model, which automatically learns the features of malware instances. The authors claim that the proposed method has a more straightforward training pipeline than previous works (e.g., n-gram-based malware detection). Huang et al. [53] combine an unsupervised spatiotemporal encoder with LSTM to detect abnormal network traffic. In the first stage, the spatiotemporal model extracts the spatial features of network traffic data; the obtained features are then used to train another LSTM layer for classification. The NSL-KDD dataset was used to evaluate the model. According to the experimental results, the proposed DL model achieves significantly higher intrusion detection efficiency than traditional techniques.
• Fault detection: Fault detection is part of a broader network management process called fault management. Fault management refers to a set of processes to detect, isolate, and then correct abnormal situations in a network. A failure occurs when a system (e.g., an IoT network) cannot adequately provide a service, whereas a fault is the root cause of a failure. Fault management, especially fault detection, plays an essential role in today’s network management (e.g., QoS provisioning). Hence, many works have sought to improve the fault management process. In [54], Huang et al. survey fault detection techniques in IoT networks and introduce a fault-detection framework for Self-Driving Network (SelfDN)-enabled IoT. Moreover, the authors propose an algorithm called the Gaussian Bernoulli restricted Boltzmann machine auto-encoder to turn fault detection into a classification task. The simulation results demonstrate the superiority of the proposed method over other methods, such as linear discriminant analysis and SVM. In [55], the authors focus on detecting cell coverage degradation through a deep neural network. They propose a deep recurrent model for diagnosing cell radio performance deterioration and complete cell outages in a mobile phone network. In [56], Noshad et al. adopt the Random Forest classifier for fault detection in Wireless Sensor Networks (WSNs). For performance evaluation, they use a dataset with six types of sensor-level faults, such as data loss, offset, and out-of-bounds. Moreover, they compare the performance of the proposed method with other well-known techniques, e.g., MLP, CNN, and probabilistic neural networks.
• Website fingerprinting: This refers to methods for identifying and collecting data about the websites visited by a mobile device, which is essential for the advertising industry, identifying the characteristics of attacks (e.g., botnets
and sniffing) and protecting users’ privacy. Website fingerprinting can help recognize fraudsters and other unusual activities. Moreover, website fingerprinting can be considered a type of traffic analysis attack that allows eavesdroppers to obtain information about a victim’s activities. Given its importance, there is a large body of literature on website fingerprinting. In [57], Rahman et al. leverage adversarial ML to defend users against website fingerprinting attackers. The authors propose a method to generate adversarial examples that reduce the accuracy of attacks that use learning-based techniques for robust traffic classification. The simulation results show that the proposed method can reduce the accuracy of the state-of-the-art attack by half. The work in [58] focuses on the concept drift problem in static website fingerprinting attacks on the Tor network. The authors point out that updating static attacks is costly, since it requires updating the dataset and retraining the model. Hence, they introduce AdaWFPA, an adaptive online website fingerprinting attack that leverages adaptive stream mining techniques. Luo et al. [59] propose Random Bidirectional Padding (RBP), a website fingerprinting obfuscation technique against intelligent fingerprinting attacks. It uses time sampling and random bidirectional packet padding to change the inter-arrival time characteristics of the traffic flow, and consequently, to identify more complex patterns in network packets.
• User activities identification: Such traffic analysis can be used to obtain interesting pieces of information about a specific action that a mobile subscriber carries out on his/her device (e.g., posting a video on Twitter). User activity identification may also reveal details of a specific activity, such as the length of a message sent by a user within a particular chat application. It can be utilized by adversaries or researchers to reveal the identity behind an unknown user who prefers to remain anonymous, e.g., on social media. This can be done by behavioral profiling of the users of a network, which is helpful for identifying reconnaissance within the network. Moreover, such traffic analysis makes it possible to characterize users’ habits in a network, e.g., chatting with friends in the morning and watching video streams in the evening. A user’s behavior information can later be employed to detect that user’s presence in the network. In [60], Conti et al. use ML techniques (i.e., Dynamic Time Warping (DTW), hierarchical clustering, and Random Forest) to analyze encrypted Android network traffic and, consequently, identify user actions (e.g., email actions, such as sending and replying to email, and Facebook actions). The authors in [61] leverage transfer learning to analyze encrypted mobile traffic and identify user actions while dealing with the diversity of app releases, mobile operating systems, and device models. The work in [62] focuses on identifying Instagram user behavior. Unlike previous works that used the statistical features of encrypted traffic, this work provides a new technique based on maximum entropy to obtain more stable traffic features. Hou et al. [63] categorize user activities of the WeChat application by performing a detailed analysis of its encryption protocol, called MMTLS, to find the typical user activities of the application (e.g., advertisement clicks and browsing moments). Then, they adopt different learning algorithms, such as Naive Bayes, Random Forest, and Logistic Regression, to classify these activities.
• Operating systems identification: This refers to identifying the operating system installed on a mobile device by analyzing its generated traffic. Adversaries can use operating system identification to launch more serious attacks against a specific mobile operating system. Moreover, this analysis can be used to investigate the popularity of mobile operating systems (e.g., Android and iOS) among users. Hagos et al. [64] introduce a learning-based technique for passive operating system fingerprinting. They use classical ML (i.e., Support Vector Machine (SVM), Random Forest, k-nearest neighbors, and Naive Bayes) and DL algorithms (i.e., MLP and LSTM) for classification. Moreover, the authors propose using the underlying TCP variant as a practical feature for improving classification accuracy. The authors in [65] compare the performance of ML-based techniques, such as k-nearest neighbors and Decision Tree, with the traditional commercial rule-based strategy for operating system fingerprinting. The simulation results demonstrate the superiority of the learning-based techniques over the traditional method. Lastovicka et al. [66] investigate the performance of three well-known operating system fingerprinting techniques: user-agent, TCP/IP parameter fingerprinting, and communication with specific domains. Performance measures reveal that the user-agent-based method outperforms its counterparts.

IV. OVERVIEW ON AL
A supervised ML model learns to discriminate between traffic classes by being trained on labeled data. While capturing large quantities of network data is relatively easy, analysing the data with ML techniques can be a very time-consuming, expensive, or labor-intensive process. This is mainly because of the complexity of ML techniques or the shortage of labeled data, which results in inefficient training. To reduce the number of labeled examples needed and, consequently, the effect of the ground-truth challenge, AL can be used to facilitate labeling.
AL systems can participate in the gathering and selection of training instances, such that only the most informative examples need to be labeled. Using AL, a learner follows an iterative strategy in which it interacts with an oracle to choose the most useful data instances to be labeled; thereby, it reduces the cost of data labeling by using only a few labeled examples to deliver satisfactory performance in a reasonable time. The AL paradigm is illustrated in Fig. 2, in which the three core components are the query strategy, the annotator, and the ML model. The query strategy is responsible for choosing
unlabeled data according to a pre-defined policy. A human or machine annotator then provides labels for the selected data, which are added to the set of training instances. Afterwards, the model is updated, and the process is repeated as long as new data is available or until a stopping criterion is satisfied. Different stopping criteria can be defined to end this iterative process, such as reaching the desired accuracy, a running-time limit, or a maximum number of queries; the choice of criterion can directly affect the performance of AL.

There are mainly two AL scenarios to consider, namely, stream-based selective sampling and pool-based sampling (presented in Fig. 3). In the former, the distribution of unlabeled instances is known, and the instances are considered one at a time: the learner observes each instance in sequence and decides whether it should be labeled or discarded. AL is a promising technique for alleviating the challenges of stream-based learning scenarios [67], [68]. AL algorithms designed for streaming scenarios can control the labeling process and perform it gradually over time [69]. With this strategy, the labeling effort is expected to remain balanced over time, and the algorithms are expected to detect changes in the stream. In pool-based sampling, a pool of unlabeled data is provided, and the learner aims to select the most informative instances from the pool to be labeled by the annotator. Pool-based sampling is attractive for many real-world learning scenarios because a large body of unlabeled data can be collected at once. It presumes that a limited amount of labeled data and a large pool of unlabeled data are available.

A. Active learning query strategies
The fundamental question in AL is: what is the most effective strategy for querying data instances? In NTC applications, different query strategies can be used depending on network circumstances, e.g., new unknown flows, changes in the behavior of network traffic, and the discovery of unclassified network traffic. We first introduce the most well-known AL query strategies in the literature and then evaluate the performance of the query strategies in Section VI. Note that in ML terminology, the hypothesis space refers to all possible legal hypotheses, where a hypothesis is a particular computational model that best explains the target data in supervised ML. In AL settings, a query strategy can search the hypothesis space by testing unlabeled samples to reduce the number of legal hypotheses under consideration.

• Uncertainty sampling (UNC): In UNC, a learner prefers to label the instances about whose class the model is most uncertain. The idea behind the strategy is that the examples on which the model exhibits the highest degree of uncertainty are the most likely to improve the performance of the model over time. Different criteria for measuring uncertainty, also called uncertainty strategies, have been proposed, including posterior probability, smallest margin, and entropy [70]. Entropy is one of the most popular uncertainty strategies in many AL problems. In an n-class classification problem, assume the estimated probabilities of the n classes, given the currently labeled data instances, are $p_1, \ldots, p_n$, respectively. The entropy is then defined as $E(X) = -\sum_{i=1}^{n} p_i \log p_i$. A larger entropy value means a higher level of uncertainty; accordingly, querying by entropy can be considered a maximization problem.
• Query-By-Committee (QBC): In QBC, an AL system consists of a committee of different learners trained on the current labeled data. These learners are then used to predict the labels of unlabeled data. The instances whose correct label the committee members disagree on most are selected for labeling, and the committee of learners then uses the newly labeled examples for training. The QBC strategy creates wider diversity than UNC because it considers the differences in the predictions of several learners instead of measuring the labeling uncertainty of a single learner. However, the technique for measuring the disagreement is often similar for both query strategies [71]. In the QBC strategy,
the vote entropy and KL-divergence metrics are usually applied to measure the disagreement. In the literature, two major approaches have been proposed to construct a committee of learners. In the first, the parameters/hyperparameters of a particular model are varied (e.g., by sampling) to generate different models, which form the committee members (or learners). In the second, the committee is built from a bag of different learners (i.e., an ensemble of learners).
• Learning Active Learning (LAL): The main idea behind this strategy is to train a regressor that forecasts the expected error reduction (EER) for an instance in a specific learning state. In other words, this technique formulates the querying of unlabeled data as a regression problem. Given a trained classifier and its output for a specific unlabeled instance, LAL forecasts the decrease in generalization error that can be achieved by labeling that instance. Interested readers are referred to [72] for details.

[Figure 2: the ML model is trained on labeled data; when it requires new data, the query strategy selects instances from the unlabeled data, and an annotator (human or machine) labels them before they are added to the training data.]
Figure 2: Graphical description of active learning.
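The uncertainty and disagreement measures listed above can be illustrated with a short NumPy sketch. It assumes that the classifier (or each committee member) returns a matrix P of estimated class probabilities with one row per unlabeled instance. The function names are hypothetical, and the "posterior probability" criterion is interpreted here as the least-confidence score $1 - \max_i p_i$, which is an assumption rather than the exact definition used in [70]. In every function, a higher score marks a more informative instance.

```python
import numpy as np


def least_confidence(P):
    # 1 - max_i p_i: an assumed reading of the "posterior probability" criterion.
    return 1.0 - P.max(axis=1)


def smallest_margin(P):
    # Gap between the two most probable classes; a small gap means high
    # uncertainty, so the gap is negated to keep "higher = query first".
    top2 = np.sort(P, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])


def entropy(P):
    # E(X) = -sum_i p_i log p_i, treating 0 * log 0 as 0.
    logP = np.log(np.where(P > 0, P, 1.0))
    return -(P * logP).sum(axis=1)


def vote_entropy(votes, n_classes):
    # QBC disagreement from hard votes; votes has shape (committee_size, n_instances).
    counts = np.stack([(votes == k).sum(axis=0) for k in range(n_classes)], axis=1)
    return entropy(counts / votes.shape[0])


def mean_kl_divergence(committee_probs):
    # QBC disagreement from soft votes; committee_probs has shape
    # (committee_size, n_instances, n_classes). Averages KL(member || consensus).
    eps = 1e-12
    consensus = committee_probs.mean(axis=0)
    kl = (committee_probs
          * (np.log(committee_probs + eps) - np.log(consensus + eps))).sum(axis=2)
    return kl.mean(axis=0)


def select(scores, batch_size=1):
    # Indices of the highest-scoring (most informative) instances.
    return np.argsort(-scores, kind="stable")[:batch_size]


# Example: estimated class probabilities for three unlabeled flows.
P = np.array([[0.98, 0.01, 0.01],
              [0.40, 0.35, 0.25],
              [0.50, 0.50, 0.00]])
print(select(entropy(P)), select(smallest_margin(P)))  # → [1] [2]

# Example: hard votes of a three-member committee on the same three flows.
votes = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [0, 0, 0]])
print(select(vote_entropy(votes, n_classes=3)))  # → [2]
```

With these toy probabilities, entropy (and least confidence) queries flow 1, whose distribution is flattest, whereas smallest margin prefers flow 2, whose two top classes are tied. The choice of uncertainty strategy can therefore change which flows are sent to the annotator.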
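The iterative loop of Fig. 2 and the two scenarios of Fig. 3 can be sketched as follows, assuming entropy-based uncertainty sampling as the query strategy. This is a minimal illustration, not the authors' implementation: `NearestCentroid` is a hypothetical stand-in for a real traffic classifier, the annotator is simulated by an `oracle` function that returns known labels, the stopping criteria are a query budget, an exhausted pool, and an optional target accuracy, and the stream threshold of 0.5 nats is an arbitrary choice.

```python
import numpy as np


class NearestCentroid:
    """Hypothetical stand-in classifier that outputs class probabilities."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative squared distances to the class centroids.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        e = np.exp(d.min(axis=1, keepdims=True) - d)
        return e / e.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


def entropy(P):
    # E(X) = -sum_i p_i log p_i, treating 0 * log 0 as 0.
    logP = np.log(np.where(P > 0, P, 1.0))
    return -(P * logP).sum(axis=1)


def pool_based(X_lab, y_lab, X_pool, oracle, max_queries=10,
               X_val=None, y_val=None, target_acc=None):
    """Repeatedly query the most uncertain instance in the pool."""
    model = NearestCentroid().fit(X_lab, y_lab)
    remaining = list(range(len(X_pool)))
    for _ in range(max_queries):                      # stop: query budget spent
        if not remaining:                             # stop: pool exhausted
            break
        if target_acc is not None and np.mean(model.predict(X_val) == y_val) >= target_acc:
            break                                     # stop: desired accuracy reached
        scores = entropy(model.predict_proba(X_pool[remaining]))
        i = remaining.pop(int(np.argmax(scores)))     # query strategy
        X_lab = np.vstack([X_lab, X_pool[i]])
        y_lab = np.append(y_lab, oracle(i))           # annotator supplies the label
        model = NearestCentroid().fit(X_lab, y_lab)   # update the model
    return model, len(y_lab)


def stream_based(X_lab, y_lab, stream, oracle, threshold=0.5):
    """Observe instances one at a time; label uncertain ones, discard the rest."""
    model = NearestCentroid().fit(X_lab, y_lab)
    n_queries = 0
    for i, x in enumerate(stream):
        if entropy(model.predict_proba(x[None, :]))[0] > threshold:
            X_lab = np.vstack([X_lab, x])
            y_lab = np.append(y_lab, oracle(i))
            model = NearestCentroid().fit(X_lab, y_lab)
            n_queries += 1
        # Otherwise the instance is discarded without requesting a label.
    return model, n_queries


# Toy data: two well-separated Gaussian classes standing in for flow features.
rng = np.random.default_rng(0)
y_pool = rng.integers(0, 2, size=200)
X_pool = rng.normal(size=(200, 2)) + np.where(y_pool[:, None] == 1, 3.0, -3.0)
X_seed, y_seed = np.array([[-3.0, -3.0], [3.0, 3.0]]), np.array([0, 1])

model, n_labeled = pool_based(X_seed, y_seed, X_pool, oracle=lambda i: y_pool[i])
print(n_labeled)  # → 12 (2 seed labels + 10 queries)
```

The pool-based variant ranks every remaining instance before each query, whereas the stream-based variant must decide immediately for each arriving instance. With a sharply peaked classifier such as this toy one, most stream instances are likely to fall below the threshold and be discarded without labeling.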

[Figure 3 diagram. (a) Stream-based sampling: Labeled data, Train a model, Machine learning model, Observe an instance, Make decision (query strategy), Discard, Annotator (human or machine). (b) Pool-based sampling: Input source, Labeled data, Train a model, Machine learning model, Select the most informative instance (query strategy), Annotator (human or machine).]

Figure 3: (a) Stream-based selective sampling, and (b) Pool-based sampling.

• Random: This refers to the conventional supervised learning scheme in which instances are randomly selected to be labeled. Since data labeling is an expensive procedure, random sampling may not lead to the best learner, especially when querying each sample is costly and, consequently, few labels will finally be available [71].
• Information Density (Density): Uncertainty sampling, QBC, and LAL query strategies are all prone to choosing outliers or unrepresentative instances, which can lead to sub-optimal queries. One solution is to use the representativeness of an instance to ensure that the selected instances resemble the overall distribution. When considering whether to query an instance, a combination of its representativeness and its informativeness is typically used [73]. In the density query, the representativeness of a data instance is often measured by its closeness to all other data instances.

V. ACTIVE LEARNING FOR NETWORK TRAFFIC CLASSIFICATION

As explained in Section III, NTC has attracted much interest in recent years, and different ML methods have been proposed to solve the NTC problem. However, most of these methods face various challenges. They require a large amount of fully labeled data, real-world network scenarios contain a considerable amount of semi-labeled or unlabeled data, and data labeling methodologies are complex, costly, and time-consuming. Providing labels to data instances is especially challenging for NTC techniques, because one must consider several requirements in terms of traffic data granularity in order to satisfy the desired traffic classification objectives. Typical examples of data granularity are classes at the application level (e.g., Skype or Facebook), the protocol level (e.g., TCP or HTTP), or the service group level (e.g., browsing or streaming) [24]. Moreover, updating a traffic classification method is time-consuming. However, updates may be needed to increase the accuracy of the method or to recognize new applications, protocols, or protocol versions. Such updates are essentially performed using new labeled data.

AL is a promising research field in this context, as it greatly reduces the cost of training and dramatically speeds up the learning phase [74]. By attaching labels only to the most informative instances, AL helps ML-based traffic classification methods better satisfy the aforementioned requirements. These requirements are, namely, the data requirements and the need for updating to identify new types of traffic.

A. Advantages of using AL for NTC purposes

AL is potentially a good candidate for performing NTC. Below, we summarize the advantages of using AL techniques in the field of NTC:
• Less data needed for labeling: As mentioned before, most conventional networks generate unlabeled and semi-labeled data. Meanwhile, one of the key challenges in using learning-based techniques for NTMA is the lack or limited availability of labeled instances. Moreover, data labeling is often not a straightforward procedure and can raise costs in terms of time, human effort, and computational overhead. In addition, if data labeling is performed manually or by online tools, it can reduce the data quality, since not all data instances are informative. AL can tackle this concern by labeling only the most informative instances. To this end, a comprehensive set of query strategies has been proposed in AL to determine the quality of instances for labeling [14].
• Concept Drift: Due to the high dynamicity of computer networks, ML techniques must be retrained frequently for various reasons, e.g., new network behavior and new classes of network traffic [75]. In most ML techniques, such as DL, retraining a model from scratch is a resource-intensive task in terms of time and power
computation, and these techniques also need a huge amount of new data samples. Most well-known ML techniques become useless in NTC, because the network cannot be left unattended for a long time while models are retrained. AL is able to (re-)train models very quickly and with high accuracy through the continuous provisioning of new labeled instances. This is demonstrated in Section VI, where AL performance is evaluated with regard to training time.
• Dealing with the shortage of labeled data samples: In the case of retraining, the number of labeled samples available to train the model is very limited due to the cost of the labeling process, e.g., time, complexity, and the need for domain knowledge. Most ML models, e.g., DL, need a considerable amount of data to train. As shown in Section VI, AL can train the model with high accuracy using a limited number of data samples.
• Incremental Learning: Although AL is not inherently considered an online learning technique, stream-based sampling can turn it into an incremental learning technique that adapts to the nature of highly dynamic networks. As most conventional and emerging networking paradigms are highly dynamic in different aspects, AL can be used to learn the behavior of network traffic online. In addition, pool-based sampling can help reduce the time complexity of learning from scratch, as the number of training samples becomes limited. Although labeling is a time-consuming task, using different query strategies based on the network traffic circumstances can reduce the time complexity of learning.
• Monitoring incoming stream traffic: Using passive learning methods for NTC tasks, such as security and intrusion detection, is no longer reasonable, as these methods cannot handle changes in the statistical characteristics of the target data (i.e., concept drift). To address this issue, one can exploit the capabilities of stream-based AL [76]. Several AL-based strategies have been proposed to detect concept drift and instantly adapt to the evolving characteristics of data [77], [78], [79].
• Addressing the Theory of Network: At the 97th meeting of the Internet Engineering Task Force (IETF 97)¹, this challenge was introduced as the lack of a unified theory that can be applied to all networks. In other words, the behavior of different networks varies with their topology, equipment, scale, applications, etc. Without such a Theory of Network, ML techniques must be trained for each network separately, which is an important problem. AL can be considered a suitable online learning choice in such cases, thanks to its ability to learn from a limited number of data samples. This is beneficial for highly dynamic networks with a huge volume of starting and stopping network traffic. AL also allows frequent retraining, which eliminates the necessity of using representative datasets.

¹ https://www.ietf.org/blog/reflections-ietf-97/

B. Literature Review on using AL in NTC

In this section, we review existing work on the application of AL in NTC.

Torres et al. [80] proposed a botnet detection technique based on AL. The authors provided a novel AL strategy to label network traffic that contains both normal and botnet traffic. The AL strategy is used to create a random forest model that benefits from the user's previously labeled instances. The primary objective of the proposed technique is to help the user in the labeling process. Similarly, the work in [81] employed AL for a security purpose, i.e., malware classification. In this work, SVMs and active learning by learning have been combined to tackle the lack of labeled instances in malware detection. The simulation results reveal that using AL can enhance classification performance in terms of accuracy and the quality of labeled instances. In addition, the authors claimed that issues such as the diversity of security-related datasets can be solved by using different training algorithms, e.g., Generative Adversarial Networks (GANs).

The work in [82] is another attempt to develop an accurate malware detection system. The system is based on AL, and it introduces a new Structural Feature Extraction Methodology (SFEM) to extract features from docx files. The proposed system is able to identify new, unknown malicious docx files. To keep the detection model updatable and identify new malicious files, the system uses AL to update and complement the signature database with new, unknown malware.

Common cybersecurity attack vectors, such as viruses, botnets, and malware, are known to Intrusion Detection Systems (IDSs). Nevertheless, malicious users continuously create new attacks that can bypass IDSs. Analyzing anomalous behaviors calls for a considerable amount of time and effort. Preparing a significant amount of labeled data for the training process is increasingly costly and inefficient, because new attacks are continuously being designed. In this case, one can use AL to reduce the number of required labeled instances while increasing the accuracy of anomaly detection. In [83], a semi-supervised IDS has been designed that works effectively with a small number of labeled instances. The proposed learning algorithm for the IDS combines two ML techniques: an active learning Support Vector Machine and Fuzzy C-Means clustering. Furthermore, [83] reported that the proposed learning algorithm enables the IDS to add new training instances with minimal computational overhead.

Because domain knowledge is required to annotate unlabeled instances, new cost-effective labeling techniques are desirable. To this end, Beaugnon et al. [84] developed an interactive labeling strategy, namely ILAB, to assist experts in labeling large intrusion detection datasets. ILAB adopts a divide-and-conquer approach to lower the computation cost. Deka et al. [85] investigated the important role of AL in selecting more informative instances. They then used these instances to train a binary IDS for Distributed Denial of Service (DDoS) attack classification. In addition, since there are massive amounts of traffic in modern networks, a parallel computation method was employed. The authors pointed out that using AL
