LEASH: Enhancing Micro-architectural Attack Detection with a Reactive Process Scheduler

Page created by Lorraine Flores
 
CONTINUE READING
LEASH: Enhancing Micro-architectural Attack Detection
                                                                     with a Reactive Process Scheduler
                                                                                     Nikhilesh Singh, Chester Rebeiro
                                                                                   Indian Institute of Technology, Madras
                                                                                        {nik,chester}@cse.iitm.ac.in

                                                                   Abstract
                                            Micro-architectural attacks use information leaked through
                                         shared resources to break hardware-enforced isolation. These
                                         attacks have been used to steal private information ranging
arXiv:2109.03998v2 [cs.CR] 20 Sep 2021

                                         from cryptographic keys to privileged Operating System (OS)
                                         data in devices ranging from mobile phones to cloud servers.
                                         Most existing software countermeasures either have unaccept-
                                         able overheads or considerable false positives. Further, they
                                         are designed for specific attacks and cannot readily adapt to
                                         new variants.                                                       (a) LEASH stymies micro-architectural attacks by detecting malicious behav-
                                            In this paper, we propose a framework called LEASH, which        ior in programs and reducing its CPU-share, thereby reducing the leakage
                                         works from the OS scheduler to stymie micro-architectural           from the shared resource.
                                         attacks with minimal overheads, negligible impact of false
                                         positives, and capable of handling a wide range of attacks.
                                         LEASH works by starving maliciously behaving threads at
                                         runtime, providing insufficient time and resources to carry out
                                         an attack. The CPU allocation for a falsely flagged thread
                                         found to be benign is boosted to minimize overheads. To
                                         demonstrate the framework, we modify Linux’s Completely
                                                                                                             (b) LEASH detects malicious behavior by monitoring run time characteristics
                                         Fair Scheduler with LEASH and evaluate it with seven micro-         of all threads (t1 , t2 , t3 ) in the system using HPCs. Threads with a high
                                         architectural attacks ranging from Meltdown and Rowhammer           threat index get less CPU time. Thread t1 has persistent malicious behavior
                                         to a TLB covert channel. The runtime overheads are evaluated        and hence gets throttled. Thread t2 gets flagged for an epoch but recovers,
                                         with a range of real-world applications and found to be less        while thread t3 remains unaffected.
                                         than 1% on average.                                                            Figure 1: A high-level overview of LEASH.
                                         1. Introduction                                                        In a typical micro-architectural attack, the attacker runs a
                                         For several years, computer systems designers have relied           program called the spy that contends with a victim program
                                         on traditional hardware mechanisms such as protection rings,        for shared hardware resources such as a common Cache Mem-
                                         segmentation, page tables, and enclaves, to isolate software        ory [6, 28, 42, 45, 58], Branch Prediction Unit (BPU) [7, 23],
                                         entities executing in a CPU. These isolation mechanisms             Translation Lookaside Buffer (TLB) [26], or DRAM [37, 59].
                                         have been considerably weakened by a potent class of side-          The contention affects the spy’s execution time in a manner
                                         channel attacks known as micro-architectural attacks. A micro-      that correlates with the victim’s execution. If the victim’s
                                         architectural attack makes use of shared hardware resources to      execution pattern happens to depend on secret data, then, the
                                         leak information across isolation boundaries. They have been        correlation can be used to reveal it.
                                         used in a variety of applications, from creating covert chan-          Most schemes to prevent micro-architectural attacks work
                                         nels [32], retrieving secret keys of ciphers [13, 58], reading      either in the software or are incorporated in the hardware.
                                         Operating System data [38, 43], fingerprinting websites [65],       Hardware solutions include bypassing [73] or partition-
                                         logging keystrokes [64], reverse engineering Deep Learning          ing [35, 44, 70] the shared resource, or randomizing accesses
                                         algorithms [30], and breaking Address Space Layout Random-          to it [60, 61, 70]. The preventive solutions in software attempt
                                         ization [11, 27]. The last couple of years have seen these attack   to have constant-time implementations [76] or inject noise
                                         vectors applied on a range of devices from mobile phones [42]       in measurements to make the attack difficult [31, 46]. While
                                         to third-party cloud platforms [64]. They have been used to         software solutions have huge overheads and may be environ-
                                         leak secrets stored in Trusted Execution Environments like          ment specific, hardware solutions are not applicable to legacy
                                         SGX [15] and Trustzone [77] and attack remote computers             systems.
                                         using webpages that host malicious Javascript [55] or Web-             An alternate approach that promises to solve these prob-
                                         assembly [25].                                                      lems is to detect and thwart an attack in real time. The de-
tection is typically done using performance monitoring reg-              add-ons that can detect complex scenarios.
isters in the CPU called Hardware Performance Counters                   • We modify Linux’s Completely Fair Scheduler (CFS)
(HPCs) [8, 10, 16, 18, 50, 78]. On detection, the malicious-             with LEASH and evaluate its effectiveness with seven micro-
behaving thread is terminated or rescheduled to a different              architectural attacks including Prime+Probe on the L1
CPU. A severe limitation of current detection schemes is                 Data [56], L1 Instruction [6], and LLC caches [74]; L1 and
the false positives due to which benign processes are falsely            TLB Evict+Time attacks [26, 56], Rowhammer [3, 37], Melt-
flagged malicious. Contemporary works [8, 18, 50, 52, 78]                down [1, 43]. The entire framework just adds around 20 lines
attempt to reduce false positives by deploying complex Ma-               of C code in the Linux kernel to provide micro-architectural
chine Learning models. While this increases overheads, the               security for the entire system.
impact of the false positive rate is still unacceptably high. This       • We have evaluated LEASH with multiple benchmark
limitation is exacerbated by the fact that a micro-architectural         suites including the SPEC-2017 [17], SPEC-2006 [66],
attack occurs rarely compared to the number of benign pro-               SPECViewperf-13 [4] and STREAM [48] and found an aver-
grams executing. Thus, these detection techniques more often             age perfomance overhead of 1%.
impact benign program execution than mitigate an attack.                    The rest of the paper is organized as follows. Section 2
   In this paper, we present a micro-architectural attack detec-         presents the necessary background on micro-architectural at-
tion countermeasure called LEASH that has all the advantages             tacks, Hardware Performance Counters, and Linux sched-
of detection countermeasures with almost no impact of false              uler. In Section 3, we discuss the related works in micro-
positives. LEASH makes use of the observation that a spy                 architectural attack mitigation. Section 4 sums the key idea
thread in a micro-architectural attack needs to contend with             in the LEASH framework. In Sections 5 and 6 we discuss the
the victim for a shared resource. The success of the attack              implementation details and present the evaluation and results,
depends on the extent to which the spy can force this con-               respectively. Section 7 discusses the caveats and limitations
tention. If the spy gets insufficient time to execute on the CPU,        of LEASH . Section 8 concludes the paper.
then the information leakage is reduced, stymieing the attack.
Figure (1a) provides an analogy.                                         2. Background
   The LEASH framework works from the Operating System
                                                                         2.1. Micro-architectural attacks
(OS) scheduler. At every context switch, it uses the HPCs to
quantify the malicious behavior of each thread in the system             A micro-architectural side-channel attack leverages leakage
using a metric called threat index. It then allocates CPU                from shared micro-architectural resources in a system to glean
resources for the thread based on its threat index. A high value         sensitive information from a process. We illustrate a micro-
of threat index indicates that the process gets less CPU time.           architectural attack that uses a shared L1-data cache to create
For instance, in Figure (1b) thread t1 has a threat index that           a covert channel between a sender and a receiver process.
continuously increases. The time slice it obtains for execution          Apriori, the sender and receiver agree upon two cache sets
correspondingly reduces. If the thread were indeed a micro-              for communicating 0 and 1 respectively. The receiver first
architectural attack, the lesser CPU time would clamp the                performs memory operations that fill both cache sets. This is
amount of information that is leaked from the victim, thus               known as the prime phase. Depending on the message bit, the
stymieing the attack.                                                    sender performs a memory operation to evict the receiver’s
   A significant advantage of LEASH is in its handling of false          data from the corresponding cache set. In the probe phase, the
detection. Suppose a benign thread temporarily exhibits ma-              receiver performs the memory operations again and times the
licious behavior, its CPU quota falls temporarily, and over              accesses. Based on the access time, the receiver can infer the
time it regains its full CPU quota. Thread t2 in Figure (1b)             transmitted bit since the memory access to the evicted cache
demonstrates this recovery after the 4-th epoch. This is un-             set would take longer owing to the cache miss.
like contemporary works [8, 18, 50, 52, 78], where a false                  Later in the paper, we use this covert channel as a running
detection has adverse irreversible effects. With LEASH , the             example to explain LEASH and show how the transmission-rate
false detection would only result in a small overhead. With              between sender and receiver can be reduced by throttling the
the wide range of benchmarks we evaluated, the performance               receiver’s CPU time. Further, we apply LEASH to other micro-
overheads were found to be less than 1% on average.                      architectural attacks, including Meltdown [1, 43], Rowham-
The important contributions of this paper are as follows:                mer [3, 37], L1 data cache attacks on AES [56], L1 instruction
• LEASH is an OS scheduler based countermeasure for micro-               cache attacks on RSA [6], a TLB covert channel [26] and a
architectural attacks that has low overheads and negligible              prime-and-probe LLC covert channel [74].
impact of false positives. It can support a variety of micro-            2.2. Hardware Performance Counters
architectural attacks and can easily adapt to new variants.The
framework is highly versatile and can be tuned to detect multi-          Most modern processors have a Performance Monitoring Unit
ple different attacks simultaneously. It is designed to provide          on-chip to monitor micro-architectural events of running ap-
quick response, while still having features for sophisticated            plications. Each logical core has a dedicated set of 4 to 8

                                                                     2
configurable registers that can count the number of times a
particular event occurs in a given duration. These registers are
called Hardware Performance Counters (HPCs) and can be
used to monitor a wide range of events like CPU-cycles, cache
accesses, context-switches, and page faults. The number of
such events that can be monitored simultaneously is limited
by the number of HPCs available in the hardware.                         Figure 2: Effects of CPU-share to the receiver thread on the L1-
  Patterns present in the time series traces of performance              data cache covert channel’s Bandwidth and Error rate. LEASH
counter events have been shown to characterize the behavior              uses this as a premise to throttle attack programs on detec-
of a thread’s execution. This has been leveraged in several              tion.
applications such as detecting anomalous behavior in pro-
grams [40, 67, 72], side-channel attacks [10, 18, 50, 78] and            attacks from the Operating System, such as flushing shared re-
malware analysis [12, 21, 67]. LEASH, similarly uses HPCs to             sources [19, 32] and soft isolation of Virtual Machines sharing
compute a threat index for a thread. Threads which depict a              the same hardware [69]. Nomani and Szefer in [53] incorpo-
micro-architectural attack like behavior are given a high threat         rate a neural-network within the OS scheduler, to detect tasks
index and are throttled by reducing their CPU time.                      requiring the same functional units and scheduling them on
                                                                         different cores to minimize contention. Such solutions are
2.3. Process Scheduling                                                  likely to have scalability issues as the workloads increase.
 LEASH modifies the OS scheduling algorithm to consider the
                                                                            Most contemporary detection-based solutions use HPCs and
threat index of a thread for CPU resource allocation. While              Machine Learning (ML) models [8, 18, 50, 51, 52]. Among
we can incorporate LEASH in most scheduling algorithms, in               the several shortcomings [20] of these works, the most serious
this paper, we demonstrate LEASH with the Completely Fair                limitation is with the false positives which adversely affect
Scheduler (CFS); the default scheduler present in the Linux              benign program execution. Contemporary works use complex
kernel since Version 2.6 [41].                                           techniques like KNNs, decision trees, neural networks and en-
   The CFS algorithm tries to achieve the ideal multitasking             semble methods to reduce false positives which can still have
environment where threads with equal priorities receive the              unacceptable impacts. Furthermore, the complex techniques
same share of processor time, called timeslice. The timeslice            used have considerable overheads and could reduce the predic-
allocated to a thread t, denoted ∆ts , is a fraction of a pre-           tion rate leading to fast attacks completing undetected. Unlike
defined value called targeted latency (∆tl ). When multiple              these works, LEASH reacts to a flagged program by throttling
threads compete for CPU time, the scheduler allocate times-              its CPU-share, thus preventing the attack from completing.
lices in proportion to a metric called weight of the thread as           Contemporary works, on the other hand, would either migrate
follows                                                                  the program to another CPU or terminate it. The advantage
                                                                         we achieve with the feedback loop of LEASH is that falsely
                               w                                         flagged threads can recover and regain their CPU-share with
             ∆ts = ∆tl ×              = ∆tl × s ,             (1)
                           ∑threads w                                    only slight increase in execution time. Further, due to the
                                                                         reactive control loop, very simple detection mechanisms are
where w is the weight of the thread t, ∑threads w is the sum
                                                                         sufficient.
of weights of all the threads sharing the CPU, and s is the
relative weight of thread t. On thread creation, its weight
                                                                         4. The LEASH Framework
takes a default value (wDEF ) which lies in the middle of 40
discrete levels, ranging from wMIN to wMAX . The ratio of weights        Micro-architectural attacks depend considerably on CPU re-
at two consecutive levels γ, (0 < γ < 1) is determined by                sources. If the attack programs are starved of the CPU, the
the OS scheduler at design time. The relationship between                success drops considerably. Consider, for example, the L1-
these weight levels can be given as wMIN = γ 19 wDEF = γ 39 wMAX .       cache timing covert channel discussed in Section 2.1. Figure 2
The weight also gives a notion of thread priorities. A higher            shows that starving the receiver thread not only reduces the
weight value for thread implies a larger timeslice and a higher          communication bandwidth considerably but also increases the
frequency of getting scheduled for execution.                            transmission errors. The reduced bandwidth and the increased
                                                                         error is due to smaller CPU-time shares, which prevents the
3. Related work                                                          receiver from performing a sufficient number of timing mea-
There is a pressing need to develop lightweight, portable so-            surements.
lutions for micro-architectural attacks. Most preventive coun-              Along with being highly CPU-bound, micro-architectural
termeasures that have low overheads require hardware modifi-             attacks are procedural and repetitive by design, which gives
cations [22, 35, 39, 44, 54, 57, 60, 61, 68, 71, 73] and cannot          them a distinct execution behavior. For example, the receiver
be incorporated in existing systems. To achieve portability,             in the cache covert channel continuously executes a small
there have been a few attempts to tackle micro-architectural             set of instructions in a loop performing extensive memory

                                                                     3
Algorithm 1: Detector component in LEASH for thread
                                                                       t at the beginning of the i-th epoch.
                                                                      1    Initial State: At i = 0, Ti = 0 and flag = 0. Global:
                                                                                   Event count threshold: τ, default weight: wDEF .
                                                                                   Input: Thread t = {wi−1 , Ti−1 , HPCBuf}.
                                                                           Result: Threat index for i-th epoch: Ti , flag
                                                                      2    begin
                                                                       3       µ = mean(HPCBuf)
Figure 3: The LEASH reactive loop. The Sensor logs the HPC             4       if flag = 1 then
values for a thread t in HPCBuf and passes it to the Detec-            5           Ti = Ti−1 + S(t) // threat index update
tor which computes the threat index for t. It then passes the          6

threat index and the mean of HPCBuf to the Supervisor which            7          if wi−1 = wDEF then
adjusts the control value (v) for the next epoch, and to the           8              flag = 0 // Thread t is unflagged
Actuator which updates the weight of t as per its threat index.        9              Ti = 0
This loop continues throughout the runtime of t.
                                                                      10      if (flag = 0 and µ ≥ τ ) then
                                                                      11           flag = 1 // Thread t is flagged
                                                                      12      return (Ti , flag)

                                                                      is bounded by the number of available HPC registers in the
                                                                      hardware. Reducing the number of HPCs logged would reduce
                                                                      context-switch overheads, while increasing the number of
                                                                      HPCs logged would widen the scope of the attacks that can be
                                                                      detected. Section 5 discusses this further.
                                                                      Detector. At every N context switches (called an epoch) for the
Figure 4: LEASH framework effects on an L1-data cache
                                                                      thread, the Detector D() is triggered and uses Algorithm 1 to
covert channel (left column) and a benign program bzip2 (right
                                                                      compute the new threat index for the thread. The threat index
column) from the SPEC-2006 benchmark suite.
                                                                      in the i-th epoch (Ti ) depends on the mean of the HPCBuf
operations. Due to the continuous probing, the number of
                                                                      values (µ), the event-specific threshold (τ), the threat index of
memory accesses by the receiver increases to be significantly
                                                                      the previous epoch Ti−1 and the control value vi passed by the
higher compared to a regular thread. LEASH uses HPCs to
                                                                      Supervisor function S() as,
detect such anomalous behavior and penalizes such threads by
decreasing their weight, which in turn reduces their timeslice                           Ti = D(HPCBuf, Ti−1 , vi ) .                 (2)
(Equation 1). If the thread stops exhibiting the anomalous               LEASH flags threads which begin to behave maliciously
behavior, its weight is gradually increased, thus regaining its       (Line 11 in Algorithm 1). This happens when the mean µ of
regular timeslice. To achieve this, LEASH has four components:        an unflagged thread first violates the threshold. For every sub-
a Sensor, Detector D(), Actuator A(), and a Supervisor S()            sequent epoch for this thread, where µ exceeds the threshold,
as shown in Figure 3.                                                 its threat index is modified as per the output of the Supervisor
Sensor. At every context switch, LEASH reads the values of            function S() described in Equation 5 (Line 6). LEASH allows a
HPCs and logs the event counts since the last context switch.         flagged thread to recover if it stops behaving maliciously. The
The log is stored in a buffer of size N, called HPCBuf, in the        recovery starts when µ for the flagged thread first falls below
context of the scheduled-out thread. Thus, every thread exe-          the threshold. The thread’s threat index is then reduced in
cuting in the CPU has a time-series trace for each HPC logged         subsequent epochs until the thread reattains the default weight,
during its execution. For example, the first row in Figure 4          which restores its CPU share. When this happens, the thread
shows the partial time-series traces for the L1-D covert chan-        is unflagged and its threat index is reset (Lines 8 and 9).
nel described in Section 2.1 and the bzip2 application from              Figure 4 shows two threads, a L1 Data Cache Covert Chan-
the SPEC-2006 benchmark suite. The counter corresponds to             nel and a benign program bzip2 from the SPEC-2006 [66]
an event that counts the delay cycles due to DSB (Decoded             benchmark suite. While both threads, with counter values
Stream Buffer) to MITE (Micro Instruction Translation En-             greater than the threshold get flagged by LEASH , the benign
gine) switches on an Intel i7-3770 processor.                         thread bzip2 recovers its fair share of CPU resources (in the
   The counters to log are determined prior to the deployment         14th epoch) when its counter values go below the threshold.
of LEASH based on the execution characteristics of the attack         The CPU share for the covert channel is never restored as its
vectors considered. The number of counters that can be logged         counter values always remains over the threshold. Thus the

                                                                  4
covert channel is never unflagged. Its threat index saturates             Algorithm 2: Context Switching in LEASH
when the minimum weight is reached.                                        Input: Task prev getting preempted from runqueue rq
Actuator. For each thread in the i-th epoch, the Actuator                         by task next, per thread counter cs_count to
receives the threat index (Ti ) from the Detector and modifies                    log the number of context switches.
the thread’s weight wi and in turn, the relative weight si                 Result: rq: prev reweighed, next scheduled, Logged
(Equation 1) for the thread. The relative weight for the thread                    HPC data for both
                                                                         1 begin
at the beginning of the i-the epoch is given as
                                                                         2    if threads in rq = 1 and flag = 1 then
         si = A(si−1 , Ti ) = si−1 − γ(si−1 ) × Ti ,          (3)        3        wake up dummy thread
where is γ, (0 < γ < 1) is a constant fixed by the OS sched-              4       if prev.start then
uler which determines the amount of fall in the weight with               5           prev.end ← read event count
every increase in the threat index. In our evaluation platform,                        prev.HPCBuf[prev.cs_count]
γ = 0.1 which means that, for every rise in threat index val-             6                  ← (prev.end − prev.start)
ues, the weight drops by 10% until it reaches wMIN . Similarly,           7           increment prev.cs_count
when a thread is recovering, the threat index value is negative           8       if prev.cs_count == N then
and hence every fall in threat index value increases its weight           9           Detector(prev) // Algorithm 1
by 10% until its weight is restored. The adaptable design                10           Reset HPCBuf, prev.cs_count
of LEASH efficiently brings down the cost of a false penaliza-
                                                                         11       CFS context switching steps
tion. Once a benign thread, which is erroneously flagged, is
                                                                         12       next.start ← read event count
unflagged, it regains its CPU share and executes without any
                                                                         13       return rq
additional overheads.

Supervisor. LEASH’s Detector identifies threads that are pe-             5. Implementation
nalized and rewarded. Ideally, this decision should be based on
careful analysis of the HPCBuf data so that malicious threads            This section provides the implementation details of LEASH
are quickly identified and penalized while benign threads are            and some of its primary design choices.
not affected. However, supporting sophisticated analysis in the          5.1. Sensing Program Behavior
Detector is difficult because it executes in a context switch and
therefore affects the performance of the entire system. Thus,            Most modern processors have 4 to 8 Hardware Performance
as seen in Algorithm 1, the Detector is kept very simple and             Counters (HPCs) per logical core that can be configured to
comprises of about 6 instructions that are executed at every             count one of several micro-architectural events such as cache
epoch (N context switches).                                              misses, branch events, or loads and stores to various levels of
   To support complex analysis on the performance counter                memory. To enhance readability, we label 40 selected events
data, LEASH uses a Supervisor that does not need to execute              as e1 , e2 , · · · , e40 in the paper. We describe these events and
in the context switch. The Supervisor provides thread specific           associated masks defined by the micro-architecture in detail
control value (vi ) that represents the amount by which the              in the Appendix.
thread index Ti of a flagged thread is degraded or upgraded in
                                                                         5.2. Enhancing the CFS Scheduler for LEASH
the i-th epoch. The output of the S(t) for a thread t is given
by                                                                       In the Linux kernel’s CFS implementation, the context switch
             vi = S(vi−1 , Ti−1 , Ti−2 , Ti−3 , · · ·) .       (4)       is implemented in a function called context_switch1 . The
                                                                         function switches context from a thread called prev to a thread
   A general framework for the Supervisor function for thread            called next as shown in Algorithm 2. To support LEASH, the
t involves a penalty function P(t) and reward function R(t),             context_switch function is modified to perform three addi-
as given in Equation( 5). Such a design allows the Supervisor            tional operations; (a) it schedules a dummy thread if needed;
to be implemented with different policies                                (b) it logs the performance counter data in the scheduled out
           
                                                                         thread’s task structure; and (c) it executes the detector when
           P(t), µ > τ (t is malicious)
           
  S(t) = R(t), t is flagged and µ ≤ τ (recovering) (5)                   the HPCBuf gets full. We describe each of these operations.
           
             0,     thread t is unflagged,
           
                                                                         Dummy Thread. As described in Equation (1), the targeted
                                                                         latency ∆tl is shared among the threads in the Run Queue
where µ is the mean of HPCBuf and τ is the threshold. Config-            (rq), proportional to their weights. However, when there is
urable Supervisor policies allow LEASH to handle new attack              only one thread in the Run Queue, the timeslice it receives is
vectors. Section 5.3 provides more details about the Supervi-
sor.                                                                          1 This   function is defined in kernel/sched/core.c in the kernel source.

                                                                     5
5.3. Supervisor Policy
   executed per context switch

                                                                                                        Bandwidth (bits/second)
     Additional instructions

                                 22                           Overheads
                                                              Bandwidth                            0.010                              The Supervisor in LEASH manages the control values, as shown
                                 20
                                                                                                                                      in Equation (5). Algorithm 1 shows how these values affect
                                 18                                                                0.005                              the threat index and, in turn, the weight of a flagged thread.
                                 16
                                                                                                                                      The key components of the Supervisor are the penalty (P(t))
                                 14                                                                0.000
                                          1   2   3       4   5     6     7   8        9      10                                      and reward (R(t)) functions that respectively provide the rate
                                                              log2(N)                                                                 of degradation and recovery for a flagged thread.
  Figure 5: The number of additional instructions executed per                                                                           The penalty and reward functions are pluggable. The sim-
  context switch and L1-data cache covert channel bandwidth                                                                           plest form is a static policy, where the outputs of the functions
  with different values N.                                                                                                            are constants. Such a policy is agnostic to the thread execution
                                                                                                                                      history. In contrast, an adaptive policy uses the thread execu-
                        1
  (normalized)

                                                                                                                                      tion history. An example of such policy is P(t) = P(t) + 1 and
     Counts

                                                                                       Threshold
                                                                                                                                      R(t) = R(t) + 1 which increments the penalty values every
                        0
                                                                                                                                      flagged epoch and the reward value every recovering epoch.
                   20
threat index

                                                                                                   Static                             Figure 6, shows the difference between the two policies for a
                        0                                                                          Adaptive
                                                                                                                                      fragment of the benign bzip2 execution, where the counter
               −20                                                                                                                    briefly crossed the threshold for seven epochs. With the static
                      1
(normalized)

                                                                                                                                      policy, the threat index is increased with the same rate, inde-
   Weight

                                                                                                   Static
                                                                                                   Adaptive                           pendent of the previous epochs. Similarly, when the counter
                      0
                                      0               5          10               15               20                                 drops below the threshold, the threat index decreases with
                                                                 Epochs
                                                                                                                                      the same rate until the thread is unflagged. With the adaptive
   Figure 6: The threat index and weight of bzip2 violating the                                                                       policy, the output of P(t) and hence the rate of increase of
   event threshold for a few epochs, with Static and Adaptive Su-                                                                     threat index (Algorithm 1), is incremented in every flagged
   pervisor policies.                                                                                                                 epoch. Similarly, when the counter drops below the threshold,
  equal to ∆tl and is independent of the weight. Penalizing the                                                                       the threat_index is reduced much more quickly. As seen in
  thread, in this case, does not affect the timeslice.                                                                                Figure 6, recovery for this benign thread takes 2 epochs less
                                                                                                                                      compared to the static policy.
     To solve this, LEASH uses a special thread called dummy
  thread. The thread starts when the system boots, runs an                                                                            6. Evaluation and Results
  infinite loop in privileged mode. It is signaled to wake up only                                                                    Every micro-architectural attack is characterized by two as-
  when the Run Queue has exactly one thread and that is flagged                                                                       pects. The first is the micro-architectural component that
  (Lines 2-3 in Algorithm 2). In all other cases, the dummy                                                                           leaks information, for example, shared resources like the
  task continues to sleep. The dummy task thus ensures that the                                                                       L1 data [13, 56] and instruction [6] caches, the Last Level
  weights of a thread always influence its timeslice.                                                                                 Cache (LLC) [45, 75, 74], Translation Lookaside Buffer
                                                                                                                                      (TLB) [26], DRAM [37], and Branch Prediction Units
                                                                                                                                      (BPU) [23, 14]. The second is the attack strategy used to
  Log Buffer. The log buffer, HPCBuf, is a per-thread buffer of                                                                       extract information from the leaking resources. Several strate-
  size N. It logs the event counter values and when full, marks                                                                       gies such as Prime+Probe [5, 6, 45], Evict+Time [36, 33],
  the end of an epoch and triggers the Detector. The value of                                                                         Flush+Reload [75], and time-driven attacks [13, 23] have
  N directly affects the overheads and effectiveness of LEASH.                                                                        been proposed for this purpose. We evaluate LEASH with
  Figure 5 shows how the average overhead per context switch                                                                          seven different micro-architectural attacks chosen to get a
  decreases with an increase in N. This is because a small N,                                                                         wide coverage of leaking components and attack strategies.
  implies more frequent calls to the Detector (Lines 9-10 in                                                                          Table 1 provides the details of attacks evaluated with some
  Algorithm 2), increasing the amortized number of additional                                                                         example events to detect them. Further in this section we
  instructions executed per context-switch.                                                                                           discuss the deployment of LEASH on a system and present the
     A small value of N also allows LEASH to perform the de-                                                                          evaluation results for various attacks. We then present the over-
  tection of malicious threads at a finer granularity. Figure 5                                                                       heads on various workloads including the SPEC-2006 [66],
  shows that as N increases, the bandwidth of the covert channel                                                                      SPEC-2017 [17], SPECViewperf-13 [4] and STREAM [48]
  also increases. This is because the coarse detection granular-                                                                      benchmark suites.
  ity allows the covert channel to execute for sufficient time to                                                                        The evaluation is done on an Intel Core i7-3770 processor,
  transfer information from sender to receiver. As seen in the                                                                        which deploys Intel’s Ivy Bridge micro-architecture. The
  Figure 5, a buffer size between 16 and 64 entries provides a                                                                        processor has four physical cores, each with two hyperthreads.
  good trade-off between the overheads and the effectiveness                                                                          We patch the Linux kernel, version 4.19.2, with LEASH and
  of LEASH.                                                                                                                           boot Ubuntu 16.04 Operating System. We configure LEASH

                                                                                                                                  6
Table 1: Micro-architectural attacks evaluated; their leaking component; and the attack strategy used. For our evaluation we have
used e2 , e11 , e12 , and e39 .

                                    Attack                                  Leaking                                   Strategy                                           Example Event(s)
                                                                           Component                                                                                    to detect the attack
      L1-D cache Covert                                                                                                                    e12 : cycles for which the instruction decode queue is empty, or e26 :
                                                                           L1-D cache                           Prime+Probe
         Channel [56]                                                                                                                      writebacks from L1D to L2 cache lines in modified coherency state.
 L1-D cache attack on AES [56]                                             L1-D cache                            Evict+Time              e12 , or e28 : L2 store read for ownership (RFO) requests from the thread.
   L1-I cache attack on RSA                                                                                                               e39 : the number of prefetcher requests that miss the L2 cache, or e13 :
                                                                           L1-I cache                           Prime+Probe
             [5, 6]                                                                                                                                         the number of instruction cache misses,
   LLC Covert Channel [74]                                                    LLC                               Prime+Probe                        e39 , or e37 : the cache misses for references to the LLC
                                                                                                                                         e2 : the mispredicted branch instructions at retirement, or e15 : direct and
                              Meltdown [1, 43]                              Memory                          Flush+Reload
                                                                                                                                                                   indirect near call instructions
      Rowhammer [3, 37]                                                     DRAM                      Hammer access                                e2 , or e17 : the dirty L2 cache lines evicted by demand.
    TLB Covert channel [26]                                                  TLB                       Evict+Time                                                e11 : the misses in all TLB levels

Figure 7: A heatmap of the Detectability Score of 40 selected events for each of the attacks in Table 1. Identically annotated
events for multiple attacks can be used to detect all those attacks, such as event e2 for Rowhammer and Meltdown.

                                                                                                                                                                                    10
                                                                                                                                                                   Bits retrieved
                                                                                                                                                                                             CFS
                                                                                                                                                                                             Leash
                                                                                                                                                                                    5

                                                                                                                                                                                    0
                                                                                                                                                                                         0   250     500     750 1000 1250 1500 1750
                                                                                                                                                                                                            Time (seconds)
                               (a) L1-data cache attack on AES                                                  (b) L1-instruction cache attack on RSA                   (c) LLC covert channel transmitting 10 bits.
    Number of bit-flips

                                                                                         Characters retrieved

                          6        CFS                                                                          150                                                                          CFS
                                                                                                                                                                   Bits retrieved

                                                                                                                           CFS
                                   Bit flips                                                                               Leash                                                    10       Leash
                          4                                                                                     100
                                   Leash
                          2                                                                                     50                                                                  5
                          0                                                                                      0                                                                  0
                               0   25          50   75   100   125   150    175   200                                  0     10      20      30     40   50   60                         0   50      100    150   200   250    300   350   400
                                                     Iterations                                                                      Time (in seconds)                                                     Time (in seconds)
                               (d) Rowhammer for 200 iterations                                                                    (e) Meltdown                                               (f) TLB covert channel
                                                                                  Figure 8: The effects of LEASH on different attacks.

with an adaptive Supervisor policy with P(t) = P(t) + 1 and                                                                                  programs. In the subsequent sections, we present techniques
R(t) = R(t) + 1 (Section 5.3) for the evaluation.                                                                                            to choose such events and the evaluation results.

6.1. Deploying LEASH                                                                                                                         Selecting HPCs. As a representative for benign programs, we
                                                                                                                                             use all programs of a CPU benchmark suite that we assume
Prior to the deployment of LEASH on a system, the designer                                                                                   cover a variety of benign program behavior. For each micro-
models the threat in terms of a set of attack vectors. Events                                                                                architectural attack, we analyze all supported events and rank
need to be chosen to detect all attack vectors in the set. While                                                                             them based on distinguishability from the benign set using
typical microprocessors have a large number of countable                                                                                     Principal Component Analysis [24]. Figure 7 summarises
events, they have a limited number of configurable Hardware                                                                                  the distinguishability score for the 7 attack vectors in Table 1
Performance Counter (HPC) registers. Our evaluation plat-                                                                                    using SPEC-2006 CPU benchmark suite as the set of benign
form for instance, has 208 events and four configurable HPC                                                                                  programs [66] for the best 40 events of the 208 supported on
registers [34]. The designer thus has to choose at most four                                                                                 our platform. Since there are only 4 HPC registers available,
events that can best distinguish the set of attacks from benign                                                                              we configure at most 4 events that can best distinguish the 7

                                                                                                                                         7
30
                                                                       Base on CFS
Bits transmitted

                   20                                                  Variant1 on CFS
                                                                       Variant2 on CFS
                                                                       Base on Leash
                   10                                                  Variant1 on Leash
                                                                       Variant2 on Leash
                   0
                        0         100    200    300       400    500     600       700
                                                Time (seconds)                                 Figure 10: Performance overheads due to LEASH on various
                                                                                               benchmark suites.
Figure 9: Effects of LEASH on multiple variants of the L1-data
cache covert channel transmitting 30 bits.
                                                                                               STREAM [48]. SPEC-2006 and SPEC-2017 are CPU bench-
attacks. From Figure 7, we notice that events e2 , e11 , e12 , and
                                                                                               mark suites with different integer and floating-point programs
e39 satisfy this requirement. Event e2 covers attacks Meltdown
                                                                                               like Machine Learning algorithms. SPECViewperf-13 is a
and Rowhammer event e11 covers attacks L1-I Cache attack
                                                                                               collection of graphics-oriented benchmarks programs, while
and TLB Covert Channel, event e12 covers attacks L1-D Cover
                                                                                               STREAM is designed to perform memory-intestive tasks. The
Channel and L1-D attack on AES while the event e39 can cover
                                                                                               overheads seen in Figure 10 is due to (1) the few additional
the attacks LLC Covert Channel and Rowhammer. Many other
                                                                                               instructions in each context switch (Figure 5) and (2) benign
such event combinations are also possible.
                                                                                               threads being falsely flagged. In contrast to contemporary
                                                                                               detection counter-measures [8, 10, 16, 18, 50, 78], all falsely
Detecting Attacks. LEASH is able to throttle all the attacks in
                                                                                               flagged programs recover and are not adversely affected. The
Table 1. Figure 8 describes the impact of LEASH on these at-
                                                                                               only case where over 25% overheads occurs is blender_r,
tacks. For example, the guessing entropy [47] for 1 byte of an
                                                                                               a 3D rendering program, which is falsely flagged in 30% of
T-table based AES key [2], using the L1-D Evict+Time attack
                                                                                               its epochs due to the prolonged malicious looking behavior.
increases from 10 to 131 (Figure 8a). The error in guessing
                                                                                               Out of the programs evaluated, over 45% were flagged at-
1-bit of an RSA key used in a square-and-multiply implemen-
                                                                                               least once. However, run-time overheads on average across
tation [49] increases to 50% (Figure 8b). In both cases, the
                                                                                               all benchmarks remain low at 1%. Another reason for the low
attacks are as good as an attacker that randomly guesses the
                                                                                               overheads is that LEASH executes during a context switch in
key. The covert channels at LLC [74] and TLB [26] see a
                                                                                               the kernel. Unlike [10, 16, 18, 50, 78], it does not require a
drastic fall in the number of bits communicated after getting
                                                                                               background task that needs scheduling and is independent of
throttled by LEASH (Figure 8c and 8f). The Rowhammer [3]
                                                                                               the system load.
attack, which induces a bit-flip in a DRAM2 row every 29
iterations (on average) is unable to cause a single bit-flip even
after a day of execution (Figure 8d). Further, the Meltdown [1]                                7. Caveats and Limitations
attack that dumps contents of the kernel memory, is throttled
by LEASH to become ineffective (Figure 8e).
                                                                                               • Time-Driven Attacks. Unlike the attacks discussed in Sec-
                                                                                               tion 6, in a time-driven attack, the attacker uses the victim’s
LEASH    with Attack Variants. While configuring LEASH,
                                                                                               execution time [14, 62, 63].LEASH can not detect such attacks
events are chosen by profiling a specific realization of an
                                                                                               because of the absence of a spy thread to flag.
attack. However, with the same configuration LEASH can de-
                                                                                               • High-resolution Attacks. If a high-resolution attack like [9]
tect different variants of the attacks as well. For example, the
                                                                                               works in less time than an epoch, LEASH would not be able
covert channel described in Section 2.1, can have multiple
                                                                                               to stop it unless the epoch is shortened. However, this can
variants by tweaking the communication protocol. For ex-
                                                                                               increase the overheads (Figure 5).
ample, we design a Variant 1, which transmits redundant
                                                                                               • Undetectable Attacks. While the attacks evaluated were
bits for reliability while a Variant 2, which uses multiple
                                                                                               identifiable by HPCs, there may be attacks that evade them.
sets to transmit two bits at a time. LEASH can detect these
                                                                                               LEASH would not be able to thwart such attacks.
variants of the covert channel as seen in Figure 9. Without
                                                                                               • Complex adversarial attacks on LEASH. While the Supervi-
LEASH, the bandwidth is 0.43 bits/second for the base variant,
                                                                                               sor component in LEASH can be designed to detect multiple
1.76 bits/second for Variant 1 and to 0.087 bits/second for
                                                                                               adversarial attacks, there may be complex strategies such as us-
Variant 2. With LEASH, all three covert channels are stymied
                                                                                               ing distributed colluding threads [29]. The Supervisor would
and almost no bit gets transmitted correctly.
                                                                                               need to be enhanced to handle such distributed attacks.
6.2. Performance Overheads                                                                     • Detecting a large number of attacks. As the attack vectors
                                                                                               to detect increase, the number of events to be counted may
We evaluated LEASH with several benchmark suites including                                     exceed the HPCs available in the hardware, requiring multi-
SPEC-2006 [66], SPEC-2017 [17], SPECViewperf-13 [4] and                                        plexing.This may increase context switch times and reduce
                   2 Transcend   DDR3-1333 645927-0350 DRAM chip.                              the precision of the results.

                                                                                           8
8. Conclusions                                                                   [14] Sarani Bhattacharya and Debdeep Mukhopadhyay. Who watches the
                                                                                      watchmen?: Utilizing performance monitors for compromising keys of
Detection based countermeasures for micro-architectural at-                           RSA on intel platforms. In Cryptographic Hardware and Embedded
                                                                                      Systems - CHES 2015 - 17th International Workshop, Saint-Malo,
tacks are promising as they can adapt easily to a wide range                          France, September 13-16, 2015, Proceedings, pages 248–266, 2015.
of attacks. However, a major shortcoming is the unaccept-                        [15] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kosti-
                                                                                      ainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. Software grand
ably high false positive rate. Contemporary research attempts                         exposure: SGX cache attacks are practical. In Proceedings of the
to address this limitation on by deploying sophisticated anal-                        11th USENIX Conference on Offensive Technologies, WOOT’17, pages
                                                                                      11–11, Berkeley, CA, USA, 2017. USENIX Association.
ysis techniques. While this has found to only marginally
                                                                                 [16] Samira Briongos, Gorka Irazoqui, Pedro Malagón, and Thomas Eisen-
improve false positives, the overheads and latencies are af-                          barth. Cacheshield: Detecting cache attacks through self-observation.
fected considerably due to complex techniques used. Unlike                            In Proceedings of the Eighth ACM Conference on Data and Applica-
                                                                                      tion Security and Privacy, CODASPY 2018, Tempe, AZ, USA, March
these approaches, the reactive design of LEASH facilitates                            19-21, 2018, pages 224–235, 2018.
lightweight analysis which allows falsely flagged threads to                     [17] James Bucek, Klaus-Dieter Lange, and Jóakim v. Kistowski. Spec
recover quickly with minimum overheads of less than 1%                                cpu2017: Next-generation compute benchmark. In Companion of the
                                                                                      2018 ACM/SPEC International Conference on Performance Engineer-
while still accurately detecting attacks. To the best of our                          ing, ICPE ’18, page 41–42, New York, NY, USA, 2018. Association
knowledge, LEASH is the first reactive countermeasure for                             for Computing Machinery.
                                                                                 [18] Marco Chiappetta, Erkay Savas, and Cemal Yilmaz. Real time detec-
micro-architectural attacks in the Operating System scheduler                         tion of cache-based side-channel attacks using hardware performance
and is therefore, invariant to system load. Additionally, it                          counters. Appl. Soft Comput., 49:1162–1174, 2016.
just adds around 15 instructions per context switch thus has                     [19] David Cock. Practical probability: Applying pgcl to lattice scheduling.
                                                                                      In Interactive Theorem Proving - 4th International Conference, ITP
a negligible impact on context switch latencies. It thus opens                        2013, Rennes, France, July 22-26, 2013. Proceedings, pages 311–327,
avenues to security aware OS scheduler designs.                                       2013.
                                                                                 [20] Sanjeev Das, Jan Werner, Manos Antonakakis, Michalis Polychronakis,
                                                                                      and Fabian Monrose. Sok: The challenges, pitfalls, and perils of using
References                                                                            hardware performance counters for security. In 2019 IEEE Symposium
                                                                                      on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23,
 [1] Meltdown Implementation - Institute of Applied Information Process-              2019, pages 20–38, 2019.
     ing and Communications (IAIK). Accessed: 2020-08-13.
                                                                                 [21] John Demme, Matthew Maycock, Jared Schmitz, Adrian Tang, Adam
 [2] OpenSSL T-table implementation of AES. Accessed: 2020-08-13.                     Waksman, Simha Sethumadhavan, and Salvatore J. Stolfo. On the
 [3] Rowhammer test implementation - Google. Accessed: 2020-08-13.                    feasibility of online malware detection with performance counters. In
 [4] SPECViewperf 13 Linux Edition Benchmark.                                         The 40th Annual International Symposium on Computer Architecture,
                                                                                      ISCA’13, Tel-Aviv, Israel, June 23-27, 2013, pages 559–570, 2013.
 [5] Onur Aciiçmez. Yet another microarchitectural attack: : exploiting
     i-cache. In Proceedings of the 2007 ACM workshop on Computer                [22] Leonid Domnitser, Aamer Jaleel, Jason Loew, Nael B. Abu-Ghazaleh,
     Security Architecture, CSAW 2007, Fairfax, VA, USA, November 2,                  and Dmitry Ponomarev. Non-monopolizable caches: Low-complexity
     2007, pages 11–18, 2007.                                                         mitigation of cache side channel attacks. TACO, 8(4):35:1–35:21,
                                                                                      2012.
 [6] Onur Aciiçmez, Billy Bob Brumley, and Philipp Grabher. New re-              [23] Dmitry Evtyushkin, Ryan Riley, Nael B. Abu-Ghazaleh, and Dmitry
     sults on instruction cache attacks. In Cryptographic Hardware and                Ponomarev. Branchscope: A new side-channel attack on directional
     Embedded Systems, CHES 2010, 12th International Workshop, Santa                  branch predictor. In Proceedings of the Twenty-Third International
     Barbara, CA, USA, August 17-20, 2010. Proceedings, pages 110–124,                Conference on Architectural Support for Programming Languages and
     2010.                                                                            Operating Systems, ASPLOS 2018, Williamsburg, VA, USA, March
 [7] Onur Aciiçmez, Çetin Kaya Koç, and Jean-Pierre Seifert. Predicting               24-28, 2018, pages 693–707, 2018.
     secret keys via branch prediction. In Topics in Cryptology - CT-RSA
     2007, The Cryptographers’ Track at the RSA Conference 2007, San             [24] Karl Pearson F.R.S. Liii. on lines and planes of closest fit to systems
     Francisco, CA, USA, February 5-9, 2007, Proceedings, pages 225–242,              of points in space. The London, Edinburgh, and Dublin Philosophical
     2007.                                                                            Magazine and Journal of Science, 2(11):559–572, 1901.
 [8] Manaar Alam, Sarani Bhattacharya, Debdeep Mukhopadhyay, and                 [25] Daniel Genkin, Lev Pachmanov, Eran Tromer, and Yuval Yarom. Drive-
     Sourangshu Bhattacharya. Performance counters to rescue: A machine               by key-extraction cache attacks from portable code. In Applied Cryp-
     learning based safeguard against micro-architectural side-channel-               tography and Network Security - 16th International Conference, ACNS
     attacks. IACR Cryptology ePrint Archive, 2017:564, 2017.                         2018, Leuven, Belgium, July 2-4, 2018, Proceedings, pages 83–102,
 [9] Gorka Irazoqui Apecechea, Mehmet Sinan Inci, Thomas Eisenbarth,                  2018.
     and Berk Sunar. Wait a minute! A fast, cross-vm attack on AES.              [26] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Trans-
     In Angelos Stavrou, Herbert Bos, and Georgios Portokalidis, editors,             lation leak-aside buffer: Defeating cache side-channel protections with
     Research in Attacks, Intrusions and Defenses - 17th International                TLB attacks. In 27th USENIX Security Symposium, USENIX Secu-
     Symposium, RAID 2014, Gothenburg, Sweden, September 17-19, 2014.                 rity 2018, Baltimore, MD, USA, August 15-17, 2018., pages 955–972,
     Proceedings, volume 8688 of Lecture Notes in Computer Science,                   2018.
     pages 299–319. Springer, 2014.                                              [27] Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, and Cristiano
                                                                                      Giuffrida. ASLR on the line: Practical cache attacks on the MMU.
[10] Zelalem Birhanu Aweke, Salessawi Ferede Yitbarek, Rui Qiao, Reetu-               In 24th Annual Network and Distributed System Security Symposium,
     parna Das, Matthew Hicks, Yossi Oren, and Todd Austin. Anvil:                    NDSS 2017, San Diego, California, USA, February 26 - March 1, 2017,
     Software-based protection against next-generation rowhammer attacks.             2017.
     In Proceedings of the Twenty-First International Conference on Archi-       [28] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache template
     tectural Support for Programming Languages and Operating Systems,                attacks: Automating attacks on inclusive last-level caches. In Proceed-
     ASPLOS ’16, pages 743–755, New York, NY, USA, 2016. ACM.                         ings of the 24th USENIX Conference on Security Symposium, SEC’15,
[11] Antonio Barresi, Kaveh Razavi, Mathias Payer, and Thomas R. Gross.               pages 897–912, Berkeley, CA, USA, 2015. USENIX Association.
     CAIN: silently breaking ASLR in the cloud. In 9th USENIX Workshop           [29] David Gullasch, Endre Bangerter, and Stephan Krenn. Cache games –
     on Offensive Technologies, WOOT ’15, Washington, DC, USA, August                 bringing access-based cache attacks on aes to practice. In Proceedings
     10-11, 2015., 2015.                                                              of the 2011 IEEE Symposium on Security and Privacy, SP ’11, pages
[12] Kanad Basu, Prashanth Krishnamurthy, Farshad Khorrami, and                       490–505, Washington, DC, USA, 2011. IEEE Computer Society.
     Ramesh Karri. A theoretical study of hardware performance counters-         [30] Sanghyun Hong, Michael Davinroy, Yigitcan Kaya, Stuart Nevans
     based malware detection. IEEE Trans. Information Forensics and                   Locke, Ian Rackow, Kevin Kulda, Dana Dachman-Soled, and Tudor
     Security, 15:512–525, 2020.                                                      Dumitras. Security analysis of deep neural networks operating in the
[13] Daniel J. Bernstein. Cache-timing Attacks on AES, 2005.                          presence of cache side-channel attacks. CoRR, abs/1810.03487, 2018.

                                                                             9
[31] Wei-Ming Hu. Reducing timing channels with fuzzy time. In Proceed-               [51] Maria Mushtaq, Ayaz Akram, Muhammad Khurram Bhatti, Rao
     ings of the 1991 IEEE Symposium on Security and Privacy, Oakland,                     Naveed Bin Rais, Vianney Lapotre, and Guy Gogniat. Run-time
     California, USA, May 20-22, 1991, pages 8–20, 1991.                                   detection of prime + probe side-channel attack on AES encryption
[32] Wei-Ming Hu. Lattice Scheduling and Covert Channels. In Proceed-                      algorithm. In 2018 Global Information Infrastructure and Networking
     ings of the 1992 IEEE Symposium on Security and Privacy, SP ’92,                      Symposium, GIIS 2018, Thessaloniki, Greece, October 23-25, 2018,
     pages 52–61, Washington, DC, USA, 1992. IEEE Computer Society.                        pages 1–5. IEEE, 2018.
[33] Ralf Hund, Carsten Willems, and Thorsten Holz. Practical timing                  [52] Maria Mushtaq, Jeremy Bricq, Muhammad Khurram Bhatti, Ayaz
     side channel attacks against kernel space aslr. In Proceedings of the                 Akram, Vianney Lapotre, Guy Gogniat, and Pascal Benoit. WHISPER:
     2013 IEEE Symposium on Security and Privacy, SP ’13, page 191–205,                    A tool for run-time detection of side-channel attacks. IEEE Access,
     USA, 2013. IEEE Computer Society.                                                     8:83871–83900, 2020.
[34] Intel Corporation. Intel 64 and IA-32 Architectures Software Devel-              [53] Junaid Nomani and Jakub Szefer. Predicting program phases and
     oper’s Manual - Volume 3B, August 2007.                                               defending against side-channel attacks using hardware performance
[35] Intel Corporation. Intel CAT: Improving Real-Time Performance by                      counters. In Proceedings of the Fourth Workshop on Hardware and
     Utilizing Cache Allocation Technology., 2015.                                         Architectural Support for Security and Privacy, HASP@ISCA 2015,
[36] Himanshi Jain, D. Anthony Balaraju, and Chester Rebeiro. Spy cartel:                  Portland, OR, USA, June 14, 2015, pages 9:1–9:4, 2015.
     Parallelizing evict+time-based cache attacks on last-level caches. J.            [54] Jason Oberg, Sarah Meiklejohn, Timothy Sherwood, and Ryan Kastner.
     Hardware and Systems Security, 3(2):147–163, 2019.                                    A practical testing framework for isolating hardware timing channels.
[37] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee,                         In Design, Automation and Test in Europe, DATE 13, Grenoble, France,
     Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. Flip-                      March 18-22, 2013, pages 1281–1284, 2013.
     ping bits in memory without accessing them: An experimental study                [55] Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, and An-
     of dram disturbance errors. In Proceeding of the 41st Annual Inter-                   gelos D. Keromytis. The spy in the sandbox: Practical cache attacks
     national Symposium on Computer Architecuture, ISCA ’14, pages                         in javascript and their implications. In Proceedings of the 22nd ACM
     361–372, Piscataway, NJ, USA, 2014. IEEE Press.                                       SIGSAC Conference on Computer and Communications Security, Den-
[38] Paul Kocher, Jann Horn, Anders Fogh, , Daniel Genkin, Daniel Gruss,                   ver, CO, USA, October 12-16, 2015, pages 1406–1418, 2015.
     Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas                   [56] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and
     Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Ex-                      countermeasures: The case of AES. In Topics in Cryptology - CT-RSA
     ploiting speculative execution. In 40th IEEE Symposium on Security                    2006, The Cryptographers’ Track at the RSA Conference 2006, San
     and Privacy (S&P’19), 2019.                                                           Jose, CA, USA, February 13-17, 2006, Proceedings, pages 1–20, 2006.
[39] Jingfei Kong, Onur Aciiçmez, Jean-Pierre Seifert, and Huiyang Zhou.              [57] Dan Page. Partitioned cache architecture as a side-channel defence
     Hardware-software integrated approaches to defend against software                    mechanism. IACR Cryptology ePrint Archive, 2005:280, 2005.
     cache-based side channel attacks. In 15th International Conference               [58] Colin Percival. Cache Missing for Fun and Profit. In Proc. of BSDCan,
     on High-Performance Computer Architecture (HPCA-15 2009), 14-18                       2005.
     February 2009, Raleigh, North Carolina, USA, pages 393–404, 2009.
                                                                                      [59] Peter Pessl, Daniel Gruss, Clémentine Maurice, Michael Schwarz, and
[40] Prashanth Krishnamurthy, Ramesh Karri, and Farshad Khorrami.                          Stefan Mangard. DRAMA: exploiting DRAM addressing for cross-cpu
     Anomaly detection in real-time multi-threaded processes using hard-                   attacks. In 25th USENIX Security Symposium, USENIX Security 16,
     ware performance counters. IEEE Trans. Information Forensics and                      Austin, TX, USA, August 10-12, 2016., pages 565–581, 2016.
     Security, 15:666–680, 2020.
                                                                                      [60] Moinuddin K. Qureshi. CEASER: mitigating conflict-based cache
[41] The Linux Kernel Archives, 2019.                                                      attacks via encrypted-address and remapping. In 51st Annual
[42] Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice,                     IEEE/ACM International Symposium on Microarchitecture, MICRO
     and Stefan Mangard. Armageddon: Cache attacks on mobile devices.                      2018, Fukuoka, Japan, October 20-24, 2018, pages 775–787, 2018.
     In Proceedings of the 25th USENIX Conference on Security Sympo-
     sium, SEC’16, pages 549–564, Berkeley, CA, USA, 2016. USENIX                     [61] Moinuddin K. Qureshi. New attacks and defense for encrypted-address
     Association.                                                                          cache. In Proceedings of the 46th International Symposium on Com-
[43] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher,                          puter Architecture, ISCA 2019, Phoenix, AZ, USA, June 22-26, 2019,
     Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher,                     pages 360–371, 2019.
     Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading                  [62] Chester Rebeiro and Debdeep Mukhopadhyay. Boosting profiled cache
     kernel memory from user space. In 27th USENIX Security Symposium                      timing attacks with A priori analysis. IEEE Trans. Information Foren-
     (USENIX Security 18), 2018.                                                           sics and Security, 7(6):1900–1905, 2012.
[44] Fangfei Liu, Qian Ge, Yuval Yarom, Frank McKeen, Carlos V. Rozas,                [63] Chester Rebeiro and Debdeep Mukhopadhyay. Micro-architectural
     Gernot Heiser, and Ruby B. Lee. Catalyst: Defeating last-level cache                  analysis of time-driven cache attacks: Quest for the ideal implementa-
     side channel attacks in cloud computing. In 2016 IEEE International                   tion. IEEE Trans. Computers, 64(3):778–790, 2015.
     Symposium on High Performance Computer Architecture, HPCA 2016,                  [64] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage.
     Barcelona, Spain, March 12-16, 2016, pages 406–418, 2016.                             Hey, you, get off of my cloud: exploring information leakage in third-
[45] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee.                    party compute clouds. In Proceedings of the 2009 ACM Conference on
     Last-level cache side-channel attacks are practical. In 2015 IEEE                     Computer and Communications Security, CCS 2009, Chicago, Illinois,
     Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May                    USA, November 9-13, 2009, pages 199–212, 2009.
     17-21, 2015, pages 605–622, 2015.                                                [65] Anatoly Shusterman, Lachlan Kang, Yarden Haskal, Yosef Meltser,
[46] Robert Martin, John Demme, and Simha Sethumadhavan. Timewarp:                         Prateek Mittal, Yossi Oren, and Yuval Yarom. Robust website fin-
     Rethinking timekeeping and performance monitoring mechanisms to                       gerprinting through the cache occupancy channel. In 28th USENIX
     mitigate side-channel attacks. In 39th International Symposium on                     Security Symposium, USENIX Security 2019, Santa Clara, CA, USA,
     Computer Architecture (ISCA 2012), June 9-13, 2012, Portland, OR,                     August 14-16, 2019., pages 639–656, 2019.
     USA, pages 118–129, 2012.                                                        [66] Cloyce D. Spradling. Spec cpu2006 benchmark tools. SIGARCH
[47] James L Massey. Guessing and entropy. In Information Theory, 1994.                    Comput. Archit. News, 35(1):130–134, March 2007.
     Proceedings., 1994 IEEE International Symposium on, page 204. IEEE,              [67] Adrian Tang, Simha Sethumadhavan, and Salvatore J. Stolfo. Unsu-
     1994.                                                                                 pervised anomaly-based malware detection using hardware features.
[48] John D. McCalpin. Stream: Sustainable memory bandwidth in high
     performance computers. Technical report, University of Virginia, Char-                In Research in Attacks, Intrusions and Defenses - 17th International
     lottesville, Virginia, 1991-2007. A continually updated technical report.             Symposium, RAID 2014, Gothenburg, Sweden, September 17-19, 2014.
     http://www.cs.virginia.edu/stream/.                                                   Proceedings, pages 109–129, 2014.
[49] Alfred J. Menezes, Scott A. Vanstone, and Paul C. Van Oorschot.                  [68] Mohit Tiwari, Xun Li, Hassan M. G. Wassel, Frederic T. Chong, and
     Handbook of Applied Cryptography. CRC Press, Inc., USA, 1st edition,                  Timothy Sherwood. Execution leases: a hardware-supported mecha-
     1996.                                                                                 nism for enforcing strong non-interference. In 42st Annual IEEE/ACM
[50] Maria Mushtaq, Ayaz Akram, Muhammad Khurram Bhatti, Maham                             International Symposium on Microarchitecture (MICRO-42 2009), De-
     Chaudhry, Vianney Lapotre, and Guy Gogniat. Nights-watch: a cache-                    cember 12-16, 2009, New York, New York, USA, pages 493–504, 2009.
     based side-channel intrusion detector using hardware performance                 [69] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael M. Swift.
     counters. In Proceedings of the 7th International Workshop on Hard-                   Scheduler-based defenses against cross-vm side-channels. In Proceed-
     ware and Architectural Support for Security and Privacy, HASP@ISCA                    ings of the 23rd USENIX Security Symposium, San Diego, CA, USA,
     2018, Los Angeles, CA, USA, June 02-02, 2018, pages 1:1–1:8, 2018.                    August 20-22, 2014, pages 687–702, 2014.

                                                                                 10
[70] Zhenghong Wang and Ruby B. Lee. Covert and side channels due to             [75] Yuval Yarom and Katrina Falkner. FLUSH+RELOAD: A high resolu-
     processor architecture. In 22nd Annual Computer Security Applica-                tion, low noise, L3 cache side-channel attack. In Proceedings of the
     tions Conference (ACSAC 2006), 11-15 December 2006, Miami Beach,                 23rd USENIX Security Symposium, San Diego, CA, USA, August 20-22,
     Florida, USA, pages 473–482, 2006.                                               2014., pages 719–732, 2014.
[71] Zhenghong Wang and Ruby B. Lee. A novel cache architecture with
     enhanced performance and security. In 41st Annual IEEE/ACM Interna-         [76] Bennet Yee, David Sehr, Gregory Dardyk, J. Bradley Chen, Robert
     tional Symposium on Microarchitecture (MICRO-41 2008), November                  Muth, Tavis Ormandy, Shiki Okasaka, Neha Narula, and Nicholas
     8-12, 2008, Lake Como, Italy, pages 83–93, 2008.                                 Fullagar. Native client: A sandbox for portable, untrusted x86 native
                                                                                      code. In 30th IEEE Symposium on Security and Privacy (S&P 2009),
[72] Lai Leng Woo, Mark Zwolinski, and Basel Halak. Early detection                   17-20 May 2009, Oakland, California, USA, pages 79–93, 2009.
     of system-level anomalous behaviour using hardware performance
     counters. In 2018 Design, Automation & Test in Europe Conference &          [77] Ning Zhang, Kun Sun, Deborah Shands, Wenjing Lou, and Y. Thomas
     Exhibition, DATE 2018, Dresden, Germany, March 19-23, 2018, pages                Hou. Trusense: Information leakage from trustzone. In 2018 IEEE
     485–490, 2018.                                                                   Conference on Computer Communications, INFOCOM 2018, Hon-
[73] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christo-             olulu, HI, USA, April 16-19, 2018, pages 1097–1105, 2018.
     pher W. Fletcher, and Josep Torrellas. Invisispec: Making specu-
     lative execution invisible in the cache hierarchy. In 51st Annual           [78] Tianwei Zhang, Yinqian Zhang, and Ruby B. Lee. Cloudradar: A
     IEEE/ACM International Symposium on Microarchitecture, MICRO                     real-time side-channel attack detection system in clouds. In Research
     2018, Fukuoka, Japan, October 20-24, 2018, pages 428–441, 2018.                  in Attacks, Intrusions, and Defenses - 19th International Symposium,
[74] Yuval Yarom. Mastik: A micro-architectural side-channel toolkit,                 RAID 2016, Paris, France, September 19-21, 2016, Proceedings, pages
     2016.                                                                            118–140, 2016.

                                                                            11
You can also read