Pointless Tainting? Evaluating the Practicality of Pointer Tainting


Asia Slowinska, Vrije Universiteit Amsterdam, asia@few.vu.nl
Herbert Bos, Vrije Universiteit Amsterdam and NICTA ∗, herbertb@cs.vu.nl

Abstract
This paper evaluates pointer tainting, an incarnation of Dynamic Information Flow Tracking (DIFT), which has recently become an important technique in system security. Pointer tainting has been used for two main purposes: detection of privacy-breaching malware (e.g., trojan keyloggers obtaining the characters typed by a user), and detection of memory corruption attacks against non-control data (e.g., a buffer overflow that modifies a user’s privilege level). In both of these cases the attacker does not modify control data such as stored branch targets, so the control flow of the target program does not change. Phrased differently, in terms of instructions executed, the program behaves ‘normally’. As a result, these attacks are exceedingly difficult to detect. Pointer tainting is considered one of the only methods for detecting them in unmodified binaries. Unfortunately, almost all of the incarnations of pointer tainting are flawed. In particular, we demonstrate that the application of pointer tainting to the detection of keyloggers and other privacy-breaching malware is problematic. We also discuss whether pointer tainting is able to reliably detect memory corruption attacks against non-control data. We found that pointer tainting itself generates the conditions for false positives. We analyse the problems in detail and investigate various ways to improve the technique. Most have serious drawbacks in that they are either impractical (and still incur many false positives), and/or cripple the technique’s ability to detect attacks. In conclusion, we argue that depending on architecture and operating system, pointer tainting may have some value in detecting memory corruption attacks (albeit with false negatives and not on the popular x86 architecture), but it is fundamentally not suitable for automated detection of privacy-breaching malware such as keyloggers.

Categories and Subject Descriptors D.4.6 [Security and Protection]: Invasive software
General Terms Security, Experimentation
Keywords dynamic taint analysis, pointer tainting

∗ NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
EuroSys ’09, 1–3, April 2009, Nuremberg, Germany.
Copyright © 2009 ACM 978-1-60558-482-9/09/04. . . $5.00

1. Introduction
Exploits and trojans allow attackers to compromise machines in various ways. One way to exploit a machine is to use techniques like buffer overflows or format string attacks to divert the flow of execution to code injected by the attacker. Alternatively, the same exploit techniques may attack non-control data [Chen 2005b]; for instance, a buffer overflow that modifies a value in memory that represents a user’s identity, a user’s privilege level, or a server configuration string. Non-control data attacks are even more difficult to detect than attacks that divert the control flow. After all, the program does not execute any foreign code, does not jump to unusual places, and does not exhibit strange system call patterns or any other tell-tale signs that indicate that something might be wrong.

While protection for some of these attacks may be provided if we write software in type-safe languages [Jim 2002], compile with specific compiler extensions [Castro 2006, Akritidis 2008], or verify with formal methods [Elphinstone 2007], much of the system software in current use is written in C or C++, often the source of the software is not available, and recompilation is not possible.

Worse, even with the most sophisticated languages, it is difficult to stop users from installing trojans. Often trojans masquerade as useful programs, like pirated copies of popular applications, games, or ‘security’ tools, with keylogging, privacy theft and other malicious activities as hidden features. No exploit is needed to compromise the system at all. Once inside, the malware may be used to join a spam botnet, damage the system, attack other sites, or stealthily spy on a user. Again, stealthy spies are harder to detect than ‘loud’
programs that damage systems, or engage in significant network activity. The trojan spyware, installed by the user, may use legitimate APIs to obtain and store the characters that are typed in by the users (or data in files, buffers, or on the network). From a system’s perspective, the malware is not doing anything ‘wrong’.

In light of the above, we distinguish between attacks that divert the control flow of a program and those that do not. Control diversion typically means that a pointer in a process is manipulated by an attacker so that when it is dereferenced, the program starts executing instructions different from the ones it would normally execute at that point. Non-control-diverting attacks, on the other hand, include memory corruption attacks against non-control data and privacy-breaching malware like keyloggers and sniffers. Memory corruption attacks against non-control data manipulate data values that are not directly related to the flow of control; for instance, a value that represents a user’s privilege level, or the length in bytes of a reply buffer. The attack itself does not lead to unusual code execution. Rather, it leads to elevated privileges, or unusual replies. The same is true for privacy-breaching malware like sniffers and trojan keyloggers.

Pointer tainting as advertised is attractive. It is precisely these difficult-to-detect, stealthy non-control-diverting attacks that are the focus of pointer tainting [Chen 2005a]. At the same time, the technique also works against control-diverting attacks. We will discuss pointer tainting in more detail in later sections. For now, it suffices to define it as a form of dynamic information flow tracking (DIFT) [Suh 2004] which marks the origin of data by way of a taint bit in a shadow memory that is inaccessible to software. By tracking the propagation of tainted data through the system (e.g., when tainted data is copied, but also when tainted pointers are dereferenced), we see whether any value derived from data from a tainted origin ends up in places where it should never be stored. For instance, we shall see that some projects use it to track the propagation of keystroke data to ensure that untrusted and unauthorised programs do not receive it [Yin 2007]. By implementing pointer tainting in hardware [Dalton 2007], the overhead is minimal.

Pointer tainting is very popular because (a) it can be applied to unmodified software without recompilation, (b) according to its advocates, it incurs few (if any) false positives, and (c) it is assumed to be one of the only (if not the only) reliable techniques capable of detecting both control-diverting and non-control-diverting attacks without requiring recompilation. Pointer tainting has become a unique and extremely valuable detection method especially due to its presumed ability to detect non-control-diverting attacks. As mentioned earlier, non-control-diverting attacks are more worrying than attacks that divert the control flow, because they are harder to detect. Common protection mechanisms like address space randomisation and StackGuard [Bhatkar 2005, Cowan 1998] present in several modern operating systems are ineffective against this type of attack. The same is true for almost all forms of system call monitoring [Provos 2003, Giffin 2004]. As a result, some trojan keyloggers have been active for years (often undetected). In one particularly worrying case, a keylogger harvested over 500,000 login details for online banking and other accounts [Raywood 2008]. At the same time, the consequences of a successful non-control-diverting attack may be as severe as those of a control-diverting attack. For instance, passwords obtained by a keylogger often give attackers full control of the machines. The same is true for buffer overflows that modify a user’s privilege level.

However, pointer tainting does not work as advertised. Inspired by a string of publications about pointer tainting in top venues [Chen 2005a;b, Yin 2007, Egele 2007, Dalton 2007, Yin 2008, Venkataramani 2008, Dalton 2008], several of which claim zero false positives, we tried to build a keylogger detector by means of pointer tainting. However, what we found is that for privacy-breaching malware detection, the method is flawed. It incurs both false positives and false negatives. The false positives appear particularly hard to avoid. There is no easy fix. Further, we found that almost all existing applications of pointer tainting to detection of memory corruption attacks are also problematic, and none of them are suitable for the popular x86 architecture and Windows operating system.

In this paper, we analyse the fundamental limitations of the method when applied to detection of privacy-breaching malware, as well as the practical limitations in current applications to memory corruption detection. Often, we will see that the reason is that ‘fixing the method is breaking it’: simple solutions to overcome the symptoms render the technique vulnerable to false positives or false negatives.

Others have discussed minor issues with projects that use pointer tainting [Dalton 2006], and most of these have been addressed in later work [Dalton 2008]. To the best of our knowledge, nobody has investigated the technique in detail, nobody has shown that it does not work against keyloggers, and we are the first to report the complicated problems with the technique that are hard to overcome. We are also the first to evaluate the implications experimentally.

In summary, the contributions of this paper are:

1. an in-depth analysis of the problems of pointer tainting on real systems which shows that it does not work against malware spying on users’ behaviour, and is problematic in other forms also;
2. an analysis and evaluation of all known fixes to the problems that shows that they all have serious shortcomings.

We emphasise that this paper is not meant as an attack on existing publications. In our opinion, previous papers underestimated the method’s problems. We hope that our work will help others avoid making the mistakes we made
when we worked on our ill-fated keylogger detector, and perhaps develop improved detection techniques.

2. Threat Model
Before we can evaluate pointer tainting, we revisit in more detail the nature of the attacks that we introduced informally in the previous section. Recall that we said that we would distinguish between two types of attack: (1) control-diverting, and (2) non-control-diverting. Moreover, within the latter category we will distinguish between (2a) memory corruption attacks against non-control data, and (2b) privacy-breaching malware, such as keyloggers and sniffers. We now define what they are.

Attackers often compromise computer systems by exploiting security vulnerabilities resulting from low-level memory errors such as buffer overflows, dangling pointers, and double frees. Control-diverting attacks exploit buffer overflows or other vulnerabilities to manipulate a value in memory that is subsequently loaded into the processor’s program counter (e.g., return addresses or function pointers) with the aim of executing either code that was injected by the attackers, or a particular library function. An example of an attack against control data is shown in Figure 1(a): a stylised server reads a request into a struct’s buffer field and subsequently calls the corresponding handler. By overflowing reqbuf, an attacker may change the handler’s function pointer and thus cause the program to call a function at a different address.

Non-control-diverting memory corruption attacks exploit similar vulnerabilities to modify security-critical data in ways that do not result in a different control flow. For instance, a buffer overflow on a server may overwrite the pointer to (part of) the reply message. As a result, an attacker controls the memory area used for the reply, possibly causing the server to leak confidential information. This example is shown in stylised form in Figure 1(b), which shows a trivial greeting server. To keep it simple, we use an overflow on the stack and assume that the program is compiled without stack protection. The server stores a pointer to its own name (which is defined as a global string) in the variable name and then reads the name of the client from a socket. These two names are combined in a greeting message which is echoed to the client. If a malicious client overflows the cl_name buffer, it may overwrite the server’s name pointer, which means that the reply string is composed of the client’s string and a memory region specified by the attacker. The result is that information leaks out of the system.

As the instructions that are executed are exactly the same as for a non-malicious request, this is an example of a non-control-diverting attack. For brevity, we will refer to them as non-control data attacks in the remainder of this paper.

    (a) control data attack

    struct req {
      char reqbuf[64];
      void (*handler)(char *);
    };

    void do_req(int fd, struct req *r)
    {
      // now the overflow:
      read(fd, r->reqbuf, 128);
      r->handler(r->reqbuf);
    }

    (b) non-control data attack

    void serve(int fd)
    {
      char *name = globMyHost;
      char cl_name[64];
      char svr_reply[1024];

      // now the overflow:
      read(fd, cl_name, 128);
      sprintf(svr_reply, "hello %s, I am %s", cl_name, name);
      svr_send(fd, svr_reply, 1024);
    }

Figure 1. Trivial overflow examples

The other manifestation of the non-control-diverting class of attacks that we will look at concerns privacy-breaching malware like keyloggers, spyware, and network sniffers.

In many ways, the nature of privacy-breaching malware is completely different from the two types of attack discussed above, as it is not about intrusion itself. The malware may be installed by way of exploits, or as part of trojans downloaded by the users, or any other means. Once installed, it often uses legitimate means (e.g., existing APIs) to achieve illegitimate goals (theft of security-sensitive information). As a result, techniques that detect intrusions are powerless. For instance, a keylogger in Windows often uses well-known OS APIs like GetAsyncKeyState(), or GetForegroundWindow(), to poll the state of the keyboard or to subscribe to keyboard events. In practice, a lot of spyware is implemented as a browser helper object (BHO) library that extends Internet Explorer. Since it runs in the same address space as the browser, it has full control over the browser’s functionality. Zango [ProcessLibrary.com, Egele 2007], for instance, copies visited URLs to a shared memory section which is later read by a spyware helper process.

Again, the execution of the program that is spied upon does not change, and so we also classify these attacks as non-control-diverting. For convenience, this paper often uses keyloggers as an example, but we stress that the analysis holds for all types of privacy-breaching malware. We do not care whether the malware is installed by the user, or by means of a prior exploit; nor do we care about the method that malware employs to access sensitive data. Our main interest is whether we are able to detect them as malware, and say that they access data that was not intended for them.

Since pointer tainting was originally designed to deal with non-control-diverting attacks (non-control data exploits and privacy-breaching malware), we will concentrate on them rather than control-diverting attacks. We have already argued that these are the ‘hard cases’ anyway.

3. Pointer tainting
Pointer tainting is a variant of dynamic taint analysis, a technique for detecting various attacks. We show that it was originally proposed because taint analysis in its basic form cannot handle non-control-diverting attacks.
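To see concretely why a check on control transfers alone misses such attacks, the attack of Figure 1(b) can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not code from the paper: the struct serve_frame pins cl_name directly before the name pointer (a real compiler chooses its own stack layout), and make_request(), serve_model(), globMyHost's contents and the secret string are all hypothetical. Confining the "overflow" to one struct object keeps the example well-defined C while preserving the essential point: benign and malicious requests execute the same instructions, and the only thing that changes is a data pointer that is dereferenced as data, never used as a jump target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of serve()'s stack frame from Figure 1(b). The struct
   fixes the layout (cl_name directly before name) so the "overflow" stays
   inside a single object and the example remains well-defined C. */
struct serve_frame {
    char cl_name[64];
    const char *name;
};

static const char globMyHost[] = "myhost"; /* hypothetical server name */
static const char secret[] = "s3cr3t";     /* hypothetical data to leak */

/* Build a malicious request: the client name "eve", zero padding up to the
   saved name pointer, then the address the attacker wants echoed back.
   Returns the number of request bytes. */
static size_t make_request(unsigned char *req, size_t cap, const char *target)
{
    size_t off = offsetof(struct serve_frame, name);
    assert(cap >= off + sizeof target);
    memset(req, 0, off);
    memcpy(req, "eve", 3);                 /* NUL-terminated by the padding */
    memcpy(req + off, &target, sizeof target);
    return off + sizeof target;
}

/* The same instruction sequence runs for every request: copy the request
   over the frame (the unchecked read() of Figure 1(b)), then format the
   greeting. Only the data pointer f.name can differ between runs. */
static void serve_model(const unsigned char *req, size_t len,
                        char *reply, size_t reply_cap)
{
    struct serve_frame f;
    f.name = globMyHost;
    if (len > sizeof f)
        len = sizeof f;                    /* keep the model inside f */
    memcpy(&f, req, len);
    snprintf(reply, reply_cap, "hello %s, I am %s", f.cl_name, f.name);
}
```

Passing the bytes produced by make_request(req, sizeof req, secret) to serve_model() yields the reply "hello eve, I am s3cr3t", whereas the benign request "alice" (6 bytes including the terminator) yields "hello alice, I am myhost". Both runs take the same path through serve_model(), which is why a detector that only inspects jump targets sees nothing unusual.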
3.1 Basic tainting
One of the most reliable methods for detecting control diversions is known as dynamic taint analysis. The technique marks (in an emulator or in hardware) all data that comes from a suspect source, like the network, with a taint tag. The tag is kept in separate (shadow) memory, inaccessible to the software. Taint is propagated through the system to all data derived from tainted values. Specifically, when tainted values are used as source operands in ALU operations, the destinations are also tainted; if they are copied, the destinations are also tainted, etc. Other instructions explicitly ‘clean’ the tag. An example is ‘xor eax,eax’ on x86, which sets the eax register to zero and cleans the tag. An alert is raised when a tainted value is used to affect a program’s flow of control (e.g., when it is used as a jump target or as an instruction). We summarise the rules for taint propagation:

1. all data from suspect sources is tainted;
2. when tainted data is copied, or used in arithmetical calculations, the taint propagates to the destination;
3. taint is removed when all traces of the tainted data are removed (e.g., when the bytes are loaded with a constant), and by a few other operations.

Basic taint analysis has been successfully applied in numerous systems [Crandall 2004, Newsome 2005, Costa 2005, Ho 2006, Portokalidis 2006, Slowinska 2007, Portokalidis 2008]. The drawback is that it protects against control-diverting attacks, but not against non-control-diverting attacks, as we show presently.

Memory corruption and the (in)effectiveness of basic tainting. For exploits, the root cause of almost all control-diverting and non-control data attacks is the dereference of attacker-manipulated pointers. For instance, a stack smashing attack overflows a buffer on the stack to change the function’s return address. Similarly, heap corruption attacks typically use buffer overflows or double frees to change the forward and backward links in the doubly linked free list. Alternatively, buffer overflows may overwrite function pointers on the heap or stack directly. In a format string attack, a member of the printf() family is given a specially crafted format string to trick it into using an attacker-provided value on the stack as a pointer to an address where a value will be stored.

The nature of these attacks varies, but they all rely on dereferencing a pointer provided by the attacker via memory corruption. Basic taint analysis raises alerts only for dereferences due to jumps, branches, and function calls/returns. A modification of a value representing a user’s privilege level in a non-control data attack would go unnoticed.

Privacy-breaching and the ineffectiveness of basic tainting. One may want to employ dynamic taint analysis to detect whether a ‘possibly malicious’ program is spying on users’ behaviour. A basic approach could work by marking the keystrokes typed by the user as tainted, and monitoring the taint propagation in order to inspect whether the software in question accesses tainted sensitive data.

However, basic taint analysis is weak in the face of translation tables, which are frequently used for keystrokes. Assuming variable x is tainted, basic taint analysis will not taint y on an assignment such as y = a[x], even though y is completely dependent on x. As a practical consequence, data from the keyboard loses its taint almost immediately, because the scan codes are translated via translation tables. The same is true for ASCII/UNICODE conversion, and translation tables in C library functions like atoi(), toupper(), tolower(), strtol(), and sprintf().

As a corollary, basic taint analysis is powerless in the face of privacy-breaching malware. As data loses its taint early on, it is impossible to track whether it ends up in the wrong places.

3.2 Pointer tainting
Pointer tainting is explicitly designed to handle non-control-diverting attacks. Because of the two different application domains, pointer tainting comes in two guises, which we will term limited pointer tainting (for detecting non-control data attacks) and full pointer tainting (for detecting privacy breaches). Both have shortcomings. To clarify the problems, we first explain the two variants in detail. For now, we just describe the basic ideas. We will see later that they both need to be curtailed to reduce the number of false positives.

Limited pointer tainting (LPT): alerts on dereferences. Systems that aim at detecting non-control data attacks apply a limited form of pointer tainting [Chen 2005a, Dalton 2008]. Defining a tainted pointer as an address that is generated using tainted data, these systems extend taint analysis by raising an alert when a tainted pointer is dereferenced. So:

4a. if p is tainted, raise an alert on any dereference of p.

Doing so catches several of the memory corruption exploits discussed above, but cannot be realistically applied in the general case. For instance, any pointer into an array that is calculated by way of a tainted index would lead to an alert when it is dereferenced, causing false positives. Again, this is common in translation tables. For this reason, LPT implementations in practice prescribe that the taint of an index used for a safe table lookup is cleaned. In Section 6.2 we evaluate various such cleaning techniques. As a consequence, however, LPT cannot be used for tracking keystrokes. As soon as the tainted keystroke scan code is converted by a translation table, the taint is dropped and we lose track of the sensitive data.

Full pointer tainting (FPT): propagation on dereferences. Full pointer tainting extends basic taint analysis by propagating taint much more aggressively. Rather than raising an alert, full pointer tainting simply propagates taint when a tainted pointer is dereferenced. So:

4b. if p is tainted, any dereference of p taints the destination.
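The difference between rules 4a and 4b is easiest to see on the translation-table lookup y = a[x] discussed above. The following C sketch is a toy model written for this explanation, not the emulator used in this paper's experiments: struct machine, struct reg, load(), lookup_demo() and the table address TABLE are hypothetical, the table contents are arbitrary, and each byte carries a single 32-bit tag (0 meaning clean). Under the basic rules the lookup result is clean, under LPT (rule 4a) the same lookup raises an alert, and under FPT (rule 4b) the result inherits the scan code's taint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: byte memory plus a parallel shadow array of taint tags.
   A tag of 0 means clean; any other value is a taint "colour". */
#define MEM_SIZE 256
#define TABLE    0x80          /* hypothetical address of a clean table */

typedef uint32_t tag_t;

enum policy { BASIC, LPT, FPT };

struct machine {
    uint8_t mem[MEM_SIZE];
    tag_t   shadow[MEM_SIZE];  /* not visible to the modelled program */
    enum policy policy;
    bool    alert;             /* set when rule 4a fires */
};

/* A register value together with its tag. */
struct reg { uint8_t val; tag_t tag; };

/* Rule 3: loading a constant leaves a clean register. */
struct reg load_const(uint8_t c) { return (struct reg){ c, 0 }; }

/* Rule 3, x86 idiom: xor r,r zeroes the register and cleans its tag. */
struct reg xor_self(struct reg r) { (void)r; return (struct reg){ 0, 0 }; }

/* Rule 2: an ALU result is tainted if an operand is tainted. */
struct reg add(struct reg a, struct reg b)
{
    return (struct reg){ (uint8_t)(a.val + b.val), a.tag ? a.tag : b.tag };
}

/* Load mem[base + idx]. The address is derived from idx, so the pointer
   is tainted exactly when idx is. */
struct reg load(struct machine *m, uint8_t base, struct reg idx)
{
    unsigned addr = (unsigned)base + idx.val;
    assert(addr < MEM_SIZE);
    /* Rule 2: a copy from memory carries the memory's own tag. */
    struct reg r = { m->mem[addr], m->shadow[addr] };
    if (idx.tag != 0) {
        if (m->policy == LPT)
            m->alert = true;   /* rule 4a: alert on tainted dereference */
        else if (m->policy == FPT && r.tag == 0)
            r.tag = idx.tag;   /* rule 4b: taint the destination; this
                                  single-tag model keeps an existing tag */
    }
    return r;
}

/* y = table[x] for one tainted scan code x (rule 1: input is tainted). */
struct reg lookup_demo(enum policy p, uint8_t scancode, tag_t colour,
                       bool *alert)
{
    struct machine m = { .policy = p };   /* memory, tags, alert zeroed */
    for (unsigned i = 0; i < 64; i++)
        m.mem[TABLE + i] = (uint8_t)('a' + i % 26);
    struct reg x = { scancode, colour };
    struct reg y = load(&m, TABLE, x);
    *alert = m.alert;
    return y;
}
```

With this model, lookup_demo(BASIC, 4, 7, &alert) returns the byte 'e' with tag 0 and no alert; under LPT the same call sets the alert, and under FPT the returned byte carries tag 7. Passing the FPT result through add() keeps tag 7, which is what lets FPT follow a keystroke through table conversions, and also what allows taint to spread widely.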
FPT looks ideal for privacy-breaching malware detection; table conversion preserves the original taint, allowing us to track sensitive data as it journeys through the system. Panorama [Yin 2007] is a powerful and interesting example of this method. It tries to detect whether a new application X is malicious or not, by running it in a system with FPT. Sensitive data, such as keystrokes that are unrelated to X (e.g., a password you type in when logging in to a remote machine), are tagged tainted. If at some point any byte in the address space of X is tainted, it means that the sensitive data has leaked into X, which should not happen. Thus, the program must be malicious and is probably a keylogger.

4. Test environment
To get a handle on the number of false positives, we track the spread of taint through the system for increasingly sophisticated versions of pointer tainting. The idea is that we mark sensitive data as tainted and monitor taint propagation over the system. If taint spreads to benign applications that should never receive tainted data, we mark it as a false positive.

For the experiments we use Qemu 0.9 [Bellard 2005] with vanilla Ubuntu 8.04.1 with Linux kernel 2.6.24-19-386 and Windows XP SP2. Depending on the test, we modified the Qemu emulator to taint either all characters typed at the keyboard, or all network data. We then propagate the taint via pointer tainting (using rules 1, 2, 3, and either 4a or 4b). Whether network or keyboard is tainted will be clarified when we discuss our experiments. The taint tag is a 32-bit value, so that each keystroke or network byte can have a unique colour, which helps in tracking the individual bytes.

To measure the spread of taint we repeatedly inspect the taintedness of registers at context-switch time. Tainted registers in processes that do not expect tainted input indicate unwanted taint propagation. The more registers are tainted, the worse the problem. The situation is particularly serious if special-purpose registers like the stack pointer (esp) and the frame pointer (ebp) are tainted. Once either of these registers is tainted, data on the stack also gets tainted quickly. Indeed, many accesses are made by dereferencing an address relative to the value of esp or ebp.

The measurements are conservative in the sense that even if the registers are clean at some point, there may still be tainted bytes in the process’ address space. Moreover, we only check taint in registers at context-switch time, probably again underestimating processes’ taintedness. Taint may also leak across the kernel-userspace boundary in other ways, e.g., when tainted bytes are memory mapped into a process’ address space. In other words, the real situation may be worse than what we sketch here. However, the conservative approach we have implemented is sufficient to present

schedule() in the kernel (a voluntary context switch), or by interrupts and exceptions (a forced switch). For instance, a timer interrupt handler discovers that a process has used up its quantum of CPU time and sets a flag of the current process to indicate that it needs a reschedule. Just prior to the resumption of the user space process, this flag is checked and if it is set, the schedule() function is called.

The two methods differ in the way registers are saved. In particular, the general-purpose x86 registers eax, ecx and edx are not saved on the call to schedule() on the voluntary context switch. The calling context is responsible for saving the registers and restoring them later. On interrupts and exceptions, all registers are saved at well-defined points. The implication is that on voluntary switches, when we measure the state of the registers on return from schedule(), we ignore the taintedness of the above three registers. Whether they are tainted or not is irrelevant, as they will be overwritten later anyway. On a forced switch, when we inspect the condition of the process on the return from the interrupt/exception handler, we look at all the registers. In summary, in either case the register state we present is captured once the original values are restored after the context switch. That reflects the state of processes rather than the state of kernel structures.

For a complete picture we also monitor the taintedness inside the kernel, during the context_switch() function.

As we cannot perform detailed analysis of Windows, we measure the state of the registers whenever the value of the cr3 register changes. This x86 register contains the physical address of the top-level page directory, and a change indicates that a new process is scheduled. For user mode processes the measurement is performed once the processor is operating in user mode. This way we are sure that we present the state of the process, and not some kernel structures used to complete the context switch.

5. Problems with pointer tainting
When we started implementing a keylogger detector by means of pointer tainting, we observed that taint spread rapidly through the system. We now analyse the problem of taint explosion both experimentally (Sections 5.1 and 5.2) and analytically (Section 5.3).

5.1 False positives in LPT
To confirm the immediate spread of taint in limited pointer tainting (LPT), we used the emulator that taints data coming from the network. For both Linux and Windows, alerts were quickly raised for benign actions like configuring the machine’s IP address.

This is wrong, but not unexpected. We have already seen
the severity of the problem of false positives.                    the causes in the LPT discussion in Section 3.2: without
    Context switches in Linux occur at just one well-defined       appropriate containment mechanisms, LPT propagates taint
point: the schedule() function. The scheduler is called            when combining an untainted base pointer and a tainted
either directly by a blocking call that will lead to a call to     index and dereferencing such an address triggers an alert.
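To make the LPT behaviour just described concrete, the C sketch below models a single indexed load, mem[base + index], under both policies. It is a deliberately simplified model, not our emulator's code: memory is a small array, each value carries one taint bit instead of a 32-bit colour, and the names tval, load_indexed, POLICY_LPT and POLICY_FPT are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: a 32-bit value with a single taint
   bit (the emulator described above uses 32-bit colours instead). */
typedef struct {
    uint32_t value;
    bool tainted;
} tval;

typedef enum { POLICY_LPT, POLICY_FPT } policy;

typedef struct {
    tval loaded; /* value read from memory, with its taint */
    bool alert;  /* true if the policy flagged the dereference */
} load_result;

/* Simulate the load mem[base + idx]. Address arithmetic propagates
   taint: the effective address is tainted if either operand is. */
static load_result load_indexed(const tval *mem, size_t mem_len,
                                tval base, tval idx, policy p) {
    size_t addr = (size_t)base.value + (size_t)idx.value;
    bool addr_tainted = base.tainted || idx.tainted;
    load_result r;

    assert(addr < mem_len); /* keep the toy model in bounds */
    r.loaded = mem[addr];
    r.alert = false;

    if (p == POLICY_LPT) {
        /* LPT: dereferencing a tainted address is treated as an attack;
           the loaded value keeps only its own taint. */
        r.alert = addr_tainted;
    } else {
        /* FPT: the loaded value also inherits the address's taint. */
        r.loaded.tainted = r.loaded.tainted || addr_tainted;
    }
    return r;
}
```

With an untainted base of 0 and a tainted index of 2, the LPT variant flags the load even though the table entry itself is clean, while the FPT variant raises no alert but returns a tainted value. The first behaviour explains the spurious alerts on benign table lookups; the second is how FPT spreads taint to data that merely happens to be looked up with a tainted index.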
[Figure 2 plot: taintedness per process (y-axis: ping, cp, ls, apt-get, dash, gzip, tar, sed, bash nos. 1 to 4, run-parts, console-kit-daemon, libnss-files-2.7.so, pam-unix.so, syslogd, hald, dhcdbd, hald-addon-storage, kernel threads, kernel) over scheduling intervals (x-axis: Intervals, 0 to 200); legend: clean, dirty, very dirty.]

Figure 2. The taintedness of the processes constituting 90% of all context switches. The following explanation holds for this and all similar plots. The x-axis is divided into scheduling intervals, each spanning 50 scheduling operations. Time starts when taint is introduced into the system. In an interval, several processes are scheduled. For each of these, we take a random sample from the interval to form a datapoint. So, even if gzip is scheduled multiple times in an interval, it has only one datapoint. A datapoint consists of two small boxes drawn on top of each other, separated by a thin white line. The smaller one at the top represents the taintedness of ebp and esp. The bottom, slightly larger one represents all other registers. We use three colours: lightgrey means the registers are clean, darkgrey means that less than half of the considered registers are tainted (dirty), and black means that half or more are tainted (very dirty). Absence of a box means the process was not scheduled.

This is exactly what happened in our experiment. We discuss ways of addressing this problem in Section 6.2.
5.2 Taint explosion for FPT
To evaluate the spread of taint in full pointer tainting, we introduce a minimal amount of (tainted) sensitive information and observe its propagation. After booting the OS, we switch on keystroke tracking (which taints keystroke data) and invoke a simple C program that reads a user-typed character from the command line. This is all the taint that is introduced into the system. Afterwards we run various applications, but do so using ssh, so no additional taint is added.
    Figure 2 shows how taint spreads to the kernel and some of the most frequently scheduled processes. Aside from a few boxes on the very left, almost all applications and the kernel have at least half of the considered registers, as well as ebp and esp, tainted. Clearly, taint spreads very rapidly to all parts of the OS. Moreover, in this and the remaining experiments, tar and gzip should be completely clean, as we use a bash script that hardcodes the input and output filenames.
    Figure 3 shows a similar picture for Windows XP. Here, while performing simple tasks, we provide the guest operating system with new tainted keystrokes throughout the experiment. In more detail, we first launch the kernel debugger, kd.exe, and then switch on keystroke tagging. Thus, from this point onward, data typed by the user is considered tainted. Next, we launch Internet Explorer, IEXPLORE.exe, and the calculator, calc.exe. We perform simple web browsing, thus delivering tainted data to the Internet Explorer process. However, we do not type any characters into the calculator; we operate it solely with the mouse. Finally, we switch off keystroke tagging and consult the kernel debugger to dump the values of the cr3 register of running processes.
5.3 Analysis: the many faces of taint pollution
The above results show that pointer tainting without some containment measures is not useful: it is not possible to draw meaningful conclusions from the presence of taint in a certain process. A crucial step in the explosion of taint through the system is the pollution of the kernel. Data structures wholly unrelated to the keyboard activity pick up taint, and from the kernel, taint spills into user processes. As LPT simply raises an alert (and we have already seen how quickly this happens in a table lookup with a tainted index), this section focuses on the more interesting case of FPT and considers how taint spreads through the system in practice.
    As mentioned earlier, incorrectly tainting ebp or esp is particularly bad, as accesses to local variables on the stack are made relative to ebp, and a ‘pop’ from the stack with a tainted esp will taint the destination. Unfortunately, the Linux kernel has numerous places where ebp and/or esp incorrectly pick up taint. Rather than discussing them all, we explain as an example how a common operation like opening a file ends up tainting ebp, as well as various lists and structures in the kernel. The main purpose is to show that taint pollution occurs on common code paths (like opening files) and can be the result of complex interactions.
5.3.1 Taint pollution by opening files - a case study
Calls to the open() system call cause taint pollution in various ways. For the following analysis, we extended the emulator with code that logs the progression of taint through
the system at fine granularity. We then manually analysed the propagation through the Linux source code by mapping the entries in the log onto the source.
[Figure 3 plot: taintedness per process (y-axis: calc.exe, IEXPLORE.EXE, kd.exe, services.exe, wuauclt.exe, lsass.exe, svchost.exe (two instances), msmsgs.exe, explorer.exe, csrss.exe) over scheduling intervals (x-axis: Intervals, 0 to 900); legend: clean, dirty, very dirty.]
Figure 3. The taintedness of the processes constituting 95% of all context switches in Windows XP
    The Linux Virtual Filesystem uses dentry objects to store information about the linking of a directory entry (a particular name of the file) with the corresponding file. Because reading a directory entry from disk and constructing the corresponding dentry object requires considerable time, Linux uses a dentry cache to keep in memory dentry objects that might be needed later. The dentry cache is implemented by means of a dentry hashtable. Each element is a pointer to a list of dentries that hash to the same bucket, and each dentry contains pointers to the next dentry in the list.
    The real work in the open() system call is done by the filp_open() function, which at some point accesses the dentry cache by means of the __d_lookup() function to search for the particular entry in the parent directory (see Figures 4 and 5). The second argument, struct qstr* name, provides the pathname to be resolved, where name->name and name->len contain the file name and its length, and name->hash its hash value.

[ 1] struct dentry * __d_lookup(struct dentry * parent,
[ 2]                            struct qstr * name)
[ 3] {
[ 4]     unsigned int hash = name->hash;
[ 5]     struct hlist_head *head = d_hash(parent,hash);
[ 6]     struct hlist_node *node;
[ 7]     struct dentry *dentry;
[ 8]
[ 9]     hlist_for_each_entry_rcu(dentry, node, head,
[10]                              d_hash) {
[11]       struct qstr *qstr;
[12]       ...
Figure 4. A snippet of the __d_lookup() function.

[Figure 5 diagram: fopen() in user space leads to filp_open() in the kernel, which calls __d_lookup() and then __dentry_open(). Labels under __d_lookup(): (1) hash/head*, dentry_hashtable, (2) ebp*, stack*; under __dentry_open(): (3) ebp*, stack*, (4) inode*, file list, callback list, other lists.]
Figure 5. Taint pollution when a file is opened
    Phase 1: taint enters kernel data structures. To see how taint propagates incorrectly in __d_lookup(), let us assume that the filename was typed in by the user, so it is tainted. The hash is calculated by performing arithmetical and shift operations on the characters of the filename, which means that the hash in line 4 is tainted. The d_hash() function in line 5 first produces a new hash value from both the dentry object of the parent directory and the tainted hash of the filename (so that the new hash is also tainted) and then returns the head of the list of dentries hashing to this new hash value. This is the address of the element in the table with the index derived from the new hash value. The address is calculated as the table’s base pointer plus a tainted index and is therefore tainted, so head in line 5 becomes tainted.
    As is common in the Linux kernel, the singly linked list of dentries is constructed by adding a struct hlist_node field to the type that should be linked; in this case, the dentry. Each hlist_node field points to the hlist_node field of the next element, and an hlist_head field points to the start of the list. We iterate over the list (line 9), searching for the dentry matching the name, which will be found if the file has been opened previously (which is quite common).
    Phase 2: ebp gets tainted. During the iteration, head (and later node) contain pointers to the list’s hlist_head and hlist_node link fields. Of course, these fields themselves are not interesting; the real work takes place on the associated dentry object. Therefore, the macro in line 9 performs a simple arithmetical operation to produce the address of the dentry, which results in tainting dentry (line 9). Worse, within the loop numerous checks of dentry are performed, and for efficiency, the ebp register is loaded with dentry’s tainted address. The result is that numerous values on the stack become tainted, and taint explosion is hard to avoid.
    Phase 3: ebp is cleaned and then tainted again. By the time __d_lookup() returns, ebp is clean again, but taint keeps spreading. Now that the dentry object is found in the cache, the filp_open() function calls __dentry_open(), passing to it the tainted address of the dentry object. This function almost immediately loads the ebp register with the tainted address of the received dentry object. As a result, taint spreads rapidly through the kernel’s address space.
    Phase 4: pollution of other structures via lists. Taint spreads further across the kernel by dint of the pointer arithmetic prevalent in structures and list handling. Linked lists are especially susceptible to pollution.
    When we read a field of a structure pointed to by a tainted address, the result is tainted. Similarly, when we insert a tainted element elem into a list list, we immediately taint the references of the adjacent nodes. Indeed, the insertion operation usually executes the assignment list->next=elem, which taints the next link. If we perform a list search or traversal, then the pointer to the currently examined element is calculated in the following fashion: (1) curr=list, (2) curr=curr->next, and so the taintedness is passed on from one element to another.
    If a list element is removed from one list and entered into another, the second list will also be tainted. For instance, if a block of data pointed to by a tainted pointer is freed, free lists may become tainted. By means of common pointer handling, the pollution quickly reaches numerous structures that are unrelated to the sensitive information.
    Let us continue the example of opening files. As explained earlier, the __dentry_open() function is provided with the tainted address of the dentry object. This function executes the instruction inode=dentry->d_inode to determine the address of the inode object associated with dentry. The assignment taints inode, as its value is loaded from the tainted address dentry plus an offset. Next, once the new file object file is initialised, we execute head=inode->i_sb->s_files as we insert the file into the list of opened files pointed to by head (i_sb points to the filesystem’s superblock, and s_files is a field of that superblock), so head is tainted. As a result, the file insert operation immediately taints the references of the adjacent nodes in the list.
    Finally, when the kernel has finished using the file object, it uses the fput() function to remove the object from the superblock’s list and release it to the slab allocator. Without going into detail, we mention that dentry cache look-ups are lockless read-copy-update (RCU) accesses and that, as a result, the file objects end up being added to the RCU’s per-CPU list of callbacks that free objects once it is safe to do so. The list picks up the taint and, when it is traversed, spreads it across all entries in the list. The callback is responsible for releasing the tainted object to the slab.
5.3.2 False positives and root causes of pollution
It is clear that, due to false positives, limited pointer tainting and full pointer tainting in their naive, pure forms are impractical for automatically detecting memory corruption attacks and sensitive information theft, respectively. We have seen that taint leaks occur in many places, even on a common code path like that of opening a file. The interesting question is what the root causes of the leaks are, or, phrased differently, whether these leaks have something in common.
    After manually inspecting many relevant parts of the Linux kernel, we found three primary underlying causes of taint pollution. First, the tainting of ebp and esp. These pointers are constantly dereferenced, and once they are tainted, LPT raises alerts very quickly, while FPT spreads taint almost indiscriminately as the stack becomes tainted.
    Second, not all pointers are tainted in the same way, and not all should propagate taint when dereferenced. If A is a tainted address and B is an address calculated relative to A (e.g., B=(A+0x4)), then B will be tainted. However, in many cases it might be unreasonable to mark *B as tainted. For example, let us assume that tainted A points to a field of a structure, file_name. Next, B is derived to hold the base address of this structure, B = A-offset(file_name), and B becomes tainted. Now, depending on a security policy, we may or may not wish to mark B->file_handler as tainted. However, if all these structures are organized in a list, we certainly do not want to propagate taintedness to the next element of the list, B->next. On the other hand, if a pointer is itself calculated using tainted data (C=A+eax, where eax is tainted), the taint should be propagated, as C might be pointing to a field in a translation table. Notice that all these cases are hard to distinguish for emulators or hardware.
    Third, if pointer tainting is applied only for detecting memory corruption attacks on non-control data, rather than tracking keystrokes and other sensitive data, taint may leak due to table lookups, as discussed in Section 3.2.
5.3.3 False negatives: is pointer tainting enough?
While false positives are more serious than false negatives for automatic detection tools, a system that misses most of the attacks is not very useful either. ‘Pure’ pointer tainting in LPT or FPT does not have many false negatives, but even without any containment of taint propagation, pointer tainting does not detect all the attacks it is designed for. For instance, LPT will detect modification of non-control data by means of a format string attack, or a heap overflow that corrupts freelist link pointers. However, it will miss modification of non-control data by means of a direct buffer overflow. Limited mitigation may be possible by checking system call arguments, as is done in Vigilante [Costa 2005], but the fundamental problem remains. Consider, for instance, a buffer on the heap or stack that contains the username and may be located in memory just below a field that indicates the user’s privilege level. If attackers can overflow the buffer, they can modify the privilege level.
    Similarly, FPT and LPT both miss implicit information flows. Implicit information flows have no direct assignment of a tainted value to a variable (which would be propagated by pointer tainting). Instead, the value of a variable is completely determined by the value of tainted data in a condition. For instance, if x is tainted, then information flows to y in the following code: if (x==0) y=0; else y=1;. As we do not track implicit information flows, false negatives are quite likely. This is particularly worrying if FPT is used to detect potential privacy-breaching malware, as it gives the untrusted code an opportunity to ‘launder’ taint. As pointed out by [Cavallaro 2008], purely dynamic techniques cannot detect implicit flows. The authors explain that it is necessary to reason about the assignments that take place on the unexecuted program branches, and also give a number of reasons why the problem is intractable for x86 binaries.
    If we are to have any hope at all of catching sophisticated privacy-breaching malware with FPT, we need to detect and raise an alert immediately when taint enters the untrusted code, lest it be laundered. As soon as untrusted code is allowed to run, it can trick the system into cleaning the data.
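To illustrate how easily such laundering happens, the C sketch below contrasts an explicit copy with a copy through control flow. It is a simplified model under stated assumptions (one taint bit per byte; the names tbyte, copy_explicit and copy_implicit are hypothetical) and generalises the if/else example above from one bit to a whole byte. A tracker that follows only assignments, as pointer tainting does, keeps the tag on the explicit copy, while the bit-by-bit copy ends up with the same value and a clean tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model: one byte with a single taint bit. */
typedef struct {
    uint8_t value;
    bool tainted;
} tbyte;

/* Explicit flow: y = x copies the value and the tag together, so any
   assignment-based tracker (including pointer tainting) keeps the taint. */
static tbyte copy_explicit(tbyte x) {
    tbyte y = x;
    return y;
}

/* Implicit flow: y is built only from untainted constants, chosen by
   branches that depend on x. Propagating taint along assignments alone
   leaves y clean, although y ends up equal to x. */
static tbyte copy_implicit(tbyte x) {
    tbyte y = {0, false};
    for (unsigned bit = 0; bit < 8; bit++) {
        if (x.value & (1u << bit)) {         /* branch on tainted data */
            y.value |= (uint8_t)(1u << bit); /* untainted constant */
        }
    }
    return y; /* y.tainted is never set */
}
```

Passing a tainted keystroke such as 'k' through copy_implicit returns 'k' with the tag cleared, which is exactly the kind of laundering an FPT-based detector cannot see once untrusted code is allowed to run.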
[Figure 6 plot: taintedness per process (y-axis: cp, bash nos. 1 to 5, pam-unix.so, libnss-files-2.7.so nos. 1 and 2, gzip, tar, apt-get, syslogd, hald-addon-storage, hald, hald-addon-input, dhcdbd, kernel threads, kernel) over scheduling intervals (x-axis: Intervals, 0 to 200); legend: clean, dirty, very dirty.]
Figure 6. Taint pollution with esp/ebp protection for code involved in 90% of the context switches (Linux)
    Like most work on pointer tainting, we assume that false negatives in pure pointer tainting are not the most important problem. However, to deal with the false positives, we are forced to contain taint propagation in various ways. Doing so will reduce the false positive ratio, but the opportunities for false negatives will increase significantly.
6. Containment techniques
We have seen that without containment, pointer tainting is not usable. We now evaluate ways to control the spreading.
6.1 Containment for LPT and FPT: ebp/esp protection
The first cause of pollution in FPT (and false positives in LPT) mentioned in Section 5.3.2 is the tainting of esp and ebp. We can simply remove it with minimal overhead by never applying pointer tainting to tainted values of ebp or esp. However, ebp is occasionally also used as a temporary general-purpose register. Having analysed a number of scenarios that involved a tainted ebp, we devised a simple heuristic: we clean ebp whenever its value is big enough to serve as a frame pointer on the stack. Doing so introduces false negatives into the system in case ebp is used as a temporary general-purpose register and serves as a pointer. However, we expect this to be rare.
    We implemented the above restriction in our emulator and again evaluated the spread of taint through the system. The results for Linux, shown in Figure 6, indicate that while the spread has slowed down a little compared to Figure 2, taint still propagates quickly. Observe that ebp is still tainted occasionally. This is correct: it means that ebp was used as a container. For lack of space, we do not show the plot for Windows. We will show a combined plot later (Figure 9).
6.2 LPT-specific containment techniques
The most important cause of false positives in LPT involves taint pollution via table lookups. As LPT uses pointer tainting only to detect memory corruption attacks, rather than tracking sensitive data, we should try to prevent taint from leaking due to lookups. Proposed solutions revolve around detecting some specific pointer arithmetic operations [Suh 2004], bounds-checking operations [Chen 2005a, Dalton 2007], and more recently pointer injection [Katsunuma 2006, Dalton 2008]. Because of conversion tables, none of these techniques are suitable for FPT.
Detecting and sanitising table accesses.
Suh et al. [Suh 2004] sanitise table lookups even when the index is tainted, and assume that the application has already performed the necessary bounds checks. The method is impractical, as it requires us to recognise table lookups, while many architectures, including the popular x86, do not have separate instructions for pointer arithmetic. On x86, we can only instrument those instructions that calculate an address from base and index. Then we propagate the taint of the base and skip that of the index. However, the use of add and mov instructions to calculate pointers is extremely common in real-world code, and these cannot be monitored in the same way. As a result, this method leads to many false positives. Others have pointed out that this policy is also prone to false negatives in case of return-to-libc attacks [Dalton 2006].
Detecting bounds checks
Chen et al. [Chen 2005a] argue that most table lookups are safe even if the index is tainted, as long as the index was properly bounds-checked. Thus, to reduce false positives, we may try to detect bounds checks at runtime and drop the operand’s taint. Bounds checks are identified by a cmp instruction of the index with an untainted operand. As the and instruction is also frequently used for bounds checks [Dalton 2007], we also clean the first source operand of and if the second operand is clean and has a value of 2^n − 1.
    While simple and fast, the method suffers from false positives and negatives, some of which were noted by others [Dalton 2006; 2007; 2008]. We are the first to find the last two in the list below.
    In many conversion tables, a lookup simply returns a different representation of the input, and cleaning the tag leads to false negatives. For instance, the taintedness of suspicious input is dropped as it passes through translation tables, even if the data is then used for a buffer overflow or other exploit. Incorrectly dropping taint in a way that can be exploited by attackers is known as taint laundering (Section 5.3.3). False negatives also occur when the cmp and and instructions are used for purposes other than bounds checking.
    In addition, the method is prone to false positives if code does not apply bounds checking at all or uses different instructions to do so. Many lookups take place with an 8-bit index in a 256-entry table and are safe without a bounds check.
    Furthermore, taint often leaks to pointers in subtle ways that are not malicious at all. For instance, many protocols have data fields accompanied by length fields that indicate how many bytes are in the data field. The length may be
used to calculate a pointer to the next field in the message. A subtle leak occurs when the length is bounds checked, but the check is against a value that is itself tainted. An example is a check whether the length field is shorter than IP's total length field (minus the length of other headers). A comparison with tainted data does not clean the index.

Yet another way for taint to escape and cause false positives is when the check and the use of the index are decoupled. For instance, the index is loaded into a register from memory and checked, which leads us to clean the register. However, by the time the index is actually used, it may well have to be loaded from the original (tainted) memory address again, because the register was reused in the meantime. This again leads to a tainted dereference and thus an alert. We see that, for various reasons, raising alerts immediately on tainted dereferences is likely to trigger many false positives.

We have just discussed why current solutions that revolve around detecting bounds checks and table accesses are insufficient and incur both false positives and false negatives. We implemented these policies, and experiments performed on the emulator confirm our objections: control flow diversions were reported instantly. In addition, on architectures like x86, there is little distinction between registers used as indices, normal addresses, and normal scalars. Worse, the instructions that manipulate them are essentially the same. As a result, this problem is very hard to fix unless a particular coding style is enforced.

Pointer injection detection
This brings us to recent and more promising work that prevents memory corruption attacks by detecting when a pointer is injected by untrusted sources [Katsunuma 2006]. The most practical of these, Raksha [Dalton 2008], identifies valid pointers, which it marks with a P bit, and triggers an alert if a tainted pointer that is not (derived from) a valid pointer is dereferenced.

For this purpose, it scans the data and code segments in ELF binaries to map out in advance all legitimate pointers to statically allocated memory and marks these with a P bit. To do so, it has to rely on properties of the SPARC v8 architecture, which always uses two specific instructions to construct a pointer and has a regular instruction format. In addition, it modifies the Linux kernel to also mark with a P bit the pointers returned by system calls that allocate memory dynamically (mmap, brk, shmat, etc.). Furthermore, as the kernel sometimes legitimately dereferences untrusted pointers, it uses knowledge of SPARC Linux to identify the heap and memory map regions that may be indexed by untrusted information (and uses the kernel header files to find the start and end addresses of these regions).

The method effectively stops false positives, but false negatives are possible. For instance, an attacker can overflow a buffer to modify an index that is later added to a legitimate address. The resulting pointer would have the P bit set, and therefore its dereference would not trigger an alert.

More worrying is that the method is very closely tied to the Linux/SPARC combination, and portability is a problem. For instance, it would not work well on x86 processors running Windows. First, x86 makes it much harder to detect pointers to statically allocated memory. Second, we cannot modify the kernel, so we are forced to add specific handling for several system calls in hardware or in the emulator (and the number of system calls in Windows is large). Third, we cannot identify kernel regions that may be indexed with untrusted data. To err on the safe side, we would have to assume that certain data values are pointers when really they are not, and that the entire kernel address space could be pointed to by untrusted data. As a result, we expect many false negatives on x86. Even so, while limited to OS/architecture combinations similar to Linux/SPARC, Raksha is the most reliable LPT implementation we have seen.

6.3 FPT-specific techniques
Section 5.3.2 identified the primary causes of pollution. We now try to remove them without crippling the method.

White lists and black lists
The simplest solution is to whitelist all places in the code where taint should be propagated using pointer tainting or, alternatively, to blacklist all places where it should not be propagated. Neither scheme works in practice. Whitelisting is impossible unless you know all software in advance (including the userspace programs), and know it well enough to tell where taint should propagate. This is certainly not the case when you are monitoring potential malware (e.g., to see if it is a keylogger). It is also difficult for large software packages (say, OpenOffice or MS Word), or for any proprietary code. Finally, whitelisting only a small subset of the places reduces FPT to taint analysis with a minimal extension.

Blacklisting also suffers from the problem that you have no detailed knowledge of all programs running on your system. In addition, the number of taint leaks is enormous, and blacklisting them all is probably not feasible. Notice that even if we managed to blacklist part of the software, for instance the Linux kernel and a few applications, that still would not be enough. Assume that one of the programs we do not blacklist causes unrelated data to be tainted. If such data is then communicated to other processes, they become tainted, and a false alarm is raised. Such unrelated tainted data can also enter kernel structures during system calls.

Finally, blacklisting and whitelisting both have a significant impact on performance. Thus, we do not consider whitelisting or blacklisting a feasible path to remedy FPT.

Landmarking
Easy fixes like ebp/esp protection and white/black-listing do not work. Here we discuss a more elaborate scheme, known as landmarking, that contains taint much more aggressively. Unfortunately, as a side effect, it significantly reduces the power of pointer tainting, which leads to
many false negatives. In addition, it still incurs false positives and significantly increases the runtime overhead. Nevertheless, this is the most powerful technique for preventing taint explosion that we know of. A similar technique appears to have been used in Panorama [Yin 2007], but as our landmarking is slightly more aggressive and, hence, should incur fewer false positives, we limit the discussion to landmarking.

Recall that the second primary cause of unwanted taint propagation is pointers being relative to a tainted address: if A is a tainted address, and an address B is calculated relative to A (e.g., B=A+0x4), then B is tainted as well, even though tainting *B is often incorrect. As a remedy, we let B influence the taintedness of *B only if the operation that derives B from A itself involves tainted data. So, with A and eax tainted, we exclude B=A+0x4 from taint propagation, but keep C=A+eax.

For this purpose, we introduce landmarks. Landmarks indicate that an address is 'ready to be used for a dereference'. We have reached a landmark for A if all tainted operations up to this point were aimed at calculating A, but not a future value B derived from A. Rephrasing, as soon as a value is a landmark (and thus a valid and useful address), dereferences should propagate taint. However, values derived from the landmark have to be modified with tainted data again for the derived value to also qualify for pointer taintedness. Thus, we limit the number of times a tainted value can influence the taintedness of a memory access operation.

In practical terms, we say that a value forms a complete and useful address only when it is used as such. In other words, we identify landmarks either by means of a dereference, or by an operation explicitly calculating an address, such as the lea instruction on x86 that calculates the effective address and stores it in a register.

Example
Consider the code snippet shown in Figure 7. We access the second item of a list rooted at table[index], where the index is assumed to be tainted. First, in line 7, the pointer to the head of the list is fetched from the table. To calculate the memory load address (table + index*8), we use a tainted operand, index, which has never been dereferenced before, and so i1 becomes tainted. However, we have just reached the landmark for i1, meaning that dereferences of i1 propagate taintedness, but addresses derived from i1 in a clean way do not. Next, in the second assignment, line 8, we access memory at the address calculated by increasing i1 by 4, a clean constant. Thus (i1 + 4), when dereferenced, does not propagate taintedness, and i2 is clean. A similar reasoning holds for clean i3. Based on this example, one might think that landmarking solves our problems, as we propagate the taintedness to the elements of a (translation) table, but we do not spread it over the elements of a list.

[1]   typedef struct test_t {
[2]     int i;
[3]     struct test_t* next;
[4]   } test_t, *ptest_t;
[5]
[6]   ptest_t   table[256] = ...;    //   initialised
[7]   ptest_t   i1 = table[index];   //   tainted
[8]   ptest_t   i2 = i1->next;       //   clean
[9]   int       i3 = i1->i;          //   clean

Figure 7. An example of landmarking.

Problems with landmarking
Unfortunately, landmarking is not just a rather elaborate technique; it also cripples the power of pointer tainting, and opportunities for false negatives abound.

Assume that p is a pointer whose calculation involves a tainted operand, and we load values v0=*p and v1=*(p+1) from memory. In a first possible scenario, the compiler translates the code such that p is calculated once for both load operations. In that case, the first of the loaded variables becomes tainted, and the other one is clean. So, depending on the order of instructions (v0=*p; v1=*(p+1); vs. v1=*(p+1); v0=*p;), we get different results. This is strange. On the other hand, if the compiler translates the code such that p is calculated twice, once for each of the variables, then both values are tainted. Such inconsistent behaviour makes it hard to draw conclusions based on the results of landmarked pointer tainting. Moreover, it clearly introduces false negatives.

Another example of false negatives stems from translation tables containing structures instead of single fields. Refer to Figure 7 once more, where i1 (line 7) is tainted, but i3=i1->i (line 9) is clean. Now imagine that the test_t structure contains various representations of characters, say ASCII and EBCDIC. In that case, the table access makes us lose track of sensitive data, which is clearly undesirable.

This weakness can also be exploited to cause leakage of secret data. Assume that a server receives a string-based request from the network and returns a field from a struct X, pointed to by xptr. If an attacker is able to modify xptr (for instance, by overflowing the request buffer with a long string), then the server returns the contents of xptr->field_offset, which can be located at an arbitrary place in memory. For the same reasons as in the example above, the result will be clean.

The best thing about landmarking is that we contain the spread of taint very aggressively, and it really is much harder for taint to leak out. The hope is that, in combination with ebp/esp protection, landmarking can stop the pollution, so that (an admittedly reduced version of) FPT can be used for automatic detection of keyloggers.

The worst thing about landmarking is that it does not work. It still offers ample opportunities for false positives. This is no surprise: even if we restrict taint propagation via pointer tainting in one register, nothing prevents a program from copying the register, perhaps even before the dereference, and adding a constant value to the new register. As the new register was not yet used for dereferencing, taint will be (incorrectly) propagated.

Another possible reason for false positives arises when programs calculate directly the address of an element (or
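The landmarking rules discussed in this section can be made concrete with a toy model, written in C like Figure 7. It is an illustrative sketch under stated assumptions, not the emulator used in the experiments. It tracks only taint metadata. It assumes that a register's fresh taint is spent on its first dereference (its landmark), and that taint arriving only through a pointer is born landmarked. A load through [register + clean displacement] is modeled by passing just the register. lea-based landmarks are not modeled. The type tval, the fresh flag, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not the emulator's code): taint metadata of a
 * register or value. `fresh` marks taint not yet spent on a dereference;
 * only fresh taint in an address may propagate to the loaded value. */
typedef struct {
    bool tainted;
    bool fresh;
} tval;

/* Untrusted input, e.g. a byte from the network: tainted and fresh. */
tval tv_input(void) { return (tval){ .tainted = true, .fresh = true }; }

/* A constant or any other clean value. */
tval tv_clean(void) { return (tval){ .tainted = false, .fresh = false }; }

/* Address arithmetic such as B = A + 0x4 or C = A + eax. The result keeps
 * pointer-tainting power only if some operand still carries fresh taint. */
tval tv_add(tval a, tval b) {
    tval r;
    r.tainted = a.tainted || b.tainted;
    r.fresh = (a.tainted && a.fresh) || (b.tainted && b.fresh);
    return r;
}

/* Load from [*base + d] for some clean displacement d. mem_tainted is the
 * taint of the memory cell itself. Simplification: the first dereference
 * through a register is its landmark, after which its fresh taint is spent. */
tval tv_load(tval *base, bool mem_tainted) {
    bool via_pointer = base->tainted && base->fresh;
    base->fresh = false;
    tval r;
    r.tainted = mem_tainted || via_pointer;
    /* Taint that arrived only through the pointer is born landmarked. */
    r.fresh = mem_tainted;
    return r;
}

/* Replays the scenarios from the text and checks the outcomes it describes. */
void landmark_selfcheck(void) {
    /* Figure 7, line 7: i1 = table[index] with a tainted index. */
    tval addr = tv_add(tv_clean(), tv_input());  /* table + index*8 */
    tval i1 = tv_load(&addr, false);
    tval i2 = tv_load(&i1, false);               /* line 8: *(i1 + 4) */
    tval i3 = tv_load(&i1, false);               /* line 9: *(i1 + 0) */
    assert(i1.tainted && !i2.tainted && !i3.tainted);

    /* With A (= i1) landmarked and eax tainted: B = A + 0x4 is excluded
     * from pointer tainting, C = A + eax is kept. */
    tval b = tv_add(i1, tv_clean());
    tval c = tv_add(i1, tv_input());
    assert(!b.fresh && c.fresh);

    /* p calculated once: only the first of the two loads is tainted. */
    tval p = tv_add(tv_clean(), tv_input());
    tval v0 = tv_load(&p, false);                /* v0 = *p     */
    tval v1 = tv_load(&p, false);                /* v1 = *(p+1) */
    assert(v0.tainted && !v1.tainted);

    /* p calculated twice: both loads are tainted. */
    tval p_a = tv_add(tv_clean(), tv_input());
    tval p_b = tv_add(tv_clean(), tv_input());
    tval w0 = tv_load(&p_a, false);
    tval w1 = tv_load(&p_b, false);
    assert(w0.tainted && w1.tainted);

    /* A copy plus a constant, made before the dereference, escapes the
     * landmark: taint is (incorrectly) propagated again. */
    tval q = tv_add(tv_clean(), tv_input());
    tval q4 = tv_add(q, tv_clean());
    (void)tv_load(&q, false);
    tval leak = tv_load(&q4, false);
    assert(leak.tainted);
}
```

Calling landmark_selfcheck() from a main function runs the checks. The "spend on first dereference" rule is a deliberate simplification. It is enough to reproduce the order dependence of v0 and v1 and the register-copy false positive described above, but a real implementation would also have to handle landmarks created by lea and by registers used as indices.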