High Precision Detection of Business Email Compromise

Asaf Cidon, Barracuda Networks and Columbia University;
Lior Gavish, Itay Bleier, Nadia Korshun, Marco Schweighauser, and Alexey Tsitkin, Barracuda Networks

https://www.usenix.org/conference/usenixsecurity19/presentation/cidon

This paper is included in the Proceedings of the 28th USENIX Security Symposium.
August 14–16, 2019 • Santa Clara, CA, USA
ISBN 978-1-939133-06-9

Open access to the Proceedings of the 28th USENIX Security Symposium is sponsored by USENIX.
High Precision Detection of Business Email Compromise

Asaf Cidon¹,² and Lior Gavish, Itay Bleier, Nadia Korshun, Marco Schweighauser and Alexey Tsitkin¹
¹Barracuda Networks, ²Columbia University

Abstract

Business email compromise (BEC) and employee impersonation have become one of the most costly cyber-security threats, causing over $12 billion in reported losses. Impersonation emails take several forms: for example, some ask for a wire transfer to the attacker's account, while others lead the recipient to follow a link, which compromises their credentials. Email security systems are not effective in detecting these attacks, because the attacks do not contain a clearly malicious payload, and are personalized to the recipient.

We present BEC-Guard, a detector used at Barracuda Networks that prevents business email compromise attacks in real-time using supervised learning. BEC-Guard has been in production since July 2017, and is part of the Barracuda Sentinel email security product. BEC-Guard detects attacks by relying on statistics about the historical email patterns that can be accessed via cloud email provider APIs. The two main challenges when designing BEC-Guard are the need to label millions of emails to train its classifiers, and to properly train the classifiers when the occurrence of employee impersonation emails is very rare, which can bias the classification. Our key insight is to split the classification problem into two parts, one analyzing the header of the email, and the second applying natural language processing to detect phrases associated with BEC or suspicious links in the email body. BEC-Guard utilizes the public APIs of cloud email providers both to automatically learn the historical communication patterns of each organization, and to quarantine emails in real-time. We evaluated BEC-Guard on a commercial dataset containing more than 4,000 attacks, and show it achieves a precision of 98.2% and a false positive rate of less than one in five million emails.

1   Introduction

In recent years, email-borne employee impersonation, termed by the FBI "Business Email Compromise" (BEC), has become a major security threat. According to the FBI, US organizations have lost $2.7 billion in 2018 and cumulatively $12 billion since 2013 [13]. Numerous well-known enterprises have fallen prey to such attacks, including Facebook, Google [41], and Ubiquiti [44]. Studies have shown that BEC is the cause of much higher direct financial loss than other common cyberattacks, such as ransomware [11, 13]. BEC attacks have also ensnared operators of critical government infrastructure [39]. Even consumers have become the targets of employee impersonation. For example, attackers have impersonated employees of real-estate firms to trick home buyers into wiring down payments to the wrong bank account [1, 7, 17].

BEC takes several forms: some emails ask the recipient to wire money to the attacker's account, others ask for W-2 forms that contain social security numbers, and some lead the recipient to follow a phishing link, in order to steal their credentials. The common theme is the impersonation of a manager or colleague of the target [12]. In this work, we focus on attacks where the attacker is external to the organization, and is trying to impersonate an employee. In §6 we discuss other scenarios, such as where the attacker uses a compromised internal email account to impersonate employees [18, 19].

Most email security systems are not effective in detecting BEC. When analyzing an incoming email, email security systems broadly look for two types of attributes: malicious and volumetric. Examples of malicious attributes are an attachment that contains malware, a link pointing to a compromised website, or an email that is sent from a domain with a low reputation. There are various well-known techniques to detect malicious attributes, including sandboxing [49] and domain reputation [2, 48]. Volumetric attributes are detected when the same email format is sent to hundreds of recipients or more. Examples include the same text or sender email (e.g., spam), and the same URL (e.g., mass phishing campaigns). However, employee impersonation emails do not contain malicious or volumetric attributes: they typically do not contain malware, are not sent from well-known malicious IPs, often do not contain a link, and are sent to a small number of recipients (with the explicit intent of evading volumetric filters). When employee impersonation attacks do contain a link, it is typically a link to a fake sign-up page on a legitimate website that was compromised, which does not appear on any IP blacklists. In addition, the text of the attacks is tailored to the recipient, and is typically not caught by volume-based filters.

Our design goal is to detect and quarantine BEC attacks in real-time, at a low false positive rate (1 in a million emails) and high precision (95%). We make the observation that popular cloud email systems, such as Office 365 and Gmail, provide APIs that enable account administrators to allow external applications to access historical emails. Therefore, we design a system that detects BEC by relying on historical emails available through these APIs.

Prior work on detecting impersonation has been conducted either on very small datasets [10, 14, 20, 45], or focused on stopping a subset of BEC attacks (domain spoofing [14] or emails with links [20]). In addition, most prior work suffers from very low precision (only 1 in 500 alerts is an attack [20]) or very high false positive rates [10, 45], which makes prior work unsuitable for detecting BEC in real-time.

The main challenge in designing a system that can detect BEC at a low false positive rate is that BEC emails are very rare as a percentage of all emails. In fact, in our dataset, less than one out of 50,000 emails is a BEC attack. Therefore, in order to achieve low false positives, we design a system using supervised learning, which relies on a large training set of BEC emails. However, bootstrapping a supervised learning system presents two practical challenges. First, it is difficult to label a sufficiently large training dataset that includes millions of emails. Second, it is challenging to train a classifier on an imbalanced dataset, in which the training dataset contains almost five orders of magnitude fewer positive samples (i.e., BEC attacks) than negative samples (i.e., innocent emails).

In this paper, we present how we initially trained BEC-Guard, a security system that automatically detects and quarantines BEC attacks in real-time using historical emails. BEC-Guard is part of a commercial product, Barracuda Sentinel, used by thousands of corporate customers of Barracuda Networks to prevent BEC, account takeover, spear phishing and other targeted attacks. BEC-Guard does not require an analyst to review the detected emails, but rather relies on offline and infrequent re-training of classifiers. The key insight of BEC-Guard is to split the training and classification into two parts: header and body.

Instead of directly classifying BEC attacks, the impersonation classifier detects impersonation attempts, by determining whether an attacker is impersonating an employee in the company by inspecting the header of the email. It utilizes features that include information about which email addresses employees typically utilize, how popular their name is, and characteristics of the sender domain. The content classifiers are only run on emails that were categorized as impersonation attempts, and inspect the body of the email for BEC. For emails that do not contain links, we use a k-nearest neighbors [43] (KNN) classifier that weighs words using term frequency-inverse document frequency [28, 42] (TFIDF). For emails with links, we train a random forest classifier that relies on the popularity as well as the position of the link in the text. Both of the content classifiers can be retrained frequently using customer feedback.

To create the initial classifiers, we individually label and train each type of classifier: the labels of the impersonation classifier are generated using scripts we ran on the training dataset, while the content classifiers are trained over a manually labeled training dataset. Since we run the content classification only on emails that were detected as impersonation attempts, we need to manually label a much smaller subset of the training dataset. In addition, to ensure the impersonation classifier is trained successfully over the imbalanced dataset, we develop an under-sampling technique for legitimate emails using Gaussian Mixture Models, an unsupervised clustering algorithm. The classifiers are typically re-trained every few weeks. The dataset available for initial training consists of a year's worth of historical emails from 1,500 customers, with an aggregate dataset of 2 million mailboxes and 2.5 billion emails. Since training the initial classifiers, our dataset has been expanded to include tens of millions of mailboxes.

BEC-Guard uses the APIs of cloud-based email systems (e.g., Office 365 and Gmail), both to automatically learn the historical communication patterns of each organization within hours, and to quarantine emails in real-time. BEC-Guard subscribes to API calls, which automatically alert BEC-Guard whenever a new email enters the organization's mailbox. Once notified by the API call, BEC-Guard classifies the email for BEC. If the email is determined to be BEC, BEC-Guard uses the APIs to move the email from the inbox folder to a dedicated quarantine folder on the end-user's account.

To evaluate the effectiveness of our approach, we measured BEC-Guard's performance on a dataset of emails taken from several hundred organizations. Within this labeled dataset, BEC-Guard achieves a precision of 98.2% and a false positive rate of only one in 5.3 million emails. To summarize, we make the following contributions:

• First real-time system for preventing BEC that achieves high precision and low false positive rates.
• BEC-Guard's novel design relies on cloud email provider APIs both to learn the historical communication patterns of each organization, and to detect attacks in real-time.
• To cope with labeling millions of emails, we split the detection problem into two sets of classifiers run sequentially.
• We use different types of classifiers for the header and text of the email. The headers are classified using a random forest, while the text classification relies primarily on a KNN model that is not dependent on any hard-coded features, and can be easily re-trained.
• To train the impersonation classifier on an imbalanced dataset, we utilize a sampling technique for the legitimate emails using a clustering algorithm.

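The impersonation classifier's core header signal described above can be sketched in a few lines. The employee directory, names, and helper function below are hypothetical illustrations, not BEC-Guard's implementation:

```python
# Hypothetical sketch of the header signal: a display name that matches a
# known employee, combined with a sender address outside that employee's
# known addresses, suggests an impersonation attempt. The directory
# contents and the helper are invented for illustration.
known_addresses = {
    "jane smith": {"jsmith@acme.com", "jane.personal@gmail.com"},
}

def looks_like_impersonation(display_name: str, from_address: str) -> bool:
    addresses = known_addresses.get(display_name.strip().lower())
    if addresses is None:
        return False  # name matches no employee: not an impersonation attempt
    return from_address.strip().lower() not in addresses

# A lookalike domain (acrne.com vs. acme.com) paired with an employee's name:
print(looks_like_impersonation("Jane Smith", "jsmith@acrne.com"))  # True
```

In the real system this check would be one feature among several (name popularity, sender-domain characteristics) consumed by the header classifier, with the set of known addresses learned from historical emails rather than hard-coded.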
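For the content classifier on link-free emails, the combination of TFIDF weighting and a KNN model can be sketched with scikit-learn. This is a minimal sketch, not the production classifier, and the toy emails and labels are invented:

```python
# Minimal sketch of the text-classification idea: weigh words with TF-IDF
# and classify with k-nearest neighbors. Toy training data, invented here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

train_texts = [
    "are you available for an urgent task",
    "i need to send a wire transfer asap to a vendor",
    "please send the w-2 forms for all employees",
    "attached is the agenda for tomorrow's meeting",
    "lunch on thursday works for me",
    "here are the slides from today's presentation",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = BEC-like phrasing, 0 = benign

clf = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
clf.fit(train_texts, train_labels)

print(clf.predict(["can you process a wire transfer urgently"])[0])  # 1
```

A KNN-over-TFIDF model has the retraining property the paper highlights: adding newly labeled emails to the training set immediately changes the neighborhoods, with no hand-tuned features to maintain.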
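The under-sampling idea for the imbalanced dataset can likewise be sketched: cluster the far more numerous legitimate emails with a Gaussian Mixture Model and keep a capped number of representatives per cluster. The random feature vectors and parameters below are synthetic placeholders, not the paper's actual features:

```python
# Sketch of GMM-based under-sampling: cluster legitimate-email feature
# vectors, then keep at most a fixed number of representatives per cluster
# so rare attack samples are not drowned out. Synthetic data for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
legit = rng.normal(size=(1000, 8))  # stand-in for email feature vectors

n_clusters, per_cluster = 5, 20
gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(legit)
clusters = gmm.predict(legit)

# Keep at most `per_cluster` legitimate examples from each cluster.
sampled_idx = np.concatenate(
    [np.flatnonzero(clusters == c)[:per_cluster] for c in range(n_clusters)]
)
print(len(sampled_idx) <= n_clusters * per_cluster)  # True
```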
BEC Objective        Link?    Percentage
Wire transfer        No       46.9%
Click Link           Yes      40.1%
Establish Rapport    No       12.2%
Steal PII            No        0.8%

Table 1: The objective of BEC attacks as a percentage of 3,000 randomly chosen attacks. 59.9% of attacks do not involve a phishing link.

Role          Recipient %    Impersonated %
CEO                  2.2%             42.9%
CFO                 16.9%              2.2%
C-level             10.2%              4.5%
Finance/HR          16.9%              2.2%
Other               53.7%             48.1%

Table 2: The roles of recipients and impersonated employees from a sample of BEC attacks chosen from 50 random companies. C-level includes all executives that are not the CEO and CFO, and Finance/HR does not include executives.

2   Background

Business email compromise, also known as employee impersonation, CEO fraud, and whaling,¹ is a class of email attacks where an attacker impersonates an employee of the company (e.g., the CEO, a manager in HR or finance), and crafts a personalized email to a specific employee. The intent of this email is typically to trick the target into wiring money, sending sensitive information (e.g., HR or medical records), or following a phishing link that steals their credentials or downloads malware to their endpoint.

¹ We refer to this attack throughout the paper as BEC.

BEC has become one of the most damaging email-borne attacks in recent years, equaling or surpassing other types of attacks, such as spam and ransomware. Due to the severity of BEC attacks, the FBI started compiling annual reports based on US-based organizations that have reported their fraudulent wire transfers to the FBI. Based on the FBI data, between 2013 and 2018, $12 billion have been lost [13]. To put this in perspective, a Google study estimates that the total amount of ransomware payments in 2016 was only $25 million [11].

In this section, we review common examples of BEC, and provide intuition on how their unique characteristics can be exploited for supervised learning classification.

2.1   Statistics

To better understand the goals and methodology of BEC attacks, we compiled statistics for 3,000 randomly selected BEC attacks in our dataset (for more information about our dataset, see §4.2). Table 1 summarizes the objectives of the attacks. The results show that the most common BEC objective in the sampled attacks is to deceive the recipient into performing a wire transfer to a bank account owned by the attacker, while about 0.8% of the attacks ask the recipient to send the attacker personal identifiable information (PII), typically in the form of W-2 forms that contain social security numbers. About 40% of attacks ask the recipient to click on a link. 12% of attacks try to establish rapport with the target by starting a conversation with the recipient (e.g., the attacker will ask the recipient whether they are available for an urgent task). For the "rapport" emails, in the vast majority of cases, after the initial email is responded to, the attacker will ask to perform a wire transfer.

An important observation is that about 60% of BEC attacks do not involve a link: the attack is simply a plain-text email that fools the recipient into committing a wire transfer or sending sensitive information. These plain-text emails are especially difficult for existing email security systems, as well as prior academic work, to detect [20], because they are often sent from legitimate email accounts, tailored to each recipient, and do not contain any suspicious links.

We also sampled attacks from 50 random companies in our dataset, and classified the roles of the recipient of the attack, as well as the impersonated sender. Table 2 presents the results. Based on the results, the term "CEO fraud" used to describe BEC is indeed justified: about 43% of the impersonated senders were the CEO or founder. The targets of the attacks are spread much more equally across different roles. However, even for impersonated senders, the majority (about 57%) are not the CEO. Almost half of the impersonated roles and more than half of targets are not in "sensitive" positions, such as executives, finance or HR. Therefore, simply protecting employees in sensitive departments is not sufficient to protect against BEC.

2.2   Common Types of BEC

To guide the discussion, we describe the three most common examples of BEC attacks within our dataset: wire transfer, rapport, and impersonation phishing. In §6 we will discuss other attacks that are not covered by this paper. All three examples we present are real BEC attacks from within our dataset, in which the names, companies, email addresses and links have been anonymized.

Example 1: Wire transfer example

From: "Jane Smith" <jsmith@acrne.com>
To: "Joe Barnes" <jbarnes@acme.com>
Subject: Vendor Payment

Hey Joe,
Are you around? I need to send a wire transfer ASAP to a vendor.

Jane

Example 2: Rapport example

From: "Jane Smith" <jsmith@acme.com>
Reply-to: "Jane Smith"
To: "Joe Barnes" <jbarnes@acme.com>
Subject: At desk?

Joe, are you available for something urgent?

Example 3: Spoofed Name with Phishing Link

From: "Jane Smith" <greyowl1234@comcast.net>
To: "Joe Barnes" <jbarnes@acme.com>
Subject: Invoice due number 381202214

I tried to reach you by phone today but I couldn't get through. Please get back to me with the status of the invoice below.

Invoice due number 381202214:
[http://firetruck4u.net/past-due-invoice/]

In Example 1, the attacker asks to execute a wire transfer. Other similar requests include asking for W-2 forms, medical information or passwords. In the example the attacker spoofs the name of an employee, but uses an email address that does not belong to the organization's domain. Some attackers even use a domain that looks similar to the target organization's domain (e.g., instead of acme.com, the attacker would use acrne.com). Since many email clients do not display the sender email address, some recipients will be deceived even if the attacker uses an unrelated email address.

Example 2 tries to create a sense of urgency. After the recipient responds to the email, the attacker will typically ask for a wire transfer. The email has the from address of the employee, while the reply-to address will relay the response back to the attacker. Email authentication technologies such as DMARC, SPF and DKIM can help stop spoofed emails. However, the vast majority of organizations do not enforce email authentication [25], because it can be difficult to implement correctly and often causes legitimate emails to be blocked.² Therefore, our goal is to detect these attacks without relying on DMARC, SPF and DKIM.

² Many organizations have legitimate systems that send emails on their behalf, for example, marketing automation systems, which can be erroneously blocked if email authentication is not set up properly.

Example 3 uses a spoofed name, and tries to get the recipient to follow a phishing link. Such phishing links are typically not detected by existing solutions, because the link is unique to the recipient ("zero-day") and will not appear in any blacklists. In addition, attackers often compromise relatively reputable websites (e.g., small business websites) for phishing links, which are often classified as high-reputation links by email security systems. The link within the email will typically lead the recipient to a website, where they will be prompted to log in to a web service (e.g., an invoicing application) or download malware.

3   Intuition: Exploiting the Unique Attributes of Each Attack

The three examples all contain unique characteristics, which set them apart from innocent email messages. We first describe the unique attributes in the header of each example, and then discuss the attributes of the email body and how they can be used to construct the features of a machine learning classifier. We also discuss legitimate corner cases of these attributes that might fool a classifier and cause false positives.

Header attributes. In Examples 1 and 3, the attacker impersonates the name of a person, but uses a different email address than the corporate email address. Therefore, if an email contains the name of an employee, but uses an email address that is not the typical email address of that employee, there is a higher probability that the sender is an imposter.

However, there are legitimate use cases of non-corporate emails by employees. First, an employee might use a personal email address to send or forward information to themselves or other employees in the company. Ideally, a machine learning classifier should be able to learn all the email addresses that belong to a certain individual, including corporate and personal email addresses. Second, if an external sender has the same name as an internal employee, it might seem like an impersonation.

In Example 2, the attacker spoofs the legitimate email address of the sender, but the reply-to email address is different than the sender address, which is unusual (we will also discuss the case where the attacker sends a message from the legitimate address of the sender without changing the reply-to field in §6). However, such a pattern has legitimate corner cases as well. Some web services and IT systems, such as LinkedIn, Salesforce, and other support and HR applications, "legitimately impersonate" employees to send notifications, and change the reply-to field to make sure the response to the message is recorded by their system.

Other header attributes might aid in the detection of BEC attacks: for example, whether an email is sent at an abnormal time of day, from an abnormal IP address, or from a foreign country. However, many BEC attacks are designed to seem legitimate, and are sent at normal times of day and from seemingly legitimate email addresses.

Body attributes. The body of Example 1 contains two unique semantic attributes. First, it discusses sensitive information (a wire transfer). Second, it is making a special, immediate request. Similarly, the text of Example 2 is asking whether the recipient is available for an urgent request. Such an urgent request for sensitive information or availability might be legitimate in certain circumstances (for example, in an urgent communication within the finance team).

The unique attribute in the body of Example 3 is the link itself. The link is pointing to a website that does not have anything to do with the company: it does not belong to a web service the company typically uses, and it is not related to the company's domain.

Finally, all three examples contain certain textual and visual elements that are unique to the identity of the sender. For example, Example 1 contains the signature of the CEO, and all of the emails contain a particular grammar and writing

style. If any of these elements deviate from the style of a normal email from a particular sender, they can be exploited to detect an impersonation. Since in many BEC emails the attackers take great care in making the email appear legitimate, we cannot overly depend on detecting stylistic aberrations.

As shown above, each of the examples has unique anomalous attributes that can be used to categorize it as a BEC attack. However, as we will show in §7, none of these attributes on its own is sufficient to classify an email with a satisfactory false positive rate.

Leveraging historical emails. Much of prior work in detecting email-borne threats relies on detecting malicious signals in the email, such as sender and link domain reputation [2, 48] and malicious attachments [49], as well as relying on link click logs and IP logins [20]. However, as Table 1 and the examples we surveyed demonstrate, most BEC attacks do not contain any obviously malicious attachments or links. Intuitively, access to the historical emails of an organization would enable a supervised learning system to identify the common types of BEC attacks by identifying anomalies in the header and body attributes. We make the observation that popular cloud-based email providers, such as Office 365 and Gmail, enable their customers to allow third-party applications to access their account with certain permissions via public APIs. In particular, these APIs can enable third-party applications to access historical emails. This allows us to design a system that uses historical emails to identify BEC attacks.

4   Classifier and Feature Design

In this section, we describe BEC-Guard's design goals and its training dataset. We then describe the initial set of classifiers we used in BEC-Guard, and present our approach to training and labeling.

4.1   Design Goals

The goal of BEC-Guard is to detect BEC attacks in real-time, without requiring the users of the system to utilize security analysts to manually sift through suspected attacks. To meet this goal, we need to optimize two metrics: the false positive

majority of BEC attacks.

4.2   Dataset and Privacy

We developed the initial version of BEC-Guard using a dataset of corporate emails from 1,500 organizations, which are actively paying customers of Barracuda Networks. The organizations in our dataset vary widely in their type and size. The organizations include companies from different industries (healthcare, energy, finance, transportation, media, education, etc.). The size of the organizations varies from 10 mailboxes to more than 100,000. Overall, to train BEC-Guard, we labeled over 7,000 examples of BEC attacks, randomly selected from the 1,500 organizations.

To access the data, these organizations granted us permission to access the APIs of their Office 365 email environments. The APIs provide access to all historical corporate emails. This includes emails sent internally within the organization, and from all folders (inbox, sent, junk, etc.). The API also allows us to determine which domains are owned by each organization, and even whether an email was read.

Ethical and privacy considerations. BEC-Guard is part of a commercial product, and the 1,500 customers that participate in the dataset provided their legal consent to Barracuda Networks to access their historical corporate emails for the purpose of identifying BEC. Customers also have the option of revoking access to BEC-Guard at any time.

Due to the sensitivity of the dataset, it was only exposed to the five researchers who developed BEC-Guard, under strict access control policies. The research team only accessed historical emails for the purposes of labeling data to develop BEC-Guard's classifiers. Once the classifiers were developed, we permanently deleted all of the emails that are not actively used for training the classifiers. The emails used for classification are stored encrypted, and access to them is limited to the research team.

4.3   Dividing the Classification into Two Parts

The relatively rare occurrence of BEC attacks influenced several of our design choices. Our first design choice was to rule out
rate, and the precision. The false positive rate is the rate of      unsupervised learning. Unsupervised learning typically uses
false positives as a percentage of total received emails. If we      clustering algorithms (e.g., k-means [15]) to group email cat-
assume an average user receives over 100 emails a day, in            egories, such as BEC emails. However, a clustering algorithm
an organization with 10,000 employees, our goal is that it           would typically categorize many common categories (e.g.,
will be infrequent to encounter a false positive (e.g., once a       social emails, marketing emails), but since BEC is so rare, it
day for the entire organization). Therefore, our target false        results in low precision and many false positives. Therefore,
positive rate is less than one in a million. The precision is the    supervised learning algorithms are more suitable for detecting
rate of true positives (correctly detected BEC attacks) as a         BEC at a high precision. However, using supervised learning
percentage of attacks detected by the system, while the false        presents its own set of challenges.
positive rate is a percentage of false positives of all emails          In particular, BEC is an extreme case of imbalanced data.
(not just emails detected by the system). If the precision is not    When sampled uniformly, in our dataset, “legitimate” emails
high, users of BEC-Guard will lose confidence in the validity        are 50,000× more likely to appear than the BEC emails. This
of its predictions. In addition to these two metrics, we need        presents two challenges. First, in order to label a modest
to ensure high coverage, i.e., that the system catches the vast      number of BEC emails (e.g., 1,000), we need to label a corpus

USENIX Association                                                                      28th USENIX Security Symposium         1295
on the order of 50 million legitimate emails. Second, even with a large number of labeled emails, training a supervised classifier over imbalanced datasets is known to cause various problems, including biasing the classifier to prefer the larger class (i.e., legitimate emails) [24, 26, 47, 51]. To deal with this extreme case of imbalanced data, we divided the classification and labeling problem into two parts. The first classifier looks only at the metadata of the email, while the second classifier only examines the body and subject of the email.

The first classifier looks for impersonation emails. We define an impersonation as an email that is sent with the name of a person, but was not actually sent by that person. Impersonation emails include malicious BEC attacks, and they also include emails that legitimately impersonate an employee, such as internal systems that send automated emails on behalf of an employee. The impersonation classifier only analyzes the metadata of the email (i.e., sender, receiver, CC, BCC fields). The impersonation classifier detects both spoofed names (Examples 1 and 3) and spoofed emails (Example 2). The second set of classifiers, the content classifiers, only classify emails that were detected as impersonation emails, by examining the email's subject and body to look for anomalies. We use two different content classifiers that each look for different types of BEC attacks.3 The two content classifiers are: the text classifier, which relies on natural language processing to analyze the text of the email, and the link classifier, which classifies any links that might appear in the email.

    Feature                      Description
    Sender has corp domain?      Is sender address from corp domain?
    Reply-to != sender address?  Reply-to and sender addresses different?
    Num times sender and email   Number of times sender name and email
                                 address appeared together
    Num times reply-to address   Number of times reply-to address appeared
    Known reply-to service?      Is reply-to from known web service
                                 (e.g., LinkedIn)?
    Sender name popularity       How popular is sender name

Table 3: Main features used by the impersonation classifier, which looks for impersonation attempts, including spoofed names and emails.
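Most of the features in Table 3 amount to simple lookups against per-organization counters built from historical emails. As a rough illustration, the metadata features for a single email might be derived as follows (a minimal sketch with hypothetical field, argument, and structure names; not BEC-Guard's actual implementation):

```python
from collections import Counter

def impersonation_features(email, history, corp_domains,
                           known_services, name_popularity):
    """Derive Table 3-style metadata features for one email.

    `history` holds per-organization counters built from historical
    emails; all names here are hypothetical illustrations.
    """
    sender_name = email["sender_name"]
    sender_addr = email["sender_addr"].lower()
    reply_to = (email.get("reply_to") or sender_addr).lower()
    return {
        # Is the sender's address on one of the organization's own domains?
        "sender_has_corp_domain": sender_addr.split("@")[-1] in corp_domains,
        # Do the reply-to and sender addresses differ?
        "reply_to_differs": reply_to != sender_addr,
        # How many times this (name, address) pair was seen historically.
        "num_name_addr_pair": history["name_addr"][(sender_name, sender_addr)],
        # How many times this reply-to address was seen historically.
        "num_reply_to": history["reply_to"][reply_to],
        # Is the reply-to on a known web service (e.g., LinkedIn)?
        "known_reply_to_service": reply_to.split("@")[-1] in known_services,
        # Global popularity of the sender name across organizations.
        "sender_name_popularity": name_popularity.get(sender_name, 0),
    }
```

In this sketch, a spoofed-name email would typically show a zero count for the (name, address) pair despite a popular sender name.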
All of our classifiers are trained globally on the same dataset. However, to compute some of the features (e.g., the number of times the sender name and email address appeared together), we rely on statistics that are unique to each organization.

Figure 1: Number of unique email addresses observed for each user in an organization with 44,400 mailboxes. The X axis is the number of unique sender email addresses observed; the Y axis is the percentage of the organization's users (log scale).
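The division of labor described in this subsection can be sketched as a two-stage pipeline. The sketch below is illustrative only: the classifier objects and their `predict` method are assumed interfaces, not BEC-Guard's actual API.

```python
def is_bec(email, impersonation_clf, text_clf, link_clf):
    """Two-part classification: a metadata-only stage gates the
    content stage. Classifier objects are assumed (hypothetically)
    to expose a predict(x) -> bool method."""
    # Stage 1: metadata only (sender, receiver, CC, BCC fields).
    if not impersonation_clf.predict(email.headers):
        return False
    # Stage 2: content classifiers, run only on impersonation emails.
    if text_clf.predict(email.subject_and_body):
        return True
    if email.links and link_clf.predict(email.links):
        return True
    return False
```

Because the vast majority of emails are rejected by the cheap metadata stage, the content classifiers only ever see the small population of impersonation emails, which is what lets them flag otherwise noisy signals (such as suspicious links) at high precision.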
3 There is no inherent advantage in using multiple content classifiers in terms of the false positive rate or precision. We decided to use two different content classifiers because it made it easier for us to debug and maintain them separately.

4.4 Impersonation Classifier

Table 3 includes the main features used by the impersonation classifier. The features describe the number of times specific email addresses and names have appeared before in the sender and reply-to fields, as well as statistics about the sender's identity.

To demonstrate why it is helpful to maintain historical statistics for a particular organization, consider Figure 1. The figure depicts the number of email addresses that were used by each sender in an organization with 44,000 mailboxes over three months. 82% of the users had emails sent from only one address, and the rest had emails that were sent from more than one address. The reason that some of the senders used a large number of email addresses is that they were repeatedly impersonated in BEC attacks. For instance, the CEO is a common target for impersonation, and is often targeted dozens of times. However, this signal alone is not sufficient to detect impersonation. For example, some of the senders that have a large number of email addresses represent shared mailboxes (e.g., "IT" or "HR"), and are legitimate.

Hence, several of the features in the impersonation classifier rely on the historical communication patterns of the organization. This influenced BEC-Guard's architecture. In addition, we maintain a list of known web services that "legitimately" send emails with reply-to addresses that are different than the sender address (e.g., LinkedIn, Salesforce), in order to capture the response. The original list of commonly-used services was populated from a list of the domains of the major web services. We then augmented this list with additional services when we encountered them during the labeling process (in §6 we discuss possible evasion techniques related to this list of legitimate reply-to senders). The sender name popularity score is computed offline by maintaining a list of how frequently names appear across different organizations in our dataset. The more popular a name, the higher the likelihood that a name paired with an email address the employee typically does not use belongs to another person (a name collision).

Name and nickname matching. In order to detect name spoofing, the impersonation classifier needs to match the
sender name with a name of an employee. However, names can be written in various forms. For example, "Jane Smith" can be written as "Smith, Jane", "Jane J. Smith" or "Jane Jones Smith". In addition, we need to deal with special characters that might appear in names, such as ì or ä.

To address these problems, BEC-Guard normalizes names. It stores employee names as (first name, surname) tuples, and checks all the variants of the sender name to see if it matches a name of an employee with a corporate email address. These variants include stripping the middle name or initial, reversing the order of the first name and surname, and stripping suffixes. Suffixes include examples like "Jr." or when the email address is sent as part of the sender name. In addition, we match the first name against a publicly available list of nicknames [36], to catch cases, for example, when the attacker sends an email as "Bill Clinton", and the name of the employee is stored as "William Clinton".

Content classifiers. Our system uses two content classifiers: the text classifier and the link classifier. The text classifier catches attacks similar to Examples 1 and 2, and the link classifier stops attacks that are similar to Example 3. By design, the content classifiers are meant to be updated more frequently than the impersonation classifier, and should be easily retrained based on false negatives and false positives reported by users.

Text classifier. In BEC attacks similar to Examples 1 and 2, the body contains words that are indicative of a sensitive or special request, such as "wire transfer" or "urgent". Therefore, our first iteration of the text classifier was designed to look for specific words that might imply a special request or a financial or HR transaction. The features of the classifier described the position in the text of a list of sensitive words and phrases. However, over time, we noticed this approach suffered from several problems. First, a classifier that relies on hard-coded keywords can miss attacks when attackers slightly vary a specific word or phrase. Second, to successfully retrain the classifier, we had to modify the lists of keywords that it looks for, which required manually updating the keyword list on a daily basis.

Instead, we developed a text classifier that learns expressions that are indicative of BEC on its own. The first step is to pre-process the text. BEC-Guard removes information from the subject and body of the email that would not be useful for classifying the email. It removes regular expression patterns that include salutations ("Dear", "Hi"), pre-canned headers, as well as footers ("Best,") and signatures. It also removes all English stopwords, as well as any names that may appear in the email.

The second step is to compute the term frequency-inverse document frequency [42] (TFIDF) score of each word in the email. TFIDF represents how important each word is in an email, and is defined as:

    TF(w)  = (num times w appears in email) / (num words in email)
    IDF(w) = log(num emails) / (num emails with w)

where w is a given word in an email. TF(w) · IDF(w) gives a higher score to a word that appears frequently in a specific email, but which is relatively rare in the whole email corpus. The intuition is that in BEC emails, words that denote urgency or a special request, for example, would have a high TFIDF score, because they appear frequently in BEC emails but less so in legitimate emails.

When training the text classifier, we compute the TFIDF score of each word in each email of the training set. We also compute the TFIDF for pairs of words (bigrams). We store the global statistics of the IDF as a dictionary, which contains the number of emails in the training set that contain each unique phrase encountered while training the text classifier. We limit the dictionary size to the 10,000 top-ranked words (we evaluate how the size of the dictionary impacts classification precision in §7.2).

The length of each email's feature vector is equal to the number of words in the dictionary, and each element represents the TFIDF of one of the words in the dictionary. Words that do not appear in the email, or that do not appear in the dictionary, have a TFIDF of zero. The last step is to run a classifier based on these features. Table 4 presents the top 10 phrases (unigrams and bigrams) in the BEC emails in our dataset. Note that the top phrases all indicate some form of urgency.

    Top phrases in BEC emails by TFIDF
    1. got moment      6. need complete
    2. response        7. ASAP
    3. moment need     8. urgent response
    4. moment          9. urgent
    5. need            10. complete task

Table 4: The top 10 phrases of BEC emails, sorted by their TFIDF ranking from our evaluation dataset (for more information on the evaluation dataset see §7.1). The TFIDF was computed for each word in all of the BEC emails in our evaluation dataset.

Link classifier. The link classifier detects attacks similar to Example 3. In these attacks, the attacker tries to get the recipient to follow a phishing link. As we described earlier, these personalized phishing links are typically not detected by IP blacklists, and are usually unique to the recipient. In this case, since the content classifier only classifies emails that were already classified as impersonation emails, it can mark links as "suspicious", even if they would have a high false positive rate otherwise. For example, a link that points to a
small website, or one that was recently registered, combined with an impersonation attempt would have a high probability of being a BEC email.

    Feature                   Description
    Domain popularity         How popular is the link's least popular
                              domain
    URL field length          Length of least popular URL (long URLs
                              are more suspicious)
    Domain registration date  Date of domain registration of least
                              popular domain (new domains are suspicious)

Table 5: Main features used by the link classifier, which stops attacks like in Example 3.

Table 5 describes the main features used by the link classifier. The domain popularity is calculated by measuring the Alexa score of the domain. In order to deal with link shorteners or link redirections, BEC-Guard expands the URLs before computing their features for the link classifier. In addition, several of the URL characteristics require determining information about the domain (popularity and score). For the domain popularity feature, we cache a list of the most popular domains, and update it offline. To determine the domain registration date, BEC-Guard does a real-time WHOIS lookup. Note that unlike the impersonation classifier, which needs to map the distribution of email addresses per sender name, none of the features of the text and link classifiers are organization-specific. This allows us to easily retrain them based on user-reported emails.

4.5 Classifier Algorithm

The impersonation and link classifiers use random forest [5] classification. Random forests are comprised of randomly formed decision trees [40], where each tree contributes a vote, and the decision is determined by the majority of the trees. Our system uses random forests rather than individual decision trees, since we found they provide better precision, but for offline debugging and analysis we often visualize individual decision trees. We decided to use KNN for the text classifier, because it had slightly better coverage than random forests. However, we found that since the text classifier uses a very large number of features (a dictionary of 10,000 phrases), its efficacy was similar across different classifiers. In §7.2 we evaluate the performance of the different classifier algorithms.

In addition, we have explored deep-learning based techniques, such as word2vec [34] and sense2vec [46], which expand each word to a vector that represents its different meanings. We currently do not use such deep-learning techniques, because they are computationally heavy both for training and online classification.

Detecting impersonation of new employees. When a new employee joins the organization, the impersonation classifier will not have sufficient historical information about that employee, since they will not have any historical emails. As that employee receives more emails, BEC-Guard will start compiling statistics for the employee. A similar problem may also arise in organizations that periodically purge their old emails. In practice, we found that the classifier performs well after only one month of data.

4.6 Labeling

In order to label the initial training set, we made several assumptions about the BEC attack model. First, we assumed attackers impersonate employees using their name (under a set of allowed variations, as explained above). Second, we assumed the impersonation does not occur more than 100 times using the same email address. Third, we assumed the attacker uses an email address that is different than the corporate address, either as the from address or the reply-to address. We discuss other types of attacks that do not fit these assumptions, as well as how attackers may evade these assumptions, in §6. Under these constraints, we fully covered all of the possible attacks and manually labeled them. In addition, we incorporated missed attacks reported by customers (we discuss this process in §7.3).

The reason we assumed a BEC email does not impersonate an employee using the same email address more than 100 times is that BEC-Guard is designed with the assumption that the organization is already using a spam filter, which provides protection against volume-based attacks (e.g., the default spam protection of Office 365 or Gmail). Therefore, an attacker that would send an email from an unknown address more than 100 times to the same recipient would likely be blocked by the spam filter. In fact, in our entire dataset, which is only composed of post-spam-filtered emails, we have never witnessed an attacker using the same email address to impersonate an employee more than 20 times. Note that we only used this assumption for labeling the original training set, and do not use it for ongoing retraining (since retraining is based on customer-reported attacks).

Impersonation classifier. In order to label training data for the impersonation classifier, we ran queries on the headers of the raw emails to uncover all emails that might contain BEC attacks under our labeling assumptions (see above). We then labeled the results of all the queried emails as impersonation emails, and all the emails that were not found by the queries as legitimate emails.

Content classifiers. The training dataset for the content classifiers is constructed by running a trained impersonation classifier on a fresh dataset, which is then labeled manually. The initial training set we used for the content classifiers included 300,000 impersonation emails from randomly selected organizations over a year of data. Even within this training data set, we were able to significantly further limit the number of emails that needed to be manually labeled. This is due to the fact that the vast majority of these emails were obviously
not BEC attacks, because they were sent by legitimate web services that impersonate a large number of employees (e.g., a helpdesk system sending emails on behalf of the IT staff).

After constructing the initial dataset, training content classifiers is very straightforward, since we continuously collect false negative and false positive emails from users and add them into the training set. Note that we still manually review these samples before retraining as a measure of quality control, to ensure that adversaries do not "poison" our training set, as well as to make sure that users did not label emails erroneously.
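The proportional per-cluster draw depicted in Figure 2 can be sketched as follows. This is an illustrative sketch, not BEC-Guard's implementation; largest-remainder rounding is one plausible way to make the per-cluster counts sum to the target budget:

```python
import random
from collections import Counter

def proportional_undersample(cluster_ids, total_samples, seed=0):
    """Pick `total_samples` indices from `cluster_ids`, drawing from
    each cluster in proportion to its size (largest-remainder rounding)."""
    rng = random.Random(seed)
    sizes = Counter(cluster_ids)
    n = len(cluster_ids)
    # Fractional quota per cluster; floor it, then hand the remaining
    # picks to the clusters with the largest fractional parts.
    quotas = {c: total_samples * s / n for c, s in sizes.items()}
    picks = {c: int(q) for c, q in quotas.items()}
    leftover = total_samples - sum(picks.values())
    by_frac = sorted(quotas, key=lambda c: quotas[c] - picks[c], reverse=True)
    for c in by_frac[:leftover]:
        picks[c] += 1
    chosen = []
    for c, k in picks.items():
        members = [i for i, cid in enumerate(cluster_ids) if cid == c]
        chosen.extend(rng.sample(members, k))  # sample without replacement
    return sorted(chosen)
```

With 14 emails split into clusters of sizes 8, 4, and 2 and a budget of 7, this draws 4, 2, and 1 samples per cluster, matching the split shown in Figure 2's example.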
Figure 2: Depiction of running a clustering algorithm on a set of legitimate emails in a two-dimensional feature space with three clusters. After clustering the legitimate emails, we choose the number of samples from each cluster in proportion to the size of the cluster.

Sampling the dataset. Naïvely training a classifier over an imbalanced dataset typically biases the classifier to prefer the majority class. Specifically, it can result in a classifier that will simply always choose to predict the majority class, i.e., legitimate emails, and will thus achieve very high accuracy (i.e., accuracy = (tp + tn)/(tp + tn + fp + fn), where tp is true positives, tn is true negatives, fp is false positives, and fn is false negatives). Since BEC is so rare in our dataset, a classifier that always predicts that an email is legitimate would achieve a high accuracy. This problem is especially acute in the case of our impersonation classifier, which needs to do the initial filtering between legitimate and BEC emails. In the case of the content classifiers, we did not have to sample the dataset, because they deal with a much smaller training dataset.

There are various methods of dealing with imbalanced datasets, including over-sampling the minority class and under-sampling the majority class [6, 24, 27, 29, 30], as well as assigning higher costs to incorrectly predicting the minority class [9, 38].

Our second major design choice was to under-sample the majority class (the legitimate emails). We made this decision for two reasons. First, if we decided to over-sample the BEC attacks, we would need to do so by a large factor. This might overfit our classifier and bias the results based on a relatively small number of positive samples. Second, over-sampling makes training more expensive computationally.

A naïve way to under-sample would be to uniformly sample the legitimate emails. However, this results in a classifier with a low precision, because the different categories of legitimate emails are not well represented. For example, uniformly sampling emails might miss emails from web services that legitimately impersonate employees. The impersonation classifier will flag these emails as BEC attacks, because they are relatively rare in the training dataset.

The main challenge in under-sampling the majority class is how to represent the entire universe of legitimate emails with a relatively small number of samples (i.e., comparable or equal to the number of BEC email samples). To do so, we cluster the legitimate emails using an unsupervised learning algorithm, Gaussian Mixture Models (GMM). The clustering is depicted in Figure 2, which shows an example with two features and 14 legitimate email samples. In this example, the samples are split into three clusters. To choose a representative sample of legitimate emails, we randomly pick a certain number of samples from each cluster, proportional to the number of legitimate emails that belong to each cluster. If, for example, our goal is to use a total of 7 samples, we would choose 4 samples from the first cluster, 2 samples from the second cluster, and 1 sample from the third cluster, because the original number of samples in each cluster is 8, 4, and 2, respectively.

We chose a number of clusters that guarantees a minimal representation for each major "category" of legitimate email. We found that using 85 clusters was sufficient for capturing the legitimate emails in our dataset. When we tried using more than 85 clusters, the clusters beyond the 85th one would be nearly or entirely empty. Even after several iterations of retraining the impersonation classifier, we have found that 85 clusters are sufficient to represent our dataset.

5 System Design

BEC-Guard consists of two key stages: an online classification stage and an offline training stage. Offline training is conducted periodically (every few days). When a new email arrives, BEC-Guard combines the impersonation and content classifiers to determine whether the email is BEC or not. These classifiers are trained ahead of time in the offline training stage. We describe the key components of our system design in more detail below.

Traditionally, commercial email security solutions have a gateway architecture, or in other words, they sit in the data path of inbound emails and filter malicious emails. As described above, some of BEC-Guard's impersonation classifier features rely on historical statistics of internal communications. The gateway architecture imposes constraints on detecting BEC attacks for two reasons. First, a gateway typically cannot observe internal communications. Second, the gateway
ing algorithm splits the samples into clusters, each of which       usually does not have access to historical communications, so
is represented by a Normal distribution, projected onto the         it would require several months or more of observing the com-
impersonation classifier feature space. Figure 2 illustrates an     munication patterns before the system would be able to detect

USENIX Association                                                                              28th USENIX Security Symposium                 1299
incoming BEC attacks. Fortunately, cloud-based email services, such as Office 365 and Gmail, provide APIs that enable access to historical communications, as well as to monitor and move emails in real-time. BEC-Guard leverages these APIs both to gain access to historical communications and to perform near real-time BEC detection. Figure 3 compares the gateway architecture with BEC-Guard’s API-based architecture. We describe BEC-Guard’s design and implementation using the Office 365 APIs.

[Figure 3: Comparison between the architecture of traditional email security systems, which sit as a gateway that filters emails before they arrive in the mail system, and BEC-Guard’s architecture, which relies on APIs for learning the historical communication patterns of each organization, and detecting attacks in real-time.]

Warmup phase. We name the process of analyzing each organization’s historical communications the warmup phase. In order to start the warmup, the organization enables BEC-Guard to get access to its Office 365 account with an authentication token, using OAuth with an Office 365 administrator account. This allows BEC-Guard to access the APIs for all the users associated with the account. Once authenticated, BEC-Guard starts collecting the statistics necessary for the impersonation classifier (e.g., the number of times a certain user sent an email from a certain email address). The statistics collected by BEC-Guard go back one year. We found that the classifier performs well with as little as one month of historical data.

Online classification. After the warmup phase, BEC-Guard is ready to detect incoming BEC attacks in real-time. To do so, BEC-Guard waits for a webhook API call from any of the users in the organization’s Office 365 account. The webhook API calls BEC-Guard anytime there is new activity for a specific user. When the webhook is triggered, BEC-Guard checks if there is a new received email. If so, BEC-Guard retrieves the email and classifies it, first using the impersonation classifier, backed by a database that contains the historical communication statistics unique to each organization. Then, only if the email was classified as an impersonation email, BEC-Guard classifies it using the content classifiers.

   If at least one of the content classifiers classifies the email as a BEC attack, BEC-Guard quarantines the email. This is performed by removing the email from the folder where it was received by the user (typically the inbox folder), and moving it into a designated quarantine folder in the end user’s mailbox. Since the email is quarantined on the server side, when the user’s email clients synchronize the email, it will also get quarantined on the user’s email clients. In addition, the vast majority of emails get quarantined by BEC-Guard before they are synchronized to the user’s email client.

6     Evasion

In this section we discuss attacks that are currently not stopped by BEC-Guard, as well as evasion techniques that can be used by attackers to bypass BEC-Guard, and how they can be addressed.

   BEC-Guard is a live service in production, and has evolved rapidly since it was first launched in 2017. We have deployed additional classifiers to augment the ones described in this paper in response to some of the evasion techniques presented below, and the existing classifiers have been retrained multiple times. Another benefit of the API-based architecture is that if we find that some attacks were missed by an evasion, we can go back in time, find them, and update the system accordingly. The email threat landscape is rapidly changing, and while it is important that the detectors maintain high precision, it is equally important that the security system can be easily adapted and retrained.

6.1     Stopping Other Attacks

BEC-Guard focuses on stopping BEC attacks in which an external attacker impersonates an employee. However, there are other types of BEC that are not covered by BEC-Guard.

Account takeover. When attackers steal the credentials of an employee, they can log in remotely to send BEC emails to other employees. We term this use case “account takeover”. There are several approaches to detecting account takeover, including monitoring internal emails for anomalies (e.g., an employee suddenly sending many emails to other employees they typically do not communicate with), monitoring suspicious IP logins, and monitoring suspicious inbox rule changes (e.g., an employee suddenly creates a rule to delete outbound emails) [18–20]. This scenario is not the focus of BEC-Guard, but is covered by our commercial product.

Impersonating both sender name and email without changing the reply-to address. It is possible that external attackers could send emails that impersonate both the sender’s name and email address, without using a different reply-to address. We have not observed such attacks in our dataset, but they are possible, especially in the case where the attacker asks the recipient to follow a link to steal their credentials. Similar to account takeover, such attacks can be detected by looking for abnormal email patterns. Another possible approach, used by Gascon et al., is to look for anomalies in the actual MIME header [14].

Impersonation of external people. BEC-Guard’s impersonation classifier currently relies on having access to the historical inbound email of employees. In order to detect impersonation of external people that frequently communicate
with the organization, BEC-Guard can incorporate emails that are sent from external people to the company.

Text classification in any language. BEC-Guard is currently optimized to catch BEC in languages that appear frequently in our dataset. Both the impersonation classifier and the link classifier are not language-dependent, but the text classifier relies on a TFIDF dictionary that depends on the language of the labeled dataset. There are a few possible ways to make BEC-Guard’s text classifier completely language agnostic. One is to deliberately collect sufficient samples in a variety of languages (either based on user reports or by generating them synthetically), and to label and train on those emails. Another, potentially more scalable, approach is to translate the labeled emails (e.g., using Google Translate or a similar tool).

Generic sender names. BEC-Guard explicitly tries to detect impersonations of employee names. However, attackers may impersonate more generic names, such as “HR team” or “IT”. This attack is beyond the scope of this paper, but we address it using an approach similar to BEC-Guard’s: we combine our content classifiers with a new impersonation classifier, which looks for sender names that commonly occur across different organizations, but are sent from a non-corporate email address or have a different reply-to address.

Brand impersonation. Similar to the “generic sender” attack, attackers often impersonate popular online services (e.g., Google Drive or Docusign). These types of attacks are out of scope for this paper, but we detect them using a similar methodology of combining a content classifier with an impersonation classifier that looks for an anomalous sender (e.g., the sender name contains “Docusign”, but the sender domain has no relation to Docusign).

6.2     Evading detection

Beyond the BEC attacks that BEC-Guard is not designed to detect (as noted above), there are several other ways attackers can try to evade BEC-Guard. We discuss these below, along with how we have adapted BEC-Guard to address them.

Legitimizing the sender email address. Any system that uses signals based on anomaly detection is vulnerable to attackers that invest extra effort in not appearing “anomalous”. For example, when labeling our dataset, we assume that the impersonated employee was not impersonated by the same sender email address more than 100 times. While this threshold is not hard coded into the impersonation classifier, it was a threshold we used to filter emails for the initial training set, and therefore may bias the classifier. Note that we have never observed an attacker impersonating an employee with the same email more than 20 times.

   We believe this assumption is valid, since BEC-Guard assumes that the organization is already using a volume-based security filter (e.g., the default spam protection of O365 or Gmail, or another spam filter), which would pick up a “volumetric” attack. Typically, these systems would flag an email that was sent at once from an unknown address to more than 100 employees as spam.

   However, a sophisticated attacker may try to bypass these filters by sending a large number of legitimate emails from the impersonated email address to a particular organization, and only after sending hundreds of legitimate emails would they send a BEC email using that address. Of course, the downside of this approach is that it requires more investment from the attacker, and increases the economic cost of executing a successful BEC campaign. One way to overcome such an attack is to add artificial samples with higher thresholds to the impersonation classifier’s training set, in order to remove the bias. Of course, this may reduce the overall precision of BEC-Guard.

Using infrequent synonyms. Another evasion technique is to send emails containing text that is different from, or has a lower TFIDF than, the labeled emails used to train our text classifier. For example, the word “bank” has a higher TFIDF than the word “fund”. As mentioned before, one way to overcome these types of attacks is to cover synonyms using a technique such as word2vec [34].

Manipulating fonts. Attackers have employed various font manipulations to avoid text-based detectors. For example, one technique is to use fonts with a size of zero [35], which are not displayed to the end user, but can be used to obfuscate the impersonation or meaning of the text. Another technique is to use non-Latin letters, such as letters in Cyrillic, which appear similar to Latin letters to the end user, but are not interpreted as Latin by the text-based detector [16].

   In order to deal with these types of techniques, we always normalize any text before feeding it to BEC-Guard’s classifiers. For example, we ignore any text with a font size of zero. If we encounter Cyrillic or Greek in conjunction with Latin text, we normalize the non-Latin letters to the Latin letters closest in appearance. While these techniques are heuristic-based, they have proved effective in stopping the common forms of font-based evasion.

Hiding text in an image. Instead of using text within the email, attackers can hide the text within an embedded image. We have observed this use case very rarely in practice, most likely because these attacks are less effective: many email clients do not display images by default, and even when they do, the email may seem odd to the recipient. Therefore, we currently do not address this use case, but a straightforward way to address it would be to use OCR to extract the text within the image.

Using a legitimate reply-to address. As mentioned in §4.4, BEC-Guard relies on a list of legitimate reply-to domains to reduce false positives. This list could potentially be exploited. For example, attackers could craft a LinkedIn or Salesforce profile with the same name as the employee being impersonated, and send an impersonation email from that service.
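As a concrete illustration of the cluster-proportional under-sampling described in the training discussion earlier: assuming cluster labels have already been produced (e.g., by fitting a Gaussian Mixture Model over the impersonation classifier feature space), the proportional sampling step might look like the sketch below. The function name and `budget` parameter are our own illustrative choices, not taken from the paper.

```python
import random
from collections import defaultdict

def proportional_undersample(labels, budget, seed=0):
    """Given one cluster label per legitimate email, pick roughly
    'budget' samples so every cluster stays represented in proportion
    to its size, with at least one sample per non-empty cluster."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for i, c in enumerate(labels):
        clusters[c].append(i)
    chosen = []
    for members in clusters.values():
        # proportional allocation, never dropping a cluster entirely
        k = max(1, round(budget * len(members) / len(labels)))
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)
```

For instance, with clusters of sizes 8, 4, and 2 and a budget of 7, this selects 4, 2, and 1 samples respectively, so even the rarest category of legitimate email survives the under-sampling.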