TOWARD ACCURATE INFERENCE OF WEB ACTIVITIES FROM PASSIVE DNS DATA - IEEE IWQOS 2018

Page created by Jimmie Turner

IT & Technique

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

TOWARD ACCURATE INFERENCE OF WEB ACTIVITIES FROM PASSIVE DNS DATA - IEEE IWQOS 2018

Toward Accurate Inference of Web Activities
from Passive DNS Data
Jingxiu Su∗† , Zhenyu Li † , Stephane Grumbach‡ , Kave Salamatian§ , Chunjing Han† , Gaogang Xie†
∗ University of Chinese Academy of Sciences, China
† Institute
of Computing Technology, China
‡ Inria,France
§ LISTIC,University of Savoie,France

Abstract—DNS is a critical component of Internet architecture. filtering. Network address translation (NAT) is a method of
Almost all applications, in particular web based applications remapping one IP address space into another by modifying
that constitute the large majority of current Internet traffic, network address information in Internet Protocol (IP) datagram
leverage heavily on DNS. This makes DNS based measurements
a promising tool for understanding global properties of Internet packet headers while they are in transit across a traffic routing
traffic, e.g., sites audience, traffic matrix. However, using passive device [2]. The technique has become a popular and essential
DNS traces from local DNS servers is challenging because of tool in conserving global address space allocations in face of
DNS caching and NATs. The goal of this paper is twofold. First, IPv4 address exhaustion by sharing one Internet-routable IP
we show how to correct the bias due to DNS cache and the wide address of a NAT gateway for an entire private network.
use of NATs, to extract meaningful traffic information from DNS
traces. The techniques are then used and validated over a large In this paper, we discuss the effect of DNS caches and
dataset (1011 records) containing two days of full DNS access NAT of network wide traffic measurement obtained over DNS
from a major ISP providing both mobile and landline ADSL record from LDNS servers, and present methodological ways
in China. Second, we focus on the tracking activity and show of alleviating their effect to get around the optimized behavior
that although most sites accessed from China belong to Chinese of the DNS system to recover a realistic view of the real traffic.
corporations, most trackers belong to US ones. Mobile and ADSL
platforms are alike. The solutions to alleviate such issues can have extensive use
to other platforms. Then we use a large LDNS server log
I. I NTRODUCTION trace that consists of about 150 billion logs of a major mobile
Domain Name System (DNS) plays a crucial role in the and landline ADSL ISP in China, to validate our proposed
daily operation of Internet [1]. Almost all network services methodologies. We finally look into using this DNS trace
depend on DNS and leverage on its infrastructure. Since DNS for measuring the geographical distribution of web tracking
is already distributed, and can enable long-term observation of activities.
traffic parameters, looking at the local DNS servers of an ISP Web tracking technologies are used to collect and correlate
provides a simple monitoring vantage point to the ISP’s wide user web browsing behavior [3]. Such information is of
traffic that is easy to access and does not need to instrument interest to various areas, more generally, data gathered on
a large number of distributed measurement point. This makes Internet users browsing behavior represent a source of strategic
DNS a prominent source of information about global network information that have both economic and political value. Using
usages. the DNS records will give a direct and comprehensive view
However, using DNS measurement to infer traffic char- of the web tracking activity. We observe that while Chinese
acteristics is challenging. Extending observations made over web activity is strongly concentrated inside China with more
DNS traffic to overall traffic have two major methodological than 75% of web sessions going to Chinese web services, yet
challenges. The major issue with using DNS trace from local around 87% of tracking activity is ensured by US trackers.
DNS (LDNS) servers is the effect of DNS caching that This share is almost alike in Chinese and US sites.
filters out the DNS requests that are already cached in local The rest of the paper is organized as follows. We define the
and intermediate DNS caches. Caching is one of the oldest challenges and detail the methodology of accurate the data
techniques for improving performance in computer systems. in Section III. We investigate the inequality tracking activies
By storing information locally, caches typically enhance per- on geography in Section IV. We survey the related works in
formance by reducing access latency to data source and by Section V. We conclude the paper in Section VI.
reducing bandwidth requirements at the data source. Caching
has been widely explored by many client-server applications II. PASSIVE DNS DATA
(e.g., distributed file systems, web caching and DNS), and they The used dataset consists of all the DNS requests and their
derive significant benefits from doing so. resolution information received during two days by the DNS
Another issue is the use of Network Address Translation servers of a major mobile and ADSL (Asymmetric Digital
(NAT) middleboxes, which also has an impact on DNS cache Subscriber Line) ISP in China. The data were gathered from

978-1-5386-2542-2/18/$31.00 2018 IEEE

DNS servers located in different Chinese provinces and mu- 100
nicipalities covering the whole country. The dataset contains
about 150 billion DNS records in total, each having five fields:

CCDF
a timestamp (at second level precision), the anonymized source 10-2
IP sending the request, the domain name queried, the list of
Average number
resolved IP addresses and a field indicating if the address Variance number
resolution has been successful. It is noteworthy that even if Maximum number
10-4
the timescale of timestamp is 1 second, the trace are logs so 10-4 10-2 100 102 104
they are sorted, i.e. two DNS records timestamped at the same # of DNS requests per second

time, are guaranteed to happen with the same order as they Fig. 1: CCDF of observed DNS requests per second
are observed in the DNS log.
Identification of tracker domains: Because we will examine
the web tracking behavior using the DNS trace, we have several real users are hidden behind them. We therefore need
therefore to identify if a resource is relative to a tracker. For to detect these NATed addresses for not using them in analysis
this purpose, we use an approach based on a blacklist of of user based statistics like the audience of online resource or
filtering rules used for checking suspicious URLs by exact or co-occurence. Nevertheless, some other aggregated statistics
wildcards matching. This approach is commonly implemented can benefit from these NATed IPs even if there are several
by largely used Ad suppressing utilities [4]. We combined users behind them.
blacklists obtained from Adblock Plus [5], Ghostery [6] and We show in Table I the cross-correlation between the above
Disconnect [7]. These blacklists are partly overlapping. We defined three values. The average and the variance values
used in particular the specific China targeted blacklist from are strongly correlated, while the correlation with maximum
Adblock Plus [8] to ensure identification of all Chinese track- value is milder. Based on this observation, we use the average
ers. All these blacklists are used in practice and maintained number along with the maximum number of DNS requests
up to date by their providers. per second in order to classify IP addresses into NATed and
non-NATed ones.
III. PASSIVE DNS A NALYSIS M ETHODOLOGY
In this section, we discuss the impact of NAT and DNS Ave Var Max
caching on passive DNS analysis, and propose methodologies Ave 1.0000 0.9857 0.8777
to address the problems to obtain accurate analysis results. It Var 0.9857 1.0000 0.9282
is worth noting that although the context is in passive DNS Max 0.8777 0.9282 1.0000
traces from LDNS servers, our methodologies can be applied
to other relevant problems, like passive HTTP traces from TABLE I: Cross-correlation values between average (Ave),
ISP’s gateway that have the same issues as the passive DNS variance (Var) and maximum (Max) values of number of DNS
traces. requests per second

A. NAT Impact NAT detection is a well studied area and several active
It is important for the DNS analysis to ensure that at each and passive methods have been proposed to detect address
instant of time only a unique user is using a given IP address. translation boxes [10, 11]. However, most of these techniques
However, NAT middleboxes enable sharing a single routable leverage on header or payload contents. In our case, we
IP address between different users [9], that goes against this attempt to detect NATed IP addresses using only DNS queries.
need. In this section, we will present the methodology to Moreover, our goal is not per se to detect NAT boxes, but to
indentify the unique users for further analysis. detect cases where more than one users are using an IP address
From the DNS data, we observed for each source IP address simultaneously. In order to achieve this, we leverage on the
an unexpected high value of 4.01 DNS requests per second observation that NATed IP address generates a higher rate of
on average. In order to investigate more this value we derived DNS queries than non-NATed ones as they have several users
three values for each source IP address: the average number, behind them. It is noteworthy that a single web session might
the variance number, and the maximum number of DNS contain a large number of URLs resulting into a large influx
requests per second. We show in Figure 1 the Complemen- of DNS requests, and a large maximum number of requests.
tary Cumulative Distribution Function (CCDF) of these three This means that in practice, the DNS query flow comes from
values. a mixture of NATed and non-NATed nodes and we have to
As we can see from Figure 1, the average number of DNS classify incoming requests into these two classes.
request per second spans almost 6 orders of magnitude from Our dataset contains mobile and ADSL users. The practice
5 × 10−4 to 181, exhibiting a very heavy tail. This shows of mobile operator is to use NAT. This means that mobile users
clearly that some IP addresses are largely contributing to using IPv4 addresses are very likely behind NAT. The ISP who
the average. We can explain this by the fact that some IP provided the DNS dataset also provided us with IP address
addresses, in particular mobile IP addresses, are NATed and ranges used for ADSL and mobile networks. While mobile

users are very likely behind a NAT, ADSL users might also            user-session might consist of several TCP connections, opened
decide to share their ADSL connectivity and become NATed.            by the same host toward possibly different servers.
Because of the difference between these two category of users           There are relatively rich literatures on web user-session
we decided to analyze each one separately and compare the            identification [14–16]. Most of the existing works rely on
outcome.                                                             exhaustive packet or connection level traces and assume that
   NATed and non-NATed are separated through a mixture               only a single user is behind an IP address. In this paper, we
based classification. We fit the joint distribution of the average   propose a new methodology to detect user-sessions leveraging
and maximum values of DNS requests rate to a mixture of              only on DNS traces.
correlated General Gamma (GG) Distributions [12]. This GG               We first order the DNS dataset by source IP addresses and
distribution kernel is used in place of the classical Gaussian       generate for each observed IP address in the dataset a temporal
component because of the positivity constraint and its heavier       sequence of DNS requests. In the second step, we split each IP
tails. We further use a maximum Akaike information crite-            source address sequence’s into an alternation of user-sessions
rion [13] in order to select the number of mixture components.       and inactivity periods. A simple approach proposed in [14, 15]
This criterion gives that 2 components are enough to classify        used a chosen threshold θ. The splitting is done by aggregating
the observations. After classification we got two classes in         into a user-session any DNS request arriving before time
each category.                                                       threshold θ. This approach has been shown to work well if
                                                                     the threshold value is correctly chosen.
              Category     Class   Average   Maximum                    The choice of θ is generated through a mixture of distribu-
            Mobile users    1       0.012      2.45                  tion of the DNS request gap, i.e. the arrival time between DNS
            Mobile users    2       5.03      29.21                  requests. Because of the integer valued nature of the data, we
            ADSL users      1       0.046      3.83                  use a mixture of Poisson components to model the probability
                                                                     distribution of DNS request gaps. The Aikeke information
            ADSL users      2       0.78      18.60
                                                                     criteria shows that 3 classes are enough to model the empirical
TABLE II: Mixture Classification results: Average and Max-           distribution. Figure 2 shows the CCDF of the DNS request
imum number of DNS requests per second for each class                gaps measured over all DNS request sequences. We got three
obtained from the mixture of correlated GG distribution              classes after measuring: one class with an average of 0.05
                                                                     second, one with an average of 2.9 seconds, and the last with
   We present in Table II the result of the classification for       an average of 45 seconds. The first class contains only DNS
the two category of IP addresses. The table shows for each           gaps that are 0 (not shown in the Figure), the second class
class, the average and maximum values of the number of DNS           (shown as INSIDE user-session gaps in Figure 2) contains all
requests per second. In both IP address categories, there is a       DNS request gaps less than 5 seconds and the last class (shown
strong differentiation between two classes: one class generated      as BEYOND user-session gaps) contains the remaining DNS
very low number of DNS requests per second on average                request gaps. Based on the above observation, we have chosen
(0.012 for mobile users and 0.046 for ADSL users) and the            the threshold equals to 5 seconds, i.e., all DNS requests that
second class generated a much larger number of DNS requests          are distant by less than 5 seconds are assumed to belong to
per second. The first class is more compatible with a single         the same user-session.
user usage than the second class.
   Based on the above observation, we decided to use IP                        100
addresses detected in class 1 for single user analysis and
to assume that IP addresses detected in class 2 are NATed.                10-2
                                                                        CCDF

To see the filter impact on the original DNS dataset, we
took 10 million records which include 5,581,899 distinct IP
                                                                          10-4
addresses, after running the classification method, 29.2% of                         BEYOND user-session gaps
                                                                                     INSIDE user-session gaps
multi-users IP addresses have been pruned, remains 3,951,640                         Overall gaps
                                                                          10-6
single user IP addresses. We use only the logs from non-NATed                100                    101                102      103
                                                                                                          DNS request gaps
IP addresses in the following analysis.
                                                                                     Fig. 2: Distribution of DNS request gaps
B. Extraction of User-sessions
   Using DNS logs from non-NATed IP addresses, we further               Generally, a user-session has the following structure: an
deal with DNS caching impact. To this end, we first extract          initial site request which is made to a service/content provider,
user sessions.                                                       followed by a sequence of forthcoming requests to other
   The typical user behavior on Internet consists in activity        servers. As we are interested in trackers, we will consider only
period, where the user browses the Internet, alternating with        connections made to servers detected as tracker/advertiser by
silent period over which the user is not active. The active          the blacklists we describe in Section II. A user-session begins
period will be coined through the paper as user-session. A           with an access to a website and it is closed when the next DNS

request arrives later than the threshold θ. In between, any DNS user-session, while after rescaling this number increases to
request arriving is aggregated into the same user-session. 200. We investigated the source of these cases, and observed
In summary, a user-session relative to a user k (more that they are all linked to webpages or Internet services that
precisely relative to a non-NATed IP address with a single user relies on several others, so that a contact to one of them results
behind it) that begins at time t is a set Sik (t) containing the in a cascade of DNS accesses. News aggregators or Internet
canonical name of the first non-tracker server contacted during portals are example of such situations. In these cases, trackers
the user-session followed by a sequence of tracker domain in each contacted server add to each other resulting into a
names contacted during the same user-session. The sets Sik (t) large number of trackers in the session.
are the main raw data we will use in the forthcoming.
C. DNS Cache Impact
Using the above user-session extraction methodology, we
deal with the issue of DNS cache. A network client accessing
an online resource with a canonical name as to translate this
name to an IP address. However, before sending the DNS
request to the DNS server of its operator, the client first checks
if this information is available in its local DNS cache.
A requested DNS record not in the cache initiates a miss that
results in querying the DNS record from higher level of the
DNS cache hierarchy. When the record is retrieved, it is cached Fig. 3: CCDF of the number of trackers observed in a user-
during a time defined by the TTL ( time-to-live) attached to the session.
DNS request. A request arriving during the caching gets a hit.
In the DNS server at the ISP level, we will only observe the 3) Accuracy verification with Lightbeam: The last evalua-
first request and not the subsequent requests inside the caching tion we made is to validate the accuracy of the trackers in each
duration. This means that local cache filters out a relatively final user-session after applying the above two methodologies
large proportion of DNS requests that never reach the ISP to solve the NAT and DNS cache impact. We compare our
caches that we are monitoring, i.e., any observation using DNS result with Lightbeam, which is an add-on for browsers that
traces is a partial sampling of real Internet activity [17]. displays third party tracking cookies placed on the user’s
We propose a rescaling methodology to solve the above computer while visiting various websites. Lightbeam displays
cache issue based on user-session structure. a graph of the interactions and connections of sites visited
1) Rescaling the DNS observations: We have described and the tracks to which they provide information [18]. We
how to split the DNS requests into sets of user sessions revisit a sample of 809 sites (these sites are extract from DNS
Sik (t) in the previous subsection. However, different users, dataset), and use Lightbeam to monitor the tracking activities.
at different timestamps, and different locations that access the The comparison results are remarkable, an average accuracy
same destination service/content provider, will not observe all rate over the 809 sites is 91.7%.
the same DNS cache state. In other words, different user-
sessions going to the same content/service might contain IV. G EO - POPULARITY OF T RACKING ACTIVITIES
different but still incomplete list of contacted trackers. We can The previous analysis provides strong basis for using the
leverage these differences and merge the different sets Sik (t) DNS traces to analyze the audience of different web activities.
relative to the same initial site site1. This will results into This section takes tracking activities as an example to show the
an inflated set of trackers that will contain all that have been use of passive DNS traces after applying our methodologies to
missed through the DNS caching. We can rescale the DNS infer web activities. Specially, we consider the trackers from
observations, by replacing this merged set in place of any set the point of view of the geography.
Sik (t) beginning with the site site1. For simplicity but without lack of relevance, we assign a
2) DNS Rescaling results: We show here the results of do- tracker service to the country that is registered in the WHOis
ing the user-session splitting and rescaling the DNS requests. database[19] along with the corresponding canonical domain
As we will examine tracking activities, we focus here on name. The WHOis database contains contact information for
the effect of DNS rescaling methodology on trackers. After administrative and technical contact points along with the
doing the user-session splitting on the dataset, we ended up country. We therefore assign to each tracker the country
with 160 million user sessions containing at least one tracker reported in the WHOis database to the corresponding DNS
with an average of 1.86 DNS requests per user-session. After canonical domain name entry. The traffic resulting from the
rescaling, this number increases to 2.73. We show in Figure DNS traces we have analyzed can thus be attached to desti-
3 the distribution of the number of trackers observed in a nation countries. More precisely, the traffic load of a country
user-session both before and after the DNS rescaling. It can is measured as the number of DNS requests that resolve to an
be seen the DNS rescaling fattens strongly the tail of the IP address in this country, or more precisely to an IP address
distribution. Before rescaling we had up to 40 trackers in a that belongs to a corporation based in this country.

A. Geographical inequality of tracking activities

(a) Mobile (b) ADSL

Fig. 6: Overall mobile traffic share among mobiles and ADSL.

Fig. 4: Traffic share between countries from China.

We show the global view of geographic distribution by (a) Mobile (b) ADSL
measuing on the whole dataset. As can be seen in Figure 4,
Fig. 7: Advertisers/Trackers DNS traffic share among mobile
China is the destination of more than 73% percent of the traffic
and ADSL.
from China; while the US account for 24%. All other countries
account for less than 3% of the whole traffic.
When we consider instead the tracker traffic, we observe a ecosystem.
rather different trend. We show in Figure 5 the distribution of
the traffic to tracker services worldwide. It can be observed B. Mobile/ADSL comparision
that the US is attracting more than 87% of all the tracker traffic In Figure 6(a) and Figure 6(b), we show the share of
from China. The second rank is occupied by the GB with traffic we have observed over all DNS traces between different
7.2% of tracking traffic, while China itself only occupies the countries, on mobile and ADSL platforms. It can be seen that
third rank with 3.2% of the tracker traffic on its own territory. China is the destination of 76.32% of mobile and 67.24% of
These results confirm trends observed previously on a different ADSL based traffic; US accounts for 22.22% of mobile traffic
dataset in [20], where it was shown that China dominates its and 30.96% of ADSL. All other countries account for less
local Web with more than 80% of local sites, while these sites than 2% of traffic. China has a relatively higher occupancy on
contain a majority of US trackers. mobile over ADSL when compared to the second dominator
US.
We continue look at the tracker traffic and observe whether
it follows the above described trend. We show in Figure
7(a) and Figure 7(b) the geographical distribution of overall
traffic and tracking traffic respectively. We can see that US is
attracting more than 84% of both mobile and ADSL tracker
traffic. Interestingly the second rank is occupied by GB with
8.95 to 10.88% of tracking traffic and China only appears
at the third rank with 1.68% of mobile traffic and 4.37% of
ADSL traffic. Generally, the traffic share follows the similar
trend on mobile and ADSL platforms.
V. R ELATED WORKS
As DNS is perceived as a critical infrastructure [21], studies
have focused on DNS for the purpose of identifying malicious
activities [22, 23], managing large DNS infrastructures [24],
Fig. 5: Traffic share of tracking services from China. understanding how DNS server selection and caching works
in reality [25, 26], and modeling its infrastructure in order to
It should be noted that this surprising situation holds despite predict how DNS traffic will change under specific conditions
the fact that China has a rich advertisement and e-commerce [27]. In contrast, there has been little work on how to solve

the DNS inherent issues as explained in this paper, our work      [12] P. S. Bithas, N. C. Sagias, et al., “Distributions involving
is to fill this gap.                                                   correlated generalized gamma variables,” in Proc. Int.
   Another aspect related to our work is about online web              Conf. on Applied Stochastic Models and Data Analysis,
tracking. Web tracking ecosystem has been well analyzed                vol. 12, 2007.
[28–31]. However the previous studies either focused on           [13] Wikipedia, “Akaike information criterion.” https://en.
the analysis of specific types of third-party trackers or the          wikipedia.org/wiki/Akaike information criterion, 2017.
classification over world without inspect deep into China. Our    [14] V. Paxson and S. Floyd, “Wide area traffic: The failure
study examined the geographical distribution of web tracking           of poisson modeling,” IEEE/ACM Trans. Netw., vol. 3,
services that serve Chinese netizens. .                                pp. 226–244, June 1995.
                                                                  [15] C. Nuzman, I. Saniee, W. Sweldens, et al., “A compound
                     VI. C ONCLUSIONS                                  model for tcp connection arrivals for lan and wan appli-
   In this paper, we have developed new methodologies to               cations,” Comput. Netw., vol. 40, pp. 319–337, Oct. 2002.
accurately analyze the DNS log data, and get around the           [16] A. Bianco, G. Mardente, M. Mellia, M. Munafo, and
optimized behavior of the DNS system to recover a realistic            L. Muscariello, “Web user session characterization via
view of the real traffic, and recollect user-sessions. We used         clustering techniques,”
a large passive DNS dataset to illustrate the impact of DNS       [17] N. C. Fofack and S. Alouf, “Modeling modern dns
cache and NAT, and validate our methodologies. Specially,              caches,” ValueTools ’13, pp. 184–193, ICST, 2013.
we applied our methodologies on the dataset to infer web          [18] Lightbeam, “Shine a light on who is watching you.”
tracking activities through the DNS dataset. We observe that           https://www.mozilla.org/en-US/lightbeam/.
although the Chinese Internet is mostly organized around on-      [19] “Whois.net.” https://www.whois.net.
line services offered by Chinese corporations, the underlying     [20] C. Castelluccia, S. Grumbach, and L. Olejnik, “Data
trackers are mostly dominated by US corporations. Similar              harvesting 2.0: from the visible to the invisible web,”
observations are seen on mobile and ADSL platforms.                    Collection of Czechoslovak Chemical Communications,
                                                                       vol. 24, no. 3, pp. 760–765, 2013.
                 VII. ACKNOWLEDGEMENT                             [21] L. Deri, S. Mainardi, M. Martinelli, and E. Gregori,
   This work is supported in part by National Key R&D                  “Graph theoretical models of dns traffic,” in IWCMC,
Program of China (Grant No. 2016YFE0133000): EU-China                  2013, pp. 1162–1167, IEEE, 2013.
study on IoT and 5G(EXCITING), National Natural Science           [22] M. Antonakakis, R. Perdisci, et al., “Building a dynamic
Foundation of China (Grant No. 61572475 and 61502460),                 reputation system for dns.,” in USENIX security sympo-
and the Youth Innovation Promotion Association CAS.                    sium, pp. 273–290, 2010.
                                                                  [23] A. Berger and E. Natale, “Assessing the real-world dy-
                        R EFERENCES                                    namics of dns,” Traffic Monitoring and Analysis, pp. 1–
 [1] C. Liu and P. Albitz, DNS and Bind. ” O’Reilly Media,             14, 2012.
     Inc.”, 2006.                                                 [24] C.-S. Chen et al., “A unifying framework for intelli-
 [2] J. Technologies, Network Protocols Handbook. Javvin               gent dns management,” International journal of human-
     Technologies Inc., 2005.                                          computer studies, vol. 58, no. 4, pp. 415–445, 2003.
 [3] N. Schmucker, “Web tracking,” in SNET2 Seminar               [25] D. Wessels, M. Fomenkov, et al., “Measurements and
     Paper-Summer Term, 2011.                                          laboratory simulations of the upper dns hierarchy,” in
 [4] A. Yamada, H. Masanori, and Y. Miyake, “Web track-                PAM’04, pp. 147–157, Springer.
     ing site detection based on temporal link analysis,” in      [26] J. S. Otto, M. A. Sánchez, et al., “Content delivery
     WAINA’2010, pp. 626–631, IEEE, 2010.                              and the natural evolution of dns: remote dns trends,
 [5] W. Palant, “Adblock plus: Save your time and traffic.”            performance issues and alternative solutions,”
     https://easylist.adblockplus.org/en/, 2017.                  [27] Y. Koc, A. Jamakovic, and B. Gijsen, “A global reference
 [6] R. Bilton, “Ghostery: A web tracking blocker that actu-           model of the dns,” Sponsoring Institutions, p. 7, 2011.
     ally helps the ad industry,” vol. 31, 2012.                  [28] B. Krishnamurthy and C. Wills., “Privacy diffusion
 [7] Disconnect. https://disconnect.me/lists/malvertising.             on the web: A longitudinal perspective,” WWW ’09,
 [8] “Adblock plus easylist china.” https://easylist-downloads.        pp. 541–550, 2009.
     adblockplus.org/easylistchina+easylist.txt, 2017.            [29] B. Krishnamurthy, K. Naryshkin, and C. Wills, “Privacy
 [9] F. Audet and C. Jennings, “ Network Address Translation           leakage vs. protection measures: the growing discon-
     (NAT) Behavioral Requirements for Unicast UDP,” RFC               nect,” in Proceedings of the Web, vol. 2, pp. 1–10, 2011.
     4787, RFC Editor, January 2007.                              [30] F. Roesner, T. Kohno, and D. Wetherall, “Detecting
[10] S. M. Bellovin, “A technique for counting natted hosts,”          and defending against third-party tracking on the web,”
     IMW ’02, pp. 267–272, ACM, 2002.                                  NSDI’12, pp. 12–12, USENIX Association, 2012.
[11] G. Maier, F. Schneider, and A. Feldmann, “Nat usage          [31] M. Falahrastegar, H. Haddadi, S. Uhlig, and R. Mortier,
     in residential broadband networks,” PAM’11, pp. 32–41,            “Anatomy of the third-party web tracking ecosystem,”
     Springer-Verlag, 2011.

You can also read