Longitudinal characterization of X.509 revocation statuses

Linköping University | Department of Computer and Information Science
                                Bachelor’s thesis, 15 ECTS | IT-security
                              2021 | LIU-IDA/LITH-EX-G--2021/073--SE

  Longitudinal characterization
  of X.509 revocation statuses
  –  A framework for monitoring newly issued certificates
  from the most popular Certificate Transparency logs

  Longitudinell karaktärisering av certifikatåterkallning

  Adam Halim
  Max Danielsson

  Supervisor : Niklas Carlsson
  Examiner : Marcus Bendtsen

                                                   Linköpings universitet
                                                     SE–581 83 Linköping
                                              +46 13 28 10 00 , www.liu.se

Copyright
The publishers will keep this document online on the Internet - or its possible replacement
- for a period of 25 years starting from the date of publication barring exceptional circum-
stances.
    The online availability of the document implies permanent permission for anyone to read,
to download, or to print out single copies for his/her own use and to use it unchanged for
non-commercial research and educational purpose. Subsequent transfers of copyright cannot
revoke this permission. All other uses of the document are conditional upon the consent
of the copyright owner. The publisher has taken technical and administrative measures to
assure authenticity, security and accessibility.
    According to intellectual property law the author has the right to be mentioned when
his/her work is accessed as described above and to be protected against infringement.
    For additional information about the Linköping University Electronic Press and its proce-
dures for publication and for assurance of document integrity, please refer to its www home
page: http://www.ep.liu.se/.

   © Adam Halim, Max Danielsson
Abstract

    The X.509 landscape is one of the cornerstones of the internet today. It is used to es-
tablish trust between entities online. Revocations of X.509 certificates are a vital part of the
infrastructure to ensure that communicating parties can, in fact, be trusted. Today, these
revocations are handled by Certificate Authorities who provide either an OCSP response
or a CRL with the revocation status for their certificates.
    We developed a framework, written in Go, to enable longitudinal characterization of
X.509 revocation statuses. We show that, using the framework, it is possible to conduct a
large-scale analysis of X.509 certificates over an extended period of time. Using the data
collected, we present preliminary analysis results and discuss the implications of the findings.
    We conclude that CAs, in general, behave similarly, with a few exceptions. Furthermore,
we believe that large-scale longitudinal analysis of revocation statuses provides a
basis to hold CAs accountable and increase transparency in the X.509 landscape.
Acknowledgments

We want to thank our supervisor Niklas Carlsson for the great insight and help during this
project. Without his help, we could not have done this project, and we are very grateful for
the opportunity of working with him.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1   Introduction
    1.1 Motivation
    1.2 Aim
    1.3 Research questions
    1.4 Contributions
    1.5 Delimitations
    1.6 Thesis outline

2   Background
    2.1 Revocation statuses & certificates
    2.2 CAs
    2.3 CRL
    2.4 OCSP
    2.5 Other methods of certificate validation
    2.6 Certificate Transparency
    2.7 Certificate types
    2.8 Related work

3   Data collection framework
    3.1 Data collection methodology
         3.1.1 Overview of monitoring tool
         3.1.2 Hardware usage
    3.2 Data structure and performance requirements
    3.3 CT log fetching
         3.3.1 Logs that were followed
    3.4 Periodical status checking
         3.4.1 Unique chain certificates
         3.4.2 Verification of logged data
         3.4.3 OCSP-response
         3.4.4 CRL
         3.4.5 Certificate type categorization
         3.4.6 Sources of error
    3.5 Analysis methodology

4   Preliminary results
    4.1 Measured results

5   Discussion
    5.1 Method
    5.2 Results
    5.3 The work in a wider context

6   Conclusion
    6.1 Future work

Bibliography

7   Appendix A

8   Appendix B

List of Figures

3.1   Overview of CT-logging logic
3.2   Overview of revocation checking logic
3.3   The database, divided into collections (as described in Section 3.4), showing all
      fields in the certificate data structure.

List of Tables

3.1   Min, max, average, mean and standard deviation of response times to the ten
      largest OCSP servers.

4.1   Fraction of certificates with a CRL.
4.2   Fraction of certificates with OCSP.
4.3   Fraction of revoked certificates after a certain amount of days.
4.4   Amount of un-revocations per CA.
4.5   Reasons for revocation by Certificate Authority.
4.6   Fraction of certificates that are of a given certificate type.
4.7   Fraction of revoked certificates per certificate type.
4.8   Fraction of revoked certificates for each certificate type for each CA.
4.9   Certificate validity periods split into three intervals.

1      Introduction

1.1       Motivation
The X.509 public key infrastructure (PKI) is a central part of the internet today and is re-
sponsible for ensuring communication is done securely. One component of the PKI is the
certificate authority (CA); this is the entity that is responsible for issuing certificates. Each
certificate a CA issues has a validity period, and under normal circumstances, the certificate
will be used for the entire validity period. However, there are cases where a certificate may
need to be rendered invalid before its validity period runs out. For example, if a private key
is compromised, a CA might want to make all certificates linked to that private key invalid
[3].

1.2       Aim
This thesis aims to gather and analyze data regarding revocation statuses in X.509 certifi-
cates and examine potential patterns connected to the revocations. This analysis may then
serve to increase awareness of inconsistencies in the process of invalidating certificates using
revocations.

1.3       Research questions
This thesis intends to answer the following questions regarding the X.509 certificates:

  1. Why do CAs have different revocation rates, and is it possible to determine the cause
     behind the differences?

  2. Do the revocation statuses change over time for different CAs after a certificate has
     been revoked?

    To answer these questions, we analyze several factors of the revocation behaviour,
including how many certificate revocations occur for each CA and how their revocation rate
changes over time. Furthermore, the behaviour of the individual certificates will be compared
to their CA to see if any patterns emerge. One such pattern could, for example, be the rate of
un-revocation during a certificate's lifetime and whether it varies between CAs.

1.4   Contributions
In this thesis, we developed a framework that allows monitoring CT logs for newly issued
X.509 certificates and revocation status checking for these certificates. We have also devel-
oped tools for characterizing CAs and certificate revocation statuses. Newly issued certifi-
cates were gathered for seven days from several CT logs, and their revocation statuses were
checked once every day, building a dataset with historic revocation statuses. Our dataset
shows that most CAs did not behave differently from each other when it came to revocation
handling.

1.5   Delimitations
Due to the timeline of this thesis, our dataset is limited in the number of revocation
checks that had been collected at the time of writing. Unfortunately, this limits what
kind of analysis can be done at this stage of the project, and this thesis therefore only
includes a preliminary analysis of the data.
    Furthermore, neither CT logs nor revocation checks provide historical data on previous
revocation statuses. This makes it impossible to recreate our exact dataset, and there is
no way of independently verifying that the dataset was created correctly.

1.6   Thesis outline
The rest of this thesis is structured as follows. Chapter 2 covers the theory and concepts
necessary to follow the methodology in Chapter 3 and the results in Chapter 4. In Chapter
3, the data collection framework is explained in detail, covering both the design of the
framework and the data collection. Chapter 4 presents results from the collected data, and
Chapter 5 discusses the results and the work in a wider context. Finally, a conclusion and
future work are presented in Chapter 6.

2      Background

2.1       Revocation statuses & certificates
The use of certificates is a crucial part of the web PKI. Websites can use certificates signed by
a CA to prove that they can be trusted. If a certificate's security were to be compromised,
a CA could make the certificate invalid by revoking it. This changes the certificate's status
to revoked, and it should no longer be recognized as trusted. Revocation serves to
protect servers and clients from fraudulent use of untrusted certificates. In practice, this
infrastructure comes with certain drawbacks, such as increased load times, and revocation
information can end up being ignored by clients anyway [15].
    Revocation is typically done in two ways: with certificate revocation lists (CRLs) or via
the Online Certificate Status Protocol (OCSP). CRLs work by having CAs periodically issue a
timestamped list of revoked certificates (i.e. the CRL). This CRL is publicly available and can
be downloaded by a browser to verify a certificate's validity [3]. OCSP allows an application
to issue a status request to an endpoint and wait for a response before accepting a certificate
as valid [19].
    CRLs and OCSP come with many issues regarding both speed and integrity, leading
some browsers to use proprietary ways of checking revocation statuses [13]. For example,
Chrome uses CRLSets¹, and Mozilla uses OneCRL². Both work by having the browser vendor
push an updated list of revoked certificates to its users via software updates.

2.2       CAs
CAs are an integral part of the web PKI since they are the origin of trust in the infrastructure.
This trust comes from the CA providing certificates to domains seeking to be regarded as secure.
The CA also undertakes the work of making sure that the domain is (to some extent) secure.
   CAs undergo a thorough process to have their root certificates included as trusted, as can
be seen at both Microsoft³ and Mozilla⁴. These inclusion processes differ somewhat between
software vendors, contributing to differences in the set of trusted roots in each root store [10].
   1 https://www.imperialviolet.org/2012/02/05/crlsets.html
   2 https://blog.mozilla.org/security/2015/03/03/revoking-intermediate-certificates-introducing-onecrl/
   3 https://docs.microsoft.com/en-us/previous-versions//cc751157(v=technet.10)
   4 https://wiki.mozilla.org/CA/Application_Process

2.3   CRL
Since CRLs grow linearly with each certificate added, they can become large, which results
in inefficient communication between a client and the CRL server [2][15]. Browsers such as
Chrome and Firefox no longer perform CRL checks, favouring OCSP or their
own proprietary CRLs. CRLs are delivered over HTTP, which raises privacy and security
concerns (they are, for example, vulnerable to man-in-the-middle attacks). Furthermore,
fetching the CRL requires an additional connection to be established to the CRL issuer,
increasing page load time. If a CRL check fails (e.g. due to connectivity problems with the CRL
issuer), a browser can do one of two things. It can consider the certificate invalid and display
a warning (this is called hard-fail), or it can ignore the check and accept the certificate (this is
called soft-fail) [13].
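
The difference between hard-fail and soft-fail can be illustrated with a minimal Go sketch. The names FailPolicy, Accept and errUnreachable are hypothetical and chosen for illustration only; they do not correspond to any browser's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// FailPolicy says what a client does when a revocation check cannot be completed.
type FailPolicy int

const (
	SoftFail FailPolicy = iota // ignore the failed check and accept the certificate
	HardFail                   // treat the certificate as invalid and warn the user
)

// errUnreachable is a hypothetical error standing in for a connectivity problem.
var errUnreachable = errors.New("CRL issuer unreachable")

// Accept reports whether a certificate is accepted. The revoked argument is
// only meaningful when checkErr is nil, i.e. when the check completed.
func Accept(revoked bool, checkErr error, policy FailPolicy) bool {
	if checkErr != nil {
		// The check itself failed: the policy decides the outcome.
		return policy == SoftFail
	}
	return !revoked
}

func main() {
	fmt.Println("soft-fail, check failed:", Accept(false, errUnreachable, SoftFail))
	fmt.Println("hard-fail, check failed:", Accept(false, errUnreachable, HardFail))
	fmt.Println("check succeeded, revoked:", Accept(true, nil, SoftFail))
}
```

Under soft-fail, an adversary who can block the connection to the CRL issuer effectively disables the revocation check, which is one reason the trade-off between the two policies matters.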
    The frequency with which new CRLs are issued depends on the issuer and can range
from hourly to weekly [3]. Due to the infrequency of some CRL updates, it can take many
days for a certificate revocation to propagate through the internet. This leaves plenty of time
for an adversary to abuse a compromised certificate.

2.4   OCSP
OCSP is an alternative to CRL that avoids some of the performance issues associated with
fetching large CRLs. It works by sending requests to OCSP servers that keep the current in-
formation about certificate statuses. A client can send a request to an OCSP server regarding
the status of a particular certificate, and the server responds with the current status of the
certificate [21].
    By eliminating the need to download a CRL, OCSP reduces overhead as a simple HTTP
query is enough to perform a revocation check. While performance is improved, the imple-
mentation does not come without its issues. For example, a CA receives information about
which domains are visited by a client with each request made, which raises privacy concerns
[15][13].
    If an OCSP query results in a revoked status response, CAs are encouraged to provide a
reason for revocation. The revocation reason is represented by the CRLReason field, which is
an integer between 0 and 10 (where 7 is unused) [21].
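
As an illustration, the CRLReason codes can be mapped to readable names as in the following Go sketch. The names are those defined for the CRLReason enumeration in RFC 5280; the map and the ReasonName helper are hypothetical and not taken from the thesis framework.

```go
package main

import "fmt"

// crlReasons maps CRLReason codes to the names defined in RFC 5280.
// Code 7 is not assigned and is therefore absent from the map.
var crlReasons = map[int]string{
	0:  "unspecified",
	1:  "keyCompromise",
	2:  "cACompromise",
	3:  "affiliationChanged",
	4:  "superseded",
	5:  "cessationOfOperation",
	6:  "certificateHold",
	8:  "removeFromCRL",
	9:  "privilegeWithdrawn",
	10: "aACompromise",
}

// ReasonName returns the name for a CRLReason code and whether the code is defined.
func ReasonName(code int) (string, bool) {
	name, ok := crlReasons[code]
	return name, ok
}

func main() {
	for code := 0; code <= 10; code++ {
		if name, ok := ReasonName(code); ok {
			fmt.Printf("%2d %s\n", code, name)
		} else {
			fmt.Printf("%2d (unused)\n", code)
		}
	}
}
```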

2.5   Other methods of certificate validation
There are multiple other approaches to revocation besides those mentioned above. Micali,
for example, proposes an enhanced certificate revocation system that can reduce transmission
costs [16]. Naor and Nissim demonstrate that their update scheme could drastically
reduce the daily cost of updates [17].

2.6   Certificate Transparency
Certificate Transparency (CT) is an open framework that allows for the logging and monitoring
of all issued certificates. The CT logs were created to combat the issue of compromised
certificates being used maliciously for long periods. As every certificate is logged and open
to the public, it is difficult for an adversary to use fraudulent certificates maliciously [8]. The
transparency enables browsers to identify some rogue certificates [6].
     CAs are expected to add all their issued certificates to one or more CT logs. Each new
certificate is appended to the log and cannot be removed once added. CT with signed
certificate timestamps has reduced certain undesirable behaviours among CAs, including the
use of weak keys and hashes [18].

    Anybody can monitor and query a CT log for new entries. Clients can send requests to
several endpoints, all of which are specified in RFC 6962 [14]. Two endpoints of particular
interest in this thesis are get-entries and get-sth. The get-entries endpoint requires
a start and end index and responds with an array of the entries in that range. A request
to get-sth results in a response with the CT log's current tree size, as well as the timestamp
of the current signed tree head.
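
The two endpoints can be sketched in Go as follows. The URL paths and JSON field names follow RFC 6962, whereas the log URL, the sample response body and the helper names (GetSTHURL, GetEntriesURL, ParseSTH) are made up for illustration and are not part of the thesis framework.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// SignedTreeHead mirrors the JSON fields of a get-sth response in RFC 6962.
type SignedTreeHead struct {
	TreeSize          uint64 `json:"tree_size"`
	Timestamp         uint64 `json:"timestamp"` // milliseconds since the Unix epoch
	SHA256RootHash    string `json:"sha256_root_hash"`
	TreeHeadSignature string `json:"tree_head_signature"`
}

// GetSTHURL builds the get-sth URL for a log base URL.
func GetSTHURL(logURL string) string {
	return strings.TrimRight(logURL, "/") + "/ct/v1/get-sth"
}

// GetEntriesURL builds the get-entries URL; both indexes are inclusive.
func GetEntriesURL(logURL string, start, end uint64) string {
	return fmt.Sprintf("%s/ct/v1/get-entries?start=%d&end=%d",
		strings.TrimRight(logURL, "/"), start, end)
}

// ParseSTH decodes the body of a get-sth response.
func ParseSTH(body []byte) (SignedTreeHead, error) {
	var sth SignedTreeHead
	err := json.Unmarshal(body, &sth)
	return sth, err
}

func main() {
	const logURL = "https://ct.example.com/log" // hypothetical log URL
	fmt.Println(GetSTHURL(logURL))
	fmt.Println(GetEntriesURL(logURL, 0, 999))

	// Made-up response body, for illustration only.
	sample := []byte(`{"tree_size":1000,"timestamp":1620000000000,` +
		`"sha256_root_hash":"","tree_head_signature":""}`)
	sth, err := ParseSTH(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(sth.TreeSize, sth.Timestamp)
}
```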

2.7   Certificate types
X.509 certificates can be categorized into three types according to how they are validated:
Domain Validation (DV), Organizational Validation (OV), and Extended Validation (EV). These
categories dictate different levels of identity authentication for the certificate owner [24].

Domain Validation
A DV certificate guarantees that the certificate owner also controls the domain. The
validation does not require many checks to be performed and is typically done automatically
[24].

Organizational Validation
OV certificates guarantee that an organization owns the certificate. This is done by validating,
in addition to the domain name, the organization name. Further validation is optional and
can include additional information about the organization’s location, such as the country of
origin and the street address [24].

Extended Validation
EV certificates require extensive authentication and are only provided for legal entities that
own a domain. They provide the most security of the three certificate types, and the procedures
for EV validation vary depending on the organization. The CA/Browser Forum dictates the
requirements for EV⁵, and CAs may impose additional requirements of their own for their EV
certificates [24].

2.8   Related work
Many studies relating to Certificate Transparency and X.509 certificates try to map out the
characteristics and behaviours of CAs. One such study was conducted by Korzhitskii and
Carlsson [11], where revocation statuses were collected using CRL and OCSP and later analysed.
They compared revocation rates between CAs and compared the characteristics of revoked
certificates. Among other things, they found that most CAs do not respond to OCSP requests
shortly after the certificate expires, and that the most significant differences in revocation
rates depend on the certificate's issuing CA and certain of its properties.
    Other work has given insight into the misissuance of certificates and how it has
been steadily decreasing since 2012 [12].
    Heinl, Giehl and Kargl constructed a metric to evaluate the trustworthiness of CAs using
both technical and non-technical factors [7]. This creates an opportunity to compare results
between the Metric for the Evaluation and Reconsideration of Certificate Authority
Trustworthiness (MERCAT) and findings from other research.

   5 https://cabforum.org/overview-of-the-extended-validation-ssl-vetting-process/

3      Data collection framework

The work of examining certificate revocation statuses was split into two parts, data gathering
and analysis, to ease reproducibility. Data gathering was done by creating a tool capable of
monitoring newly issued certificates during a limited period of 7 days and repeatedly checking
each certificate's revocation status. The revocation status was checked every 24
hours for 23 days. After monitoring the revocation statuses for all certificates, an analysis is
to be performed.

3.1       Data collection methodology
As previously mentioned, the data gathering in this thesis was done in two parts. The first
part consisted of gathering newly issued certificates from CT logs using a tool that was
developed for this project. The information was saved to a MongoDB¹ database running in a
virtual machine running Debian 10.
    The second part consisted of running CRL checks and OCSP requests to check each
certificate's revocation status. The tool did this every 24 hours and saved the results to the same
MongoDB database. To perform the checks every 24 hours, cron² scheduling was used.

3.1.1     Overview of monitoring tool
The tool, as previously mentioned, is responsible for monitoring CT logs, fetching all new
entries when they are added to the logs, and then following each certificate during an extended
period, as can be seen in Figure 3.1. The tool makes heavy use of the LogClient³ struct in
Google's CT log repository. A LogClient provides functionality for interacting with a CT log,
and we use one for all requests to each CT log. The following is a simplified overview of how
the first part works:

   1 https://www.mongodb.com/
   2 https://linux.die.net/man/5/crontab
   3 https://github.com/google/certificate-transparency-go/blob/master/client/logclient.go#L33

  1. CT log URLs are read from a configuration file, and a LogClient is created for each CT
     log. All LogClients are stored in an array.

  2. For each CT log, a request to get-sth is made, updating the current tree size for the
     CT log locally.

  3. A timer starts; every 0.5s, the program iterates through the array of LogClients and
     executes the main loop for one LogClient (step 4).

  4. Another request to get-sth is done. If the tree size is unchanged, return. If the tree
     size gets updated, go to step 5.

  5. The tool sends a request to get-entries, and new entries are retrieved; we download
     all certificates and their respective certificate chains.

  6. Certificates are processed and uploaded to the database.

  7. Go to step 4.
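
The steps above can be sketched in Go as follows. This is a minimal illustration under stated assumptions: the Log interface, Monitor, fakeLog and the store callback are hypothetical stand-ins, whereas the thesis tool uses LogClient from Google's repository and a MongoDB database. In the real tool a timer drives the polling every 0.5 s; here Tick is simply called once.

```go
package main

import "fmt"

// Log is a hypothetical stand-in for a CT log client.
type Log interface {
	TreeSize() (uint64, error)                   // corresponds to get-sth
	Entries(start, end uint64) ([]string, error) // corresponds to get-entries, end inclusive
}

// Monitor remembers the last tree size seen for each log (steps 1 and 2).
type Monitor struct {
	logs  []Log
	sizes []uint64
	next  int
	store func(entries []string)
}

// NewMonitor records the current tree size of every log.
func NewMonitor(logs []Log, store func([]string)) (*Monitor, error) {
	m := &Monitor{logs: logs, sizes: make([]uint64, len(logs)), store: store}
	for i, l := range logs {
		size, err := l.TreeSize()
		if err != nil {
			return nil, err
		}
		m.sizes[i] = size
	}
	return m, nil
}

// Tick handles one log per call, in round-robin order (steps 3 to 7).
func (m *Monitor) Tick() error {
	if len(m.logs) == 0 {
		return nil
	}
	i := m.next
	m.next = (m.next + 1) % len(m.logs)
	size, err := m.logs[i].TreeSize()
	if err != nil {
		return err
	}
	if size <= m.sizes[i] {
		return nil // tree size unchanged: nothing to fetch
	}
	entries, err := m.logs[i].Entries(m.sizes[i], size-1)
	if err != nil {
		return err
	}
	m.store(entries)
	// Advance only by what was actually returned; the rest is fetched later.
	m.sizes[i] += uint64(len(entries))
	return nil
}

// fakeLog is an in-memory log used only for this demonstration.
type fakeLog struct{ entries []string }

func (f *fakeLog) TreeSize() (uint64, error) { return uint64(len(f.entries)), nil }
func (f *fakeLog) Entries(start, end uint64) ([]string, error) {
	return f.entries[start : end+1], nil
}

func main() {
	ctLog := &fakeLog{entries: []string{"cert-0"}}
	m, err := NewMonitor([]Log{ctLog}, func(e []string) { fmt.Println("stored:", e) })
	if err != nil {
		fmt.Println(err)
		return
	}
	ctLog.entries = append(ctLog.entries, "cert-1", "cert-2")
	if err := m.Tick(); err != nil {
		fmt.Println(err)
	}
}
```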

    [Figure 3.1: Overview of CT-logging logic. Flowchart: start, import logs, fetch log tree size, check for change, get certificates from the CT log, insert new certificates into the DB, loop.]

    When querying the get-entries endpoint, a start and end index are required. If an
index difference of 1000 is sent, one might expect the CT log to respond with 1000 log entries.
During development, we noticed that this was not the case. Instead, a much smaller number
of entries is sent. The number seemed to be random, and we did not find a way of calculating
how many entries would be sent for a given start and end index. The current implementation
instead sends the start and end indexes and counts how many entries the response contains.
Then, the start index is incremented to reflect how many entries were retrieved. This is done
in a loop until all of the entries are retrieved.
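
A minimal Go sketch of this loop is shown below. The fetchFunc type and the cappedLog helper are hypothetical: cappedLog imitates a log that returns at most a fixed number of entries per request, whereas the real number appeared to vary.

```go
package main

import "fmt"

// fetchFunc is a hypothetical stand-in for a get-entries request. It may
// return fewer entries than the inclusive range [start, end] asks for.
type fetchFunc func(start, end uint64) ([]string, error)

// FetchRange requests [start, end] repeatedly, advancing start by the number
// of entries actually returned, until the whole range has been retrieved.
func FetchRange(fetch fetchFunc, start, end uint64) ([]string, error) {
	var all []string
	for start <= end {
		batch, err := fetch(start, end)
		if err != nil {
			return all, err
		}
		if len(batch) == 0 {
			// Guard against looping forever on an empty response.
			return all, fmt.Errorf("no entries returned for range %d-%d", start, end)
		}
		all = append(all, batch...)
		start += uint64(len(batch))
	}
	return all, nil
}

// cappedLog simulates a log that returns at most limit entries per request.
func cappedLog(limit uint64) fetchFunc {
	return func(start, end uint64) ([]string, error) {
		if end-start+1 > limit {
			end = start + limit - 1
		}
		var out []string
		for i := start; i <= end; i++ {
			out = append(out, fmt.Sprintf("entry-%d", i))
		}
		return out, nil
	}
}

func main() {
	entries, err := FetchRange(cappedLog(256), 0, 999)
	fmt.Println(len(entries), err)
}
```

Advancing by the number of entries received, rather than by the number requested, means the loop does not need to know the log's batch size in advance.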
    The main loop is executed using goroutines. Goroutines allow the program to run
function calls concurrently as lightweight threads [23]. This means that the program processes
multiple CT log updates concurrently, which increases performance.
    The second part of the tool is the revocation checking, which again relies on an existing
package, crypto/ocsp⁴, for creating requests in the format specified by RFC 6960 [21]. The tool
iterates through all certificates, performing OCSP requests or checking CRLs and writing
each certificate's current status to the database, as can be seen in Figure 3.2. As in the first
part, this is done using goroutines, allowing multiple certificate checks to run concurrently
and increasing the revocation check capacity. Errors are also logged to keep a thorough timeline
of statuses.
   4 https://github.com/golang/crypto/blob/master/ocsp/ocsp.go

[Figure 3.2 diagram; visible labels: Start, DB, Iterate collection, Certificates, request,
Responder, status, Check status, True, False, Update, Append info, update certificate, Close]

                           Figure 3.2: Overview of revocation checking logic

3.1.2     Hardware usage
The tool was deployed on a machine with an Intel Core i5-2500k and 8 GB of RAM. When
revocation checks were running, around 50 % of the CPU and less than 2 GB of RAM were
utilised. Running revocation checks in parallel with gathering newly issued certificates
resulted in RAM usage peaks of 2.5 GB. For storage, a 500 GB consumer-grade SSD was dedicated
to the VM running the database. We ran the monitoring tool in a VM with a 128 GB
consumer-grade SSD.

3.2     Data structure and performance requirements
Given the available hardware, it was crucial to use it as efficiently as possible. This was
done by structuring the data properly and by reducing the processing power and the number of
database transactions needed to perform a task. This project aimed to monitor all CT logs
listed in Appendix A for seven days while performing revocation checks for each certificate
once every 24 hours. A seven-day sample was taken to estimate the average number of
certificates issued across all CT logs combined. The number of issued certificates per second
was between 120 and 140, depending on which seven-day sample was looked at.
    When dimensioning for this project, an overestimate of 150 certificates per second was
used. An overestimate was used to give headroom if the throughput was temporarily increased
or if performance was temporarily reduced. If we were to collect 150 certificates per second
for seven days, every hour block would, over its 3600 seconds, hold 150 ∗ 7 = 1050
certificates per second in total. Since each hour block has to be rechecked within one hour,
a lower-bound revocation check rate of 1050 certificates per second was set.
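
    The dimensioning arithmetic can be reproduced with a few lines of Go. The function name
requiredCheckRate is hypothetical; the inputs (150 certificates per second, seven days, one
check per certificate every 24 hours) are the ones stated above.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// requiredCheckRate returns how many revocation checks per second are needed
// to recheck every certificate once per 24 hours, after collecting for the
// given number of days at issueRate certificates per second.
func requiredCheckRate(issueRate, days int) int {
	total := issueRate * secondsPerDay * days // certificates collected in total
	return total / secondsPerDay              // spread evenly over one day
}

func main() {
	fmt.Println(150 * secondsPerDay * 7)   // 90720000 certificates after seven days
	fmt.Println(requiredCheckRate(150, 7)) // 1050 checks per second
}
```

The same calculation with 130 certificates per second gives roughly 78.6M certificates after
seven days, which is in line with the 79M certificates reported in Section 3.4.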
    We used a data structure containing all the information necessary to perform revocation
checking for every certificate logged. The data structure also contained fields to ease
analysis, as well as the entire PEM-formatted certificate, as shown in Figure 3.3. The
revocation tool was developed, and a revocation check rate of around 1500 checks per second
was achieved. With a revocation capacity roughly 40 % larger than the required minimum of
1050, unforeseen load variations could be handled.
    Performing revocation checks on our certificates was particularly demanding. It required
a lot of processing power and was very taxing on the storage device. During testing, we
noticed that the revocation check rate would drop unexpectedly at seemingly random times.
One of our suspicions was that the OCSP responders were to blame and not our hardware,
attributing the slower request rate to higher round trip times (RTT) to specific servers. To
test whether this was the case, we measured the average OCSP response time for the ten
largest OCSP responders in our dataset. 1000 requests were sent per OCSP server, and the
measured response times are shown in Table 3.1.


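[Figure 3.3 diagram: the database holds numbered collections (0, 1, 2, 3, 4, ...), each
containing certificate documents (Cert 1, Cert 2, Cert 3, Cert 4, ...) with the fields id,
certIndex, OCSP, CRL, ctLog, cert (PEM), certChain[], time, and change[].]

Figure 3.3: The database, divided into collections (as described in Section 3.4), showing all
fields in the certificate data structure.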

        OCSP server                         Min (ms)        Max           Avg          µ        σ
        http://r3.o.lencr.org                  4.586      32.227         9.422     8.725    3.592
        http://ocsp.digicert.com               4.094     309.388        94.538   126.290   82.629
        http://ocsp.sectigo.com                5.346    1735.998        11.200     8.512   54.098
        http://ocsp.comodoca.com               5.545      46.194         9.786     8.596    3.769
        http://ocsp.sca1b.amazontrust.com      4.243      25.727        10.217    10.552    4.031
        http://ocsp.pki.goog/gts1d4            3.947      16.462         5.847     4.740    2.395
        http://oneocsp.microsoft.com/ocsp      6.015      26.065        11.362    10.629    4.664
        http://zerossl.ocsp.sectigo.com        5.689      47.538        10.645     9.954    4.043
        http://ocsp.msocsp.com                 9.756     211.289        24.056    21.198   14.147
        http://ocsp.godaddy.com/              24.481      74.467        43.166    48.474   13.910
Table 3.1: Min, max, average, mean and standard deviation of response times to the ten
largest OCSP servers.
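
    Summary statistics of this kind can be computed from raw response times as sketched
below. The sketch is illustrative only: the names Stats and summarize are hypothetical, the
input values are made up, and the choices to report a median and to use the population
standard deviation are assumptions, since the exact definitions behind Table 3.1 are not
stated.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Stats summarises a set of response times given in milliseconds.
type Stats struct{ Min, Max, Mean, Median, StdDev float64 }

// summarize computes the summary statistics. It assumes samples is
// non-empty and uses the population standard deviation (an assumption).
func summarize(samples []float64) Stats {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	n := len(sorted)

	var sum float64
	for _, v := range sorted {
		sum += v
	}
	mean := sum / float64(n)

	var sq float64
	for _, v := range sorted {
		sq += (v - mean) * (v - mean)
	}

	median := sorted[n/2]
	if n%2 == 0 {
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}
	return Stats{
		Min:    sorted[0],
		Max:    sorted[n-1],
		Mean:   mean,
		Median: median,
		StdDev: math.Sqrt(sq / float64(n)),
	}
}

func main() {
	// Made-up response times in milliseconds, not measured data.
	fmt.Printf("%+v\n", summarize([]float64{2, 4, 4, 4, 5, 5, 7, 9}))
}
```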

    As previously mentioned, not all responders are available every time a request is sent,
and some return an error. If a request takes more than ten seconds, it is timed out to free
up resources. When running the logging tool, the bandwidth usage was measured to be roughly
1 MB per second. For the revocation checking, an estimated 5 MB per second of bandwidth was
used.
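
    One common way to enforce such a timeout in Go is a context with a deadline; whether the
tool does it this way is not stated, so the following is only a sketch under that assumption.
checkFunc, runWithTimeout, and slowResponder are hypothetical names, and slowResponder
simulates a responder instead of sending a real OCSP request.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// checkFunc is a hypothetical revocation check that honours cancellation.
type checkFunc func(ctx context.Context) error

// runWithTimeout cancels the check once the timeout has elapsed.
func runWithTimeout(timeout time.Duration, check checkFunc) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // always release the timer
	return check(ctx)
}

// slowResponder simulates a responder that needs delay to answer.
func slowResponder(delay time.Duration) checkFunc {
	return func(ctx context.Context) error {
		select {
		case <-time.After(delay):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	// Short durations keep the example fast; the text uses ten seconds.
	err := runWithTimeout(50*time.Millisecond, slowResponder(time.Second))
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

For real HTTP requests, the same context could be attached with http.NewRequestWithContext so
that the connection is abandoned when the deadline passes.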

3.3     CT log fetching
A tool written in Go was developed for this project. The tool relies heavily on Google’s
Certificate Transparency codebase, a repository with tools related to Certificate
Transparency. The tool also relies on many functions in the crypto/x509 and the encoding/pem
packages in Go.

3.3.1    Logs that were followed
To get a representative collection of certificates, multiple CT logs from different CAs and
with different validity times were monitored. When choosing which CT logs to monitor,
Cloudflare’s log list was used5 . All of the major non-test logs were used: Cloudflare
Nimbus, Digicert Nessie, Digicert Yeti, Google Argon, Google Xenon, Let’s Encrypt Oak,
Sectigo Mammoth, and Sectigo Sabre. Logs for the years 2020 and earlier were ignored, as
already expired certificates were not of interest to this study. For a complete list of CT
logs, see Appendix A.
   5 https://ct.cloudflare.com/logs


3.4     Periodical status checking
After the initial seven-day collection period, our dataset consisted of 79M certificates.
When performing revocation checks, we have to access particular sets of certificates
frequently and quickly. This puts a high demand on performance since we do not want to waste
any time. One way in which this is achieved is by grouping our certificates in different
collections in our database. A collection in MongoDB is a grouping of documents, where a
document is a record in the database6 . Since revocation checking is done once every 24 hours
per certificate, the database is divided into 24 collections named 0 to 23. Whenever the tool
logs a newly issued certificate, it is stored in the collection that matches the current hour.
    The revocation check only updates the database if a change has occurred in the status.
This is done to prevent storing a lot of unnecessary data. A certificate is assumed to be
Good when issued, so if a certain database entry does not show any changes, the certificate
has not been revoked. The same applies to the many errors (see Section 3.5) that can occur:
they are logged so that, if a status change happens while the server is unreachable, this is
visible in the dataset.
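
    The change-only update rule can be sketched with a small in-memory record. The struct
below mirrors a subset of the fields shown in Figure 3.3, but the Go types, the method names,
and the status strings are assumptions made for illustration; the actual tool stores its
records in MongoDB.

```go
package main

import (
	"fmt"
	"time"
)

// Change records one observed status transition (or error) for a certificate.
type Change struct {
	Status string
	Time   time.Time
}

// CertRecord mirrors a subset of the fields in Figure 3.3. The Go types are
// assumptions made for this sketch.
type CertRecord struct {
	CertIndex int64
	OCSP      string
	CTLog     string
	Changes   []Change
}

// currentStatus returns the latest known status. A record without any
// changes is treated as Good, as described in the text.
func (r *CertRecord) currentStatus() string {
	if len(r.Changes) == 0 {
		return "Good"
	}
	return r.Changes[len(r.Changes)-1].Status
}

// observe appends a change only if the status differs from the current one
// and reports whether a database write would be needed.
func (r *CertRecord) observe(status string, at time.Time) bool {
	if status == r.currentStatus() {
		return false
	}
	r.Changes = append(r.Changes, Change{Status: status, Time: at})
	return true
}

func main() {
	rec := &CertRecord{CertIndex: 42, CTLog: "example-log"}
	now := time.Now()
	fmt.Println(rec.observe("Good", now))                      // no write needed
	fmt.Println(rec.observe("Revoked", now.Add(24*time.Hour))) // write needed
	fmt.Println(len(rec.Changes))
}
```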
    With this implementation, time is saved when performing revocation checks. Instead of
searching through the entire database and filtering on the hour at which each certificate was
logged, the program can fetch a whole collection without filtering. This saves time and
lessens the strain on the database. Moreover, the tool uses the batchSize function7 within
MongoDB, which lets the tool handle a specified number of documents at a time instead of
receiving the entire collection in one go. Instead of waiting for a couple of minutes while
the database returns an entire collection, revocation checking can start almost instantly
once a small batch is retrieved, while the database continues returning small batches of
records.
    To avoid performing too many revocation checks concurrently, the program uses a semaphore
to limit the number of checks performed simultaneously. During testing, a semaphore size
between 500 and 1000 gave a good balance between performance and the risk of overwhelming
the machine with work. In the end, we decided to use a semaphore size of 700, giving plenty
of headroom performance-wise if, for any reason, the revocation check rate were to decrease
significantly.
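
    In Go, a counting semaphore is commonly built from a buffered channel. The sketch below
shows that idiom with the limit of 700 mentioned above; it is an illustration, not the tool's
code. The function runLimited is hypothetical, and the jobs merely sleep instead of
performing revocation checks. The peak counter exists only to demonstrate that the limit
holds.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every job in its own goroutine but lets at most limit of
// them execute at once, using a buffered channel as a counting semaphore.
// It returns the highest number of jobs observed running simultaneously.
func runLimited(jobs []func(), limit int) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak int64
	for _, job := range jobs {
		sem <- struct{}{} // acquire a slot; blocks while limit jobs are running
		wg.Add(1)
		go func(job func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			n := atomic.AddInt64(&running, 1)
			for { // record a new peak if this is the highest count so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			job()
			atomic.AddInt64(&running, -1)
		}(job)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	jobs := make([]func(), 5000)
	for i := range jobs {
		jobs[i] = func() { time.Sleep(time.Millisecond) } // placeholder work
	}
	fmt.Println(runLimited(jobs, 700) <= 700)
}
```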

3.4.1    Unique chain certificates
During testing, we found that many certificates had chain certificates in common. The tool
therefore stores each unique chain certificate only once, in a separate collection. This
prevents duplicate entries in the database, which reduces storage needs. Our dataset with 79M
certificates had only 951 unique chain certificates, which saved roughly 350 GB.
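
    A minimal sketch of such deduplication is shown below. Keying chain certificates by the
SHA-256 fingerprint of their encoding is an assumption made for illustration, as the text
does not say how the tool identifies duplicates; ChainStore and its methods are hypothetical
names, and the byte strings stand in for real certificates.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChainStore keeps one copy of each distinct chain certificate, keyed by the
// SHA-256 fingerprint of its encoding (an assumption of this sketch).
type ChainStore struct {
	certs map[string][]byte
}

// NewChainStore returns an empty store.
func NewChainStore() *ChainStore {
	return &ChainStore{certs: make(map[string][]byte)}
}

// Add stores der if it has not been seen before and returns the fingerprint
// that a leaf certificate record can keep as a reference to it.
func (s *ChainStore) Add(der []byte) string {
	sum := sha256.Sum256(der)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.certs[key]; !ok {
		s.certs[key] = append([]byte(nil), der...)
	}
	return key
}

// Len returns the number of unique chain certificates stored.
func (s *ChainStore) Len() int { return len(s.certs) }

func main() {
	store := NewChainStore()
	a := store.Add([]byte("intermediate A")) // placeholder bytes, not a real certificate
	b := store.Add([]byte("intermediate A"))
	store.Add([]byte("root B"))
	fmt.Println(a == b, store.Len())
}
```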

3.4.2    Verification of logged data
With the vast number of certificates logged in the database, it had to be ensured that no
certificate was missed by the monitoring program and that there were no duplicates from the
same log. This was achieved by a series of tests, each of which further strengthened the
validity of the data collected.
    After the monitoring program had run for a few days, millions of certificates had been
logged into the database. Queries were then run against the database, filtering on each CT
log. By comparing the CT log index of the first entry for a particular CT log with the index
of the last entry, we know how many certificates should be logged in the database for that CT
log. Filtering the database for each CT log, it could be ensured that the number of logged
entries matched the range of indexes between the first and the last entry.
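
    This consistency check can be expressed compactly, as in the sketch below. The function
verifyLog is hypothetical and works on a plain slice of CT log indexes instead of database
queries. Because both the first and the last index are included, the expected count is the
index difference plus one.

```go
package main

import (
	"fmt"
	"sort"
)

// verifyLog checks that the logged indexes of one CT log contain no
// duplicates and no gaps between the first and the last logged index.
func verifyLog(indexes []int64) error {
	if len(indexes) == 0 {
		return nil
	}
	sorted := append([]int64(nil), indexes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	for i := 1; i < len(sorted); i++ {
		if sorted[i] == sorted[i-1] {
			return fmt.Errorf("duplicate index %d", sorted[i])
		}
	}
	// Both end points are included, hence the +1.
	want := sorted[len(sorted)-1] - sorted[0] + 1
	if got := int64(len(sorted)); got != want {
		return fmt.Errorf("logged %d entries, expected %d", got, want)
	}
	return nil
}

func main() {
	fmt.Println(verifyLog([]int64{5, 6, 7, 8}))
	fmt.Println(verifyLog([]int64{5, 6, 8}))
}
```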
   6 https://docs.mongodb.com/manual/reference/glossary/
   7 https://docs.mongodb.com/manual/reference/method/cursor.batchSize/


   Tests were also run to measure the average throughput of issued certificates per CT log.
These numbers were compared to Cloudflare’s log list8 to sanity check the monitoring pro-
gram, as throughputs for all CT logs are listed there.

3.4.3   OCSP-response
Our dataset showed that almost all certificates provided an OCSP URL (>99.9 %). For every
certificate logged in the database, a field for the OCSP URL is present. To perform an OCSP
query, a request is built from the certificate and its issuer’s certificate and sent to the
OCSP URL. As previously mentioned, OCSP requests run concurrently to improve performance.
    A successful OCSP response contains the certificate’s status, which is either good,
revoked, or unknown. For revoked certificates, a revocation time and an optional revocation
reason are included as well. The status, revocation time, and revocation reason are stored in
the database.
    The program runs revocation checks once per hour, each time for the collection matching
the current hour, so every certificate is checked once every 24 hours. If a response for a
given certificate is unchanged since the last check, no updates are made to the database.
Instead, it is assumed that a request was made every 24 hours. For example, if a certificate
has two status changes ten days apart, it is assumed that the first status was retrieved
every day until the second status change. This was done to increase performance and to save
data.
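
    During analysis, the daily status can therefore be reconstructed from the stored
changes. The sketch below illustrates the idea; the Change type, the function statusOnDay,
the use of whole days since logging, and the status strings are all assumptions made for
this example.

```go
package main

import "fmt"

// Change is one stored status change, expressed in whole days since the
// certificate was logged (a simplification for this sketch).
type Change struct {
	Day    int
	Status string
}

// statusOnDay returns the status assumed for the given day: the most recent
// stored change on or before that day, or "Good" if there is none.
// The changes must be sorted by Day in ascending order.
func statusOnDay(changes []Change, day int) string {
	status := "Good"
	for _, c := range changes {
		if c.Day > day {
			break
		}
		status = c.Status
	}
	return status
}

func main() {
	changes := []Change{{Day: 3, Status: "Revoked"}, {Day: 13, Status: "Good"}}
	for _, day := range []int{0, 3, 8, 13} {
		fmt.Println(day, statusOnDay(changes, day))
	}
}
```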
    For requests that throw an error, the response is logged and timestamped. The most com-
mon errors were given a number and were stored in the database to ease analysis.

3.4.4   CRL
The dataset showed that fewer than 1 in 10000 certificates did not provide an OCSP URL and
instead contained a CRL URL. To perform a revocation check for such a certificate, the
program downloads the CRL and checks whether the certificate is listed. If it is listed, the
certificate is logged as revoked with the crlReason code 7 (otherwise unused) to indicate
that the status comes from a CRL. Currently, for CRLs, the program does not extract and save
the crlReason.

3.4.5   Certificate type categorization
When categorizing our certificates into different types, we used the CA/Browser Forum’s
Baseline Requirements9 . The Baseline Requirements contain a list of reserved Object IDs
(OIDs) that can be used to identify a certificate type. For each certificate, if it contained
an OID matching a type listed in the Baseline Requirements, we would categorize it as such.
Unfortunately, the OIDs are optional, and CAs are not required to use them for certificate
type assertion, meaning that we could not categorize all certificates in our dataset.

3.4.6   Sources of error
The CT framework is open for everybody, and certificates can be logged to multiple CT logs.
The tool does not check whether a certificate has already been logged when a new entry is
monitored. This leads to our dataset containing duplicate certificates, making the total
number of unique certificates smaller than the number we collected. Since we do not know how
many duplicates there are in the dataset, our results may be inaccurate, as it is uncertain
which CAs post their certificates to multiple logs and how often they do so. As a result, our
results could be skewed towards a particular CA if it is more likely to submit its
certificates to multiple logs.
    Furthermore, since each collection only contains certificates logged during one
particular hour, the results might not reflect the entire dataset accurately. Some CAs may
submit more (or fewer) certificates during different hours of the day, which could skew our
results.
   8 https://ct.cloudflare.com/logs
   9 https://cabforum.org/wp-content/uploads/CA-Browser-Forum-BR-1.7.6.pdf


3.5   Analysis methodology
Before analysing our dataset, we ran extensive tests to validate that our data collection was
done correctly. Furthermore, we cross-referenced our data with other papers to see if the
results were similar. In particular, we compared our data to the data that Korzhitskii and
Carlsson collected in their paper [11].
    As previously mentioned, there is no publicly available data of revocation history. This
makes it hard to compare our data with others, especially with the way the X.509 landscape
is constantly changing. Moreover, internet traffic habits and server capacities are constantly
changing, making it challenging to compare metrics such as timeouts caused by server over-
load.
    Our dataset contains newly issued certificates gathered for seven days, where the
earliest added certificates have had 23 revocation checks performed, and the most recently
added certificates have had 16 revocation checks. In total, around 79M certificates were
gathered during the initial seven-day collection period. Formatting and analysing this data
was very resource-intensive and required a lot of time to process. Due to our time
constraint, we decided to perform a preliminary analysis on a subset of our data. Instead of
analysing all 24 collections, we decided to study only one collection containing around 3M
certificates.

4      Preliminary results

A preliminary analysis of the dataset was performed. The dataset contains 3M certificates that
have had their revocation statuses checked between 16 and 23 times. We used an assortment
of bash scripts and a tool written in Go to generate formatted data. Our results are based
on comparing characteristics between different CAs. We chose to look at the ten largest CAs
in our dataset and group together the rest as "Other". Around 40000 certificates are
listed under the "Other" category, roughly 1.3 % of the dataset. A breakdown of the number
of issued certificates for each CA can be found in Appendix B.

4.1       Measured results
To start, we looked at how many of the certificates provided an OCSP responder and a CRL
issuer. We also looked at how many certificates did not provide any of these. A tiny number
of certificates provided neither OCSP nor CRL; only 539 did this (< 0.02 %). Worth noting is
that none of the larger CAs fell under this category, as all 539 certificates were listed
under "Other". Looking further, 99.98 % of certificates provided an OCSP server, and only
16.54 % provided a CRL. A per CA breakdown of these numbers can be seen in Table 4.1 and
Table 4.2. All of the large CAs provide an OCSP server, and most CAs also provide a CRL. One
CA that stands out is Let’s Encrypt, which is disproportionately larger than the rest and
does not provide a CRL.

  Certificate Authority         CRL    Fraction of total         Certificate Authority       OCSP     Fraction of total
  Amazon                      1.0000             0.0227          Amazon                      1.0000             0.0227
  Cloudflare, Inc.            1.0000             0.0405          Cloudflare, Inc.            1.0000             0.0405
  cPanel, Inc.                1.0000             0.0386          cPanel, Inc.                1.0000             0.0386
  DigiCert Inc                0.4110             0.0188          DigiCert Inc                1.0000             0.0457
  GoDaddy.com, Inc.           1.0000             0.0046          GoDaddy.com, Inc.           1.0000             0.0046
  Google Trust Services LLC   1.0000             0.0120          Google Trust Services LLC   1.0000             0.0120
  Let’s Encrypt               0.0000             0.0000          Let’s Encrypt               1.0000             0.7595
  Microsoft Corporation       1.0000             0.0172          Microsoft Corporation       1.0000             0.0172
  Sectigo Limited             0.0260             0.0011          Sectigo Limited             1.0000             0.0407
  ZeroSSL                     0.0000             0.0000          ZeroSSL                     1.0000             0.0055
  Other                       0.7632             0.0100          Other                       0.9867             0.0130
  Total                                          0.1654          Total                                          0.9998

Table 4.1: Fraction of certificates with a CRL.                 Table 4.2: Fraction of certificates with OCSP.


    We looked at how many of the certificates were revoked and found that only a small
fraction were. One interesting thing to follow was how the ratio of revoked certificates
would change over time. To do this, the dataset was filtered to only include certificates
and revocation statuses that had been monitored for a specific number of days, and then
analyzed. In total, we found that around 1 % of certificates were revoked and that the number
of revoked certificates grew over time. Two CAs that stand out are GoDaddy and Google; they
have a significantly higher revocation rate than the rest of the CAs. Cloudflare and
Microsoft also stand out; neither of them had any of their certificates revoked. A per CA
breakdown can be seen in Table 4.3.

            Certificate Authority          4 days   8 days    16 days    16+ days
            Amazon                         0.0061   0.0089     0.0089      0.0089
            Cloudflare, Inc.                    0        0          0           0
            cPanel, Inc.                   0.0061   0.0069     0.0070      0.0070
            DigiCert Inc                   0.0221   0.0338     0.0349      0.0351
            GoDaddy.com, Inc.              0.1107   0.2011     0.2355      0.2470
            Google Trust Services LLC      0.1562   0.2029     0.2146      0.2173
            Let’s Encrypt                  0.0019   0.0030     0.0035      0.0037
            Microsoft Corporation               0        0          0           0
            Sectigo Limited                0.0083   0.0111     0.0106      0.0109
            ZeroSSL                        0.0409   0.0639     0.0645      0.0645
            Other                          0.0452   0.0721     0.0749      0.0759
            Total                          0.0063   0.0096     0.0104      0.0104
          Table 4.3: Fraction of revoked certificates after a certain amount of days.

    While Table 4.3 only displays the fraction of revoked certificates after specified time
intervals, a certificate can go from being revoked to being un-revoked. This was highly
uncommon, and only four un-revocations were found in the dataset, as seen in Table 4.4.

                   Certificate Authority               Un-revocations
                   Amazon                                           0
                   Cloudflare, Inc.                                 0
                   cPanel, Inc.                                     0
                   DigiCert Inc                                     0
                   GoDaddy.com, Inc.                                1
                   Google Trust Services LLC                        0
                   Let’s Encrypt                                    3
                   Microsoft Corporation                            0
                   Sectigo Limited                                  0
                   ZeroSSL                                          0
                   Other                                            0
                   Total                                            4
                        Table 4.4: Amount of un-revocations per CA.


   Looking at the revocation reasons, we see that the reason unspecified is the most com-
mon. CAs are not required to specify a reason for revocation, but all of them did in our
dataset. A per CA breakdown can be seen in Table 4.5.

    Certificate Authority        unspecified  keyCompromise  cACompromise  affiliationChanged  superseded  cessationOfOperation  certificateHold  removeFromCRL  privilegeWithdrawn  aACompromise
    Amazon                               254
    Cloudflare, Inc.
    cPanel, Inc.                         396
    DigiCert Inc                        1964                                                            5
    GoDaddy.com, Inc.                                     2                               264          18                  1259                                                  16
    Google Trust Services LLC           3617
    Let’s Encrypt                       2919              9                                 1          41                   641
    Microsoft Corporation
    Sectigo Limited                      735
    ZeroSSL                              428
    Other                                733             35                                18          78                   375               27
    Total                              11046             46             0                 283         142                  2275               27              0                  16             0
                 Table 4.5: Reasons for revocation by Certificate Authority.

    The most common type of certificate is a DV certificate. Almost all CAs had most of their
certificates as DV. Table 4.6 shows the fraction of each CA’s certificates that falls under
each certificate type.

                                                                            Certificate type
         Certificate Authority                                          OV     EV        DV                                                                           None
         Amazon                                                                      1.0000
         Cloudflare, Inc.                                            1.0000
         cPanel, Inc.                                                                1.0000
         DigiCert Inc                                                0.3701 0.0076 0.6222                                                                             0.0000
         GoDaddy.com, Inc.                                           0.0046 0.0018 0.9936
         Google Trust Services LLC                                                   1.0000
         Let’s Encrypt                                                               1.0000
         Microsoft Corporation                                       0.6813          0.3185                                                                           0.0002
         Sectigo Limited                                             0.0236 0.0024 0.9740
         ZeroSSL                                                                     1.0000
         Other                                                       0.3971 0.0121 0.5775                                                                             0.0133
         Total                                                       0.0753 0.0006 0.9240                                                                             0.0002
             Table 4.6: Fraction of certificates that are of a given certificate type.


    Looking at revocation rates per certificate type, we found that EV certificates were most
likely to be revoked. Table 4.7 shows a breakdown of revocation rates per certificate type.

                     Certificate type    Fraction of revoked certificates
                     OV                                           0.0098
                     EV                                           0.0566
                     DV                                           0.0040
                     Other                                        0.0018
                Table 4.7: Fraction of revoked certificates per certificate type.

   A breakdown of revoked certificates categorized by certificate type for each CA can be
seen in Table 4.8. Once again, we see that EV certificates stand out from the rest and that
GoDaddy has a much higher revocation rate than the rest of the CAs.

                                                     Certificate type
         Certificate Authority                   OV     EV        DV            None
         Amazon                                               1.0000
         Cloudflare, Inc.                     1.0000
         cPanel, Inc.                                         1.0000
         DigiCert Inc                         0.3701 0.0076 0.6222             0.0000
         GoDaddy.com, Inc.                    0.0046 0.0018 0.9936
         Google Trust Services LLC                            1.0000
         Let’s Encrypt                                        1.0000
         Microsoft Corporation                0.6813          0.3185           0.0002
         Sectigo Limited                      0.0236 0.0024 0.9740
         ZeroSSL                                              1.0000
         Other                                0.3971 0.0121 0.5775             0.0133
         Total                                0.0753 0.0006 0.9240             0.0002
       Table 4.8: Fraction of revoked certificates for each certificate type for each CA.

   Table 4.9 shows how many certificates fall into each of three validity-period intervals. A
majority of the certificates had a validity period between 90 and 365 days, which might not
come as a surprise, considering that all of Let’s Encrypt’s certificates have a validity of
90 days1 .

                                         Amount      Fraction of total
                         =90 days       2662676                0.863
                         >=365 days       394369                0.128
               Table 4.9: Certificate validity periods split into three intervals.

    In total, 92.3 % of revocation check attempts were successful (success meaning a response
of either Good, Revoked, or Unknown). Less than 0.04 % of our requests timed out, and the
largest share of our failures (45.26 %) were responses with the status bad OCSP signature:
crypto/rsa: verification error. Closely following, our second most common error
(35.44 %) was ocsp: error from server: unauthorized.

   1 https://letsencrypt.org/2015/11/09/why-90-days.html

5      Discussion

Unfortunately, our preliminary analysis was done on a limited dataset due to time con-
straints. Not only was the analysis done on a subset of all data, but our dataset is also still
growing every day. More time is needed to further grow the dataset for a more accurate
representation of the X.509 landscape.

5.1       Method
As mentioned earlier, our tool relies heavily on the Go standard library, the supplementary
golang.org/x/crypto packages, and Google’s Certificate Transparency codebase. Since other
developers have written these libraries, we have not had time to familiarize ourselves fully
with the code. Some of the functionality our tool depends on is not well documented, so what
the code actually does may differ from what we think it does. For example, we use a function
called CreateRequest from the crypto/ocsp library1 . We are not fully aware of how the
request is made, and we are not sure if the OCSP nonce extension is used.
    A nonce is a cryptographic, pseudorandomly generated number that is included with
the OCSP request [20]. When using the nonce extension when sending OCSP requests, the
response is bound to the request. This means that, by using a nonce, one can be sure that an
OCSP response is accurate and is not cached. Since we are not sure if the nonce extension
currently is used, we believe there is a chance that many of our OCSP responses are cached.
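    The binding that a nonce provides can be illustrated with a toy model. The sketch below is
written in Python and does not use real OCSP encoding or any OCSP library; the dictionaries and
the names new_request, fresh_responder, and is_bound are hypothetical. In real OCSP the nonce
is carried as an extension and is covered by the responder’s signature, which this sketch
leaves out.

```python
import hmac
import secrets

def new_request(serial_number):
    """Build a toy request carrying a fresh random nonce (not real OCSP encoding)."""
    return {"serial": serial_number, "nonce": secrets.token_bytes(16)}

def fresh_responder(request):
    """Toy responder that answers this specific request and echoes its nonce."""
    return {"serial": request["serial"], "status": "Good", "nonce": request["nonce"]}

def is_bound(request, response):
    """True only if the response echoes the nonce of this request."""
    nonce = response.get("nonce")
    return nonce is not None and hmac.compare_digest(nonce, request["nonce"])

first = new_request(1234)
live = fresh_responder(first)
print("fresh response bound to its request:", is_bound(first, live))

# Reusing the stored response for a later request: the new nonce does not match.
second = new_request(1234)
print("stored response bound to a new request:", is_bound(second, live))
```

    A client that does not send a nonce, or a responder that does not echo it, gives the client
no way to make this distinction, which is the uncertainty described above.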

5.2       Results
What we can conclude from our results is that newer CAs seem to opt out of using CRLs. Let’s
Encrypt and ZeroSSL, two free certificate issuers, do not use CRLs.
   The revocation rates do not differ much between the CAs. Two CAs that stood out were
GoDaddy and Google, with 24.70 % and 21.73 % of their certificates revoked, respectively.
However, this could be an anomaly due to the small sample size and the short period during
which revocation checks were made.
   One thing that is evident from Table 4.2 is that all the major CAs provided OCSP for all of
their certificates. For the smaller, grouped CAs, 98.7 % of the certificates had an OCSP
provider, and 76.3 % provided a CRL. Some of the smaller CAs might lack the resources, or the
interest, to properly handle certificate issuance and revocation. This is further supported by
the fact that the 539 certificates that lacked both a CRL and an OCSP provider were all
categorized as "Other".

   1 https://pkg.go.dev/golang.org/x/crypto/ocsp#CreateRequest

     In total, only around 1 % of certificates were revoked. Most CAs had a fraction of revoked
certificates between 3 % and 7 %, while Let’s Encrypt, the largest CA, had only 0.037 % of its
certificates revoked. Two CAs that stood out from the rest were GoDaddy and Google, with
24.70 % and 21.73 % of their certificates revoked, respectively. From Table 4.5, we see that the
majority of GoDaddy’s revocations had the reason cessationOfOperation, meaning that the
owner of the certified domain no longer needs the certificate. This might be because GoDaddy,
being a popular web hosting company, hosts many websites for private individuals. We think
individuals are more likely to cancel their web hosting plans, which makes their certificates
unnecessary and leads to their revocation, and in turn to a high revocation rate.
     All certificates revoked by Google have the same revocation reason, unspecified, as
seen in Table 4.5. This makes it difficult to infer the underlying reason for revocation.
Furthermore, we consider the high revocation rate an anomaly, as many of the other CAs are
companies within the same industry.
     On the other end of the spectrum, we have Cloudflare and Microsoft with no revocations
at all. For certificates issued by Microsoft, we noticed that the certificates had the common
name "Microsoft Azure TLS Issuing CA" and that the domain names were always Microsoft
Azure domains. With this in mind, we believe that these certificates were issued to cloud
applications running on Microsoft Azure, and that revocation of these certificates is highly
unlikely. For Cloudflare, we were, unfortunately, unable to find a reason for the absence of
revocations.
     Looking at the distribution of certificate types between CAs in Table 4.6, we see that most
certificates were DV. We believe that this is because DV certificates are the simplest and
cheapest to obtain.
     As seen in Table 4.9, 86.27 % of certificates had a validity period between 90 and 365 days.
Since Let’s Encrypt accounts for 75.9 % of certificates, and all of their certificates have a
validity period of 90 days, without exception2, this comes quite naturally.
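     How a certificate ends up in one of the validity intervals can be sketched as follows. This
is an illustrative Python example, not the code used for the analysis; the interval boundaries,
the function names, and the sample data are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

def validity_days(not_before, not_after):
    """Validity period in whole days between two datetimes."""
    return (not_after - not_before).days

def bucket(days):
    """Map a validity period to one of three intervals (boundaries are assumed)."""
    if days < 90:
        return "<90 days"
    if days < 365:
        return "90 to 364 days"
    return ">=365 days"

def fractions(periods):
    """Return each interval's share of the given (not_before, not_after) pairs."""
    counts = Counter(bucket(validity_days(nb, na)) for nb, na in periods)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Made-up illustration data: three 90-day certificates and one 397-day certificate.
issued = datetime(2021, 3, 1)
sample = [(issued, issued + timedelta(days=90))] * 3
sample.append((issued, issued + timedelta(days=397)))
print(fractions(sample))
```

     With a single issuer dominating the dataset and issuing only 90-day certificates, the
interval containing 90 days necessarily dominates the resulting fractions.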

5.3   The work in a wider context
In a wider context, this work is about security on the web for regular people. Today,
browsers still only soft-fail on OCSP requests unless a user explicitly forces the browser to
hard-fail [13]. This indicates that, even with better OCSP implementations and wide use
among CAs, availability for the user still takes priority over security, at least from the
browsers’ point of view. We believe that OCSP is a step in the right direction compared
with using CRLs, but it does not provide enough security and speed for the modern web.
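    The difference between the two policies can be written down as a small decision function.
The sketch is in Python and is a simplification under stated assumptions: it is not how any
particular browser is implemented, and the function name accept_connection and its result
values are hypothetical.

```python
def accept_connection(ocsp_result, hard_fail=False):
    """Decide whether a client proceeds with a TLS connection.

    ocsp_result is "good", "revoked", or None when no usable response arrived
    (for example a timeout, a network error, or a responder error).
    """
    if ocsp_result == "revoked":
        return False  # a definite revocation is rejected under both policies
    if ocsp_result == "good":
        return True
    # No usable answer: soft-fail proceeds anyway, hard-fail refuses.
    return not hard_fail

for result in ("good", "revoked", None):
    print(result, accept_connection(result), accept_connection(result, hard_fail=True))
```

    The last branch is where the trade-off lies: under soft-fail, anyone able to block the
OCSP request gets the same outcome as a Good response, while under hard-fail an unreachable
responder makes the site unavailable.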
    Even when certificates are handled properly, the software that uses them can be vulnerable
to exploits. One well-known vulnerability in the TLS library OpenSSL3 is Heartbleed [5].
The bug allowed an attacker to read a server’s memory without any authorization, allowing
sensitive information to be leaked. This means that even if findings show that CAs act
properly when revoking certificates, this does not guarantee security. It is also worth
mentioning that public key pinning, which was proposed as a possible solution to some of the
security issues with CAs, ended up being removed by the major browsers [1]. Other
improvements, such as CT, provide increased security and accountability while having only a
minor impact on the rest of the web experience [22].
    It is interesting to note that when Heartbleed was discovered, 87 % of certificates were not
revoked even though they ran the risk of being used fraudulently [25]. This suggests that the
trust placed in CAs might be too great for the time being. Work that tries to change the entire
key infrastructure to reduce the amount of trust put in the CAs exists for reasons just like
this [9].

   2 https://letsencrypt.org/2015/11/09/why-90-days.html
   3 https://www.openssl.org/

6      Conclusion

In this thesis, we have created a framework for gathering newly issued X.509 certificates on a
large scale while also enabling periodic revocation checking. The simplicity of this framework
allows others to monitor any combination of CT logs they want and track revocation changes
regardless of interval choice. This enables more thorough studies to be conducted on the
X.509 landscape.
    Our results generally point to CAs behaving in a similar fashion, with a few odd
behaviours. Since the analysis was performed on a preliminary data set with a majority
of certificates still within their validity periods, a lot of interesting behaviours could not be
observed at this stage. We believe that analysing certificates in the wild is vital to holding
CAs accountable for insecure behaviours.

6.1       Future work
The tool presented in this thesis could be improved to scale better with the demand for
longer collection periods. As demonstrated in “A search engine backed by Internet-wide
scanning” [4], there is considerable potential to increase the number of requests made by
the tool. Together with a more specialized database, it could also be extended to use other
forms of revocation checking, such as Micali’s enhanced certificate revocation system [16] or
the certificate update scheme suggested by Naor and Nissim [17]. This could provide better
performance and additional data points.
   Presenting the findings together with other metrics, such as MERCAT [7], could also provide
greater insight into how the different CAs handle certificates.

Bibliography

 [1]   J. A. Berkowsky and T. Hayajneh. “Security issues with certificate authorities”. In:
       Proc. IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication
       Conference (UEMCON). 2017, pp. 449–455.
 [2]   L. Chuat, A. Abdou, R. Sasse, C. Sprenger, D. Basin, and A. Perrig. “SoK: Delegation
       and revocation, the missing links in the web’s chain of trust”. In: Proc. IEEE European
       Symposium on Security and Privacy (EuroS&P). 2020.
 [3]   D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley, and W. Polk. Internet X.509
       Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. RFC 5280.
       RFC Editor, May 2008.
 [4]   Z. Durumeric, D. Adrian, A. Mirian, M. Bailey, and J. A. Halderman. “A search engine
       backed by Internet-wide scanning”. In: Proc. ACM SIGSAC Conference on Computer and
       Communications Security. 2015, pp. 542–553.
 [5]   Z. Durumeric, F. Li, J. Kasten, J. Amann, J. Beekman, M. Payer, C. Weaver, D. Adrian,
       V. Paxson, M. Bailey, et al. “The matter of Heartbleed”. In: Proc. Conference on Internet
       Measurement Conference. 2014, pp. 475–488.
 [6]   J. Gustafsson, G. Overier, M. Arlitt, and N. Carlsson. “A first look at the CT landscape:
       Certificate transparency logs in practice”. In: Proc. International Passive and Active Net-
       work Measurement. Springer. 2017, pp. 87–99.
 [7]   M. P. Heinl, A. Giehl, N. Wiedermann, S. Plaga, and F. Kargl. “MERCAT: A Metric for
       the Evaluation and Reconsideration of Certificate Authority Trustworthiness”. In: Proc.
       ACM SIGSAC Conference on Cloud Computing Security Workshop. New York, NY, USA,
       2019, pp. 1–15.
 [8]   How CT works : Certificate transparency. https://certificate.transparency.dev/howctworks/.
       Accessed: 2021-3-18.
 [9]   T. Hyun-Jin Kim, L. Huang, A. Perrig, C. Jackson, and V. Gligor. “Accountable key
       infrastructure (AKI): a proposal for a public-key validation infrastructure”. In: Proc.
       International Conference on World Wide Web. 2013, pp. 679–690.
[10]   N. Korzhitskii and N. Carlsson. “Characterizing the Root Landscape of Certificate
       Transparency Logs”. In: Proc. IFIP Networking. June 2020, pp. 190–198.
[11]   N. Korzhitskii and N. Carlsson. “Revocation Statuses on the Internet”. In: Proc. Passive
       and Active Measurement. 2021, pp. 175–191.


[12]   D. Kumar, Z. Wang, M. Hyder, J. Dickinson, G. Beck, D. Adrian, J. Mason, Z. Durumeric,
       J. A. Halderman, and M. Bailey. “Tracking certificate misissuance in the wild”. In: Proc.
       IEEE Symposium on Security and Privacy (SP). 2018, pp. 785–798.
[13]   A. Langley. ImperialViolet. https://www.imperialviolet.org/2014/04/19/
       revchecking.html. Accessed: 2021-3-18.
[14]   B. Laurie, A. Langley, and E. Kasper. Certificate Transparency. RFC 6962. RFC Editor,
       June 2013.
[15]   Y. Liu, W. Tome, L. Zhang, D. Choffnes, D. Levin, B. Maggs, A. Mislove, A. Schulman,
       and C. Wilson. “An End-to-End Measurement of Certificate Revocation in the Web’s
       PKI”. In: Proc. Internet Measurement Conference. 2015, pp. 183–196.
[16]   S. Micali. “Enhanced certificate revocation system”. In: Massachusetts Institute of Tech-
       nology, Cambridge, MA (1995), pp. 1–10.
[17]   M. Naor and K. Nissim. “Certificate revocation and certificate update”. In: IEEE Journal
       on selected areas in communications 18.4 (2000), pp. 561–570.
[18]   C. Nykvist, L. Sjöström, J. Gustafsson, and N. Carlsson. “Server-side adoption of cer-
       tificate transparency”. In: Proc. International Passive and Active Network Measurement.
       Springer. 2018, pp. 186–199.
[19]   A. Retana and D. Cheng. OSPFv3 Instance ID Registry Update. RFC 6969. RFC Editor,
       July 2013.
[20]   M. Sahni. Online Certificate Status Protocol (OCSP) Nonce Extension. RFC 8954. RFC Edi-
       tor, Nov. 2020.
[21]   S. Santesson, M. Myers, R. Ankney, A. Malpani, S. Galperin, and C. Adams. X.509 In-
       ternet Public Key Infrastructure Online Certificate Status Protocol - OCSP. RFC 6960. RFC
       Editor, June 2013.
[22]   E. Stark, R. Sleevi, R. Muminovic, D. O’Brien, E. Messeri, A. P. Felt, B. McMillion, and
       P. Tabriz. “Does certificate transparency break the web? Measuring adoption and error
       rate”. In: Proc. IEEE Symposium on Security and Privacy (SP). 2019, pp. 211–226.
[23]   The Go Programming Language Specification. https://golang.org/ref/spec. Accessed:
       2021-04-29.
[24]   What Are the Different Types of SSL Certificates? Aug. 2013.
       https://pkic.org/2013/08/07/what-are-the-different-types-of-ssl-certificates/#organizational-validation-ov
[25]   L. Zhang, D. Choffnes, D. Levin, T. Dumitraş, A. Mislove, A. Schulman, and C. Wilson.
       “Analysis of SSL certificate reissues and revocations in the wake of Heartbleed”. In:
       Proc. Conference on Internet Measurement Conference. 2014, pp. 489–502.
