TRANSPARENT ENCRYPTION FOR CLOUD-BASED SERVICES - Gergő Ládi - Dr. Levente Buttyán - HTE

Budapest University of Technology and Economics
 Faculty of Electrical Engineering and Informatics
 Department of Networked Systems and Services

 Gergő Ládi

TRANSPARENT ENCRYPTION
FOR CLOUD-BASED SERVICES

 ADVISOR

 Dr. Levente Buttyán
 BUDAPEST, 2017
Table of Contents
Összefoglaló (Summary)
Abstract
1 Introduction
 1.1 Definitions
 1.2 Problem Statement
 1.3 Challenges
 1.4 Outline
2 Goals, Tasks, Objectives, and Strategies
 2.1 Goals
 2.2 Tasks
 2.3 Objectives and Strategies
3 Initial Research
 3.1 Related Work
  3.1.1 Publications
  3.1.2 Similar Software
   3.1.2.1 Boxcryptor
   3.1.2.2 Cipherdocs
   3.1.2.3 CloudFogger
   3.1.2.4 SeaFile
   3.1.2.5 Cryptomator
   3.1.2.6 Tresorit
  3.1.3 Summary of Related Work
 3.2 Enumerating Potential Services for Encryption
4 Picking and Analysing Services
 4.1 Detailed Analysis of Evernote's Communication Protocol
  4.1.1 Protocol Analysis
  4.1.2 Message Analysis
   4.1.2.1 Initial Page Load
   4.1.2.2 Reading Note Contents
   4.1.2.3 Creating Notes
   4.1.2.4 Editing Notes
   4.1.2.5 Editing Reminders
   4.1.2.6 Deleting Notes
  4.1.3 Analysis Summary
 4.2 Detailed Analysis of Google Calendar’s Communication Protocol
  4.2.1 Protocol Analysis
  4.2.2 Message Analysis
   4.2.2.1 Initial Page Load
   4.2.2.2 Dynamic Loading
   4.2.2.3 Creating Events
   4.2.2.4 Editing Events
   4.2.2.5 Deleting Events
  4.2.3 Analysis Summary
 4.3 Quick Analyses
  4.3.1 Dropbox
  4.3.2 Dynalist
  4.3.3 OneNote (Online)
  4.3.4 SimpleNote
 4.4 Analysis Summary
5 Designing a Transparent Encryption Layer
 5.1 Intercepting Traffic
  5.1.1 Hijacking DNS Queries
  5.1.2 Proxying Connections
  5.1.3 Handling Certificates
   5.1.3.1 The “Problem” with Certificates
   5.1.3.2 Becoming a Trusted Root Certificate Authority
   5.1.3.3 Validating the Provider’s Certificate
 5.2 Inspecting and Altering Traffic
 5.3 Encrypting/Decrypting Messages
  5.3.1 Key Management
  5.3.2 Using Format Preserving Encryption
   5.3.2.1 Format-Preserving Encryption for Text
   5.3.2.2 Format-Preserving Encryption for Date and Time
 5.4 Design Summary
6 Implementing a Prototype
 6.1 Intercepting traffic
  6.1.1 DNS Hijacking
  6.1.2 Creating Certificates
  6.1.3 Implementing the Proxy
 6.2 Inspecting and Altering Traffic
 6.3 Encrypting/Decrypting Messages
  6.3.1 Key Management
  6.3.2 Initialization Vectors
  6.3.3 Format preserving encryption
   6.3.3.1 Format-Preserving Encryption for Text
   6.3.3.2 Format-Preserving Encryption for Date and Time
7 Testing the Prototype
 7.1 Smoke Testing
  7.1.1 Smoke Testing the DNS Hijacking Component
  7.1.2 Smoke Testing the TLS Proxy
  7.1.3 Smoke Testing the FPE Module
 7.2 Unit Testing
  7.2.1 Unit Testing the Filters
  7.2.2 Unit Testing the FPE Module
 7.3 Integration Testing
8 Conclusion
9 Further Considerations
 9.1 Possible Threats
  9.1.1 Ever-Changing APIs
  9.1.2 New Security Measures
 9.2 Plans for Improvement
  9.2.1 Supporting Multiple Users
  9.2.2 Usage in Enterprise Environments
  9.2.3 More Services
  9.2.4 Linux Compatibility
  9.2.5 User Experience
References
Appendix
 A. Table of Abbreviations
 B. Table of Figures
 C. Table of Exhibits
 D. Exhibits
STUDENT DECLARATION

I, the undersigned Gergő Ládi, a final-year student, declare that I prepared this thesis
myself, without any unauthorized assistance, and that I used only the listed sources
(literature, tools, etc.). Every part that I have taken from another source, either verbatim
or rephrased with the same meaning, is clearly marked with a reference to the
source.

I consent to BME VIK publishing the basic data of this work (author(s), title, abstracts
in English and Hungarian, year of preparation, name(s) of advisor(s)) in a publicly
accessible electronic form, and the full text of the work through the university's internal
network (or for authenticated users). I declare that the submitted work and its electronic
version are identical. For theses classified with the Dean's permission, the text of the
thesis becomes accessible only after 3 years have passed.

Dated: Budapest, 17 December 2017

 Gergő Ládi
Összefoglaló (Summary)

 According to this year's analyses, cloud-based services are becoming more and
more popular, both in enterprise settings and among home users. Whether it is file
storage, e-mail, calendar and time management, note taking, or even password
management, we readily make use of online services. From a security standpoint, this
carries numerous dangers: our data may be lost, damaged, or even worse, end up in
unauthorized hands.

 In the past few years, we could read about dozens of security incidents in the news,
whose victims included small and large, famous and lesser-known companies alike.
During these incidents, the data of several hundred thousand users was often leaked and
made public on the internet. This is dangerous not only because potentially confidential
information (for example private messages or trade secrets) may fall into the hands of
competitors, but also because the data may include passwords or password-equivalent
data. In possession of such data, an attacker may become able to access the victims'
accounts at other services, and thereby obtain even more potentially sensitive data.

 One possible solution to the above problem is so-called transparent encryption.
Its operating principle is that data is encrypted locally, before it is forwarded to the
provider (so it only arrives there in encrypted form), and later, before the client
processes it, it is decrypted, also locally. As a result, even if a provider falls victim to a
break-in, or an attacker simply logs in with a username and password pair obtained
elsewhere, the attacker sees only the encrypted data set, which is worthless to them.

 Within my thesis, I choose a cloud-based service, analyse the communication
protocol it uses, then design and build a piece of software that can provide security by
transparently encrypting the relevant messages of the traffic directed to the
provider.
Abstract

 Recent surveys have shown that cloud services are becoming more and more
popular, both in the enterprise sector and among individuals. Be it online file storage,
e-mail, calendar & time management, note taking, or even password management, we rely
heavily on online services. From a security standpoint, this poses several risks – data
might be lost or corrupted, or even worse: accessed by unauthorized individuals.

 In the past few years, dozens of security incidents, hitting big and small, little-known
and famous companies alike, were covered in the news. Many of these breaches resulted
in several hundreds of thousands of user records being leaked and made available on the
internet – some exclusively on the black market, others to the general public. This is not
only dangerous because potentially confidential information (such as private messages or
trade secrets) might get into the hands of competitors, but also because user records may
contain passwords or password equivalents. Using said information, it might be possible
for an adversary to get into the accounts of these victims for other services, gaining access
to even more potentially sensitive information.

 One of the possible solutions to this issue is employing transparent encryption,
the principle of which is to encrypt information locally, before it is sent to the cloud
service provider (where it is stored in an encrypted form), and then, upon reception,
decrypt it before it is processed by the local client. This way, even if the cloud service
provider itself is compromised, or is accessed using stolen credentials, the attacker can
obtain nothing but encrypted pieces of information.

 Within this thesis, I am to choose a cloud service, analyse the communication
protocol used, then design and implement a piece of software that can perform transparent
encryption by identifying and modifying the relevant messages in transport.

1 Introduction

1.1 Definitions
 For the purposes of this document, unless otherwise noted, the terms cloud
service, cloud service provider, service provider, and provider refer to a service (and the
company that provides said service) that is available through the internet and that lets its
users upload and store user data.

 User data shall refer to documents, images, and other files, as well as personal
information including but not limited to names, addresses, telephone numbers, and dates
of birth that need not be known by the provider in order to fulfil its purpose. For example,
in the case of a web shop, a shopper’s address and telephone number are not considered
user data as these are needed to ensure delivery, while in the case of a contact manager
application, they are.

1.2 Problem Statement
 Over the course of the past couple of years, we could see public cloud-based
services gain ground over traditional self-hosted or serverless solutions. This shift
towards public cloud services could be observed not only in the enterprise sector, but also
among home users. Gartner's forecast, titled Public Cloud Services, Worldwide,
2014-2020, corroborates these observations, and further adds that this process is not expected
to stop in the following years, although the speed of change may begin decreasing,
starting from 2020 [1]. A different publication by RightScale points out that the typical
user, as of 2017, leverages 1.8 public cloud services on average as part of his daily routine
and is experimenting with a further 1.8 services [2].

 This growing interest resulted in several existing companies adapting their
software and services to the cloud, as well as new companies entering the market,
promising easy-to-use applications that provide access to your information, regardless of
which one of your devices you're using. Services appeared providing online file storage
and synchronization, calendar management, image sharing, note taking, or even password
management. Either to speed up the process of development in order to be first, or simply
to cut costs, security analyses were often skipped. As it was later revealed, the lack of
security measures and improper security design were the two main reasons for most of
recent years' data breaches [3]. These breaches affected big and small companies alike,
causing loss of reputation, revenue, and their users' trust. To make matters worse, some of
these incidents are not discovered until several months, or even years, have passed.

 The risks of using public cloud services are fivefold:

 1) Data loss: If a provider ceases operations without notifying clients, clients
 lose their files unless they have other copies of these. An example of this
 would be the sudden closure of MegaUpload, a file storage and sharing
 service, in 2012.

 2) Direct data theft: If a provider is breached where potentially sensitive
 information is stored (e.g. files, images, or notes), these could be accessed
 by unauthorized individuals.

 3) Indirect data theft: If any site or service is breached where the user hasn't
 stored anything valuable, the attacker may still gain access to the users'
 email addresses and passwords or equivalent authentication information
 (such as hashes of passwords). In this case, if the users had accounts with
 different providers where they used the same usernames and passwords,
 the attackers will be able to log in and access the users' sensitive
 information, even if the service itself was reasonably secure.

 4) Insider access: The provider itself may access the users' sensitive
 information without their knowledge or permission. This could be a
 malicious employee or a system designed to perform data mining on user
 data to extract features deemed interesting in order to create user profiles
 to be sold or otherwise used for profit.

 5) Nation state attackers: The provider may be forced to, or may decide to
 hand over user data to nation states or law enforcement agencies, which
 could put the users' lives at risk. A good example for this could be Arabic
 countries, where the internet is heavily regulated, and people having
 opposing views to the current political party are often chased down.

 The first risk may be eliminated by having a proper backup plan (such as the
3-2-1 strategy¹) in place. The rest of these risks could also be eliminated, or at least greatly
reduced, by employing transparent encryption. The principle of transparent encryption is
that data is intercepted on the client computer (or on a trusted device on the home
network) before the client application would send it over the network unencrypted, and is
encrypted by a separate piece of software. This encrypted data is then received and stored
by the provider. When the client application needs this data later, it requests the data from
the server, which then sends it back, still encrypted, to the client. Before it can be
processed by the client application, it is intercepted again by the previously mentioned
software, and is decrypted on the fly. Finally, the client application receives the data
unencrypted, just as it was expecting it, then processes the data as needed. In case an
attacker manages to get his hands on a file (in any of the manners detailed above) that
was encrypted this way, he will have gained nothing but a blob of garbage (from his point
of view). This approach can be extended to cover not only files, but also other kinds of
potentially sensitive information, such as text fields, dates, or credit card numbers.
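
 The round trip described above can be illustrated with a minimal Python sketch. It is
not the prototype developed in this thesis: the Provider and EncryptionLayer classes and
the keystream_xor helper are hypothetical names introduced for this example only, and the
SHA-256-based keystream is a toy stand-in for a vetted cipher such as AES.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter) blocks.

    For illustration only; a real layer would use a vetted cipher such as AES.
    Applying the function twice with the same key and nonce restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class Provider:
    """Stands in for the cloud service: it stores whatever it receives."""

    def __init__(self) -> None:
        self.storage: dict[str, bytes] = {}

    def upload(self, name: str, body: bytes) -> None:
        self.storage[name] = body

    def download(self, name: str) -> bytes:
        return self.storage[name]


class EncryptionLayer:
    """Sits between the client and the provider and rewrites message bodies."""

    def __init__(self, provider: Provider, key: bytes) -> None:
        self.provider = provider
        self.key = key

    def upload(self, name: str, body: bytes) -> None:
        # Outgoing data is encrypted before it leaves the client side.
        nonce = secrets.token_bytes(16)
        self.provider.upload(name, nonce + keystream_xor(self.key, nonce, body))

    def download(self, name: str) -> bytes:
        # Incoming data is decrypted before the client application sees it.
        stored = self.provider.download(name)
        nonce, ciphertext = stored[:16], stored[16:]
        return keystream_xor(self.key, nonce, ciphertext)


provider = Provider()
layer = EncryptionLayer(provider, key=secrets.token_bytes(32))
layer.upload("note.txt", b"meet at 5 pm")
print(provider.storage["note.txt"])  # what the provider (or an attacker) sees
print(layer.download("note.txt"))    # what the client application receives
```

 The client code calls upload and download exactly as it would on the provider itself,
which is the sense in which the layer is transparent in this sketch.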

 Transparent encryption has the advantage that neither the client application nor
the server-side code has to be modified in any way, therefore it can be used even if the
service provider does not support such extra security measures, and the provider itself
does not have to spend resources implementing said measures. In addition, if the
encryption layer is implemented by independent developers, the provider, which might
otherwise implement intentionally weak or flawed encryption or leave backdoors, does
not have to be trusted. Furthermore, since transparent encryption is implemented as a
separate piece of software, it can be licensed under a different licensing model than the
original software – this is advantageous because the encryption software can be made
open source, even if the underlying client application is closed source.

¹ The 3-2-1 strategy: have 3 copies of your critical data. These copies should reside on at least 2
different kinds of media (e.g. two on hard disks, one on an optical disk), and 1 copy should be kept off-site.

1.3 Challenges
 When designing and implementing a transparent encryption layer, one may face
several challenges that need to be overcome in order to succeed. The most crucial part is
analysing and learning how the target service works and how the client application
communicates with the web service, followed by the design of encryption methods that
generate output that passes possible validation checks made by the providers.

 Firstly, if the communication channel itself is encrypted, the encryption method
has to be understood, circumvented, then reimplemented in the transparent encryption
layer. In the event that the service uses a non-standard or proprietary encryption
mechanism, these steps may end up being rather time-consuming. Secondly, in addition to being
encrypted, the channel may also be authenticated or otherwise tamper-proofed, which,
again, needs to be circumvented, and may even make it impossible to create a truly
transparent proxy. Thirdly, the protocol spoken by the parties also has to be, at least
partially, understood. This may turn out to be a custom, undocumented proprietary
binary protocol, which could take several days to reverse-engineer. Fourthly, some legal
agreements or country laws may not permit the disassembly and/or analysis of the client
application. Finally, in certain cases, the cloud service expects the data to conform to a
certain format and/or be in a specific range of values, and if encryption is used, the
ciphertext will most likely not meet these requirements.
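
 The last challenge can be made concrete with a short Python sketch. Everything in it is
an assumption made for illustration: provider_accepts is a hypothetical server-side check
that only accepts ISO dates, and shift_date is a toy keyed rotation within a fixed date
range. The rotation is not a secure cipher and not necessarily the method used later in
this thesis; it only shows that a transformed value can stay within the expected format.

```python
import base64
import hashlib
from datetime import date, timedelta

# Assumed range of dates that the hypothetical provider accepts.
FIRST, LAST = date(1970, 1, 1), date(2069, 12, 31)
DAYS = (LAST - FIRST).days + 1


def provider_accepts(value: str) -> bool:
    """Hypothetical server-side validation: the field must be an ISO date."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def _offset(key: bytes) -> int:
    """Derive a fixed day offset from the key."""
    return int.from_bytes(hashlib.sha256(key).digest(), "big") % DAYS


def shift_date(value: str, key: bytes, decrypt: bool = False) -> str:
    """Toy format-preserving transform: rotate a date within [FIRST, LAST]."""
    parsed = date.fromisoformat(value)
    if not FIRST <= parsed <= LAST:
        raise ValueError("date outside the supported range")
    step = -_offset(key) if decrypt else _offset(key)
    index = ((parsed - FIRST).days + step) % DAYS
    return (FIRST + timedelta(days=index)).isoformat()


# A Base64-encoded digest stands in for the output of an ordinary cipher.
opaque = base64.b64encode(hashlib.sha256(b"2017-12-17").digest()).decode()
print(provider_accepts(opaque))      # ordinary ciphertext fails validation

key = b"demo key"
hidden = shift_date("2017-12-17", key)
print(provider_accepts(hidden))      # the rotated value is still a valid date
print(shift_date(hidden, key, decrypt=True))
```

 Because the rotated value is itself a date in the accepted range, the provider stores it
without complaint, and the same key recovers the original value on the way back.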

1.4 Outline
 Within the next sections, I will choose and introduce a cloud-based service, find
a way of intercepting its messages, analyse these, then design, implement, and test a
prototype of a piece of software that is capable of performing transparent encryption on the
relevant messages.

 In section 2, I will explain in detail the problem to be solved, and the goals to be
reached. In section 3, I will research and provide an overview of similar works and results,
as well as identify services that could be possible candidates for this project. In section 4,
I will analyse at least one cloud-based software, its channels of communication, message
types, then identify which (parts of) these need to be encrypted to ensure the security of
user data. Then, in section 5, I will design a system that makes it possible to intercept and
encrypt/decrypt messages identified in the previous section. Afterwards, in section 6, I
will implement a prototype using the previous design. Then, in section 7, I will perform
tests on the prototype. In section 8, I will summarize my work, then, finally, in section 9,
I will explain how this prototype could be further improved, as well as list possible issues
that might make it difficult to use or maintain such a solution.

2 Goals, Tasks, Objectives, and Strategies

2.1 Goals
 Goals describe what must be achieved for the project to be considered successful.
As defined in the thesis assignment description, I will have six top-level goals:

 1) Choosing a cloud-based service that can be used to demonstrate transparent
 encryption

 2) Analysing the communication protocol that is used by the service

 3) Identifying the relevant protocol elements that should be encrypted in order to
 provide confidentiality

 4) Identifying a transparent encryption scheme that could be used to provide
 confidentiality, and perhaps, adapting it to better suit the current task

 5) Designing and implementing the encryption layer, and integrating it with the
 client-side application

 6) Testing the implementation and summarizing the results

 In addition to the above, I’ve chosen to add an extra goal as a sort of a zeroth step:
 researching similar or related existing solutions.

2.2 Tasks
 Tasks are technically breakdowns of goals.

 Goal 1) can be broken down to two main tasks:

 a) Making a list of potential services, possibly with the help of search engines
 and peers

 b) Choosing a service that is expected to use a limited set of relatively simple
 messages. It would also be ideal if this service was not completely
 unknown to me.

 Goal 2) will consist of four distinct tasks:

 a) Identifying the transport layer (OSI layer 4) protocol² that is used by the
 application

 b) Identifying whether the application uses a well-known higher-layer (e.g.
 OSI layer 7) protocol or not

 c) Identifying whether the protocol messages are encrypted and/or covered
 by integrity protection

 d) Identifying the structure of the protocol messages (if any)

 Goal 3) is comprised of the following three tasks:

 a) Identifying the data types that will need to be protected (e.g. files, text,
 images, phone numbers, etc.)

 b) Identifying the messages in which these are transmitted

 c) Identifying the location of these data elements within these messages

 Goal 4) includes another three tasks:

 a) Enumerating the algorithms that could be used

 b) Choosing the most suitable group of algorithms

 c) Identifying how the keys and other necessary parameters will be managed

 Goal 5) can be broken down to another set of three:

 a) Planning the architecture of the prototype, defining what components there
 are and how they interact with each other

 b) Choosing a paradigm and a language the prototype will be implemented
 in

 c) Implementing each component in the chosen language and paradigm

 Finally, goal 6) will include the following four tasks:

 a) Determining the type and number of tests needed

 b) Planning the tests, writing test cases

 c) Performing the tests

 d) Evaluating the results, drawing conclusions

² Open Systems Interconnection model: an abstract model that can be used to describe the means
and methods of network communication between hosts. It consists of 7 independent layers, with each layer
being responsible for a different role, such as physical or logical addressing, or retransmission of lost data.

 In addition, the previously mentioned zeroth goal will be made up of two tasks:

 a) Looking for related publications in conference archives and publication
 indexing services

 b) Using search engines to find related software

2.3 Objectives and Strategies
 Objectives define the deadline, while strategies define the order of completion.

 In my case, the tasks are mostly linear, with each task depending on the previous
ones.

 • The literature research phase should be done first so as to avoid possible
 duplicate work. It is expected to take one or two weeks, depending on the
 findings.

 • Goal 1) must be completed next as all others depend upon it. This should take
 at most one week.

 • Goal 2) must follow afterwards, as none of the others are available at this
 point. This task is expected to take one week.

 • Goals 3) and 4) may be done in parallel, although it would be preferable to
 finish goal 3) before starting goal 4). These are expected to take two and three
 weeks, respectively.

 • Goal 5) depends on all previous goals and is the only one that can be
 completed next. It is expected to take four weeks.

 • Finally, goal 6) can be completed. This should take two weeks.

3 Initial Research

3.1 Related Work

3.1.1 Publications
 Based on searches I conducted using the IEEE Xplore Digital Library and Google
Scholar, two of the biggest research databases, over 2 000 papers have been published to
date that are related to cloud and encryption. Approximately 200 papers are connected to
transparent encryption, while only a fifth of those are also related to cloud computing.
There exist some solutions for specific use cases, such as transparently encrypting data
stored in a local filesystem [4][5], in MongoDB databases [6], in HDFS³ [7][8], or
transmitted between virtual machines and their hosts [9][10].

 Application Layer Encryption for Cloud by Saxena et al. [11] describes a similar,
but not necessarily transparent method, user-layer encryption (as they name it). In 2012,
Diallo et al. published CloudProtect [12], a middleware written in the Java programming
language that can transparently encrypt certain data fields in Google Docs documents, as
well as Google Calendar items, while also making it possible to share encrypted items
with others. However, the solution presented in this paper only seems to consider
unencrypted HTTP4 sessions, which are less and less common these days. Just recently,
in 2017, Newport et al. described a system in their paper, A Secure Cloud Storage System
for Small and Medium Enterprises [13], that is capable of encrypting files on the fly, then
storing them in Dropbox, a cloud-based file storage service. Although the exact details of
the method are not specified, I would infer from the examples that this solution works by
creating a local folder, looking for changes in the files and folders within, then encrypting
the changed files, putting them in an actual Dropbox folder, to be uploaded to the cloud.

 3
 Hadoop Distributed File System: a distributed file system that stores data on commodity
machines and is often used in clustered environments.
 4
 Hypertext Transfer Protocol: a text-based application layer (OSI L7) protocol that is most
commonly used by browsers to access web-based content.

3.1.2 Similar Software
 The next phase of my research consisted of discovering software that was similar
in functionality to what I aim to achieve within this thesis.

3.1.2.1 Boxcryptor
 The first solution I found was Boxcryptor. It is closed-source software that
creates a virtual drive that can be used for secure storage. When a file is written to this
drive, it is encrypted first, then stored in one of the supported cloud storage services
(this approach is often called the overlay method as it overlays an existing solution). The
free version only supports Dropbox as its back-end, while the paid version includes
support for Google Drive, OneDrive, Box, Cubby, and several others [14]. It uses AES5
with a key length of 256 bits in CBC6 mode to encrypt files, with each file being encrypted
with a different key. The AES keys are encrypted with the user’s 4096-bit RSA7 key, and
are appended to the files [15]. Boxcryptor also supports master keys and sharing files with
other users or groups. It works on Windows and Linux, as well as several mobile
platforms.

3.1.2.2 Cipherdocs
 The next solution I came across was cipherdocs. It is an open-source project,
available on GitHub8. According to the project documentation [16], it works with most
cloud storage solutions, including Dropbox, Google Drive and OneDrive. Instead of the
overlay method, it changes the file extensions of the encrypted files to .gpg, which, when
opened, are decrypted into a temporary (local) folder, then re-encrypted and moved back
if changed. It uses an OpenPGP9 implementation to encrypt files. It is designed for single-
user mode, so sharing encrypted files is not supported. It works on Windows only.

 5
 Advanced Encryption Standard: a commonly used symmetric encryption algorithm for which no
known feasible attacks exist as of today
 6
 Cipher Block Chaining: a mode of operation of a block cipher, in which the input of the nth
encryption function is not just the nth block of plaintext, but the nth plaintext block XORed (bitwise
eXclusive OR) with the ciphertext of the (n-1)th block. For the first block, the first plaintext block is XORed
with an initialization vector.
 7
 Rivest-Shamir-Adleman: a commonly used asymmetric encryption algorithm, the strength of
which is based on the factoring problem (factorizing the product of two large primes)
 8
 A development platform that makes it easy to share code, enable other programmers to propose
changes to the code, track issues, publish releases, automate testing, and manage projects.

3.1.2.3 CloudFogger
 A third provider I intended to check out was CloudFogger, but as of 4 October 2017,
only a notice appears on their website 10, stating that the service is no longer available
and recommending that users try Boxcryptor instead.

3.1.2.4 SeaFile
 The next solution to be investigated was SeaFile, an open-source, Git11-based
storage system. It is self-hosted, meaning that the server has to be installed by, and the
storage space has to be provided by the user, requiring more knowledge of computer
systems than the previously introduced solutions. It supports sharing files among users
and groups [17]. Neither the website nor their GitHub page mentions any specifics of the
encryption process. It supports Windows and Linux.

3.1.2.5 Cryptomator
 Another application I tried was Cryptomator. It is free and open-source, and acts
as an overlay above Dropbox, Google Drive, and other services, just like Boxcryptor. It
uses AES with 256-bit long keys, and encrypts file names as well as the structure of the
folders. Written in Java, it works on Windows and Linux as well.

3.1.2.6 Tresorit
Last, but not least, I checked out Tresorit. It turned out to be different: it is a stand-alone
service that has its own client, and does not overlay an existing service. It supports 2-
factor authentication, has a version history, and the more expensive plans include sharing
files. It uses AES-256 for encryption. It is closed-source and has no free plan.

 9
 An e-mail encryption framework based on Phil Zimmermann’s software titled Pretty Good
Privacy (PGP).
 10
 https://www.cloudfogger.com
 11
 A distributed version control system for files, often used by programmers.

3.1.3 Summary of Related Work
 Based on the above, it can be concluded that the need for encryption for cloud-
based services has already been recognized, resulting in several publications and
implementations.

 As for the papers, while many are related to my project, some of them employ
non-transparent methods, some of them focus on enterprise-level solutions more than
home users, and some of them just describe what should be done, but not how. It is
apparent, however, that by combining existing proposals and extending the result with
some of my ideas, it is possible to build a system that solves the problems detailed in the
problem statement.

 As for the implementations, it can be said that none of them are truly transparent
in that they all require the end-user to change how they use the underlying cloud service
(for example, by having to store and manage files on a new virtual drive instead of
Dropbox). Furthermore, it seems that all the solutions focus on securing files, and there
are no well-known implementations to secure note-taking applications or calendars.

3.2 Enumerating Potential Services for Encryption
 I used three different approaches to find cloud-based services that could possibly
be made more secure using transparent encryption. First, I listed the services that I
currently use, used before, or at least have heard about. Second, I asked some of my
colleagues and friends about which services they used. Third, I used three different search
engines, Google, Yahoo and Bing, querying for typical search terms such as online file
storage, online calendar, note management, and alternatives, to see if
there's anything I missed.

 After compiling the lists from all the sources, I ended up with Table 3.1. The table
shows the services found in alphabetical order, the type of the service, and the source
where the entry came from.

 Service               Type                   Source
 Apple Cloud           file storage           colleagues
 Box.com               file storage           heard about
 Dropbox               file storage           used before
 Dynalist.io           note/task management   Yahoo
 Evernote              note management        used before
 Google Calendar       calendar management    used before
 Google Drive          file storage           used before
 Google Keep           calendar management    heard about
 Note Taking Express   note management        Bing
 OneDrive              file storage           using it
 OneNote (online)      note management        colleagues
 Outlook (online)      calendar management    colleagues
 SimpleNote            note management        Google
 SpiderOak             file storage           colleagues
 Sync.com              file storage           Google
 Zoho Notebook         note management        Google
Table 3.1 – A list of services that could potentially be encrypted transparently

4 Picking and Analysing Services

 Having found that there exist several solutions that focus on securing online file
storage services, but none that aim to secure calendar applications and note-taking
services, I have chosen to focus on these latter two categories. Even though the
assignment description only requires me to analyse one service, if time permits, I would
like to examine more of them in order to gain insight into current trends. I expect this to
help me design the transparent encryption layer more efficiently later.

 Evernote was chosen as the first service to be analysed. As its name
suggests, it is a note-taking application that also supports reminders, task lists, and
interactive content. Even though Evernote isn't the most popular service on the list, I have
previously worked with its API12 as part of a related project. The API isn't exactly what
I'd call simple, but being at least somewhat familiar with it makes Evernote an ideal
first candidate.

4.1 Detailed Analysis of Evernote's Communication Protocol

4.1.1 Protocol Analysis
 Evernote has a web-based, a desktop, and a mobile client. This makes it
logical to assume that all of these clients use a common API, which is likely to be a web-
based API. To confirm this hypothesis, I opened Firefox, a web browser, loaded the
landing page (https://www.evernote.com), then pressed F12 to show the browser’s
Developer Tools window. Although it may go by a different name, all modern browsers
have this feature today. This feature makes it possible to inspect and manipulate the
DOM13 of websites, run/inject arbitrary JavaScript code locally, perform benchmarks, test
the layout with various screen sizes and aspect ratios, and, of course, take a peek at what
is being sent on the network (see Figure 4.1).

 12
 Application Programming Interface: a set of data structures and function declarations that
describe how a service that implements this API may be consumed (interacted with).
 13
 Document Object Model: a model that treats an (X)HTML document as a tree, where each node
represents an element in the document.

Figure 4.1 – Evernote's messages in Firefox's Developer Tools

 It can be seen that the web client of Evernote uses HTTP over SSL 14/TLS15 over
TCP16 to send and receive messages.

 To further confirm the hypothesis, I downloaded and installed the Evernote
application from Play Store, the official application store of Google. After installation, I
made sure that my Android-based phone was connecting to the same WiFi network my
laptop was on, then opened Wireshark, a network packet capture/analysis application.
After logging in to Evernote on the mobile, I could see in Wireshark that it was connecting
to the same IP address as the browser, and that it was also using TLS. I could not see what
was being transmitted, but this is expected since the traffic itself is encrypted.

 14
 Secure Sockets Layer: a set of cryptographic protocols that aim to secure communication
channels. Typical services include encryption, integrity protection and authentication.
 15
 Transport Layer Security: a (more secure) successor of SSL.
 16
 Transmission Control Protocol: a connection-oriented transport layer (OSI L4) protocol that
offers reliability via the retransmission of lost segments. It also reorders out-of-order segments and supports
flow control.

 Having confirmed the hypothesis, I proceeded with the analysis. As part of it, I logged
in, created two notes, set their title and contents, added a reminder to one of them, changed
the contents, changed the date of the reminder, created a third note, deleted it, and then
examined the results:

 • For all of the actions that create or change a note, AJAX17 calls are made
 to the Evernote API. The HTTP method (verb) used for these calls is
 always POST, no matter whether the action is a read, update, or delete
 operation. From this, I can conclude that the Evernote API is not a
 RESTful18 API.

 • The endpoint for the relevant messages is always either
 https://www.evernote.com/shard/s###/enweb/notestore or
 https://www.evernote.com/shard/s###/enweb/notestore/ext, where ### is
 a three-digit number, possibly referring to the specific server on which my
 session exists.

 • The response always has the MIME19 type text/json, although the response
 body is never a valid JSON20 object. It always begins with two forward
 slashes and the letters O and K (i.e. //OK). In JavaScript, two forward
 slashes start a comment, but this is not valid in JSON. The prefix is most
 likely a security measure against object hijacking attacks via script
 inclusion: if the response is included maliciously as a script, the two
 forward slashes make the entire response behave like a comment. The actual
 client implementations can simply skip the first four characters of each
 response, and parse the rest as a regular JSON-style array.

 17
 Asynchronous JavaScript and XML: a means of performing asynchronous calls in web
applications that is typically used to dynamically load page contents without having to reload the entire
page.
 18
 Representational State Transfer: a REST (or RESTful) API is a stateless API that identifies the
resource to be queried or manipulated in the URL, and uses HTTP verbs to specify the action to be carried
out on the target resource.
 19
 Multipurpose Internet Mail Extensions: originally designed to describe the types and contents
of files that are sent as e-mail attachments, MIME has been adopted to be used in other protocols as well.
 20
 JavaScript Object Notation: a JavaScript-like data format that is commonly used in modern web
APIs, especially object-oriented ones.

 • The request body always consists of several tens of data fields separated
 by pipe symbols ( | ). Some of the fields contain values such as
 java.lang.String/2004016611, hinting that these fields may be Java objects
 serialized into strings. After some investigation, I identified the
 framework as GWT 21, and the protocol being spoken as the GWT-RPC22 wire
 protocol. While there is no official documentation for this, it was reverse-
 engineered in 2012 [18]. Even though GWT-RPC parsers are only
 available in Java, this does not limit the languages that can be used to
 implement a transparent proxy for Evernote since we don’t need to parse
 and interpret each field in the message, only specific ones that are always
 in the same position.

 • The messages have no replay or integrity protection. I was able to replay
 an API call using Firefox's Edit and Resend function. I changed several
 letters in the content of a note that was to be edited, then replayed the
 message. The server responded with a 200 OK message. After a reload,
 the note had its content successfully modified.
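
 The prefix handling described in the list above can be sketched in a few lines of
Python. This is only an illustration, not code taken from Evernote or GWT: the function
name parse_gwt_response and the sample string are made up. Since the captured responses
contain single-quoted strings, which Python's strict json module rejects, the sketch
assumes that the payload only uses literal syntax that ast.literal_eval also accepts
(numbers, quoted strings, nested arrays).

```python
import ast

GWT_OK_PREFIX = "//OK"  # the four characters every observed response starts with


def parse_gwt_response(body: str) -> list:
    """Strip the //OK prefix and parse the remaining array literal."""
    if not body.startswith(GWT_OK_PREFIX):
        raise ValueError("unexpected response prefix: %r" % body[:4])
    payload = body[len(GWT_OK_PREFIX):]
    # Assumption: the payload is a plain array literal without any
    # JavaScript-only constructs, so literal_eval can evaluate it safely.
    return ast.literal_eval(payload)


if __name__ == "__main__":
    # Made-up, abbreviated response that mixes both quoting styles.
    sample = "//OK[2,1,[\"first\",'second'],0,7]"
    print(parse_gwt_response(sample))
```

 A response that does not start with the expected prefix is rejected instead of being
parsed, so that a proxy built on this idea fails loudly on unknown message types.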

4.1.2 Message Analysis

4.1.2.1 Initial Page Load
 After logging in, when the site is first loaded, a call to the findNotesMetadata
function is issued. As shown later, this function returns a list of notes that are owned by
the user that is logged in.

 7|0|10|https://www.evernote.com/focusclient/|78E137D3512D195F071EB90374365
 42A|com.evernote.web.shared.GWTNoteStoreInterface|findNotesMetadata|com.ev
 ernote.edam.notestore.NoteFilter/3387378272|I|com.evernote.edam.notestore.
 NotesMetadataResultSpec/2285571585|[Z/1413617015|Etc/GMT-1||1|2|3|4|4|5|6|
 6|7|5|8|4|1|1|1|0|0|0|0|0|0|0|2|0|9|10|0|50|7|8|11|1|0|1|1|0|0|1|0|1|1|0|1
 |0|1|0|1|0|1|0|1|0|1|

 21
 Google Web Toolkit: an open-source framework that makes it possible to create JavaScript-
based web applications in Java.
 22
 Remote Procedure Call: a means of implementing inter-process communication, in which the
caller can invoke methods on remote servers, then process the results as if they were the results of local
function calls.

 This request does not contain anything sensitive, but we can see that the version
of the GWT-RPC protocol being used is 7.
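
 The layout of such a request can be made explicit with a short Python sketch. It
assumes what the captured samples suggest rather than a documented specification: the
first field is the protocol version, the second is taken to be a flags field, the third
is the number of entries in a string table that follows, and the fourth payload field
is the 1-based index of the method name. The names GwtRequest, parse_gwt_request and
method_name are hypothetical, and the tail of the sample payload is shortened.

```python
from typing import List, NamedTuple


class GwtRequest(NamedTuple):
    version: int
    flags: int
    strings: List[str]   # the string table
    payload: List[str]   # remaining fields, mostly indices into the table


def parse_gwt_request(raw: str) -> GwtRequest:
    """Split a pipe-separated request into header, string table and payload."""
    fields = raw.split("|")
    if fields and fields[-1] == "":
        fields.pop()  # the requests end with a trailing pipe
    version, flags, count = int(fields[0]), int(fields[1]), int(fields[2])
    strings = fields[3:3 + count]
    if len(strings) != count:
        raise ValueError("string table is shorter than announced")
    return GwtRequest(version, flags, strings, fields[3 + count:])


def method_name(request: GwtRequest) -> str:
    # Assumption based on the samples: payload field 4 points at the method name.
    return request.strings[int(request.payload[3]) - 1]


# The findNotesMetadata request from above, with the payload tail shortened.
SAMPLE = (
    "7|0|10|https://www.evernote.com/focusclient/|"
    "78E137D3512D195F071EB9037436542A|"
    "com.evernote.web.shared.GWTNoteStoreInterface|findNotesMetadata|"
    "com.evernote.edam.notestore.NoteFilter/3387378272|I|"
    "com.evernote.edam.notestore.NotesMetadataResultSpec/2285571585|"
    "[Z/1413617015|Etc/GMT-1||1|2|3|4|4|5|6|6|7|5|8|4|1|1|1|0|"
)

if __name__ == "__main__":
    request = parse_gwt_request(SAMPLE)
    print(request.version, len(request.strings), method_name(request))
```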

 The response looks as follows:

 //OK[150,2,0,0,0,'VjcO6_o',0,11,0,8,0,0,10,'A','VjcO30g',0,'A',0,0,0,0,'A'
 ,'A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,6,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,12,2,
 5,0,0,0,1,1,0,6,2,4,'VjcPCUY',0,9,0,8,0,0,7,'A','VjcO$Kw',0,'A',0,0,0,0,'A
 ','A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,6,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,12,2
 ,5,0,0,0,1,1,0,6,2,4,2,3,1,1,1,3,2,1,["com.evernote.edam.notestore.NotesMe
 tadataList/3192047339","[Z/1413617015","java.util.ArrayList/4159755760","c
 om.evernote.edam.notestore.NoteMetadata/2574771094","com.evernote.edam.typ
 e.NoteAttributes/74627218","me4626","b1dc0239-b0b1-47ee-a17e-b69b66ba70e4"
 ,"My second note","9a8dddfd-0d87-4e7c-b81d-086a5aa3f9ee","My first
 note"],0,7]

 Here, I can see that I have two notes, each accompanied by what appears to be a GUID23.
From that, I can infer that notes are identified by GUIDs in the system. One has the
GUID of b1dc0239-b0b1-47ee-a17e-b69b66ba70e4 and the title My second note, the
other the GUID of 9a8dddfd-0d87-4e7c-b81d-086a5aa3f9ee and the title My first note.
The note titles here are of interest since these may contain sensitive information.
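
 To illustrate how a proxy might locate the titles, the following Python sketch pulls
GUID and title pairs out of such a response. It rests on two assumptions drawn only from
the sample above: the string table is the last nested array in the response, and each
title directly follows the GUID of its note. The function names are hypothetical, and
the sample is an abbreviated version of the captured response (most numeric fields and
some table entries are dropped).

```python
import ast
import re
from typing import List, Tuple

GUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def string_table(body: str) -> List[str]:
    """Return the last nested array of the response (assumed string table)."""
    data = ast.literal_eval(body[len("//OK"):])
    tables = [item for item in data if isinstance(item, list)]
    return tables[-1] if tables else []


def guid_title_pairs(body: str) -> List[Tuple[str, str]]:
    """Pair every GUID-looking entry with the entry that follows it."""
    table = string_table(body)
    pairs = []
    for index, entry in enumerate(table[:-1]):
        if isinstance(entry, str) and GUID_RE.match(entry):
            pairs.append((entry, table[index + 1]))
    return pairs


# Abbreviated version of the findNotesMetadata response shown above.
SAMPLE = (
    '//OK[6,2,4,2,3,1,1,1,3,2,1,['
    '"com.evernote.edam.notestore.NotesMetadataList/3192047339","me4626",'
    '"b1dc0239-b0b1-47ee-a17e-b69b66ba70e4","My second note",'
    '"9a8dddfd-0d87-4e7c-b81d-086a5aa3f9ee","My first note"],0,7]'
)

if __name__ == "__main__":
    for guid, title in guid_title_pairs(SAMPLE):
        print(guid, title)
```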

4.1.2.2 Reading Note Contents
 The content of notes is retrieved using getHtmlNoteContent:

 7|0|11|https://www.evernote.com/focusclient/|304280BD765DCDBF4A79609E10928
 B10|com.evernote.web.shared.GWTNoteStoreExtensions|getHtmlNoteContent|java
 .lang.String/2004016611|java.util.List|S|Z|b1dc0239-b0b1-47ee-a17e-b69b66b
 a70e4|java.util.ArrayList/4159755760|/shard/s652/res/|1|2|3|4|5|5|6|5|7|8|
 9|10|0|11|-1|0|

 The request contains the GUID of the note whose contents should be retrieved,
while the response carries the contents of the note:

 //OK[1,["\x3Cbody class\x3D\"ennote\"\x3E\x3Cdiv\x3ESecond note content.
 \x3Cbr clear\x3D\"none\"/\x3E\x3C/div\x3E\x3C/body\x3E"],0,7]

 23
 Globally Unique Identifier: a 128-bit hexadecimal number that is supposed to uniquely identify
an object. GUIDs usually are represented in the form 8-4-4-4-12, e.g. 12345678-90ab-cdef-1234-
567890abcdef

 It may be observed that note bodies are considered and handled as HTML24, with
the special characters (such as < and >) escaped. In a more readable format, the above
note body would be:
 <body class="ennote"><div>Second note content.<br clear="none"/></div></body>

 Obviously, note contents are prime targets for encryption.
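
 The escaping seen in the response can be undone field by field, without parsing the
whole message. The Python sketch below is a minimal illustration under stated
assumptions: it handles only the two-digit \xNN escapes and single-character escapes
such as \" that appear in the sample, not \uXXXX sequences, and the function name
unescape_js_string is hypothetical.

```python
import re

# Single-character escapes with a special meaning; anything else maps to itself.
_SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}

_ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def unescape_js_string(text: str) -> str:
    """Decode \\xNN and single-character backslash escapes in one pass."""
    def replace(match):
        token = match.group(1)
        if token[0] == "x" and len(token) == 3:
            return chr(int(token[1:], 16))       # e.g. \x3C becomes <
        return _SIMPLE_ESCAPES.get(token, token)  # e.g. \" becomes "
    return _ESCAPE_RE.sub(replace, text)


# The escaped note body from the getHtmlNoteContent response above.
ESCAPED_BODY = (
    r'\x3Cbody class\x3D\"ennote\"\x3E\x3Cdiv\x3ESecond note content.'
    r'\x3Cbr clear\x3D\"none\"/\x3E\x3C/div\x3E\x3C/body\x3E'
)

if __name__ == "__main__":
    print(unescape_js_string(ESCAPED_BODY))
```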

4.1.2.3 Creating Notes
 Notes are created by calling the createNote function.

 7|0|14|https://www.evernote.com/focusclient/|304280BD765DCDBF4A79609E10928
 B10|com.evernote.web.shared.GWTNoteStoreExtensions|createNote|com.evernote
 .edam.type.Note/4071998839|java.lang.String/2004016611|java.util.List|[Z/1
 413617015|com.evernote.edam.type.NoteAttributes/74627218|me4626|Untitled||
 java.util.ArrayList/4159755760|1|2|3|4|3|5|6|7|5|8|6|0|1|1|0|1|0|1|9|8|12|
 0|0|0|0|0|0|0|0|0|0|0|1|0|0|10|0|0|0|0|0|0|0|0|0|0|A|A|A|A|0|0|0|0|A|0|0|0
 |VjcXc9v|A|0|0|11|0|0|0|0|0|12|0|VjcXnr6|13|14|0|

 The function creates a new instance of the note object, sets its title to Untitled, and
its content to an empty string. There's nothing sensitive here.

 The response:

 //OK['VjcXnoo',155,8,0,0,0,0,0,7,0,6,'A','VjcXc5I',154,72,77,-51,-102,-
 43,36,23,-32,115,44,30,-68,-25,63,34,73,16,5,0,'A',0,0,0,0,'A','A','A',
 'A',0,0,0.0,0.0,0,0,0,0,0,0,4,0,0.0,1,0,0,0,0,0,0,0,0,0,0,0,12,2,3,1,1,1,0
 ,1,1,1,6,2,1,["com.evernote.edam.type.Note/4071998839","[Z/1413617015","co
 m.evernote.edam.type.NoteAttributes/74627218","me4626","[B/3308590456","b3
 ba7ec9-9b0a-4ec3-8bba-fea2d9d07537","Untitled"],0,7]

 Here, it is shown that the new, third note was created and that it was assigned a
GUID of b3ba7ec9-9b0a-4ec3-8bba-fea2d9d07537. There is nothing sensitive here,
either.

4.1.2.4 Editing Notes
 Changes to note titles and contents are sent using the updateNoteIfUsnMatches
function.

 7|0|16|https://www.evernote.com/focusclient/|304280BD765DCDBF4A79609E10928
 B10|com.evernote.web.shared.GWTNoteStoreExtensions|updateNoteIfUsnMatches|
 com.evernote.edam.type.Note/4071998839|java.lang.String/2004016611|java.ut
 il.List|[Z/1413617015|com.evernote.edam.type.NoteAttributes/74627218|me462
 6|[B/3308590456|b1dc0239-b0b1-47ee-a17e-b69b66ba70e4|My second note|
 Second note content with updates.
 |java.util.ArrayList/4159755760|1|2|3|4|3|5|6|7|5|8|6|1|1|1|0|1|1|1|9|8|12
 |0|0|0|0|0|0|0|0|0|0|0|0|0|0|10|0|0|0|0|0|0|0|0|0|0|A|A|A|A|0|0|0|0|A|0|11
 |16|-63|-38|103|95|59|-46|120|35|67|-28|-12|55|80|-97|87|-57|173|VjcO$Kw|A
 |12|0|13|0|0|0|0|0|14|150|VjcQlIZ|15|16|0|

 24
 Hypertext Markup Language: the markup language in which the layout of websites and web
applications are written.

 The request contains the GUID of the note to be edited, the new title, and the new
contents. The title and the contents are to be encrypted here.
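
 The idea of rewriting only the fields at known positions can be sketched as follows
in Python. Everything in this example is an assumption made for illustration: the
function names, the heavily abbreviated request, and the positions of the title and the
contents in its string table. The transformation is a Base64 placeholder, which is an
encoding and not encryption; it merely stands in for a real cipher and keeps the output
free of pipe symbols. Whether other fields of the request depend on the contents is not
handled here.

```python
import base64
from typing import Callable, Iterable


def rewrite_strings(raw: str, positions: Iterable[int],
                    transform: Callable[[str], str]) -> str:
    """Apply transform to the string table entries at the given 1-based positions."""
    fields = raw.split("|")
    count = int(fields[2])  # third field: size of the string table
    for position in positions:
        if not 1 <= position <= count:
            raise IndexError("position %d is outside the string table" % position)
        # Entry number `position` of the table lives at index 2 + position.
        fields[2 + position] = transform(fields[2 + position])
    return "|".join(fields)


def placeholder_transform(plaintext: str) -> str:
    # Stand-in only: Base64 is NOT encryption. A real proxy would encrypt
    # here and then encode the ciphertext so that it contains no pipes.
    return base64.urlsafe_b64encode(plaintext.encode("utf-8")).decode("ascii")


# Made-up, heavily abbreviated request with a four-entry string table.
SAMPLE = (
    "7|0|4|https://www.evernote.com/focusclient/|updateNoteIfUsnMatches|"
    "My second note|Second note content with updates.|1|2|3|4|"
)

if __name__ == "__main__":
    print(rewrite_strings(SAMPLE, [3, 4], placeholder_transform))
```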

 The response:

 //OK[1,'VjcQk84',151,9,0,0,0,0,0,8,0,7,'A','VjcO$Kw',186,56,-126,-45,-
 106,15,-115,-53,-4,75,84,33,48,-128,-12,-95,125,16,6,0,'A',0,0,0,0,'A',
 'A','A','A',0,0,0.0,0.0,0,0,0,0,0,0,5,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,12,2,4
 ,1,1,1,0,1,1,1,6,2,3,1,1,2,1,["com.evernote.edam.notestore.UpdateNoteIfUsn
 MatchesResult/226232967","[Z/1413617015","com.evernote.edam.type.Note/4071
 998839","com.evernote.edam.type.NoteAttributes/74627218","me4626","[B/3308
 590456","b1dc0239-b0b1-47ee-a17e-b69b66ba70e4","My second note"],0,7]

 The GUID and the note title are echoed back.

 The function name contains the conditional clause if USN matches, with
USN most likely meaning Update Sequence Number. USNs are used in multi-client
systems to ensure that an update cannot accidentally overwrite changes that were made
by another update in the meantime.
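
 The general mechanism can be illustrated with a small Python sketch of an update that
is only applied if the caller's sequence number is current. This is a generic example of
the technique, not a description of how Evernote's servers work; the class and method
names are hypothetical.

```python
class UsnMismatch(Exception):
    """Raised when an update is based on an outdated sequence number."""


class NoteStore:
    def __init__(self):
        self._notes = {}  # GUID -> (USN, content)

    def create(self, guid: str, content: str) -> int:
        self._notes[guid] = (1, content)
        return 1

    def read(self, guid: str):
        return self._notes[guid]

    def update_if_usn_matches(self, guid: str, expected_usn: int,
                              content: str) -> int:
        current_usn, _ = self._notes[guid]
        if current_usn != expected_usn:
            # Someone else updated the note since this client last read it.
            raise UsnMismatch("expected %d, found %d" % (expected_usn, current_usn))
        new_usn = current_usn + 1
        self._notes[guid] = (new_usn, content)
        return new_usn


if __name__ == "__main__":
    store = NoteStore()
    usn = store.create("note-1", "draft")
    store.update_if_usn_matches("note-1", usn, "edited by client A")
    try:
        # Client B still holds the old sequence number, so this is rejected.
        store.update_if_usn_matches("note-1", usn, "edited by client B")
    except UsnMismatch as error:
        print("update rejected:", error)
```

 The second client has to re-read the note and retry, which is what prevents its stale
copy from silently overwriting the first client's changes.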

4.1.2.5 Editing Reminders
 Reminders are set using the updateNote function.

 7|0|12|https://www.evernote.com/focusclient/|78E137D3512D195F071EB90374365
 42A|com.evernote.web.shared.GWTNoteStoreInterface|updateNote|com.evernote.
 edam.type.Note/4071998839|[Z/1413617015|com.evernote.edam.type.NoteAttribu
 tes/74627218|me4626|[B/3308590456|b1dc0239-b0b1-47ee-a17e-b69b66ba70e4|cd3
 306ff-56c9-4ae8-9def-75ee96af1ee8|My second note|1|2|3|4|1|5|5|6|6|1|1|1|0|
 1|1|1|7|6|12|0|0|0|0|0|1|0|1|0|0|0|0|0|0|8|0|0|0|0|0|0|0|0|0|0|A|VjcRAIZ|
 Vky2QMA|A|0|0|0|0|A|0|9|16|125|-95|12|-128|48|33|84|75|-4|-53|-115|15|-106
 -45|-126|56|186|VjcO$Kw|A|10|0|11|0|0|0|0|0|12|153|VjcQk84|

 The GUID of the note (and for some reason, its title) is sent to the server. The
request also contains several alphanumeric fields of length 7 (highlighted in blue or red).
I suspected that these fields encoded dates and times, but I wasn’t sure how. By changing
the reminder date multiple times, I figured out that the one highlighted in red stores the
date of the reminder. Smaller changes to the date resulted in smaller changes in
the value, identical dates resulted in the same value, and a later date resulted in a value that
succeeded the previous values in alphabetical order (in other words, I got a string that was
“greater” if the date was also “greater” than the previous one). I had a feeling these were