I Forgot Your Password: Randomness Attacks Against PHP Applications

Page created by Bernard Baldwin
 
CONTINUE READING
I Forgot Your Password: Randomness Attacks Against PHP Applications∗

                    George Argyros                                      Aggelos Kiayias
            Dept. of Informatics & Telecom.,                   Dept. of Informatics & Telecom.,
                  University of Athens,                               University of Athens,
             argyros.george@gmail.com                                aggelos@di.uoa.gr
                                                             & Computer Science and Engineering,
                                                             University of Connecticut, Storrs, USA.

                           Abstract                              PHP for example lacks a built-in cryptographically se-
                                                                 cure PRNG in its core and until recently, version 5.3, it
   We provide a number of practical techniques and               tottaly lacked a cryptographically secure randomness
algorithms for exploiting randomness vulnerabilities             generation function.
in PHP applications.We focus on the predictability of               This left PHP programmers with two options: They
password reset tokens and demonstrate how an attacker            will either implement their own PRNG from scratch
can take over user accounts in a web application via             or they will employ whatever functions are offered by
predicting or algorithmically derandomizing the PHP              the API in a “homebrew” and ad-hoc fashion. In ad-
core randomness generators. While our techniques are             dition, backwards compatibility and other issues (cf.
designed for the PHP language, the principles behind             section 2), often push the developers away even from
our techniques and our algorithms are independent of             the newly added randomness functions, making their
PHP and can readily apply to any system that utilizes            use very limited. As we will demonstrate and heavily
weak randomness generators or low entropy sources.               exploit in this work, this approach does not produce
Our results include: algorithms that reduce the entropy          secure web applications.
of time variables, identifying and exploiting vulnera-              Observe that using a low entropy source or a crypto-
bilities of the PHP system that enable the recovery or           graphically weak PRNG to produce randomness does
reconstruction of PRNG seeds, an experimental analy-             not necessarily imply that an attack is feasible against
sis of the Håstad-Shamir framework for breaking trun-           a system. Indeed, so far there have been a very limited
cated linear variables, an optimized online Gaussian             number of published attacks based on the insecure us-
solver for large sparse linear systems, and an algorithm         age of PRNG functions in PHP, while popular exploit
for recovering the state of the Mersenne twister gen-            databases1 contain nearly zero exploits for such vul-
erator from any level of truncation. We demonstrate              nerabilities (and this may partially explain the delay in
the gravity of our attacks via a number of case studies.         the PHP community adopting secure randomness gen-
Specifically, we show that a number of current widely            eration functions). Showing that such attacks are in
used web applications can be broken using our tech-              fact very practical is the objective of our work.
niques including Mediawiki, Joomla, Gallery, osCom-                 In this paper we develop generic techniques and al-
merce and others.                                                gorithms to exploit randomness vulnerabilities in PHP
                                                                 applications. We describe implementation issues that
                                                                 allow one to either predict or completely recover the
1     Introduction                                               initial seed of the PRNGs used in most web applica-
                                                                 tions. We also give algorithms for recovering the in-
Modern web applications employ a number of ways                  ternal state of the PRNGs used by the PHP system, in-
for generating randomness, a feature which is critical           cluding the Mersenne twister generator and the glibc
for their security. From session identifiers and pass-           LFSR based generator, even when their output is trun-
word reset tokens, to random filenames and password              cated. These algorithms could be used in order to
salts, almost every web application is relying on the            attack hardened PHP installations even when strong
unpredictability of these values for ensuring secure op-         seeding is employed, as it is done by the Suhosin ex-
eration. However, usually programmers fail to under-             tension for PHP and they may be of independent inter-
stand the importance of using cryptographically secure           est.
pseudorandom number generators (PRNG) something                     We also conducted an extensive audit of several pop-
that opens the potential for attacks. Even worse, the            ular PHP applications. We focused on the security
same trend holds for whole programming languages;                of password reset implementations. Using our attack
    ∗ Research   partly supported by ERC Project CODAMODA.         1 e.g.   http://www.exploit-db.com

                                                             1
framework we were able to mount attacks that take               35K(≈ 215 ) requests. Sample benchmarks that we per-
over arbitrary user accounts with practical complex-            formed in various applications and server installations
ity. A number of widely used PHP applications are               show that on average one can perform up to 222 re-
affected (see Figure 7), while we believe that the im-          quests in the course of a day.
pact is even larger in less known applications.
   Our results suggest that randomness attacks should
                                                                2     PHP System
be considered practical for PHP applications and ex-
isting systems should be audited for these vulnerabili-         We will now describe functionalities of the PHP sys-
ties. Weak randomness is a grave vulnerability in any           tem that are relevant to our attacks. We first describe
secure system as it was also recently demonstrated in           the different modes in which PHP might be running,
the widely publicized discovery of common primes in             and then we will do a description of the randomness
RSA public-keys by Lenstra et al. [14]. We finally              generation functions in PHP. We focus our analysis in
stress that our techniques apply in any setting beyond          the Apache web server, the most popular web server at
PHP, whenever the same PRNG functions are used and              the time of this writing, however our attacks are easily
the attack vector relies on predicting a system defined         ported to any webserver that meets the configuration
random object.                                                  requirements that we describe for each attack.
   This is only an extended abstract, a full version can
be found in [1].
                                                                2.1    Proccess management
1.1    Attack model                                             There are different ways in which a PHP script is ex-
                                                                ecuted. These ways affect its internal states, and thus
In Figure 1 we present our general attack template. An          the state of its PRNGs. We will focus on the case when
attacker is trying to predict the password reset token in       PHP is running as an Apache module, which is the de-
order to gain another user’s privileges (say an admin-          fault installation in most Linux distributions and is also
istrator’s). Each time the attacker makes a request to          very popular in Windows installations.
the web server, his request is handled by a web appli-
cation instance, usually represented by a specific op-          mod php: Under this installation the Apache web
erating system process, which contains some process             server is responsible for the process management.
specific state. The web application uses a number of            When the server is started a number of child proccesses
application objects with values depending on its in-            are created and each time the number of occupied pro-
ternal state, with some of these objects leaking to the         cesses passes a certain threshold a new process is cre-
attacker through the web server responses. Examples             ated. Conversely, if the idle proccesses are too many,
of such objects are session identifiers and outputs of          some processes are killed. One can specify a maxi-
PRNG functions. Although our focus is in password               mum number of requests for each process although this
reset functions, the principles that we use and the tech-       is not enabled by default. Under this setting each PHP
niques that we develop can be readily applied in other          script runs in the context of one of the child processes,
contexts when the application relies on the generation          so its state is preserved under multiple connections un-
of random values for security applications. Examples            less the process is killed by the web server process
of such applications are CAPTCHA’s and the produc-              manager. The configuration is similar in the case the
tion of random filenames.                                       web server uses threads instead of processes.

Attack complexity. Since we present explicit practi-            Keep-Alive requests. The HTTP protocol offers a
cal attacks, we define next the complexity under which          request header, called Keep-Alive. When this header
an attack should be consider practical. There are two           is set in an HTTP request, the web server is instructed
measure of complexity of interest. The first is the time        to keep the connection alive after the request is served.
complexity and the second is the query or communi-              Under mod php installations this means that any sub-
cation complexity. For some of our attacks the main             sequent request will be handled from the same process.
compuational operation is the calculation of an MD5             This is a very important fact, that we will use in our
hash. With current GPU technologies an attacker can             attacks. However in order to avoid having a process
perform up to 230 MD5 calculations per second with              hang from one connection for infinite time, most web
a $250 GPU, while with an additional $500 can reach             servers specify an upper bound on the number of con-
up to 232 calculations [9]. These figures suggest that          sequent keep-alive requests. The default value for this
attacks that require up to 240 MD5 calculations can             bound in the Apache web server is 100.
be easilty mounted. In terms of communication com-
plexity, most of our attacks have a query complexity
                                                                2.2    Randomness Generation
of a few thousand requests at most, while some have
as little as a few tens of requests. Our most commu-            In order to satisfy the need for generating randomness
nication intensive attacks (section 5) require less than        in a web application, PHP offers a number of different

                                                            2
Figure 1: Attack template.

randomness functions. We briefly describe each func-                ditive feedback generator (resembling a Linear
tion below.                                                         Feedback Shift Register (LFSR)), while in Win-
    php combined lcg()/lcg value():                  the            dows it is an LCG. The numbers generated by
    php combined lcg() function is used internally                  rand() are in the range [0, 231 − 1] but like before
    by the PHP system, while lcg value() is its                     the two optional arguments give the ability to map
    public interface. This function is used in order to             the random number to the range [min, max]. Like
    create sessions, as well as in the uniqid function              before the srand() function seeds the generator
    described below to add extra entropy. It uses two               similarly to the mt srand() function.
    linear congruential generators (LCGs) which it                  openssl random pseudo bytes(length,
    combines in order to get better quality numbers.                strong): This function is the only function
    The output of this function is 64 bits.                         available in order to obtain cryptographically
    uniqid(prefix, extra entropy):                 This             secure random bytes. It was introduced in version
    function returns a string concatenation of the                  5.3 of PHP and its availability depends on the
    seconds and microseconds of the server time con-                availability of the openssl library in the system.
    verted in hexadecimal. When given an additional                 In addition, until version 5.3.4 of PHP this
    argument it will prefix the output string with the              function had performance problems [2] running
    prefix given. If the second argument is set to true,            in Windows operating systems. The strong
    the function will suffix the output string with an              parameter, if provided, is set to true if the
    output from the php combined lcg() function.                    function returned cryptographically strong bytes
    This makes the total output to have length up to                and false otherwise. For these reasons, and
    15 bytes without the prefix.                                    for backward compatibility, its use is still very
                                                                    limited in PHP applications.
    microtime(),         time():       The function
                                                               In addition the application can utilize an operating sys-
    microtime() returns a string concatenation
                                                               tem PRNG (such as /dev/urandom). However, this
    of the current microseconds divided by 106 with
                                                               does not produce portable code since /dev/urandom
    the seconds obtained from the server clock. The
                                                               is unavailable in Windows OS.
    time() function returns the number of seconds
    since Unix Epoch.
    mt srand(seed)/mt rand(min, max):                          3   The entropy of time measurements
    mt rand is the interface for the Mersenne
    Twister (MT) generator [15] in the PHP system.             Although ill-advised (e.g., [5]) many web applica-
    In order to be compatible with the 31 bit output of        tions use time measurements as an entropy source.
    rand(), the LSB of the MT function is discarded.           In PHP, time is accessed through the time() and
    The function takes two optional arguments which            microtime() functions. Consider the following prob-
    map the 31 bit number to the [min, max] range.             lem. At some point a script executing a request made
    The mt srand() function is used to seed the MT             by the attacker makes a time measurement and use the
    generator with the 32 bit value seed; if no seed           results to, say, generate a password reset token. The
    is provided then the seed is provided by the PHP           attacker’s goal it to predict the output of the measure-
    system.                                                    ment made by the PHP script. The time() function
    srand(seed)/rand(min, max): rand is the in-                has no entropy at all from an attacker point of view,
    terface function of the PHP system to the rand()           since the server reveals its time in the HTTP response
    function provided by libc. In unix, rand() ad-             header as dictated by the HTTP protocol. On the other

                                                           3
hand, microtime ranges from 0 to 106 giving a max-               succession two password reset requests, one for his ac-
imum entropy of about 20 bits. We develop two dis-               count and one for the target user’s account, then these
tinct attacks to reduce the entropy of microtime()               requests will be processed by the application with a
that have different advantages and mostly target two             very small time difference and thus the conditional en-
different scenarios. The first one, Adversial Time Syn-          tropy of the target user’s password reset token given
chronization, aims to predict the output of a specific           the attacker’s token will be small. Thus, the attacker
time measurement when there is no access to other                can generate a token for an account he owns and in
such measurements. The second, Request Twins, ex-                fast succession a token for the target account. Then
ploits the fact that the script may enable the attacker to       the microtime() used for generating the token of his
generate a correlated leak to the target measurement.            account can be used to approximate the microtime()
                                                                 output that was used for the token of the target account.
Adversarial Time Synchronization (ATS). As we
mentioned above, in each HTTP response the web                   Experiments. We conducted a series of experiments
server includes a header containing the full date of the         for both our algorithms using the following setup. We
server including hour, minutes and seconds. The basic            created a PHP “time” script that prints out the current
observation is that although we get no leak regarding            seconds and microseconds of the server. To evaluate
the microseconds from the HTTP date header we know               the ATS algorithm we first performed synchronization
that when a second changes the microseconds are ze-              between a client and the server and afterwards we sent
roed. We use this observation to narrow down their               a request to the time script and tried to predict the value
value.                                                           it would return. To evaluate the Request Twins algo-
   The algorithm proceeds as follows: We connect to              rithm we submitted two requests to the time script in
the web server and issue pairs of HTTP requests R1               fast succession and measured the difference between
and R2 in corresponding times T 1 and T 2 until a pair           the output of the two responses.
is found in which the date HTTP header of the cor-                  In Figure 3 we show the time difference between
responding responses is different. At that point we              the server’s time and our client’s calculation for four
know that between the processing of the two HTTP                 servers with different CPU’s and RTT parameters. Our
requests the microseconds of the server were zeroed.             experiments suggest that both algorithms significantly
We proceed to approximate the time of this event S in            reduce the entropy of microseconds (up to an average
localtime, denoted by the timestamp D, by calculating            of 11 bits with ATS and 14 bits with Request Twins)
the average RTT of the two requests and offsetting the           having different advantages each. Specifically, the
middle point between T 2 and T 1 by this value divided           ATS algorithm seems to be affected by large RTT val-
by two.                                                          ues while it is less affected by differences in the CPU
   In the Apache web server the date HTTP header is              speed. The situation is reversed for Request Twins
set after processing the request of the user. If the at-         where the algorithm is immune to changes in the RTT
tacker requests a non existent file, then the point the          however, it is less effective in old systems with low
header is set is approximatelly the point that a valid           processing speed.
request will start executing the PHP script. It fol-
lows that if the attacker uses ATS with HTTP requests            4    Seed Attacks
to not existent files then he will synchronize approx-
imately with the beggining of the script’s execution.            In this section we describe attacks that allow either the
Given a steady network where each request takes RT2 T            recovery or the reconstruction of the seeds used for the
time to reach the target server, our algorithm devia-            PHP system’s PRNGs. This allows the attacker to pre-
tion depends only on the rate that the attacker can send         dict all future iterations of these functions and hence
HTTP requests. In practice, we find that the algo-               reduces the entropy of functions rand(),mt rand(),
rithm’s main source of error is the network distance             lcg value() as well as the extra entropy argument
between the attacker’s system and the server cf. Fig-            of uniqid() to zero bits. We exploit two properties
ure 3. The above implementation we described is a                of the seeds used in these functions. The first one is
proof-of-concept and various optimizations can be ap-            the reusage of entropy sources between different seeds.
plied to improve its accuracy.                                   This enables us to reconstruct a seed without any ac-
                                                                 cess to outputs of the respective PRNG. The second is
Request Twins. Consider the following setting: an                the small entropy of certain seeds that allows one to
application uses microtime() to generate a password              recover its value by bruteforce.
token for any user of the system. The attacker has ac-              We present three distinct attacks. The first attack al-
cess to a user account of the application and tries to           lows one to recover the seed of the internal LCG seed
take over the account of another user. This allows the           used by the PHP system using a session identifier. Us-
attacker to obtain password reset tokens for his account         ing that seed our second attack reconstructs the seed of
and thus outputs of the microtime() function. The                rand() and mt rand() functions from the elements
key observation is that if the attacker performs in rapid        of the LCG seed without any access to outputs of these

                                                             4
Configuration                          ATS                          Req. Twins
                                      CPU(GHz) RTT(ms)                min       max          avg       min       max     avg
                                       1 × 3.2        1.1              0       4300         410          0      1485      47
                                       4 × 2.3        8.2              5       76693        4135        565     1669 1153
                                       1 × 0.3         9              53       39266        2724       1420     23022 4849
                                       2 × 2.6        135             73      140886       83573         2      1890     299

                                  Figure 3: Effectiveness of our time entropy lowering techniques against four servers
       Figure 2: ATS.             of different computational power and RTT. Time measurements are in microseconds.

functions. Finally, we exploit the fact that the seed           actly most of the values. Specifically, the client IP ad-
used in these functions is small enough for someone             dress is the attacker’s IP address and the Unix Epoch
with access to the output of these functions to recover         can be determined through the date HTTP header. In
its value by bruteforce.                                        addition, if php combined lcg() is not initialized at
                                                                the time the session is created, as it happens when a
                                                                fresh process is spawned, then it is seeded. The state
Generating fresh processes. Our attacks on this sec-            of the php combined lcg() is two registers s1 , s2 of
tion rely on the ability of the attacker to connect to a        size 32 bits each, which are initialized as follows. Let
process with a newly initialized state. We describe a           T1 and T2 be two subsequent time measurements. Then
generic technique against mod php in order to achieve           we have that
a connection to a fresh process. Recall that in mod php
when the number of occupied processes passes a cer-             s1 = T1 .sec ⊕ (T1 .usec  11) and s2 = pid ⊕ (T2 .usec  11)
tain threshold new processes are created to handle the
new connections. This gives the attacker a way to force          where pid denotes the current process id, or if threads
the creation of fresh processes: The attacker creates a         are used the current thread id 2 .
large number of connections to the target server with               Process id’s have a range of 215 values in Linux
the keep-alive HTTP header set. Having occupied a               systems In Windows systems the process id’s (resp.
large number of processes the web server will create            threads) are also at most 215 unless there are more
a number of new processes to handle subsequent re-              than 215 active processes (resp. threads) in the system
quests. The attacker, keeping the previous connections          which is a very unlikely occurence.
open, makes a new one which, given that the attacker                Observe now that the session calculation involves
created enough connections, will be handled by a fresh          three time measurements T0 , T1 and T2 . Given that
process.                                                        these three measurements are conducted succesivelly
                                                                it is advantageous to estimate their entropy by examin-
                                                                ing the random variables T0 , ∆1 = T1 −T0 , ∆2 = T2 −T1 .
4.1    Recovering the LCG seed from Ses-                        We conducted experiments in different systems to es-
       sion ID’s                                                timate the range of values for ∆1 and ∆2 . Our exper-
In this section we present a technique to recover the           iments suggest that ∆1 ∈ [1, 4] while ∆2 ∈ [0, 3]. We
php combined lcg() seed using a PHP session iden-               also found a positive linear correlation in the values of
tifier. In PHP, when a new session is created using the         the two pairs. This enables a cutdown of the possible
respective PHP function (session start()), a pseu-              valid pairs. These results suggest that the additionally
dorandom string is returned to the user in a cookie, in         entropy introduced by the two ∆ variables is at most 5
order to identify that particular session. That string is       bits.
generated using a conjuction of user specific and pro-              To summarize, the total remaining entropy of the
cess specific information, and then is hashed using a           session identifier hash is the sum of the microseconds
hash function which is by default MD5, however there            entropy from T0 (≈ 20 bits) the two ∆ variables (≈
is an option to use other hash functions such as SHA-1.         5 bits) and the process identifier(15 bits). These give
The values contained in the hash are:                           a total of 40 bits which is tractable cf. section 1.1.
      Client IP address (32 bits).                              Furthermore the following improvements can be made:
                                                                (1) Using the ATS algorithm the microseconds entropy
      A time measurement: Unix epoch and microsec-
                                                                can be reduced as much as 11 bits on average. (2) The
      onds (32 + 20 bits).
                                                                attacker can make several connections to fresh pro-
      A value generated by php combined lcg() (64
                                                                cesses instead of one, in rapid succession, obtaining
      bits).
    Notice now that in the context of our attack model             2 In PHP versions before 5.3.2 the seed used only one time mea-

the attacker controls each request thus he knows ex-            surement which made it even weaker.

                                                            5
session identifiers from each of the processes. Because            the PRNG functions rand() and mt rand() when the
the requests were made in a small time interval the                attacker has access to the output of these functions. We
preimages of the hashes obtained belong into the same              exploit the fact that the seed used by the PHP system is
search space, thus improving the probability of invert-            only 32 bits. Thus, an attacker who connects to a fresh
ing one of the preimages proportionally to the number              process and obtains a number of outputs from these
of session identifiers identifiers obtained. Our experi-           functions can bruteforce the 32 bit seed that produces
ments with the request twins technique suggest that at             the same output.
least 4 session identifiers can be obtained from within                 We emphasize that this attack works even if the out-
the same search space thus offering a reduction of at              puts are truncated or passed through transformations
least two bits. Adding these improvements reduces the              like hash functions. The requirements of the attack is
search time up to 227 MD5 computations.                            that the attacker can define a function from the set of
                                                                   all seeds to a sufficiently large range and can obtain a
4.2    Reconstructing the PRNG Seed from                           sample of this function evaluated on the seed that the
                                                                   attacker tries to recover. Additionally for the attack to
       Session ID’s                                                work this function should behave as a random map.
In this section we exploit the fact that the PHP system                 Consider the following example. The attacker has
reuses entropy sources between different generators, in            access to a user account of an application which gen-
order to reconstruct the PRNG seed used by rand()                  erates a password reset token as 6 symbols where each
and mt rand() functions from a PHP session identi-                 symbol is defined as g(mt rand()) where g is a ta-
fier. In order to predict the seed we only need to find            ble lookup function for a table with 60 entries contain-
a preimage for the session id, using the methods de-               ing alphanumeric characters. The attacker defines the
scribed in the previous section. One advantage of this             function f to be the concatenation of two password re-
attack is that it requires no outputs from the affected            set tokens generated just after the PRNG is initialized.
functions.                                                         The attacker samples the function by connecting to a
   When a new process is created the internal state of             fresh process and resetting his password two times.
the functions rand() and mt rand() is uninitialized.               Since the table of function g contains 60 entries, the
Thus, when these functions are called for the first time           attacker obtains 6 bits per token symbol, giving a total
within the script a seed is constructed as follows:                range to the function f of 72 bits.
                                                                        The time complexity of the attack is 232 calculations
seed = (epoch × pid) ⊕ (106 × php combined lcg())                  of f however, we can reduce the online complexity
                                                                   of the attack using a time-space tradeoff. In this case
where epoch denotes the seconds since epoch and pid                the online complexity of the attack can be as little as
denotes the process id of the process handling the re-             216 . The query complexity of the attack depends on
quest. It it easy to notice, that an attacker with access to       the number of requests needed to obtain a sample of
a session id preimage has all the information needed in             f . In the example given above the query complexity is
order to calculate the seed used to initialize the PRNGs           two requests.
since:
     epoch is obtained through the HTTP Date header.
     pid is known from the seed of the                             5     State recovery attacks
     php combined lcg() obtained through the
     preimage of the session id from section 4.1.                  One can argue that randomness attacks can be easily
     php combined lcg() is also known, since the at-               thwarted by increasing the entropy of the seeding for
     tacker has access to its seed, he can easily predict          the PRNG functions used by the PHP system. For ex-
     the next iteration after the initial value.                   ample, the suhosin PHP hardening extension replaces
In summary the technique of this section allows the re-            the rand() function with a Mersene Twister generator
construction of the seed of the mt rand() and rand()               with separate state from mt rand() and offers a larger
functions given access to a PHP session id of a fresh              seed for both generators getting entropy from the oper-
process. The time complexity of the attack is the                  ating system3 .
same as the one described in section 4.1 while the                    We show that this is not the case. We exploit the
query complexity is one request, given that the attacker           algebraic structure of the PRNGs used in order to re-
spawned a fresh process (which itself requires only a              cover their internal state after a sufficient number of
handful of requests).                                              past outputs (leaks) have been observed by the attacker.
                                                                   Any such attack has to overcome two challenges. First,
                                                                   web applications usually need only a small range of
4.3    Recovering the Seed from Applica-
       tion leaks                                                     3 The suhosin patch installed in some Unix operating systems

                                                                   by default does not include the randomness patches, rather than it
In contrast to the technique presented in the previous             mainly offers protection from memory corruption attacks. The full
section, the attack presented here recovers the seed of            extension is usually installed separately from the PHP packages.

                                                               6
lowing range for possible values for n:

                                                                         (l − a) · M       (l − a + 1) · M
                                                                     b               c≤n≤b                 c−1
                                                                          b−a+1               b−a+1
                                                                  Therefore, given a bucket number l we are able to
                                                               find an upper and lower bound for the original number
Figure 4: Mapping a random number n ∈ [M] to 7                 denoted respectively by Ll and Ul . In order to recover
buckets and the respective bits of n that are revealed         a part of the original number n one can simply find the
given each bucket.                                             number of most significant bits of Ll and Ul that are
                                                               equal and observe that these bits would be the same
                                                               also in the number n. Therefore, given a bucket l we
                                                               can compare the MSBs of both numbers and set the
random numbers, for example to sample a random en-             MSBs of n to the largest sequence of common most
try from an array. To achieve that, the PHP system             significant bits of Ll ,Ul .
maps the output of the PRNG to the given range, an                Notice that in some cases even the most signifi-
action that may break the linearity of the generators.         cant bit of the two numbers are different, thus we
Second, in order to collect the necessary leaks the at-        are be unable to infer any bit of the original number
tacker may need to reconnect to the same process many          n with absolute certainty. For example, in Figure 4
times to collect the leaks from the same generator in-         given that a number falls in bucket 3 we have that
stance. Since, there could be many PHP processes run-          920350134 ≤ n ≤ 1227133512. Because 920350134 <
ning in the system, this poses another challenge for the       230 and 1227133512 > 230 we are unable to infer any
attacker.                                                      bit of the original number n.
   In this section we present state recovery algo-                Another important observation is that this specific
rithms for the truncated PRNG functions rand() and             truncation algorithm allows the recovery of a fragment
mt rand(). The algorithm for the latter function               of the MSBs of the original number. Therefore, in the
is novel, while regarding the former we implement              following sections we will assume that the truncation
and evaluate the Håstad-Shamir cryptanalytic frame-           occurs in the MSBs and we will describe our algo-
work [8] for truncated linearly related variables. We          rithms based on MSB truncated numbers. However, all
begin by discussing the way truncation takes place in          algorithms described work for any kind of truncation.
the PHP system. Afterwards, we tackle the problem of
reconnecting into the same server process. Finally we
present the two algorithms against the generators.             5.2   Process distinguisher
                                                               As we mentioned in section 2.1, if one wants to receive
                                                               a number of leaks from the same PHP process one can
5.1   Truncating PRNG sequences in the
                                                               use keep-alive requests. However, there is an upper
      PHP system                                               bound that limits the number of such requests (by de-
As mentioned in section 2.2 the rand() and                     fault 100). Therefore, if the attacker needs to observe
mt rand() functions can map their output to a user             more outputs beyond the keep-alive limit the connec-
defined range. This has the effect of truncating the           tion will drop and when the attacker connects back to
functions’ output. Here we discuss the process of trun-        the server he may be served from a different process
cating the output and its implications for the attacker.       with a different internal state. Therefore, in order to
   Let n ∈ [M] = {0, . . . , M − 1} be a random number         apply state recovery attacks (which typically require
generated by rand() or mt rand(), where M = 231                more than 100 requests), we must be able to submit
in the PHP system. In order to map that number in the          all the necessary requests to the same process. In this
range [a, b] where a < b the PHP system maps n to a            section, we will describe a generic technique that finds
number l ∈ [a, b] in the following way:                        the same process over and over using the PHP session
                                                               leaks described in section 4.1.
                         n · (b − a + 1)                           While we cannot avoid disconnecting from a pro-
                l = a+                                         cess after we have submitted the maximum number of
                                M
                                                               keep-alive requests, we can start reconnecting back to
   We can view the process above as a mapping from             the server until we hit the process we were connected
the set of numbers in the range [M] to b − a + 1 “buck-        before and continue to submit requests. The problem
ets.” Our goal is to recover as many bits as possible          in applying this approach is that it is not apparent to
of the original number n. Observe that given l it is           distinguish whether the process we are currently con-
possible to recover immediately up to blog(b − a + 1)c         nected to is the one that was serving us in the previ-
most significant bits (MSB) of the original number n           ous connection. To distinguish between different pro-
as follows:                                                    cesses, we can use the preimage from a session iden-
   Given that n belongs to bucket l we obtain the fol-         tifier. Recall that the session id contains a value from

                                                           7
the php combined lcg() function, which in turn uses               the first place. Nevertheless, techniques exist [16] to
process specific state variables. Thus, if the session            remotely determine the traffic of a server and thus al-
is produced from the same process as before then the              low the attacker to find an appropriate time window
php combined lcg() will contain the next state from               within which he will attempt this attack.
the one it was before. This gives us a way to find the              Based on the above, in the following sections we
correct process among all the server processes running            will assume that the attacker is able to collect the nec-
in the server. In summary the algorithm will proceed              essary number of leaks from the targeted function.
as follows:
  1. The attacker obtains a session identifier and a              5.3     State recovery for mt rand()
     preimage for that id using the techniques dis-               The mt rand() function uses the Mersenne Twister
     cussed in section 4.1.                                       generator in order to produce its output. In this section
  2. The attacker submits the necessary requests to ob-           we give a description of the Mersenne Twister genera-
     tain leaks from the PRNG, using the keep-alive               tor and present an algorithm that allows the recovery of
     HTTP header until the maximum number of re-                  the internal state of the generator even when the output
     quests is reached.                                           is truncated. Our algorithm also works in the presence
                                                                  of non consecutive outputs as in the case resulting from
  3. The attacker initiates connections to the server re-         the buckets truncation algorithm of the PHP system (cf.
     questing session identifiers. He attempts to ob-             section 5.1).
     tain a preimage for every session identifier using
     the next value of the php combined lcg() from                Mersenne Twister. Mersenne Twister, and specifi-
     the one used before or, if the server has high traf-         cally the widely used MT19937 variant, is a linear
     fic, the next few iterations. If a preimage is ob-           PRNG with a 624 32-bit word state. The MT algo-
     tained the attacker repeats step 2, until all neces-         rithm is based on the following recursion: for all k,
     sary leaks are obtained.
   Notice that obtaining a preimage after disconnect-             xk+n = xk+m ⊕((xk ∧0x80000000)|(xk+1 ∧0x7fffffff))A
ing requires to bruteforce a maximum number of 20
                                                                  where n = 624 and m = 397. The logical AND oper-
bits (the microseconds), and thus testing for the cor-
                                                                  ation with 0x80000000 discards all but the most sig-
rect session id is an efficient procedure. Even if the
                                                                  nificant bit of xk while the logical AND with 0x7fffffff
application is not using PHP sessions, or if a preimage
                                                                  discards only the MSB of xk+1 . A is a 32 × 32 ma-
cannot be obtained, there are other, application spe-
                                                                  trix for which multiplication by a vector x is defined as
cific, techniques in order to find the correct process.
                                                                  follows:
                                                                                                      if x31 = 0
                                                                                    
A generic technique for Windows. In the case of                                       (x  1)
                                                                             xA =
Windows systems the attacker can employ another                                       (x  1) ⊕ a if x31 = 1
technique to collect the necessary leaks from the same
process in case the server has low traffic. In unix               Here a = (a0 , a1 , ..., a31 ) = 0x9908B0DF is a constant
servers with apache preforked server + mod php all                32-bit vector (note that we use x31 to denote the LSB
idle processes are in a queue waiting to handle an in-            of a vector x). The output of this recurrence is finally
coming client. The first process in the queue handles             multiplied by a 32 × 32 non singular matrix T , called
a client and then the process goes to the back of the             the tempering matrix, in order to produce the final out-
queue. Thus, if an attacker wants to reconnect to the             put z = xT .
same process without using some process distinguisher
he will need to know exactly the number of processes              State recovery. Since the tempering matrix T is non
in the system and if there are any intermediate requests          singular, given 624 outputs of the MT generator one
by other clients while the attacker tries to reconnect to         can easily compute the original state by multiplying
the same process. However, in Windows prethreaded                 the output z with the inverse matrix T −1 thus obtain-
server with mod php things are slightly better for an             ing the state variable used as xi = zi T −1 . After recov-
attacker. Threads are in a priority queue and when a              ering 624 state variables one can predict all future it-
thread in the first place of the queue handles a request          erations. However, when the output of the generator is
from a client it returns again in that first place and han-       truncated, predicting future iterations is not as straight-
dles the first subsequent incoming request. Thus, an              forward as before because it is not possible to locally
attacker which manages to connect to that first thread            recover all needed bits of the state variables given the
of the server, can rapidly close and reopen the connec-           truncated output.
tions thus leaving a very small window in which that                 The key observation in recovering the internal state
thread could be occupied by another client. Of course,            is that due to the fact that the generator is in GF(2) the
in high traffic servers the attacker would have a diffi-          truncation does not introduce non linearity even though
culty connecting in a time when the server is idle in             there are missing bits from the respective equations.

                                                              8
Thus, we can express the output of the generator as a                         or application behavior, linear dependencies may arise
set of linear equations in GF(2) which, when solved,                          between the resulting equations and therefore we may
yield the initial state that produced the observed se-                        need a larger number of observed outputs.
quence. From the basic recurrence of MT we can de-                               Because we cannot know in advance when the sys-
rive the following equations for each individual bit:                         tem will become solvable or the equations that will be
                                                                              included, we employ an online version of Gaussian
Lemma 5.1. Let x0 , x1 , . . . be an MT sequence and j >                      elimination in order to form and solve the resulting
0. Then the following equations hold for any k ≥ 0:                           system. In this way, the attacker can begin collecting
                                                                              leaks and gradually feed them to our Gaussian solver
  1. x0jn+k = x(0j−1)n+k+m ⊕ (x(31j−1)n+k+1 ∧ a0 )
                                                                              until he is notified that a sufficient number of indepen-
  2. x1jn+k = x(1j−1)n+k+m ⊕ x(0j−1)n+k ⊕ (x(31j−1)n+k+1 ∧                    dent equations have been collected. Note that regular
                                                                              Gaussian elimination uses both elementary row and el-
      a1 )                                                                    ementary column operations. However, because we do
  3. ∀i, 2 ≤ i ≤ 31 : xijn+k = x(i j−1)n+k+m ⊕                                not have in advance the entire linear system we cannot
                                                                              use elementary column operations. Instead we make
      x(i−1        ⊕ (x(31j−1)n+k+1 ∧ ai )
         j−1)n+k+1                                                            Gaussian elimination using only elementary row oper-
                                                                              ations and utilize a bookkeeping system to enter equa-
Proof. The equations follow directly from the basic re-
                                                                              tions in their place as they are produced by the leaks
currence.
                                                                              supplied to the solver. Our solver employs a sparse
   In addition since the tempering matrix is only a lin-                      vector representation and is capable of solving overde-
ear transformation of the bits of the state variable xi ,                     termined sparse systems of tens of thousands of equa-
we can similarly express each bit of the final output of                      tions in a few minutes.
MT as a linear equation of the bits of the respective                            We ran a sequence of experiments to determine the
state variable.                                                               solvability of the system when a different number of
   To recover the initial state of MT, we generate all                        bits is truncated from the output. In addition we ran
equations over the state bit variables x0 , x1 , . . . , x19936 .             experiments when the outputs of the MT generator is
To map any position in the MT sequence in an equation                         passed through the PHP truncation algorithm, with dif-
over this set of variables, we apply the equations of the                     ferent user defined ranges. All experiments were con-
lemma above recursively until all variables in the right                      ducted in a 4 × 2.3 GHz machine with 4 GB of RAM.
hand side have index below 19937.                                                In Figure 5 we present the number of equations
   Depending on the positions observed in the MT se-                          needed when the PHP truncation algorithm is used.
quence the resulting linear system will be different.                         In the x-axis we have the logarithm of the number
The question that remains is whether that system is                           of buckets. We also show the standard deviation ap-
solvable. Regarding the case of the 31-bit truncation,                        pearing as vertical bars. It can be seen that the num-
i.e. only the MSB of the output word is revealed, we                          ber of equations needed is much higher than the the-
can use known properties of the generator in order to                         oretical lower bound of 19937 and fluctuates between
easily prove the following:                                                   27000 and 33000. Neverthless, the number of leaks re-
                                                                              quired is decreasing linearly to the number of buckets
Lemma 5.2. Suppose we obtain the MSB of 19937
                                                                              we have. The reason is that although we have more lin-
consecutive words from the MT generator. Then the
                                                                              early dependend equations, the total number of equa-
resulting linear system is uniquely solvable.
                                                                              tions we obtain due to the larger number of buckets is
Proof. It is known that the MT sequence is 19937-                             bigger.
distributed to 1-bit accuracy4 . The linear system is
uniquely solvable iff the rows are linearly independent.                      Implementation error in the PHP system. The
Suppose that a set k ≤ 19937 of rows are lineary de-                          PHP system up to current version, 5.3.10, has an error
pendent. Then the last row of the set k obtained is                           in the implementation of the Mersenne Twister gen-
computable from the other members of the k-set some-                          erator (we discovered this during the testing of our
thing that contradicts the order of equidistribution of                       solver). Specifically the following basic recurrence is
MT.                                                                           effectively used in the PHP system due to a program-
   The above result is optimal in the sense that this is                      ming error:
the minimum number of observed outputs needed for
the system to become fully determined. In the case                            xk+n = xk+m ⊕ ((xk ∧ 0x80000000)|(xk+1 ∧ 0x7ffffffe)|(xk ∧ 0x1))A
we obtain non consecutive outputs due to truncation
                                                                               As a result the PHP system uses a different generator
   4 Suppose   that a sequence is k-distributed to u-bit accuracy. Then       which, as it turns out, has slightly more linear depen-
knowledge of the u most significant bits of l words does not allow
one to make any prediction for the u bits of the next word when l < k.
                                                                              dencies than the MT generator. This means that prob-
This is the cryptographic interpretation of the “order of equidistribu-       ably the randomness properties of the PHP generator
tion” whose exact definition can be found in [15].                            are poorer compared to the original MT generator.

                                                                          9
Figure 5: Solving MT; y-axis:number of equations; x-axis: number of buckets (logarithm). Standard deviation
shown as vertical bars.

5.4    State recovery for rand()                                is a non-linear generator. Nevertheless, the non linear-
                                                                ity introduced by the truncation of the LSB is negligi-
We turn now to the problem of recovering the state
                                                                ble and one can easily recover the initial values given
of rand() given a sequence of leaks from this gen-
                                                                enough outputs of the generator.
erator. While mt rand() is implemented within the
PHP source code and thus is unchanged across differ-
ent enviroments, the rand() function uses the respec-           State recovery. Notice that if the generators used
tive function defined from the standard library of the          have a small state such as the Windows LCGs then
operating system. This results in different implemen-           state recovery is easy, by applying the attack from sec-
tations across different operating systems. There are           tion 4 to bruteforce the entire state of the generator.
mainly two different implementations of rand() one              However, on the Glibc generator, which has a state of
from the glibc and one from the Windows library.                992 bits, these attacks are infeasible assuming that the
                                                                state is random. Although LCGs and the Glibc gener-
                                                                ators are different, they both fall into the same crypt-
Windows rand(). The rand() function defined in
                                                                analytic framework introduced by Håstad and Shamir
Windows is a Linear Congruential Generator (LCG).
                                                                in 1985 for recovering values of truncated linear vari-
An LCG is defined by a recurrence of the form
                                                                ables. This framework allows one to uniquely solve
               Xn+1 = (aXn + c) mod m                           an underdefined system of linear equations when the
                                                                values of the variables are partially known. In this sec-
Although LCGs are fast and require a small memory               tion we will discuss our experiences with applying this
footprint there are many problems which make them               technique in the two aforementioned generators: The
insufficient for many uses, including of course cryp-           LCG and the additive-feedback generator of glibc. We
tographic purposes. The parameters used by the Win-             will briefly describe the algorithm for recovering the
dows LCG are a = 214013, c = 2531011, m = 232 . In              truncated variables in order to discuss our experiments
addition, the output is truncated by default and only           and results. The interested reader can find more infor-
the top 15 bits are returned. If PHP is running in a            mation about the algorithm in the original paper [8].
threaded server in Windows then the parameters of the              Suppose we are given a system with l linear equa-
LCG used are a = 1103515245, c = 12345, m = 215 .               tions on k variables modulo m denoted by x1 , x2 , . . . , xk ,

                                                                          a11 x1 + a12 x2 + · · · + a1k xk = 0 mod m
Glibc rand(). In the past, glibc also used an LCG for
the rand() function. Subsequently an LFSR-like “ad-                       a21 x1 + a22 x2 + · · · + a2k xk = 0 mod m
ditive feedback” design was adopted. The generator
has a state of 31 words (of 32 bits each), over which it                                     ...
is defined by the following recurrence:                                   al1 x1 + al2 x2 + · · · + alk xk = 0 mod m

              ri = (ri−3 + ri−31 ) mod 232                         where l < k and each variable xi is partially known.
                                                                We want to solve the system uniquely by utilizing the
In addition the LSB of each word is discarded and the           partial information of the k variables xi .
output returned to the user is oi = ri  1. An interest-           We use the coefficients of the l equations to create a
ing note is that the man page of rand() states that rand        set of l vectors, where each vector is of the form vi =

                                                           10
(a1 , . . . , ak ). In addition we add to this set the k vectors        against truncated LCG’s with secret parameters.
m · ei , 0 < i ≤ k. The cryptanalytic framework exploits                   Applying the attack in the glibc additive feedback
properties of the lattice L that is defined as the linear               generator we found that the LLL algorithm became a
span of these vectors. Observe that the dimension of L                  bottleneck in the algorithm running time; due to its
is k and in addition for every vector v ∈ L we have that                large state the algorithm required a large number of
∑ki=1 vi xi = 0 mod m.                                                  leaks to recover even small truncation levels there-
   Given the above the attack works as follows: first a                 fore increasing the lattice dimension that was given
lattice is defined using the recurrence that defines the                to the LLL algorithm. Our testing system (a 3.2GHz
linear generator; then, a lattice basis reduction algo-                 cpu with 2GB memory) ran out of memory when 7
rithm is employed to create a set of linearly indepen-                  bits were truncated. The version of LLL we em-
dent equations modulo m with small coefficients; fi-                    ployed (SageMath 4.8) has time complexity O(k5 )
nally, using the partially known values for each vari-                  where k is the dimension of the lattice (which repre-
able, we convert this set of equations to equations over                sents roughly the number of leaks). The best time-
the integers which can be solved uniquely. Specifi-                     complexity known is O(k3 log k) derived from [12];
cally, we use the LLL [13] algorithm in order to obtain                 this may enable much higher truncation levels to be re-
a reduced basis B for the lattice L. Now because B =                    covered for the glibc generator, however we were not
{w j } is a basis, the vectors of B are linearly indepen-               able to test this experimentally as no implementation
dent. The key observation is that the lattice definition                of this algorithm is publicly available.
implies that w j ·x = w j ·(xunknown +xknown ) = d j ·m for
                                                                            We conclude that truncated LCG type of generators
some unknown d j . Now as long as xunknown · w j < m/2
                                                                        can be broken (in the sense of entirely recovering their
(this is the critical condition for solvability) we can
                                                                        internal state) for all but extremely high levels of trun-
solve for d j and hence recover k equations for xunknown
                                                                        cation (e.g. in the case of 32-bit state LCG’s mod-
which will uniquely determine it.
                                                                        ulo 232 when they are truncated to 16 buckets or less).
   The original paper provided a relation between the                   For additive feedback type of generators, such as the
size of xknown and the number of leaks required from                    one in glibc, the situation is similar, however higher
the generator so that the upper bound of m/2 is ensured                 recursion depths require more leaks (with a linear re-
given the level of basis reduction achieved by LLL. In                  lationship) that in turn affect the lattice dimension re-
the case of LCGs the paper demanded the modulo m to                     sulting in longer running times. Comparing the results
be squarefree. However, as shown above, in the gen-                     between the LCGs and the additive feedback genera-
erators used it holds that m = 232 and thus their argu-                 tors one may find some justification for the adoption
ments do not apply. In addition, the lattice of the addi-               of the latter in recent versions of glibc : it appears that
tive generator of glibc is different than the one gener-                - at least as far as lattice-based attacks are concerned -
ated by an LCG and thus needs a different analysis.                     it is harder to predict truncated glibc sequences (com-
   We conducted a thorough experimental analysis of                     pared to say, Windows LCG’s) due to the higher run-
the framework focusing on the two types of generators                   ning times of LLL reduction (note though that this does
above. In each case we tested the maximum possible                      not mean that these are cryptographically secure).
value of xknown to see if the m/2 bound holds for the
reduced LLL basis. In the following paragraphs we
will briefly discuss the results of these experiments for
these types of generators.
                                                                        6     Experimental results and Case studies
   In Figure 6 we show the relationship between the
number of leaks required for recovering the state with
the lattice-attack and the number of leaks that are trun-               In order to evaluate the impact of our attacks on real
cated for four LCGs: the Windows LCG, the glibc                         applications we conducted an audit to the password
LCG (which are both 32 bits), the Visual Basic LCG                      reset function implementations of popular PHP appli-
(which is 24 bits) and an LCG used in the MMIX of                       cations. Figure 7 shows the results from our audit.
Knuth (which is 64 bits). It is seen that the number                    In each case succesfully exploiting the application re-
of leaks required is very small but increases sharply as                sulted in takeover of arbitrary user accounts5 and in
more bits are truncated. In all cases the attack stops                  some cases, when the administrator interface was af-
being useful once the number of truncated bits leaves                   fected, of the entire application. In addition to iden-
none but the log w − 1 most significant bits where w is                 tifiying these vulnerabilities we wrote sample exploits
the size of the LCG state. The logarithm barrier seems                  for some types of attack we presented, each on one af-
to be uniformly present and hints that the MSB’s of                     fected application.
a truncated LCG sequence may be hard to predict (at
least using the techniques considered here). A similar                      5 The only exception to that is the HotCRP application where
logarithmic barrier was also found in the experimental                  passwords were stored in cleartext thus there was no password reset
analysis that was conducted by Contini and Shparlin-                    functionality. However, in this case we were able to spoof registra-
ski [3] when they were investigating Stern’s attack [17]                tions for arbitrary email accounts.

                                                                   11
Figure 6: Solving LCGs with LLL; y-axis:number of leaks; x-axis: number of bits truncated.

               Application                 Attack                 Application              Attack
                mediawiki          4.2     4.3    5.3      •        Joomla         4.3                     •
               Open eClass         4.2     4.3    5.4      •        MyBB          ATSc     4.1c     5.3c   ◦
                taskfreak          4.2     4.3    5.3      •       IpBoard        ATSc     4.1c     4.2c   •
                 zen-cart         ATS      RT              •       phorum          4.2     4.3      5.3    •
             osCommerce 2.x       ATS      RT              •       HotCRP          4.2     4.3      5.3    •
             osCommerce 3.x        4.2     4.3    5.4      •        gazelle        4.3     5.3             •
                   elgg           ATSc     4.2    4.3      •       tikiWiki        4.2     4.3      5.4    •
                 Gallery          RTc      4.1c 4.2c       •         SMF          ATSc     4.3c            ◦

Figure 7: Summary of audit results. The c superscript denotes that the attack need to be used in combination with
other attacks with the same superscript. The • denotes a full attack while ◦ denotes a weakness for which the
practical exploitation is either unverified or requires very specific configurations. The number denotes the section
in which the applied attack is described in the paper.

6.1   Selected Audit Results                                    namely a time measurement from uniqid(), an out-
                                                                put from the MT generator and an output from the
Many applications we audited where trivially vulnera-
                                                                php combined lcg() through the extra argument in
ble to our attacks since they used the affected PRNG
                                                                the uniqid() function. In addition the output is
functions in a straightforward manner, thus making it
                                                                passed through the MD5 hash function so its infeasi-
pretty easy for an attacker to apply our techniques and
                                                                ble to recover the initial values given the output of this
exploit them. However some applications attempted
                                                                function. Since we do not have access to the output
to defend against randomness attacks by creating cus-
                                                                of the function, the state reconstruction attack seems
tom token generators. We will describe some attacks
                                                                an appropriate choice for attacking this token gener-
that resulted from using our framework against custom
                                                                ation algorithm. Indeed, the Gallery application uses
generators.
                                                                PHP sessions thus an attacker can use them to predict
                                                                the php combined lcg() and mt rand() outputs. In
Gallery. PHP Gallery is a very popular web based                addition by utilizing the request twins technique from
photo album organizer. In order for a user to reset his         section 3 the attacker can further reduce the search
password he has to click to a link, which contains the          space he has to cover to a few thousand requests.
security token. The function that generates the token is
the following:
function hash($entropy="") {                      Joomla. Joomla is one of the most popular CMS ap-
  return md5($entropy . uniqid(mt_rand(), true)); plications, and it also have a long history of weak-
}                                                 nesses in its generation of password reset tokens [4,
                                                                11]. Until recently, the code for the random token gen-
The token is generated using three entropy sources,             eration was the following:

                                                           12
function genRandomPassword( $length=8 ) {                       with a newly initialized MT generator then there at
  $salt = abc...xyzABC...XYZ0123456789 ;                        most 232 possible tokens, independently of the entropy
  $len = strlen ( $salt );                                      of config.secret. This gives an interesting attack
  $makepass = ‘‘’’;                                             vector: We generate two tokens from a fresh process
- $stat = @stat ( FILE ) ;                                      sequentially for a user account that we control. Then
- if (empty($stat) || !isarray($stat))
                                                                we start to connect to a fresh process and request a to-
-    $stat=array(phpuname());
- mt_srand(crc32(microtime().implode(|,$stat)));
                                                                ken for our account. If the token matches the token
  for($i=0;$i
is infeasible. The normal action would be to follow              zen-cart first seeds the mt rand() generator with the
the same path as we did in the Gallery application               microtime() function and then uses the mt rand()
where we had a similar problem and utilize the seed              function to produce a new password for the user. Thus,
reconstruction attack which does not require an output           there at most 106 possible passwords which could be
of the PRNGs. However, the Gazelle application uses              produced. Our exploit used the request twins tech-
custom sessions (which are generated using the same              nique to reset both our password and the target user’s
function), and thus we cannot apply that attack either.          password. Afterwards, we bruteforced the generated
The solution lies into slightly mofiying the seed recov-         password for our account to recover the microtime()
ery attack. Instead of asking the question “which seed           value that produced it. This takes at most a few sec-
produces this mt rand() sequence”, which is shuffled             onds on any modern laptop. Then, our exploit brute-
and thus affected by the second PRNG, we instead ask             forces the passwords generated by microtime() val-
which seed produces the unsorted set which contains              ues close to the one that generated our own new pass-
the characters of our string. This set is not affected by        word. We ran our exploit in a network with RTT
the shuffling and thus we can effectively bruteforce             around 9 ms, and Zen-Cart was installed in a 4 × 2.3
the mt rand() seed independently. After recovering               GHz server. The average difference of the two pass-
the mt rand() seed we know the initial sequence that             words was about 3600 microseconds, and the exploit
was produced and we can subsequently recover the                 needed at most two times that requests since we don’t
seed of rand() using the same attack.                            know which password was produced first. With the
                                                                 rate of 2500 requests per minute that our implementa-
                                                                 tion achieves, the attack is completed in a few minutes.
6.2    Attacks Implementation
                                                                 Phorum. Phorum is a classic bulletin board applica-
In addition to auditing the applications, we imple-              tion. It was used, among others, by the eStream com-
mented a number of our attacks targeting selected ap-            petition as an online discussion platform. In order for
plications. In particular, we implemented a seed re-             a user to reset his password the following function is
covery attack against Mediawiki, a state reconstruction          used:
attack against the Phorum application and the request            function phorum_gen_password($charpart=4, $numpart=3)
twins technique against Zen-cart. In the following sec-          {
tions we will briefly describe each vulnerability and                $vowels = ... //[char array];
the results of our attacks implementation.                           $cons = ... //[char array];
Mediawiki. Mediawiki is a very popular wiki appli-                   $num_vowels = count($vowels);
cation used, among others, by Wikipedia. Mediawiki                   $num_cons = count($cons);
uses mt rand() in order to generate a new password                   $password="";
when the user requests a password reset. In order to
predict the generated password we use the seed recov-                for($i = 0; $i < $charpart; $i++){
ery attack of section 4.3. The function f that we sam-               $password .= $cons[mt_rand(0, $num_cons - 1)]
ple is the one used to generate a CSRF token which is                          . $vowels[mt_rand(0, $num_vowels - 1)];
the following:                                                       }
                                                                     $password = substr($password, 0, $charpart);
 function generateToken( $salt = ’’ ) {                              if($numpart){
   $token = dechex(mt_rand()).dechex(mt_rand());                         $max=(int)str_pad("", $numpart, "9");
   return md5( $token . $salt );                                         $min=(int)str_pad("1", $numpart, "0");
 }                                                                       $num=(string)mt_rand($min, $max);
                                                                     }
   Our function f given a seed s first seeds the                     return strtolower($password.$num);
mt rand() generator and then uses that generator to              }
produce a token as the function above. To fully eval-
uate the practicality of the attack we implemented the              What makes this function interesting in the context
attack online, without any time-space tradeoff. Our im-          of state recovery is that at if called with no arguments
plementation was able to cover around 1300000 seed               (as it is in the application), at least four mt rand()
evaluations of f per second in a dual-core laptop with           leaks are discarded in each call. We implemented
two 2.3 GHz processors. This allowed us to cover the             the attack having the application installed in a Win-
full 232 range in about 70 minutes. Of course, using             dows server with the Apache web server and we used
a time-space tradeoff the search time could be further           our generic technique for Windows in order to recon-
reduced to a few minutes.                                        nect to the same process. On average, the attack re-
                                                                 quired around 1100 requests and 11 reconnections of
Zen cart. Zen-Cart is a popular eCommerce applica-
                                                                 our client. The running time was about 30 minutes, and
tion. At the time of this writing, a sample database
                                                                 the main source of overhead was the system solving.
which shops enter volunterily numbers about 2500 ac-
                                                                 This fact is mainly explained from the small number
tive e-shops 6 . In order to reset a user’s password
                                                                 of buckets and the lost leaks of each iteration. Nev-
  6 www.zen-cart.com/index.php?main_page=showcase                erthless, the attack remained highly practical, as we

                                                            14
You can also read