Spectral properties of the Google matrix of the World Wide Web and other directed networks

Page created by Tyrone Casey
 
CONTINUE READING
PHYSICAL REVIEW E 81, 056109 共2010兲

  Spectral properties of the Google matrix of the World Wide Web and other directed networks

                                  Bertrand Georgeot, Olivier Giraud,* and Dima L. Shepelyansky
                 Laboratoire de Physique Théorique (IRSAMC), Université de Toulouse–UPS, F-31062 Toulouse, France
                                        and LPT (IRSAMC), CNRS, F-31062 Toulouse, France
                                        共Received 17 February 2010; published 25 May 2010兲
                 We study numerically the spectrum and eigenstate properties of the Google matrix of various examples of
              directed networks such as vocabulary networks of dictionaries and university World Wide Web networks. The
              spectra have gapless structure in the vicinity of the maximal eigenvalue for Google damping parameter ␣ equal
              to unity. The vocabulary networks have relatively homogeneous spectral density, while university networks
              have pronounced spectral structures which change from one university to another, reflecting specific properties
              of the networks. We also determine specific properties of eigenstates of the Google matrix, including the
              PageRank. The fidelity of the PageRank is proposed as a characterization of its stability.

              DOI: 10.1103/PhysRevE.81.056109                     PACS number共s兲: 89.75.Hc, 89.20.Hh, 05.40.Fb, 72.15.Rn

                      I. INTRODUCTION                                  matrix A and repeated applications of G on a random vector
                                                                       converges quickly to the PageRank vector, after 50–100 it-
    The rapid growth of the World Wide Web 共WWW兲 brings                erations for ␣ = 0.85 which is the most commonly used value
the challenge of information retrieval from this enormous              关2兴. The PageRank vector is real non-negative and can be
database which at present contains about 1011 webpages. An             ordered by decreasing values p j, giving the relative impor-
efficient algorithm for ranking webpages was proposed in 关1兴           tance of the node j. It is known that when ␣ varies, all
and is now known as the PageRank algorithm 共PRA兲. This                 eigenvalues evolve as ␣␭i where ␭i are the eigenvalues for
PRA formed the basis of the Google search engine, which is             ␣ = 1 and i = 2 , . . . , N, while the largest eigenvalue ␭1 = 1, as-
used by many internet users in everyday life. The PRA al-              sociated with the PageRank, remains unchanged 关2兴.
lows us to determine efficiently a vector ranking the nodes of             The properties of the PageRank vector for WWW have
a network by order of importance. This PageRank vector is              been extensively studied by the computer science community
obtained as an eigenvector of the Google matrix G built on             and many important properties have been established 关4–8兴.
the basis of the directed links between WWW nodes 共see,                For example, it was shown that p j decreases approximately
e.g., 关2兴兲:                                                            in an algebraic way p j ⬃ 1 / j␤ with the exponent ␤ ⬇ 0.9 关4兴.
                                                                       It is also known that typically for the Google matrix of
                                                                       WWW at ␣ = 1 there are many eigenvalues very close or
                     G = ␣S + 共1 − ␣兲E/N.                       共1兲    equal to ␭ = 1 and that even at finite ␣ ⬍ 1 there are degen-
                                                                       eracies of eigenvalues with ␭ = ␣ 共see, e.g., 关9兴兲.
                                                                           In spite of the important progress obtained during these
Here S is the matrix constructed from the adjacency matrix             investigations of PageRank vectors, the spectrum of the
Aij of the directed links of the network of size N, with Aij           Google matrix G was rarely studied as a whole. Neverthe-
= 1 if there is a link from node j to node i and Aij = 0 other-        less, it is clear that the structure of the network is directly
wise. Namely, Sij = Aij / 兺kAkj if 兺kAkj ⬎ 0, and Sij = 1 / N if all   linked to this spectrum. Eigenvectors other than the PageR-
elements in the column j of A are zero. The last term in Eq.           ank describe the relaxation processes toward the steady state
共1兲 with uniform matrix Eij = 1 describes the probability 1            and also characterize various communities or subsets of the
− ␣ of a random surfer propagating along the network to                network. Even if models of directed networks of small-world
jump randomly to any other node. The matrix G belongs to               type 关10兴 have been analyzed, constructed, and investigated,
the class of Perron-Frobenius operators which describe the             the spectral properties of matrices corresponding to such net-
evolution of classical dynamical systems and Markov chains             works were not so much studied. Generally for a directed
共the description of general properties of such operators can           network the matrix G is nonsymmetric and thus the spectrum
be find, e.g., in 关3兴兲. For 0 ⬍ ␣ ⬍ 1 it has a unique maximal          of eigenvalues is complex. Recently the spectral study of the
eigenvalue at ␭ = 1, separated from the others by a gap of size        Google matrix for the Albert-Barabasi 共AB兲 model 关11兴 and
at least 1 − ␣ 共see, e.g., 关2兴兲. The eigenvector associated to         randomized university WWW networks was performed in
this maximal eigenvalue is the PageRank vector, which can              关12兴. For the AB model the distribution of links is typical of
be viewed as the steady-state distribution for the random              scale-free networks 关10兴. The distribution of links for the
surfer. Usual WWW networks correspond to very sparse                   university network is approximately the same and is not af-
                                                                       fected by the randomization procedure. Indeed, the random-
                                                                       ization procedure corresponds to the one proposed in 关13兴
 *Present address: Laboratoire de Physique Théorique et Modèles        and is performed by taking pairs of links and inverting the
Statistiques, UMR 8626 du CNRS, Université Paris-Sud, Orsay,           initial vertices, keeping unchanged the number of ingoing
France.                                                                and outgoing links for each vertex. It was established that the

1539-3755/2010/81共5兲/056109共8兲                                   056109-1                           ©2010 The American Physical Society
GEORGEOT, GIRAUD, AND SHEPELYANSKY                                                     PHYSICAL REVIEW E 81, 056109 共2010兲

spectra of the AB model and the randomized university net-            A much larger sample of university networks from the
works were quite similar. Both have a large gap between the       same database was actually used, including universities from
largest eigenvalue ␭1 = 1 and the next one with 兩␭2兩 ⬇ 0.5 at     the US, Australia, and New Zealand, in order to ensure that
␣ = 1. This is in contrast with the known property of WWW         the results presented were representative.
where ␭2 is usually very close or equal to unity 关2,9兴. Thus it       As opposed to the full spectrum of the Google matrix, the
appears that the AB model and the randomized scale-free           PageRank can be computed and studied for much larger ma-
networks have a very different spectral structure compared to     trix sizes. In the studies of Sec. IV, we therefore included
real WWW networks. Therefore it is important to study the         additional data from the university networks of Oxford in
spectral properties of examples of real networks 共without         2006 共www.oxford.ac.uk兲 with 173 733 sites and 2 917 014
randomization兲.                                                   links taken from 关14兴, and the network of Notre Dame Uni-
    In this paper, we thus study the spectra of Google matri-     versity from the US taken from the database 关15兴 with
ces for the WWW networks of several universities and show         325 729 sites and 1 497 135 links 共without removing any
that indeed they display very different properties compared       node兲.
to random scale-free networks considered in 关12兴. We also             In addition, we also investigated several vocabulary net-
explore the spectra of a completely different type of real        works constructed from dictionaries; the network data were
network, built from the vocabulary links in various dictionar-    taken from 关16兴.
ies. In addition, we analyze the properties of eigenvectors of        共i兲 Roget dictionary 共1022 vertices and 5075 links兲 关17兴.
the Google matrix for these networks. A special attention is      The 1022 vertices correspond to the categories in the 1879
paid to the PageRank vector and in particular we characterize     edition of Roget’s Thesaurus of English Words and Phrases.
its sensitivity to ␣ through a new quantity, the PageRank         There is a link from category X to category Y if Roget gave
fidelity.                                                         a reference to Y among the words and phrases of X or if the
    The paper is organized as follows. In Sec. II we give the     two categories are related by their positions in the book.
description of the university and vocabulary networks whose           共ii兲 Online Dictionary of Library and Information Science
Google matrices we consider. The properties of spectra and        共ODLIS兲, version December 2000 关18兴 共2909 vertices and
eigenstates are investigated in Sec. III. The fidelity of Pag-    18419 links兲. A link 共X , Y兲 from term X to term Y is created
eRank and its other properties are analyzed in Sec. IV. Sec-      if the term Y is used in the definition of term X.
tion V explores various models of random networks for                 共iii兲 Free On-Line Dictionary of Computing 共FOLDOC兲
which the spectrum can be closer to the one of real networks.     关19兴 共13 356 vertices and 120 238 links兲. A link 共X , Y兲 from
The conclusion is given in Sec. VI.                               term X to term Y is created if the term Y is used in the
                                                                  definition of term X.
                                                                      Distribution of ingoing and outgoing links for the univer-
   II. DESCRIPTION OF NETWORKS OF UNIVERSITY                      sity WWW networks is similar to those of much larger
              WWW AND DICTIONARIES                                WWW networks discussed in 关4,8,10兴. An example is shown
                                                                  in the Appendix for the network of Liverpool John Moores
    In order to study the spectra and eigenvectors of Google      University, together with data from AB models discussed in
matrices of real networks, we numerically explored several        关12兴 共see Fig. 12兲.
systems.
    Our first example consists in the WWW networks of UK
universities, taken from the database 关14兴. The vertices are           III. PROPERTIES OF SPECTRUM AND EIGENSTATES
the HTML pages of the university websites in 2002. The
links correspond to hyperlinks in the pages directing to an-          To study the spectrum of the networks described in the
other webpage. To reduce the size of the matrices in order to     previous section, we construct the Google matrix G associ-
perform exact diagonalization, only webpages with at least        ated to them at ␣ = 1. After that the spectrum ␭i and right
one outlink were considered. There are still dangling nodes,      eigenstates ␺i of G 共satisfying the relation G␺i = ␭i␺i兲 are
despite of this selection, since some sites have outlinks only    computed by direct diagonalization using standard linear al-
to sites with no outlink. We checked on several examples that     gebra 共LAPACK兲 routines. Since G is generally a nonsym-
the general properties of the spectra were not affected by this   metric matrix for our networks, the eigenvalues ␭i are dis-
reduction in size. We present data on the spectra from five       tributed in the complex plane, inside the unit disk 兩␭i兩 ⱕ 1.
universities:                                                         The spectrum for our eight networks is shown in Fig. 1.
    共i兲 University of Wales at Cardiff 共www.uwic.ac.uk兲, with     An important property of these spectra is the presence of
2778 sites and 29 281 links.                                      eigenvalues very close to ␭ = 1 and moreover we find that
    共ii兲 Birmingham City University 共www.uce.ac.uk兲; 10 631       ␭ = 1 eigenvalue has significant degeneracy. It is known that
sites and 82 574 links.                                           such an exact degeneracy is typical for WWW networks 共see,
    共iii兲 Keele University 共Staffordshire兲 共www.keele.ac.uk兲;     e.g., 关8,9兴兲, and the absence of spectral gap has been seen in
11 437 sites and 67 761 links.                                    other real networks 关20兴. In addition to this exact degeneracy,
    共iv兲 Nottingham Trent University 共www.ntu.ac.uk兲;             there are quasidegenerate eigenvalues very close to ␭ = 1. It
12 660 sites and 85 826 links.                                    is important to note that these features are absent in the spec-
    共v兲 Liverpool John Moores University 共www.livjm.ac.uk兲;       tra of random networks studied in 关12兴 based on the AB
13 578 sites and 11 1648 links.                                   model and on the randomization of WWW university

                                                            056109-2
SPECTRAL PROPERTIES OF THE GOOGLE MATRIX OF…                                                    PHYSICAL REVIEW E 81, 056109 共2010兲

             1                  A                   B                                      1
           0.5                                                                          0.75

                                                                                 W(γ)
                                                                                         0.5
             0
                                                                                        0.25
          -0.5                                                                             0
            -1                                                                          300
             1                                                                          200

                                                                                 ξ
                                C                   D
           0.5                                                                          100
             0                                                                             00     2.5       5       7.5      10
                                                                                 (a)                        γ
          -0.5
            -1                                                                             1
             1                  E                   F                                   0.75

                                                                                 W(γ)
           0.5                                                                           0.5
             0                                                                          0.25
                                                                                           0
          -0.5                                                                          300
            -1                                                                          200

                                                                                 ξ
             1                  G                  H                                    100
           0.5                                                                             00     2.5       5       7.5      10
             0                                                                   (b)                        γ
          -0.5                                                             FIG. 2. 共Color online兲 Top: Roget dictionary, ␣ = 1. Top panel:
            -1                                                         normalized density of states W 共red/gray兲 obtained as a derivative of
                 -1      0          1 -1      0         1              a smoothed version of the integrated density 共smoothed over a small
                                                                       interval ⌬␥ varying with matrix size兲, integrated density is shown in
    FIG. 1. Distribution of eigenvalues ␭i of Google matrices in the   black. Bottom panel: PAR of eigenvectors as a function of ␥; de-
complex plane at ␣ = 1 for dictionary networks: Roget                  generacy of ␭ = 1 is 18 共note that the value W共0兲 corresponds to
共A , N = 1022兲, ODLIS 共B , N = 2909兲, and FOLDOC 共C , N = 13 356兲;     eigenvalues with 兩␭兩 = 1兲. Bottom: ODLIS dictionary, same as top;
university WWW networks: University of Wales 共Cardiff兲                 degeneracy of ␭ = 1 is 4.
共D , N = 2778兲, Birmingham City University 共E , N = 10 631兲, Keele
University 共Staffordshire兲 共F , N = 11 437兲, Nottingham Trent Uni-
versity 共G , N = 12 660兲, Liverpool John Moores University 共H , N      ization of eigenvectors ␺i共j兲, we use the participation ratio
= 13 578兲共data for universities are for 2002兲.                         共PAR兲 defined by ␰ = 共兺 j兩␺i共j兲兩2兲2 / 兺 j兩␺i共j兲兩4. This quantity
                                                                       gives the effective number of vertices of the network con-
networks, where the spectrum is characterized by a large
                                                                       tributing to a given eigenstate ␺i; it is often used in solid-
gap between the first eigenvalue ␭1 = 1 and the second one
                                                                       state systems with disorder to characterize localization prop-
with 兩␭2兩 ⬇ 0.5. For example, the spectrum shown in
                                                                       erties 共see, e.g., 关21兴兲. The dependence of the density of
Fig. 1 panel H corresponds to the same university whose
randomized spectrum was displayed in Fig. 1 共bottom panel兲             states W共␥兲 in ␥, which gives the number of eigenstates in
in 关12兴. Clearly the structure of the spectrum becomes very            the interval 关␥ , ␥ + d␥兴, is shown in Figs. 2–5 共top panels兲.
different after randomization of links. Another property of            The normalization is chosen such that 兰⬁0 W共␥兲d␥ = 1,
the spectra displayed in Fig. 1 that we want to stress is the          corresponding to the total number of eigenvalues N
presence of clearly pronounced structures which are different          共equal to the matrix size兲. We also show the integrated ver-
from one network to another. The structure is less pro-                sion of this quantity in the same panels. In the same figures
nounced in the case of the three spectra obtained from dic-            we show the PAR ␰ of the eigenstates as a function of ␥
tionary networks. In this case, the spectrum is flattened, be-         共bottom panels兲.
ing closer to the real axis. In contrast, for the WWW                      It is clear that for the dictionary networks the density of
university networks, the spectrum is spread out over the unit          states W depends on ␥ in a relatively smooth way, with a
disk. However, there is still a significant fraction of                broad maximum at ␥ ⬇ 1 – 2. The distribution of PAR has
eigenvalues close to the real axis. We understand this feature         also a maximum at approximately the same values. The case
by the existence of a significant number of symmetric ingo-            of the dictionary FOLDOC is a bit special, showing a bimo-
ing and outgoing links 共48% in the case of the Liverpool               dal distribution which is also clearly seen in the dependence
John Moores University network兲, which is larger compared              of ␰ on ␥. This comes from the fact that the distribution of
to the case of randomized university networks considered in            eigenvalues in Fig. 1 共panel C兲 is highly asymmetric with
关12兴.                                                                  respect to the imaginary axis. The latter case has also no
   To characterize the spectrum, we introduce the relaxation           degeneracy at ␭ = 1. In these three networks the density of
rate ␥ defined by the relation 兩␭兩 = exp共−␥ / 2兲. For character-       states decreases for ␥ approaching 0. We note that the

                                                                 056109-3
GEORGEOT, GIRAUD, AND SHEPELYANSKY                                                            PHYSICAL REVIEW E 81, 056109 共2010兲

                     1                                                                   1
                  0.75                                                                0.75
           W(γ)

                                                                               W(γ)
                   0.5                                                                 0.5
                  0.25                                                                0.25
                     0                                                                   0
                 4000                                                                   80
         ξ

                                                                                 ξ
                 2000                                                                  40
                    00    2.5      5       7.5     10                                   00
                                   γ                                                                      5                 10
          (a)                                                                  (a)                        γ
                    1                                                                    1
                 0.75                                                                 0.75
          W(γ)

                                                                               W(γ)
                  0.5                                                                  0.5
                 0.25                                                                 0.25
                    0                                                                    0
                   50                                                                 300
            ξ

                                                                                      200

                                                                               ξ
                   25
                                                                                      100
                    00    2.5      5       7.5     10                                    00
                                   γ                                                            2.5       5       7.5       10
          (b)                                                                  (b)                        γ
   FIG. 3. 共Color online兲 Top: FOLDOC dictionary, same as Fig. 2;      FIG. 5. 共Color online兲 Top: Nottingham Trent University, same
degeneracy of ␭ = 1 is 1; Bottom: University of Wales 共Cardiff兲,    as Fig. 2; degeneracy of ␭ = 1 is 229. Bottom: Liverpool John
same as top; degeneracy of ␭ = 1 is 69.                             Moores University, same as top; degeneracy of ␭ = 1 is 109; other
                                                                    degeneracy peaks correspond to ␭ = 1 / 2 共16兲, ␭ = 1 / 3 共8兲; ␭ = 1 / 4
integrated version of the density of states reaches a plateau       共947兲, ␭ = 1 / 5 共97兲, being located at ␥ = −2 ln ␭; other degeneracies
for ␥ ⱖ 6 – 7. This saturation value is less than 1, meaning        are also present, e.g., ␭ = 1 / 冑2 共41兲.
that a certain nonzero fraction of eigenvalues are extremely
close to ␭ = 0.
                                                                        For the WWW university networks, the density of states is
                    1                                               much more inhomogeneous in ␥. Even if a broad maximum
                 0.75                                               is visible, there are sharp peaks at certain values of ␥. The
          W(γ)

                                                                    sharpest peaks correspond to exact degeneracies at certain
                  0.5
                                                                    complex values of ␭. The degeneracies are especially visible
                 0.25                                               at the real values ␭ = 1 / 2, ␭ = 1 / 3, and other 1 / n with integer
                    0                                               values of n. We attribute this phenomenon to the fact that the
                 300                                                small number of links gives only a small number of different
                 200
          ξ

                                                                    values for the matrix elements of the matrix G. For the uni-
                 100                                                versity networks, the degeneracy at ␭ = 1 is much larger than
                    00                                              in the case of dictionaries. The integrated densities of states
                                   5               10
          (a)                      γ                                show visible vertical jumps which correspond to the degen-
                                                                    eracies; their growth saturates at ␥ ⬇ 7 showing that about
                    1                                               30– 50 % of the eigenvalues are located in the vicinity
                 0.75                                               of ␭ = 0.
          W(γ)

                  0.5                                                   The PAR distribution for the university networks fluctu-
                                                                    ates strongly even if a broad maximum is visible. Typical
                 0.25
                                                                    values have ␰ ⬇ 100, which is small compared to the matrix
                    0                                               size N ⬃ 104. This indicates that the majority of eigenstates
                 150                                                are localized on certain zones of the network. This does not
                 100                                                exclude that certain eigenstates with a larger ␰ will be delo-
          ξ

                   50                                               calized on a large fraction of the network in the limit of very
                    00                                              large N.
                         2.5       5      7.5      10
          (b)                      γ                                    The exact G matrix diagonalization requires significant
                                                                    computer memory and is practically restricted to matrix size
   FIG. 4. 共Color online兲 Top: Birmingham City University, same     N of about N ⬍ 30 000. However, real networks such as
as Fig. 2; degeneracy of ␭ = 1 is 71; Bottom: Keele University      WWW networks can be much larger. It is therefore important
共Staffordshire兲, same as top; degeneracy of ␭ = 1 is 205.           to find numerical approaches in order to obtain the spectrum

                                                              056109-4
SPECTRAL PROPERTIES OF THE GOOGLE MATRIX OF…                                                                     PHYSICAL REVIEW E 81, 056109 共2010兲

      0.8
      0.6
                                                                                                         -1
      0.4
      0.2
                                                                                                         -2

                                                                                                 log10(pi)
        0                                                                                                -3
     -0.2
     -0.4                                                                                                -4
     -0.6
     -0.8
            -1    -0.5   0    0.5       1 -1        -0.5         0   0.5     1                           -5
   FIG. 6. 共Color online兲 Cloud of eigenvalues for Liverpool John
                                                                                                         -6
Moores University, ␣ = 1. Circles: full matrix N = 13 578. Stars:                                            0    1    2    3    4          5
truncated matrix of size 8192 共left兲 and 4096 共right兲.                                                                  log10(i)
                                                                                                 (a)

of large networks using approximate methods. A natural pos-                                              -1
sibility is to order the sites through the PageRank method                                               -2
and to consider the spectrum of the 共properly renormalized兲                                              -3

                                                                                                 log10(pi)
truncated matrix restricted to the sites with PageRank larger                                            -4
than a certain value. In this way, the truncation takes into
account the most important sites of the network. The effect                                              -5
of such a truncation is shown in Fig. 6 for the largest net-                                             -6
work of our sample. The numerical data show that the global                                              -7
features of the spectrum are preserved by moderate trunca-
                                                                                                            0     1    2    3    4          5
tion, but individual eigenvalues deviate from their exact val-                                                          log10(i)
                                                                                                 (b)
ues when more than 50% of sites are truncated. Probably
future developments of this approach are needed in order to                               FIG. 8. 共Color online兲 Some PageRank vectors p j for Notre-
be able to truncate a larger fraction of sites.                                        Dame University 共top panel兲 and Oxford 共bottom panel兲. From top
                                                                                       to bottom at log10共i兲 = 5: ␣ = 0.49 共black兲, 0.59 共red兲, 0.69 共green兲,
                                                                                       0.79 共blue兲, 0.89 共violet兲, and 0.99 共orange兲. Dashed line indicates
      IV. FIDELITY OF PAGERANK AND ITS OTHER                                           the slope −1.
                     PROPERTIES

   In the previous section we studied the properties of the                            know how sensitive the PageRank is with respect to the
full spectrum and all eigenstates of the G matrix for several                          Google parameter 共damping parameter兲 ␣ in Eq. 共1兲. The
real networks. The PageRank is especially important since it                           localization property of the PageRank can be quantified
allows to obtain an efficient classification of the sites of the                       through the PAR ␰ defined above. The dependence of ␰ on ␣
network 关1,2兴. Since the networks usually have small number                            is shown in Fig. 7 for four university WWW networks, in-
of links, it is possible to obtain the PageRank by vector it-                          cluding two from Fig. 1 共panels D and H兲 and two of much
eration for enormously large size of networks as described in                          larger sizes 共Notre Dame and Oxford兲. For ␣ → 0 the PAR
关1,2兴.                                                                                 goes to the matrix size since the G matrix is dominated by
   Due to this significance of the PageRank, it is important                           the second part of Eq. 共1兲. However, in the interval 0.4⬍ ␣
to characterize its properties. In addition, it is important to                        ⬍ 0.9 the dependence on ␣ is rather weak, indicating stability
                                                                                       of the PageRank. For 0.9⬍ ␣ ⬍ 1 the PAR value has a local
                                                                                       maximum where its value can be increased by a factor of
                 40                                                                    2–3. We attribute this effect to the existence of an exact
                                        40
                                        30
                                        20
                                                                                       degeneracy of the eigenvalue ␭ = 1 at ␣ = 1, discussed in the
                                    ξ

                 30                     10
                                         0
                                                                                       previous section. In spite of this interesting behavior of ␰ in
                                             0.95      0.975
                                                        α
                                                                     1                 the vicinity of ␣ = 1, the value of ␰, which gives the effective
                 20                                                                    number of populated sites, remains much smaller than the
            ξ

                                                                                       network size. In other models considered in 关12,22兴, a delo-
                 10
                                                                                       calization of the PageRank was observed for some ␣ values
                                                                                       so that ␰ was growing with system size N. For the WWW
                                                                                       university networks considered here, delocalization is clearly
                  0                                                                    absent 共network sizes in Fig. 7 vary over two order of mag-
                   0.2       0.4             0.6           0.8           1
                                             α
                                                                                       nitudes兲. This is in agreement with the value of the exponent
    FIG. 7. 共Color online兲 PAR ␰ of PageRank as a function of ␣ for                    ␤ ⬇ 0.9 for the PageRank decay, which was found for large
University of Wales 共Cardiff兲 共black/dashed兲, Notre-Dame 共blue,                        samples of WWW data in 关4,8兴. Indeed, for that value of ␤,
dotted兲, Liverpool John Moores University 共red/long dashed兲, and                       PAR should be independent of system size.
Oxford 共green/solid兲 Universities 共curves from top to bottom at ␣                          Our data for PageRank distribution also show its stability
= 0.6兲. Network sizes vary from N = 2778 to N = 325 729. Inset is a                    as a whole for variation in ␣ in the interval 0.4⬍ ␣ ⬍ 0.9, as
zoom for data from Oxford University close to ␣ = 1.                                   it is shown in Fig. 8.

                                                                                 056109-5
GEORGEOT, GIRAUD, AND SHEPELYANSKY                                                                           PHYSICAL REVIEW E 81, 056109 共2010兲

                                 1
                                                                                                0.4
          2                    0.9
            ||
                               0.8                                                              0.2
                               0.7
                                                                                                  0
                               0.6
                               0.5                                                             -0.2
                               0.4
                                                                                               -0.4
                               0.30    0.2       0.4             0.8      1
                                                           0.6                                        -0.2   0    0.2    0.4    0.6   0.8     1
           (a)                                         α
                                                                   1.25                 FIG. 10. Spectrum of eigenvalues ␭ in the complex plane for the
                                  1                                                 Avrachenkov-Lebedev model 关6兴, with N = 211 共network size兲,
                                                                   1                ␣ = 0.85, m = 5 outgoing links per node. Multiplicity of links is taken
                               0.75                                                 into account in the construction of G.
                                                                   0.75
                  α'

                                0.5
                                                                   0.5
                               0.25                                                 of ␤ close to the one of real networks. For example, the
                                                                   0.25             model studied by Avrachenkov and Lebedev 关6兴 allows us to
                                      0.25 0.5 0.75 1                               obtain analytical expressions for ␤ with a value close to the
           (b)                              α                      0
                                                                                    one obtained for WWW. It is interesting to see what are the
                                                                                    spectral properties of this model. In Fig. 10 we show the
    FIG. 9. 共Color online兲 PageRank fidelity f共␣ , ␣⬘兲 for Notre-
Dame University 共N = 325 729兲; top panel: f共␣ , ␣⬘ = 0.85兲
                                                                                    spectrum for this model for ␣ = 0.85. Our data show that this
= 兩具␺共␣兲 兩 ␺共0.85兲典兩2 关see Eq. 共2兲兴; bottom panel: color density plot of            model has an enormous gap, thus being very different from
f共␣ , ␣⬘兲.                                                                          spectra of real networks shown in Fig. 1.
                                                                                       The above results, together with those of 关12兴, show that
                                                                                    many commonly used network models are characterized by a
   The sensitivity of the PageRank with respect to ␣ can be                         large gap between ␭ = 1 and the second eigenvalue, in con-
more precisely characterized through the PageRank fidelity                          trast with real networks. In order to build a network model
defined as                                                                          where this gap is absent, we introduce here what we call the
                                                                                    color model. It is an extension of the AB model that allows
                               f共␣, ␣⬘兲 =   冏兺
                                             j
                                                                  冏
                                                 ␺1共j, ␣兲␺1共j, ␣⬘兲 ,
                                                                      2
                                                                              共2兲   to obtain results for the spectral distribution that are closer to
                                                                                    real networks. We divide the nodes into n sets 共“colors”兲,
                                                                                    allowing n to grow with network size. Each node is labeled
                                                                                    by an integer between 0 and n − 1. At each step, links and
where ␺1共j , ␣兲 is the eigenstate at ␭ = 1 of the Google matrix                     nodes are added as in the AB model but also with probability
G with parameter ␣ in Eq. 共1兲; here the sum over j runs over                        ␩ the new node is introduced with a new color. The only
the network sites 共without PageRank reordering兲. We remind                          links authorized between nodes are links within each set.
that the eigenvector ␺1共j , ␣兲 is normalized by 兺 j␺1共j , ␣兲2                       Such a structure implies that the second eigenvalue of matrix
= 1. Fidelity is often used in the context of quantum chaos
and quantum computing to characterize the sensitivity of
wave functions with respect to a perturbation 关23,24兴. The
variation in this quantity with ␣ and ␣⬘ is shown in Fig. 9.                                    0.4
The fidelity reaches its maximum value f = 1 for ␣ = ␣⬘. Ac-
cording to Fig. 9 共bottom panel兲, the stability plateau where                                   0.2
fidelity remains close to 1, indicating stability of PageRank,
                                                                                                  0
is broadest for ␣ = 0.5. This is in agreement with previous
results presented in 关25兴, where the same optimal value of ␣                                   -0.2
was found based on different arguments.
                                                                                               -0.4
              V. SPECTRUM OF MODEL SYSTEMS
                                                                                                      -0.4 -0.2 0     0.2 0.4 0.6 0.8         1
                                                                                        FIG. 11. Spectrum of eigenvalues ␭ in the complex plane for the
   The results obtained in 关12兴 compared to those presented                         color model, N = 213, p = 0.2, q = 0.1, and ␣ = 0.85. Nodes are divided
in the previous section show that while the spectrum of the                         into n color sets labeled from i = 0 to n − 1; nodes and links are
network has a large gap between ␭ = 1 and the other eigen-                          created according to AB model; only authorized links are links
values, still certain properties of the PageRank can be similar                     within a color set i. This rule is broken with probability ⑀ = 10−3. We
in both cases 共e.g., the exponent ␤兲. In fact the studies per-                      start with three color sets; with probability ␩ a new color is intro-
formed in the computer science community were often based                           duced 共we take ␩ = 10−2兲. In the example displayed, when the num-
on simplified models, which can nevertheless give the value                         ber of nodes reaches N, n = 83 colors.

                                                                              056109-6
SPECTRAL PROPERTIES OF THE GOOGLE MATRIX OF…                                                       PHYSICAL REVIEW E 81, 056109 共2010兲

G is real and exactly equal to ␣ 关26兴. The colors correspond        presented here is a first step in this direction. We note that
to communities in the network, in the sense that each color         Ulam networks built from dynamical maps can capture cer-
represents a set of nodes with links only within the set.           tain properties of real networks in a relatively good manner
    In order to have a more realistic model, we allow for the       关22,28兴. In these latter networks, it is possible to have a
rule for links to be broken with some probability ␧. That is,       delocalization of the PageRank when ␣ or map parameters
at each time step an link between two nodes is chosen at            vary; we did not observe such feature here.
random according to the rules of the AB model. Then if it               Indeed, our data show that the PageRank remains local-
agrees with the color rule above it is used; if it does not then    ized for all values of ␣ ⬎ 0.3. We also showed that the use of
with probability 1 − ␧ it is just omitted, and with probability ␧   the fidelity as a quantity to characterize the stability of Pag-
it is nevertheless added.                                           eRank enables to identify a stability plateau located around
    The spectrum of this color model is shown in Fig. 11 for        ␣ = 0.5.
␣ = 0.85. The second eigenvalue is now exactly at ␭ = 0.85,             We think that future investigation of the spectral proper-
demonstrating the absence of a gap. There is also a set of          ties of the Google matrix will open new access to identifica-
eigenvalues which is located on the real line, but the majority     tion of important communities and their properties which can
of states remains inside a circle 兩␭兩 ⬍ 0.3 as in the AB model.     be hidden in the tail of the PageRank and hardly accessible
    Thus the color model allows to eliminate the gap but still      to classification by the PageRank algorithm. Furthermore,
the distribution of eigenvalues ␭ in the complex plane re-          the degeneracies at various values of ␭ and the characteristic
mains different from the spectra of real networks shown in          patterns directly visible in the spectra of the Google matrix
                                                                    should allow to identify other hidden properties of real net-
Fig. 1: the structures prominent in real networks are not vis-
                                                                    works.
ible, and eigenvalues in the gap are concentrated only on the
real axis or close to it.

                                                                                                   ACKNOWLEDGMENTS
                      VI. CONCLUSION
                                                                       We thank Leonardo Ermann and Klaus Frahm for discus-
    In this work we performed numerical analysis of the spec-       sions and CALcul en MIdi-Pyrénées 共CALMIP兲 for the use
tra and eigenstates of the Google matrix G for several real         of their supercomputers.
networks. The spectra of the analyzed networks have no gap
between first and second eigenvalues, in contrast with com-
monly used scale-free network models 共e.g., AB model兲. The
spectra of university WWW networks are characterized by                                               APPENDIX
complex structures which are different from one university to
another. At the same time, PageRank of these university net-           Here we show in Fig. 12 the distributions of links for the
works look rather similar. In contrast, the Google matrices of      AB model discussed in 关12兴 and for the university WWW
vocabulary networks of dictionaries have spectra with much          network 共panel H of Fig. 1兲.
less structure.
    These studies show that usual models of random scale-
free networks miss many important features of real networks.
                                                                                              0
In particular, they are characterized by a large spectral gap,                               -1
                                                                                log Pc (k)

which is generally absent in real networks. The absence of                                   -2
                                                                               in

this gap has been linked to the presence of community struc-                                 -3
                                                                                             -4
tures in the network 关8,9,20,26兴. In the language of dynami-                                 -5
cal systems, the physical origin of this gap can be related to                                0
the known property of small-world and scale-free networks
                                                                                log Pc (k)

                                                                                             -1
that only logarithmic time 共in system size兲 is needed to go                                  -2
                                                                               out

                                                                                             -3
from any node to any other node 共see, e.g., 关10兴兲. Due to that,                              -4
the relaxation process in such networks is fast and the gap,                                 -5
                                                                                             -6
being inversely proportional to this time, is accordingly very                                 0       1           2          3
                                                                                                           log k
large. In contrast, the presence of weakly coupled communi-
ties in real networks makes the relaxation time very large, at          FIG. 12. 共Color online兲 Cumulative distribution of ingoing links
least for certain configurations. An interesting future investi-        共k兲 共top panel兲 and of outgoing links Pout
                                                                    Pin
                                                                      c                                         c 共k兲 共bottom panel兲 for
gation would be to use algorithms for community detection           the AB model with vector size N = 214, for q = 0.1 共black/solid兲 and
共such as the ones reviewed in 关27兴兲, in order to assess the         q = 0.7 共red/dashed兲, data are averaged over 80 realizations of AB
precise link between the distributions of eigenvalues of the        model, and for the network of Liverpool John Moores University
Google matrix and the community structures for the net-             with N = 13578, 共panel H in Fig. 1兲 共blue/dotted兲. Average number
works considered. It is also desirable to construct new ran-        of ingoing or outgoing links is 具k典 = 6.43 for q = 0.1, 具k典 = 14.98 for
dom scale-free models which could capture in a better way           q = 0.7, 具k典 = 8.2227 for LJMU. Dashed straight line indicates the
the actual properties of real networks. The color model             slope −1. Logarithms are decimal.

                                                              056109-7
GEORGEOT, GIRAUD, AND SHEPELYANSKY                                                           PHYSICAL REVIEW E 81, 056109 共2010兲

 关1兴 S. Brin and L. Page, Comput. Netw. ISDN Syst. 30, 107            关11兴 R. Albert and A.-L. Barabási, Phys. Rev. Lett. 85, 5234
     共1998兲.                                                               共2000兲.
 关2兴 A. M. Langville and C. D. Meyer, Google’s PageRank and           关12兴 O. Giraud, B. Georgeot, and D. L. Shepelyansky, Phys. Rev. E
     Beyond: The Science of Search Engine Rankings 共Princeton               80, 026107 共2009兲.
     University Press, Princeton, 2006兲; D. Austin, AMS Feature       关13兴 S. Maslov and K. Sneppen, Science 296, 910 共2002兲.
     Columns      共2008兲     available    at   http://www.ams.org/    关14兴 Academic       Web      Link    Database     Project     http://
     featurecolumn/archive/pagerank.html                                   cybermetrics.wlv.ac.uk/database/
 关3兴 M. Brin and G. Stuck, Introduction to Dynamical Systems
                                                                      关15兴 A.-L. Barabási webpage at the University of Notre Dame
     共Cambridge University Press, Cambridge, England, 2002兲.
                                                                           http://www.nd.edu/networks/resources.htm
 关4兴 D. Donato, L. Laura, S. Leonardi, and S. Millozzi, Eur. Phys.
                                                                      关16兴 V. Batagelj and A. Mrvar, 共2006兲: Pajek datasets. URL: http://
     J. B 38, 239 共2004兲; G. Pandurangan, P. Raghavan, and E.
                                                                           vlado.fmf.uni-lj.si/pub/networks/data/
     Upfal, Internet Math. 3, 1 共2006兲.
 关5兴 P. Boldi, M. Santini, and S. Vigna, in Proceedings of the 14th   关17兴 Peter Mark Roget: Roget’s Thesaurus of English Words and
     International Conference on World Wide Web, edited by A.              Phrases, http://www.gutenberg.org/etext/22
     Ellis and T. Hagino 共ACM Press, New York, 2005兲, p. 557; S.      关18兴 J. M. Reitz 共2002兲: ODLIS: Online Dictionary of Library and
     Vigna, in Proceedings of the 14th International Conference on         Information Science, http://vax.wcsu.edu/library/odlis.html
     World Wide Web, edited by A. Ellis and T. Hagino 共ACM            关19兴 FOLDOC 共2002兲: Free on-line dictionary of computing, edited
     Press, New York, 2005兲 p. 976.                                        by D. Howe, http://foldoc.org/
 关6兴 K. Avrachenkov and D. Lebedev, Internet Math. 3, 207             关20兴 E. Estrada, Europhys. Lett. 73, 649 共2006兲; Phys. Rev. E 75,
     共2007兲.                                                               016103 共2007兲.
 关7兴 K. Avrachenkov, N. Litvak, and K. S. Pham, in Algorithms and     关21兴 F. Evers and A. D. Mirlin, Rev. Mod. Phys. 80, 1355 共2008兲.
     Models for the Web-Graph: Fifth International Workshop,          关22兴 D. L. Shepelyansky and O. V. Zhirov, Phys. Rev. E 81,
     WAW 2007 San Diego, CA, Proceedings, Lecture Notes in                 036213 共2010兲.
     Computer Science Vol. 4863, edited by A. Bonato and F. R. K.     关23兴 T. Gorin, T. Prosen, T. H. Seligman, and M. Znidaric, Phys.
     Chung 共Springer-Verlag, Berlin, 2007兲, p. 16.                         Rep. 435, 33 共2006兲.
 关8兴 Algorithms and Models for the Web-Graph: Sixth International     关24兴 K. M. Frahm, R. Fleckinger, and D. L. Shepelyansky, Eur.
     Workshop, WAW 2009 Barcelona, Proceedings, Springer-                  Phys. J. D 29, 139 共2004兲.
     Verlag, Berlin, Lecture Notes in Computer Science Vol. 5427,     关25兴 K. Avrachenkov, N. Litvak, and K. S. Pham, Internet Math. 5,
     edited by K. Avrachenkov, D. Donato, and N. Litvak                    47 共2008兲.
     共Springer, Berlin, 2009兲.                                        关26兴 T. Haveliwala and S. Kamvar, Stanford University Technical
 关9兴 S. Serra-Capizzano, SIAM J. Matrix Anal. Appl. 27, 305                Report No. 582 共2003兲.
     共2005兲.                                                          关27兴 S. Fortunato, Phys. Rep. 486, 75 共2010兲.
关10兴 S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks     关28兴 L. Ermann and D. L. Shepelyansky, Phys. Rev. E 81, 036221
     共Oxford University Press, Oxford, 2003兲.                              共2010兲.

                                                                056109-8
You can also read