Node metadata can produce predictability transitions in network inference problems


Oscar Fajardo-Fontiveros,1,∗ Marta Sales-Pardo,1,† and Roger Guimerà2,1,‡
1 Department of Chemical Engineering, Universitat Rovira i Virgili, 43007 Tarragona, Catalonia
2 ICREA, 08010 Barcelona, Catalonia
(Dated: March 29, 2021)
Network inference is the process of learning the properties of complex networks from data. Besides using information about known links in the network, node attributes and other forms of network metadata can help to solve network inference problems. Indeed, several approaches have been proposed to introduce metadata into probabilistic network models and to use them to make better inferences. However, we know little about the effect of such metadata in the inference process. Here, we investigate this issue. We find that, rather than affecting inference gradually, adding metadata causes abrupt transitions in the inference process and in our ability to make accurate predictions, from a situation in which metadata does not play any role to a situation in which metadata completely dominates the inference process. When network data and metadata are partly correlated, metadata optimally contributes to the inference process at the transition between data-dominated and metadata-dominated regimes.

arXiv:2103.14424v1 [physics.data-an] 26 Mar 2021

∗ oscar.fajardo@urv.cat
† marta.sales@urv.cat
‡ roger.guimera@urv.cat; Corresponding author

Many systems can be represented as networks, with nodes representing units (for example, people in a social network, or proteins in a protein-protein interaction network), and links representing interactions between the units (for example, friendship relationships or physical binding interactions between proteins). Network inference is the process of inferring the properties of those networks from data; typical network inference problems include the identification of groups of nodes with similar connection patterns, or the identification of unobserved interactions, that is, link prediction [1–6]. Network inference and, in particular, link prediction are increasingly important in problems with applications ranging from the prediction of interactions between drugs [7–9] to the prediction of human preferences and decisions [10–13].

Typically, network inference starts from observations of some of the links in the network, which are used to predict unobserved links or to infer other network properties. However, other sources of information such as system dynamics [14, 15] or node attributes [13, 16–23] can also be used to aid in the inference process. Here we study how node attributes are introduced in the inference process, and what the effect of using such metadata is.

We present our work in terms of the problem of link prediction in recommender systems [11, 12, 24], in which the goal is to predict the association between users and items (for example, books or movies). However, our conclusions apply to network inference problems in general. We introduce a multipartite network model that encompasses and generalizes previous attempts to use node metadata in network inference problems (Fig. 1). Within this framework, the problem of link prediction in general unipartite or bipartite networks is just a particular case. Unlike most previous approaches, our multipartite network model allows us to control the importance of the node metadata and thus to investigate when and how metadata helps in the inference.

We find that, contrary to what one may expect, node metadata do not affect the inference problem gradually. Rather, even when the weight of metadata increases smoothly, the inference process undergoes a transition from a situation in which metadata does not play any role, to a situation in which metadata completely dominates the inference process. When network data and metadata are partly correlated, metadata optimally contributes to the inference process at the transition between data-dominated and metadata-dominated regimes.


I. MULTIPARTITE MIXED-MEMBERSHIP STOCHASTIC BLOCK MODELS WITH LABELED LINKS

We introduce a very general network model based on stochastic block models [3, 25, 26] that allows us to deal with (directed or undirected) unipartite and bipartite networks, whose links are binary or labeled, and with node attributes of different types that can be combined as needed (Fig. 1). As we discuss below, this model extends and generalizes previous models.

In what follows we use the terminology of recommender systems [11, 12, 24] although, as previously mentioned, the model is completely general and applicable to any type of relational data with node attributes. Our objective is to model a bipartite network with labeled links connecting N users to M items (for example, movies or books). Links $r_{ij}$ represent ratings of users i to items j and are labeled, that is, $r_{ij}$ can take values in a finite discrete set such as {like, dislike}, {green, yellow, red}, or {0, 1, . . . , R}. To model these ratings, we assume that: (i) there are user and item groups, and users and items belong to mixtures of such groups; (ii) the probability that a user i rates item j with $r_{ij}$ depends only on the groups to which they belong.

These assumptions lead to a bipartite [10, 11, 27] mixed-membership [28] stochastic block model [12] in which the probability that user i gives item j a rating r is

$$\Pr[r_{ij} = r] = \sum_{\alpha\beta} \theta_{i\alpha}\, \eta_{j\beta}\, p_{\alpha\beta}(r) . \tag{1}$$
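As a minimal illustration of Eq. (1), and not the authors' implementation, the sum over group pairs can be written as a single tensor contraction. The sketch below assumes a hypothetical array layout that is not specified in the text: `theta` has shape (N, K) for N users and K user groups, `eta` has shape (M, L) for M items and L item groups, and `p` has shape (K, L, R) with ratings coded as integers 0, ..., R−1. The function name and the toy sizes are likewise illustrative.

```python
import numpy as np

def rating_probabilities(theta, eta, p):
    """Evaluate Eq. (1) for every user i, item j and rating r.

    theta: (N, K) user membership vectors, each row sums to 1
    eta:   (M, L) item membership vectors, each row sums to 1
    p:     (K, L, R) rating probabilities per group pair, sums to 1 over r
    Returns an (N, M, R) array with entries Pr[r_ij = r].
    """
    # Sum over user groups (a) and item groups (b), as in Eq. (1).
    return np.einsum("ia,jb,abr->ijr", theta, eta, p)

# Hypothetical toy sizes: 3 users, 2 items, 2 groups on each side, 3 ratings.
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(2), size=3)    # (3, 2), rows sum to 1
eta = rng.dirichlet(np.ones(2), size=2)      # (2, 2), rows sum to 1
p = rng.dirichlet(np.ones(3), size=(2, 2))   # (2, 2, 3), sums to 1 over r

probs = rating_probabilities(theta, eta, p)
```

Because the membership vectors and the rating probabilities are each normalized, the resulting probabilities sum to one over the possible ratings for every user-item pair.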
Here, $\theta_i$ is the normalized membership vector of user i, and each element $\theta_{i\alpha}$ represents the probability that user i belongs to group α (with $\sum_\alpha \theta_{i\alpha} = 1$). Similarly, $\eta_j$ is the normalized membership vector of item j; $\eta_{j\beta}$ represents the probability that item j belongs to group β. Finally, $p_{\alpha\beta}(r)$ is the probability that a user in group α and an item in group β are connected with a rating r. The normalization condition here is $\sum_r p_{\alpha\beta}(r) = 1$.

[Figure 1 image not reproduced; recoverable node labels in panel (a): Gender, Age, Genre 1, Genre 2.]

FIG. 1. Multipartite mixed-membership stochastic block model with labeled links. (a) We cast the recommendation problem (in which one aims to predict how users will rate certain items) into a network inference problem. Here, users rate movies with three possible ratings (green, orange or red). Additionally, we have excluding attributes for users (two excluding genders and three excluding age groups, represented by different shades of the same color) and non-excluding attributes for movies (two movie genres; the connection to these attributes is binary, yes/no, but in general it does not need to be). Similar to ratings, we represent these attributes as bipartite networks. Although we frame our description of the model in terms of recommendations or link prediction in a bipartite network, the problem of link prediction in regular unipartite networks is just a particular case in which user nodes and item nodes are the same. (b) Each bipartite network in the multipartite network is modeled using a mixed-membership stochastic block model (see text). The individual block models are coupled by the user and item membership vectors (θ and η, respectively), shown in (c) along with all other model parameters and their dimensions (see text).

We note that the association between nodes (users and items) and attributes can also be represented as a bipartite network. Therefore we can model node-attribute associations in a similar manner to ratings. Because we are interested in how node attributes can help in the inference of the model for ratings (θ, η, p), we consider that membership vectors for users (θ) and items (η) in their respective attribute networks are the same as in the model for the ratings.

We consider both excluding and non-excluding attributes. For excluding attributes, having one attribute excludes having another; for example, a user's age group cannot be 30-39 years old and 40-49 years old simultaneously. We model each set of excluding attributes as a single attribute node (for example, an age node) that is connected to users or items through labeled links (each label representing a mutually excluding age group in the example). The probability that user i has an excluding attribute e (that is, the probability that the link $e_{i\ell}$ between user i and attribute node ℓ is of type e) is

$$\Pr[e_{i\ell} = e] = \sum_{\alpha} \theta_{i\alpha}\, q_{\alpha}(e) , \tag{2}$$

where $q_\alpha(e)$ is the probability that a user of group α has an attribute of type e, and $\sum_e q_\alpha(e) = 1$. For items, the expression is identical except that we use item membership vectors η instead of user membership vectors θ.

We also consider non-excluding attributes, such as item genre (for example, a movie could be both "action" and "western"). We model each of these non-excluding attribute types as individual attribute nodes connected to user or item nodes by links that are typically binary (either do or do not have the attribute) but that could in general also be labeled. Then, the probability that user i has attribute g of type a is also modeled using a mixed-membership, bipartite stochastic block model

$$\Pr[a_{ig} = a] = \sum_{\alpha\gamma} \theta_{i\alpha}\, \zeta_{g\gamma}\, \hat{q}_{\alpha\gamma}(a) , \tag{3}$$

where $\zeta_g$ is the membership vector of attribute g (with elements $\zeta_{g\gamma}$) and $\hat{q}_{\alpha\gamma}(a)$ is the probability that a user in group α has an attribute of type a for an attribute in attribute group γ. As before, the expression for item non-excluding attributes is identical, just replacing user membership vectors θ by item membership vectors η.


II. MODEL POSTERIOR AND INFERENCE

Our objective is to model the observed ratings $R^O$, and to predict the value of some unobserved ratings R. For this, and given Eq. (1), we need to infer the parameters θ, η and p from $R^O$; the posterior distribution over these parameters is given by

$$P(\theta, \eta, p \mid R^O) \propto P(R^O \mid \theta, \eta, p)\, P(\theta, \eta, p) \equiv L^R(\theta, \eta, p)\, P(\theta, \eta, p) , \tag{4}$$

where $L^R(\theta, \eta, p) = P(R^O \mid \theta, \eta, p)$ is the likelihood of the model and $P(\theta, \eta, p)$ is the prior over model parameters. According to Eq. (1), the likelihood is

$$L^R(\theta, \eta, p) = \prod_{(i,j) \in R^O} \left[ \sum_{\alpha\beta} \theta_{i\alpha}\, \eta_{j\beta}\, p_{\alpha\beta}(r^O_{ij}) \right] . \tag{5}$$

Similarly, if we decide to jointly model the ratings and the metadata encoded in the observed user and item attributes $A^O$, we also need to infer the values of the parameters ζ, q and q̂ using the posterior

$$P(\theta, \eta, \zeta, p, q, \hat{q} \mid R^O, A^O) \propto L^R(\theta, \eta, p) \times \prod_k L^{A_k}(\theta, \eta, \zeta, q, \hat{q}) \times P(\theta, \eta, \zeta, p, q, \hat{q}) \tag{6}$$
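For illustration only, the logarithm of the rating likelihood in Eq. (5) could be computed as sketched below. This is a minimal sketch under stated assumptions rather than the procedure used in the paper: the array layout (`theta` of shape (N, K), `eta` of shape (M, L), `p` of shape (K, L, R)), the representation of observed ratings as `(i, j, r)` triples with integer-coded ratings, and the function name are all hypothetical.

```python
import numpy as np

def rating_log_likelihood(theta, eta, p, observed):
    """Logarithm of Eq. (5) for observed ratings given as (i, j, r) triples.

    Assumed layout (not from the paper): theta is (N, K), eta is (M, L),
    p is (K, L, R), and ratings r are integers in 0, ..., R-1.
    """
    users, items, ratings = (np.asarray(col) for col in zip(*observed))
    # Select p_ab(r) for each observed rating: shape (n_obs, K, L).
    p_obs = np.moveaxis(p[:, :, ratings], 2, 0)
    # Inner sum of Eq. (5) over group pairs, one value per observation.
    probs = np.einsum("na,nab,nb->n", theta[users], p_obs, eta[items])
    # The product over observations becomes a sum of logarithms.
    return np.log(probs).sum()

# Toy check with uniform parameters: every observed rating has
# probability 1/3, so the log-likelihood is 3 * log(1/3).
theta = np.full((3, 2), 0.5)
eta = np.full((2, 2), 0.5)
p = np.full((2, 2, 3), 1.0 / 3.0)
observed = [(0, 0, 1), (1, 0, 0), (2, 1, 2)]
log_l = rating_log_likelihood(theta, eta, p, observed)
```

Working with the logarithm avoids numerical underflow when the product in Eq. (5) runs over many observed ratings.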
Here, $L^{A_k}(\theta, \eta, \zeta, q, \hat{q}) = P(A^O_k \mid \theta, \eta, \zeta, q, \hat{q})$ is the likelihood of the k-th attribute network (for example, the age attribute network for users, or the genre attribute network for items). For the k-th excluding attribute, this likelihood reads

$$L^{A_k}(\theta, \eta, q) = \prod_{(i,\ell_k) \in A^O_k} \left[ \sum_{\alpha} \theta_{i\alpha}\, q^k_{\alpha}\big((e_k)^O_{i\ell_k}\big) \right] , \tag{7}$$

where $\ell_k$ is the k-th excluding attribute and the product is over all nodes i for which we observe attribute $\ell_k$.

For the k-th non-excluding attribute we have

$$L^{A_k}(\theta, \eta, \zeta, \hat{q}) = \prod_{(i,g) \in A^O_k} \left[ \sum_{\alpha\gamma} \theta_{i\alpha}\, \zeta^k_{g\gamma}\, \hat{q}^k_{\alpha\gamma}\big((a_k)^O_{ig}\big) \right] , \tag{8}$$

where the product is over all observed associations between nodes i and attributes g within the k-th class of non-excluding attributes.

Ignoring normalizing constants, and in a spirit similar to Refs. [17, 23], we define a parametric log-posterior as

$$\pi(\theta, \eta, \zeta, p, q, \hat{q} \mid R^O, A^O) = \mathcal{L}^R(\theta, \eta, p) + \sum_k \lambda_k\, \mathcal{L}^{A_k}(\theta, \eta, \zeta, q, \hat{q}) , \tag{9}$$

where $\mathcal{L}^R(\theta, \eta, p)$ and $\mathcal{L}^{A_k}(\theta, \eta, \zeta, q, \hat{q})$ are the log-likelihoods of ratings and attributes, respectively, that is, the logarithms of the likelihoods in Eqs. (5), (7) and (8). For $\lambda_k = 0$, we recover Eq. (4) with uniform priors on the parameters, thus completely ignoring all metadata. Conversely, for $\lambda_k = 1$, we are jointly modeling the network of ratings and the network of attributes as in Eq. (6), with uniform priors on the parameters. By tuning the values of $\lambda_k$ we can interpolate between these situations, and extrapolate to situations with $\lambda_k > 1$ in which we would eventually only model the attribute network ($\lambda_k \gg 1$). The terms corresponding to the attribute models can indistinctly be interpreted as part of the likelihood of a joint model of ratings and attributes, similar to Refs. [17, 18, 22, 23], or as a non-uniform prior over membership vectors as in Refs. [16, 19, 20]. If interpreted as part of a joint model, then the $\lambda_k$ can be seen as factors that are needed because attribute data are somehow less (or more) reliable than rating data, perhaps because we have reason to believe that attributes are more (or less) subject to noise, or because each rating corresponds, in fact, to a mean over several observations. Conversely, if interpreted as priors over the partitions, the $\lambda_k$ should be interpreted as hyperparameters defining how certain we are a priori about the importance of node attributes.

Either way, this parametrized posterior allows us to investigate how the metadata encoded in the attribute networks enters the inference process for the ratings, and under which conditions it results in better and more predictive models for those ratings. To do this, we maximize the posterior for fixed values of $\lambda_k$ using an expectation-maximization algorithm [12, 19, 22, 23] (see Appendix A), which gives the most plausible parameter values. Because the posterior landscape is in general rugged, we perform several runs of the EM algorithm and compute the average probability for each unobserved rating to make predictions (see [12] and Appendix A).


III. RELATIONSHIP TO PREVIOUS WORK

The literature on using metadata for link prediction and recommender systems is vast, and includes all sorts of approaches ranging from simple heuristics to sophisticated machine learning methods. However, our interest here is more closely related to probabilistic approaches to network inference, even when those approaches are not applied directly to link prediction [13, 16–21]. As shown in Refs. [22, 23], once model parameters are inferred (for example, for community detection), they can easily be used to predict links as well. Our focus on approaches based on probabilistic generative models is motivated by three characteristics of such approaches: (i) all assumptions in them are explicit; (ii) principled (as opposed to heuristic) and sometimes even exact inference approaches are possible; and (iii) their results are more readily interpretable. These three characteristics make probabilistic approaches especially appropriate for our ultimate goal of understanding how node attributes enter and help in the inference process.

From this perspective, the multipartite mixed-membership stochastic block model is useful because it extends and generalizes previous models. By introducing excluding and non-excluding attributes, the model can simultaneously accommodate attributes like those considered in Refs. [19, 23] (excluding) and in Refs. [17, 18] (non-excluding). It can also combine an arbitrary number of attributes of different types, unlike approaches that can only deal with single attributes [19, 23] or, more often, with a single type of attribute; and it deals naturally with missing attribute data, unlike approaches that require all node attributes to be known [16, 20]. Since attributes are modeled with a stochastic block model, our approach also automatically clusters attributes that have similar effects on the data (for example, age groups that show similar behavior) as in Ref. [18]. Unlike most previous approaches for attributed networks, nodes and attributes in our model belong to mixtures of groups, which makes the model more expressive [12]; links between nodes and to attributes can be labeled; and the influence of the attributes can be tuned on and off (as in Ref. [23]). As stated above, this last feature is precisely the main focus of our work.


IV. SYNTHETIC DATA

We first use synthetic data to validate the expectation-maximization inference approach and to investigate the role of introducing node attributes. We generate synthetic data with a model similar to the model in Fig. 1. Our synthetic rating networks consist of 200 users and 200 items, partitioned into K = 2 groups of users and L = 4 groups of items. Users have an excluding attribute labeled "male" or "female", and items have an excluding attribute labeled from 0 to 3, which may represent four different genres.

In the simplest case, in which ratings and attributes are completely correlated, all female users have membership vectors $\theta_f = (0.8, 0.2)$; conversely, all male users have $\theta_m = (0.2, 0.8)$. Similarly, an item with attribute a has a membership of 0.8 to group a and 0.067 to all other groups. To
simulate partial correlation c or even no correlation (c = 0) between membership vectors and attributes, with probability 1 − c we reassign each node attribute to a value selected uniformly at random among all possibilities (2 for users and 4 for items).

[FIG. 2 plot not reproduced; recoverable labels: panels (a) to (d), row label "Totally correlated", axis label "Relative accuracy".]
   For the experiments reported in Fig. 2, we consider all attribute links, but only a number |R^O| = 400 of observed ratings (that is, 1% of all generated ratings). Although the synthetic data are created with item genre as an excluding attribute, we carry out the inference process assuming that genre is a non-excluding attribute, which is what one would likely assume in real settings where the generating model is unknown.
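As a rough illustration of the sparse-observation setup described above, the sketch below draws 400 observed (user, item) pairs out of the 200 × 200 = 40,000 possible ratings, that is, 1% of them. The function name `sample_observed_pairs`, the seed, and the choice of uniform sampling without replacement are assumptions made for illustration; the text does not specify how the observed ratings are selected.

```python
import random


def sample_observed_pairs(n_users=200, n_items=200, n_observed=400, seed=0):
    """Pick n_observed distinct (user, item) pairs.

    Assumption: pairs are drawn uniformly at random without replacement.
    """
    rng = random.Random(seed)
    # Sample flat indices over the full user-by-item grid, without repeats.
    flat_indices = rng.sample(range(n_users * n_items), n_observed)
    # Convert each flat index back into a (user, item) pair.
    return [divmod(index, n_items) for index in flat_indices]


pairs = sample_observed_pairs()
# Fraction of all possible ratings that end up observed.
observed_fraction = len(pairs) / (200 * 200)
```

With these defaults the observed fraction is 400/40,000, matching the 1% quoted in the text.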
   We infer the values of the model parameters using the expectation-maximization equations, and use the inferred parameters to predict unobserved ratings in the bipartite ratings network. We do this for different levels of correlation c between the ratings and the attribute networks (Fig. 2), from a situation c = 1 in which the attributes are perfectly correlated with user and item membership vectors (all male users belong to one group and have identical parameters, and all females belong to another group with different parameters; items with each genre belong to the exact same mixture of groups) to a situation c = 0 in which user and item memberships and attributes are completely uncorrelated (Fig. 2).
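Fig. 2 reports predictive performance as a relative accuracy: the log-ratio α = log[a/a_0] between the accuracy a obtained for a given pair (λ_user, λ_item) and the reference accuracy a_0 obtained when all attributes are ignored. A minimal sketch of that quantity follows. The function name `relative_accuracy` is hypothetical, and the use of the natural logarithm is an assumption, since the text writes only "log".

```python
import math


def relative_accuracy(accuracy, reference_accuracy):
    """Return alpha = log(a / a_0).

    Assumption: natural logarithm; the source does not state the base.
    """
    if accuracy <= 0 or reference_accuracy <= 0:
        raise ValueError("accuracies must be positive")
    return math.log(accuracy / reference_accuracy)


# alpha = 0 when the attributes change nothing with respect to the reference.
alpha_unchanged = relative_accuracy(0.5, 0.5)
# alpha > 0 when predictions improve, alpha < 0 when they worsen.
alpha_better = relative_accuracy(0.6, 0.5)
alpha_worse = relative_accuracy(0.4, 0.5)
```

Because α is a log-ratio, equal multiplicative gains and losses in accuracy appear as values of equal magnitude and opposite sign.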
FIG. 2. Predictive performance and effect of metadata on synthetic ratings. We create synthetic ratings from 200 users on 200 items, with different levels of correlation c between ratings and node attributes (see text). We then use 5-fold cross-validation to calculate the performance of the expectation-maximization equations at predicting unobserved ratings. In particular, we take as a reference the predictive accuracy a_0 of the algorithm when all attributes are ignored (λ_user = λ_item = 0), and measure relative accuracy α for a given pair (λ_user, λ_item) as the log-ratio α(λ_user, λ_item) = log[a(λ_user, λ_item)/a_0]. The value α(λ_user, λ_item) = 0 (dashed line) thus indicates no change with respect to the reference a_0, and α(λ_user, λ_item) > 0 (respectively, α(λ_user, λ_item) < 0) indicates predictions that are more (less) accurate than those obtained by ignoring node attributes. The maximum possible relative performance (dotted line) is obtained when each rating is assigned the exact probability that was used to generate it. For each value of the correlation ((a)-(b), full correlation, c = 1; (c)-(d), c = 0.75; (e)-(f), c = 0.50; (g)-(h), no correlation, c = 0) we show the variation of α(λ_user, λ_item) with λ_item for different values of λ_user (left), and the whole dependence of α(λ_user, λ_item) on both λ_user and λ_item (right).

   Since we focus on sparse observations in which the number of observed ratings is low (only 1% of all ratings), model parameters cannot be inferred accurately from the ratings alone. Therefore, when we only consider the observed ratings R^O and ignore all attributes A^O by setting λ_user = λ_item = 0 in Eq. (9) (λ_user and λ_item correspond to the user and item attribute networks, respectively), the prediction of unobserved links is suboptimal, that is, the inferred probabilities of unobserved links differ significantly from the actual probabilities used to build the network.
   When there is perfect correlation between node attributes and group memberships, considering the attributes A^O by setting λ_user > 0 and λ_item > 0 should in principle help in the inference process. In fact, since attributes are perfectly correlated with group memberships, in the limit λ_user → ∞ and λ_item → ∞ nodes will be forced into the correct groups and predictions should be near optimal. This is what we observe in our numerical experiments (Fig. 2a). Interestingly, as we increase the weight of the attributes in the log-posterior from λ_user = λ_item = 0, the effect on prediction accuracy is not smooth. Rather, below certain threshold values of λ_user and λ_item, using the attributes does not have any significant effect on prediction accuracy. Then, at those threshold values, a transition occurs and prediction accuracy increases abruptly until it reaches its theoretical maximum, as expected.
   When attributes and ratings are completely uncorrelated (Fig. 2g), the role of attributes is reversed. Predictions are equally suboptimal at λ_user = λ_item = 0, but then, as λ_user and λ_item cross certain threshold values, predictions suddenly worsen as user and item nodes are forced into groups that are uncorrelated with their real membership vectors and, thus, with the observed ratings.
   Unlike the extreme cases of total correlation or zero correlation, when attributes are partly correlated with the true group memberships of the nodes, the change in performance is not monotonic as we increase the importance of the attributes. As before, when λ_user and λ_item are small enough, we observe no difference with the situation in which the attributes are ignored entirely. In the other extreme, when λ_user → ∞ and λ_item → ∞, user and item nodes are forced into groups that match partly, but not perfectly, the true group memberships of the nodes, so the performance may increase or decrease with respect to the situation with no attributes, depending on

[Figure panels (plot residue). Panel title: "Totally correlated". Axis labels: log-posterior, relative accuracy, λ_user, "Age". Legend: optimal model for data; optimal model for attributes; optimal model at transition.]
                            10              10            10        10       10             10                                                                                                                                                            0.005
                                                                                                                                                                                                                                                                                                              10
                                                                                                                                                                                                                                                                                                                 1
                                                                                                                                                                                                                                                                                                              10

                                                                                                                             Relative accuracy,

                                                                                                                                                                                                                                     Relative accuracy,
                                                                                                                                                   0.02
                                                                                                                                                                                                                                                                                                                 0
                                                                                      (b)                                                                                                                                                                                                                     10

                                                                                                     Gender

                                                                                                                                                                                                                                                                                                                       user
                                                                                                                                                   0.04                                                                                                                                                            1
                                                                                                                                                                                                                                                          0.000                                               10
                                                                                                                                                                                                                                                                                                                   2
75% correlated

                                                                                                                                                                                                                                                                                                              10
                     3500                                                                                                                          0.06
Log-posterior,

                                                                                                                                                                                                                                                                                                                   3
                                                                                                                                                                                                                                                          0.005                                               10
                                                                                                                                                                                                                                                                                                                   4
                                                                                                                                                   0.08                                                                                                                                                       10
                                                                                                                                                          (c)                                                                                                     (d)                                         0
                                                                                                                                                                                                                                                          0.010
                     4000            Optimal model for data                                                                                                                                                                                               0.010                                                  3
                                                                                                                                                                                                                                                                                                              10
                                     Optimal model for attributes                                                                                  0.00
                                                                                                                                                                                                                                                                                                              10
                                                                                                                                                                                                                                                                                                                 2
                                     Optimal model at transition

                                                                                                     Age and gender
                     4500                                                                                                                                                                                                                                 0.005                                               10
                                                                                                                                                                                                                                                                                                                 1

                                                                                                                             Relative accuracy,

                                                                                                                                                                                                                                     Relative accuracy,
                                                                                                                                                   0.05                                                                                                                                                          0
                                                                                                                                                                                                                                                                                                              10

                                                                                                                                                                                                                                                                                                                       user
                                 4               3             2         1        0              1                                                                                                                                                        0.000                                               10
                                                                                                                                                                                                                                                                                                                   1
                            10              10            10        10       10             10                                                     0.10                                                                                                                                                            2
                                                                                                                                                                                                                                                                                                              10
                                                                                                                                                                                                                                                                                                                   3
                                                                                                                                                                                                                                                          0.005                                               10
                                                                                      (c)                                                          0.15
                                                                                                                                                                                                                                                                                                              10
                                                                                                                                                                                                                                                                                                                   4

                     3500                                                                                                                                 (e)                                                                                                     (f)                                         0
50% correlated

                                                                                                                                                   0.20                                                                                                   0.010
Log-posterior,

                                                                                                                                                          0 10   4        3        2        1        0         1        2        3                                      2        1               0        1
                                                                                                                                                                     10       10       10       10        10       10       10                                    10        10              10       10
                                                                                                                                                                                       item                                                                                          item
                     4000
                                     Optimal model for data
                     4500            Optimal model for attributes                                    FIG. 4. Predictive performance and effect of metadata on the
                                     Optimal model at transition                                     MovieLens data set. As in Fig. 2, we take as a reference the
                                 4               3             2         1        0              1   predictive accuracy a0 of the algorithm when all attributes are ig-
                            10              10            10        10       10             10
                                                                                                     nored (λuser = λitem = 0), and measure relative accuracy α
                                                                                      (d)            for a given pair (λuser , λitem ) as the log-ratio α(λuser , λitem ) =
                     3500                                                                            log [a(λuser , λitem )/a0 ]. We consider three different attributes for
 0% correlated
Log-posterior,

                                                                                                     user nodes: (a)-(b), age; (c)-(d), gender; (e)-(f), age and gender com-
                     4000
                                                                                                     bined as a single attribute. We plot the whole range of λuser (left),
                                     Optimal model for data                                          and zoom into the intermediate (shaded) region of λuser in which
                     4500
                                     Optimal model for attributes                                    predictions are significantly more accurate than the reference (right).
                                     Optimal model at transition
                     5000
                                 4               3             2         1        0              1
                            10              10            10        10       10             10
                                                                                                     respectively. Regardless of the correlation between ratings
                                                                                                     and attributes, we find that the transition in predictability in
FIG. 3. Transition between data-dominated and metadata-                                              Fig. 2 coincides with the region where the data-dominated and
dominated inference regimes. For the synthetic data in Fig. 2, we                                    metadata-dominated posteriors cross. By considering Eq. (9)
plot the log-posterior π(θ, η, ζ, p, q, q̂|RO , AO ) as a function of the                            we see that this must be the case. Indeed, for each attribute
hyperparameter λ = λitem = λuser for three models: the model                                         network we find three regimes—one dominated by the LR
that maximizes the data likelihood LR , the model that maximizes
                                                                                                     term, one dominated by the LA term, and one in which both
the metadata likelihood LA , and the model that maximizes the pos-
terior when two previous cases cross (that is, have equal posteriors).                               terms are comparable. Unless there is perfect or almost per-
The position of the crossing coincides with the transitions and the                                  fect correlation between attributes and node memberships,
maxima observed in Fig. 2.                                                                           any improvement in predictive power must come from con-
                                                                                                     sidering both the observed ratings and the observed attributes,
                                                                                                     and therefore in the transition region.
whether the correlation is high (Fig. 2b) or low (Fig. 2c).
However, we find that the most predictive models in this case
are those at intermediate values of λuser and λitem , precisely                                                                                                                                          V.         REAL DATA
at the transition region where both the observed ratings and
the observed attributes play a role in determining the most                                             Finally, we analyze two empirical data sets and study
plausible group memberships. In this case, the inferred node                                         whether we observe the same behaviors as in the synthetic
memberships do not coincide with either those that maximize                                          data. First, we consider the 100K MovieLens data set [29],
LR of those that maximize LAk .                                                                      which contains 100,000 ratings of movies by users. Age and
   To understand the transition from the rating-dominated to                                         gender attributes are available for users, which we model as
the attribute-dominated regime, we study the posterior of                                            excluding attributes (Fig. 4). Movies have genre attributes,
the two extreme models corresponding to the maximum a                                                which we model as non-excluding attributes. The relative
posterior estimates obtained by expectation-maximization for                                         weights of user and movie attributes are given by the parame-
λuser = λitem = 0 and for λuser = λitem → ∞ (Fig. 3).                                                ters λusers and λitems .
These are the most plausible models when only data (ratings)                                            Just as in the synthetic networks with small but finite cor-
and only metadata (attributes) are taken into consideration,                                         relation, we observe an intermediate value of λuser and λitem

that provides more accurate rating predictions than either considering the observed ratings alone or considering the node attributes alone. This behavior is similar when we consider age only, gender only, or age and gender simultaneously. As in synthetic networks, the optimal combination of rating data and node metadata occurs for values of λ such that the ratings network and the attributes networks have comparable contributions to the log-posterior.

[Figure 5: relative accuracy α versus λuser (logarithmic scale). Legend: Party; Party and State; State.]

FIG. 5. Predictive performance and effect of metadata on the U.S. Congress data set. As in Fig. 2, we take as a reference the predictive accuracy a0 of the algorithm when all attributes are ignored (λuser = 0), and measure relative accuracy α for a given λuser as the log-ratio α(λuser) = log [a(λuser)/a0]. We consider three different attributes for user nodes: Party, State, and Party and State simultaneously.

Second, we consider a data set on the votes of 441 members of the U.S. House of Representatives in the 108th U.S. Congress [30] (Fig. 5). Between January 2003 and January 2005, these representatives voted on 1,217 bills, casting one of 9 different types of vote, which, following previous analyses, we simplify to Yes, No, and Other [30]. In this data set, “users” are the representatives and “items” are the bills. The ratings represent the votes of the representatives on the bills. For representatives, we have attribute data indicating their party and state, which we model as excluding attributes. Although all votes of all members are recorded in the data set (in total, 536,698 votes), for the purpose of our analysis we infer the parameters of the multipartite mixed-membership stochastic block model using 1% of the data, and predict the remaining 99% (and repeat this using each 1% of the data as the training set).

Again, the effects of introducing the attributes in the inference process are very similar to those we encounter in synthetic data (Fig. 5). When using only the state of the representatives, we observe a behavior that is compatible with small but finite correlation between attribute and voting patterns, since the optimal predictive performance is observed at intermediate values of λuser. In contrast, when we consider party affiliation we observe a behavior that is compatible with almost perfect correlation between attribute and voting behavior. Indeed, in this case the predictive performance of the model increases monotonically with λuser, with an abrupt transition at λuser ≈ 1, just as for perfectly correlated attributes in synthetic data. When state and party are combined into a single excluding attribute (for example, “Democrat from Texas” is a group), we observe a behavior compatible with strong (but imperfect) correlation between attributes and voting behavior. In this case, predictive accuracy does not improve monotonically with λuser because, for very large values, representatives are forced into small groups that are more prone to fluctuations; that is, the model overfits the data, thus worsening the predictive power with respect to considering the large groups associated with party affiliation alone.

VI. CONCLUSION

There is ample evidence that using node metadata can help to solve network inference problems. As we have discussed, several approaches have been proposed in recent years to introduce node attributes into probabilistic network models, and to use them to make better inferences about, for example, the group structure of networks or the existence of unobserved interactions. In these approaches, node attributes are introduced either as part of a whole-system model (including both the links between nodes and node attributes), or as priors over the parameters of the model for the links (for example, as priors for the node group memberships that, in turn, determine the probability of existence of links). However, beyond the improvement in performance that they may entail in a given task such as group detection or link prediction, we know little about the effect that node attributes have in the inference process. Here, our goal has been to clarify this issue.

Regardless of whether attributes are introduced as part of a whole model or as a prior for model parameters, they appear in probabilistic models as additional terms in the likelihood or the posterior. As we have shown, our results depend on this simple observation alone: only when all terms in these likelihoods or posteriors are comparable in magnitude, or when attributes are perfectly correlated with ratings, can we expect attributes to improve the inference process. In this sense, our findings here may be expected to be universal.

From a practical point of view, our work helps to understand when certain approaches will not work. For example, our results suggest that modeling data and metadata jointly will only improve link predictions (or other network inference problems) if two conditions are fulfilled simultaneously: (i) the metadata are correlated with the data; (ii) as we have mentioned, the balance between the amount of data and metadata is such that their likelihoods (LR and LA above) are of the same order. If the first condition is not fulfilled, using metadata will in general worsen predictions, rather than improving them; if the second condition is not fulfilled, one may, in practice, inadvertently ignore either the data or the metadata and thus make, again, suboptimal predictions.

Some works have intuitively addressed this problem by introducing tuning parameters akin to our λk [17, 23]. However, the impact of those parameters has not been studied in detail and, instead, their values are typically chosen among a very limited set by means of cross-validation. Our work clarifies how the value of those parameters should be chosen, and why.

From a broader perspective, our work opens the door to understanding the role of different terms in probabilistic network models, as well as the transitions that occur between the regimes in which one term or another dominates. This sets

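the stage for more systematic approaches to building better probabilistic models of network systems.

have

\begin{align}
L_{A_k} &= \sum_{(i,\ell_k)\in A^O_k} \log \sum_{\alpha} \theta_{i\alpha}\, q^{k}_{\alpha}(i\ell_k) \nonumber\\
&= \sum_{(i,\ell_k)\in A^O_k} \log \sum_{\alpha} \sigma^{k}_{i\ell_k}(\alpha)\, \frac{\theta_{i\alpha}\, q^{k}_{\alpha}(i\ell_k)}{\sigma^{k}_{i\ell_k}(\alpha)} \nonumber\\
&\geq \sum_{(i,\ell_k)\in A^O_k} \sum_{\alpha} \sigma^{k}_{i\ell_k}(\alpha) \log \frac{\theta_{i\alpha}\, q^{k}_{\alpha}(i\ell_k)}{\sigma^{k}_{i\ell_k}(\alpha)} \tag{A2}
\end{align}

ACKNOWLEDGMENTS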
                                                                                                                                    σi`
                                                                                               (i,`k )∈AO
                                                                                                        k
                                                                                                                                        k

   The authors acknowledge support by the Spanish Ministe-                 where σi`k
                                                                                        (α) is the auxiliary distribution, and to simplify the
                                                                                      k
rio de Economı́a y Competitividad (Grants FIS2016-78904-                   notation we have defined qαk (i`k ) ≡ qαk ( eO
                                                                                                                             
                                                                                                                            k i`k ).
C3-P-1 and PID2019-106811GB-C31) and by the Govern-
                                                                              Finally, for the term corresponding to non-excluding node
ment of Catalonia (Grant 2017SGR-896).
                                                                           attributes we have
                                                                                              X          X
                                                                                  LAk =              log          k
                                                                                                             θiα ζgγ q̂αγ (ig)
                                                                                           (i,g)∈AO          αγ
                                                                                                  k
                                                                                                                                      k
                                                                                               X            X
                                                                                                                    k
                                                                                                                                θiα ζgγ  q̂αγ (ig)
                                                                                       =              log         σ̂ig (α, γ)        k
                                                                                                             αγ
                                                                                                                                   σ̂ig (α, γ)
       Appendix A: Expectation-maximization equations                                      (i,g)∈AO
                                                                                                  k
                                                                                                                                      k
                                                                                               X      X
                                                                                                              k
                                                                                                                                θiα ζgγ q̂αγ (ig)
                                                                                       ≥                    σ̂ig (α, γ) log          k (α, γ)
                                                                                                                                                  (A3)
                                                                                                                                   σ̂ig
   We aim to maximize the parametric log-posterior in Eq. (9)                              (i,g)∈AO
                                                                                                  k
                                                                                                    αγ
as a function of the model parameters θ, η, p, ζ, q and q̂. Be-
                                                                                    k
cause logarithms of sums are hard to deal with, we use a                   where σ̂ig (α, γ) is the auxiliary distribution, and to simplify
                                                                                                                              
variational P
            trick that first introduces an auxiliary distribution
                                                        P                  the notation we have defined q̂αk (ig) ≡ q̂αγ
                                                                                                                      k
                                                                                                                         ( aO
                                                                                                                            k ig ).
p(x)
P     with     x p(x) = 1 into a sum P     of terms as x x =                 Note that, in Eqs. (A1)-(A3) above, the equality is satisfied
   x p(x) (x/p(x)).      Then because        x p(x) (x/p(x)) =             when maximizing with respect to the auxiliary distributions.
hx/p(x)i weP   can use  Jensens’  inequality
                                    P        loghyi ≥ hlog yi to           By solving these optimization problems we obtain
write log [ x p(x) (x/p(x))] ≥ x p(x) log [x/p(x)].
                                                                                                                             O
                                                                                                               θiα ηjβ pαβ (rij )
  Because both rating and attribute terms in Eq. (9) contain                          ωij (α, β) = P                              O
                                                                                                                                       ,             (A4)
                                                                                                            α0 β 0 θiα ηjβ pα β (rij )
                                                                                                                      0   0  0 0
logarithms of sums, we introduce an auxiliary distribution for
each of the terms as follows. For the ratings, we have                                   k                  θiα qαk (i`k )
                                                                                        σi` k
                                                                                              (α) = P                k
                                                                                                                              ,                      (A5)
                                                                                                            α0 θiα qα0 (i`k )
                                                                                                                   0

                                                                                                                    k
                                                                                        k
                                                                                                               θiα ζgγ  q̂αγ (ig)
                X              X                                                      σ̂ig (α, γ) = P                              .                 (A6)
      LR =                                                                                                  α γ iα gγ q̂α γ (ig)
                                                                                                                  θ    ζ
                                                  O
                         log        θiα ηjβ pαβ (rij )r                                                      0  0    0     0   0 0

              (i,j)∈RO         αβ
                                                                           Therefore, the auxiliary distributions have the following inter-
                                                             O
                X              X               θiα ηjβ pαβ (rij )          pretations: ωij (α, β) is the contribution of user group α and
          =              log        ωij (α, β)
                                                  ωij (α, β)               item group β to the probability that user i gives item j a rating
              (i,j)∈RO         αβ                                            O    k
                                                                           rij ; σi` k
                                                                                       (α) is the contribution of user group (or item group) α
                                                              O
                                                θiα ηjβ pαβ (rij )         to the probability that user (item) i has attribute type (eO   k )i`k
                X        X
          ≥                    ωij (α, β) log                      (A1)                                                         k
                                                   ωij (α, β)              in the k-th excluding attribute; and, finally, σ̂ig     (α, γ) is the
              (i,j)∈RO αβ
                                                                           contribution of groups α and γ to the probability that, for the
                                                                           k-th non-excluding attribute, the association between node i
                                                                           and attribute g is of type (aO  k )ig .
where ωij (α, β) is the auxiliary distribution.                               Using Lagrange multipliers for the normalization con-
                                                                           straints, and equating to zero the derivatives of the log-
  For the term corresponding to excluding node attributes we               posterior with respect to the model parameters yields

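$$
\theta_{i\alpha} = \frac{\sum_{j \in \partial_i} \sum_{\beta} \omega_{ij}(\alpha,\beta) + \sum_{k} \lambda_k\, \sigma^k_{i\ell_k}(\alpha) + \sum_{l} \lambda_l \sum_{g \in \partial^l_i} \sum_{\gamma} \hat{\sigma}^l_{ig}(\alpha,\gamma)}{d_i + \sum_{k} \lambda_k\, \delta^k_i + \sum_{l} \lambda_l\, \Delta^l_i}
\tag{A7}
$$

where $\partial^k_i$ is the set of $k$-th attributes associated with user $i$, $d_i$ is the degree of user $i$ in the network of ratings, and $\Delta^l_i = |\partial^l_i|$. Note that the term $\sigma^k_{i\ell_k}(\alpha)$ is equal to zero if user $i$ does not have attribute $\ell_k$, so that $\delta^k_i = 1$ if user $i$ has exclusive attribute $\ell_k$ and zero otherwise.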
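$$
\eta_{j\beta} = \frac{\sum_{i \in \partial_j} \sum_{\alpha} \omega_{ij}(\alpha,\beta) + \sum_{k} \lambda_k\, \sigma^k_{j\ell_k}(\beta) + \sum_{l} \lambda_l \sum_{g \in \partial^l_j} \sum_{\gamma} \hat{\sigma}^l_{jg}(\beta,\gamma)}{d_j + \sum_{k} \lambda_k\, \delta^k_j + \sum_{l} \lambda_l\, \Delta^l_j}
\tag{A8}
$$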

where $\partial^k_j$ is the set of $k$-th attributes associated with item $j$, $d_j$ is the degree of item $j$ in the network of ratings, and $\Delta^l_j = |\partial^l_j|$. As before, the term $\sigma^k_{j\ell_k}(\beta)$ is equal to zero if item $j$ does not have attribute $\ell_k$, so that $\delta^k_j = 1$ if item $j$ has exclusive attribute $\ell_k$ and zero otherwise.

$$
\zeta^k_{g\gamma} = \frac{\sum_{i \in \partial^k_g} \sum_{\alpha} \hat{\sigma}^k_{ig}(\alpha,\gamma)}{\Delta^k_g}
\tag{A9}
$$

where $\partial^k_g$ is the set of nodes associated with attribute $g$, and $\Delta^k_g = |\partial^k_g|$. Additionally, we have

$$
p_{\alpha\beta}(r) = \frac{\sum_{(i,j) \in R^O \,|\, r^O_{ij} = r} \omega_{ij}(\alpha,\beta)}{\sum_{(i,j) \in R^O} \omega_{ij}(\alpha,\beta)}
\tag{A10}
$$

$$
q^k_{\alpha}(e) = \frac{\sum_{(i,\ell_k) \in A^O_k \,|\, (e^O_k)_{i\ell_k} = e} \sigma^k_{i\ell_k}(\alpha)}{\sum_{(i,\ell_k) \in A^O_k} \sigma^k_{i\ell_k}(\alpha)}
\tag{A11}
$$

$$
\hat{q}^k_{\alpha\gamma}(a) = \frac{\sum_{(i,g) \in A^O_k \,|\, (a^O_k)_{ig} = a} \hat{\sigma}^k_{ig}(\alpha,\gamma)}{\sum_{(i,g) \in A^O_k} \hat{\sigma}^k_{ig}(\alpha,\gamma)}
\tag{A12}
$$

       Appendix B: Expectation-maximization algorithm

   To obtain a maximum of the posterior we start by generating random initial conditions for each model parameter $\theta$, $\eta$, $p$, $\zeta$, $q$, $\hat{q}$.
   Then we iterate the following two steps until the model parameters converge:

   1. Expectation step: compute the auxiliary functions $\omega_{ij}(\alpha,\beta)$, $\sigma^k_{i\ell_k}(\alpha)$, and $\hat{\sigma}^k_{ig}(\alpha,\gamma)$ from the current values of $\theta$, $\eta$, $p$, $\zeta$, $q$, $\hat{q}$ using Eqs. (A4), (A5) and (A6).
   2. Maximization step: compute the new values of the model parameters from the values of the auxiliary functions using Eqs. (A7)-(A12).

   Because the posterior landscape is very rugged, to make predictions we perform the EM algorithm 10 times and consider all of the models to estimate the average probability that user $i$ rates item $j$ with rating $r$ (see [31]) as follows:

$$
\langle p(r_{ij} = r \,|\, R^O, A^O_k) \rangle \approx \frac{1}{N} \sum_{n=1}^{N} p_n(r_{ij} = r \,|\, R^O, A^O_k, (\ldots))
\tag{B1}
$$

where $(\ldots) = \{\theta, \eta, p, \zeta, q, \hat{q}\}$, and $p_n(r_{ij} = r \,|\, R^O, A^O_k, (\ldots))$ is the probability that user $i$ rates item $j$ with rating $r$ in run $n$ of the EM algorithm.
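As an illustration of the procedure in Appendix B, the following is a minimal Python sketch rather than the authors' implementation. It makes several simplifying assumptions that are not in the text: there is a single excluding user attribute observed for every user (so that δ_i = 1), items have no attributes, there are no non-excluding attributes, and a fixed number of iterations replaces a convergence test. The function names `em_run` and `average_prediction`, the array layout, and all default values are hypothetical.

```python
import numpy as np

def em_run(ratings, user_attr, n_users, n_items, n_ratings, n_attr_values,
           K=2, L=2, lam=1.0, n_iter=200, seed=0):
    """One EM run of the simplified model (all names/defaults are assumptions).

    ratings:   integer array with rows (user, item, rating).
    user_attr: integer array with one excluding attribute value per user.
    Assumes every user and every item appears in at least one rating.
    """
    rng = np.random.default_rng(seed)
    u, it, r = ratings.T
    # Random, normalized initial conditions for theta, eta, p and q.
    theta = rng.random((n_users, K))
    theta /= theta.sum(1, keepdims=True)
    eta = rng.random((n_items, L))
    eta /= eta.sum(1, keepdims=True)
    p = rng.random((K, L, n_ratings))
    p /= p.sum(2, keepdims=True)
    q = rng.random((K, n_attr_values))
    q /= q.sum(1, keepdims=True)
    d_user = np.bincount(u, minlength=n_users)
    d_item = np.bincount(it, minlength=n_items)

    for _ in range(n_iter):  # fixed iteration count instead of a convergence test
        # Expectation step, Eq. (A4): omega[m, alpha, beta] per observed rating m.
        omega = (theta[u][:, :, None] * eta[it][:, None, :]
                 * p[:, :, r].transpose(2, 0, 1))
        omega /= omega.sum((1, 2), keepdims=True)
        # Expectation step, Eq. (A5): sigma[i, alpha] for each user's attribute.
        sigma = theta * q[:, user_attr].T
        sigma /= sigma.sum(1, keepdims=True)

        # Maximization step, Eq. (A7) with one excluding attribute, delta_i = 1.
        theta = np.zeros((n_users, K))
        np.add.at(theta, u, omega.sum(2))
        theta = (theta + lam * sigma) / (d_user + lam)[:, None]
        # Eq. (A8) without item attributes (guard against unrated items).
        eta = np.zeros((n_items, L))
        np.add.at(eta, it, omega.sum(1))
        eta /= np.maximum(d_item, 1)[:, None]
        # Eq. (A10): the denominator equals the sum of the numerators over r.
        p = np.stack([omega[r == v].sum(0) for v in range(n_ratings)], axis=2)
        p /= p.sum(2, keepdims=True)
        # Eq. (A11), same normalization argument.
        q = np.stack([sigma[user_attr == e].sum(0)
                      for e in range(n_attr_values)], axis=1)
        q /= q.sum(1, keepdims=True)
    return theta, eta, p

def average_prediction(ratings, user_attr, n_users, n_items, n_ratings,
                       n_attr_values, n_runs=10, **kwargs):
    """Average p(r_ij = r) over independent EM runs, as in Eq. (B1)."""
    probs = []
    for n in range(n_runs):
        theta, eta, p = em_run(ratings, user_attr, n_users, n_items,
                               n_ratings, n_attr_values, seed=n, **kwargs)
        probs.append(np.einsum("ia,jb,abr->ijr", theta, eta, p))
    return np.mean(probs, axis=0)

# Toy data (made up for illustration): 4 users, 3 items, binary ratings.
ratings = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 2, 0],
                    [2, 1, 0], [2, 2, 1], [3, 0, 0], [3, 1, 1]])
user_attr = np.array([0, 0, 1, 1])
avg = average_prediction(ratings, user_attr, n_users=4, n_items=3,
                         n_ratings=2, n_attr_values=2, lam=1.0)
```

Here `avg[i, j, r]` is the run-averaged probability that user `i` gives item `j` rating `r`. Setting `lam=0.0` recovers a ratings-only model in this sketch, while large `lam` lets the attribute term dominate the update of `theta`.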

 [1] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” J. Am. Soc. Inf. Sci. Tec. 58, 1019–1031 (2007).
 [2] A. Clauset, C. Moore, and M. E. J. Newman, “Hierarchical structure and the prediction of missing links in networks,” Nature 453, 98–101 (2008).
 [3] R. Guimerà and M. Sales-Pardo, “Missing and spurious interactions and the reconstruction of complex networks,” Proc. Natl. Acad. Sci. U.S.A. 106, 22073–22078 (2009).
 [4] L. Lü, L. Pan, T. Zhou, Y.-C. Zhang, and H. E. Stanley, “Toward link predictability of complex networks,” Proc. Natl. Acad. Sci. U.S.A. 112, 2325–2330 (2015).
 [5] A. Ghasemian, H. Hosseinmardi, A. Galstyan, E. M. Airoldi, and A. Clauset, “Stacking models for nearly optimal link prediction in complex networks,” Proc. Natl. Acad. Sci. U.S.A. 117, 23393–23400 (2020).
 [6] R. Guimerà, “One model to rule them all in network science?” Proc. Natl. Acad. Sci. U.S.A. 117, 25195–25197 (2020).
 [7] R. Guimerà and M. Sales-Pardo, “A network inference method for large-scale unsupervised identification of novel drug-drug interactions,” PLoS Comput. Biol. 9, e1003374 (2013).
 [8] M. Tarrés-Deulofeu, A. Godoy-Lorite, R. Guimerà, and M. Sales-Pardo, “Tensorial and bipartite block models for link prediction in layered networks and temporal networks,” Phys. Rev. E 99, 032307 (2019).
 [9] M. P. Menden, D. Wang, M. J. Mason, B. Szalai, K. C. Bulusu, Y. Guan, T. Yu, J. Kang, M. Jeon, R. Wolfinger, T. Nguyen, M. Zaslavskiy, AstraZeneca-Sanger Drug Combination DREAM Consortium, I. S. Jang, Z. Ghazoui, M. E. Ahsen, R. Vogel, E. Chaibub Neto, T. Norman, E. K. Y. Tang, M. J. Garnett, G. Y. Di Veroli, S. Fawell, G. Stolovitzky, J. Guinney, J. R. Dry, and J. Saez-Rodriguez, “Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen,” Nat. Comm. 10, 2674 (2019).
[10] R. Guimerà and M. Sales-Pardo, “Justice blocks and predictability of U.S. Supreme Court votes,” PLoS ONE 6, e27188 (2011).
[11] R. Guimerà, A. Llorente, E. Moro, and M. Sales-Pardo, “Predicting human preferences using the block structure of complex social networks,” PLoS ONE 7, e44620 (2012).
[12] A. Godoy-Lorite, R. Guimerà, C. Moore, and M. Sales-Pardo, “Accurate and scalable social recommendation using mixed-membership stochastic block models,” Proc. Natl. Acad. Sci. U.S.A. 113, 14207–14212 (2016).
[13] S. Cobo-López, A. Godoy-Lorite, J. Duch, M. Sales-Pardo, and R. Guimerà, “Optimal prediction of decisions and model selection in social dilemmas using block models,” EPJ Data Sci. 7, 48 (2018).
[14] M. Timme, “Revealing Network Connectivity from Response Dynamics,” Phys. Rev. Lett. 98, 224101 (2007).
[15] T. P. Peixoto, “Network reconstruction and community detection from dynamics,” Phys. Rev. Lett. 123, 128301 (2019).
[16] C. Tallberg, “A Bayesian approach to modeling stochastic blockstructures with covariates,” J. Math. Sociol. 29, 1–23 (2004).
[17] J. Yang, J. McAuley, and J. Leskovec, “Community detection in networks with node attributes,” in 2013 IEEE 13th International Conference on Data Mining (2013) pp. 1151–1156.
[18] D. Hric, T. P. Peixoto, and S. Fortunato, “Network structure, metadata, and the prediction of missing nodes and annotations,” Phys. Rev. X 6, 031038 (2016).
[19] M. E. J. Newman and A. Clauset, “Structure and inference in annotated networks,” Nat. Comm. 7, 11863 (2016).
[20] A. White and T. B. Murphy, “Mixed-membership of experts stochastic blockmodel,” Netw. Sci. 4, 48–80 (2016).
[21] L. Peel, D. B. Larremore, and A. Clauset, “The ground truth about metadata and community detection in networks,” Sci. Adv. 3 (2017), 10.1126/sciadv.1602548.
[22] N. Stanley, T. Bonacci, and R. Kwitt, “Stochastic block models with multiple continuous attributes,” Appl. Netw. Sci. 4, 54 (2019).
[23] M. Contisciani, E. A. Power, and C. De Bacco, “Community detection with node attributes in multilayer networks,” Sci. Rep. 10, 1–16 (2020).
[24] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer 42, 30–37 (2009).
[25] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Soc. Networks 5, 109–137 (1983).
[26] K. Nowicki and T. A. B. Snijders, “Estimation and prediction for stochastic blockstructures,” J. Am. Stat. Assoc. 96, 1077–1087 (2001).
[27] T.-C. Yen and D. B. Larremore, “Community detection in bipartite networks with stochastic block models,” Phys. Rev. E 102, 032309 (2020).
[28] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,” J. Mach. Learn. Res. 9, 1981–2014 (2008).
[29] F. M. Harper and J. A. Konstan, “The Movielens datasets: History and context,” ACM Trans. Interact. Intell. Syst. 5 (2015).
[30] A. S. Waugh, L. Pei, J. Fowler, P. Mucha, and M. A. Porter, “Party polarization in Congress: A network science approach,” arXiv: Physics and Society (2009).
[31] A. Godoy-Lorite, R. Guimerà, C. Moore, and M. Sales-Pardo, “Accurate and scalable social recommendation using mixed-membership stochastic block models,” Proc. Natl. Acad. Sci. U.S.A. 113, 14207–14212 (2016).