Dual Preference Distribution Learning for Item Recommendation

arXiv:2201.09490v2 [cs.IR] 26 May 2022
 XUE DONG, School of Software, Shandong University, China
 XUEMENG SONG∗ , School of Computer Science and Technology, Shandong University, China

 NA ZHENG, School of Computer Science and Technology, Shandong University, China
 YINWEI WEI, School of Computing, National University of Singapore, Singapore
 ZHONGZHOU ZHAO, DAMO Academy, Alibaba Group, China
 HONGJUN DAI, School of Software, Shandong University, China
 Recommender systems automatically recommend to users items that they probably like, and their core task is to
 represent the user and the item and model their interaction. Existing methods have primarily learned the user's
 preferences and the item's features with vectorized representations, and modeled the user-item interaction by the
 similarity of their representations. In fact, the user's different preferences are related, and such relations can
 help better understand the user's preferences. Toward this end, we propose to represent the user's preference
 with a multivariate Gaussian distribution, and to model the user-item interaction by the probability
 density at the item in the user's preference distribution. In this manner, the mean vector of the Gaussian
 distribution captures the center of the user's preferences, while its covariance matrix captures the
 relations of these preferences. In particular, in this work, we propose a dual preference distribution learning
 framework (DUPLE), which captures the user's preferences to both items and attributes, each with a Gaussian
 distribution. As a byproduct, identifying the user's preferences to specific attributes enables us to explain
 why an item is recommended to the user. Extensive quantitative and qualitative experiments on six
 public datasets demonstrate the effectiveness of the proposed DUPLE method.

 CCS Concepts: • Information systems → Retrieval models and ranking; Recommender systems; •
 Mathematics of computing → Distribution functions; Computing most probable explanation.

 Additional Key Words and Phrases: Recommender System, Preference Distribution Learning, Explainable
 Recommendation.

 ACM Reference Format:
 Xue Dong, Xuemeng Song, Na Zheng, Yinwei Wei, Zhongzhou Zhao, and Hongjun Dai. 2018. Dual Preference
 Distribution Learning for Item Recommendation. J. ACM 37, 4, Article 111 (August 2018), 23 pages.
 https://doi.org/XXXXXXX.XXXXXXX

 ∗ Corresponding author.

 Authors’ addresses: Xue Dong, dongxue.sdu@gmail.com, School of Software, Shandong University, Jinan, China, 250101;
 Xuemeng Song, sxmustc@gmail.com, School of Computer Science and Technology, Shandong University, Qingdao, China,
 266237; Na Zheng, zhengnagrape@gmail.com, School of Computer Science and Technology, Shandong University, Qingdao,
 China, 266237; Yinwei Wei, weiyinwei@hotmail.com, School of Computing, National University of Singapore, Singapore;
 Zhongzhou Zhao, zhongzhou.zhaozz@alibaba-inc.com, DAMO Academy, Alibaba Group, Zhejiang, China, 311121; Hongjun
 Dai, dahogn@sdu.edu.cn, School of Software, Shandong University, Jinan, China, 250101.

 Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
 provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
 the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
 Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
 prior specific permission and/or a fee. Request permissions from permissions@acm.org.
 © 2018 Association for Computing Machinery.
 0004-5411/2018/8-ART111 $15.00
 https://doi.org/XXXXXXX.XXXXXXX


1 INTRODUCTION
To save users from having to select items all by themselves on the web, recommender systems have attracted
much research attention [21, 25, 30, 32, 44, 47]. Specifically, these systems aim to recommend to users
items that they probably prefer, according to their historical interactions with items. The
key to fulfilling this is to effectively represent the user and the item, and based on that predict their
potential interactions [46]. Existing studies have primarily represented the item and user with
vectorized representations, where the different dimensions of the item representation capture the
item's different features, while those of the user representation indicate the user's corresponding
preferences to these features. Accordingly, the user-item interaction can be modeled by calculating
the similarity of their representations via techniques like the inner product [21, 46], the Euclidean
distance [42], and non-linear neural networks [16].
 In a sense, with either the inner product or the Euclidean distance, each dimension of the user
and item representations is treated separately and equally. In fact, however, the user's
preferences to different item features are related. For example, if a user has watched many action
movies, most of which star the actor Jackie Chan, then we can learn that the user's
preferences to the features action movie and Jackie Chan are positively related. Although non-linear
neural networks [16] can implicitly learn the relations among different dimensions via multiple
fully-connected layers, it is hard to exhibit what such networks actually learn.
 Therefore, in this work, we aim to explicitly model the relations among the user's preferences to
different features for better recommendation. In particular, we propose to represent the user's
preferences with a multivariate Gaussian distribution rather than a vectorized representation. In
this manner, the mean vector of the Gaussian distribution captures the center of the user's
preferences, and its covariance matrix captures the relations of these preferences. However, this is non-
trivial due to the following two challenges. 1) How to accurately model the user's preference based
on the Gaussian distribution is one major challenge. And 2) the user's preferences to items usually
come from his/her judgments towards the item's attributes (e.g., the genres of a movie) [31, 58].
Therefore, how to seamlessly integrate the user's preferences to the general item and the specific
attribute to strengthen the recommendation in terms of both quality and interpretability constitutes
another crucial challenge.
 To address these challenges, we propose a DUal Preference distribution LEarning framework
(DUPLE), shown in Fig. 1, which jointly learns the user's preferences to the general item and the
specific attribute, each with a probabilistic distribution. Notably, in our context, we name the
user's preferences to the general item and the specific attribute the general and specific preferences,
respectively. As illustrated in Fig. 1, DUPLE consists of two key components: the parameter
construction and general-specific transformation modules. In particular, to capture the user's general
preference, we learn a general preference distribution by the parameter construction module, which
constructs the essential parameters, i.e., the mean vector and covariance matrix, based
on the user's ID embedding. We regard the probability density of the item's ID embedding in this
distribution as the user's general preference. To capture the user's specific preference, we first
design a general-specific transformation module that predicts the item's attribute
representation (depicting what attributes the item has) from its ID embedding, supervised by the
items' attribute labels. This transformation, bridging the gap between the general item
and its specific attributes, enables us to derive a specific preference distribution from the general
one1. Similarly, the probability density of the item's attribute embedding in the specific preference
distribution can be regarded as the user's specific preference to the attributes. Ultimately, we
fuse the user's general and specific preferences with a linear combination to predict the user-item
1 Detailed mathematical proof will be given in Section 3.3.


Fig. 1. Illustration of the proposed dual preference distribution learning framework (DUPLE), which jointly
learns the user's preferences to the general item (blue flow) and the specific attribute (green flow) for better
recommendation.

interaction, based on which we can perform the item recommendation. Extensive experiments on
six datasets derived from the Amazon Product dataset [30] and MovieLens [13] demonstrate the
superiority of our DUPLE over state-of-the-art methods.
 Besides, DUPLE can further infer what attributes the user prefers, and thus explain why the model
recommends an item to the user. More specifically, since the user's specific preference distribution
learns the user's preferences to item attributes, we can summarize a preferred attribute profile
for the user. Accordingly, the overlap between the attributes that the user prefers and those an item
has can serve as the explanation for recommending the item to the user.
 We summarize our main contributions as follows:
 • We propose a dual preference distribution learning framework (DUPLE) that captures the
 user's preferences to both the general item and specific attributes with probabilistic
 distributions. To the best of our knowledge, this is the first attempt to capture the relations
 of the user's preferences with the covariance matrix of the distribution. By learning such
 information, we can better model the user's preferences and gain better recommendation
 performance.
 • Benefiting from learning the user's specific preference distribution, we can summarize the
 preferred attributes of the user, and based on that provide the explanation for the recommen-
 dation, namely, give the reasons why we recommend the item to the user.
 • Extensive qualitative and quantitative experiments conducted on six real-world datasets have
 proven both the effectiveness and the explainability of the proposed DUPLE method. We will
 release our codes to facilitate other researchers.

2 RELATED WORK
In this section, we briefly introduce traditional recommender systems in Subsection 2.1 and
explainable recommender systems in Subsection 2.2.

2.1 Recommender Systems
Initial researches utilize the Collaborative Filtering techniques [37] to capture the user’s preferences
from the interacted relationships between users and items. Matrix Factorization (MF) is the most

 J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2018.
111:4 Xue Dong, Xuemeng Song, Na Zheng, Yinwei Wei, Zhongzhou Zhao, and Hongjun Dai

popular method [21, 22, 33]. It focuses on factorizing the user rating matrix into the user matrix
and item matrix and predicting the user-item interaction by the similarity of their representations.
Considering that different users may have different rating habits, Koren et al. [21] introduced
user and item biases into matrix factorization, achieving better performance. Several
other researchers argued that different users may have similar preferences to items. Consequently,
Yang et al. [54] and Chen et al. [5] clustered users into several groups according to their
historical item interactions and separately captured the common preferences of the users in each group.
Recently, due to the great performance of graph convolutional networks, many approaches have
resorted to constructing a graph of users and items according to their historical interactions, and
exploring the high-order connectivity of user-item interactions [41, 44, 46–48]. For example,
Wang et al. [46] treated users and items as nodes and their interaction histories as edges between
the nodes, and then proposed a three-layer embedding propagation scheme to propagate messages from
items to the user and back to items. Wang et al. [47] designed an intent-aware interaction graph
that disentangles the item representation into several factors to capture the user's different intents.
 Beyond directly learning the user's preferences from historical user-item interactions,
other researchers began to leverage the item's rich context information to capture the user's
detailed preferences to items. Several of them incorporated the visual information of items to
improve the recommendation performance [15, 30, 56]. For example, He et al. [15] enriched the
item's representation by extracting the item's visual feature from its image with a CNN-based
network and adding it into the matrix factorization. Yang et al. [56] further highlighted the region
of the item image that the user is probably interested in. Some other studies explored item
attributes [2, 31, 58] or user reviews [6, 39, 52] to learn the user's preferences to the item's specific
aspects. For example, Pan et al. [31] utilized the attribute representations as a regularization for
learning the item's representation while modeling the user-item interaction. In this manner, the
user's preferences to attributes can be tracked along the path from the user to the item and then to the
attributes. Chen et al. [6] proposed a co-attentive multi-task learning model for recommending items
to users and generating the user's reviews of items. Besides, multi-modal data has been proven to be
important for recommendation [10, 23, 49]. For example, Wei et al. [49] constructed a user-item
graph on each modality to learn the user's modality-specific preferences and incorporated all the
preferences to predict the user-item interaction.
 Although the above approaches are able to capture the user's preferences to items or to specific
contents of the item (e.g., attributes), they fail to explicitly model the relations of the user's different
preferences, which is beneficial for better recommendation. In this work, we propose to
capture the user's preferences with a multivariate Gaussian distribution, and model the user-
item interaction with the probability density of the item in the user's preference distribution. In
this manner, the relations of the user's preferences can be captured by the covariance matrix of the
distribution.

2.2 Explainable Recommender Systems
Recommender systems trained with deep neural networks are often perceived as black boxes that can only
output a recommendation. Thus, to make the recommendation more transparent and trustworthy,
interpretable recommender systems [4, 58] are gaining popularity, which focus not only on what
to recommend but also on why. Early approaches provided rough reasons for recommending
an item based on similar items or users [34, 36]. Thereafter, researchers attempted to provide users
with more explicit reasons, e.g., explaining recommendations with item attributes, reviews, and
reasoning paths.
 Interpretation with Item Attributes. This group of interpretable recommender systems con-
siders that a user may like an item because of certain of its attributes, e.g., “you may like Harry
Potter because it is an adventure movie”. Thus, mainstream approaches in this research line have
been dedicated to bridging the gap between users and attributes [4, 17, 31, 55]. In particular,
Wang et al. [45] employed a tree-based model to learn explicit decision rules from item attributes,
and designed an embedding model to generalize to unseen decision rules on users and items.
Benefiting from the attention mechanism, several researchers [4, 31] learned the item embedding
as a fusion of its attribute embeddings. By checking the attention weights, these methods can
infer how each attribute contributes to a high/low rating score.
 Interpretation with Reviews. These methods leverage reviews as extra information about
the user-item interaction, from which the user's attitude towards an item can be inferred [8, 9, 28, 35, 50]. In
particular, several approaches aggregated the review texts of users/items and adopted the attention
mechanism to learn the user/item embeddings [12, 28, 35, 50]. Based on the attention weights,
the model can highlight the highly weighted words in reviews as explanations. Different from highlighting
review words as explanations, other studies attempt to automatically generate reviews for a
user-item pair [6, 9, 24, 29]. Specifically, Costa et al. [9] designed a character-level recurrent neural
network, which generates review explanations for a user-item pair using long short-term memories.
Li et al. [24] proposed a more comprehensive model to generate tips in review systems. Inspired
by the human information processing model in cognitive psychology, Chen et al. [6] developed an
encoder-selector-decoder architecture, which exploits the correlations between recommendation
and explanation through co-attentive multi-task learning.
 Interpretation with Reasoning Paths. This kind of approach constructs a user-item inter-
action graph and aims to find an explicit path on the graph that traces the decision-making
process [1, 14, 18, 44, 51, 52]. In particular, Ai et al. [1] constructed a user-item knowledge graph
of users, items, and multi-type relations (e.g., purchase and belong), and generated explanations
by finding the shortest path from the user to the item. Wang et al. [44] proposed a Knowledge
Graph Attention Network (KGAT) that explicitly models the high-order relations in the knowledge
graph in an end-to-end manner. Xian et al. [51] proposed a reinforcement reasoning approach
over knowledge graphs for interpretable recommendation, where an agent starts from a user and is
trained to reach the correct items with high rewards. Further, considering that users and items
have different intrinsic characteristics, He et al. [14] designed a two-stage representation learning
algorithm for learning better representations of heterogeneous nodes. Yang et al. [57] proposed a
Hierarchical Attention Graph Convolutional Network (HAGERec) that uses a hierarchical
attention mechanism to exploit and adjust the contribution of each neighbor to a node in the
knowledge graph.
 Despite their achievements in explainable recommendation, existing approaches mainly adopt a
discriminative scheme to provide an explanation, i.e., they produce an explanation for a given user-
item pair. In fact, however, users select items according to their inherent preferences. Therefore,
we argue that explainable recommender systems should follow a generative scheme that first
summarizes the user's preferences, after which the explanation can be easily derived from the overlap
between the user's preferences and the item's properties.

3 THE PROPOSED DUPLE MODEL
We now present the proposed dual preference distribution learning framework (DUPLE), which
is illustrated in Fig. 1. It is composed of two key modules: 1) the parameter construction module,
which learns a general preference distribution that captures the user's preferences to general items
by constructing its essential parameters (i.e., the mean vector and covariance matrix); and 2) the
general-specific transformation module, which bridges the gap between the item and its attributes
and thereby enables us to learn the user's specific preference distribution by transforming the
parameters of the general preference distribution into their specific counterparts.


 Table 1. Summary of the Main Notations.

 Notation          Explanation
 U, I, A           The sets of users, items, and attributes, respectively.
 I_u               The set of historically interacted items of the user u ∈ U.
 A_i               The set of attributes of the item i ∈ I.
 A_u               The summarized preferred attribute profile of the user u.
 D                 The training set of triplets (u, i, j), u ∈ U, i ∈ I_u, j ∉ I_u.
 u, i              ID embeddings of the user u and the item i, respectively.
 a_i               Bag-of-words attribute embedding of the item i.
 μ_u, Σ_u          Mean vector and covariance matrix of the user u's general preference distribution.
 μ_u^s, Σ_u^s      Mean vector and covariance matrix of the user u's specific preference distribution.
 p_ui^g, p_ui^s    The general and specific preferences of the user u to the item i, respectively.
 p_ui              The final preference of the user u to the item i.
 Θ                 The set of to-be-learned parameters.

In the rest of this section, we first define our recommendation problem in Subsection 3.1. Then, we
detail the two key modules in Subsections 3.2 and 3.3, respectively. Finally, we illustrate the user-item
interaction modeling in Subsection 3.4, followed by the explainable recommendation in Subsection 3.5.

3.1 Problem Definition
Without loss of generality, suppose that we have a set of users U, a set of items I, and a set of item
attributes A that can be used to describe all the items in I. Each user u ∈ U is associated with a
set of items I_u that the user historically likes. Each item i ∈ I is annotated with a set of attributes A_i.
Following mainstream recommender models [16, 22, 47], we describe a user u (an item i) with an
ID embedding vector u ∈ R^d (i ∈ R^d), where d denotes the embedding dimension. Besides, to
describe the item with its attributes, we represent the item i by a bag-of-words embedding of its
attributes a_i ∈ R^{|A|}, whose k-th element equals 1 if the item has the k-th attribute in A and 0 otherwise.
 Inputs: The inputs of the dual preference distribution learning framework (DUPLE) consist of three
parts: the set of users U, the set of items I, and the set of item attributes A.
 Outputs: Given a user u and an item i with its attributes A_i, DUPLE predicts the user-item interaction
from both the general item and the specific attribute perspectives.
 To improve readability, we declare the notations used in this paper. We use calligraphic
letters (e.g., X) to represent sets, bold capital letters (e.g., W) to represent matrices, and bold
lowercase letters (e.g., u) to represent vectors. Non-bold letters (e.g., d) denote scalars. The
notations used in this paper are summarized in Table 1.
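To make the setup concrete, the following minimal Python sketch builds the bag-of-words attribute embedding a_i for a single item. The attribute vocabulary and the item are invented for illustration; they do not come from the datasets used in Section 4.

```python
import numpy as np

# Toy instantiation of the setup in Section 3.1 (all names are invented).
attributes = ["black", "casual", "sexy", "silk"]   # the attribute set A
item_attributes = {"black", "silk"}                # A_i for one item i

# Bag-of-words embedding a_i in R^{|A|}: the k-th element is 1 iff the item
# is annotated with the k-th attribute of A.
a_i = np.array([1.0 if a in item_attributes else 0.0 for a in attributes])
print(a_i)  # [1. 0. 0. 1.]
```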

3.2 Parameter Construction Module
The user u's general preference distribution G(μ_u, Σ_u) captures his/her preferences to general items.
More specifically, the mean vector μ_u refers to the center of the user's general preference, while the
covariance matrix Σ_u stands for the relationships among the latent variables affecting the user's
preferences to items.
 The parameter construction module constructs the mean vector and the covariance matrix separately,
as they differ in form and mathematical properties. In particular, the mean vector is a d-dimensional
vector indicating the center of the general preference. Therefore, we simply adopt one fully connected
layer to map the user's ID embedding u to the mean vector of the general preference distribution
μ_u ∈ R^d as follows,

    μ_u = W u + b,    (1)

where W ∈ R^{d×d} and b ∈ R^d are the non-zero parameters that map the user's ID embedding.
 As the covariance matrix is symmetric and positive semi-definite by its mathematical properties,
it is difficult to construct it directly from the user's ID embedding. Instead, we propose to first
derive a low-rank matrix L_u ∈ R^{d×d′}, d′ < d, from the user's ID embedding as a bridge, and
then construct the covariance matrix through the following equation,

    Σ_u = L_u L_u^T,    (2)

where L_u = [l_1; l_2; ⋯ ; l_{d′}] is arranged by d′ column vectors, each of which is derived by a
column-specific transformation as follows,

    l_k = W_k u + b_k,  k = 1, 2, ⋯ , d′,    (3)

where W_k ∈ R^{d×d} and b_k ∈ R^d are the non-zero parameters that derive the k-th column of
L_u. These parameters and the user's ID embedding (i.e., a non-zero vector) guarantee that l_k is a
non-zero vector. With simple algebraic derivation, the covariance matrix derived according to
Eqn. (2) is symmetric, as Σ_u^T = (L_u L_u^T)^T = L_u L_u^T = Σ_u, and positive semi-definite, as
x^T Σ_u x = x^T L_u L_u^T x = ‖L_u^T x‖² ≥ 0, ∀ x ≠ 0, x ∈ R^d.
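For concreteness, the following PyTorch sketch is one possible implementation of Eqns. (1)-(3). The class and parameter names, as well as the sizes d = 64 and d′ = 8, are our own choices, not the authors' released code.

```python
import torch
import torch.nn as nn

class ParameterConstruction(nn.Module):
    """Builds the parameters (mu_u, Sigma_u) of the general preference
    distribution from the user ID embedding, following Eqns. (1)-(3)."""
    def __init__(self, dim: int = 64, low_rank: int = 8):
        super().__init__()
        self.mean_layer = nn.Linear(dim, dim)  # Eqn. (1): mu_u = W u + b
        # d' column-specific maps producing L_u = [l_1, ..., l_{d'}], Eqn. (3)
        self.col_layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(low_rank))

    def forward(self, u: torch.Tensor):
        mu = self.mean_layer(u)                          # (batch, d)
        cols = [layer(u) for layer in self.col_layers]   # d' tensors of (batch, d)
        L = torch.stack(cols, dim=-1)                    # (batch, d, d')
        sigma = L @ L.transpose(-1, -2)                  # Eqn. (2): Sigma = L L^T
        return mu, sigma

# Sigma = L L^T is symmetric and positive semi-definite by construction.
mu, sigma = ParameterConstruction()(torch.randn(2, 64))
assert torch.allclose(sigma, sigma.transpose(-1, -2))    # symmetry check
```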

3.3 General-Specific Transformation
The general-specific transformation module predicts the item's attribute representation from its ID
embedding. Since it bridges the gap between the item and its attributes, we can derive the user's
specific preference distribution from the general one by transforming its mean vector and covariance
matrix into their specific counterparts. Compared with introducing another branch that constructs the
specific preference distribution anew (i.e., learning its parameters from the user's ID embedding,
similar to the parameter construction module), this design has the following benefits. (1) The user's
general preference toward an item often comes from his/her specific judgments toward the item's
attributes. In light of this, it is promising to derive the specific preference distribution by referring
to the general preference distribution. (2) Learning the general and specific preferences of the user
within one branch allows the knowledge learned by each to be mutually referred. And (3) this
facilitates modeling the specific preference directly from the pure general preference, enabling
testing cases that lack item attribute details.
 In particular, following the approach [31], we adopt a linear mapping as the general-specific
transformation to predict the item i's attribute representation â_i from its ID embedding i as follows,

    â_i = M i,    (4)

where M ∈ R^{|A|×d} is the parameter of the general-specific transformation.
 We adopt the item's ground-truth attribute label vector a_i to supervise the general-specific
transformation learning. Specifically, we enforce the predicted attribute representation â_i of the
item i to be as close as possible to its ground-truth attribute label vector a_i through the following
objective function,

    L_a = − Σ_{i,j∈I, i≠j} log(σ(s(â_i, a_i) − s(â_i, a_j))),    (5)

where a_j is the ground-truth attribute label vector of another item j and σ(·) denotes the sigmoid
function. We estimate the similarity between the item i's attribute representation and a ground-truth
attribute label vector by their cosine similarity, s(â_i, a_j) = a_j^T â_i / (‖a_j‖ ‖â_i‖). By minimizing
this objective function, the item i's attribute representation â_i is able to indicate what attributes
the item has.
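Below is a minimal PyTorch sketch of Eqns. (4)-(5); the tensor names and sizes are ours, and F.logsigmoid is used as a numerically stable equivalent of log(σ(·)).

```python
import torch
import torch.nn.functional as F

# Sketch of Eqns. (4)-(5); the tensor names and sizes are illustrative.
num_attrs, dim = 1000, 64
M = torch.randn(num_attrs, dim, requires_grad=True)  # general-specific mapping

def attribute_loss(item_emb, attr_labels, other_labels):
    """item_emb: (batch, d) item ID embeddings; attr_labels / other_labels:
    (batch, |A|) bag-of-words label vectors of item i and of another item j."""
    a_hat = item_emb @ M.t()                                  # Eqn. (4)
    s_pos = F.cosine_similarity(a_hat, attr_labels, dim=-1)   # s(a_hat_i, a_i)
    s_neg = F.cosine_similarity(a_hat, other_labels, dim=-1)  # s(a_hat_i, a_j)
    # Eqn. (5): push a_hat_i closer to its own label than to other labels.
    return -F.logsigmoid(s_pos - s_neg).sum()
```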


 We then introduce how to construct the user u's specific preference distribution G(μ_u^s, Σ_u^s)
from the general one based on the general-specific transformation. Differently from the general
preference distribution, each dimension of the specific preference distribution refers to a specific
item attribute. Therefore, the mean vector μ_u^s ∈ R^{|A|}, i.e., the center of the user's specific
preference, indicates what attributes the user prefers, while the covariance matrix Σ_u^s ∈ R^{|A|×|A|}
stands for the relations of these preferences.
 Technically, we project the parameters of the general preference distribution, i.e., μ_u and Σ_u,
into their specific counterparts as follows,

    μ̂_u = M μ_u,  Σ̂_u = M Σ_u M^T.    (6)

In this manner, the transformed parameters μ̂_u and Σ̂_u are exactly the mean vector and covariance
matrix of the specific preference distribution, i.e., μ̂_u = μ_u^s and Σ̂_u = Σ_u^s. The rationality
proof is given as follows.
 • Argument 1. μ̂_u is the mean vector of the user u's specific preference distribution, i.e., μ̂_u = μ_u^s.
 • Proof 1. Referring to the mathematical definition of the mean vector, and the additivity and
 homogeneity of the linear mapping, we have

     μ̂_u = M μ_u = M ( (1/|I_u|) Σ_{i∈I_u} i )
          = (1/|I_u|) Σ_{i∈I_u} M i
          = (1/|I_u|) Σ_{i∈I_u} â_i := μ_u^s.  □

 • Argument 2. Σ̂_u is the covariance matrix of the user u's specific preference distribution, i.e.,
 Σ̂_u = Σ_u^s.
 • Proof 2. Referring to the mathematical definition of the covariance matrix, and the additivity
 and homogeneity of the linear mapping, we have

     Σ̂_u = M Σ_u M^T
          = M ( (1/|I_u|) Σ_{i∈I_u} (i − μ_u)(i − μ_u)^T ) M^T
          = (1/|I_u|) Σ_{i∈I_u} M (i − μ_u)(i − μ_u)^T M^T
          = (1/|I_u|) Σ_{i∈I_u} (M (i − μ_u)) (M (i − μ_u))^T
          = (1/|I_u|) Σ_{i∈I_u} (M i − M μ_u)(M i − M μ_u)^T
          = (1/|I_u|) Σ_{i∈I_u} (â_i − μ̂_u)(â_i − μ̂_u)^T := Σ_u^s.  □
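The two arguments instantiate the standard fact that a linear map of a Gaussian variable is again Gaussian with mean Mμ and covariance MΣM^T. The following NumPy snippet checks this fact numerically; all sizes and values are toy choices of ours.

```python
import numpy as np

# Numerical check of Arguments 1-2: applying a linear map M to samples of
# N(mu, Sigma) yields mean M mu and covariance M Sigma M^T.
rng = np.random.default_rng(0)
d, A = 4, 3
mu = rng.normal(size=d)
L = rng.normal(size=(d, 2))
Sigma = L @ L.T + 1e-6 * np.eye(d)   # low-rank plus ridge: a valid covariance
M = rng.normal(size=(A, d))          # plays the general-specific mapping

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ M.T                          # apply the linear mapping

print(np.abs(y.mean(axis=0) - M @ mu).max())         # close to 0
print(np.abs(np.cov(y.T) - M @ Sigma @ M.T).max())   # close to 0
```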


Algorithm 1 Dual Preference Distribution Learning Framework (DUPLE).
Input: The sets of users U and items I. The training set D. The set of attributes A_i of each item i ∈ I.
 The number of optimization steps S.
Output: The parameters Θ of DUPLE.
 1: Initialize the user embeddings u, u ∈ U, the item embeddings i, i ∈ I, and the model parameters Θ.
 2: for s in [1, ⋯ , S] do
 3:    Randomly draw a mini-batch of training triplets (u, i, j) from D.
 4:    Calculate the mean vector μ_u and covariance matrix Σ_u of the general preference distribution
       through Eqn. (1) and (2), respectively.
 5:    Calculate the mean vector μ̂_u and covariance matrix Σ̂_u of the specific preference distribution
       through Eqn. (6).
 6:    Predict the user-item interaction through Eqn. (9).
 7:    Update the parameters through Eqn. (11): Θ ← Θ − η · ∂L/∂Θ
 8: end for

3.4 User-Item Interaction Modeling
We predict the user-item interaction by the probability density of the item in both the general
and specific preference distributions. In particular, in the user u's general preference distribution, the
probability density of the item i's ID embedding can serve as a proxy for the general preference p_ui^g
of the user u toward the item i. Formally, we define p_ui^g as follows,

    p_ui^g = P(i | μ_u, Σ_u)
           = (1 / √((2π)^d |Σ_u|)) exp( −(1/2) (i − μ_u)^T (Σ_u)^{−1} (i − μ_u) ),    (7)

where |Σ_u| and (Σ_u)^{−1} are the determinant and the inverse of the covariance matrix of the
user's general preference distribution defined in Eqn. (2), respectively. We define the specific
preference p_ui^s of the user u toward the item i as the probability density of the item i's attribute
representation â_i in a similar form,

    p_ui^s = P(â_i | μ̂_u, Σ̂_u)
           = (1 / √((2π)^{|A|} |Σ̂_u|)) exp( −(1/2) (â_i − μ̂_u)^T (Σ̂_u)^{−1} (â_i − μ̂_u) ).    (8)

 Ultimately, the user-item interaction between the user u and the item i is defined as the linear
combination of the two preferences,

    p_ui = λ p_ui^g + (1 − λ) p_ui^s,    (9)

where λ ∈ [0, 1] is a hyper-parameter adjusting the trade-off between the two terms.
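For illustration, Eqns. (7)-(9) can be evaluated with an off-the-shelf Gaussian density, as sketched below. Note that the low-rank Σ_u of Eqn. (2) is singular whenever d′ < d, so the small ridge added to the covariances here is our own numerical-stability assumption, not something stated in the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def duple_score(i_emb, a_hat, mu_g, sigma_g, mu_s, sigma_s, lam=0.5, eps=1e-4):
    """Eqns. (7)-(9): fuse the Gaussian densities of the item's ID embedding
    and attribute representation under the two preference distributions."""
    d, A = len(mu_g), len(mu_s)
    # The ridge eps*I keeps the low-rank covariances invertible (our own
    # numerical-stability choice).
    p_g = multivariate_normal.pdf(i_emb, mean=mu_g, cov=sigma_g + eps * np.eye(d))
    p_s = multivariate_normal.pdf(a_hat, mean=mu_s, cov=sigma_s + eps * np.eye(A))
    return lam * p_g + (1.0 - lam) * p_s   # Eqn. (9)
```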
 Regarding the optimization of the general preference modeling, we enforce the interaction
between a user and his/her historically interacted items to be larger than that with items having no
historical interaction, through the following objective function,

    L_rec = − Σ_{(u,i,j)∈D} log( p_ui / (p_ui + p_uj) ),    (10)

where D := {(u, i, j) | u ∈ U, i ∈ I_u, j ∉ I_u} is the set of training triplets, in which item i is an item
the user prefers, while item j is an item with which the user has not interacted.

Fig. 2. Illustration of the explanation generation of the proposed DUPLE. We first derive the user's preferred
attribute profile from the mean vector of the learned specific preference distribution. We then use the overlapping
attributes between the learned attribute profile and the recommended item's attributes as the explanation.

The triplet (u, i, j) indicates that the user u prefers the item i over the item j. It is worth noting that
the number of non-interacted items j sampled for each preferred item can be increased.
 Ultimately, the final objective function of the proposed method is defined as follows,

    L = min_Θ (L_a + L_rec),    (11)

where L_a and L_rec are defined in Eqn. (5) and Eqn. (10), respectively, and Θ refers to the set of
to-be-learned parameters of the proposed framework. The detailed training process of DUPLE is
summarized in Algorithm 1.
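A short sketch of the ranking objective in Eqns. (10)-(11), assuming the fused preferences p_ui and p_uj of Eqn. (9) have already been computed for a batch of triplets; the function names are ours.

```python
import torch

def recommendation_loss(p_ui: torch.Tensor, p_uj: torch.Tensor) -> torch.Tensor:
    """Eqn. (10): for each triplet (u, i, j), push the fused preference of the
    interacted item i above that of the non-interacted item j."""
    return -torch.log(p_ui / (p_ui + p_uj)).sum()

# Eqn. (11): the total objective adds the attribute loss of Eqn. (5)
# (see the sketch in Section 3.3) and is minimized over the parameters Theta:
# loss = attribute_loss(...) + recommendation_loss(p_ui, p_uj)
```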

3.5 Explanation Generation
In addition to the item recommendation, DUPLE can also explain why an item is recommended to a
user as a byproduct, as shown in Fig. 2. In particular, in the specific preference distribution, each
dimension refers to a specific attribute, and the mean vector indicates the center of the user's specific
preference. Accordingly, we can infer what attributes the user likes and summarize a preferred
attribute profile for the user. Technically, suppose that the user u prefers K attributes in total; we
then define the preferred attribute profile A_u as follows,

    {t_1, t_2, ⋯ , t_K} = arg max_K(μ̂_u),  μ̂_u ∈ R^{|A|},
    A_u = {a_{t_1}, a_{t_2}, ⋯ , a_{t_K}},    (12)

where arg max_K(·) is the function returning the indices of the K largest elements of a vector, and
a_{t_k} is the t_k-th attribute in A. After building the preferred attribute profile of the user u, for a given
recommended item i associated with its attributes A_i, we can derive the reason why the user likes
the item by checking the overlap between A_u and A_i. Suppose that A_u ∩ A_i = {a_p, a_q}. Then
we can provide the explanation for the user u as “you may like the item for its attributes a_p and a_q”.
 Besides, it is worth mentioning that the diagonal elements of the covariance matrix of the
specific preference distribution capture the significance of the user's preferences to different
attributes. Specifically, if the diagonal element of one dimension is small, a tiny
change along this dimension will sharply affect the predicted user-item interaction, so we can infer that the
user's preference to the attribute of the corresponding dimension is “strict”, i.e., significant.


Table 2. Statistics of the six public datasets (after preprocessing), including the numbers of users (#user),
items (#item), their interactions (#rating), and attributes (#attri.), as well as the density of each dataset.

 Amazon Product Dataset
 #user #item #rating #attri. density
 Women’s Clothing 19,972 285,508 326,968 1,095 0.01%
 Men’s Clothing 4,807 43,832 70,723 985 0.03%
 Cell Phone & Accessories 9,103 51,497 132,422 1,103 0.03%
 MovieLens Dataset
 #user #item #rating #attri. density
 MovieLens-small 579 6,296 48,395 761 1.33%
 MovieLens-1M 5,950 3,532 574,619 587 2.73%
 MovieLens-10M 66,028 10,254 4,980,475 489 0.74%

4 EXPERIMENTS
In this section, we first introduce the dataset details and experimental settings in Subsections 4.1
and 4.2, respectively. And then, we conduct extensive experiments by answering the following
research questions:
 (1) Does DUPLE outperform the state-of-the-art methods?
 (2) How do the different variants of the Gaussian distribution perform?
 (3) What are the learned relations of the user’s preferences?
 (4) How well can DUPLE explain its item recommendations?

4.1 Dataset and Pre-processing
To verify the effectiveness of DUPLE, we adopted six public datasets with various sizes and densities:
the Women’s Clothing, Men’s Clothing, Cell Phones & Accessories, MovieLens-small, MovieLens-
1M, and MovieLens-10M. The former three datasets are derived from the Amazon Product dataset [30],
where each item is associated with a product image and a textual description. The latter three
datasets are released with the MovieLens dataset [13], where each movie has title, publication year,
and genre information. User ratings in all six datasets range from 1 to 5. To obtain reliable
preferred items for each user, following the studies [26, 27], we only kept ratings
larger than 3. Meanwhile, similar to the studies [11, 19], for each dataset, we filtered out users and
items with fewer than 10 interactions to ensure the dataset quality.
 Since none of the above datasets provides attribute annotations, following the studies [43, 50], we
adopted the high-frequency words in the item’s textual information as the ground truth attributes
of items. Specifically, for each dataset, we regarded the words that appear in the textual description
of more than 0.1% of items in the dataset as the high-frequency words. Notably, the stopwords (like
“the”) and noisy characters (like “/”) are not considered. The final statistics of the datasets are listed
in Table 2, including the numbers of users (#user), items (#item), their interactions (#rating), and
attributes (#attri.), as well as the density of the dataset. Similar to the study [47], we calculated
the dataset density as #rating / (#user × #item); for example, the density of MovieLens-1M is
574,619 / (5,950 × 3,532) ≈ 2.73%, consistent with Table 2. To gain a better understanding of the
datasets, we further provide the distribution of users with different numbers of interacted items in Fig. 3.
The horizontal axis refers to the number range of interacted items. We omitted users with
more than 120 interacted items from Fig. 3, because their proportion is too small.

4.2 Experimental Settings
Baselines. We compared the proposed DUPLE method with the following baselines.


Fig. 3. Data distribution of users with different numbers of interacted items in the six datasets. The horizontal
and vertical axes refer to the number range of interacted items and the user proportion, respectively.

 • VBPR [15]. This method represents the item with both the ID embedding and the visual content
 feature2. The user's preference to an item is measured by the inner product between their
 corresponding representations. Ultimately, VBPR adopts the BPR framework to enforce that
 the user's preference toward a preferred item is larger than that toward a non-preferred one.
 • AMR [40]. To enhance the robustness of recommendation, this approach introduces adversarial
 learning into the VBPR model. Specifically, it trains the network to defend against an adversary
 that adds perturbations to the item image.
 • DeepICF+a [53]. This method assesses the user's preference toward an item by evaluating
 the visual similarity between the given item and the user's historically preferred items. It sums
 the visual features of the historically preferred items weighted by their similarities to the given
 item, and feeds the result into a multi-layer perceptron to predict the user-item interaction.
 • NARRE [3]. This method adds the item's attributes into matrix factorization. It enriches
 the item representation with an attribute representation, i.e., the weighted summation of
 the word embeddings of the item's attributes. Based on the learned weights of
 the attribute representations, NARRE can capture the user's specific preferences to attributes.
 • AMCF [38]. This method also adds the item's attributes into matrix factorization by
 enriching the item representation with an attribute representation. Different from NARRE,
 AMCF regards the attributes as orthogonal vectors that serve only as a regularization for
 item representation learning. Similarly, according to the attention weights, AMCF can capture
 the user's specific preferences to different attributes.
 It is worth noting that the first three baselines, i.e., VBPR, AMR, and DeepICF+a, only focus
on the user's general preference, and are termed single-preference-based (SP-based) methods. The
remaining methods, i.e., NARRE, AMCF, and our proposed DUPLE, consider both the general and
specific preferences when recommending items, and are termed dual-preference-based (DP-based) methods.
 Data Split. We adopted the widely used leave-one-out evaluation [7, 11, 16] to split the training,
validation, and testing sets. In particular, for each user, we randomly selected one item from his/her
preferred items for validation and one for testing, and left the rest for training. During
validation and testing, to avoid heavy computation over all user-item pairs, following the
studies [11, 16], we composed the candidate item set of one ground-truth item and 100 randomly
selected negative items with which the user has not interacted.
2 The visual features are released at http://jmcauley.ucsd.edu/data/amazon/links.html.


Fig. 4. Training curves of the proposed DUPLE model on the six datasets. The training loss is
shown as a grey line corresponding to the right vertical axis. MRR and HR@10 on the validation set are
shown as orange and blue lines, respectively, corresponding to the left vertical axis.

 Evaluation Metrics. We adopted the Area Under the Curve (AUC), Mean Reciprocal Rank (MRR),
Hit Rate (HR@10), and Normalized Discounted Cumulative Gain (NDCG@10), where the ranking list is
truncated at 10, to comprehensively evaluate the item recommendation performance. In particular, AUC
indicates the classification ability of the model in terms of distinguishing the user's likes and dislikes,
while MRR, HR@10, and NDCG@10 reflect the ranking ability of the model in terms of top-N recommendation.
 Implementation Details. For all methods, following the study [11], we unified the dimension
of the item's and user's ID embeddings as 64 for a fair comparison. Besides, to gain powerful
representations of the user and item, we added two fully connected layers to transform the raw ID
embeddings of the user and item into condensed ones, respectively, before feeding them into
the network. We used random normal initialization for the parameters and trained them with the
Adam optimizer [20], a learning rate of 10⁻⁴, and a batch size of 128. The number of optimization
steps differs across datasets, as a larger dataset needs more steps to converge. We show the curves
of the training loss in Eqn. (11) and of the metrics on the validation set for the six datasets in Fig. 4;
without loss of generality, we plot the MRR and HR@10. For our method, we tuned the trade-off
parameter λ in Eqn. (9) from 0 to 1 with a stride of 0.1 for each dataset, and the dimension d′ in
Eqn. (2) over [2, 4, 8, 16, 32].

4.3 Comparison of Baselines (RQ1)
For each method, we report the average results of three runs with different random initializations of
the model parameters. Furthermore, we conducted one-sample t-tests to verify that the improvements
are statistically significant. Table 3 and Table 4 show the performance of the baselines and our
proposed DUPLE method on the Amazon and MovieLens datasets, respectively. The best and
second-best results are in bold and underlined, respectively. The row “%improv” indicates the relative
improvement of DUPLE over the best baseline. Statistical significance is marked with the superscript ∗.
From Tables 3 and 4, we have the following observations:


Table 3. Experiment results on the Amazon datasets. The best and second-best results are in bold and underlined,
respectively. %improv is the relative improvement of our method over the strongest baseline. The superscript
∗ denotes statistical significance with p-value < 0.05.

 Amazon Dataset             Method      AUC     MRR     HR@10   NDCG@10
 Women's Clothing           VBPR        71.42   13.10   29.45   14.85
                            AMR         72.68   15.14   34.22   16.39
                            DeepICF+a   74.46   20.20   40.21   23.43
                            NARRE       72.04   15.21   34.52   18.15
                            AMCF        73.61   14.73   35.68   17.69
                            DUPLE       77.48∗  20.35∗  42.43∗  23.60∗
                            %improv     +4.07   +0.76   +5.54   +0.73
 Men's Clothing             VBPR        71.08   13.63   31.07   15.72
                            AMR         73.84   15.82   32.67   15.49
                            DeepICF+a   74.82∗  17.31∗  38.23∗  22.41
                            NARRE       69.84   14.02   31.07   16.28
                            AMCF        73.66   15.70   35.95   18.47
                            DUPLE       76.00∗  19.35∗  40.95   23.62
                            %improv     +1.58   +11.84  +7.13   +5.40
 Cell Phone & Accessories   VBPR        73.42   15.83   34.07   18.17
                            AMR         74.42   16.38   36.43   20.97
                            DeepICF+a   76.12   16.71   38.33   20.01
                            NARRE       74.37   15.66   35.33   19.74
                            AMCF        75.01   14.90   34.98   19.46
                            DUPLE       80.65∗  20.38∗  45.47∗  25.14∗
                            %improv     +5.95   +21.97  +18.63  +25.64

 (1) AMR outperforms VBPR on all the datasets. This indicates that adding perturbations makes
 the network more robust and improves its generalization ability.
 (2) DeepICF+a generally achieves better performance than VBPR and AMR on the datasets
 derived from Amazon Product. This indicates that considering the contents of the items that a
 user historically prefers is beneficial for comprehensively capturing the user's preference. On the
 contrary, on the datasets derived from MovieLens, DeepICF+a underperforms VBPR and AMR.
 This suggests that in the dense MovieLens datasets, where each user may have many
 interacted items, the redundancy of this information can decrease the model performance.
 (3) On the MovieLens datasets, on average, DP-based methods (i.e., NARRE, AMCF, and DUPLE)
 achieve better performance than the SP-based methods (i.e., VBPR, AMR, and DeepICF+a).
 This shows that jointly learning the user's general and specific preferences helps to better
 understand the user's overall preferences and yields better recommendation performance.
 (4) However, on the datasets derived from Amazon Product, it is unexpected that DP-based methods
 underperform DeepICF+a. This may be because, on these datasets, DeepICF+a incorporates
 the item's visual feature to enhance the item representation, which is essential for
 visual-oriented recommendation tasks (e.g., fashion recommendation).
 (5) DUPLE outperforms all the baselines in terms of all metrics across the different datasets, which
 demonstrates the superiority of our proposed framework over existing methods. This may
 be due to the fact that, by modeling the user's preferences with probabilistic distributions,
 DUPLE is capable of exploring the correlations among the items and attributes that a user likes,

Table 4. Experiment results on the MovieLens datasets. The best and second-best results are in bold and underlined,
respectively. %improv is the relative improvement of our method over the strongest baseline. The superscript
∗ denotes statistical significance with p-value < 0.05.

 MovieLens Dataset    Method      AUC     MRR     HR@10   NDCG@10
 MovieLens-small      VBPR        72.04   12.12   29.36   13.49
                      AMR         74.66   12.66   32.70   13.85
                      DeepICF+a   75.03   12.56   31.34   16.69
                      NARRE       76.30   13.23   31.84   17.41
                      AMCF        76.06   11.91   31.15   17.04
                      DUPLE       77.48∗  20.35∗  42.43∗  23.60∗
                      %improv     +3.60   +12.89  +16.56  +19.47
 MovieLens-1M         VBPR        73.95   15.26   33.83   17.23
                      AMR         74.73   15.37   34.21   18.52
                      DeepICF+a   72.65   13.87   31.88   14.71
                      NARRE       77.54   15.87   37.11   17.97
                      AMCF        77.90   15.42   38.18   18.79
                      DUPLE       78.21   17.39∗  39.85∗  20.49∗
                      %improv     +0.41   +9.64   +4.38   +9.05
 MovieLens-10M        VBPR        76.19   15.17   34.97   17.47
                      AMR         78.12   16.15   38.47   18.39
                      DeepICF+a   74.06   13.73   32.19   14.86
                      NARRE       82.12   19.39   43.60   21.36
                      AMCF        80.50   16.68   40.91   20.68
                      DUPLE       82.71   21.09∗  47.23∗  24.62∗
                      %improv     +0.72   +8.77   +8.34   +15.26

 thereby gaining better performance. This also suggests that there indeed exist latent
 correlations among the items and attributes that a user prefers.

4.4 Comparison of Variants (RQ2)
In order to verify that the user's different preferences are related, i.e., that the dimensions of the user's
two preference distributions are dependent, we introduced a variant of our model, termed
DUPLE-diag, in which the covariance matrices of the two distributions (i.e., Σ_u and Σ̂_u) are
constrained to be diagonal matrices, i.e., the off-diagonal elements of the covariance matrices are all zeros.
 Besides, to further demonstrate that the user's different preferences contribute differently to
predicting the user-item interaction, we designed the DUPLE-iden variant, in which Σ_u = Σ̂_u = E, where E is
an identity matrix. Formally, according to Eqn. (7) and Eqn. (8), by omitting constant terms, the user
u's general and specific preferences in DUPLE-iden can be simplified as p_ui^g ∝ −(1/2)‖i − μ_u‖² and
p_ui^s ∝ −(1/2)‖â_i − μ̂_u‖², respectively. Intuitively, the DUPLE-iden variant leverages the Euclidean
distance to measure the user's preference, whose philosophy is similar to that of the baselines we used.
 For each method, we reported the average result of three runs with different random initialization
of the model parameters. The results are shown in Table 5 and the detailed analysis is given as
follows.
 (1) DUPLE-diag performs worse than DUPLE. The reason may be that, equipped with
 the non-diagonal covariance matrix, the specific preference distribution in DUPLE is able
 to capture the relations of the user's preferences, which helps to better understand the user's
 preferences and gain better performance.


Table 5. Results of the comparison among different variants of DUPLE. DUPLE-iden and DUPLE-diag set the
covariance matrices of the preference distributions to identity and diagonal matrices, respectively. The best
results are highlighted in bold.

 Dataset                    Method       AUC     MRR     HR@10   NDCG@10
 Women's Clothing           DUPLE-iden   72.23   18.48   37.14   21.09
                            DUPLE-diag   75.19   18.43   40.14   21.22
                            DUPLE        77.48   20.35   42.43   23.60
 Men's Clothing             DUPLE-iden   72.80   19.27   38.86   20.38
                            DUPLE-diag   75.76   18.16   41.26   22.16
                            DUPLE        76.00   19.35   40.95   23.62
 Cell Phone & Accessories   DUPLE-iden   77.84   19.49   41.10   24.00
                            DUPLE-diag   80.64   20.99   46.64   25.65
                            DUPLE        80.65   20.38   45.47   25.14
 MovieLens-small            DUPLE-iden   77.26   14.89   36.26   19.32
                            DUPLE-diag   78.33   15.13   36.09   18.11
                            DUPLE        79.07   16.43   38.86   20.80
 MovieLens-1M               DUPLE-iden   76.38   16.56   37.65   19.57
                            DUPLE-diag   76.53   15.66   36.87   17.57
                            DUPLE        78.21   17.39   39.85   20.49
 MovieLens-10M              DUPLE-iden   79.16   17.57   40.32   20.93
                            DUPLE-diag   80.07   18.48   41.84   21.74
                            DUPLE        82.71   21.09   47.23   24.62

 (2) DUPLE-iden performs comparably with the existing baselines on average. The reason
 may be that, with the identity covariance matrix, DUPLE-iden measures the user's preferences
 by the Euclidean distance between the user and item representations, the same philosophy
 as the baselines, and thus achieves similar performance.
 (3) On the Cell Phone & Accessories dataset, it is unexpected that DUPLE performs worse than
 DUPLE-diag on average. This may be attributed to the fact that the user's preferences to items in this
 category rarely interact with each other. For example, whether a user prefers a black phone
 will not be influenced by whether he/she prefers its LCD screen. Therefore, leveraging more
 parameters to capture the relations of the user's preferences only decreases the performance
 of DUPLE.

4.5 Visualization of User’s Preferences Relations (RQ3)
In order to gain a deeper insight into the relations of the user's preferences, we visualized the
learned covariance matrix of the specific preference distribution of a randomly selected user in
the Women's Clothing and MovieLens-1M datasets, respectively. For clarity, instead of visualizing
the relations of the user's preferences to all attributes, we randomly picked 20 attributes
and visualized their corresponding correlation coefficients in the covariance matrix of the specific
preference distribution of each dataset with heat maps in Fig. 5. A darker blue color indicates a
stronger relation between two of the user's preferences. We circled the four most prominent preference
pairs, where the two pairs with the highest relation are surrounded by yellow boxes, and the two pairs
with the lowest relation by red boxes.
 From Fig. 5, we can see that in the Women's Clothing dataset, this user's preferences to the attributes
black and sexy, as well as silk and sexy, are highly related, while those to the attributes
casual and sexy, as well as sweatshirt and sexy, are less related. This is reasonable, as


Fig. 5. Relation visualization of the preferences of a random user in the Women's Clothing (a) and
MovieLens-1M (b) datasets, respectively. A darker color indicates a stronger relation; the two pairs with the
highest relation and the two with the lowest are surrounded by yellow and red boxes, respectively.

a user who prefers sexy garments is more likely to also like black garments, but
hardly likes casual garments. As for the MovieLens-1M dataset, the user's preference to the attribute
Princess is highly related to the preferences to the attributes Friends and Women, while the
user's preference to the attribute Scifi is mutually exclusive with those to the attributes Animation
and Princess. These relations uncovered by DUPLE also make sense. Overall, these observations
demonstrate that the covariance matrix of one's specific preference distribution is able to capture
the relations of his/her preferences.

4.6 Explainable Recommendation (RQ4)
To evaluate the explainability of our proposed DUPLE, we first provide examples of our
explainable recommendations in Subsection 4.6.1, and then report a subjective psycho-visual
test judging the explainability of the proposed DUPLE in Subsection 4.6.2.
4.6.1 Explainable Recommendation Examples. Fig. 6 shows four examples of our explainable recom-
mendation. For each example, the user's historically preferred items, including images and textual
descriptions, are listed in the left column; the learned preferred attribute profile of the user is shown
in the center column; and an item associated with the recommendation explanation is given in the
right column. The users in Fig. 6, from top to bottom, are from the Men's Clothing, Women's Clothing,
Cell Phone & Accessories, and MovieLens-1M datasets, respectively. We only provide an example
from MovieLens-1M rather than from all three MovieLens datasets because the items in
MovieLens-small, MovieLens-1M, and MovieLens-10M are all movies. From
Fig. 6, we have the following observations.
 (1) The summarized preferred attribute profiles in the center column are in line with the
 users' historical preferences. For example, the first user has bought many sporty shoes
 and tops, from which we can infer that he likes sports and prefers the sporty style.
 These inferences are consistent with the preferred attributes DUPLE summarizes, e.g., balance
 and running. In addition, the fourth user has watched several animations like