Personalized News Recommendation: A Survey

 CHUHAN WU, Department of Electronic Engineering & BNRist, Tsinghua University, China
 FANGZHAO WU, Microsoft Research Asia, China
 YONGFENG HUANG, Department of Electronic Engineering & BNRist, Tsinghua University, China
 XING XIE, Microsoft Research Asia, China
arXiv:2106.08934v2 [cs.IR] 8 Jul 2021

 Personalized news recommendation is an important technique for helping users find news that interests
 them and for alleviating information overload. It has been studied extensively for decades and
 has achieved notable success in improving users’ news reading experience. However, many
 problems and challenges remain unsolved. To help researchers keep up with the advances in
 personalized news recommendation over the past years, in this paper we present a comprehensive overview of
 the field. Instead of following the conventional taxonomy of news recommendation
 methods, we propose a novel perspective for understanding personalized news recommendation
 based on its core problems and the associated techniques and challenges. We first review the techniques for
 tackling each core problem in a personalized news recommender system and the challenges they face. Next,
 we introduce the public datasets and evaluation methods for personalized news recommendation. We then
 discuss the key points for improving the responsibility of personalized news recommender systems. Finally,
 we raise several research directions that are worth investigating in the future. This paper provides an up-to-date
 and comprehensive view to help readers understand the personalized news recommendation field. We hope
 it can facilitate research on personalized news recommendation as well as related fields in natural
 language processing and data mining.
 CCS Concepts: • Information systems → Recommender systems; Personalization; • Computing methodologies
 → Natural language processing.
 Additional Key Words and Phrases: news recommendation, personalization, survey, user modeling, natural
 language processing
 ACM Reference Format:
 Chuhan Wu, Fangzhao Wu, Yongfeng Huang, and Xing Xie. 2021. Personalized News Recommendation: A
 Survey. 1, 1 (July 2021), 40 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

 1 INTRODUCTION
 In the era of the Internet, online news platforms such as Microsoft News¹ have attracted
 hundreds of millions of users [172]. Due to the convenience and timeliness of online news services,
 many users have shifted their news reading habits from conventional newspapers to digital news
 content [115]. However, a large number of news articles are created and published every day,
 and it is impossible for users to browse through all available news to find relevant news
 1 https://microsoftnews.msn.com

 Authors’ addresses: Chuhan Wu, Department of Electronic Engineering & BNRist, Tsinghua University, Beijing, 100084,
 China, wuchuhan15@gmail.com; Fangzhao Wu, Microsoft Research Asia, Beijing, China, 100080, wufangzhao@gmail.com;
 Yongfeng Huang, Department of Electronic Engineering & BNRist, Tsinghua University, Beijing, 100084, China,
 yfhuang@tsinghua.edu.cn; Xing Xie, Microsoft Research Asia, Beijing, China, 100080, xingx@microsoft.com.

 Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
 provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
 the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
 Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
 prior specific permission and/or a fee. Request permissions from permissions@acm.org.
 © 2021 Association for Computing Machinery.
 XXXX-XXXX/2021/7-ART $15.00
 https://doi.org/10.1145/nnnnnnn.nnnnnnn

 , Vol. 1, No. 1, Article . Publication date: July 2021.
2 Work in Progress

 [Figure 1 diagram. Components: user, news platform, news pool, candidate news, top K news, personalized news
 recommender, user profile, user behaviors. Steps: ① Visit, ② Recall, ③ Ranking, ④ Display, ⑤ Update.]

 Fig. 1. An example workflow of personalized news recommender systems.

information [156]. Thus, personalized news recommendation techniques, which aim to select
news according to users’ personal interests, are critical for news platforms to help users alleviate
information overload and improve their news reading experience [92]. Research on
personalized news recommendation has also attracted increasing attention from both academia
and industry in recent years [115, 155].
 An example workflow of a personalized news recommender system is shown in Fig. 1. When a
user visits the news platform, the platform recalls a small set of candidate news articles from a
large-scale news pool, and the personalized news recommender ranks these candidates
according to the user interests inferred from user profiles. Then, the top K ranked news articles
are displayed to the user, and the user’s behaviors on them are recorded by the platform
to update the maintained user profile for future services. Although many prior works
have extensively studied these problems from different aspects, personalized news recommendation
remains challenging. For example, news articles on news websites usually have short life cycles.
Many new articles emerge every day, and old ones expire after a short period of time. Thus,
news recommendation faces a severe cold-start problem. In addition, news articles usually contain
rich textual information such as titles and bodies. Thus, it is very important to understand news
content from its text with advanced natural language processing techniques. Moreover, there
is usually no explicit user feedback such as reviews and ratings on news platforms. Thus, we
need to infer the personal interests of users from their implicit feedback such as clicks. However,
user interests are usually diverse and dynamic, which poses great challenges to user modeling
algorithms. The complexity of personalized news recommendation makes it a fascinating research
topic with various challenges to be tackled [34].
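 The recall, ranking and update steps of the workflow in Fig. 1 can be illustrated with a toy sketch. Everything in it is
an assumption made for illustration: the user profile is reduced to click counts per category, and the helper names
(`recall`, `rank`, `update_profile`) and the sample pool are hypothetical. Real systems would replace each step with
learned models.

```python
from collections import Counter

def recall(news_pool, profile, max_candidates):
    # Recall: cheaply narrow the large pool to a small candidate set.
    # Assumption: keep news whose category the user clicked before;
    # fall back to the whole pool when nothing matches (cold start).
    matched = [n for n in news_pool if profile[n["category"]] > 0]
    return (matched or list(news_pool))[:max_candidates]

def rank(candidates, profile, k):
    # Ranking: score each candidate by the click count of its category
    # and keep the top K. sorted() is stable, so ties keep pool order.
    return sorted(candidates, key=lambda n: profile[n["category"]], reverse=True)[:k]

def update_profile(profile, clicked_news):
    # Update: record user behaviors so later requests see the new interests.
    for n in clicked_news:
        profile[n["category"]] += 1

pool = [
    {"id": "n1", "category": "sports"},
    {"id": "n2", "category": "finance"},
    {"id": "n3", "category": "sports"},
    {"id": "n4", "category": "health"},
]
profile = Counter({"sports": 2, "health": 1})

candidates = recall(pool, profile, max_candidates=3)
top_k = rank(candidates, profile, k=2)   # displayed to the user
update_profile(profile, [top_k[0]])      # suppose the user clicks the first item
```

 Because new articles have no clicks and new users have empty profiles, the fallback branch in `recall` is where the
cold-start problem described above would show up in such a system.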
 A comprehensive overview of existing personalized news recommendation approaches can
provide useful guidance for future research in this field. Over the past years, many survey
papers have reviewed the techniques of news recommendation [6, 8, 30–32, 34, 51, 66, 89, 92, 117, 130,
143]. For example, Li et al. [92] reviewed the personalized news recommendation methods that rely
on handcrafted features to build news and user representations. They covered many traditional
methods, including collaborative filtering (CF) based ones that use the IDs of users and news,
content-based ones that use features extracted from the content of news and the user behaviors on news,
and hybrid ones that rely on content-based collaborative filtering. They also studied the datasets
used by these methods and their techniques for user and news representation construction, data
processing and user privacy protection. Feng et al. [34] reviewed news recommendation approaches
in many different scenarios, including personalized and non-personalized ones. For personalized
news recommendation methods, they also classified them into three categories, i.e., CF-based,
content-based, and hybrid ones. They mainly studied the techniques adopted by different methods, the
challenges they tackled, and the datasets and metrics for evaluation. However, many recent works,
especially those based on deep learning, are not covered by existing survey papers, which makes it
difficult for researchers to track recent advances in the personalized news recommendation field.
In addition, the conventional taxonomy of news recommendation methods (i.e., CF-based,
content-based and hybrid) used by many existing surveys cannot keep pace with the development of this field, and a
more systematic overview of existing news recommendation methods is needed to help understand
their characteristics and inspire further research.
 In this paper, we present a comprehensive review of the personalized news recommendation
field. Instead of reviewing existing personalized news recommendation methods based on the
conventional taxonomy, in this survey we propose a novel perspective that reviews them based on
the core problems involved in personalized news recommendation and the associated techniques
and challenges. We first introduce the framework for developing a personalized news recommender
system in Section 2. Next, we systematically review the core problems, techniques and challenges in
personalized news recommendation, including news modeling, user modeling, news ranking, model
training, datasets, benchmarks and evaluation, which are introduced in Sections 3-7, respectively.
We then discuss the development of responsible news recommender systems in
Section 10, and raise several potential future directions in Section 11. Finally, we conclude this
paper in Section 12.

2 FRAMEWORK OF PERSONALIZED NEWS RECOMMENDATION
Personalized news recommendation techniques have been widely used on many online news
websites [115, 172]. Different from non-personalized news recommendation methods that suggest
news articles solely based on non-personalized factors [79] such as news popularity [22, 103,
106, 175], editors’ demonstration [152] and geographic information [17, 141], personalized news
recommendation considers the personal interest of each individual user to provide personalized
news services and better satisfy users’ needs.
 Existing surveys on personalized news recommendation usually classify methods into three
categories, i.e., collaborative filtering-based, content-based and hybrid ones [92]. However, this
classification criterion cannot adapt to the recent advances in news recommendation because many
methods with diverse characteristics fall into the same category without distinction. For example,
the category of content-based methods includes traditional semantic-based methods, contextual
bandit-based methods and recent deep learning-based methods, which makes it difficult for researchers
to understand the technical paradigm of personalized news recommendation. Thus, a systematic
overview of existing techniques is required to help understand the development of this field.
 Instead of following the conventional taxonomy, in this survey we propose a novel perspective
to review existing personalized news recommendation techniques based on the core problems
involved in the development of a personalized news recommender system. A common framework
of personalized news recommendation model development is shown in Fig. 2. We can see that
there are several key problems in this framework. First, news modeling is the backbone of news
recommendation and a core problem is how to understand the content and characteristics of news.


 [Figure 2 diagram. Components: user and news data, user profile, user model, news model, user representation,
 news representation, ranking, model training with an objective, ranking results, evaluation (accuracy and
 responsibility).]

 Fig. 2. A framework of the key components in developing personalized news recommendation model.

In addition, user modeling is required to understand the personal interest of users in news, and
it is critical to accurately infer user interest from user profiles such as behaviors. Based on the news
and user representations built by the news and user models, the next step is ranking candidate
news according to certain policies such as the relevance between news and user interest. Then,
it is important to train the recommendation model with proper objectives to make high-quality
news recommendations, and evaluating the ranking results given by the recommendation model
is also a core problem in the development of personalized recommender systems. Besides, the
datasets and benchmarks for news recommendation are also necessary for designing personalized
news recommendation models. Moreover, beyond developing accurate models, improving the
responsibility of intelligent systems has been a spotlight problem in recent years. How to develop
responsible news recommender systems is a less studied but extremely important problem in
personalized news recommendation. We briefly discuss the key problems mentioned above in
the following subsections.

2.1 News Modeling
News modeling aims to understand the characteristics and content of news, which is the backbone
of news recommendation. There are mainly two kinds of techniques for news modeling, i.e., feature-
based news modeling and deep learning-based news modeling. Feature-based news modeling
methods usually rely on handcrafted features to represent news articles. For instance, in many
methods based on collaborative filtering (CF), news articles are represented by their IDs [24, 133].

However, on most news websites new articles are published continuously and old ones soon
vanish. Thus, representing news articles with their IDs suffers from severe cold-start problems,
and the performance is usually suboptimal.
 Considering the drawbacks of ID-based news modeling methods, most approaches incorporate
content features to represent news. Among them, many methods use features extracted from news
texts for news modeling. For instance, Capelle et al. [14] proposed to represent news with Synset
Frequency-Inverse Document Frequency (SF-IDF), which replaces the term frequencies in TF-IDF
with the frequencies of WordNet synonym sets. Besides news texts, many methods also incorporate
various factors that may influence users’ news browsing decisions into news modeling,
such as news popularity and recency [88]. However, in these methods, the features that represent
news are manually designed, which usually requires much effort and domain knowledge.
In addition, handcrafted features are usually not optimal in representing the semantic information
encoded in news texts.
 With the development of natural language processing techniques in recent years, many methods
employ neural NLP models to learn deep representations of news. For example, Okura et al. [115]
proposed to use autoencoders to learn news representations from news content. Wang et al. [151]
proposed to use a knowledge-aware convolutional neural network (CNN) to learn news representations
from news titles and their entities. Wu et al. [159] proposed to learn news representations
from news titles via a combination of multi-head self-attention and additive attention networks.
Wu et al. [166] studied the use of pre-trained language models to encode news texts. These deep
learning-based news modeling methods can automatically learn informative news representations
without the need for manual feature engineering, and they can usually understand news
content better than traditional feature-based methods.
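 As one illustration of how such models turn word representations into a single news representation, the sketch below
implements additive attention pooling in plain Python. It is a generic form of additive attention under assumed
notation, not the exact network of [159]: the parameters `W`, `b` and `q` would be learned in practice and are fixed
toy values here, and the function name is hypothetical.

```python
import math

def additive_attention(word_vectors, W, b, q):
    """Pool word vectors h_i into one vector with weights
    alpha_i = softmax_i(q . tanh(W h_i + b))."""
    scores = []
    for h in word_vectors:
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(q_i * u_i for q_i, u_i in zip(q, hidden)))
    # Numerically stable softmax over the words of the title.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(word_vectors[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, word_vectors)) for d in range(dim)]
    return pooled, weights

# Toy title with three 2-dimensional word vectors (hypothetical values).
title = [[0.2, 0.1], [0.9, -0.3], [0.0, 0.5]]
news_vector, attention = additive_attention(
    title, W=[[0.1, 0.2], [0.3, -0.1]], b=[0.0, 0.1], q=[0.5, -0.5]
)
```

 The attention weights sum to one, so the news vector is a weighted average of the word vectors in which more
informative words can receive larger weights.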

2.2 User Modeling
User modeling techniques in news recommendation aim to understand users’ personal interest
in news. Similar to news modeling, user modeling methods can also be roughly classified into
two categories, i.e., feature-based and deep learning-based. Some feature-based methods like CF
represent users with their IDs [24, 133]. However, they usually suffer from the sparsity of user data
and cannot model user interest accurately. Thus, most feature-based methods consider other user
information such as click behaviors on news. For example, Garcin et al. [43] proposed to use Latent
Dirichlet Allocation (LDA) to extract topics from the concatenation of news title, summary and
body. The topic vectors of all clicked news are then aggregated into a user vector by averaging.
Several works also incorporate other user features into user modeling,
such as demographics [83], location [35] and access patterns [88]. However, feature-based user
modeling methods also require a large amount of domain knowledge to design informative
user features in specific scenarios, and they are usually suboptimal in representing user interests.
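 The averaging step mentioned above is simple enough to show directly. In this sketch the topic vectors are
hypothetical hand-written values standing in for LDA output, and `average_user_vector` is an illustrative name rather
than a function from [43].

```python
def average_user_vector(clicked_topic_vectors):
    """Aggregate the topic vectors of clicked news into one user vector
    by element-wise averaging."""
    if not clicked_topic_vectors:
        raise ValueError("the user has no clicked news to average")
    n = len(clicked_topic_vectors)
    dim = len(clicked_topic_vectors[0])
    return [sum(vec[d] for vec in clicked_topic_vectors) / n for d in range(dim)]

# Two clicked articles described by a distribution over three topics
# (toy values, assumed to come from a topic model).
clicked = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
user_vector = average_user_vector(clicked)
```

 Averaging treats every click equally and ignores click order, which is one reason why such handcrafted user
representations can be suboptimal for diverse and dynamic interests.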
 There are several methods that use neural networks to learn user representations from users’
click behaviors. For example, Okura et al. [115] proposed to use a GRU network to learn user
representations from clicked news. Wu et al. [156] proposed a personalized attention network
to learn user representations from clicked news in a personalized manner. These methods can
automatically learn deep interest representations of users for personalized news recommendation,
which are usually more accurate than handcrafted user interest features.

2.3 News Ranking
On the basis of news and user interest modeling, the next step is to rank candidate news in a personalized
way according to user interest. Most methods rank news based on their relevance to user
interest, and how to accurately measure the relevance between user interest and candidate news is
their core problem. Some methods measure the user-news relevance based on their representations.
For example, Goossen et al. [47] proposed to compute the cosine similarity between the Concept
Frequency-Inverse Document Frequency (CF-IDF) features extracted from candidate news and
clicked news, which was further used for personalized candidate news ranking. Okura et al. [115]
used the inner product between news and user embeddings to compute click scores, and ranked
candidate news based on these scores. Gershman et al. [45] proposed to use an SVM model for each
individual user to classify whether the user will click a candidate news article based on news and user
interest features. In several recent methods, the relevance between candidate news and user interest
is modeled in a fine-grained way by matching candidate news with clicked news. For example,
Wang et al. [150] proposed to match candidate news and clicked news with a 3-D convolutional
neural network to mine the fine-grained relatedness between their content. However, ranking
candidate news merely based on their relevance to user interest may recommend news that is
similar to the news previously clicked by users [128], which may cause the “filter bubble” problem.
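 Relevance-based ranking with cosine similarity or an inner product can be sketched as follows. The vectors are toy
values and the function names are assumptions for illustration; how the user and news vectors are produced is left
to the news and user models discussed earlier.

```python
import math

def dot(u, v):
    # Inner product between a user vector and a news vector.
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; returns 0.0 when either vector is all zeros.
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0

def rank_by_relevance(user_vector, candidates, score=dot):
    """candidates maps a news ID to its vector; returns IDs, most relevant first."""
    return sorted(candidates, key=lambda nid: score(user_vector, candidates[nid]), reverse=True)

user = [1.0, 0.0]
candidates = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
by_dot = rank_by_relevance(user, candidates)
by_cosine = rank_by_relevance(user, candidates, score=cosine)
```

 Both scores favor candidates that point in the same direction as the user vector, which makes the filter bubble
effect easy to see: news unlike past clicks always lands at the bottom of the list.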
 A few methods use reinforcement learning for news ranking. Li et al. [85] were the first to model
the personalized news recommendation task as a contextual bandit problem. They proposed a
LinUCB approach that computes the upper confidence bound (UCB) of each arm efficiently in
closed form based on a linear payoff model, which can match news with users’ personal interest
while exploring diverse recommendations. DRN [187] uses a deep reinforcement
learning approach to find the interest matching policy that optimizes the long-term reward. In
addition, it uses a Dueling Bandit Gradient Descent (DBGD) method for exploration. These methods
usually optimize the long-term reward rather than the current click probability, which has the
potential to alleviate the filter bubble problem by exploring more diverse user interests.
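 The closed-form UCB computation can be illustrated with a minimal sketch of LinUCB that keeps one linear model per
arm. The sketch assumes an identity ridge prior and maintains the inverse matrix with a Sherman-Morrison update; the
class name, `select_arm`, the value of `alpha` and the toy contexts are assumptions for illustration, and the feature
construction of [85] is not reproduced.

```python
import math

class LinUCBArm:
    """One arm (news article) with its own linear payoff model."""

    def __init__(self, dim):
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _mat_vec(self, x):
        return [sum(a * xj for a, xj in zip(row, x)) for row in self.A_inv]

    def ucb(self, x, alpha):
        a_inv_x = self._mat_vec(x)
        theta = self._mat_vec(self.b)                   # theta = A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))   # predicted payoff
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width                     # upper confidence bound

    def update(self, x, reward):
        # Sherman-Morrison update of the inverse after A <- A + x x^T.
        a_inv_x = self._mat_vec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        self.A_inv = [
            [self.A_inv[i][j] - a_inv_x[i] * a_inv_x[j] / denom for j in range(dim)]
            for i in range(dim)
        ]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def select_arm(arms, contexts, alpha=1.0):
    # Choose the arm whose upper confidence bound is largest.
    return max(arms, key=lambda name: arms[name].ucb(contexts[name], alpha))

arms = {"n1": LinUCBArm(2), "n2": LinUCBArm(2)}
contexts = {"n1": [1.0, 0.0], "n2": [1.0, 0.0]}
arms["n1"].update(contexts["n1"], reward=0.0)   # n1 was shown but not clicked
chosen = select_arm(arms, contexts)             # n2 is still unexplored
```

 After the unrewarded impression, the confidence width of `n1` shrinks while that of `n2` stays large, so the
unexplored article is selected. This is the exploration behavior that gives bandit methods their potential to
diversify recommendations.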

2.4 Model Training
Many personalized news recommendation methods employ machine learning models for news
modeling, user modeling and interest matching. How to train these models to make accurate
recommendations is a critical problem. A few methods train their models by predicting the ratings
on news given by users. For example, the Grouplens [133] system is trained by predicting the
unknown ratings in the user-news matrix. However, explicit feedback such as ratings is usually
sparse on news platforms. Thus, most existing methods use implicit feedback such as clicks to construct
prediction targets for model training. For example, Wang et al. [151] formulated the news click
prediction problem as a binary classification task, and used cross-entropy as the loss function for
model training. Wu et al. [156] proposed to employ negative sampling techniques that combine
each positive sample with several negative samples to construct labeled samples for model training.
However, implicit feedback usually contains much noise and may not indicate user interest, which
poses great challenges to learning accurate user interest models.
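 The two click-based training objectives above can be written down in a few lines. The sketch assumes that the
negative sampling objective is a softmax over one clicked news article and several non-clicked ones, which is one
common way to combine positive and negative samples; the function names are hypothetical and the exact losses of
[151] and [156] may differ in detail.

```python
import math

def binary_cross_entropy(click_prob, label):
    """Loss for one (user, news) pair when click prediction is treated as
    binary classification; label is 1 for a click and 0 otherwise."""
    eps = 1e-12  # keep log() away from zero
    p = min(max(click_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def negative_sampling_loss(pos_score, neg_scores):
    """Negative log-likelihood of the clicked news under a softmax over
    its score and the scores of the sampled non-clicked news."""
    scores = [pos_score] + list(neg_scores)
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - pos_score

# One clicked article scored against three sampled non-clicked ones (toy scores).
loss_untrained = negative_sampling_loss(0.0, [0.0, 0.0, 0.0])
loss_better = negative_sampling_loss(2.0, [0.0, 0.0, 0.0])
```

 With equal scores the loss equals the logarithm of the number of samples, and it decreases as the clicked article is
scored above the sampled negatives.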

2.5 Evaluation
Properly evaluating the performance of personalized news recommendation algorithms is important
for developing high-quality news recommender systems. Most existing methods use click-related
metrics to measure the accuracy of recommendation results. Some of them regard the
recommendation task as a classification problem [55, 95, 151], where the performance is evaluated
by classification metrics such as Area Under Curve (AUC) and F1-score. Many other methods use
ranking metrics such as Mean Reciprocal Rank (MRR) and normalized Discounted Cumulative
Gain (nDCG). However, click-based metrics may not indicate user experience. Thus, a few works
use user engagement-based metrics such as dwell time and dislike to evaluate the recommendation
performance [167], which can evaluate the performance of recommendation models
more comprehensively.
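 For reference, the click-based metrics named above can be computed for a single impression as sketched below. The
definitions are common textbook forms chosen here as assumptions: AUC counts correctly ordered click and non-click
pairs, the reciprocal rank uses the first clicked item (MRR is its mean over impressions), and nDCG uses the
exponential gain. Individual papers and leaderboards may use slightly different variants.

```python
import math

def auc(labels, scores):
    # Fraction of (clicked, non-clicked) pairs ranked in the right order; ties count 0.5.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def _labels_in_ranked_order(labels, scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]

def reciprocal_rank(labels, scores):
    # 1 / rank of the first clicked item, or 0.0 if nothing was clicked.
    for rank, label in enumerate(_labels_in_ranked_order(labels, scores), start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(labels, scores, k):
    def dcg(ordered_labels):
        return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(ordered_labels[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(_labels_in_ranked_order(labels, scores)) / ideal if ideal > 0 else 0.0

labels = [0, 1, 0, 1]            # clicks in one impression
scores = [0.9, 0.8, 0.3, 0.1]    # predicted click scores (toy values)
impression_auc = auc(labels, scores)
impression_rr = reciprocal_rank(labels, scores)
impression_ndcg = ndcg_at_k(labels, scores, k=2)
```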

 In most works, the performance of recommendation models is evaluated offline. However, the
data used for offline evaluation is usually influenced by the recommendation results generated
by the predecessor recommendation algorithms. Only a few works reported online evaluation
results [166], which may better indicate the real performance of the recommender systems.

2.6 Dataset
In the news recommendation field, most research is conducted on proprietary datasets, and
only a few datasets are publicly available. Several representative datasets are plista [70],
Adressa [49] and MIND [172]. Among them, only MIND is a large-scale English news recommendation
dataset with raw textual information of news. In addition, MIND is associated with a public
leaderboard and an open competition. Thus, many recent studies are conducted on the MIND
dataset [162, 166, 169].

2.7 Responsible Personalized News Recommendation
Most endeavors on personalized news recommendation focus on improving the accuracy of
recommendation results. In recent years, research on the responsibility of intelligent systems has
gained much attention, with the aim of helping AI techniques better serve humans and avoid potential negative
societal impact. Thus, a few studies improve the responsibility of personalized news
recommender systems in different aspects, such as privacy preserving [127], diversity [164],
debiasing [178] and fairness [171]. These methods have the potential to help develop higher-quality
news recommendation algorithms that serve users in a responsible way.
 On the basis of the overview above, we then present in-depth discussions on each core problem
in the following sections.

3 NEWS MODELING
News modeling is a critical step in personalized news recommendation methods to capture the
characteristics of news articles and understand their content. The techniques for news modeling
can be roughly divided into two categories, i.e., feature-based and deep learning-based, which are
introduced as follows.

3.1 Feature-based News Modeling
Feature-based news modeling methods mainly rely on handcrafted features to represent news
articles. As summarized in Fig. 3, there are mainly four types of features used in news modeling,
which are introduced as follows.
 In many CF-based methods, news articles are represented by collaborative filtering signals such as
news IDs [24, 50, 61, 112, 133, 136, 174]. However, on most news websites new articles are published
quickly and old ones soon vanish. These methods model news in a content-agnostic manner,
which may suffer from a serious cold-start problem due to the difficulty of processing newly
generated news. Thus, it is not suitable to simply represent news articles with their IDs [29].
 Feature type      Description                      Features
 Content feature   Extracted from news content      Semantic, topic model, entity, keyword, emotion, multimodal
 Property feature  Intrinsic or static property     Category, cluster, location, publisher, publish time
 Context feature   Dynamic information              Popularity, CTR, recency, novelty, dwell time, bias
 CF feature        Collaborative filtering signal   News ID, user ID, user/news graph

 Fig. 3. An overview of different types of news features.

 Due to the drawbacks of ID-based news modeling, many methods incorporate news content into
news modeling. For instance, Gershman et al. [45] considered Term Frequency-Inverse Document
Frequency (TF-IDF) features extracted from news texts. In news articles, entities/concepts are
usually more important than other words for understanding news content. Thus, many methods
use the entities/concepts in news texts to represent their content. For example, Goossen et al. [47]
proposed to use Concept Frequency-Inverse Document Frequency (CF-IDF) to model news content,
which is a variant of TF-IDF that uses the frequency of concepts extracted from WordNet rather than
term frequency. Capelle et al. [14] proposed to use Synset Frequency-Inverse Document Frequency
(SF-IDF) to model news, which is based on the frequency of synonym sets in WordNet. SF-IDF was
extended by Moerland et al. [111] into SF-IDF+ by additionally considering the relationships between
concepts. They extended the synonym sets of concepts in news by adding other concepts in WordNet
that are related to the included concepts. Building on these approaches, the
family of CF-IDF was expanded by a set of later works [9, 16, 25, 52, 53].
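 The relationship between TF-IDF and its concept-based and synset-based variants can be sketched with one weighting
function applied to different units. The weighting below (relative frequency times log inverse document frequency) is
one common variant chosen as an assumption, and the cited works may weight differently. `TOY_SYNSETS` is a
hypothetical hand-written mapping that stands in for a WordNet lookup; its identifiers are not real WordNet synset
names.

```python
import math
from collections import Counter

def xf_idf(documents):
    """documents: list of unit lists, where a unit is a term, concept or synset.
    Returns one {unit: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter(unit for doc in documents for unit in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            unit: (count / len(doc)) * math.log(n_docs / doc_freq[unit])
            for unit, count in counts.items()
        })
    return vectors

# Hypothetical word-to-synset mapping (a real system would query WordNet).
TOY_SYNSETS = {"car": "vehicle.n", "automobile": "vehicle.n",
               "stock": "stock.n", "share": "stock.n"}

def to_synsets(tokens):
    # Replace each known word by its synset; unknown words are dropped.
    return [TOY_SYNSETS[t] for t in tokens if t in TOY_SYNSETS]

news_terms = [["car", "stock"], ["automobile", "automobile"], ["share"]]
tf_idf = xf_idf(news_terms)                             # units are terms
sf_idf = xf_idf([to_synsets(d) for d in news_terms])    # units are synsets
```

 At the term level the first two articles share no unit, while at the synset level both contain `vehicle.n`, so their
representations overlap. This is the kind of synonym matching that the synset-based variants are designed to capture.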
 Besides semantic features, some works extract other kinds of content features to
enhance news modeling [83, 88, 118]. For example, Garcin et al. [43] proposed to use Latent Dirichlet
Allocation (LDA) to extract topics from the concatenation of news title, summary and main content.
Parizi et al. [118] proposed to extract emotion features of sentences in news as complementary
information to TF-IDF features. In their method, emotion is represented by the Ekman model,
which contains 6 emotion categories. A variant of this method that uses the sentiment orientation
(i.e., positive, neutral and negative) was also developed by Parizi et al. [119]. Beyond news texts, the
exploitation of vision-related information such as the videos of news is also studied in [107]. These
features can provide complementary information to better understand news content.
 In addition to content features, many other kinds of features are used for news modeling. They
can be roughly divided into two categories, i.e., property features and context features. Property
features such as categories, locations and publishers usually reflect intrinsic properties of news.
The most widely used news property feature is category, since it is an important clue for modeling
news content and targeting user interest. For example, Liu et al. [100] proposed to represent news
using their topic categories. However, since the category labels of news often need to be manually
annotated by editors, in some scenarios news may not have off-the-shelf category labels. Thus,
several methods cluster news into categories based on their content. For instance, in
the SCENE [88] recommender system, news articles are clustered in a hierarchical manner based
on their topic features extracted by LDA. By incorporating the categories or clusters of news into
news modeling, the news recommender can be aware of news topics and provide more targeted
recommendation services. Another representative property feature is news location, which is also
widely used to provide users with news related to the locations that they are interested in. For
example, Tavakolifard et al. [145] incorporated the geographic information of news to filter news
based on their locations. In addition, since news from different publishers may differ in
content and topics, the news publisher is also considered by several methods
to enrich the information for news modeling [59, 96].
Table 1. Main features used for news representation. *XF-IDF means TF-IDF and its variants such as CF-IDF
and SF-IDF.

 Features for News Modeling  References
 BOW/XF-IDF*                 [47][14][111][15][52][53][16][25][9][45][20][78][154][113][131]
                             [183][5][38][12][48][73][74][108][124][118][119][101][153][105]
 Entity/Keyword              [47][14][111][15][52][53][16][25][9][88][45][145][59][97][112][87][64][21][23]
                             [131][183][5][38][187][11][12][13][36][65][74][146][68][149]
 Cluster/Category            [83][88][62][23][33][20][100][135][23][144][39][91][90][148][85][187][11]
                             [48][63][74][142][181][101][46][176]
 Topic Distribution          [43][88][114][189][87][42][41][135][91][90][54][93][108][121][146][54]
 Location                    [145][114][177][59][148][67][153]
 Publisher                   [59][96][187][176]
 Popularity                  [136][88][45][62][145][23][59][189][20][23][90][63][67][73][42][59]
 CTR                         [20]
 Recency                     [83][88][45][145][23][59][189][135][68][23][90][187][181][153]
 Novelty                     [41][38]
 Dwell Time                  [18][45][179][59][189][187][59]
 Time Stamp                  [59][33][20][35][174][176]
 Emotion/Sentiment           [118][119]
 Bias                        [121]
 Knowledge Graph             [64][183]
 News/User Graph             [97][112][87][41][147][91][93][123][46]
 Ontology                    [47][14][111][15][52][53][16][25][9][154][113][131][137][39][12][13][36]
 Visual Information          [107]

 Different from property features that are usually static after news publishing, context features of
news are dynamic. Popularity and recency, which reflect the attractiveness and freshness of news,
are two representative context features used by existing methods. For instance, MONERS [83] is
a news recommender system that represents news articles by news categories, news importance
suggested by providers and the recency of news articles. Gershman et al. [45] proposed to use
four kinds of features to represent news, i.e., news popularity, news age (recency), TF-IDF features
of words and named entities. Jonnalagedda et al. [62] proposed to use the timeline on Twitter
to enhance news modeling. They use the popularity and categories of news on Twitter for news
representation. News recency only considers the time interval between the publishing and display
of news, while the time stamp of news display can provide finer-grained information, such as the season,
month, day and time of day. Thus, several approaches incorporate the time stamp of
news impressions [20, 33, 35, 59, 174]. For example, Ilievski et al. [59] proposed to incorporate the
weekday and the hour of a news impression in news modeling. In addition to the context features
mentioned above, several methods also use weather [177], click-through rate (CTR) [20],
and fact/opinion bias [121] to enrich the representations of news.
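 The difference between recency and time stamp features can be made concrete with Python's standard `datetime`
module. The function name, the feature names and the sample times are assumptions for illustration and are not taken
from any of the cited systems.

```python
from datetime import datetime

def context_features(publish_time, impression_time):
    """Derive dynamic context features for one news impression."""
    age = impression_time - publish_time
    return {
        # Recency: only the interval between publishing and display.
        "recency_hours": age.total_seconds() / 3600.0,
        # Time stamp features: finer-grained information about the display time.
        "weekday": impression_time.weekday(),   # Monday is 0, Sunday is 6
        "hour": impression_time.hour,
        "month": impression_time.month,
    }

features = context_features(
    publish_time=datetime(2021, 7, 8, 6, 0),
    impression_time=datetime(2021, 7, 8, 9, 30),
)
```

 Two impressions with the same recency can still differ in weekday and hour, which is the extra signal that time
stamp features add on top of recency.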
 Some hybrid methods consider both news IDs and additional features in news modeling [102].
For example, NewsWeeder [78] represents news articles by their IDs and bag-of-word features.
Claypool et al. [21] proposed to use news IDs and keywords to model news. Liu et al. [100]
proposed to represent news using their IDs and topic categories. Saranya et al. [135] proposed
to represent news by their IDs, topics, click frequency and the weights of a news belonging
to different categories. Combining ID-based and content-based news modeling
techniques can mitigate the cold-start problem of news to some extent, and this combination has
been widely explored by integrating other information like news property features [23, 154], news sessions [144],
ontology [13, 39, 113, 131, 137] and knowledge graphs [183].
 To give an overview of feature-based news modeling methods, we summarize the major features
they use in Table 1.

3.2 Deep learning-based News Modeling
With the development of deep learning techniques, in recent years many methods employ neural
networks to automatically learn news representations. Most of them use neural NLP techniques to
learn news representations from news texts. For example, Okura et al. [115] proposed an embedding-
based news recommendation (EBNR) method that uses a variant of denoising autoencoders to learn
news representations from news texts. RA-DSSM [75] is a neural news recommendation approach
which adopts an architecture similar to DSSM [57]. It first builds the representations of news
using the doc2vec [81] tool, then uses a two-layer neural network to learn hidden news
representations. This method is also adopted by [76]. 3-D-CNN [77] represents news by the word2vec [110]
embeddings of their words. However, it is difficult for these methods to mine the semantic
information in news texts with traditional neural NLP models.
 Many later approaches use more advanced neural NLP models for text modeling. For example,
WE3CN [69] uses 2D CNN models to learn representations of news. NPA [156] uses CNN to generate
contextual representations of words in news titles, and uses a personalized attention network to form
news representations by selecting important words in a personalized manner. NRMS [159] learns
word representations with a multi-head self-attention network, and uses an additive attention
network to form news representations. A similar news modeling method is also used by many later
works [164, 165, 167, 169, 171]. NRNF [161] uses self-attention to model the contexts of words in
news title and body, and it uses an interactive attention network to model the relatedness between
title and body. FedRec [127] learns news representations from news titles via a combination of CNN
and multi-head self-attention networks. These methods usually learn news representations based on
shallow text models and non-contextualized word embeddings such as GloVe [122], which may be
insufficient to capture the deep semantic information in news. Pre-trained language models (PLMs)
such as BERT [28] have been greatly successful in the NLP field, and a few recent works explore
empowering news modeling with PLMs [166, 173]. For example, PLM-NR [166] uses different PLMs to
empower English and multilingual news recommendation, and the online flight results in Microsoft
News showed notable performance improvement. Their findings imply the importance of accurate
text understanding in news recommendation.
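
 Several of the methods above form a news representation by pooling word representations with an additive attention network. The following pure-Python sketch shows one common formulation of additive attention pooling, in which each word receives a score q · tanh(W h + b) and the scores are normalized with a softmax. It is not the implementation of any cited method; the parameters W, b and q are toy values chosen for illustration, whereas in practice they are learned.

```python
import math


def additive_attention_pool(word_vecs, W, b, q):
    """Pool word vectors into one news vector with additive attention.

    word_vecs: list of word vectors (lists of floats, same dimension).
    W, b, q: attention parameters (hypothetical toy values here).
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in word_vecs:
        # Hidden projection tanh(W h + b), computed row by row.
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + b_j)
            for row, b_j in zip(W, b)
        ]
        # Scalar attention score q . hidden.
        scores.append(sum(q_j * u_j for q_j, u_j in zip(q, hidden)))

    # Numerically stable softmax over the word scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the word vectors.
    dim = len(word_vecs[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, word_vecs)) for k in range(dim)]
    return pooled, weights


word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]
news_vec, attn = additive_attention_pool(word_vecs, W, b, q)
print(news_vec, attn)
```

 With these toy parameters the third word receives the largest weight, so it contributes most to the pooled news vector.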
 Instead of merely modeling semantic information in news texts, several methods study using
entities or keywords in news texts to enhance news modeling by introducing complementary knowledge
and commonsense information. For instance, DAN [188] learns news representations from news
titles and entities via two parallel CNN networks with max pooling operations. DKN [151] learns
news representations from the titles of news and the entities within titles via a knowledge-aware
CNN. The representations of entities are learned from a knowledge graph using the TransD [60]
knowledge graph embedding algorithm. Saskr [19] builds news representations from news titles and
bodies based on the average embeddings of their entities. DNA [182] learns news representations
from the news body, news ID and the elements (entities and keywords). More specifically, the
sentences in a news body are transformed into their embeddings via doc2vec [81], and then are
aggregated into a unified one via a sentence-level candidate-aware attention network. Each news
element is represented by averaging the embeddings of its words, and element representations are
synthesized together via an element-level candidate-aware attention network. The embeddings
of the ID, texts, and elements of each piece of news are concatenated together into a unified
news representation. Gao et al. [40] proposed a knowledge-aware news recommendation approach
with hierarchical attention networks. In their method, a word attention network is used to learn
word-based news representations by using the embeddings of keywords as attention queries, and
these representations are concatenated with both entity embeddings and the average embeddings
of the entities in their contexts. An item attention network is used to aggregate these three kinds
of news representations by modeling their informativeness. Liu et al. [98] proposed to construct a
news-relevant knowledge graph on the basis of the Microsoft Satori knowledge graph by extracting
additional knowledge entities and topic entities from news and connecting entities in the same
news, entities clicked by the same user and entities appearing in the same browsing session to
enrich the relations between entities in the knowledge graph. They combine the entity embeddings
learned by TransE [7] with the news text embeddings learned by LDA and DSSM. TEKGR [82] also
enriches the knowledge graph with topical relations between entities. It predicts the topic of news
based on texts and concepts, and uses the predicted topic to enrich the knowledge graph and learn
topic-enriched knowledge representations of news with graph neural networks. CAGE [138, 139]
constructs a subgraph of the KG by using one-hop neighbors of entities, and uses the TransE embeddings
of entities as complements to text embeddings learned by CNN. KRED [99] first learns entity
embeddings from a knowledge graph with graph attention networks, then incorporates additional
entity features such as frequency, category and position, and finally selects entities according to
the text representations of news. HieRec [128] uses text self-attention and entity self-attention
to model the contexts in news title and the relations between entities in news texts, respectively.
KIM [125] incorporates a knowledge-aware interactive news modeling method that can model the
relations between the entities and their neighbors of clicked news and candidate news.
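
 Some of the methods above rely on TransE entity embeddings, and Saskr averages entity embeddings to represent news. As a rough illustration, the sketch below shows the translation idea behind TransE, where a triple (head, relation, tail) is plausible when head + relation is close to tail, together with a simple averaging of entity embeddings. The vectors are hand-picked toy values, not learned embeddings, and the function names are hypothetical.

```python
import math


def transe_distance(head, relation, tail):
    """Euclidean distance ||head + relation - tail||; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def average_entity_embedding(entity_vecs):
    """Represent a news article by the mean of its entity embeddings."""
    n = len(entity_vecs)
    dim = len(entity_vecs[0])
    return [sum(vec[k] for vec in entity_vecs) / n for k in range(dim)]


# Toy embeddings (assumed values for illustration only).
head = [1.0, 0.0]
relation = [0.0, 1.0]
true_tail = [1.0, 1.0]
wrong_tail = [0.0, 0.0]

print(transe_distance(head, relation, true_tail))
print(transe_distance(head, relation, wrong_tail))
print(average_entity_embedding([head, true_tail]))
```

 In a trained model the embeddings are optimized so that observed triples obtain smaller distances than corrupted ones; the averaged vector can then complement a text-based news representation.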
 To better model the characteristics of news articles, several methods explore incorporating other
types of news information beyond texts into news modeling. For example, DeepJoNN [184] learns
news representations from news IDs, categories, keywords and entities via a character-level CNN.
Park et al. [120] proposed a neural news recommendation method based on LSTM. They use a
proprietary corpus to train a doc2vec [81] model to encode news articles into their vector repre-
sentations, and use an LSTM network to generate user representations from the representations
of news. In addition, they incorporate the categories of news into news representations, which
are predicted by a CNN [72] model. TANR [157] learns news representations from news titles via
a combination of CNN and attention network, which is also used in [158, 178]. Moreover, TANR
incorporates an auxiliary news topic prediction task to learn topic-aware news representations.
NAML [155] is a news recommendation method with attentive multi-view learning, which in-
corporates different kinds of news information as different views of news. In this method, news
titles, bodies, categories and subcategories are processed by different models, and their embeddings
are further aggregated together into a unified one via a view-level attention network. A similar
method is also used by [162, 186] to model candidate news. LSTUR [1] uses a combination of CNN
and attention network to process news titles, and incorporates categories and subcategories by
applying a non-linear transformation to their embeddings. CHAMELEON [26, 37] learns news
representations from news bodies by using CNN with different kernel sizes, and these textual
representations are fused with news metadata features such as topics, categories and tags using a
fully connected layer. It also predicts the metadata features of news via auxiliary tasks. PP-Rec [126]
uses news titles, entities and news popularity information in news modeling. It uses gating
mechanisms to synthesize the near-real-time CTR, recency and popularity predicted from news
title into a unified news popularity score. SentiRec [164] considers the sentiment orientation of
news to learn sentiment-aware news representations. It uses the VADER [58] algorithm to compute
real-valued sentiment scores of news. MM-Rec [168] uses a visiolinguistic model ViLBERT [104] to
learn multi-modal news representations from both news texts and images. DebiasRec [178] uses
CNN and attention network to learn news content representations from news titles, and learns news
bias representations from the size and positions of news displayed on websites with a bias model.
These methods can usually understand news better by incorporating additional news information.
However, some news features (e.g., news category labels) may not be available in real-world news
recommender systems, which hinders the exploitation of these features.
 Table 2. Comparison of different methods on news modeling.

 Methods              Information Used                     Model
 EBNR [115]           Body                                 Autoencoder
 RA-DSSM [75]         Title+Body                           Doc2vec+NN
 Khattar et al. [76]  Title+Body                           Doc2vec+NN
 3-D-CNN [77]         Title+Body                           Word2vec
 WE3CN [69]           Title+Body                           2-D CNN
 NPA [156]            Title                                CNN+Personalized Attention
 NRMS [159]           Title                                Self-Attention+Attention
 NRNF [161]           Title                                Transformer+Attention
 UniRec [169]         Title                                Self-Attention+Attention
 FeedRec [167]        Title                                Transformer+Attention
 NRHUB [158]          Title                                CNN+Attention
 DAINN [185]          Body                                 CNN+Dynamic Topic Model
 FedRec [127]         Title                                CNN+Self-Attention+Attention
 CPRS [165]           Title+Body                           Self-Attention+Attention
 FairRec [171]        Title                                Transformer+Attention
 PLM-NR [166]         Title                                PLM+Attention
 DAN [188]            Title+Entity                         CNN
 DNA [182]            Body+Element+ID                      Doc2vec+Candidate-Aware Attention+ID Embedding
 DKN [151]            Title+Entity                         KCNN
 Saskr [19]           Entity                               Entity Embedding
 Gao et al. [40]      Body+Entity                          Attention
 Liu et al. [98]      Title+Entity                         Entity Embedding+Attention
 TEKGR [82]           Title+Entity                         Entity Embedding+Candidate-aware Attention
 CAGE [139]           Title+Entity                         CNN+Entity Embedding
 KRED [99]            Title+Entity+Entity Context Feature  Attention
 HieRec [128]         Title+Entity                         Transformer+Attention
 KIM [125]            Title+Entity                         CNN+Transformer+Co-Attention+Graph Co-Attention
 Park et al. [120]    Title+Body+Query+Category            Doc2vec
 DeepJoNN [184]       Keywords/Entities+Category+ID        Char CNN
 TANR [157]           Title+Category                       CNN+Attention+Topic Prediction
 LSTUR [1]            Title+Category+Subcategory           CNN+Attention
 NAML [155]           Title+Body+Category+Subcategory      CNN+Attention
 EEG [186]            Title+Abstract+Body                  CNN+Attention
 CHAMELEON [37]       Body+Metadata+Context Features       CNN+Attribute Prediction
 PP-Rec [126]         Title+Entity+CTR+Recency             Self-Attention+Co-Attention+Gating
 SentiRec [164]       Title+Sentiment                      Self-Attention
 MM-Rec [168]         Title+Image                          ViLBERT
 DebiasRec [178]      Title+Position+Size                  CNN+Attention+Bias Embedding
 User-as-Graph [162]  Title+Category+Subcategory+Entity    Transformer+Attention
 IGNN [129]           Title+Entity+User-News Graph         KCNN+GNN
 INNR [132]           Heterogeneous Graph                  Node2vec
 GNewsRec [55]        Title+Entity+Heterogeneous Graph     CNN+GNN
 GERL [44]            Title+Category+User-News Graph       Transformer+GAT
 MVL [134]            Title+Body+Category+User-News Graph  CNN+Attention+GAT
 GNUD [56]            Title+Entity+User-News Graph         CNN+Disentangled GCN

 There are a few methods that learn news representations from graphs. For example, IGNN [129]
uses KCNN [151] to learn text-based news representations from news titles, and learns graph-based
news representations from the user-news graph. GNewsRec [55] is a hybrid approach which
considers graph information of users and news as well as news topic categories. It uses the same
architecture as DAN to learn text-based news representations, and uses a two-layer graph neural
network (GNN) to learn graph-based news representations from a heterogeneous user-news-topic
graph. GERL [44] learns news title representations with a combination of multi-head self-attention
and additive attention networks, and combines title representations with the embeddings of news
categories. MVL [134] uses a content view to incorporate news title, body and category, and uses
a graph view to enhance news representations with their neighbors on the user-news graph. In
addition, it uses a graph attention network to enhance representations of news by incorporating
the information of their first- and second-order neighbors on the user-news graph. GNUD [56] also
uses the same news encoder as DAN to learn text-based news representations, and uses a graph
convolution network with a preference disentanglement regularization to learn disentangled news
representations on user-news graphs. These methods can exploit the high-order information on
graphs to enhance news modeling. However, it is difficult for these methods to handle newly
generated news with few connections to existing nodes on the old graph used for training.
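
 The notion of first- and second-order neighbors on a user-news graph, and the difficulty with newly generated news, can be illustrated with a small sketch. It builds a bipartite graph from a toy click log and looks up the neighbors of a news article. The data and function names are hypothetical, and no graph neural network is involved; the sketch only shows which nodes such a model could aggregate from.

```python
from collections import defaultdict


def build_user_news_graph(clicks):
    """Build adjacency maps of a bipartite user-news graph from (user, news) clicks."""
    news_to_users = defaultdict(set)
    user_to_news = defaultdict(set)
    for user, news in clicks:
        news_to_users[news].add(user)
        user_to_news[user].add(news)
    return news_to_users, user_to_news


def news_neighbors(news_id, news_to_users, user_to_news):
    """Return (first-order user neighbors, second-order news neighbors) of a news node."""
    first_order = set(news_to_users.get(news_id, set()))
    second_order = set()
    for user in first_order:
        second_order |= user_to_news[user]
    second_order.discard(news_id)  # a node is not its own neighbor
    return first_order, second_order


# Toy click log (assumed data for illustration).
clicks = [("u1", "n1"), ("u1", "n2"), ("u2", "n2"), ("u2", "n3")]
n2u, u2n = build_user_news_graph(clicks)

first, second = news_neighbors("n2", n2u, u2n)
cold_first, cold_second = news_neighbors("n_new", n2u, u2n)
print(first, second)
print(cold_first, cold_second)
```

 The unseen article "n_new" has no neighbors at all, so a purely graph-based representation has nothing to aggregate for it, which mirrors the limitation noted above.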
 To help better understand the relatedness and differences between the methods reviewed above,
we summarize the information and models they used for learning news representations in Table 2.
Next, we provide several discussions on the aforementioned methods for news modeling.

3.3 Discussions on News Modeling
3.3.1 Feature-based News Modeling. In feature-based news modeling methods, mining textual
information of news is critical for representing news content. Many methods incorporate BOW/TF-
IDF features or their variants to represent news texts, which are also popular in the NLP field. In
addition, topic models like LDA are employed by various methods to extract topics from texts. This
is probably because topic models are capable of mining the topic distributions of news articles
and can also provide useful clues for inferring user interest on different topics. Moreover, since
users may focus more on the entities or keywords in news, they are considered by many methods
to summarize the content and topic of news, and can also be useful links to find similar news or
map news on knowledge graphs. In particular, some methods also use ontologies such as Wikipedia to
extract entity features to represent them more accurately.
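
 As a small illustration of the BOW/TF-IDF features mentioned above, the following sketch computes one common TF-IDF variant (length-normalized term frequency multiplied by log(N/df)) for a few toy news titles. The weighting scheme, the whitespace tokenization and the example titles are assumptions; the surveyed methods may use different variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors (dicts) for a list of non-empty texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of texts that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


titles = [
    "stocks rally as markets open",
    "markets fall as stocks slide",
    "team wins the cup final",
]
vecs = tfidf_vectors(titles)
print(vecs[0])
```

 In the first title, "rally" receives a higher weight than "stocks" because it appears in only one title. Word order and context are discarded, which is the limitation of handcrafted text features discussed later in this section.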
 Besides the texts of news, many methods utilize other information of news. For instance, the
categories or clusters of news are popular news features to help model news content. In addition,
several dynamic features of news are also widely employed in feature-based news modeling methods,
such as popularity and recency. Since many users may pay more attention to popular events and
news usually vanishes quickly, incorporating news popularity and recency can help build more
informative news representations. Besides, several environmental factors, such as locations and
time, are also utilized by several methods. This is because considering locations of news can provide
news related to users’ neighbors, and using the timestamps of news may be useful for providing
time-aware news services.
 A few methods also study incorporating other interesting features. For example, the sentiment
information of news is useful for news understanding, because users may have different tastes on
the sentiment of news. The bias of news may also need to be taken into consideration, because
recommending news with biased opinions and facts may hurt user experience and the reputation
of news platforms. Finally, although several non-personalized news recommendation methods have
used news images to build news representations [103], few personalized ones consider the visual
information of news, which is very useful for news modeling.
 Although feature-based news modeling methods have comprehensive coverage of various news
information, they usually require a large amount of domain knowledge for feature design. In
addition, handcrafted features are usually not optimal in representing the textual content of news
due to the absence of the contexts and orders of words.

3.3.2 Deep Learning-based News Modeling. Among all the reviewed methods, only two methods,
i.e., DNA [182] and DeepJoNN [184], directly incorporate the embeddings of news IDs. This is
probably because of the short lifecycle of news articles and the quick generation of novel news,
which make the coverage of news IDs in the training set very limited. Thus, it is very important to
understand news from their content.
 News text modeling is critical for news understanding. Most methods use news titles to model
news, because news titles usually have a decisive influence on users’ click behaviors.
Several methods such as EBNR [115], NAML [155] and CPRS [165] use news bodies to enhance
news representations, since news bodies contain more detailed information of news. In
existing methods, CNN is the most frequently used architecture for text modeling. This is because
local contexts in news articles are important for modeling news content, and CNN is effective
and efficient in capturing local contexts. In addition, since different news information may have
different informativeness in modeling news content and user interest, attention mechanisms are
also widely used to build news representations by selecting important features. With the success of
Transformer in NLP, many methods also use Transformer-like architectures for news modeling,
such as NRMS [159] and CPRS [165]. In addition, a few methods use pre-trained language or
visiolinguistic models to empower news modeling [166, 168]. These advanced NLP techniques
can greatly improve news content understanding, which is very important for personalized news
recommendation. However, these methods mainly aim to capture the semantic information of news
and may not be aware of the knowledge and commonsense information encoded in news.
 To address this issue, many methods incorporate news entities into news modeling to learn
knowledge-aware news representations. Some methods such as DAN [188] directly use entity texts
to represent entities, while several other methods like DKN [151] use knowledge graph embeddings
to represent entities. These entity representations are usually combined with representations
learned from news texts to better model news content. However, there are many new entities and
concepts emerging in news and it may be difficult to accurately represent them with off-the-shelf
knowledge bases.
 Several methods incorporate the topic categories of news into news modeling, because news
topics are very useful for understanding news content and inferring user interest. Considering
the scenarios that some news articles are not labeled with topic categories, some methods such as
TANR [157] and CHAMELEON [37] also adopt auxiliary tasks by predicting news topic categories
to encode topic information into news representations. In addition, a few methods study using other
kinds of news features such as sentiment, popularity and recency [126, 164], which can help better
understand the characteristics of news. However, some additional news features (e.g., category)
may be unavailable in certain scenarios, which limits the application of these methods.
 A few methods also explore enhancing news modeling with graph information [44, 55].
These methods can incorporate the high-order information on user-news bipartite graphs [44, 56,
129, 134] or more complicated heterogeneous graphs [55, 132], which can provide useful contexts
on understanding the characteristics of news for news recommendation. However, since the graphs
used in these methods are static, they may have some difficulties in accurately representing newly
published news.
 In summary, by reviewing news modeling techniques used in existing news recommendation
methods, we can see that news modeling is still a quite challenging problem in news
recommendation due to the variety, dynamics, and timeliness of online news information.

4 USER MODELING
User modeling is also a critical step in personalized news recommender systems to infer users’
personal interests in news. It is usually important for user modeling algorithms to understand users
from their behaviors [157]. An example user modeling framework in personalized news recommen-
dation is shown in Fig. 4. We can see that user modeling is based on the modeling of news that users
have interactions with, and it introduces additional user features to achieve better personalized
user understanding. The techniques for user modeling in existing news recommendation methods
can also be classified into feature-based ones and deep learning-based ones, which are introduced
in the following sections.

4.1 Feature-based User Modeling
 Fig. 4. An example framework of user modeling.

Feature-based user modeling methods use handcrafted features to represent users. Similar to news
modeling, in CF-based methods users are also represented by their IDs [24, 133]. However, ID-based
user modeling methods usually suffer from the data sparsity. Thus, most methods consider the
behaviors of users such as news clicks to model their interest. An intuitive way is to use the features
of clicked news to build user features. For example, Goossen et al. [47] used the CF-IDF features
of clicked news to represent user interest. Capelle et al. [14] proposed to use the SF-IDF features
of clicked news for user modeling. Garcin et al. [43] proposed to model users by aggregating the
LDA features of all clicked news into a user vector by averaging. However, it is difficult for these
methods to model users accurately when their news click behaviors are sparse.
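
 The averaging strategy described above can be sketched in a few lines. Each clicked news article is assumed to be already represented by a feature vector (for example, a topic distribution), and the user vector is their element-wise mean. The function name and the toy vectors are illustrative assumptions.

```python
def average_user_vector(clicked_news_vecs):
    """Represent a user by the element-wise mean of the clicked news vectors."""
    if not clicked_news_vecs:
        # Sparse click behaviors leave nothing to aggregate.
        raise ValueError("user has no clicked news to aggregate")
    n = len(clicked_news_vecs)
    dim = len(clicked_news_vecs[0])
    return [sum(vec[k] for vec in clicked_news_vecs) / n for k in range(dim)]


# Toy topic distributions of two clicked news articles (assumed values).
clicked = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
user_vec = average_user_vector(clicked)
print(user_vec)
```

 With only one or two clicks the average is dominated by individual articles, which illustrates why sparse click behaviors make such user representations unreliable.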
 Besides news features, many methods consider other supplementary information of users in
user modeling. For instance, in the MONERS [83] recommender system, users are clustered into
segments, and the preferences of user segments on news categories and news articles are used to
represent users. In addition, the demographics of users, such as age, gender and profession, are
also useful information for user modeling because users in different demographic groups usually
have different preferences on news. Thus, user demographic features are incorporated by several
methods [59, 83, 177]. For instance, Yeung et al. [177] proposed to use the age, gender, occupation
status and social economic grade of users to help identify their different preferences on news in
different categories. Chu et al. [20] used the age and gender categories of users to model their
characteristics. Besides, the location information of users is also very useful for accurate user
modeling, and it has been used by several location-aware news recommendation methods [35, 114].
However, some kinds of user features such as locations and demographics are privacy-sensitive,
and many users may not provide their accurate personal information.
 Since news clicks may not necessarily indicate user interests, several methods also consider
other kinds of user behaviors or feedback. For example, Gershman et al. [45] proposed to represent
users by the news they carefully read (regarded as positive news), rejected, and scrolled (both are
regarded as negative news). In addition, users’ dwell time on clicked news is also an important
indication of user interest, and Yi et al. [179] studied using dwell time as the weights of clicked news
for user modeling. Besides these user behaviors, several other kinds of user behavior information
such as access patterns, are utilized by a few methods [88, 135] to capture the users’ habits on news
reading.
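
 The idea of using dwell time as the weights of clicked news can be sketched as a weighted average. This is only one plausible reading: normalizing the weights by the total dwell time is an assumption, and the cited work may transform dwell time differently before using it.

```python
def dwell_weighted_user_vector(clicked_news_vecs, dwell_seconds):
    """Aggregate clicked news vectors, weighting each by the user's dwell time."""
    total = float(sum(dwell_seconds))
    if total <= 0:
        raise ValueError("total dwell time must be positive")
    dim = len(clicked_news_vecs[0])
    return [
        sum(d * vec[k] for d, vec in zip(dwell_seconds, clicked_news_vecs)) / total
        for k in range(dim)
    ]


# Two clicked articles: the second one was read three times as long (toy values).
clicked = [[1.0, 0.0], [0.0, 1.0]]
dwell = [30, 90]
weighted_user_vec = dwell_weighted_user_vector(clicked, dwell)
print(weighted_user_vec)
```

 Compared with plain averaging, the article that was read longer contributes more to the user representation, while a quickly abandoned click contributes little.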
Table 3. Additional features used for user representation. *ID/textual features of clicked news are excluded
because they are incorporated by most methods.

 Features for User Representation*  References
 Demographic                        [83][177][59][20][85][153]
 Cluster/Segment                    [83][177][187][23][109]
 Tag/Keyword                        [62][177][23][39][5][13][63][142]
 Location                           [35][177][145][114][59][148][85][67][20][153]
 Access Pattern                     [88][135][153]
 Behaviors on Other Platforms       [85][48][2][54][67][93][123][124][54][95]

 Several methods also consider graph information (e.g., news-user graphs) in user modeling [46].
For example, Li et al. [87] proposed a news personalization method by using hypergraph to model
various high-order interactions between different news information, where users are represented
by subgraphs of the hypergraph. Garcin et al. [41] proposed to use context trees for user modeling.
They constructed context trees based on the sequence of articles, the sequence of topics and the
distribution of topics. Trevisiol et al. [147] proposed to build a browsing graph from the news
browsing histories of users on Yahoo News. Joseph et al. [64] proposed to represent users by
regarding the clicked news as subgraphs of a knowledge graph, which are constructed via entity
linking. These methods can consider the high-order information on graphs to help understand user
behaviors, which can improve user modeling.
 A few methods combine user IDs with other user features in user modeling [102]. For example,
NewsWeeder [78] used user IDs and the bag-of-words features of clicked news to represent users.
Claypool et al. [21] used user IDs and keywords of clicked news for user modeling. Liu et al. [100]
proposed to represent users using their IDs and user interest features predicted by a Bayesian model.
These methods can mitigate the drawbacks of ID-based user modeling and meanwhile incorporate
useful personal information encoded by user IDs.
 Considering the evolutionary characteristics of user interest, some methods model both
long-term and short-term user interests [13, 91]. NewsDude [5] may be one of the earliest methods that
consider both long-term and short-term user interests. In this approach, users are represented by a hybrid model,
which models short-term interest of users based on recently browsed news, and models long-term
user interest by sorting words of news in each category with respect to their TF-IDF values and
selecting the top-ranked words. Li et al. [90] proposed LOGO, which is a news recommendation method
that models both long-term and short-term user interests. LOGO uses a weighted summation of the
topic distributions of news clicked by users to indicate long-term user interest, and it uses the topic
distribution of the latest clicked news as the short-term user interest. Viana et al. [148] proposed
another news recommendation method based on long short-term user interest. In their method,
the long-term interest of users is represented by the frequency of a specific tag being read by this
user, and short-term interest is represented by several recently clicked news. Different from other
methods that only consider short-term or long-term user interests, these methods can better model
the evolution of user interests by capturing long short-term user interests.
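
 The long-term and short-term interest representations described for LOGO can be sketched as follows: the long-term interest is a weighted summation of the topic distributions of clicked news, and the short-term interest is the topic distribution of the latest click. The weights below are arbitrary toy values, since the weighting scheme is not specified in the description above, and the function name is hypothetical.

```python
def long_short_term_interest(topic_dists, weights):
    """Compute long-term and short-term interest from clicked news topics.

    topic_dists: topic distributions of clicked news, ordered oldest to newest.
    weights: one weight per clicked news (assumed values, not from the source).
    """
    dim = len(topic_dists[0])
    # Long-term interest: weighted summation over the whole click history.
    long_term = [
        sum(w * dist[k] for w, dist in zip(weights, topic_dists)) for k in range(dim)
    ]
    # Short-term interest: topic distribution of the latest clicked news.
    short_term = list(topic_dists[-1])
    return long_term, short_term


history = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
weights = [0.2, 0.3, 0.5]
long_term, short_term = long_short_term_interest(history, weights)
print(long_term, short_term)
```

 In this toy history the user drifts from the first topic to the second; the short-term vector reflects the drift immediately, while the long-term vector changes more gradually.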
 To help readers better understand feature-based user modeling methods in personalized news
recommender systems, we summarize the additional user features (ID and news features are
excluded) used in these methods in Table 3.

4.2 Deep Learning-based User Modeling
In recent years, many personalized news recommendation methods use deep learning techniques
for user modeling to remove the need of manual feature engineering. Most of them infer user
interests from historical news click behaviors. EBNR [115] learns representations of users from the
representations of their browsed news via a GRU network. Khattar et al. [76] used the summation
of clicked news representations weighted by an exponential discounting function, where more
