A Hybrid News Recommendation Approach Based on Title–Content Matching

Jiang, Shuhao; Lu, Yizi; Song, Haoran; Lu, Zihong; Zhang, Yong

doi:10.3390/math12132125

by Shuhao Jiang, Yizi Lu, Haoran Song, Zihong Lu and Yong Zhang *

School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(13), 2125; https://doi.org/10.3390/math12132125

Submission received: 15 May 2024 / Revised: 5 June 2024 / Accepted: 2 July 2024 / Published: 6 July 2024

(This article belongs to the Section Mathematics and Computer Science)

Abstract:

Personalized news recommendation can alleviate the information overload problem, and accurate modeling of user interests is the core of personalized news recommendation. Existing news recommendation methods integrate the titles and contents of news articles that users have historically browsed to construct user interest models. However, this method ignores the phenomenon of “title–content mismatching” in news articles, which leads to the lack of precision in user interest modeling. Therefore, a hybrid news recommendation method based on title–content matching is proposed in this paper: (1) An interactive attention network is employed to model the correlation between title and content contexts, thereby enhancing the feature representation of both; (2) The degree of title–content matching is computed using a Siamese neural network, constructing a user interest model based on title–content matching; and (3) neural collaborative filtering (NCF) based on factorization machines (FM) is integrated, taking into account the perspective of the potential relationships between users for recommendation, leveraging the insensitivity of neural collaboration to news content to alleviate the impact of title–content mismatching on user feature modeling. The proposed model was evaluated on a real-world dataset, achieving an nDCG of 83.03%, MRR of 81.88%, AUC of 85.22%, and F1 Score of 35.10%. Compared to state-of-the-art news recommendation methods, our model demonstrated an average improvement of 0.65% in nDCG and 3% in MRR. These experimental results indicate that our approach effectively enhances the performance of news recommendation systems.

Keywords:

news recommendation; user interest models; Siamese neural network; neural collaborative filtering (NCF); factorization machines (FMs)

MSC:

68T50

1. Introduction

In recent years, with the development of information technology, especially the popularization of mobile Internet, online reading has gradually become the mainstream choice for users to browse news. Various types of news are aggregated on online news platforms and presented to users, which are rich in content and diverse. However, users find it challenging to navigate through the massive amount of news to discover content that aligns with their interests. Personalized news recommendation methods have been developed to assist users in finding news of interest and effectively alleviate the problems caused by information overload [1,2,3].

Accurate user interest modeling is the core problem of personalized news recommendation. Existing news recommendation methods usually construct user interest models based on the content and titles of the news articles that users have historically clicked on the news platform, and a substantial body of work has achieved promising results. Okura et al. [4] proposed using recurrent neural networks (RNN) to construct user preferences with browsing history as input sequences. Wang et al. [5] proposed learning the user representation from the content and titles of the user’s historical clicks and evaluating the correlation between the clicked news and the candidate news through a knowledge-aware convolutional neural network (CNN). Fan et al. [6] proposed the News Recommendation Algorithm Based on Multiple Perspectives (BTEC), which utilizes the Bidirectional Encoder Representations from Transformers (BERT) model and the attention mechanism to vectorize the content, titles, and events in the news, and fuses these views for both the candidate news and the news in the user’s browsing history. In practice, however, some news platforms publish news with exaggerated titles to gain traffic or promote advertisements for profit, aiming to induce readers to click; the actual news content does not fully reflect the headline, leading to the issue of “title–content mismatching” [7]. Such news content does not accurately reflect the real preferences of users, so directly using the titles and content of historically browsed news articles to construct user interests is bound to introduce bias.

The work presented in this paper is motivated by the following observations: Firstly, detecting whether the titles and the content of news articles match is crucial for modeling user interests. As illustrated in Figure 1, in the second news article, it can be found that the user shows clicking behavior on the news article because of their interest in entertainment news, as indicated by the title, but the content is not the entertainment news that has relevance to the title [8], and the content of the news is not representative of the user’s interest. Secondly, there exists a connection between the titles and content of the same news article, and these interactions serve as important clues for modeling the correlation between titles and content [9]. The 10 pounds of the title in the third news article is closely related to the weight loss in the text, while in fact the features extracted for the title and the content are independent of each other, and the extracted features are only the semantic vectors of their respective last [10]. Finally, neural collaborative filtering makes recommendations based on the clicking behavior of users and their nearest neighbors, independent of the text content. Therefore, the integration of neural collaborative filtering for hybrid recommendation can play a role in improving the issue of title-content mismatching.

In this paper, a hybrid news recommendation method based on title–content matching is proposed to address the problem of insufficiently accurate modeling of user interest preferences due to title–content mismatching. Initially, the BERT model is used to extract the title and content features of news articles historically browsed by users, and an interactive attention network is used to model the interaction between the title and the content. Subsequently, a Siamese neural network is applied to calculate the title–content matching degree, and a threshold is reasonably determined. If the matching degree exceeds the threshold, both the title and content of the news are taken as the features of the news, and if it is under the threshold, only the title is taken as the features of the news. Furthermore, a Gate Recurrent Unit (GRU) network is employed to build users’ time sequential features based on the news they have browsed, processing all historically viewed news to obtain user features based on title–content matching. For candidate news, BERT is used to extract features and a CNN is used to extract local features to enhance the feature representation of the candidate news, and the user features based on title–content matching are fitted with the candidate news features. Additionally, an improved neural collaborative filtering approach using FM is adopted to capture low- and high-order interactive relationships between users and news, mitigating data sparsity while improving the issue of title–content mismatching in content-based recommendation. Finally, the content recommendation based on title–content matching detection is mixed with the FM-based neural collaborative filtering recommendation to predict the ultimate click probability.

The contributions and innovations of this paper are as follows:

(1) Using an interactive attention network to model the relevance of title and content. On the basis of extracting features for the title and content separately, the interactive features of the title and content are considered to enhance the feature representation of the title and content and at the same time facilitate the accurate assessment of the relevance between the title and the content.
(2) Modeling user interest features based on the detection of title–content matching. Detecting title–content matching for news articles that users click on, determining user preferences for each news article based on a threshold, and selecting content for modeling. Extracting time sequence features from the user’s history of browsed news to construct a user interest preference model based on title–content matching.
(3) Adopting an improved neural collaborative filtering approach to alleviate the title–content mismatching problem. Optimizing traditional neural collaborative filtering with FM and combining it with Multi-Layer Perceptron (MLP) to learn low- and high-order feature interactions between users and news, comprehensively capturing interactions to improve data sparsity. Only the clicking behaviors of users and neighboring users are considered, alleviating the problem of title–content mismatching.
(4) Experiments are conducted on benchmark datasets, and the results demonstrate that this approach can effectively alleviate the title–content mismatching problem and significantly improve the performance of news recommendations.

2. Related Work

2.1. News Recommendation Algorithm

Personalized news recommendation is an important task in natural language processing and has attracted growing attention from researchers and research institutions [11]. The algorithms widely used in news recommendation fall into three groups: collaborative filtering (CF)-based algorithms [12], content-based algorithms, and hybrid algorithms [13]. Collaborative filtering is essentially a recommendation method based on the interactions between users and news: it recommends to a user the other news articles viewed by similar users who have read the same news, and thus exploits the correlation between neighboring users (or news articles) [13]. With the rise of deep learning, He et al. combined matrix factorization with deep learning and proposed the neural collaborative filtering (NCF) model [14], which replaces the inner product that traditional matrix factorization applies to the user and item latent vectors with a more expressive, better-fitting MLP. This MLP adopts a two-tower structure, which simultaneously extracts feature cross-information [15]. In many collaborative filtering-based approaches, news articles are represented only by their IDs [16]. As the volume of news and users increases, data sparsity becomes more and more serious; content-based recommendation methods emerged in response [17,18,19,20]. Because traditional manual feature mining and machine learning techniques cannot extract deep semantic information from news articles, they have been replaced by deep learning. Wu et al. [21] proposed the neural news recommendation with multi-head self-attention (NRMS) model, which uses multi-head attention to learn news representations from news titles and user representations from historically viewed news articles.
However, this method relies only on news headlines, which are brief and carry limited information, and so are inadequate for accurately constructing user interests. Zhu et al. [22] proposed learning title-level and abstract-level representations of the news, concatenating the two as the final news article representation, and learning user interest preferences from historically viewed news articles. Wu et al. [23] introduced multi-view learning based on convolutional neural networks and proposed the neural news recommendation with attentive multi-view learning (NAML) model, which treats the titles, contents, and subject categories of the news as different views and splices them into a unified news representation; users are characterized by their browsing history. Wu et al. [24] proposed neural news recommendation with negative feedback (NRNF), which distinguishes positive from negative news clicks by reading dwell time and uses a combination of transformer and additive attention networks to learn user representations from the positive and negative clicks, respectively. Li et al. [25] proposed using convolutional neural networks to obtain local word representations from the titles, categories, and body of news articles, learning news representations by modeling the interrelationships of these components with multi-head self-attention, and learning user representations from the news articles that the user browses. Liu et al. [26] proposed a convolutional neural network with an injected attention mechanism for news text feature extraction; it extracts the user’s interest features by adding time-sequential prediction over the news the user has already browsed and injecting a multi-head self-attention mechanism.
Hybrid recommendation methods mainly combine two or more recommendation methods, aiming to improve the recommendation effectiveness [27].

All of the above methods assume the ideal case in which title and content match, which is often inconsistent with the actual situation. In contrast, the method in this paper detects the title–content matching degree, uses it to judge which content users are genuinely interested in, and corrects the user’s interest preference accordingly. It also integrates an improved neural collaborative filtering component, whose independence from the news content mitigates the title–content mismatching problem, and draws on the advantages of hybrid recommendation to improve recommendation accuracy.

2.2. Detection of Title–Content Matching Degree

The content of an article is the most important implicit information in news articles, and its degree of association with the title and matching with user interests will directly affect the user’s recommendation experience. Therefore, title–content matching detection is an important task for online platforms [8]. In recent years, many scholars have explored the use of deep learning technology for the detection of title–content matching. Peter Bourgonje et al. [28] developed an algorithm to detect the relevance between titles and the main body of articles, based on term frequency, inverse document frequency, and an N-gram model. Dong et al. [29] proposed a deep similarity-aware attention model that learns the representation of titles and body text through Bi-GRU and measures the global and local similarity between the titles and the content. Yin et al. [30] utilized the Chinese BERT model to vectorize the title and content representations of news articles, applying fusion attention to fuse the features of titles and content, and finally using Bi-GRU to detect the title–content matching. Zheng et al. [31] proposed using a transformer model to simultaneously extract semantic features from both the title and the content, connecting sentence-level similarity with word-level matching to determine the final similarity. Meng et al. [32] proposed using a multilayer gated convolutional network to learn the contextual information between title words and introduced an innovative attention-fused deep relevance matching network to thoroughly explore both inter-sentence and intra-sentence similarities between the title and the content. The above approach detects the title–content matching degree by extracting features of titles and content separately, which only considers the phenotypic features of titles and content, ignoring the interactive features between titles and content, which is crucial for assessing title and content relevance [7]. 
Differing from the above work, this paper adopts the BERT model to deeply mine the text semantics and uses an interactive attention network to model the interaction between the titles and content, which enhances the representation of the title and content features and, at the same time, improves the accuracy of the title–content matching degree calculation.

Furthermore, most current detections of title–content mismatching are commonly used to identify ‘clickbait’ issues. In fact, applying the detection of title–content mismatching in the field of news recommendation can identify whether titles and content match, discover the content users are genuinely interested in, and accurately and effectively model user interests.

3. Method

In this paper, we propose a hybrid news recommendation method based on title–content matching, which mainly consists of four modules: a user interest modeling module based on title–content matching degree detection for learning user preference features; a candidate-news-article feature modeling module for learning candidate news article representations; an improved neural collaborative filtering module for mining the low-order and high-order interaction features between the user and the clicked news; and a hybrid recommendation module, which integrates content recommendation and neural collaborative filtering recommendation to realize the predictive ranking of candidate news articles and predict the probability of the user finally clicking on the candidate news articles. The architecture of our approach is illustrated in Figure 2.

3.1. User Modeling Based on Title–Content Matching Degree Detection

In order to solve the problem of insufficiently accurate user interest feature modeling caused by title–content mismatching, user interest feature modeling based on title–content matching degree detection is proposed to construct a title–content matching user interest model. This mainly includes three parts: feature extraction of titles and content based on phenotypic and interactive characteristics, title–content matching degree detection using Siamese neural networks, and user interest modeling based on title–content matching. The architecture of the user modeling method based on title-content matching degree detection is illustrated in Figure 3.

3.1.1. Extraction of Phenotypic and Interactive Title and Content Features

Firstly, the features of the news titles and body text historically browsed by the user need to be extracted. Traditional neural networks cannot fully capture contextual information during feature extraction and lack the precision needed to extract deep semantic information. Therefore, this paper replaces the traditional Word2Vec model in the embedding layer with BERT (Bidirectional Encoder Representations from Transformers), one of the most prolific recent advances in natural language processing [33]. Since being proposed by Google researchers in October 2018, BERT has had a notable impact on the field of NLP, surpassing other approaches of its time. Its success is largely due to the Masked Language Model (MLM), which randomly masks tokens in the input and forces the model to predict the original tokens based on the surrounding context [34]. This allows BERT to jointly condition on both left and right contexts, resulting in better word representations. The feature extraction of titles and body text includes two steps: generating word embedding matrices and performing average pooling.

Based on the encoded inputs of the news titles and body text, $code_t$ and $code_b$, the BERT word embedding matrices $text_t$ and $text_b$ are generated, as represented in Equations (1) and (2):

$$text_t = \mathrm{Bert}(code_t) = \begin{pmatrix} word_1^T \\ \vdots \\ word_s^T \end{pmatrix}_{s \times m} \tag{1}$$

$$text_b = \mathrm{Bert}(code_b) = \begin{pmatrix} word_1^T \\ \vdots \\ word_s^T \end{pmatrix}_{s \times m} \tag{2}$$

where $word_j$ is the $j$th word embedding, $s$ is the length of the title or content, and $m$ represents the word embedding dimension determined in the pre-training phase. The BERT word embedding matrices $text_t$ and $text_b$ are average pooled to obtain the semantic vectors of the title and content, $c_t$ and $c_b$, as shown in Equations (3) and (4):

$$c_t = \mathrm{meanpool}(text_t) \tag{3}$$

$$c_b = \mathrm{meanpool}(text_b) \tag{4}$$

Average pooling involves summing all the word embeddings and then dividing each element by $s$, thereby smoothly aggregating the semantic information of the word embeddings to generate semantic vectors that represent the title or content [35]. This method can provide a more holistic representation of the title or content compared to [CLS] word embeddings and max pooling.
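The average pooling step can be illustrated with a minimal NumPy sketch. The matrix below is a toy stand-in for a BERT output with $s = 3$ tokens and $m = 4$ dimensions; the function name `mean_pool` and the numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mean_pool(text: np.ndarray) -> np.ndarray:
    """Average an (s, m) word-embedding matrix over its s rows."""
    # Sum all word embeddings, then divide each element by s.
    return text.sum(axis=0) / text.shape[0]

# Toy stand-in for a BERT word-embedding matrix (s = 3, m = 4).
text_t = np.array([[1.0, 0.0, 2.0, 4.0],
                   [3.0, 0.0, 2.0, 0.0],
                   [2.0, 3.0, 2.0, 2.0]])

c_t = mean_pool(text_t)
print(c_t)  # [2. 1. 2. 2.]
```

Every token contributes equally to the pooled vector, which is what distinguishes this choice from taking a single [CLS] embedding or an element-wise maximum.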

Titles and content are interconnected, yet during feature extraction, they are completely separated and independent, unable to perceive each other’s information. The extraction yields only their respective semantic vectors, lacking interactive features between the titles and content, which makes it difficult to learn the latent relationship and relevance between the two. An interactive attention network is employed to simulate the interplay of context between titles and content, capturing the association between the article content and its title. Specifically, the BERT output of the title $c_t$ is treated as the query, with the content output $c_b$ serving as both key and value, which contains the context in the body and its interaction with each word in the title, resulting in a title-aware body text representation $tb\text{-}rep$; similarly, the BERT output of the content $c_b$ is used as the query, with the title output $c_t$ serving as both key and value, conveying the context within the title and its interaction with every word in the body text [24], resulting in a body-text-aware title representation $bt\text{-}rep$:

$$tb\text{-}rep = \mathrm{MultiHead}(c_t, c_b, c_b) \tag{5}$$

$$bt\text{-}rep = \mathrm{MultiHead}(c_b, c_t, c_t) \tag{6}$$

The final output is the title and content features, which combine phenotypic and interactive characteristics.
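The query/key/value roles in Equations (5) and (6) can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses a single attention head with no learned projection matrices, whereas $\mathrm{MultiHead}$ would use several heads with trainable projections, and it assumes token-level inputs (random matrices stand in for BERT outputs). The names `cross_attention`, `title`, and `body` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    """Single-head scaled dot-product attention (simplification:
    no learned projections, one head instead of several)."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)   # (len_q, len_k)
    weights = softmax(scores, axis=-1)    # each query row sums to 1
    return weights @ value, weights

rng = np.random.default_rng(0)
title = rng.normal(size=(5, 8))    # assumed: 5 title tokens, dim 8
body = rng.normal(size=(12, 8))    # assumed: 12 body tokens, dim 8

# Title as query, body as key and value (cf. Equation (5)).
tb_rep, w_tb = cross_attention(title, body, body)
# Body as query, title as key and value (cf. Equation (6)).
bt_rep, w_bt = cross_attention(body, title, title)

print(tb_rep.shape, bt_rep.shape)  # (5, 8) (12, 8)
```

Each output row is a weighted mixture of the other text's token vectors, so the result for one side carries information from the other side.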

3.1.2. Title–Content Matching Degree Detection Based on Siamese Neural Network

The Siamese neural network, also referred to as a conjoined network with symmetry, is naturally suitable for calculating the title–content matching degree. Sharing the weights between its two sub-networks reduces the number of parameters to be trained and the complexity of the model, and at the same time maps vectors with different spatial dimensions to the same dimension, ensuring a consistent data distribution between titles and body text. When calculating the matching degree between the title and the content, the meaning of each word in the title and the content is affected by the words that precede and follow it. Hence, a Bidirectional Long Short-Term Memory Network (Bi-LSTM) is employed at the encoding layer in this paper. The structure is shown in Figure 4:

Bi-LSTM consists of a forward-propagating LSTM and a backward-propagating LSTM that learn the semantic dependencies of the textual content in both directions and capture the contextual semantic information of each word [9], as shown in the following equation:

$$\vec{h}_t = f(W_1 x_t + W_3 \vec{h}_{t-1} + \vec{b}_t) \tag{7}$$

$$\overleftarrow{h}_t = f(W_2 x_t + W_4 \overleftarrow{h}_{t+1} + \overleftarrow{b}_t) \tag{8}$$

$$y_t = \mathrm{concat}(\vec{h}_t, \overleftarrow{h}_t) \tag{9}$$

Here, $\vec{h}_t$ and $\overleftarrow{h}_t$ represent the forward and backward propagated hidden layer states at time $t$, respectively; $x_t$ and $y_t$ denote the input value to the neuron and the output value of the hidden layer state at time $t$; $\vec{h}_{t-1}$ and $\overleftarrow{h}_{t+1}$ respectively signify the forward propagated state at time $t-1$ and the backward propagated state at time $t+1$; and $W_1$, $W_2$, $W_3$, and $W_4$ represent the weight matrices corresponding to the different components, respectively [36].

Finally, the forward and backward hidden vectors are spliced to obtain the final output semantic vector, which contains the information from both directions. The Siamese neural network is used to make the distance between similar titles and content as small as possible and the distance between dissimilar titles and content as large as possible. The vectors produced by the encoding layer then need to be fused, i.e., the gap between the semantics expressed by the title and content vectors is calculated. Because the question is whether the two texts are semantically similar, cosine similarity is used to compare the directions of their semantic vectors, as shown in Equation (10):

$$similarity = \cos(\theta) = \frac{X \cdot Y}{\|X\| \, \|Y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}} \tag{10}$$

where $X$ and $Y$ are the final feature vectors for title and content, and $x_i$ and $y_i$ are the feature representations of each dimension in the two vectors, respectively.

Users usually decide whether to click on a news article based on its title and then read the content to get detailed information; hence, the computed title–content matching degree is used to determine user interest preferences. If the title–content matching degree is greater than a threshold $\lambda$, the content of the article meets the user’s expectations, and the title and content of the article are spliced as the characterization of the news; if the matching degree is less than the threshold $\lambda$, the news article exhibits a title–content mismatch, and only the title is selected as the characterization of the news. The representations of all news historically browsed by the user are processed in this way.
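The matching-degree computation and the threshold rule can be sketched as below. This is an illustration under stated assumptions: the threshold value `lam = 0.5` and the toy vectors are hypothetical, the function names are invented for the example, and a matching degree exactly equal to the threshold (a case the text does not specify) is treated here as a mismatch.

```python
import numpy as np

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def news_features(title_vec: np.ndarray, body_vec: np.ndarray, lam: float) -> np.ndarray:
    """Splice title and content when they match; otherwise keep only the title."""
    if cosine_similarity(title_vec, body_vec) > lam:
        return np.concatenate([title_vec, body_vec])
    # Assumption: similarity == lam is handled as a mismatch.
    return title_vec

lam = 0.5                               # hypothetical threshold
title = np.array([1.0, 0.0])
body_match = np.array([2.0, 0.0])       # same direction as the title
body_mismatch = np.array([0.0, 3.0])    # orthogonal to the title

print(cosine_similarity(title, body_match))            # 1.0
print(cosine_similarity(title, body_mismatch))         # 0.0
print(news_features(title, body_match, lam).shape)     # (4,)
print(news_features(title, body_mismatch, lam).shape)  # (2,)
```

Note that the two branches return vectors of different lengths; how the lengths are reconciled before the sequence model is not covered by this sketch.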

3.1.3. User Interest Modeling Based on Title–Content Matching

GRU is employed to learn the user’s interest preference representation from the user’s browsing history. The GRU can model the information in the user’s past behavior sequence, where each time step corresponds to a browsing behavior, and GRU updates the internal state at each time step to capture dynamic changes in the user’s interest [37].

The matrix $E_{news}$ of the user’s historically browsed news, encoded according to title–content matching, is fed into the GRU network for computation, and the hidden state of the last time step is output. This allows for the consideration of the recency of browsing time [38]. The final user interest feature is computed as shown in the following equations:

$$history_u = \{text_1^T, text_2^T, \dots, text_l^T\} \tag{11}$$

$$E_{news} = \{e_1, e_2, \dots, e_k\} \tag{12}$$

$$r_t = \sigma(W_r [h_{t-1}, e_t]) \tag{13}$$

$$z_t = \sigma(W_z [h_{t-1}, e_t]) \tag{14}$$

$$\tilde{h}_t = \tanh(W_{\tilde{h}} [r_t \odot h_{t-1}, e_t]) \tag{15}$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \tag{16}$$

where $history_u$ represents the news items browsed by the user historically, $E_{news}$ is the matrix of historically browsed news encoded for title–content matching, and $l$ is the number of news items browsed. For each time step $t$, the GRU includes two gates: an update gate $z_t$ and a reset gate $r_t$. $e_t$ is the input at moment $t$; $h_t$ and $h_{t-1}$ are the state information of the current moment and the previous moment, respectively; $\sigma$ is the sigmoid function; $\tilde{h}_t$ represents the candidate hidden state; $\odot$ is the element-wise product; and $W_r$, $W_z$, and $W_{\tilde{h}}$ are the weight matrices of the reset gate, the update gate, and the candidate hidden state [39], respectively.

The $l$ news items, ordered from the farthest to the nearest, are treated as $l$ moments of user browsing events. The final representation of user interest is the GRU network’s prediction of the features of the news that the user will click on at the next moment, denoted as $U_s = h_k$, which gives the user interest features based on title–content matching.
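The recurrence can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: it omits bias terms, uses random weights and inputs as stand-ins for trained parameters and encoded news, and assumes that the state update interpolates between the previous state $h_{t-1}$ and the candidate state $\tilde{h}_t$. The function name `gru_last_state` and all dimensions are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_last_state(E_news, W_r, W_z, W_h, h0):
    """Run a bias-free GRU over the news sequence and return the last hidden state."""
    h = h0
    for e_t in E_news:                      # oldest to most recent news
        concat = np.concatenate([h, e_t])   # [h_{t-1}, e_t]
        r = sigmoid(W_r @ concat)           # reset gate
        z = sigmoid(W_z @ concat)           # update gate
        h_tilde = np.tanh(W_h @ np.concatenate([r * h, e_t]))  # candidate state
        h = z * h + (1.0 - z) * h_tilde     # blend previous and candidate states
    return h

rng = np.random.default_rng(0)
d_in, d_h, k = 6, 4, 5                      # assumed toy dimensions
E_news = rng.normal(size=(k, d_in))         # stand-in for encoded browsed news
W_r, W_z, W_h = (rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for _ in range(3))

U_s = gru_last_state(E_news, W_r, W_z, W_h, np.zeros(d_h))
print(U_s.shape)  # (4,)
```

Because the last hidden state is returned, the most recently browsed items influence `U_s` most directly, which is how the recency of browsing is taken into account.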

3.2. Candidate News Feature Modeling Based on BERT–CNN

To efficiently extract the contextual relationships, local features, and global features of candidate news text, the BERT model is first used to deeply extract the textual semantic information and consider the global features of candidate news articles. On this basis, the CNN model is combined to obtain short-range local features of words from candidate news articles, to comprehensively enhance the text feature representation of candidate news articles.

The content of the candidate news articles is represented as $[H_1, H_2, \dots, H_n]$, which is transformed into a sequence of word vectors $[b_1, b_2, \dots, b_n]$ by the BERT model, where $n$ is the length of the text content. The contextual semantic information of a word is very important for characterizing the news, and a CNN is used to obtain the contextual representation $e_i$ of the $i$-th word, calculated as follows:

$$e_i = \mathrm{ReLU}(F \times b_{(i-K):(i+K)} + d) \tag{17}$$

where $b_{(i-K):(i+K)}$ is the concatenation of word embeddings from position $i-K$ to $i+K$, $F$ and $d$ are the kernel and bias parameters of the CNN filter, and $\mathrm{ReLU}$ is the activation function. The output is the BERT–CNN-based sequence of contextual representations of the candidate news articles $[e_1, e_2, \dots, e_n]$ and, ultimately, the feature representation of the candidate news articles $V_{news}$.
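The windowed convolution of Equation (17) can be sketched as follows. The sketch assumes zero padding at both ends of the sequence so that every position has a full window (the text does not state the boundary handling), and it uses random matrices in place of BERT word vectors and trained filters. The name `cnn_context` and the sizes are illustrative.

```python
import numpy as np

def cnn_context(B: np.ndarray, F: np.ndarray, d: np.ndarray, K: int) -> np.ndarray:
    """Apply ReLU(F @ window + d) to the window of 2K+1 word vectors around each position."""
    n, m = B.shape
    # Assumption: zero-pad K rows on each side so boundary words get full windows.
    padded = np.vstack([np.zeros((K, m)), B, np.zeros((K, m))])
    out = []
    for i in range(n):
        window = padded[i:i + 2 * K + 1].reshape(-1)   # concatenation of b_{i-K} .. b_{i+K}
        out.append(np.maximum(0.0, F @ window + d))    # ReLU activation
    return np.stack(out)

rng = np.random.default_rng(0)
n, m, K, n_filters = 6, 4, 1, 3              # assumed toy sizes
B = rng.normal(size=(n, m))                  # stand-in for BERT word vectors
F = rng.normal(size=(n_filters, (2 * K + 1) * m))
d = np.zeros(n_filters)

E = cnn_context(B, F, d, K)
print(E.shape)  # (6, 3)
```

Each row of `E` depends only on a word and its $K$ neighbors on either side, which is the short-range local feature that complements BERT's global context.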

3.3. Neural Collaborative Filtering Based on FM

Traditional neural collaborative filtering is constrained by the relatively simple structure of Generalized Matrix Factorization (GMF), primarily focusing on the direct interactions between users and news. GMF models user–item interactions mainly by dot product, which is somewhat deficient in modeling nonlinear relationships and limited in the case of sparse data. Unlike GMF, FM can learn the first-order and second-order feature interactions between users and news, and introduces a factorization mechanism to better capture the potential relationship between users and news.

Therefore, this paper adopts FM instead of the traditional GMF, which can flexibly deal with the complex relationship between higher-order nonlinear features by learning the implied feature vectors of the interactions between users and news, which is more advantageous in the face of data sparsity. The neural collaborative filtering model based on FM mainly consists of two parts: FM and MLP [40], which can capture low-order feature interactions as well as learn high-order feature interactions, improve the model’s ability to model the complex relationship between users and news, and improve the overall recommendation performance.

The user and news IDs are expanded into one-hot feature vectors $o_u$ and $o_i$, respectively, which, after passing through the embedding layer, are transformed into the embedding vectors $p_u^F$ and $q_i^F$ for FM:

$$p_u^F = embed_{u_1}(o_u) = W_{u_1}^T o_u \tag{18}$$

$$q_i^F = embed_{i_1}(o_i) = W_{i_1}^T o_i \tag{19}$$
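Multiplying the transposed embedding matrix by a one-hot vector simply selects one row of the matrix, which is why embedding layers are usually implemented as table lookups. A small sketch with a hypothetical embedding matrix `W_u` (4 users, dimension 3) shows the equivalence:

```python
import numpy as np

num_users, dim = 4, 3
# Hypothetical embedding matrix: one row per user ID.
W_u = np.arange(num_users * dim, dtype=float).reshape(num_users, dim)

u = 2                                   # user ID
o_u = np.zeros(num_users)
o_u[u] = 1.0                            # one-hot encoding of the ID

p_u = W_u.T @ o_u                       # embedding as a matrix product
print(p_u)                              # [6. 7. 8.]
print(np.array_equal(p_u, W_u[u]))      # True
```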

FM learns a hidden vector for each second-order crossover feature, and the weights of the crossover features can be expressed as the inner product of the corresponding hidden vectors of the features:

\hat{y} = w_{0} + \sum_{i=1}^{n} w_{i} x_{i} + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_{i}, v_{j} \rangle x_{i} x_{j}

(20)

The second-order part is simplified as follows:

\begin{aligned} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_{i}, v_{j} \rangle x_{i} x_{j} &= \frac{1}{2} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \langle v_{i}, v_{j} \rangle x_{i} x_{j} - \sum_{i=1}^{n} \langle v_{i}, v_{i} \rangle x_{i}^{2} \right) \\ &= \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{n} v_{i f} x_{i} \right)^{2} - \sum_{i=1}^{n} v_{i f}^{2} x_{i}^{2} \right) \end{aligned}

(21)

where

v_{i}

is the latent vector for the

i

-th feature, and

v_{i f}

is the

f

-th element of the

i

-th feature [41]. The aforementioned steps achieve low-order feature crossing. Splice the embedding vectors of users and news to obtain the following:

x = \mathrm{concat}(p_{u}^{F}, q_{i}^{F}) = \begin{bmatrix} p_{u}^{F} \\ q_{i}^{F} \end{bmatrix}

(22)

Input the spliced vectors into the FM model, and finally get the output of FM

\hat{y}^{FM}

:

\begin{aligned} \hat{y}^{FM} &= w_{0} + \sum_{i=1}^{n} w_{i} x_{i} + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_{i}, v_{j} \rangle x_{i} x_{j} \\ &= w_{0} + \sum_{i=1}^{n} w_{i} x_{i} + \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{n} v_{i f} x_{i} \right)^{2} - \sum_{i=1}^{n} v_{i f}^{2} x_{i}^{2} \right) \end{aligned}

(23)
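The simplification of the second-order term can be checked numerically. The sketch below compares the explicit double sum of Equation (20) with the factorized form of Equation (21) and then assembles the FM output of Equation (23). The dimensions and random parameters are hypothetical; this is an illustration of the formulas, not the authors' implementation.

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """Second-order FM term as the explicit double sum in Eq. (20)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Same term via the simplification in Eq. (21): O(kn) instead of O(kn^2)."""
    s = V.T @ x                    # shape (k,): sum_i v_{if} x_i for each factor f
    s_sq = (V ** 2).T @ (x ** 2)   # shape (k,): sum_i v_{if}^2 x_i^2
    return 0.5 * float(np.sum(s ** 2 - s_sq))

def fm_predict(x, w0, w, V):
    """Full FM output as in Eq. (23)."""
    return w0 + float(w @ x) + fm_pairwise_fast(x, V)

# Hypothetical sizes: n input features, k latent factors.
rng = np.random.default_rng(0)
n, k = 6, 4
x = rng.normal(size=n)
V = rng.normal(size=(n, k))   # row i is the latent vector v_i
w = rng.normal(size=n)
w0 = 0.1

y_fm = fm_predict(x, w0, w, V)
```

The two pairwise functions agree up to floating-point error, which is why the factorized form is the one normally used in practice.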

The MLP introduces nonlinear elements through the activation function, and it can learn higher-order feature interactions through the multilayer neural network structure, which makes the model adapt to the complex user–news interaction relations. Similar to FM, the one-hot vectors of users and news

o_{u}

and

o_{i}

are first input into the embedding layer, which outputs the embedding vectors for MLP

p_{u}^{M}

and

q_{i}^{M}

:

p_{u}^{M} = \mathrm{embed}_{u_{2}}(o_{u}) = W_{u_{2}}^{T} o_{u}

(24)

q_{i}^{M} = \mathrm{embed}_{i_{2}}(o_{i}) = W_{i_{2}}^{T} o_{i}

(25)

Then the MLP embedding vectors are spliced together and fed into the MLP model. Since the

\mathrm{ReLU}

activation function has higher computational efficiency and better interpretability than the

\mathrm{sigmoid}

and

\tanh

functions, the MLP uses

\mathrm{ReLU}

as its nonlinear activation function, which enables the neural network to learn the higher-order nonlinear relationship between users and news. The output of the MLP is

\hat{y}^{MLP}

:

{\hat{y}}^{M L P} = W_{4}^{T} (W_{3}^{T} a (W_{2}^{T} a (W_{1}^{T} [\begin{array}{l} p_{u}^{M} \\ q_{i}^{M} \end{array}] + b_{1}) + b_{2}) + b_{3}) + b_{4}

(26)

\mathrm{ReLU}(x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases}

(27)
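A forward pass through the MLP part can be sketched as follows. The code follows Equation (26) as printed, where the activation is applied after the first two layers only. The layer widths, random weights, and function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def relu(z):
    """Eq. (27): identity for non-negative inputs, zero otherwise."""
    return np.maximum(z, 0.0)

def mlp_tower(p_u, q_i, Ws, bs):
    """Eq. (26): concatenate the two embeddings and apply four affine layers."""
    x = np.concatenate([p_u, q_i])
    h1 = relu(Ws[0].T @ x + bs[0])
    h2 = relu(Ws[1].T @ h1 + bs[1])
    h3 = Ws[2].T @ h2 + bs[2]      # no activation here, as in Eq. (26)
    return Ws[3].T @ h3 + bs[3]

# Hypothetical layer widths: 8 -> 8 -> 4 -> 2 -> 1.
rng = np.random.default_rng(0)
sizes = [8, 8, 4, 2, 1]
Ws = [rng.normal(size=(sizes[i], sizes[i + 1])) for i in range(4)]
bs = [np.zeros(sizes[i + 1]) for i in range(4)]

p_u_M = rng.normal(size=4)   # user embedding for the MLP part
q_i_M = rng.normal(size=4)   # news embedding for the MLP part
y_mlp = mlp_tower(p_u_M, q_i_M, Ws, bs)
```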

3.4. Hybrid Recommendation

The neural collaborative filtering model relies on deep neural networks with strong learning ability to learn interaction features between users and news. It considers the similarity between users and their nearest neighbors: similar users are found from their clicking behavior, and the news they prefer is recommended. Meanwhile, content-based news recommendation models user interest features with title–content matching taken into account. A hybrid approach combining content recommendation based on title–content matching with FM-based neural collaborative filtering predicts the final click probability.

User interest preference features based on title–content matching

U_{s}

are fitted with candidate news features

V_{news}

constructed based on BERT–CNN, yielding an output of

{\hat{y}}^{C}

. In neural collaborative filtering, the outputs of FM and MLP are

\hat{y}^{FM}

and

\hat{y}^{MLP}

, and these three outputs are mixed. Linear methods have weaker learning capability, while directly modeling cross-features of all orders over only three features tends to overfit and reduces generalization ability. A single-hidden-layer MLP is therefore used as the hybrid recommendation module.

ϕ_{2} = a (W_{1}^{T} [\begin{array}{l} {\hat{y}}^{F M} \\ {\hat{y}}^{M L P} \\ {\hat{y}}^{C} \end{array}] + b_{1})

(28)

\hat{y} = σ (W_{2}^{T} ϕ_{2} + b_{2})

(29)

\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \geq 0 \\ α x, & x < 0 \end{cases}

(30)

where

σ

is a

\mathrm{sigmoid}

function that maps values to between 0 and 1 to get the final predicted click probability

\hat{y}

,

a

is the

\mathrm{LeakyReLU}

function. As shown in Equation (30), the

\mathrm{LeakyReLU}

function outputs a small negative value when the input is less than 0, which keeps a small gradient for negative inputs; here

α

is a small value, often set to 0.01.
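The hybrid module of Equations (28) to (30) can be sketched as a small function that maps the three scores to a click probability. The hidden width, the random weights, and the example scores are hypothetical; only the structure (one LeakyReLU hidden layer followed by a sigmoid output) comes from the text.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Eq. (30): pass non-negative inputs through, scale negative inputs by alpha."""
    return np.where(z >= 0, z, alpha * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse(y_fm, y_mlp, y_c, W1, b1, W2, b2, alpha=0.01):
    """Eqs. (28)-(29): single-hidden-layer MLP over the three module outputs."""
    s = np.array([y_fm, y_mlp, y_c], dtype=float)
    phi2 = leaky_relu(W1.T @ s + b1, alpha)   # hidden representation, Eq. (28)
    return float(sigmoid(W2.T @ phi2 + b2))   # click probability, Eq. (29)

# Hypothetical hidden width and parameters.
rng = np.random.default_rng(0)
hidden = 8
W1 = rng.normal(size=(3, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(size=hidden)
b2 = 0.0

p_click = fuse(0.3, -1.2, 0.8, W1, b1, W2, b2)
```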

During the training process, Binary Cross Entropy (BCE) is used to compute the loss, as shown in Equation (31):

\mathrm{BCELoss}(\hat{y}, y) = -\frac{1}{n} \sum_{i=1}^{n} \left( y_{i} \log_{2} \hat{y}_{i} + (1 - y_{i}) \log_{2} (1 - \hat{y}_{i}) \right)

(31)

After calculating the loss, gradients are backpropagated, and the model iteratively updates its parameters until convergence.
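As a minimal sketch, Equation (31) can be written directly in NumPy. The clipping constant `eps` is an assumption added here to avoid taking the logarithm of zero; it is not part of the equation. The base-2 logarithm follows Equation (31) as printed; deep learning frameworks usually use the natural logarithm, which differs only by a constant factor of ln 2 and does not change the minimizer.

```python
import numpy as np

def bce_loss(y_hat, y, eps=1e-12):
    """Binary cross entropy as in Eq. (31), with base-2 logarithms."""
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    terms = y * np.log2(y_hat) + (1.0 - y) * np.log2(1.0 - y_hat)
    return float(-np.mean(terms))

loss_uninformed = bce_loss([0.5, 0.5], [1, 0])   # every prediction at 0.5
loss_confident = bce_loss([0.9, 0.1], [1, 0])    # predictions close to the labels
```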

4. Experimental Results and Analysis

4.1. Dataset and Parameter Settings

4.1.1. Dataset and Data Processing

This study selected the publicly available dataset released by the Datacastle competition, which was collected from the well-known financial news website Caixin. The dataset statistics are as shown in Table 1. This dataset mainly includes six items: user ID, news ID, browsing timestamp, news title, news content, and news publication time. This paper primarily utilizes the first five items.

To preprocess the dataset, we removed records in which any field is empty, removed records with title and content lengths less than 10, and deduplicated news IDs and user IDs, leaving 9395 users and 5853 news items. For duplicate browsing records, only the record with the latest timestamp is kept. Because news is time-sensitive, each user’s preprocessed browsing history was split in chronological order, with a training set to test set ratio of 5:1. The training set’s positive to negative sample ratio was 1:4, where negative samples were drawn at random from news not browsed by the user. The dataset contains 474,850 data samples, of which the test set contains 82,545 samples and the training set contains 392,305 samples. There are 78,461 positive samples and 313,844 negative samples in the training set.
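The chronological split and the 1:4 negative sampling can be sketched as follows. This is an illustration under stated assumptions: records are hypothetical `(timestamp, news_id)` tuples, negatives are drawn per positive sample, and the pool of unbrowsed news is assumed to hold at least four items. The paper does not specify these details.

```python
import random

def chronological_split(records, train_ratio=5, test_ratio=1):
    """Sort (timestamp, news_id) records by time; the most recent share is the test set."""
    ordered = sorted(records)
    cut = len(ordered) * train_ratio // (train_ratio + test_ratio)
    return ordered[:cut], ordered[cut:]

def sample_negatives(clicked, all_news, ratio=4, seed=0):
    """Label clicked news 1 and add `ratio` random unbrowsed news per click, labeled 0."""
    rng = random.Random(seed)
    pool = sorted(set(all_news) - set(clicked))
    samples = []
    for news_id in clicked:
        samples.append((news_id, 1))
        for neg in rng.sample(pool, ratio):
            samples.append((neg, 0))
    return samples

# Hypothetical toy data: 12 browsing records and a catalogue of 20 news IDs.
train, test = chronological_split([(t, t) for t in range(12)])
samples = sample_negatives([1, 2], range(20))
```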

4.1.2. Parameter Settings

In this paper, the model is implemented in the PyTorch 11.3 framework and optimized with Adam; the specific model parameters are listed in Table 2.

To evaluate the performance of the model presented in this paper, the following five widely used metrics are adopted: the Area Under Curve (AUC), which measures the ability of the recommendation system to distinguish between news of interest to the user and news not of interest, as shown in Equation (32); the Mean Reciprocal Rank (MRR), focusing on the ranking of the first positive sample in the recommendation list, emphasizing the accuracy of the list’s ranking, as shown in Equation (33); the Normalized Discounted Cumulative Gain (nDCG), an indicator for assessing the quality of ranking tasks, as shown in Equation (34); the F1 Score, which is the harmonic mean of precision and recall, aiming to consider both the accuracy and coverage of the classification model; and the Gini coefficient, used to measure the inequality of the recommendation results and to assess whether the distribution of user interest across different news is balanced. The final performance comparison was conducted on the test set. To ensure the accuracy of the experimental results, each experiment was repeated three times, and the average value of the evaluation metrics was taken as the final result.

\mathrm{AUC} = \frac{1}{|S|} \sum_{u=1}^{|S|} \frac{\sum_{υ \in v_{p}^{u}} r_{υ}^{u} - \frac{|v_{p}^{u}| \times (|v_{p}^{u}| + 1)}{2}}{|v_{p}^{u}| \times |v_{n}^{u}|}

(32)

\mathrm{MRR} = \frac{1}{|S|} \sum_{u=1}^{|S|} \frac{1}{\mathrm{rank}_{i}}

(33)

\begin{array}{l} \mathrm{nDCG}@K = \frac{\mathrm{DCG}_{u}@K}{\mathrm{IDCG}_{u}} \\ \mathrm{DCG}_{u}@K = \sum_{i=1}^{K} \frac{2^{\mathrm{rel}_{i}^{u}} - 1}{\log_{2}(i + 1)} \\ \mathrm{IDCG}_{u} = \max \left( \sum_{i=1}^{m} \frac{2^{\mathrm{rel}_{i}^{u}} - 1}{\log_{2}(i + 1)}, 1 \right) \end{array}

(34)

where

u

represents the user,

S

is the set of users,

v_{p}^{u}

and

v_{n}^{u}

denote positive and negative samples, respectively, and

r_{υ}^{u}

is the ranking of positive samples sorted by probability scores from lowest to highest.

\mathrm{rank}_{i}

is the rank of the

i

-th sample.

K

is the size of the recommendation list;

\mathrm{rel}_{i}^{u}

is the recommendation result at position

i

, where

\mathrm{rel}_{i}^{u}

equals 0 when the user does not click on the

i

-th news; otherwise, it equals 1.

\mathrm{DCG}_{u}

is the summation of values in the recommendation list with the predicted scores sorted from large to small, while

\mathrm{IDCG}_{u}

is the normalized

\mathrm{DCG}_{u}

in the ideal condition with the ground-truth labels sorted from 1 to 0.
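The per-user quantities inside Equations (32) to (34) can be sketched as follows; averaging them over all users gives the reported metrics. Two points are assumptions rather than statements from the paper: tied scores are not handled, and the upper limit m in the ideal DCG is taken to be the length of the user's candidate list.

```python
import numpy as np

def auc_user(scores, labels):
    """Rank-based AUC for one user (inner term of Eq. 32); ties are not handled."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return float((ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def reciprocal_rank(scores, labels):
    """1 / rank of the first clicked item when sorted by descending score (Eq. 33)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.flatnonzero(np.asarray(labels)[order] == 1)
    return float(1.0 / (hits[0] + 1)) if len(hits) else 0.0

def ndcg_at_k(scores, labels, k):
    """nDCG@K for one user following Eq. (34), with m = len(labels) (assumption)."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    discounts = 1.0 / np.log2(np.arange(2, len(labels) + 2))
    gains = 2.0 ** labels - 1.0
    dcg = float(np.sum((gains[order] * discounts)[:k]))
    idcg = float(np.sum(np.sort(gains)[::-1] * discounts))
    return dcg / max(idcg, 1.0)

# Hypothetical scores and click labels for one user.
scores = [0.9, 0.2, 0.6, 0.1]
labels = [1, 0, 0, 1]
user_auc = auc_user(scores, labels)
user_rr = reciprocal_rank(scores, labels)
user_ndcg = ndcg_at_k(scores, labels, k=2)
```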

4.2. Performance Evaluation

To evaluate the performance of the method presented in this paper, it was compared with recent advanced news recommendation algorithms, including:

-: NRMS [21], which utilizes word-level attention and multi-head self-attention to learn news representations from news titles and employs news-level multi-head self-attention to capture the relationships between historically browsed news for learning user representations.
-: NAML [23], which proposes a multi-view news learning method that considers titles, content, and categories as different views. View-level attention and word-level attention are used to learn news representations, and user representations are learned based on the news browsed by users.
-: LSTUR [42], which proposes a recommendation method incorporating long and short-term user interests, learning news representations from titles and topic categories, and employing GRU network to learn users’ short-term representations while shaping long-term user interests from their entire click history.
-: TANR [43], which proposes a neural news recommendation approach with topic-aware news representations and utilizes CNN to extract features from news titles and append news topic categorization. It learns the representations of users from their browsed news and uses attention networks to select informative news for user representation learning.
-: DFFA [26], which extracts the feature matrix of news text through a CNN with an injected attention mechanism. It adds time series prediction over the news that users have browsed and injects a multi-head self-attention mechanism to extract users’ interest characteristics.
-: Our Model-base, which extracts features of candidate news articles using the BERT model, combines semantic vectors of news with sequences of user historical behavior, employs GRU for sequential modeling of users, and integrates neural collaborative filtering based on FM.

The performance of this paper’s model and all of the baseline models on the Caixin dataset was evaluated and the results are shown in Table 3. All of the above methods are deep-learning-based news recommendation methods that are currently highly representative and widely compared, ensuring the authority of our comparisons. Additionally, the modules used in these methods are similar to those in our model. More importantly, they represent user interest models using either the title or a concatenation of the title and content, neglecting the issue of title–content mismatching. In contrast, our method constructs a user interest model based on title–content matching, making our comparative experiments more meaningful. There are several observations from the summary in Table 3.

(1) Models considering user sequence features, such as LSTUR, DFFA, and Our Model-base, outperform NAML, NRMS, and TANR. This indicates that the time series features of users’ historically browsed news articles can better reflect user characteristics, and accurate user interest modeling is crucial for news recommendation.

(2) Our Model-base and DFFA outperform other models, suggesting that the BERT model can delve deeper into news content semantics and consider news contextual features more effectively than other models.

(3) Our Model-base outperforms NAML, NRMS, TANR, LSTUR, and DFFA, indicating that fusing FM-based neural collaborative filtering can alleviate data sparsity while improving the quality of recommendations.

(4) The method presented in this paper consistently outperforms compared baseline models, further indicating that addressing the issue of title–content mismatching is vital for accurately modeling user interest preferences and improving the quality of news recommendations. Unlike baseline methods, our approach models the context of titles and content with interactive attention networks on top of deeply extracting news semantics with BERT, enhancing news understanding to help more accurately measure title and content relevance, and judging the degree of title–content matching to help more accurately model user interest preferences. Furthermore, integrating improved neural collaborative filtering and considering the potential relationships between users for recommendations alleviates the issue of title–content mismatching without considering text content.

As shown in Table 4, the metrics of Our Model are better than those of Our Model-base, because modeling users based on title–content matching captures user interest preferences more accurately. The Gini coefficient is used to measure the diversity of recommendations in a recommender system. It differs little between the two methods because both use hybrid recommendation, which can effectively improve recommendation performance.
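The paper does not state the formula it uses for the Gini coefficient. The sketch below uses the standard definition, applied to how often each news item is recommended. Both the formula choice and the input (per-item recommendation counts) are assumptions made here for illustration.

```python
import numpy as np

def gini(counts):
    """Standard Gini coefficient of non-negative counts: 0 means a perfectly even spread."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)
    return float(np.sum((2 * i - n - 1) * x) / (n * x.sum()))

# Hypothetical recommendation counts for four news items.
even_spread = gini([1, 1, 1, 1])     # every item recommended equally often
concentrated = gini([0, 0, 0, 4])    # all recommendations go to one item
```

Under this definition a larger value means recommendations are concentrated on fewer items.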

4.3. Ablation Study

4.3.1. Hyperparameter Analysis

(1): User Historical Sequence Length

The impact of the length of user historical sequences on recommendation effectiveness was considered. The variation of recommendation performance with different numbers of historical browsing actions is illustrated in Figure 5, where the horizontal axis represents the evaluation metrics and the vertical axis represents the values of these metrics. The F1 Score and Gini coefficient are shown in Figure 5b, with values ranging from 0.34 to 0.38. The results indicate that recommendation effectiveness improves with the increase in the length of user historical sequences. This improvement occurs because, when the user historical behavior sequence is too short, only a recent period of browsing records is used, lacking sufficient behavioral data to accurately understand user interest preferences. When the user historical sequence is too long, performance begins to decline. This decline is due to user interests changing over time, where some older news has less reference value, and not all news will influence user interest preferences. Therefore, the length of user historical sequences affects the accuracy of user interest preference modeling, thereby influencing recommendation performance. Setting the user historical sequence to a moderate value, such as 35, is more suitable for our method. When user historical reading records exceed this range, the earliest part of the reading time is taken.
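The truncation rule described above can be sketched as follows, assuming each browsing record is a hypothetical `(timestamp, news_id)` tuple and reading "the earliest part" literally. Keeping the most recent records instead would only change the slice.

```python
def truncate_history(records, max_len=35):
    """Keep at most `max_len` browsing records, taking the earliest ones by timestamp."""
    ordered = sorted(records, key=lambda r: r[0])
    return ordered[:max_len]

# Hypothetical history of 50 records with integer timestamps.
history = [(t, f"news_{t}") for t in range(50)]
kept = truncate_history(history)
```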

(2): Impact of the Title–Content Matching Threshold λ

The effect of the threshold

λ

of title–content matching on the performance of the method in this paper was investigated. The recommendation performances with different

λ

values are shown in Figure 6. The results indicate that the performance of our method gradually increases with the growth of

λ

. This is because the value of

λ

is key to constructing user interest features, and a too-small

λ

would result in the underutilization of title–content matching detection, with most news being judged as title–content matching, leading to inaccurate user interest modeling. When

λ

is too large, performance begins to decline. This is because a too-large

λ

results in most news being deemed title–content mismatching, causing user interest features to overly derive from clicked news titles without adequately utilizing news content, thus losing important information and not sufficiently mining user interests. In Figure 6b, the Gini coefficient represents the diversity of recommendation. The better the performance of the recommendation, the lower the diversity of this method. Therefore, setting the value of

λ

to 0.75 is more suitable for the dataset used in this paper.
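The role of the threshold can be illustrated with a small gating function: news whose matching degree reaches λ contributes both title and content to the user interest features, while mismatched news contributes the title only. Zeroing the content slot, the `>=` comparison, and the function name are hypothetical choices made for this sketch and may differ from the construction used in the paper.

```python
import numpy as np

def news_feature(title_vec, content_vec, match_score, lam=0.75):
    """Gate a clicked news item on its title-content matching degree."""
    if match_score >= lam:
        # Title and content are judged to match: use both.
        return np.concatenate([title_vec, content_vec])
    # Judged mismatched: keep the title and zero out the content slot.
    return np.concatenate([title_vec, np.zeros_like(content_vec)])

# Hypothetical toy vectors.
title = np.ones(3)
content = np.full(2, 2.0)
matched = news_feature(title, content, match_score=0.9)
mismatched = news_feature(title, content, match_score=0.5)
```

With a very small λ almost every item takes the first branch, and with a very large λ almost every item takes the second, which mirrors the two failure modes discussed above.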

4.3.2. The Effectiveness of the Title–Content Matching Degree Detection Module

In this section, experiments were conducted to verify the effectiveness of the title–content matching degree detection module used in this paper. Two variant models were set up, and two ways of computing the matching degree were compared:

Title + Content: this variant removes the title–content matching degree detection module and directly concatenates the title and content processed through BERT and the interactive attention network, serving as the feature representation of the news;
Title: only the titles of the news articles browsed by the user are used as representations of the user’s interest features.
Cosine similarity: this approach directly uses cosine similarity to calculate the title–content matching degree;
Siamese network: this method employs a Siamese network to calculate the title–content matching degree.

In the experiments, the optimal hyperparameters validated earlier were used, and the results are shown in Figure 7. The results indicate that directly concatenating the title and content, as well as using only the title, yields the worst effects, demonstrating the crucial role of title–content matching degree detection in the methodology of this paper. This suggests that mismatches between titles and content can lead to the imprecise modeling of user features, and conducting title–content matching degree checks before constructing user interest preferences helps to correct the user interest preferences, which is vital for accurately modeling user interests. The effect of using only the title is worse than concatenating the title and content, indicating that titles are relatively short and cannot accurately express users’ interests. Moreover, when calculating the title–content matching degree, using a Siamese neural network outperforms the direct use of cosine similarity. This indicates that the Siamese neural network, employing Bi-LSTM at the encoding layer, can help learn more accurate representations of titles and content for the computation of the title–content matching degree, further demonstrating the effectiveness of the title–content matching degree detection module based on the Siamese neural network adopted in this paper.
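For reference, the cosine similarity baseline compared above can be sketched as a direct comparison of a title vector and a content vector. The small constant `eps` is an assumption added to guard against zero-length vectors. The Siamese variant differs in that both inputs first pass through a shared encoder before being compared; that encoder is not sketched here.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two vectors, used directly as a matching degree."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Hypothetical title and content vectors.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
```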

4.3.3. The Effectiveness of the Interactive Attention Network

In this section, the impact of the interactive attention network on the performance of our method is explored, as shown in Figure 8. The results show that our method performs significantly better than the variant without the interactive attention network module, which demonstrates the effectiveness of the interactive attention network. This is because the interactive attention network helps capture the relationship between news titles and content, combining the advantages of representation-based and interaction-based approaches to enhance the representations of titles and content. This facilitates the assessment of the relevance between the title and the body, aiding in the calculation of the title–content matching degree.

4.3.4. The Effectiveness of Neural Collaborative Filtering Based on FM

In this section, the effectiveness of the neural collaborative filtering based on FM in our method is intuitively explored from three perspectives:

w/o NCF: A method without the integration of neural collaborative filtering, solely utilizing content-based deep learning for recommendations;
NCF: A method that integrates deep learning approaches based on title–content matching degree detection with traditional neural collaborative filtering;
FM–NCF: A method that integrates deep learning methods based on title–content matching degree detection with neural collaborative filtering based on FM.

The experiments used the optimal hyperparameters verified above, and the results are shown in Figure 9. The results indicate that recommendations integrating neural collaborative filtering significantly outperform those based only on deep learning. This demonstrates that neural collaborative filtering can alleviate the issue of title–content mismatching and enhance recommendation performance. This is because neural collaborative filtering does not rely on any additional information, meaning it does not consider text content, thus avoiding the issue of title–content mismatching altogether and recommending based solely on the user’s nearest neighbors’ other clicked news. As shown in Figure 9b, the diversity metric significantly decreases after integrating neural collaborative filtering, which is due to NCF often emphasizing popular news in user–news interaction data, increasing inequality in the recommendation results. Moreover, neural collaborative filtering, by solving the title–content mismatching issue, makes user interest modeling too precise, overlooking users’ diverse needs. Furthermore, neural collaborative filtering based on FM significantly outperforms traditional neural collaborative filtering, indicating that FM-based neural collaborative filtering can effectively improve recommendation performance. This is because FM can learn the implicit feature vectors of interactions between users and news, flexibly handling the complex relationships between high-order non-linear features. Both ablation experiment results validate the effectiveness of integrating neural collaborative filtering based on FM in this paper.

5. Conclusions

In this paper, we propose a hybrid recommendation method that considers the degree of title–content matching. An interactive attention mechanism is employed in the deep learning recommendation module to learn the relevance between titles and content. Title–content matching is assessed before modeling user interest preferences. In the neural collaborative filtering module, we utilize FM instead of the original GMF to model both low-order and high-order feature interactions between users and news articles, alleviating data sparsity and considering the latent relationships among users. The final recommendation result is a blend of deep learning recommendations based on title–content matching degree detection and neural collaborative filtering recommendations based on FM. Extensive experiments have been conducted on a real-world news dataset. The proposed model achieved an nDCG of 83.03%, MRR of 81.88%, AUC of 85.22%, and F1 Score of 35.10%. These results indicate that our method outperforms many baseline models in recommendation performance, thereby verifying the effectiveness of our approach.

However, we did not consider the title–content matching degree of candidate news in this paper. Moreover, our proposed method is more suitable for entertainment news and news with exaggerated titles. In future work, we will focus on the candidate news recommended to users to improve user satisfaction. Additionally, we plan to further optimize the performance of the recommendation algorithm, especially in addressing the cold start problem and enhancing user experience. We aim to apply transfer learning and incremental learning, dynamically model user interests, perform sentiment analysis on the news articles a user has historically browsed, and explore the application of our method on larger and more diverse datasets.

Author Contributions

Conceptualization, S.J. and Y.L.; methodology, Y.L.; software, Y.L. and H.S.; validation, Y.L., Z.L. and H.S.; formal analysis, Y.Z.; investigation, Y.L.; resources, Y.Z.; data curation, Z.L.; writing—original draft preparation, Y.L.; writing—review and editing, S.J.; visualization, Y.L. and H.S.; supervision, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank all the reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wu, C.H.; Wu, F.Z.; Huang, Y.F.; Xie, X. Personalized News Recommendation: Methods and Challenges. ACM Trans. Inf. Syst. 2023, 41, 1–50.
Jiang, S.; Shang, J.; Guo, J.; Zhang, Y. Multi-Strategy Improved Flamingo Search Algorithm for Global Optimization. Appl. Sci. 2023, 13, 5612.
Jiang, S.; Wang, M.; Guo, J.; Wang, M. K-means clustering algorithm based on improved flower pollination algorithm. J. Electron. Imaging 2023, 32, 032003.
Okura, S.; Tagami, Y.; Ono, S.; Tajima, A. Embedding-based news recommendation for millions of users. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1933–1942.
Wang, H.; Zhang, F.; Xie, X.; Guo, M. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1835–1844.
Fan, L.; Xu, F.; Sun, Y.; Zhou, H. News Recommendation Algorithm Based on Multiple Perspectives. In Proceedings of the 2022 International Conference on Computer Network, Electronic and Automation (ICCNEA), Xi’an, China, 23–25 September 2022; pp. 21–25.
Wu, C.; Wu, F.; Qi, T.; Huang, Y. Clickbait detection with style-aware title modeling and co-attention. In Proceedings of the Chinese Computational Linguistics: 19th China National Conference, CCL 2020, Haikou, China, 30 October–1 November 2020; Proceedings 19. pp. 430–443.
Wei, F.; Nguyen, U.T. A neural attentive model using human semantic knowledge for clickbait detection. In Proceedings of the 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), Exeter, UK, 17–19 December 2020; pp. 770–776.
Mao, Z.; Zeng, X.; Wong, K.-F. Neural news recommendation with collaborative news encoding and structural user encoding. arXiv 2021, arXiv:2109.00750.
Wu, C.; Wu, F.; Qi, T.; Huang, Y. User Modeling with Click Preference and Reading Satisfaction for News Recommendation. In Proceedings of the IJCAI, Yokohama, Japan, 7–15 January 2021; pp. 3023–3029.
Jeong, E.; Kim, G.; Kang, S. Multimodal Prompt Learning in Emotion Recognition Using Context and Audio Information. Mathematics 2023, 11, 2908.
Nadimi-Shahraki, M.-H.; Bahadorpour, M. Cold-start problem in collaborative recommender systems: Efficient methods based on ask-to-rate technique. J. Comput. Inf. Technol. 2014, 22, 105–113.
Darvishy, A.; Ibrahim, H.; Sidi, F.; Mustapha, A. HYPNER: A Hybrid Approach for Personalized News Recommendation. IEEE Access 2020, 8, 46877–46894.
He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.-S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
Wang, J.; Yu, G. Movie Recommendation System Based on Improved Neural Collaborative Filtering Model. Comput. Eng. Des. 2020, 41, 2069–2075.
Ji, Z.; Wu, M.; Yang, H.; Íñigo, J.E.A. Temporal sensitive heterogeneous graph neural network for news recommendation. Future Gener. Comput. Syst. 2021, 125, 324–333.
Huang, J.S.; Han, Z.B.; Xu, H.Y.; Liu, H.T. Adapted transformer network for news recommendation. Neurocomputing 2022, 469, 119–129.
Jiang, H.; Li, C.Z.; Cai, J.J.; Wang, J.L. MNN4Rec: A relation-aware approach based on multi-view news network for news recommendation. J. Inf. Sci. 2023.
Qi, T.; Wu, F.; Wu, C.; Yang, P.; Yu, Y.; Xie, X.; Huang, Y. HieRec: Hierarchical user interest modeling for personalized news recommendation. arXiv 2021, arXiv:2106.04408.
Santosh, T.; Saha, A.; Ganguly, N. MVL: Multi-View Learning for News Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Electr Network, Virtual Event, China, 25–30 July 2020; pp. 1873–1876.
Wu, C.; Wu, F.; Ge, S.; Qi, T.; Huang, Y.; Xie, X. Neural news recommendation with multi-head self-attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6389–6394.
Zhu, Q.; Zhou, X.; Song, Z.; Tan, J.; Guo, L. Dan: Deep attention neural network for news recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5973–5980.
Wu, C.; Wu, F.; An, M.; Huang, J.; Huang, Y.; Xie, X. Neural news recommendation with attentive multi-view learning. arXiv 2019, arXiv:1907.05576.
Wu, C.; Wu, F.; Huang, Y.; Xie, X. Neural news recommendation with negative feedback. CCF Trans. Pervasive Comput. Interact. 2020, 2, 178–188.
Li, A.; He, T.; Guo, Y.; Li, Z.; Rong, Y.; Liu, G. Personalized News Recommendation with CNN and Multi-Head Self-Attention. In Proceedings of the 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 26–29 October 2022; pp. 0102–0108.
Liu, Y.Q.; Liu, Y.Q.; Zhang, Z.L.; Wei, Z.H.; Miao, R. News Recommendation Model with Deep Feature Fusion Injected with Attention Mechanism. J. Comput. Appl. 2022, 42, 426–432.
Zhu, P.; Cheng, D.; Luo, S.; Yang, F.; Luo, Y.; Qian, W.; Zhou, A. SI-News: Integrating social information for news recommendation with attention-based graph convolutional network. Neurocomputing 2022, 494, 33–42.
Bourgonje, P.; Schneider, J.M.; Rehm, G. From clickbait to fake news detection: An approach based on detecting the stance of headlines to articles. In Proceedings of the 2017 EMNLP Workshop: Natural Language Processing Meets Journalism, Copenhagen, Denmark, 7 September 2017; pp. 84–89.
Dong, M.; Yao, L.; Wang, X.; Benatallah, B.; Huang, C. Similarity-aware deep attentive model for clickbait detection. In Advances in Knowledge Discovery and Data Mining, Proceedings of the 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, 14–17 April 2019; Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2019; pp. 56–69.
Yin, P.B.; Pan, W.J.; Zhang, H.J.; Chen, D.G. Research on Clickbait News Identification Based on BERT-BiGA Model. J. Data Anal. Knowl. Discov. 2021, 5, 126–134.
Zheng, J.; Yu, K.; Wu, X. A deep model based on lure and similarity for adaptive clickbait detection. Knowl.-Based Syst. 2021, 214, 106714.
Meng, Q.; Liu, B.; Sun, X.; Yan, H.; Liang, C.; Cao, J.; Lee, R.K.-W.; Bao, X. Attention-fused deep relevancy matching network for clickbait detection. IEEE Trans. Comput. Soc. Syst. 2022, 10, 3120–3131.
Szczepański, M.; Pawlicki, M.; Kozik, R.; Choraś, M. New explainability method for BERT-based model in fake news detection. Sci. Rep. 2021, 11, 23705.
Kula, S.; Choraś, M.; Kozik, R. Application of the BERT-based architecture in fake news detection. In 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020); Springer: Cham, Switzerland, 2021; pp. 239–249.
Huang, P.-S.; He, X.; Gao, J.; Deng, L.; Acero, A.; Heck, L. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 2333–2338.
Zang, J.; Zhou, W.L.; Wang, Y. Semantic Matching Method Combining Multi-Head Attention Mechanism and Siamese Network. J. Comput. Sci. 2023, 50, 294–301.
Ge, S.; Wu, C.; Wu, F.; Qi, T.; Huang, Y. Graph enhanced representation learning for news recommendation. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 2863–2869.
Yamak, P.T.; Yujian, L.; Gadosey, P.K. A comparison between arima, lstm, and gru for time series forecasting. In Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 20–22 December 2019; pp. 49–55.
Li, X.; Li, R.; Peng, Q.; Ma, H. Candidate-Aware Attention Enhanced Graph Neural Network for News Recommendation. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Guangzhou, China, 16–18 August 2023; pp. 244–255.
Wang, Y.; Chen, Y. A Hybrid Recommendation Algorithm Combining Content and Matrix Factorization. J. Comput. Appl. Res. 2020, 37, 1359–1363.
Rendle, S. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; pp. 995–1000.
An, M.; Wu, F.; Wu, C.; Zhang, K.; Liu, Z.; Xie, X. Neural news recommendation with long-and short-term user representations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 336–345.
Wu, C.; Wu, F.; An, M.; Huang, Y.; Xie, X. Neural news recommendation with topic-aware news representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1154–1159.

Figure 1. News articles clicked on by users.

Figure 2. The overall framework of our approach.

Figure 3. User interest modeling framework based on the detection of the consistency between titles and content.

Figure 4. Structure of Bi-LSTM neural network model.

Figure 5. Variation of indicators with length of historical series. (a) Variation of indicators with length of historical series—nDCG, MRR, and AUC. (b) Variation of indicators with length of historical series—F1 Score and Gini Coefficient.

Figure 6. Variation of each metric with the value of λ. (a) nDCG, MRR, and AUC. (b) F1 Score and Gini Coefficient.

Figure 7. Effectiveness of title–content matching detection. (a) Effectiveness of title–content matching detection—nDCG, MRR, and AUC. (b) Effectiveness of title–content matching detection—F1 Score and Gini Coefficient.

Figure 8. Effectiveness of interactive attention. (a) Effectiveness of interactive attention–nDCG, MRR, and AUC. (b) Effectiveness of interactive attention–F1 Score and Gini Coefficient.

Figure 9. Effectiveness of neural collaborative filtering based on FM. (a) Effectiveness of neural collaborative filtering based on FM—nDCG, MRR, and AUC. (b) Effectiveness of neural collaborative filtering based on FM—F1 Score and Gini Coefficient.

Table 1. Detailed dataset statistics.

Statistic	Value
Number of users	9457
Number of news articles	100,197
Average words per news title	14.0
Average words per news content	584.0

Table 2. Hyperparameter values.

Name	Value
BERT model	alibaba-pai/pai-bert-tiny-zh
Batch size	64
Learning rate	2.00 × 10⁻³
Num epochs	8
Embedding dim	256
Click num	35
Title length	60
Content length	300
Candidate news word length	300
Num attention heads	16
Num GRU layers	2
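
For reproduction purposes, the settings in Table 2 can be gathered into a single configuration object. The sketch below is only an illustration and not the authors' code: the class name `HybridRecConfig` and all field names are hypothetical, and only the values are taken from Table 2.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HybridRecConfig:
    """Hypothetical container for the Table 2 settings (field names are assumptions)."""
    bert_model: str = "alibaba-pai/pai-bert-tiny-zh"
    batch_size: int = 64
    learning_rate: float = 2.0e-3
    num_epochs: int = 8
    embedding_dim: int = 256
    click_num: int = 35  # presumably the number of historical clicks per user (assumption)
    title_length: int = 60
    content_length: int = 300
    candidate_news_word_length: int = 300
    num_attention_heads: int = 16
    num_gru_layers: int = 2


config = HybridRecConfig()
print(asdict(config))
```

Freezing the dataclass keeps the values fixed during a run, which makes it easier to log exactly which settings produced a given result.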

Table 3. The performance of the different methods on the Caixin dataset.

Methods	MRR (%)	nDCG@5 (%)
NAML	75.59	78.25
NRMS	76.40	79.51
TANR	75.72	80.23
LSTUR	78.88	82.38
DFFA	78.22	82.14
Our Model-base	81.36	82.48
Our Model	81.88	83.03
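
The improvement figures quoted in the abstract (0.65 in nDCG and 3 in MRR) appear to correspond to the gap between "Our Model" and LSTUR, the strongest baseline in Table 3. The following sketch recomputes those gaps from the table values; the dictionary layout and the function name are illustrative assumptions.

```python
from typing import Tuple

# Values copied from Table 3 (percent): method -> (MRR, nDCG@5).
TABLE_3 = {
    "NAML": (75.59, 78.25),
    "NRMS": (76.40, 79.51),
    "TANR": (75.72, 80.23),
    "LSTUR": (78.88, 82.38),
    "DFFA": (78.22, 82.14),
    "Our Model-base": (81.36, 82.48),
    "Our Model": (81.88, 83.03),
}
BASELINES = ["NAML", "NRMS", "TANR", "LSTUR", "DFFA"]


def gain_over_best_baseline(metric_index: int) -> Tuple[str, float]:
    """Return the best baseline for one metric and the absolute gain over it."""
    best = max(BASELINES, key=lambda name: TABLE_3[name][metric_index])
    gain = TABLE_3["Our Model"][metric_index] - TABLE_3[best][metric_index]
    return best, round(gain, 2)


print("MRR:", gain_over_best_baseline(0))
print("nDCG@5:", gain_over_best_baseline(1))
```

Both gaps are absolute differences in percentage points, not relative changes.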

Table 4. Performance of the base model compared with the full model.

Methods	MRR	AUC	nDCG@5	nDCG@10	F1 Score	Gini
Our Model-base	81.36	84.77	82.48	83.56	34.85	37.43
Our Model	81.88	85.22	83.03	84.16	35.10	37.15
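
As a reminder of what the ranking columns measure, the sketch below computes the reciprocal rank and nDCG@k for a single impression with binary click labels, using the standard textbook definitions. The exact evaluation procedure used in the experiments is not reproduced here, so the function names and the toy data are assumptions.

```python
import math
from typing import Sequence


def reciprocal_rank(labels_in_ranked_order: Sequence[int]) -> float:
    """1 / rank of the first clicked item, or 0.0 if nothing was clicked."""
    for rank, label in enumerate(labels_in_ranked_order, start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(labels_in_ranked_order: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    return sum(
        (2 ** label - 1) / math.log2(rank + 1)
        for rank, label in enumerate(labels_in_ranked_order[:k], start=1)
    )


def ndcg_at_k(labels_in_ranked_order: Sequence[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg_at_k(labels_in_ranked_order, k) / ideal if ideal > 0 else 0.0


# Toy impression: click labels of five candidates, listed in the model's ranked order.
labels = [0, 1, 0, 0, 1]
print(reciprocal_rank(labels))
print(ndcg_at_k(labels, k=5))
```

MRR is then the mean of the reciprocal ranks over all impressions, and the reported nDCG@5 and nDCG@10 would be the corresponding means of `ndcg_at_k` with k = 5 and k = 10.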

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jiang, S.; Lu, Y.; Song, H.; Lu, Z.; Zhang, Y. A Hybrid News Recommendation Approach Based on Title–Content Matching. Mathematics 2024, 12, 2125. https://doi.org/10.3390/math12132125

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
