
Neural Topic Modeling via Discrete Variational Inference

Published: 16 February 2023

Abstract

Topic models extract commonly occurring latent topics from textual data. Statistical models such as Latent Dirichlet Allocation do not produce dense topic embeddings that can be readily integrated into neural architectures, whereas earlier neural topic models have yet to fully exploit the discrete nature of the topic space. To bridge this gap, we propose a novel neural topic model, the Discrete-Variational-Inference-based Topic Model (DVITM), which learns dense topic embeddings homomorphic to word embeddings via discrete variational inference. The model also views words as mixtures of topics and digests embedded input text. Quantitative and qualitative evaluations empirically demonstrate the superior performance of DVITM over important baseline models. Finally, case studies on text generation from a discrete space and aspect-aware item recommendation further illustrate the power of our model in downstream tasks.
Appendices

A Baseline Topic Modeling Architectures

Fig. 8. ProdLDA with Gaussian distribution re-parameterization trick.
Fig. 9. Embedded topic model with Gaussian distribution re-parameterization trick (ETM).
Fig. 10. ProdLDA with Gumbel-Softmax re-parameterization trick (ProdLDA\(^\prime\)).
Fig. 11. ETM with Gumbel-Softmax re-parameterization trick (ETM\(^\prime\)).

B Latent Topics Identified by Various Models

Model | Topic ID | Topic Coherence | Top 10 nearest words
ProdLDA | 0 | 0.4658 | [minnesota, angeles, toronto, san, montreal, vancouver, ottawa, stanley, calgary, louis]
ProdLDA | 1 | 0.3945 | [finish, play, team, hitter, offense, smith, defensive, ice, tie, hit]
ProdLDA | 2 | 0.5129 | [christian, god, scripture, interpretation, jesus, resurrection, teaching, doctrine, existence, holy]
ProdLDA | 3 | 0.3626 | [troops, israel, border, turks, army, israeli, fire, arab, civilian, minority]
ProdLDA | 4 | 0.5116 | [ram, windows, quadra, fine, speed, scsi, faster, microsoft, apple, external]
ProdLDA | 5 | 0.3740 | [entry, char, compile, db, file, section, variable, contest, distribution, remark]
ProdLDA | 6 | 0.5557 | [christian, scripture, god, teaching, resurrection, existence, jesus, christianity, doctrine, biblical]
ProdLDA | 7 | 0.3890 | [ibm, interface, transfer, external, virtual, cpu, ram, dec, path, default]
ProdLDA | 8 | 0.4294 | [fire, car, safety, btw, andy, surrender, country, rider, cold, stupid]
ProdLDA | 9 | 0.4889 | [wiretap, escrow, drug, agency, warrant, clipper, illegal, encryption, crime, country]
ProdLDA | 10 | 0.4657 | [wiretap, encryption, escrow, nsa, chip, clipper, pat, cheaper, scheme, car]
ProdLDA | 11 | 0.4874 | [turks, armenian, army, turkish, russian, armenia, united, mountain, armenians, director]
ProdLDA | 12 | 0.4587 | [music, fine, crash, ram, sl, external, simm, honda, wm, hd]
ProdLDA | 13 | 0.4824 | [bmw, car, bike, rider, baseball, btw, cop, ball, hit, motorcycle]
ProdLDA | 14 | 0.4628 | [satellite, mission, distribute, nasa, spacecraft, km, space, earth, distribution, module]
ProdLDA | 15 | 0.5309 | [fine, ram, windows, apple, amp, simm, external, scsi, crash, quadra]
ProdLDA | 16 | 0.4720 | [agency, encryption, cryptography, telephone, wiretap, enforcement, des, privacy, distribution, encrypt]
ProdLDA | 17 | 0.3659 | [neighbor, christian, heart, harm, building, scripture, jesus, daughter, woman, holy]
ProdLDA | 18 | 0.4507 | [scsi, ram, hd, external, mhz, meg, scsus, ide, fine, bus]
ProdLDA | 19 | 0.5172 | [sl, wm, ram, mi, external, hd, connector, mg, mb, mw]
ProdLDA\(^\prime\) | 0 | 0.3261 | [first, series, information, next, team, get, san, mailing, use, also]
ProdLDA\(^\prime\) | 1 | 0.5680 | [buf, hp, exit, oname, char, printf, toolkit, mov, bh, saturn]
ProdLDA\(^\prime\) | 2 | 0.6156 | [wings, flyers, puck, pit, leafs, scripture, que, hitter, resurrection, penalty]
ProdLDA\(^\prime\) | 3 | 0.5557 | [lebanese, armenians, apartment, troops, azerbaijan, hitter, wings, coach, armenian, armenia]
ProdLDA\(^\prime\) | 4 | 0.3214 | [key, strong, secret, warrant, people, la, phone, des, algorithm, rights]
ProdLDA\(^\prime\) | 5 | 0.3737 | [power, little, league, game, head, go, team, make, turn, sport]
ProdLDA\(^\prime\) | 6 | 0.4611 | [lebanese, muslim, batf, bike, lebanon, arabs, troops, witness, apartment, massacre]
ProdLDA\(^\prime\) | 7 | 0.5425 | [coach, braves, leafs, hitter, rangers, tor, playoff, stanley, puck, pat]
ProdLDA\(^\prime\) | 8 | 0.5674 | [scsus, sl, mb, quadra, hd, meg, pd, byte, motherboard, workstation]
ProdLDA\(^\prime\) | 9 | 0.6024 | [satan, ford, simm, resurrection, bmw, bike, doctrine, gear, quadra, scripture]
ProdLDA\(^\prime\) | 10 | 0.6974 | [encrypt, encryption, wiretap, escrow, crypto, cipher, rsa, anonymous, pgp, cryptography]
ProdLDA\(^\prime\) | 11 | 0.3734 | [vehicle, police, chip, technique, transmission, dealer, cop, car, traffic, radar]
ProdLDA\(^\prime\) | 12 | 0.6243 | [mi, eus, spacecraft, mw, moon, mg, orbit, ah, rg, ax]
ProdLDA\(^\prime\) | 13 | 0.3331 | [want, buy, like, drive, solution, control, people, know, driver, anyone]
ProdLDA\(^\prime\) | 14 | 0.4621 | [armenia, morality, proceed, revelation, bmw, verse, turks, soul, resurrection, bike]
ProdLDA\(^\prime\) | 15 | 0.5764 | [xlib, toolkit, pixel, xterm, visual, motif, turbo, microsoft, printf, meg]
ProdLDA\(^\prime\) | 16 | 0.4564 | [bhj, spacecraft, bh, eus, wm, byte, device, ripem, wire, digital]
ProdLDA\(^\prime\) | 17 | 0.4783 | [mhz, mb, adapter, bhj, scsi, scsus, windows, wm, hd, isa]
ProdLDA\(^\prime\) | 18 | 0.3429 | [escape, window, conflict, see, document, terrorist, let, shell, motif, cambridge]
ProdLDA\(^\prime\) | 19 | 0.4948 | [militia, arab, palestinian, homicide, lebanese, arabs, turks, armenia, troops, armenians]
ETM | 0 | 0.3489 | [get, know, one, say, think, like, see, thing, people, time]
ETM | 1 | 0.3254 | [use, may, make, case, many, also, part, however, system, president]
ETM | 2 | 0.4505 | [god, jesus, christian, say, believe, bible, one, christ, make, belief]
ETM | 3 | 0.3682 | [gun, people, child, kill, drug, crime, weapon, police, case, claim]
ETM | 4 | 0.5255 | [datum, space, db, launch, output, hus, widget, dod, nasa, sun]
ETM | 5 | 0.3390 | [file, use, program, send, available, list, code, email, please, line]
ETM | 6 | 0.4139 | [hockey, team, new, division, san, canada, nhl, toronto, york, gm]
ETM | 7 | 0.3439 | [new, look, buy, price, good, sell, include, package, offer, like]
ETM | 8 | 0.3851 | [car, power, use, drive, speed, engine, wire, water, fast, low]
ETM | 9 | 0.4651 | [game, year, play, win, team, player, season, go, good, run]
ETM | 10 | 0.3312 | [information, make, get, please, mail, use, go, file, help, take]
ETM | 11 | 0.3378 | [book, first, one, time, science, study, earth, author, history, find]
ETM | 12 | 0.3516 | [university, group, internet, information, computer, fax, center, year, call, research]
ETM | 13 | 0.3628 | [thanks, david, john, appreciate, steve, mark, wonder, jim, mike, michael]
ETM | 14 | 0.3749 | [write, article, post, question, read, opinion, ask, please, yes, answer]
ETM | 15 | 0.3397 | [period, la, pt, vs, de, van, pp, cal, power, second]
ETM | 16 | 0.3607 | [key, government, law, use, encryption, state, chip, public, right, security]
ETM | 17 | 0.3354 | [go, take, back, one, day, put, get, right, also, call]
ETM | 18 | 0.3987 | [use, drive, system, window, card, run, windows, disk, problem, image]
ETM | 19 | 0.4854 | [israel, people, war, israeli, jews, turkish, armenians, country, armenian, government]
ETM\(^\prime\) | 0 | 0.3749 | [use, wiring, connector, code, line, voltage, get, ground, find, might]
ETM\(^\prime\) | 1 | 0.3876 | [game, read, news, times, know, beat, hear, braves, go, back]
ETM\(^\prime\) | 2 | 0.5399 | [apple, macintosh, amiga, graphics, processor, pc, modem, computer, server, printer]
ETM\(^\prime\) | 3 | 0.3503 | [know, want, get, really, say, see, ca, never, think, tell]
ETM\(^\prime\) | 4 | 0.3341 | [think, know, name, really, something, people, many, feel, thing, like]
ETM\(^\prime\) | 5 | 0.4001 | [gm, please, want, get, maybe, know, somebody, make, lot, hey]
ETM\(^\prime\) | 6 | 0.3833 | [book, verse, copy, write, author, manual, guide, reader, story, edition]
ETM\(^\prime\) | 7 | 0.4091 | [tax, billion, budget, pay, fee, federal, package, dollar, money, please]
ETM\(^\prime\) | 8 | 0.3644 | [get, know, want, think, tell, go, really, take, find, ask]
ETM\(^\prime\) | 9 | 0.3416 | [god, know, say, think, see, like, make, use, want, one]
ETM\(^\prime\) | 10 | 0.3316 | [like, make, use, write, go, many, get, help, need, know]
ETM\(^\prime\) | 11 | 0.3827 | [league, nhl, game, team, list, mail, international, use, first, new]
ETM\(^\prime\) | 12 | 0.4393 | [peace, palestinian, visit, state, foreign, israel, islamic, arab, conference, muslim]
ETM\(^\prime\) | 13 | 0.5090 | [fax, computer, pc, electronic, systems, hardware, software, graphics, nt, server]
ETM\(^\prime\) | 14 | 0.3841 | [go, let, oh, hey, please, eat, red, hang, stay, waco]
ETM\(^\prime\) | 15 | 0.4863 | [belief, christianity, religion, islam, faith, people, muslims, sense, believe, god]
ETM\(^\prime\) | 16 | 0.3508 | [shell, institute, justice, research, fund, professor, microsoft, minority, secretary, panel]
ETM\(^\prime\) | 17 | 0.3367 | [god, see, use, know, say, go, want, like, one, need]
ETM\(^\prime\) | 18 | 0.4485 | [federal, constitution, law, amendment, state, act, government, enforcement, authority, right]
ETM\(^\prime\) | 19 | 0.5186 | [christianity, religion, armenians, islam, jews, religious, turks, turkish, muslims, turkey]
Table 9. 20 Latent Topics Identified by Baseline Models from 20Ng

C Baseline Models in Case Studies

C.1 VAE-LM: Baseline Model in Case Study 1

VAE-LM [4] (Figure 12) leverages the Gaussian RT from VAE in the language modeling task. Specifically, the model combines the following three components:
Fig. 12. Variational autoencoder language model.
(1)
LSTM [16] encoder network, Enc(.),
(2)
Gaussian RT from VAE (see Section 2.2), and
(3)
LSTM decoder network, Dec(.).
Given an example text of the form \({\bf x} = (x_1, x_2, \ldots , x_n)\), where each \(x_i\) comes from a fixed vocabulary, an encoder (component 1) is used to encode the text (see Figure 12):
\begin{equation} {\bf h} = \text{Enc}({\bf x}), \end{equation}
(31)
where h represents the encoded representation of text x.
Next, similarly to ProdLDA (see Section 2.3), two feed-forward linear networks (FF1 and FF2) are used to produce a K-dimensional mean vector \(\boldsymbol {\mu }\) and a log-variance vector \(\log \boldsymbol {\sigma }^2\) of the same dimension. The RT (component 2) [20] is then applied to sample a standard Gaussian vector of dimension \(K\), and the sampled latent representation \({\bf z}\) is formed using Equation (2).
Last, an LSTM decoder (component 3) is applied to reconstruct the original text conditioned on the latent representation \({\bf z}\):
\begin{equation} {\bf x}^{\prime \prime } = \text{Dec}({\bf z}), \end{equation}
(32)
where \({\bf x}^{\prime \prime }\) represents the generated text.
Consequently, one can use the loss function in Equation (1) to train the neural-network parameters. Various implementations of the encoder and decoder exist for VAE-LM. In this case study, we consider the implementation where the encoder (Enc(.)) digests both the word embeddings and the character embeddings of the input text x, and the decoder (Dec(.)) ingests the latent representation \({\bf z}\) at every generation step, as shown in Figure 12.
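To make the encode-reparameterize-decode pipeline concrete, the following is a minimal PyTorch sketch of a VAE-LM-style model. It is an illustration under our own assumptions (module names, dimensions, and the use of word embeddings only are hypothetical and not taken from Reference [4]); the character-level inputs and other training details of the original implementation are omitted.

```python
import torch
import torch.nn as nn

class VAELM(nn.Module):
    """Sketch: LSTM encoder (component 1), Gaussian RT (component 2), LSTM decoder (component 3)."""

    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)               # Enc(.)
        self.to_mu = nn.Linear(hid_dim, latent_dim)                              # FF1 -> mu
        self.to_logvar = nn.Linear(hid_dim, latent_dim)                          # FF2 -> log sigma^2
        self.decoder = nn.LSTM(emb_dim + latent_dim, hid_dim, batch_first=True)  # Dec(.)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, x):                        # x: (batch, seq_len) token ids
        e = self.embed(x)                        # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.encoder(e)            # h = Enc(x), cf. Eq. (31)
        mu, logvar = self.to_mu(h_n[-1]), self.to_logvar(h_n[-1])
        eps = torch.randn_like(mu)               # standard Gaussian sample
        z = mu + torch.exp(0.5 * logvar) * eps   # Gaussian RT, cf. Eq. (2)
        # Feed z to the decoder at every generation step (Figure 12).
        z_seq = z.unsqueeze(1).expand(-1, e.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([e, z_seq], dim=-1))
        logits = self.out(dec_out)               # reconstruction x'', cf. Eq. (32)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, kl.mean()                 # combine with cross-entropy to form the VAE loss, cf. Eq. (1)
```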

C.2 AARM: Baseline Model in Case Study 2

AARM [11] (Figure 13) uses the following three modules to compute a rating, \(\hat{y}_{uv}\), for a user u and a product v. Let us assume a pre-trained aspect embedding matrix \({\bf W}_A \in \mathbb {R}^{d_a \times A}\), a learnable user embedding matrix \({\bf W}_U \in \mathbb {R}^{d_g \times U}\), and a learnable product embedding matrix \({\bf W}_V \in \mathbb {R}^{d_g \times V}\), where A, U, and V represent the total numbers of aspects, users, and products, respectively, and \(d_a\) and \(d_g\) denote the dimensions of the aspect representations and of the user/product representations. Now, for a user u and a product v, let us further assume that \(M_u\) and \(M_v\) aspects are extracted (by Sentires), which results in projections \({\bf F}_u \in \mathbb {R}^{d_a \times M_u}\) and \({\bf F}_v \in \mathbb {R}^{d_a \times M_v}\). The following modules are then implemented to predict the rating \(\hat{y}_{uv}\) between user u and item v.
Fig. 13. AARM.
(1)
Aspect Interaction
Aspect Embedding Transformation
\begin{equation} \begin{split} {\bf C}_u = \frac{{\bf W}_{trans}{\bf F}_u}{||{\bf W}_{trans}{\bf F}_u||}, {\bf C}_v = \frac{{\bf W}_{trans}{\bf F}_v}{||{\bf W}_{trans}{\bf F}_v||}. \end{split} \end{equation}
(33)
Aspect Interaction Layer
\begin{equation} {\bf F}_{AI}(u, v) = \lbrace {\bf C}_u \odot {\bf C}_v({\bf X}_u{\bf X}_v)\rbrace . \end{equation}
(34)
Aspect-Level Attentive Pooling Layer
\begin{equation} \begin{split} \hat{\beta }_{u,v} &= {\bf w}_{att_{1}}^{T} ({\bf C}_u \odot {\bf C}_v({\bf X}_u{\bf X}_v)) \\ \beta _{u,v} &= \frac{\exp (\hat{\beta }_{u,v})}{\sum _{v}\exp (\hat{\beta }_{u,v})}\\ {\bf H}_{u} &= \sum _{v}\beta _{u,v}({\bf C}_u \odot {\bf C}_v({\bf X}_u{\bf X}_v)). \end{split} \end{equation}
(35)
User-Level Attentive Pooling Layer
\begin{equation} \begin{split} {\bf X}_{v, u} &= {\bf g}_{v} \odot {\bf C}_{u} \\ {\bf g}_{v} &= \sum _{v} {\bf C}_v \\ \hat{\gamma }_{u, v} &= {\bf w}_{att_{2}}^{T} {\bf X}_{v, u} \\ \gamma _{u,v} &= \frac{\exp (\hat{\gamma }_{u,v})}{\sum _{u}\exp (\hat{\gamma }_{u,v})}. \end{split} \end{equation}
(36)
Output
\begin{equation} {\bf y}_{A} (u, v) = \sum _{u} \gamma _{u,v}{\bf H}_{u}, \end{equation}
(37)
where
\({\bf W}_{trans} \in \mathbb {R}^{d_a \times d_a}\), \({\bf w}_{att_{1}} \in \mathbb {R}^{d_a}\), and \({\bf w}_{att_{2}} \in \mathbb {R}^{d_a}\) are learnable parameters.
\({\bf X}_u \in \lbrace 0, 1\rbrace\) and \({\bf X}_v \in \lbrace 0, 1\rbrace\) are masking indicators that are used to mask <PAD> tokens.
\({\bf y}_{A}(u, v) \in \mathbb {R}^{d_a}\) is the output of the aspect interaction module.
\(\odot\) represents the element-wise multiplication operator.
(2)
Global Interaction
\begin{equation} {\bf y}_{G}(u, v) = {\bf p}_{u} \odot {\bf q}_{v}, \end{equation}
(38)
where
\({\bf p}_{u}\) corresponds to the representation of user u from embedding matrix \({\bf W}_U\).
\({\bf q}_{v}\) corresponds to the representation of product v from embedding matrix \({\bf W}_V\).
\({\bf y}_{G}(u, v) \in \mathbb {R}^{d_g}\) is the output of the global interaction module.
(3)
Output Layer
\begin{equation} \hat{y}(u, v) = {\bf W}_{out} \texttt {concat}({\bf y}_{G}(u, v), {\bf y}_{A}(u, v)), \end{equation}
(39)
where
\({\bf W}_{out} \in \mathbb {R}^{1 \times (d_a + d_g)}\) is a learnable parameter.
\(\hat{y} (u, v)\) represents the overall satisfaction of user u toward product v.
For simplicity, we summarize Equations (33)–(39) as below:
\begin{equation} \hat{y}(u, v) = \text{AARM}({\bf p}_{u}, {\bf q}_{v}, {\bf F}_{u}, {\bf F}_{v}). \end{equation}
(40)
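For readers who prefer code, the following is a simplified, unbatched PyTorch sketch of Equations (33)–(39). Tensor names mirror the notation above, but the <PAD>-masking indicators \({\bf X}_u, {\bf X}_v\) and other implementation details of Reference [11] are omitted, so this should be read as an illustration rather than as the authors' implementation.

```python
import torch
import torch.nn.functional as F

def aarm_rating(p_u, q_v, F_u, F_v, W_trans, w_att1, w_att2, W_out):
    """Sketch of Eqs. (33)-(39) for a single (user u, item v) pair, ignoring <PAD> masking.

    Assumed shapes: p_u, q_v: (d_g,); F_u: (d_a, M_u); F_v: (d_a, M_v);
                    W_trans: (d_a, d_a); w_att1, w_att2: (d_a,); W_out: (d_g + d_a,).
    """
    # Eq. (33): transform aspect embeddings and normalize each aspect vector.
    C_u = F.normalize(W_trans @ F_u, dim=0)                  # (d_a, M_u)
    C_v = F.normalize(W_trans @ F_v, dim=0)                  # (d_a, M_v)

    # Eq. (34): element-wise interactions between every user aspect and item aspect.
    inter = C_u.unsqueeze(2) * C_v.unsqueeze(1)              # (d_a, M_u, M_v)

    # Eq. (35): aspect-level attentive pooling over the item aspects.
    beta = torch.softmax(torch.einsum('d,dij->ij', w_att1, inter), dim=1)   # (M_u, M_v)
    H_u = torch.einsum('ij,dij->di', beta, inter)            # (d_a, M_u)

    # Eq. (36): user-level attentive pooling over the user aspects.
    g_v = C_v.sum(dim=1)                                     # (d_a,)
    gamma = torch.softmax(w_att2 @ (g_v.unsqueeze(1) * C_u), dim=0)         # (M_u,)

    # Eq. (37): pooled output of the aspect interaction module.
    y_A = H_u @ gamma                                        # (d_a,)

    # Eq. (38): global user-item interaction.
    y_G = p_u * q_v                                          # (d_g,)

    # Eq. (39): final rating from the concatenated global and aspect signals.
    return W_out @ torch.cat([y_G, y_A])                     # scalar rating y_hat(u, v)
```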
Loss Function: According to Reference [11], AARM utilizes the Bayesian Personalized Ranking (BPR) method, where, given a user u, a triple (u, \(v^{+}\), \(v^{-}\)) is formed for training. Here, \(v^{+}\) refers to a product that user u has purchased, whereas \(v^{-}\) represents an unpurchased item. The positive pair (u, \(v^{+}\)), in conjunction with a negative pair (u, \(v^{-}\)), is drawn from the rating set R. In summary, AARM utilizes the following loss function to train the whole model:
\begin{equation} L_{bpr} = \frac{-1}{|R|} \sum _{(u, v^{+}) \in R} \log (\sigma (\hat{y}(u, v^{+}) - \hat{y}(u, v^{-}))), \end{equation}
(41)
\begin{equation} L_{aarm} = L_{bpr} + \lambda \left(\frac{||{\bf W}_{U}||^{2}}{|{\bf W}_{U}|} + \frac{||{\bf W}_{V}||^{2}}{|{\bf W}_{V}|} + \frac{||{\bf W}_{out}||^{2}}{|{\bf W}_{out}|}\right), \end{equation}
(42)
where the second term in the total loss \(L_{aarm}\) is a regularization term to avoid overfitting.
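A compact sketch of this training objective (Equations (41) and (42)) is shown below; how the size-normalized L2 penalty is computed here is our own assumption.

```python
import torch

def bpr_loss(y_pos, y_neg):
    """Eq. (41): BPR loss over a batch of scores for (u, v+) and the paired (u, v-)."""
    return -torch.log(torch.sigmoid(y_pos - y_neg)).mean()

def aarm_loss(y_pos, y_neg, W_U, W_V, W_out, lam=1e-4):
    """Eq. (42): BPR loss plus L2 regularization normalized by each matrix's size (assumed)."""
    reg = sum(W.pow(2).sum() / W.numel() for W in (W_U, W_V, W_out))
    return bpr_loss(y_pos, y_neg) + lam * reg
```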

References

[1]
Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8 (2013), 1798–1828.
[2]
David M. Blei and Jon D. Mcauliffe. 2008. Supervised topic models. In Advances in Neural Information Processing Systems Volume 20. 121–128.
[3]
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, Jan (2003), 993–1022.
[4]
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. 10–21.
[5]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. OpenAI Technical Report.
[6]
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1988. Improving information retrieval with latent semantic indexing. In Proceedings of the Annual Meeting of the American Society for Information Science 25. 36–40.
[7]
Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei. 2020. Topic modeling in embedding spaces. Trans. Assoc. Comput. Ling. 8 (2020), 439–453.
[8]
Katayoun Farrahi and Daniel Gatica-Perez. 2011. Discovering routines from large-scale human locations using probabilistic topic models. ACM Trans. Intell. Syst. Technol. 2, 1 (2011), 1–27.
[9]
Yang Gao, Yuefeng Li, Raymond Y. K. Lau, Yue Xu, and Md Abul Bashar. 2017. Finding semantically valid and relevant topics by association-based topic selection model. ACM Trans. Intell. Syst. Technol. 9, 3 (2017).
[10]
Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proc. Natl. Acad. Sci. U.S.A. 101 (2004), 5228–5235.
[11]
Xinyu Guan, Zhiyong Cheng, Xiangnan He, Yongfeng Zhang, Zhibo Zhu, Qinke Peng, and Tat-Seng Chua. 2019. Attentive aspect modeling for review-aware recommendation. ACM Trans. Inf. Syst. 37, 3 (2019), 1–27.
[12]
Amulya Gupta and Zhu Zhang. 2021. Vector-quantization-based topic modeling. ACM Trans. Intell. Syst. Technol. 12, 3 (2021), 1–30.
[13]
Junxian He, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing. 2017. Efficient correlated topic modeling with topic embedding. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 225–233.
[14]
Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems. 1693–1701.
[15]
I. Higgins, L. Matthey, A. Pal, C. Burgess, and X. Glorot. 2017. beta-vae: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations.
[16]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9 (1997), 1735–1780.
[17]
Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with Gumbel-Softmax. In Proceedings of the International Conference on Learning Representations.
[18]
Shuning Jin, Sam Wiseman, Karl Stratos, and Karen Livescu. 2020. Discrete latent variable representations for low-resource text classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 4831–4842.
[19]
Ken Lang. 1995. NewsWeeder: Learning to filter netnews. In Proceedings of the 12th International Conference on Machine Learning. 331–339.
[20]
Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations.
[21]
Hugo Larochelle and Stanislas Lauly. 2012. A neural autoregressive topic model. In Advances in Neural Information Processing Systems. 2708–2716.
[22]
Shaohua Li, Tat-Seng Chua, Jun Zhu, and Chunyan Miao. 2016. Generative topic embedding: A continuous representation of documents. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 666–675.
[23]
Shaohua Li, Jun Zhu, and Chunyan Miao. 2017. PSDVec: A toolbox for incremental and scalable word embedding. Neurocomputing 237 (2017), 405–409.
[24]
Yang Liu, Zhiyuan Liu, Tat-Seng Chua, and Maosong Sun. 2015. Topical word embeddings. In Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2418–2424.
[25]
André F. T. Martins and Ramón F. Astudillo. 2016. From softmax to sparsemax: A sparse model of attention and multi-label classification. In Proceedings of the International Conference on Machine Learning. 1614–1623.
[26]
Arya D. McCarthy, Xian Li, Jiatao Gu, and Ning Dong. 2020. Addressing posterior collapse with mutual information for improved variational neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8512–8525.
[27]
Yishu Miao, Edward Grefenstette, and Phil Blunsom. 2017. Discovering discrete latent topics with neural variational inference. In Proceedings of the International Conference on Machine Learning. 2410–2419.
[28]
David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics. 100–108.
[29]
Alberto Paccanaro and Geoffrey E. Hinton. 2001. Learning distributed representations of concepts using linear relational embedding. IEEE Trans. Knowl. Data Eng. 13 (2001), 232–244.
[30]
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). Association for Computational Linguistics, 1532–1543.
[31]
Nhathai Phan, Javid Ebrahimi, David Kil, Brigitte Piniewski, and Dejing Dou. 2016. Topic-aware physical activity propagation with temporal dynamics in a health social network. ACM Trans. Intell. Syst. Technol. 8, 2 (2016), 1–20.
[32]
Mehdi Rezaee and Francis Ferraro. 2020. A discrete variational recurrent topic model without the reparametrization trick. In Advances in Neural Information Processing Systems. 13831–13843.
[33]
Michael Röder, Andreas Both, and Alexander Hinneburg. 2015. Exploring the space of topic coherence measures. In Proceedings of the International Conference on Web Search and Data Mining.
[34]
Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Replicated softmax: An undirected topic model. In Advances in Neural Information Processing Systems 22. 1607–1614.
[35]
Akash Srivastava and Charles Sutton. 2017. Autoencoding variational inference for topic models. In Proceedings of the International Conference on Learning Representations.
[36]
Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2006. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101 (2006), 1566–1581.
[37]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5998–6008.
[38]
Zhijun Yin, Liangliang Cao, Quanquan Gu, and Jiawei Han. 2012. Latent community topic analysis: Integration of community discovery with topic modeling. ACM Trans. Intell. Syst. Technol. 3, 4 (2012).

Published In

ACM Transactions on Intelligent Systems and Technology  Volume 14, Issue 2
April 2023
430 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3582879
Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 February 2023
Online AM: 25 November 2022
Accepted: 12 October 2022
Revised: 26 July 2022
Received: 14 December 2021
Published in TIST Volume 14, Issue 2


Author Tags

  1. Topic modeling
  2. neural models
  3. discrete variational inference
  4. recommendation
