research-article

A Principled Decomposition of Pointwise Mutual Information for Intention Template Discovery

Authors:

Denghao Ma,

Kevin Chen-Chuan Chang,

Yueguo Chen,

Xueqiang Lv,

Liang ShenAuthors Info & Claims

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 1746 - 1755

https://doi.org/10.1145/3583780.3614767

Published: 21 October 2023 Publication History

Get Access

Abstract

With the rise of Artificial Intelligence (AI), question answering systems have become common for users to interact with computers, e.g., ChatGPT and Siri. These systems require a substantial amount of labeled data to train their models. However, the labeled data is scarce and challenging to be constructed. The construction process typically involves two stages: discovering potential sample candidates and manually labeling these candidates. To discover high-quality candidate samples, we study the intention paraphrase template discovery task: Given some seed questions or templates of an intention, discover new paraphrase templates that describe the intention and are diverse to the seeds enough in text. As the first exploration of the task, we identify the new quality requirements, i.e., relevance, divergence and popularity, and identify the new challenges, i.e., the paradox of divergent yet relevant paraphrases, and the conflict of popular yet relevant paraphrases. To untangle the paradox of divergent yet relevant paraphrases, in which the traditional bag of words falls short, we develop usage-centric modeling, which represents a question/template/answer as a bag of usages that users engaged (e.g., up-votes), and uses a usage-flow graph to interrelate templates, questions and answers. To balance the conflict of popular yet relevant paraphrases, we propose a new and principled decomposition for the well-known Pointwise Mutual Information from the usage perspective (usage-PMI), and then develop a Bayesian inference framework over the usage-flow graph to estimate the usage-PMI. Extensive experiments over three large CQA corpora show strong performance advantage over the baselines adopted from paraphrase identification task. We release 885,000 paraphrase templates of high quality discovered by our proposed PMI decomposition model, and the data is available in site https://github.com/Para-Questions/Intention\_template\_discovery.

References

[1]

Basant Agarwal, Heri Ramampiaro, Helge Langseth, and Massimiliano Ruocco. 2018. A deep network model for paraphrase detection in short text messages. Inf. Process. Manag., Vol. 54, 6 (2018), 922--937. https://doi.org/10.1016/j.ipm.2018.06.005

Abstract

References

Index Terms

Recommendations

Role of Paraphrases in PB-SMT

Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach

Two multivariate generalizations of pointwise mutual information

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations