- research-article, August 2023
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4549–4560, https://doi.org/10.1145/3580305.3599931
We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while being efficient for real-world ...
- research-article, August 2023
VRDU: A Benchmark for Visually-rich Document Understanding
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5184–5193, https://doi.org/10.1145/3580305.3599929
Understanding visually-rich business documents to extract structured data and automate business workflows has been receiving attention both in academia and industry. Although recent multi-modal language models have achieved impressive results, we find ...
- research-article, August 2023
TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5597–5607, https://doi.org/10.1145/3580305.3599921
Pre-trained language models (PLMs) are fundamental for natural language processing applications. Most existing PLMs are not tailored to the noisy user-generated text on social media, and the pre-training does not factor in the valuable social engagement ...
- research-article, August 2023
Towards Suicide Prevention from Bipolar Disorder with Temporal Symptom-Aware Multitask Learning
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4357–4369, https://doi.org/10.1145/3580305.3599917
Bipolar disorder (BD) is closely associated with an increased risk of suicide. However, while the prior work has revealed valuable insight into understanding the behavior of BD patients on social media, little attention has been paid to developing a ...
- research-article, August 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3737–3749, https://doi.org/10.1145/3580305.3599907
We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, ...
- research-article, August 2023
SentiGOLD: A Large Bangla Gold Standard Multi-Domain Sentiment Analysis Dataset and Its Evaluation
- Md. Ekramul Islam,
- Labib Chowdhury,
- Faisal Ahamed Khan,
- Shazzad Hossain,
- Md Sourave Hossain,
- Mohammad Mamun Or Rashid,
- Nabeel Mohammed,
- Mohammad Ruhul Amin
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4207–4218, https://doi.org/10.1145/3580305.3599904
In this study, we present a Bangla multi-domain sentiment analysis dataset, named as SentiGOLD, developed using 70,000 samples, which was compiled from a variety of sources and annotated by a gender-balanced team of linguists. This dataset was created ...
- research-article, August 2023
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4333–4345, https://doi.org/10.1145/3580305.3599896
Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate ...
- research-article, August 2023
RecruitPro: A Pretrained Language Model with Skill-Aware Prompt Learning for Intelligent Recruitment
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3991–4002, https://doi.org/10.1145/3580305.3599894
Recent years have witnessed the rapid development of machine-learning-based intelligent recruitment services. Along this line, a large number of emerging models have been proposed, achieving remarkable performance in various tasks, such as person-job fit,...
- research-article, August 2023
Neural Insights for Digital Marketing Content Design
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4320–4332, https://doi.org/10.1145/3580305.3599875
In digital marketing, experimenting with new website content is one of the key levers to improve customer engagement. However, creating successful marketing content is a manual and time-consuming process that lacks clear guiding principles. This paper ...
- research-article, August 2023
MUSER: A MUlti-Step Evidence Retrieval Enhancement Framework for Fake News Detection
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4461–4472, https://doi.org/10.1145/3580305.3599873
The ease of spreading false information online enables individuals with malicious intent to manipulate public opinion and destabilize social stability. Recently, fake news detection based on evidence retrieval has gained popularity in an effort to ...
- research-article, August 2023
Macular: A Multi-Task Adversarial Framework for Cross-Lingual Natural Language Understanding
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5061–5070, https://doi.org/10.1145/3580305.3599864
Cross-lingual natural language understanding (NLU) aims to train NLU models on a source language and apply the models to NLU tasks in target languages, and is a fundamental task for many cross-language applications. Most of the existing cross-lingual ...
- research-article, August 2023
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4284–4295, https://doi.org/10.1145/3580305.3599843
Graph neural networks (GNNs) have shown high potential for a variety of real-world, challenging applications, but one of the major obstacles in GNN research is the lack of large-scale flexible datasets. Most existing public datasets for GNNs are ...
- research-article, August 2023
GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation
- Jing Zhang,
- Xiaokang Zhang,
- Daniel Zhang-Li,
- Jifan Yu,
- Zijun Yao,
- Zeyao Ma,
- Yiqi Xu,
- Haohua Wang,
- Xiaohan Zhang,
- Nianyi Lin,
- Sunrui Lu,
- Juanzi Li,
- Jie Tang
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5564–5575, https://doi.org/10.1145/3580305.3599832
We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese using a search engine to access the Internet knowledge. GLM-Dialog offers a series of applicable techniques for exploiting ...
- research-article, August 2023
Fusing Multimodal Signals on Hyper-complex Space for Extreme Abstractive Text Summarization (TL;DR) of Scientific Contents
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3724–3736, https://doi.org/10.1145/3580305.3599830
The realm of scientific text summarization has experienced remarkable progress due to the availability of annotated brief summaries and ample data. However, the utilization of multiple input modalities, such as videos and audio, has yet to be thoroughly ...
- research-article, August 2023
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
- Iddo Drori,
- Sarah J. Zhang,
- Reece Shuttleworth,
- Sarah Zhang,
- Keith Tyser,
- Zad Chin,
- Pedro Lantigua,
- Saisamrit Surbehera,
- Gregory Hunter,
- Derek Austin,
- Leonard Tang,
- Yann Hicke,
- Sage Simhon,
- Sathwik Karnik,
- Darnell Granberry,
- Madeleine Udell
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3947–3955, https://doi.org/10.1145/3580305.3599827
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level on finals ...
- research-article, August 2023
CADENCE: Offline Category Constrained and Diverse Query Generation for E-commerce Autosuggest
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3703–3712, https://doi.org/10.1145/3580305.3599787
Query AutoComplete (QAC) or AutoSuggest is the first place of user interaction with an e-commerce search engine. It is critical for the QAC system to suggest relevant and well-formed queries for multiple possible user intents. Suggesting only the ...
- abstract, August 2023
Fast Text Generation with Text-Editing Models
- Eric Malmi,
- Yue Dong,
- Jonathan Mallinson,
- Aleksandr Chuklin,
- Jakub Adamek,
- Daniil Mirylenka,
- Felix Stahlberg,
- Sebastian Krause,
- Shankar Kumar,
- Aliaksei Severyn
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5815–5816, https://doi.org/10.1145/3580305.3599579
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait -- they exhibit a large ...
- abstract, August 2023
Towards Next-Generation Intelligent Assistants Leveraging LLM Techniques
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5792–5793, https://doi.org/10.1145/3580305.3599572
Virtual Intelligent Assistants take user requests in the voice form, perform actions such as setting an alarm, turning on a light, and answering a question, and provide answers or confirmations in the voice form or through other channels such as a ...
- abstract, August 2023
Pretrained Language Representations for Text Understanding: A Weakly-Supervised Perspective
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5817–5818, https://doi.org/10.1145/3580305.3599569
Language representations pretrained on general-domain corpora and adapted to downstream task data have achieved enormous success in building natural language understanding (NLU) systems. While the standard supervised fine-tuning of pretrained language ...
- abstract, August 2023
Precision Health in the Age of Large Language Models
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 5825–5826, https://doi.org/10.1145/3580305.3599568
Medicine today is imprecise. Among the top 20 drugs in the U.S., up to 80% of patients are non-responders. The goal of precision health is to provide the right intervention for the right people at the right time. The key to realize this dream is to ...