
Jialu Liu


2024

pdf bib
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Zhen Qin | Rolf Jagerman | Kai Hui | Honglei Zhuang | Junru Wu | Le Yan | Jiaming Shen | Tianqi Liu | Jialu Liu | Donald Metzler | Xuanhui Wang | Michael Bendersky
Findings of the Association for Computational Linguistics: NAACL 2024

Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these challenging ranking formulations. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized, open-source LLMs. On TREC-DL 2019 & 2020, PRP based on the Flan-UL2 model with 20B parameters compares favorably with the previous best approach in the literature, which is based on the black-box commercial GPT-4 with an estimated 50x larger model size, while outperforming other LLM-based solutions, such as InstructGPT with 175B parameters, by over 10% on all ranking metrics. Using the same prompt template on seven BEIR tasks, PRP outperforms supervised baselines, beats the black-box commercial ChatGPT solution by 4.2%, and exceeds pointwise LLM-based solutions by more than 10% in average NDCG@10. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity.
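
A minimal sketch of the pairwise idea described above, assuming a hypothetical `llm_generate` completion call and an illustrative prompt template (neither is the paper's exact setup): the LLM compares each ordered pair of candidates and the wins are aggregated into a ranking. This is the quadratic all-pairs variant; the abstract also points to more efficient variants, down to linear complexity.

```python
# Sketch of pairwise ranking prompting (PRP). `llm_generate` is a hypothetical
# stand-in for any LLM completion call; the prompt wording and the all-pairs
# win-counting aggregation are illustrative, not the paper's exact setup.
from itertools import permutations

PROMPT = (
    "Given a query and two passages, which passage is more relevant to the query? "
    "Answer with 'A' or 'B' only.\n"
    "Query: {query}\nPassage A: {a}\nPassage B: {b}\nAnswer:"
)

def llm_generate(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real model or API."""
    raise NotImplementedError

def prp_rank(query: str, docs: list[str]) -> list[str]:
    # Score every ordered pair so each unordered pair is asked in both
    # presentation orders, which helps reduce position bias.
    wins = {d: 0.0 for d in docs}
    for a, b in permutations(docs, 2):
        answer = llm_generate(PROMPT.format(query=query, a=a, b=b)).strip().upper()
        if answer.startswith("A"):
            wins[a] += 1
        elif answer.startswith("B"):
            wins[b] += 1
        else:               # inconsistent or unparsable output: split the credit
            wins[a] += 0.5
            wins[b] += 0.5
    # More pairwise wins = ranked earlier.
    return sorted(docs, key=lambda d: wins[d], reverse=True)
```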

pdf bib
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang | Jiaming Shen | Tianqi Liu | Haorui Wang | Zhen Qin | Feng Han | Jialu Liu | Simon Baumgartner | Michael Bendersky | Chao Zhang
Findings of the Association for Computational Linguistics: ACL 2024

Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, including restricted access to LLM outputs, significant teacher-student capacity gaps, and the inherited mis-calibration issue. In this work, we present PLaD, a novel preference-based LLM distillation framework. PLaD exploits the teacher-student capacity discrepancy to generate pseudo-preference pairs where teacher outputs are preferred over student outputs. Then, PLaD leverages a ranking loss to re-calibrate the student’s estimation of sequence likelihood, which steers the student’s focus towards understanding the relative quality of outputs instead of simply imitating the teacher. PLaD bypasses the need for access to the teacher LLM’s internal states, tackles the student’s expressivity limitations, and mitigates the student mis-calibration issue. Through extensive experiments on two sequence generation tasks and with various LLMs, we demonstrate the effectiveness of our proposed PLaD framework.
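
As a rough illustration of the ranking-loss idea in this abstract, the sketch below computes a generic pairwise margin loss that pushes the student to assign higher sequence likelihood to the teacher's (preferred) output than to its own (dispreferred) output. The length normalization, margin value, and loss form are assumptions for illustration, not PLaD's exact objective.

```python
# Illustrative re-calibration loss over pseudo-preference pairs: the student
# should score the teacher's output (preferred) above its own output
# (dispreferred). The margin ranking loss below is a generic choice.
import torch
import torch.nn.functional as F

def sequence_log_likelihood(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-likelihood of `labels` under the student.

    logits: (batch, seq_len, vocab); labels: (batch, seq_len), -100 marks padding.
    """
    logp = F.log_softmax(logits, dim=-1)
    mask = labels.ne(-100)
    token_logp = logp.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return (token_logp * mask).sum(-1) / mask.sum(-1).clamp(min=1)

def pseudo_preference_ranking_loss(
    student_logits_on_teacher_out: torch.Tensor,  # student logits for the teacher's (preferred) output
    teacher_out_labels: torch.Tensor,
    student_logits_on_student_out: torch.Tensor,  # student logits for its own (dispreferred) output
    student_out_labels: torch.Tensor,
    margin: float = 1.0,
) -> torch.Tensor:
    # Encourage: log p_student(teacher output) > log p_student(student output) + margin
    ll_preferred = sequence_log_likelihood(student_logits_on_teacher_out, teacher_out_labels)
    ll_dispreferred = sequence_log_likelihood(student_logits_on_student_out, student_out_labels)
    return F.relu(margin - (ll_preferred - ll_dispreferred)).mean()
```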

pdf bib
Predicting Text Preference Via Structured Comparative Reasoning
Jing Nathan Yan | Tianqi Liu | Justin Chiu | Jiaming Shen | Zhen Qin | Yue Yu | Charumathi Lakshmanan | Yair Kurzion | Alexander Rush | Jialu Liu | Michael Bendersky
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Comparative reasoning plays a crucial role in predicting text preferences; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning, leading to incorrect preference predictions. While approaches like Chain-of-Thought improve accuracy in many settings, they struggle to consistently distinguish the similarities and differences of complex texts. We introduce SC2, a model that prompts LLMs to predict text preferences by generating structured intermediate comparisons. SC2 begins by proposing aspects for comparison, followed by generating textual comparisons under each aspect. We select consistent comparisons with a pairwise comparator that ensures each comparison of a given aspect clearly distinguishes differences between texts, significantly reducing hallucination and improving consistency. Our empirical studies across various NLP tasks, including summarization, retrieval, and automatic rating, demonstrate that SC2 significantly improves text preference prediction.
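
A hedged sketch of the structured comparison flow described above, assuming a hypothetical `llm_generate` call and placeholder prompts: propose aspects, generate one comparison per aspect, keep only comparisons that pass a simple order-swap consistency check, then predict the preference from the kept comparisons. The consistency check here merely stands in for SC2's pairwise comparator and is not the paper's actual implementation.

```python
# Structured comparative reasoning, roughly: aspects -> per-aspect comparisons
# -> consistency filtering -> final preference. All prompts are placeholders.
def llm_generate(prompt: str) -> str:
    raise NotImplementedError  # plug in any instruction-tuned LLM

def predict_preference(query: str, text_a: str, text_b: str) -> str:
    aspects = llm_generate(
        f"List 3 short aspects for comparing two candidate texts for: {query}"
    ).splitlines()

    kept = []
    for aspect in aspects:
        comparison = llm_generate(
            f"Compare Text A and Text B on '{aspect}'.\n"
            f"Text A: {text_a}\nText B: {text_b}\nComparison:"
        )
        # Crude consistency check: ask which text the comparison favors, with
        # the answer options presented in both orders, and keep the comparison
        # only if the two answers agree.
        first = llm_generate(f"{comparison}\nBased on this, which text is better, A or B?")
        second = llm_generate(f"{comparison}\nBased on this, which text is better, B or A?")
        if first.strip()[:1] == second.strip()[:1]:
            kept.append(comparison)

    return llm_generate(
        "Given these aspect-level comparisons:\n" + "\n".join(kept) +
        f"\nWhich text better answers '{query}'? Answer 'A' or 'B'."
    ).strip()
```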

pdf bib
Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning
Yue Yu | Jiaming Shen | Tianqi Liu | Zhen Qin | Jing Nathan Yan | Jialu Liu | Chao Zhang | Michael Bendersky
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks given a few demonstration examples via in-context learning. Common strategies to boost this in-context learning ability are to ensemble multiple model-decoded results and to require the model to generate an explanation along with the prediction. However, these models often treat different class predictions equally and neglect the potential discrepancy between the explanations and predictions. To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs. We design two techniques, explanation-guided ensemble and soft probability aggregation, to mitigate the effect of unreliable explanations and improve the consistency between explanations and final predictions. Experiments on seven natural language understanding tasks and four LLMs of varying sizes demonstrate the effectiveness of our proposed framework.
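
To make the two named techniques concrete, here is an illustrative sketch, assuming each sampled output comes with per-class probabilities and a reliability weight for its explanation: predictions are combined as a weighted average of class distributions (soft probability aggregation) with weights derived from explanation reliability (explanation-guided ensembling). The weighting scheme is a placeholder, not the paper's formulation.

```python
# Soft, explanation-weighted aggregation of multiple sampled predictions,
# instead of a hard majority vote over argmax labels.
import numpy as np

def soft_ensemble(
    class_probs: np.ndarray,         # (num_samples, num_classes) per-sample class distributions
    explanation_weights: np.ndarray  # (num_samples,) reliability score for each explanation
) -> int:
    weights = explanation_weights / explanation_weights.sum()
    aggregated = (weights[:, None] * class_probs).sum(axis=0)
    return int(aggregated.argmax())

# Example: three sampled (explanation, prediction) pairs over two classes.
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
weights = np.array([0.2, 1.0, 1.0])   # the first explanation looks unreliable
print(soft_ensemble(probs, weights))  # -> 1 (the down-weighted first sample contributes less)
```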

2021

pdf bib
Training ELECTRA Augmented with Multi-word Selection
Jiaming Shen | Jialu Liu | Tianqi Liu | Cong Yu | Jiawei Han
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021