-
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
Authors:
Wenxuan Zhang,
Hou Pong Chan,
Yiran Zhao,
Mahani Aljunied,
Jianyu Wang,
Chaoqun Liu,
Yue Deng,
Zhiqiang Hu,
Weiwen Xu,
Yew Ken Chia,
Xin Li,
Lidong Bing
Abstract:
Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…
▽ More
Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by its rich linguistic diversity, has lacked adequate language technology support. SeaLLMs 3 aims to bridge this gap by covering a comprehensive range of languages spoken in this region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese. Leveraging efficient language enhancement techniques and a specially constructed instruction tuning dataset, SeaLLMs 3 significantly reduces training costs while maintaining high performance and versatility. Our model excels in tasks such as world knowledge, mathematical reasoning, translation, and instruction following, achieving state-of-the-art performance among similarly sized models. Additionally, we prioritized safety and reliability by addressing both general and culture-specific considerations and incorporated mechanisms to reduce hallucinations. This work underscores the importance of inclusive AI, showing that advanced LLM capabilities can benefit underserved linguistic and cultural communities.
△ Less
Submitted 28 July, 2024;
originally announced July 2024.
-
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions
Authors:
Ruochen Zhao,
Wenxuan Zhang,
Yew Ken Chia,
Deli Zhao,
Lidong Bing
Abstract:
As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustwort…
▽ More
As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustworthy evaluation framework, we innovatively propose the Auto-Arena of LLMs, which automates the entire evaluation process with LLM agents. Firstly, an examiner LLM devises queries. Then, a pair of candidate LLMs engage in a multi-round peer-battle around the query, during which the LLM's true performance gaps become visible. Finally, a committee of LLM judges collectively discuss and determine the winner, which alleviates bias and promotes fairness. In our extensive experiment on the 17 newest LLMs, Auto-Arena shows the highest correlation with human preferences, providing a promising alternative to human evaluation platforms.
△ Less
Submitted 12 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Brain Tumor Segmentation (BraTS) Challenge 2024: Meningioma Radiotherapy Planning Automated Segmentation
Authors:
Dominic LaBella,
Katherine Schumacher,
Michael Mix,
Kevin Leu,
Shan McBurney-Lin,
Pierre Nedelec,
Javier Villanueva-Meyer,
Jonathan Shapey,
Tom Vercauteren,
Kazumi Chia,
Omar Al-Salihi,
Justin Leu,
Lia Halasz,
Yury Velichko,
Chunhao Wang,
John Kirkpatrick,
Scott Floyd,
Zachary J. Reitman,
Trey Mullikin,
Ulas Bagci,
Sean Sachdev,
Jona A. Hattangadi-Gluth,
Tyler Seibert,
Nikdokht Farid,
Connor Puett
, et al. (45 additional authors not shown)
Abstract:
The 2024 Brain Tumor Segmentation Meningioma Radiotherapy (BraTS-MEN-RT) challenge aims to advance automated segmentation algorithms using the largest known multi-institutional dataset of radiotherapy planning brain MRIs with expert-annotated target labels for patients with intact or postoperative meningioma that underwent either conventional external beam radiotherapy or stereotactic radiosurgery…
▽ More
The 2024 Brain Tumor Segmentation Meningioma Radiotherapy (BraTS-MEN-RT) challenge aims to advance automated segmentation algorithms using the largest known multi-institutional dataset of radiotherapy planning brain MRIs with expert-annotated target labels for patients with intact or postoperative meningioma that underwent either conventional external beam radiotherapy or stereotactic radiosurgery. Each case includes a defaced 3D post-contrast T1-weighted radiotherapy planning MRI in its native acquisition space, accompanied by a single-label "target volume" representing the gross tumor volume (GTV) and any at-risk postoperative site. Target volume annotations adhere to established radiotherapy planning protocols, ensuring consistency across cases and institutions. For preoperative meningiomas, the target volume encompasses the entire GTV and associated nodular dural tail, while for postoperative cases, it includes at-risk resection cavity margins as determined by the treating institution. Case annotations were reviewed and approved by expert neuroradiologists and radiation oncologists. Participating teams will develop, containerize, and evaluate automated segmentation models using this comprehensive dataset. Model performance will be assessed using an adapted lesion-wise Dice Similarity Coefficient and the 95% Hausdorff distance. The top-performing teams will be recognized at the Medical Image Computing and Computer Assisted Intervention Conference in October 2024. BraTS-MEN-RT is expected to significantly advance automated radiotherapy planning by enabling precise tumor segmentation and facilitating tailored treatment, ultimately improving patient outcomes.
△ Less
Submitted 15 August, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
Authors:
Yew Ken Chia,
Vernon Toh Yan Han,
Deepanway Ghosal,
Lidong Bing,
Soujanya Poria
Abstract:
Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of 2000 puzzle instances based on abstract…
▽ More
Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of 2000 puzzle instances based on abstract patterns. With this dataset, we evaluate large multimodal models with abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. Through our experiments on state-of-the-art large multimodal models, we find that they are not able to generalize well to simple abstract patterns. Notably, GPT-4V achieves a score of 46.4% on single-concept puzzles, which shows that state-of-the-art models struggle on our dataset. To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning. Our systematic analysis finds that the main bottlenecks of GPT-4V are weaker visual perception and inductive reasoning abilities. Through this work, we hope to shed light on the limitations of large multimodal models and how they can better emulate human cognitive processes in the future. Our data and code are available at https://puzzlevqa.github.io
△ Less
Submitted 17 August, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
SeaLLMs -- Large Language Models for Southeast Asia
Authors:
Xuan-Phi Nguyen,
Wenxuan Zhang,
Xin Li,
Mahani Aljunied,
Zhiqiang Hu,
Chenhui Shen,
Yew Ken Chia,
Xingxuan Li,
Jianyu Wang,
Qingyu Tan,
Liying Cheng,
Guanzheng Chen,
Yue Deng,
Sen Yang,
Chaoqun Liu,
Hang Zhang,
Lidong Bing
Abstract:
Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are buil…
▽ More
Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are built upon the Llama-2 model and further advanced through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning to better capture the intricacies of regional languages. This allows them to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations. Our comprehensive evaluation demonstrates that SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese, by large margins while remaining lightweight and cost-effective to operate.
△ Less
Submitted 1 July, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Contrastive Chain-of-Thought Prompting
Authors:
Yew Ken Chia,
Guizhen Chen,
Luu Anh Tuan,
Soujanya Poria,
Lidong Bing
Abstract:
Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mista…
▽ More
Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mistakes to avoid, which potentially leads to more errors. Hence, inspired by how humans can learn from both positive and negative examples, we propose contrastive chain of thought to enhance language model reasoning. Compared to the conventional chain of thought, our approach provides both valid and invalid reasoning demonstrations, to guide the model to reason step-by-step while reducing reasoning mistakes. To improve generalization, we introduce an automatic method to construct contrastive demonstrations. Our experiments on reasoning benchmarks demonstrate that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
△ Less
Submitted 15 November, 2023;
originally announced November 2023.
-
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Authors:
Deepanway Ghosal,
Yew Ken Chia,
Navonil Majumder,
Soujanya Poria
Abstract:
Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architecture. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skill…
▽ More
Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architecture. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skills. This performance discrepancy can be attributed to three key factors: (1) Pre-training data, (2) Backbone architecture, and (3) Instruction dataset. In this technical report, our main focus is on investigating the impact of the third factor by leveraging VICUNA, a large language model based on LLAMA, which has undergone fine-tuning on ChatGPT conversations. To achieve this objective, we fine-tuned VICUNA using a customized instruction dataset collection called FLANMINI. This collection includes a subset of the large-scale instruction dataset known as FLAN, as well as various code-related datasets and conversational datasets derived from ChatGPT/GPT-4. This dataset comprises a large number of tasks that demand problem-solving skills. Our experimental findings strongly indicate that the enhanced problem-solving abilities of our model, FLACUNA, are obtained through fine-tuning VICUNA on the FLAN dataset, leading to significant improvements across numerous benchmark datasets in INSTRUCTEVAL. FLACUNA is publicly available at https://huggingface.co/declare-lab/flacuna-13b-v1.0.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Authors:
Wenxuan Zhang,
Sharifah Mahani Aljunied,
Chang Gao,
Yew Ken Chia,
Lidong Bing
Abstract:
Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchm…
▽ More
Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. M3Exam exhibits three unique characteristics: (1) multilingualism, encompassing questions from multiple countries that require strong multilingual proficiency and cultural knowledge; (2) multimodality, accounting for the multimodal nature of many exam questions to test the model's multimodal understanding capability; and (3) multilevel structure, featuring exams from three critical educational periods to comprehensively assess a model's proficiency at different levels. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23\% of the questions require processing images for successful solving. We assess the performance of top-performing LLMs on M3Exam and find that current models, including GPT-4, still struggle with multilingual text, particularly in low-resource and non-Latin script languages. Multimodal LLMs also perform poorly with complex multimodal questions. We believe that M3Exam can be a valuable resource for comprehensively evaluating LLMs by examining their multilingual and multimodal abilities and tracking their development. Data and evaluation code is available at \url{https://github.com/DAMO-NLP-SG/M3Exam}.
△ Less
Submitted 9 November, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Authors:
Yew Ken Chia,
Pengfei Hong,
Lidong Bing,
Soujanya Poria
Abstract:
Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding r…
▽ More
Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding regarding their full potential, primarily due to the black-box nature of many models and the absence of holistic evaluation studies. To address these challenges, we present INSTRUCTEVAL, a more comprehensive evaluation suite designed specifically for instruction-tuned large language models. Unlike previous works, our evaluation involves a rigorous assessment of models based on problem-solving, writing ability, and alignment to human values. We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods. Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance. While open-source models demonstrate impressive writing abilities, there is substantial room for improvement in problem-solving and alignment. We are encouraged by the rapid development of models by the open-source community, but we also highlight the need for rigorous evaluation to support claims made about these models. Through INSTRUCTEVAL, we aim to foster a deeper understanding of instruction-tuned models and advancements in their capabilities. INSTRUCTEVAL is publicly available at https://github.com/declare-lab/instruct-eval.
△ Less
Submitted 15 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction
Authors:
Yew Ken Chia,
Hui Chen,
Wei Han,
Guizhen Chen,
Sharifah Mahani Aljunied,
Soujanya Poria,
Lidong Bing
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, their expressed sentiment, and the corresponding aspect targets. However, existing methods are limited to the in-domain setting with two domains. Hence, we propose a domain-expanded benchmark to address the in-domain, out-of-domain and cross-domain settings. We suppor…
▽ More
Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, their expressed sentiment, and the corresponding aspect targets. However, existing methods are limited to the in-domain setting with two domains. Hence, we propose a domain-expanded benchmark to address the in-domain, out-of-domain and cross-domain settings. We support the new benchmark by annotating more than 4000 data samples for two new domains based on hotel and cosmetics reviews. Our analysis of five existing methods shows that while there is a significant gap between in-domain and out-of-domain performance, generative methods have a strong potential for domain generalization. Our datasets, code implementation and models are available at https://github.com/DAMO-NLP-SG/domain-expanded-aste .
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources
Authors:
Xingxuan Li,
Ruochen Zhao,
Yew Ken Chia,
Bosheng Ding,
Shafiq Joty,
Soujanya Poria,
Lidong Bing
Abstract:
We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-inten…
▽ More
We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If there is no majority consensus among the answers from samples, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources such as Wikidata and tables that provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that allows the generation of queries for various types of query languages, including SPARQL, SQL, and natural sentences. Moreover, to minimize error propagation between rationales, CoK corrects the rationales progressively using preceding corrected rationales to generate and correct subsequent rationales. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
△ Less
Submitted 21 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines
Authors:
Ruochen Zhao,
Xingxuan Li,
Yew Ken Chia,
Bosheng Ding,
Lidong Bing
Abstract:
Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we sh…
▽ More
Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we should not easily trust the factual claims of the AI models. Rather than criticizing specific models or companies, we hope to call on researchers and developers to improve AI models' transparency and factual correctness.
△ Less
Submitted 2 March, 2023;
originally announced April 2023.
-
Is GPT-3 a Good Data Annotator?
Authors:
Bosheng Ding,
Chengwei Qin,
Linlin Liu,
Yew Ken Chia,
Shafiq Joty,
Boyang Li,
Lidong Bing
Abstract:
Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefor…
▽ More
Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator in NLP.
△ Less
Submitted 14 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach
Authors:
Yew Ken Chia,
Lidong Bing,
Sharifah Mahani Aljunied,
Luo Si,
Soujanya Poria
Abstract:
Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard Universi…
▽ More
Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be factually enriched by including the qualifier (End Time, 1967). Hence, we propose the task of hyper-relational extraction to extract more specific and complete facts from text. To support the task, we construct HyperRED, a large-scale and general-purpose dataset. Existing models cannot perform hyper-relational extraction as it requires a model to consider the interaction between three entities. Hence, we propose CubeRE, a cube-filling model inspired by table-filling approaches and explicitly considers the interaction between relation triplets and qualifiers. To improve model scalability and reduce negative class imbalance, we further propose a cube-pruning method. Our experiments show that CubeRE outperforms strong baselines and reveal possible directions for future research. Our code and data are available at github.com/declare-lab/HyperRED.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Neighbor-Based Optimized Logistic Regression Machine Learning Model For Electric Vehicle Occupancy Detection
Authors:
Sayan Shaw,
Keaton Chia,
Jan Kleissl
Abstract:
This paper presents an optimized logistic regression machine learning model that predicts the occupancy of an Electric Vehicle (EV) charging station given the occupancy of neighboring stations. The model was optimized for the time of day. Trained on data from 57 EV charging stations around the University of California San Diego campus, the model achieved an 88.43% average accuracy and 92.23% maxim…
▽ More
This paper presents an optimized logistic regression machine learning model that predicts the occupancy of an Electric Vehicle (EV) charging station given the occupancy of neighboring stations. The model was optimized for the time of day. Trained on data from 57 EV charging stations around the University of California San Diego campus, the model achieved an 88.43% average accuracy and 92.23% maximum accuracy in predicting occupancy, outperforming a persistence model benchmark.
△ Less
Submitted 27 April, 2022;
originally announced April 2022.
-
RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction
Authors:
Yew Ken Chia,
Lidong Bing,
Soujanya Poria,
Luo Si
Abstract:
Despite the importance of relation extraction in building and representing knowledge, less research is focused on generalizing to unseen relations types. We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE) to encourage further research in low-resource relation extraction methods. Given an input sentence, each extracted triplet consists of the head entity, relation labe…
▽ More
Despite the importance of relation extraction in building and representing knowledge, less research is focused on generalizing to unseen relations types. We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE) to encourage further research in low-resource relation extraction methods. Given an input sentence, each extracted triplet consists of the head entity, relation label, and tail entity where the relation label is not seen at the training stage. To solve ZeroRTE, we propose to synthesize relation examples by prompting language models to generate structured texts. Concretely, we unify language model prompts and structured text approaches to design a structured prompt template for generating synthetic relation samples when conditioning on relation label prompts (RelationPrompt). To overcome the limitation for extracting multiple relation triplets in a sentence, we design a novel Triplet Search Decoding method. Experiments on FewRel and Wiki-ZSL datasets show the efficacy of RelationPrompt for the ZeroRTE task and zero-shot relation classification. Our code and data are available at github.com/declare-lab/RelationPrompt.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction
Authors:
Lu Xu,
Yew Ken Chia,
Lidong Bing
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is the most recent subtask of ABSA which outputs triplets of an aspect target, its associated sentiment, and the corresponding opinion term. Recent models perform the triplet extraction in an end-to-end manner but heavily rely on the interactions between each target word and opinion word. Thereby, they cannot perform well on targets and opinions which con…
▽ More
Aspect Sentiment Triplet Extraction (ASTE) is the most recent subtask of ABSA which outputs triplets of an aspect target, its associated sentiment, and the corresponding opinion term. Recent models perform the triplet extraction in an end-to-end manner but heavily rely on the interactions between each target word and opinion word. Thereby, they cannot perform well on targets and opinions which contain multiple words. Our proposed span-level approach explicitly considers the interaction between the whole spans of targets and opinions when predicting their sentiment relation. Thus, it can make predictions with the semantics of whole spans, ensuring better sentiment consistency. To ease the high computational cost caused by span enumeration, we propose a dual-channel span pruning strategy by incorporating supervision from the Aspect Term Extraction (ATE) and Opinion Term Extraction (OTE) tasks. This strategy not only improves computational efficiency but also distinguishes the opinion and target spans more properly. Our framework simultaneously achieves strong performance for the ASTE as well as ATE and OTE tasks. In particular, our analysis shows that our span-level approach achieves more significant improvements over the baselines on triplets with multi-word targets or opinions.
△ Less
Submitted 26 July, 2021;
originally announced July 2021.
-
Red Dragon AI at TextGraphs 2020 Shared Task: LIT : LSTM-Interleaved Transformer for Multi-Hop Explanation Ranking
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
Explainable question answering for science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. To counter the limitations of methods that view each query-document pair in isolation, we propose the LSTM-Interleaved Transformer which incorporates cross-document interactions for improved multi-hop ranking. The LIT architecture can leverage prior ranki…
▽ More
Explainable question answering for science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. To counter the limitations of methods that view each query-document pair in isolation, we propose the LSTM-Interleaved Transformer which incorporates cross-document interactions for improved multi-hop ranking. The LIT architecture can leverage prior ranking positions in the re-ranking setting. Our model is competitive on the current leaderboard for the TextGraphs 2020 shared task, achieving a test-set MAP of 0.5607, and would have gained third place had we submitted before the competition deadline. Our code implementation is made available at https://github.com/mdda/worldtree_corpus/tree/textgraphs_2020
△ Less
Submitted 28 December, 2020;
originally announced December 2020.
-
Red Dragon AI at TextGraphs 2019 Shared Task: Language Model Assisted Explanation Generation
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
The TextGraphs-13 Shared Task on Explanation Regeneration asked participants to develop methods to reconstruct gold explanations for elementary science questions. Red Dragon AI's entries used the language of the questions and explanation text directly, rather than a constructing a separate graph-like representation. Our leaderboard submission placed us 3rd in the competition, but we present here t…
▽ More
The TextGraphs-13 Shared Task on Explanation Regeneration asked participants to develop methods to reconstruct gold explanations for elementary science questions. Red Dragon AI's entries used the language of the questions and explanation text directly, rather than a constructing a separate graph-like representation. Our leaderboard submission placed us 3rd in the competition, but we present here three methods of increasing sophistication, each of which scored successively higher on the test set after the competition close.
△ Less
Submitted 20 November, 2019;
originally announced November 2019.
-
Scene Graph Parsing by Attention Graph
Authors:
Martin Andrews,
Yew Ken Chia,
Sam Witteveen
Abstract:
Scene graph representations, which form a graph of visual object nodes together with their attributes and relations, have proved useful across a variety of vision and language applications. Recent work in the area has used Natural Language Processing dependency tree methods to automatically build scene graphs.
In this work, we present an 'Attention Graph' mechanism that can be trained end-to-end…
▽ More
Scene graph representations, which form a graph of visual object nodes together with their attributes and relations, have proved useful across a variety of vision and language applications. Recent work in the area has used Natural Language Processing dependency tree methods to automatically build scene graphs.
In this work, we present an 'Attention Graph' mechanism that can be trained end-to-end, and produces a scene graph structure that can be lifted directly from the top layer of a standard Transformer model.
The scene graphs generated by our model achieve an F-score similarity of 52.21% to ground-truth graphs on the evaluation set using the SPICE metric, surpassing the best previous approaches by 2.5%.
△ Less
Submitted 13 September, 2019;
originally announced September 2019.
-
Transformer to CNN: Label-scarce distillation for efficient text classification
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop…
▽ More
Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, having been trained by a distillation process from a large-scale model, can achieve 300x inference speedup and 39x reduction in parameter count. In some cases, the student model performance surpasses its teacher on the studied tasks.
△ Less
Submitted 8 September, 2019;
originally announced September 2019.
-
Energy-Efficient, Large-scale Distributed-Antenna System (L-DAS) for Multiple Users
Authors:
Jingon Joung,
Yeow Khiang Chia,
Sumei Sun
Abstract:
Large-scale distributed-antenna system (L-DAS) with very large number of distributed antennas, possibly up to a few hundred antennas, is considered. A few major issues of the L-DAS, such as high latency, energy consumption, computational complexity, and large feedback (signaling) overhead, are identified. The potential capability of the L-DAS is illuminated in terms of an energy efficiency (EE) th…
▽ More
Large-scale distributed-antenna system (L-DAS) with very large number of distributed antennas, possibly up to a few hundred antennas, is considered. A few major issues of the L-DAS, such as high latency, energy consumption, computational complexity, and large feedback (signaling) overhead, are identified. The potential capability of the L-DAS is illuminated in terms of an energy efficiency (EE) throughout the paper. We firstly and generally model the power consumption of an L-DAS, and formulate an EE maximization problem. To tackle two crucial issues, namely the huge computational complexity and large amount of feedback (signaling) information, we propose a channel-gain-based antenna selection (AS) method and an interference-based user clustering (UC) method. The original problem is then split into multiple subproblems by a cluster, and each cluster's precoding and power control are managed in parallel for high EE. Simulation results reveal that i) using all antennas for zero-forcing multiuser multiple-input multiple-output (MU-MIMO) is energy inefficient if there is nonnegligible overhead power consumption on MU-MIMO processing, and ii) increasing the number of antennas does not necessarily result in a high EE. Furthermore, the results validate and underpin the EE merit of the proposed L-DAS complied with the AS, UC, precoding, and power control by comparing with non-clustering L-DAS and colocated antenna systems.
△ Less
Submitted 20 January, 2014; v1 submitted 6 December, 2013;
originally announced December 2013.
-
Cascade, Triangular and Two Way Source Coding with degraded side information at the second user
Authors:
Yeow Khiang Chia,
Haim Permuter,
Tsachy Weissman
Abstract:
We consider the Cascade and Triangular rate-distortion problems where the same side information is available at the source node and User 1, and the side information available at User 2 is a degraded version of the side information at the source node and User 1. We characterize the rate-distortion region for these problems. For the Cascade setup, we showed that, at User 1, decoding and re-binning t…
▽ More
We consider the Cascade and Triangular rate-distortion problems where the same side information is available at the source node and User 1, and the side information available at User 2 is a degraded version of the side information at the source node and User 1. We characterize the rate-distortion region for these problems. For the Cascade setup, we showed that, at User 1, decoding and re-binning the codeword sent by the source node for User 2 is optimum. We then extend our results to the Two way Cascade and Triangular setting, where the source node is interested in lossy reconstruction of the side information at User 2 via a rate limited link from User 2 to the source node. We characterize the rate distortion regions for these settings. Complete explicit characterizations for all settings are also given in the Quadratic Gaussian case. We conclude with two further extensions: A triangular source coding problem with a helper, and an extension of our Two Way Cascade setting in the Quadratic Gaussian case.
△ Less
Submitted 18 October, 2010;
originally announced October 2010.