-
A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks
Authors:
Boa Jang,
Youngbin Ahn,
Eun Kyung Choe,
Chang Ki Yoon,
Hyuk Jin Choi,
Young-Gon Kim
Abstract:
Artificial intelligence applied to retinal images offers significant potential for recognizing signs and symptoms of retinal conditions and expediting the diagnosis of eye diseases and systemic disorders. However, developing generalized artificial intelligence models for medical data often requires a large number of labeled images representing various disease signs, and most models are typically t…
▽ More
Artificial intelligence applied to retinal images offers significant potential for recognizing signs and symptoms of retinal conditions and expediting the diagnosis of eye diseases and systemic disorders. However, developing generalized artificial intelligence models for medical data often requires a large number of labeled images representing various disease signs, and most models are typically task-specific, focusing on major retinal diseases. In this study, we developed a Fundus-Specific Pretrained Model (Image+Fundus), a supervised artificial intelligence model trained to detect abnormalities in fundus images. A total of 57,803 images were used to develop this pretrained model, which achieved superior performance across various downstream tasks, indicating that our proposed model outperforms other general methods. Our Image+Fundus model offers a generalized approach to improve model performance while reducing the number of labeled datasets required. Additionally, it provides more disease-specific insights into fundus images, with visualizations generated by our model. These disease-specific foundation models are invaluable in enhancing the performance and efficiency of deep learning models in the field of fundus imaging.
△ Less
Submitted 16 August, 2024;
originally announced August 2024.
-
Aligning Large Language Models with Diverse Political Viewpoints
Authors:
Dominik Stammbach,
Philine Widmer,
Eunjung Cho,
Caglar Gulcehre,
Elliott Ash
Abstract:
Large language models such as ChatGPT often exhibit striking political biases. If users query them about political information, they might take a normative stance and reinforce such biases. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. Such aligned models are able to generate more accura…
▽ More
Large language models such as ChatGPT often exhibit striking political biases. If users query them about political information, they might take a normative stance and reinforce such biases. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. Such aligned models are able to generate more accurate political viewpoints from Swiss parties compared to commercial models such as ChatGPT. We also propose a procedure to generate balanced overviews from multiple viewpoints using such models.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Attention-aware Post-training Quantization without Backpropagation
Authors:
Junhan Kim,
Ho-young Kim,
Eulrang Cho,
Chungman Lee,
Joonyoung Kim,
Yongkweon Jeon
Abstract:
Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of it being post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated…
▽ More
Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of it being post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated via recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited by their lack of consideration of inter-layer dependencies. In this paper, we thus propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The fundamental concept involved is the development of attention-aware Hessian matrices, which facilitates the consideration of inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly for low bit-widths.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Authors:
Jiho Kim,
Woosog Chay,
Hyeonji Hwang,
Daeun Kyung,
Hyunseung Chung,
Eunbyeol Cho,
Yohan Jo,
Edward Choi
Abstract:
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge…
▽ More
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swap character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI. DialSim is available at https://github.com/jiho283/Simulator.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models
Authors:
Fujiao Ji,
Kiho Lee,
Hyungjoon Koo,
Wenhao You,
Euijin Choo,
Hyoungshick Kim,
Doowon Kim
Abstract:
Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have been unexplored. In this paper, we comprehensively scrutinize and evaluate…
▽ More
Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have been unexplored. In this paper, we comprehensively scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others exhibit notably lower performance than results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe the real-world tactic of manipulating visual components that phishing attackers employ to circumvent the detection systems. To assess the resilience of existing models against adversarial attacks and robustness, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target. We then evaluate the models' robustness in handling these adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems, encouraging proactive actions. To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, necessitating the development of more effective and robust defenses.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Three Disclaimers for Safe Disclosure: A Cardwriter for Reporting the Use of Generative AI in Writing Process
Authors:
Won Ik Cho,
Eunjung Cho,
Hyeonji Shin
Abstract:
Generative artificial intelligence (AI) and large language models (LLMs) are increasingly being used in the academic writing process. This is despite the current lack of unified framework for reporting the use of machine assistance. In this work, we propose "Cardwriter", an intuitive interface that produces a short report for authors to declare their use of generative AI in their writing process.…
▽ More
Generative artificial intelligence (AI) and large language models (LLMs) are increasingly being used in the academic writing process. This is despite the current lack of unified framework for reporting the use of machine assistance. In this work, we propose "Cardwriter", an intuitive interface that produces a short report for authors to declare their use of generative AI in their writing process. The demo is available online, at https://cardwriter.vercel.app
△ Less
Submitted 13 April, 2024;
originally announced April 2024.
-
Retrieval-Augmented Open-Vocabulary Object Detection
Authors:
Jooyeon Kim,
Eulrang Cho,
Sehyung Kim,
Hyunwoo J. Kim
Abstract:
Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose R…
▽ More
Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvement up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF .
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Simplifying MBA Expression Using E-Graphs
Authors:
Seoksu Lee,
Hyeongchang Jeon,
Eun-Sun Cho
Abstract:
Code obfuscation involves the addition of meaningless code or the complication of existing code in order to make a program difficult to reverse engineer. In recent years, MBA (Mixed Boolean Arithmetic) obfuscation has been applied to virus and malware code to impede expert analysis. Among the various obfuscation techniques, Mixed Boolean Arithmetic (MBA) obfuscation is considered the most challeng…
▽ More
Code obfuscation involves the addition of meaningless code or the complication of existing code in order to make a program difficult to reverse engineer. In recent years, MBA (Mixed Boolean Arithmetic) obfuscation has been applied to virus and malware code to impede expert analysis. Among the various obfuscation techniques, Mixed Boolean Arithmetic (MBA) obfuscation is considered the most challenging to decipher using existing code deobfuscation techniques. In this paper, we have attempted to simplify the MBA expression. We use an e-graph data structure to efficiently hold multiple expressions of the same semantics to systematically rewrite terms and find simpler expressions. The preliminary experimental result shows that our e-graph based MBA deobfuscation approach works faster with reasonable performance than other approaches do.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…
▽ More
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
△ Less
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks
Authors:
Aqeel Anwar,
Tae Eun Choe,
Zian Wang,
Sanja Fidler,
Minwoo Park
Abstract:
Detecting a diverse range of objects under various driving scenarios is essential for the effectiveness of autonomous driving systems. However, the real-world data collected often lacks the necessary diversity presenting a long-tail distribution. Although synthetic data has been utilized to overcome this issue by generating virtual scenes, it faces hurdles such as a significant domain gap and the…
▽ More
Detecting a diverse range of objects under various driving scenarios is essential for the effectiveness of autonomous driving systems. However, the real-world data collected often lacks the necessary diversity presenting a long-tail distribution. Although synthetic data has been utilized to overcome this issue by generating virtual scenes, it faces hurdles such as a significant domain gap and the substantial efforts required from 3D artists to create realistic environments. To overcome these challenges, we present ARSim, a fully automated, comprehensive, modular framework designed to enhance real multi-view image data with 3D synthetic objects of interest. The proposed method integrates domain adaptation and randomization strategies to address covariate shift between real and simulated data by inferring essential domain attributes from real data and employing simulation-based randomization for other attributes. We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it. Illumination is achieved by estimating light distribution from multiple images capturing the surroundings of the vehicle. Camera parameters from real data are employed to render synthetic assets in each frame. The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles. Experimental results on various AV perception tasks demonstrate the superior performance of networks trained on the augmented dataset.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Semi-Supervised Graph Representation Learning with Human-centric Explanation for Predicting Fatty Liver Disease
Authors:
So Yeon Kim,
Sehee Wang,
Eun Kyung Choe
Abstract:
Addressing the challenge of limited labeled data in clinical settings, particularly in the prediction of fatty liver disease, this study explores the potential of graph representation learning within a semi-supervised learning framework. Leveraging graph neural networks (GNNs), our approach constructs a subject similarity graph to identify risk patterns from health checkup data. The effectiveness…
▽ More
Addressing the challenge of limited labeled data in clinical settings, particularly in the prediction of fatty liver disease, this study explores the potential of graph representation learning within a semi-supervised learning framework. Leveraging graph neural networks (GNNs), our approach constructs a subject similarity graph to identify risk patterns from health checkup data. The effectiveness of various GNN approaches in this context is demonstrated, even with minimal labeled samples. Central to our methodology is the inclusion of human-centric explanations through explainable GNNs, providing personalized feature importance scores for enhanced interpretability and clinical relevance, thereby underscoring the potential of our approach in advancing healthcare practices with a keen focus on graph representation learning and human-centric explanation.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Authors:
Minqian Liu,
Ying Shen,
Zhiyang Xu,
Yixin Cao,
Eunah Cho,
Vaibhav Kumar,
Reza Ghanadan,
Lifu Huang
Abstract:
Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging as it may require the evaluator to generalize to any given evaluation aspect even if it's absent during training. In this paper, we introduce X-Eval, a two-stage instructi…
▽ More
Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging as it may require the evaluator to generalize to any given evaluation aspect even if it's absent during training. In this paper, we introduce X-Eval, a two-stage instruction tuning framework to evaluate the text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: the vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction tuning dataset tailored for multi-aspect NLG evaluation spanning 27 diverse evaluation aspects with 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks: dialogue generation, summarization, and data-to-text coupled with 21 aspects in meta-evaluation, demonstrate that our X-Eval enables even a lightweight language model to achieve a comparable if not higher correlation with human judgments compared to the state-of-the-art NLG evaluators, such as GPT-4.
△ Less
Submitted 13 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Authors:
Seongsu Bae,
Daeun Kyung,
Jaehee Ryu,
Eunbyeol Cho,
Gyubok Lee,
Sunjun Kweon,
Jungwoo Oh,
Lei Ji,
Eric I-Chao Chang,
Tackeun Kim,
Edward Choi
Abstract:
Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop o…
▽ More
Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) The MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we successfully construct a multi-modal EHR QA dataset that necessitates both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. This pioneering endeavor enhances engagement with multi-modal EHR sources and we believe that our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.
△ Less
Submitted 25 December, 2023; v1 submitted 28 October, 2023;
originally announced October 2023.
-
Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems
Authors:
Juntae Kim,
Eunjung Cho,
Dongwoo Kim,
Dongbin Na
Abstract:
The recent program development industries have required problem-solving abilities for engineers, especially application developers. However, AI-based education systems to help solve computer algorithm problems have not yet attracted attention, while most big tech companies require the ability to solve algorithm problems including Google, Meta, and Amazon. The most useful guide to solving algorithm…
▽ More
The recent program development industries have required problem-solving abilities for engineers, especially application developers. However, AI-based education systems to help solve computer algorithm problems have not yet attracted attention, while most big tech companies require the ability to solve algorithm problems including Google, Meta, and Amazon. The most useful guide to solving algorithm problems might be guessing the category (tag) of the facing problems. Therefore, our study addresses the task of predicting the algorithm tag as a useful tool for engineers and developers. Moreover, we also consider predicting the difficulty levels of algorithm problems, which can be used as useful guidance to calculate the required time to solve that problem. In this paper, we present a real-world algorithm problem multi-task dataset, AMT, by mainly collecting problem samples from the most famous and large competitive programming website Codeforces. To the best of our knowledge, our proposed dataset is the most large-scale dataset for predicting algorithm tags compared to previous studies. Moreover, our work is the first to address predicting the difficulty levels of algorithm problems. We present a deep learning-based novel method for simultaneously predicting algorithm tags and the difficulty levels of an algorithm problem given. All datasets and source codes are available at https://github.com/sronger/PSG_Predicting_Algorithm_Tags_and_Difficulty.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
PaperCard for Reporting Machine Assistance in Academic Writing
Authors:
Won Ik Cho,
Eunjung Cho,
Kyunghyun Cho
Abstract:
Academic writing process has benefited from various technological developments over the years including search engines, automatic translators, and editing tools that review grammar and spelling mistakes. They have enabled human writers to become more efficient in writing academic papers, for example by helping with finding relevant literature more effectively and polishing texts. While these devel…
▽ More
Academic writing process has benefited from various technological developments over the years including search engines, automatic translators, and editing tools that review grammar and spelling mistakes. They have enabled human writers to become more efficient in writing academic papers, for example by helping with finding relevant literature more effectively and polishing texts. While these developments have so far played a relatively assistive role, recent advances in large-scale language models (LLMs) have enabled LLMs to play a more major role in the writing process, such as coming up with research questions and generating key contents. This raises critical questions surrounding the concept of authorship in academia. ChatGPT, a question-answering system released by OpenAI in November 2022, has demonstrated a range of capabilities that could be utilised in producing academic papers. The academic community will have to address relevant pressing questions, including whether Artificial Intelligence (AI) should be merited authorship if it made significant contributions in the writing process, or whether its use should be restricted such that human authorship would not be undermined. In this paper, we aim to address such questions, and propose a framework we name "PaperCard", a documentation for human authors to transparently declare the use of AI in their writing process.
△ Less
Submitted 7 October, 2023;
originally announced October 2023.
-
Distribution-Aware Prompt Tuning for Vision-Language Models
Authors:
Eulrang Cho,
Jooyeon Kim,
Hyunwoo J. Kim
Abstract:
Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A…
▽ More
Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed. We observed that the alignment becomes more effective when embeddings of each modality are `well-arranged' in the latent space. Inspired by this observation, we proposed distribution-aware prompt tuning (DAPT) for vision-language models, which is simple yet effective. Specifically, the prompts are learned by maximizing inter-dispersion, the distance between classes, as well as minimizing the intra-dispersion measured by the distance between embeddings from the same class. Our extensive experiments on 11 benchmark datasets demonstrate that our method significantly improves generalizability. The code is available at https://github.com/mlvlab/DAPT.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Authors:
Sunjun Kweon,
Junu Kim,
Jiyoun Kim,
Sujeong Im,
Eunbyeol Cho,
Seongsu Bae,
Jungwoo Oh,
Gyubok Lee,
Jong Hak Moon,
Seng Chan You,
Seungjin Baek,
Chang Hoon Han,
Yoon Bin Jung,
Yohan Jo,
Edward Choi
Abstract:
The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train…
▽ More
The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources including weights, codes, and data used in the development of Asclepius are made publicly accessible for future research. (https://github.com/starmpcc/Asclepius)
△ Less
Submitted 29 July, 2024; v1 submitted 1 September, 2023;
originally announced September 2023.
-
RecMind: Large Language Model Powered Agent For Recommendation
Authors:
Yancheng Wang,
Ziyan Jiang,
Zheng Chen,
Fan Yang,
Yingxue Zhou,
Eunah Cho,
Xing Fan,
Xiaojiang Huang,
Yanbin Lu,
Yingzhen Yang
Abstract:
While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. Thus, we designed an LLM-powered autonomous recommender agent, RecMind, wh…
▽ More
While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. Thus, we designed an LLM-powered autonomous recommender agent, RecMind, which is capable of leveraging external knowledge, utilizing tools with careful planning to provide zero-shot personalized recommendations. We propose a Self-Inspiring algorithm to improve the planning ability. At each intermediate step, the LLM self-inspires to consider all previously explored states to plan for the next step. This mechanism greatly improves the model's ability to comprehend and utilize historical information in planning for recommendation. We evaluate RecMind's performance in various recommendation scenarios. Our experiment shows that RecMind outperforms existing zero/few-shot LLM-based recommendation baseline methods in various tasks and achieves comparable performance to a fully trained recommendation model P5.
△ Less
Submitted 20 March, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding
Authors:
Zheng Chen,
Ziyan Jiang,
Fan Yang,
Eunah Cho,
Xing Fan,
Xiaojiang Huang,
Yanbin Lu,
Aram Galstyan
Abstract:
Conversational AI systems such as Alexa need to understand defective queries to ensure robust conversational understanding and reduce user friction. These defective queries often arise from user ambiguities, mistakes, or errors in automatic speech recognition (ASR) and natural language understanding (NLU).
Personalized query rewriting is an approach that focuses on reducing defects in queries by…
▽ More
Conversational AI systems such as Alexa need to understand defective queries to ensure robust conversational understanding and reduce user friction. These defective queries often arise from user ambiguities, mistakes, or errors in automatic speech recognition (ASR) and natural language understanding (NLU).
Personalized query rewriting is an approach that focuses on reducing defects in queries by taking into account the user's individual behavior and preferences. It typically relies on an index of past successful user interactions with the conversational AI. However, unseen interactions within the user's history present additional challenges for personalized query rewriting. This paper presents our "Collaborative Query Rewriting" approach, which specifically addresses the task of rewriting new user interactions that have not been previously observed in the user's history. This approach builds a "User Feedback Interaction Graph" (FIG) of historical user-entity interactions and leverages multi-hop graph traversal to enrich each user's index to cover future unseen defective queries. The enriched user index is called a Collaborative User Index and contains hundreds of additional entries. To counteract precision degradation from the enlarged index, we add additional transformer layers to the L1 retrieval model and incorporate graph-based and guardrail features into the L2 ranking model.
Since the user index can be pre-computed, we further investigate the utilization of a Large Language Model (LLM) to enhance the FIG for user-entity link prediction in the Video/Music domains. Specifically, this paper investigates the Dolly-V2 7B model. We found that the user index augmented by the fine-tuned Dolly-V2 generation significantly enhanced the coverage of future unseen user interactions, thereby boosting QR performance on unseen queries compared with the graph traversal only approach.
△ Less
Submitted 19 June, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
PALR: Personalization Aware LLMs for Recommendation
Authors:
Fan Yang,
Zheng Chen,
Ziyan Jiang,
Eunah Cho,
Xiaojiang Huang,
Yanbin Lu
Abstract:
Large language models (LLMs) have recently received significant attention for their exceptional capabilities. Despite extensive efforts in developing general-purpose LLMs that can be utilized in various natural language processing (NLP) tasks, there has been less research exploring their potential in recommender systems. In this paper, we propose a novel framework, named PALR, which aiming to comb…
▽ More
Large language models (LLMs) have recently received significant attention for their exceptional capabilities. Despite extensive efforts in developing general-purpose LLMs that can be utilized in various natural language processing (NLP) tasks, there has been less research exploring their potential in recommender systems. In this paper, we propose a novel framework, named PALR, which aiming to combine user history behaviors (such as clicks, purchases, ratings, etc.) with LLMs to generate user preferred items. Specifically, we first use user/item interactions as guidance for candidate retrieval. Then we adopt a LLM-based ranking model to generate recommended items. Unlike existing approaches that typically adopt general-purpose LLMs for zero/few-shot recommendation testing or training on small-sized language models (with less than 1 billion parameters), which cannot fully elicit LLMs' reasoning abilities and leverage rich item side parametric knowledge, we fine-tune a 7 billion parameters LLM for the ranking purpose. This model takes retrieval candidates in natural language format as input, with instruction which explicitly asking to select results from input candidates during inference. Our experimental results demonstrate that our solution outperforms state-of-the-art models on various sequential recommendation tasks.
△ Less
Submitted 7 June, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
Feature Engineering Using File Layout for Malware Detection
Authors:
Jeongwoo Kim,
Eun-Sun Cho,
Joon-Young Paik
Abstract:
Malware detection on binary executables provides a high availability to even binaries which are not disassembled or decompiled. However, a binary-level approach could cause ambiguity problems. In this paper, we propose a new feature engineering technique that use minimal knowledge about the internal layout on a binary. The proposed feature avoids the ambiguity problems by integrating the informati…
▽ More
Malware detection on binary executables provides a high availability to even binaries which are not disassembled or decompiled. However, a binary-level approach could cause ambiguity problems. In this paper, we propose a new feature engineering technique that use minimal knowledge about the internal layout on a binary. The proposed feature avoids the ambiguity problems by integrating the information about the layout with structural entropy. The experimental results show that our feature improves accuracy and F1-score by 3.3% and 0.07, respectively, on a CNN based malware detector with realistic benign and malicious samples.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
Rediscovery of CNN's Versatility for Text-based Encoding of Raw Electronic Health Records
Authors:
Eunbyeol Cho,
Min Jae Lee,
Kyunghoon Hur,
Jiyoun Kim,
Jinsung Yoon,
Edward Choi
Abstract:
Making the most use of abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds entire features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn effic…
▽ More
Making the most use of abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds entire features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn efficient EHR representation in terms of computation and memory usage. In this paper, we search for a versatile encoder not only reducing the large data into a manageable size but also well preserving the core information of patients to perform diverse clinical tasks. We found that hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that making use of the inherent hierarchy of EHR data can boost the performance of any kind of backbone models and clinical tasks performed. Through extensive experiments, we present concrete evidence to generalize our research findings into real-world practice. We give a clear guideline on building the encoder based on the research findings captured while exploring numerous settings.
△ Less
Submitted 10 May, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
HazardNet: Road Debris Detection by Augmentation of Synthetic Models
Authors:
Tae Eun Choe,
Jane Wu,
Xiaolin Lin,
Karen Kwon,
Minwoo Park
Abstract:
We present an algorithm to detect unseen road debris using a small set of synthetic models. Early detection of road debris is critical for safe autonomous or assisted driving, yet the development of a robust road debris detection model has not been widely discussed. There are two main challenges to building a road debris detector: first, data collection of road debris is challenging since hazardou…
▽ More
We present an algorithm to detect unseen road debris using a small set of synthetic models. Early detection of road debris is critical for safe autonomous or assisted driving, yet the development of a robust road debris detection model has not been widely discussed. There are two main challenges to building a road debris detector: first, data collection of road debris is challenging since hazardous objects on the road are rare to encounter in real driving scenarios; second, the variability of road debris is broad, ranging from a very small brick to a large fallen tree. To overcome these challenges, we propose a novel approach to few-shot learning of road debris that uses semantic augmentation and domain randomization to augment real road images with synthetic models. We constrain the problem domain to uncommon objects on the road and allow the deep neural network, HazardNet, to learn the semantic meaning of road debris to eventually detect unseen road debris. Our results demonstrate that HazardNet is able to accurately detect real road debris when only trained on synthetic objects in augmented images.
△ Less
Submitted 13 March, 2023;
originally announced March 2023.
-
KG-ECO: Knowledge Graph Enhanced Entity Correction for Query Rewriting
Authors:
Jinglun Cai,
Mingda Li,
Ziyan Jiang,
Eunah Cho,
Zheng Chen,
Yang Liu,
Xing Fan,
Chenlei Guo
Abstract:
Query Rewriting (QR) plays a critical role in large-scale dialogue systems for reducing frictions. When there is an entity error, it imposes extra challenges for a dialogue system to produce satisfactory responses. In this work, we propose KG-ECO: Knowledge Graph enhanced Entity COrrection for query rewriting, an entity correction system with corrupt entity span detection and entity retrieval/re-r…
▽ More
Query Rewriting (QR) plays a critical role in large-scale dialogue systems for reducing frictions. When there is an entity error, it imposes extra challenges for a dialogue system to produce satisfactory responses. In this work, we propose KG-ECO: Knowledge Graph enhanced Entity COrrection for query rewriting, an entity correction system with corrupt entity span detection and entity retrieval/re-ranking functionalities. To boost the model performance, we incorporate Knowledge Graph (KG) to provide entity structural information (neighboring entities encoded by graph neural networks) and textual information (KG entity descriptions encoded by RoBERTa). Experimental results show that our approach yields a clear performance gain over two baselines: utterance level QR and entity correction without utilizing KG information. The proposed system is particularly effective for few-shot learning cases where target entities are rarely seen in training or there is a KG relation between the target entity and other contextual entities in the query.
△ Less
Submitted 22 February, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
L4 Pointer: An efficient pointer extension for spatial memory safety support without hardware extension
Authors:
Seong-Kyun Mok,
Eun-Sun Cho
Abstract:
Since buffer overflow has long been a frequently occurring, high-risk vulnerability, various methods have been developed to support spatial memory safety and prevent buffer overflow. However, every proposed method, although effective in part, has its limitations. Due to expensive bound-checking or large memory in taking for metadata, the software-only support for spatial memory safety inherently e…
▽ More
Since buffer overflow has long been a frequently occurring, high-risk vulnerability, various methods have been developed to support spatial memory safety and prevent buffer overflow. However, every proposed method, although effective in part, has its limitations. Due to expensive bound-checking or large memory in taking for metadata, the software-only support for spatial memory safety inherently entails runtime overhead. Contrastingly, hardware-assisted methods are not available without specific hardware assistants. To mitigate such limitations, Herein we propose L4 Pointer, which is a 128-bit pointer extended from a normal 64-bit virtual addresses. By using the extra bits and widespread SIMD operations, L4 Pointer shows less slow-down and higher performance without hardware extension than existing methods.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
UniHPF : Universal Healthcare Predictive Framework with Zero Domain Knowledge
Authors:
Kyunghoon Hur,
Jungwoo Oh,
Junu Kim,
Jiyoun Kim,
Min Jae Lee,
Eunbyeol Cho,
Seong-Eun Moon,
Young-Hak Kim,
Edward Choi
Abstract:
Despite the abundance of Electronic Healthcare Records (EHR), its heterogeneity restricts the utilization of medical data in building predictive models. To address this challenge, we propose Universal Healthcare Predictive Framework (UniHPF), which requires no medical domain knowledge and minimal pre-processing for multiple prediction tasks. Experimental results demonstrate that UniHPF is capable…
▽ More
Despite the abundance of Electronic Healthcare Records (EHR), its heterogeneity restricts the utilization of medical data in building predictive models. To address this challenge, we propose Universal Healthcare Predictive Framework (UniHPF), which requires no medical domain knowledge and minimal pre-processing for multiple prediction tasks. Experimental results demonstrate that UniHPF is capable of building large-scale EHR models that can process any form of medical data from distinct EHR systems. We believe that our findings can provide helpful insights for further research on the multi-source learning of EHRs.
△ Less
Submitted 1 September, 2024; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Taking a Language Detour: How International Migrants Speaking a Minority Language Seek COVID-Related Information in Their Host Countries
Authors:
Ge Gao,
Jian Zheng,
Eun Kyoung Choe,
Naomi Yamashita
Abstract:
Information seeking is crucial for people's self-care and wellbeing in times of public crises. Extensive research has investigated empirical understandings as well as technical solutions to facilitate information seeking by domestic citizens of affected regions. However, limited knowledge is established to support international migrants who need to survive a crisis in their host countries. The cur…
▽ More
Information seeking is crucial for people's self-care and wellbeing in times of public crises. Extensive research has investigated empirical understandings as well as technical solutions to facilitate information seeking by domestic citizens of affected regions. However, limited knowledge is established to support international migrants who need to survive a crisis in their host countries. The current paper presents an interview study with two cohorts of Chinese migrants living in Japan (N=14) and the United States (N=14). Participants reflected on their information seeking experiences during the COVID pandemic. The reflection was supplemented by two weeks of self-tracking where participants maintained records of their COVIDrelated information seeking practice. Our data indicated that participants often took language detours, or visits to Mandarin resources for information about the COVID outbreak in their host countries. They also made strategic use of the Mandarin information to perform selective reading, cross-checking, and contextualized interpretation of COVID-related information in Japanese or English. While such practices enhanced participants' perceived effectiveness of COVID-related information gathering and sensemaking, they disadvantaged people through sometimes incognizant ways. Further, participants lacked the awareness or preference to review migrant-oriented information that was issued by the host country's public authorities despite its availability. Building upon these findings, we discussed solutions to improve international migrants' COVID-related information seeking in their non-native language and cultural environment. We advocated inclusive crisis infrastructures that would engage people with diverse levels of local language fluency, information literacy, and experience in leveraging public services.
△ Less
Submitted 27 September, 2022; v1 submitted 6 September, 2022;
originally announced September 2022.
-
SSLEM: A Simplifier for MBA Expressions based on Semi-linear MBA Expressions and Program Synthesis
Authors:
Seong-Kyun Mok,
Seoyeon Kang,
Jeongwoo Kim,
Eun-Sun Cho,
Seokwoo Choi
Abstract:
MBA (mixed boolean and arithmetic) expressions are hard to simplify, so used for malware obfuscation to hinder analysts' diagnosis. Some MBA simplification methods with high performance have been developed, but they narrowed the target to "linear" MBA expressions, which allows efficient solutions based on logic/term-rewriting. However such restrictions are not appropriate for general forms of MBA…
▽ More
MBA (mixed boolean and arithmetic) expressions are hard to simplify, so used for malware obfuscation to hinder analysts' diagnosis. Some MBA simplification methods with high performance have been developed, but they narrowed the target to "linear" MBA expressions, which allows efficient solutions based on logic/term-rewriting. However such restrictions are not appropriate for general forms of MBA expressions usually appearing in malware. To overcome this limitation, we introduce a "semi-linear" MBA expression, a new class of MBA expression extended from a linear MBA expression, and propose a new MBA simplifier called "SSLEM", based on a simplification idea of semi-linear MBA expressions and program synthesis
△ Less
Submitted 15 August, 2022; v1 submitted 10 August, 2022;
originally announced August 2022.
-
GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning
Authors:
Kyunghoon Hur,
Jungwoo Oh,
Junu Kim,
Jiyoun Kim,
Min Jae Lee,
Eunbyeol Cho,
Seong-Eun Moon,
Young-Hak Kim,
Louis Atallah,
Edward Choi
Abstract:
Despite the remarkable progress in the development of predictive models for healthcare, applying these algorithms on a large scale has been challenging. Algorithms trained on a particular task, based on specific data formats available in a set of medical records, tend to not generalize well to other tasks or databases in which the data fields may differ. To address this challenge, we propose Gener…
▽ More
Despite the remarkable progress in the development of predictive models for healthcare, applying these algorithms on a large scale has been challenging. Algorithms trained on a particular task, based on specific data formats available in a set of medical records, tend to not generalize well to other tasks or databases in which the data fields may differ. To address this challenge, we propose General Healthcare Predictive Framework (GenHPF), which is applicable to any EHR with minimal preprocessing for multiple prediction tasks. GenHPF resolves heterogeneity in medical codes and schemas by converting EHRs into a hierarchical textual representation while incorporating as many features as possible. To evaluate the efficacy of GenHPF, we conduct multi-task learning experiments with single-source and multi-source settings, on three publicly available EHR datasets with different schemas for 12 clinically meaningful prediction tasks. Our framework significantly outperforms baseline models that utilize domain knowledge in multi-source learning, improving average AUROC by 1.2%P in pooled learning and 2.6%P in transfer learning while also showing comparable results when trained on a single EHR dataset. Furthermore, we demonstrate that self-supervised pretraining using multi-source datasets is effective when combined with GenHPF, resulting in a 0.6%P AUROC improvement compared to models without pretraining. By eliminating the need for preprocessing and feature engineering, we believe that this work offers a solid framework for multi-task and multi-source learning that can be leveraged to speed up the scaling and usage of predictive algorithms in healthcare.
△ Less
Submitted 15 November, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs
Authors:
Euijin Choo,
Mohamed Nabeel,
Ravindu De Silva,
Ting Yu,
Issa Khalil
Abstract:
VirusTotal (VT) provides aggregated threat intelligence on various entities including URLs, IP addresses, and binaries. It is widely used by researchers and practitioners to collect ground truth and evaluate the maliciousness of entities. In this work, we provide a comprehensive analysis of VT URL scanning reports containing the results of 95 scanners for 1.577 Billion URLs over two years. Individ…
▽ More
VirusTotal (VT) provides aggregated threat intelligence on various entities including URLs, IP addresses, and binaries. It is widely used by researchers and practitioners to collect ground truth and evaluate the maliciousness of entities. In this work, we provide a comprehensive analysis of VT URL scanning reports containing the results of 95 scanners for 1.577 Billion URLs over two years. Individual VT scanners are known to be noisy in terms of their detection and attack type classification. To obtain high quality ground truth of URLs and actively take proper actions to mitigate different types of attacks, there are two challenges: (1) how to decide whether a given URL is malicious given noisy reports and (2) how to determine attack types (e.g., phishing or malware hosting) that the URL is involved in, given conflicting attack labels from different scanners. In this work, we provide a systematic comparative study on the behavior of VT scanners for different attack types of URLs. A common practice to decide the maliciousness is to use a cut-off threshold of scanners that report the URL as malicious. However, in this work, we show that using a fixed threshold is suboptimal, due to several reasons: (1) correlations between scanners; (2) lead/lag behavior; (3) the specialty of scanners; (4) the quality and reliability of scanners. A common practice to determine an attack type is to use majority voting. However, we show that majority voting could not accurately classify the attack type of a URL due to the bias from correlated scanners. Instead, we propose a machine learning-based approach to assign an attack type to URLs given the VT reports.
△ Less
Submitted 26 May, 2022;
originally announced May 2022.
-
Literature Review to Collect Conceptual Variables of Scenario Methods for Establishing a Conceptual Scenario Framework
Authors:
Young-Min Baek,
Esther Cho,
Donghwan Shin,
Doo-Hwan Bae
Abstract:
Over recent decades, scenarios and scenario-based software/system engineering have been actively employed as essential tools to handle intricate problems, validate requirements, and support stakeholders' communication. However, despite the widespread use of scenarios, there have been several challenges for engineers to more willingly utilize scenario-based engineering approaches (i.e., scenario me…
▽ More
Over recent decades, scenarios and scenario-based software/system engineering have been actively employed as essential tools to handle intricate problems, validate requirements, and support stakeholders' communication. However, despite the widespread use of scenarios, there have been several challenges for engineers to more willingly utilize scenario-based engineering approaches (i.e., scenario methods) in their projects. First, the term scenario has numerous published definitions, thus lacking in a well-established shared understanding of scenarios and scenario methods. Second, the conceptual basis for engineers developing or employing scenarios is missing. To establish shared understanding and to find common denominators of scenario methods, this study leverages well-defined metamodeling and conceptualization that systematically investigate the concepts under analysis and define core entities and their relations. By conducting a semi-systematic literature review, conceptual variables are collected and conceptualized as a conceptual meta-model. As a result, this study introduces scenario variables (SVs) that represent constructs/semantics of scenario descriptions, according to 4 levels of constructs of a scenario method. To evaluate the comprehensibility and applicability of the defined variables, we analyze five existing scenario methods and their instances in automated driving system (ADS) domains. The results showed that our conceptual model and its constituent scenario variables adequately support the understanding of a scenario method and provide a means for comparative analysis between different scenario methods.
△ Less
Submitted 17 May, 2022;
originally announced May 2022.
-
Alexa as an Active Listener: How Backchanneling Can Elicit Self-Disclosure and Promote User Experience
Authors:
Eugene Cho,
Nasim Motalebi,
S. Shyam Sundar,
Saeed Abdullah
Abstract:
Active listening is a well-known skill applied in human communication to build intimacy and elicit self-disclosure to support a wide variety of cooperative tasks. When applied to conversational UIs, active listening from machines can also elicit greater self-disclosure by signaling to the users that they are being heard, which can have positive outcomes. However, it takes considerable engineering…
▽ More
Active listening is a well-known skill applied in human communication to build intimacy and elicit self-disclosure to support a wide variety of cooperative tasks. When applied to conversational UIs, active listening from machines can also elicit greater self-disclosure by signaling to the users that they are being heard, which can have positive outcomes. However, it takes considerable engineering effort and training to embed active listening skills in machines at scale, given the need to personalize active-listening cues to individual users and their specific utterances. A more generic solution is needed given the increasing use of conversational agents, especially by the growing number of socially isolated individuals. With this in mind, we developed an Amazon Alexa skill that provides privacy-preserving and pseudo-random backchanneling to indicate active listening. User study (N = 40) data show that backchanneling improves perceived degree of active listening by smart speakers. It also results in more emotional disclosure, with participants using more positive words. Perception of smart speakers as active listeners is positively associated with perceived emotional support. Interview data corroborate the feasibility of using smart speakers to provide emotional support. These findings have important implications for smart speaker interaction design in several domains of cooperative work and social computing.
△ Less
Submitted 22 September, 2022; v1 submitted 21 April, 2022;
originally announced April 2022.
-
MyMove: Facilitating Older Adults to Collect In-Situ Activity Labels on a Smartwatch with Speech
Authors:
Young-Ho Kim,
Diana Chou,
Bongshin Lee,
Margaret Danilovich,
Amanda Lazar,
David E. Conroy,
Hernisa Kacorri,
Eun Kyoung Choe
Abstract:
Current activity tracking technologies are largely trained on younger adults' data, which can lead to solutions that are not well-suited for older adults. To build activity trackers for older adults, it is crucial to collect training data with them. To this end, we examine the feasibility and challenges with older adults in collecting activity labels by leveraging speech. Specifically, we built My…
▽ More
Current activity tracking technologies are largely trained on younger adults' data, which can lead to solutions that are not well-suited for older adults. To build activity trackers for older adults, it is crucial to collect training data with them. To this end, we examine the feasibility and challenges with older adults in collecting activity labels by leveraging speech. Specifically, we built MyMove, a speech-based smartwatch app to facilitate the in-situ labeling with a low capture burden. We conducted a 7-day deployment study, where 13 older adults collected their activity labels and smartwatch sensor data, while wearing a thigh-worn activity monitor. Participants were highly engaged, capturing 1,224 verbal reports in total. We extracted 1,885 activities with corresponding effort level and timespan, and examined the usefulness of these reports as activity labels. We discuss the implications of our approach and the collected dataset in supporting older adults through personalized activity tracking technologies.
△ Less
Submitted 31 March, 2022;
originally announced April 2022.
-
Zero-Shot Dialogue State Tracking via Cross-Task Transfer
Authors:
Zhaojiang Lin,
Bing Liu,
Andrea Madotto,
Seungwhan Moon,
Paul Crook,
Zhenpeng Zhou,
Zhiguang Wang,
Zhou Yu,
Eunjoon Cho,
Rajen Subba,
Pascale Fung
Abstract:
Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer the \textit{cross-task} knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that se…
▽ More
Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer the \textit{cross-task} knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multi-choice QA via a text-to-text transformer framework, and tracks both categorical slots and non-categorical slots in DST. In addition, we introduce two effective ways to construct unanswerable questions, namely, negative question sampling and context truncation, which enable our model to handle "none" value slots in the zero-shot DST setting. The extensive experiments show that our approaches substantially improve the existing zero-shot and few-shot results on MultiWoz. Moreover, compared to the fully trained baseline on the Schema-Guided Dialogue dataset, our approach shows better generalization ability in unseen domains.
△ Less
Submitted 9 September, 2021;
originally announced September 2021.
-
Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking
Authors:
Zhaojiang Lin,
Bing Liu,
Seungwhan Moon,
Paul Crook,
Zhenpeng Zhou,
Zhiguang Wang,
Zhou Yu,
Andrea Madotto,
Eunjoon Cho,
Rajen Subba
Abstract:
Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder, and generates slot va…
▽ More
Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder, and generates slot values in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information across slots to facilitate cross-domain knowledge transfer. Experimental results on the MultiWOZ dataset show that our proposed method significantly improves existing state-of-the-art results in the zero-shot cross-domain setting.
△ Less
Submitted 10 May, 2021;
originally announced May 2021.
-
Investigating Opportunities to Support Kids' Agency and Well-being: A Review of Kids' Wearables
Authors:
Rachael Zehrung,
Lily Huang,
Bongshin Lee,
Eun Kyoung Choe
Abstract:
Wearable devices hold great potential for promoting children's health and well-being. However, research on kids' wearables is sparse and often focuses on their use in the context of parental surveillance. To gain insight into the current landscape of kids' wearables, we surveyed 47 wearable devices marketed for children. We collected rich data on the functionality of these devices and assessed how…
▽ More
Wearable devices hold great potential for promoting children's health and well-being. However, research on kids' wearables is sparse and often focuses on their use in the context of parental surveillance. To gain insight into the current landscape of kids' wearables, we surveyed 47 wearable devices marketed for children. We collected rich data on the functionality of these devices and assessed how different features satisfy parents' information needs, and identified opportunities for wearables to support children's needs and interests. We found that many kids' wearables are technologically sophisticated devices that focus on parents' ability to communicate with their children and keep them safe, as well as encourage physical activity and nurture good habits. We discuss how our findings could inform the design of wearables that serve as more than monitoring devices, and instead support children and parents as equal stakeholders, providing implications for kids' agency, long-term development, and overall well-being. Finally, we identify future research efforts related to designing for kids' self-tracking and collaborative tracking with parents.
△ Less
Submitted 13 April, 2021;
originally announced April 2021.
-
Data@Hand: Fostering Visual Exploration of Personal Data on Smartphones Leveraging Speech and Touch Interaction
Authors:
Young-Ho Kim,
Bongshin Lee,
Arjun Srinivasan,
Eun Kyoung Choe
Abstract:
Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge potential benefits, mobile visualization research in the personal data context is sparse. This work aims to empower people to easily navigate and compare their personal health data on smartphones by e…
▽ More
Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge potential benefits, mobile visualization research in the personal data context is sparse. This work aims to empower people to easily navigate and compare their personal health data on smartphones by enabling flexible time manipulation with speech. We designed and developed Data@Hand, a mobile app that leverages the synergy of two complementary modalities: speech and touch. Through an exploratory study with 13 long-term Fitbit users, we examined how multimodal interaction helps participants explore their own health data. Participants successfully adopted multimodal interaction (i.e., speech and touch) for convenient and fluid data exploration. Based on the quantitative and qualitative findings, we discuss design implications and opportunities with multimodal interaction for better supporting visual data exploration on mobile devices.
△ Less
Submitted 15 January, 2021;
originally announced January 2021.
-
Continual Learning in Task-Oriented Dialogue Systems
Authors:
Andrea Madotto,
Zhaojiang Lin,
Zhenpeng Zhou,
Seungwhan Moon,
Paul Crook,
Bing Liu,
Zhou Yu,
Eunjoon Cho,
Zhiguang Wang
Abstract:
Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings, such as intent recognition, state tracking, natural language genera…
▽ More
Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings, such as intent recognition, state tracking, natural language generation, and end-to-end. Moreover, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform comparably well but they both achieve inferior performance to the multi-task learning baseline, in where all the data are shown at once, showing that continual learning in task-oriented dialogue systems is a challenging task. Furthermore, we reveal several trade-offs between different continual learning methods in term of parameter usage and memory size, which are important in the design of a task-oriented dialogue system. The proposed benchmark is released together with several baselines to promote more research in this direction.
△ Less
Submitted 31 December, 2020;
originally announced December 2020.
-
Time-Window Group-Correlation Support vs. Individual Features: A Detection of Abnormal Users
Authors:
Lun-Pin Yuan,
Euijin Choo,
Ting Yu,
Issa Khalil,
Sencun Zhu
Abstract:
Autoencoder-based anomaly detection methods have been used in identifying anomalous users from large-scale enterprise logs with the assumption that adversarial activities do not follow past habitual patterns. Most existing approaches typically build models by reconstructing single-day and individual-user behaviors. However, without capturing long-term signals and group-correlation signals, the mod…
▽ More
Autoencoder-based anomaly detection methods have been used in identifying anomalous users from large-scale enterprise logs with the assumption that adversarial activities do not follow past habitual patterns. Most existing approaches typically build models by reconstructing single-day and individual-user behaviors. However, without capturing long-term signals and group-correlation signals, the models cannot identify low-signal yet long-lasting threats, and will wrongly report many normal users as anomalies on busy days, which, in turn, lead to high false positive rate. In this paper, we propose ACOBE, an Anomaly detection method based on COmpound BEhavior, which takes into consideration long-term patterns and group behaviors. ACOBE leverages a novel behavior representation and an ensemble of deep autoencoders and produces an ordered investigation list. Our evaluation shows that ACOBE outperforms prior work by a large margin in terms of precision and recall, and our case study demonstrates that ACOBE is applicable in practice for cyberattack detection.
△ Less
Submitted 27 December, 2020;
originally announced December 2020.
-
Adding Chit-Chat to Enhance Task-Oriented Dialogues
Authors:
Kai Sun,
Seungwhan Moon,
Paul Crook,
Stephen Roller,
Becka Silvert,
Bing Liu,
Zhiguang Wang,
Honglei Liu,
Eunjoon Cho,
Claire Cardie
Abstract:
Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtu…
▽ More
Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtual assistant conversations more engaging and interactive. Specifically, we propose a Human <-> AI collaborative data collection approach for generating diverse chit-chat responses to augment task-oriented dialogues with minimal annotation effort. We then present our new chit-chat-based annotations to 23.8K dialogues from two popular task-oriented datasets (Schema-Guided Dialogue and MultiWOZ 2.1) and demonstrate their advantage over the originals via human evaluation. Lastly, we propose three new models for adding chit-chat to task-oriented dialogues, explicitly trained to predict user goals and to generate contextually relevant chit-chat responses. Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.
△ Less
Submitted 1 May, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Situated and Interactive Multimodal Conversations
Authors:
Seungwhan Moon,
Satwik Kottur,
Paul A. Crook,
Ankita De,
Shivani Poddar,
Theodore Levin,
David Whitney,
Daniel Difranco,
Ahmad Beirami,
Eunjoon Cho,
Rajen Subba,
Alborz Geramifard
Abstract:
Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take mult…
▽ More
Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture (grounded in a shared virtual environment) and, (b) fashion (grounded in an evolving set of images). We also provide logs of the items appearing in each scene, and contextual NLU and coreference annotations, using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as Structural API Prediction and Response Generation. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, code, and models are publicly available.
△ Less
Submitted 10 November, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection
Authors:
Yuliang Guo,
Guang Chen,
Peitao Zhao,
Weide Zhang,
Jinghao Miao,
Jingao Wang,
Tae Eun Choe
Abstract:
We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. The method, inspired by the latest state-of-the-art 3D-LaneNet, is a unified framework solving image encoding, spatial transform of features and 3D lane prediction in a single network. However, we propose unique designs for Gen-LaneNet in two folds. First, we introduce a new geometry-guided la…
▽ More
We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. The method, inspired by the latest state-of-the-art 3D-LaneNet, is a unified framework solving image encoding, spatial transform of features and 3D lane prediction in a single network. However, we propose unique designs for Gen-LaneNet in two folds. First, we introduce a new geometry-guided lane anchor representation in a new coordinate frame and apply a specific geometric transformation to directly calculate real 3D lane points from the network output. We demonstrate that aligning the lane points with the underlying top-view features in the new coordinate frame is critical towards a generalized method in handling unfamiliar scenes. Second, we present a scalable two-stage framework that decouples the learning of image segmentation subnetwork and geometry encoding subnetwork. Compared to 3D-LaneNet, the proposed Gen-LaneNet drastically reduces the amount of 3D lane labels required to achieve a robust solution in real-world application. Moreover, we release a new synthetic dataset and its construction strategy to encourage the development and evaluation of 3D lane detection methods. In experiments, we conduct extensive ablation study to substantiate the proposed Gen-LaneNet significantly outperforms 3D-LaneNet in average precision(AP) and F-score.
△ Less
Submitted 24 March, 2020;
originally announced March 2020.
-
Low Latency ASR for Simultaneous Speech Translation
Authors:
Thai Son Nguyen,
Jan Niehues,
Eunah Cho,
Thanh-Le Ha,
Kevin Kilgour,
Markus Muller,
Matthias Sperber,
Sebastian Stueker,
Alex Waibel
Abstract:
User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we…
▽ More
User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvements. In order to minimize the latency we combined run-on decoding with a technique for identifying stable partial hypotheses when stream decoding and a protocol for dynamic output update that allows to revise the most recent parts of the transcription. This combination reduces the latency at word level, where the words are final and will never be updated again in the future, from 18.1s to 1.1s without sacrificing performance in terms of word error rate.
△ Less
Submitted 22 March, 2020;
originally announced March 2020.
-
Data Augmentation using Pre-trained Transformer Models
Authors:
Varun Kumar,
Ashutosh Choudhary,
Eunah Cho
Abstract:
Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of transformer based pre-trained models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple y…
▽ More
Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of transformer based pre-trained models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. Additionally, on three classification benchmarks, pre-trained Seq2Seq model outperforms other data augmentation methods in a low-resource setting. Further, we explore how different pre-trained model based data augmentation differs in-terms of data diversity, and how well such methods preserve the class-label information.
△ Less
Submitted 31 January, 2021; v1 submitted 4 March, 2020;
originally announced March 2020.
-
DeviceWatch: Identifying Compromised Mobile Devices through Network Traffic Analysis and Graph Inference
Authors:
Euijin Choo,
Mohamed Nabeel,
Mashael Alsabah,
Issa Khalil,
Ting Yu,
Wei Wang
Abstract:
In this paper, we propose to identify compromised mobile devices from a network administrator's point of view. Intuitively, inadvertent users (and thus their devices) who download apps through untrustworthy markets are often allured to install malicious apps through in-app advertisement or phishing. We thus hypothesize that devices sharing a similar set of apps will have a similar probability of b…
▽ More
In this paper, we propose to identify compromised mobile devices from a network administrator's point of view. Intuitively, inadvertent users (and thus their devices) who download apps through untrustworthy markets are often allured to install malicious apps through in-app advertisement or phishing. We thus hypothesize that devices sharing a similar set of apps will have a similar probability of being compromised, resulting in the association between a device being compromised and apps in the device. Our goal is to leverage such associations to identify unknown compromised devices (i.e., devices possibly having yet currently not having known malicious apps) using the guilt-by-association principle. Admittedly, such associations could be quite weak as it is often hard, if not impossible, for an app to automatically download and install other apps without explicit initiation from a user. We describe how we can magnify such weak associations between devices and apps by carefully choosing parameters when applying graph-based inferences. We empirically show the effectiveness of our approach with a comprehensive study on the mobile network traffic provided by a major mobile service provider. Concretely, we achieve nearly 98\% accuracy in terms of AUC (area under the ROC curve). Given the relatively weak nature of association, we further conduct in-depth analysis of the different behavior of a graph-inference approach, by comparing it to active DNS data. Moreover, we validate our results by showing that detected compromised devices indeed present undesirable behavior in terms of their privacy leakage and network infrastructure accessed.
△ Less
Submitted 27 November, 2019;
originally announced November 2019.
-
Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity
Authors:
Eunah Cho,
He Xie,
John P. Lalor,
Varun Kumar,
William M. Campbell
Abstract:
Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that augment training data automatically from unlabeled data sets in a functionality-targeted manner. In addition, we examine multiple techniques for efficient selection of a…
▽ More
Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that augment training data automatically from unlabeled data sets in a functionality-targeted manner. In addition, we examine multiple techniques for efficient selection of augmented utterances to reduce training time and increase diversity. First, we consider paraphrase detection methods that attempt to find utterance variants of labeled training data with good coverage. Second, we explore sub-modular optimization based on n-grams features for utterance selection. Experiments show that functionality-specific self-training is very effective for improving system performance. In addition, methods optimizing diversity can reduce training data in many cases to 50% with little impact on performance.
△ Less
Submitted 9 October, 2019;
originally announced October 2019.
-
A Comparative Evaluation of Animation and Small Multiples for Trend Visualization on Mobile Phones
Authors:
Matthew Brehmer,
Bongshin Lee,
Petra Isenberg,
Eun Kyoung Choe
Abstract:
We compare the efficacy of animated and small multiples variants of scatterplots on mobile phones for comparing trends in multivariate datasets. Visualization is increasingly prevalent in mobile applications and mobile-first websites, yet there is little prior visualization research dedicated to small displays. In this paper, we build upon previous experimental research carried out on larger displ…
▽ More
We compare the efficacy of animated and small multiples variants of scatterplots on mobile phones for comparing trends in multivariate datasets. Visualization is increasingly prevalent in mobile applications and mobile-first websites, yet there is little prior visualization research dedicated to small displays. In this paper, we build upon previous experimental research carried out on larger displays that assessed animated and non-animated variants of scatterplots. Incorporating similar experimental stimuli and tasks, we conducted an experiment where 96 crowdworker participants performed nine trend comparison tasks using their mobile phones. We found that those using a small multiples design consistently completed tasks in less time, albeit with slightly less confidence than those using an animated design. The accuracy results were more task-dependent, and we further interpret our results according to the characteristics of the individual tasks, with a specific focus on the trajectories of target and distractor data items in each task. We identify cases that appear to favor either animation or small multiples, providing new questions for further experimental research and implications for visualization design on mobile devices. Lastly, we provide a reflection on our evaluation methodology.
△ Less
Submitted 12 October, 2019; v1 submitted 8 July, 2019;
originally announced July 2019.
-
Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning
Authors:
Jan Niehues,
Eunah Cho
Abstract:
Linguistic resources such as part-of-speech (POS) tags have been extensively used in statistical machine translation (SMT) frameworks and have yielded better performances. However, usage of such linguistic annotations in neural machine translation (NMT) systems has been left under-explored.
In this work, we show that multi-task learning is a successful and a easy approach to introduce an additio…
▽ More
Linguistic resources such as part-of-speech (POS) tags have been extensively used in statistical machine translation (SMT) frameworks and have yielded better performances. However, usage of such linguistic annotations in neural machine translation (NMT) systems has been left under-explored.
In this work, we show that multi-task learning is a successful and a easy approach to introduce an additional knowledge into an end-to-end neural attentional model. By jointly training several natural language processing (NLP) tasks in one system, we are able to leverage common information and improve the performance of the individual task.
We analyze the impact of three design decisions in multi-task learning: the tasks used in training, the training schedule, and the degree of parameter sharing across the tasks, which is defined by the network architecture. The experiments are conducted for an German to English translation task. As additional linguistic resources, we exploit POS information and named-entities (NE). Experiments show that the translation quality can be improved by up to 1.5 BLEU points under the low-resource condition. The performance of the POS tagger is also improved using the multi-task learning scheme.
△ Less
Submitted 3 August, 2017;
originally announced August 2017.
-
Analyzing Neural MT Search and Model Performance
Authors:
Jan Niehues,
Eunah Cho,
Thanh-Le Ha,
Alex Waibel
Abstract:
In this paper, we offer an in-depth analysis about the modeling and search performance. We address the question if a more complex search algorithm is necessary. Furthermore, we investigate the question if more complex models which might only be applicable during rescoring are promising.
By separating the search space and the modeling using $n$-best list reranking, we analyze the influence of bot…
▽ More
In this paper, we offer an in-depth analysis about the modeling and search performance. We address the question if a more complex search algorithm is necessary. Furthermore, we investigate the question if more complex models which might only be applicable during rescoring are promising.
By separating the search space and the modeling using $n$-best list reranking, we analyze the influence of both parts of an NMT system independently. By comparing differently performing NMT systems, we show that the better translation is already in the search space of the translation systems with less performance. This results indicate that the current search algorithms are sufficient for the NMT systems. Furthermore, we could show that even a relatively small $n$-best list of $50$ hypotheses already contain notably better translations.
△ Less
Submitted 1 August, 2017;
originally announced August 2017.
-
A spectral characterisation of t-designs and its applications
Authors:
Eun-Kyung Cho,
Cunsheng Ding,
Jong Yoon Hyun
Abstract:
There are two standard approaches to the construction of $t$-designs. The first one is based on permutation group actions on certain base blocks. The second one is based on coding theory. The objective of this paper is to give a spectral characterisation of all $t$-designs by introducing a characteristic Boolean function of a $t$-design. The spectra of the characteristic functions of $(n-2)/2$-…
▽ More
There are two standard approaches to the construction of $t$-designs. The first one is based on permutation group actions on certain base blocks. The second one is based on coding theory. The objective of this paper is to give a spectral characterisation of all $t$-designs by introducing a characteristic Boolean function of a $t$-design. The spectra of the characteristic functions of $(n-2)/2$-$(n, n/2, 1)$ Steiner systems are determined and properties of such designs are proved. Delsarte's characterisations of orthogonal arrays and $t$-designs, which are two special cases of Delsarte's characterisation of $T$-designs in association schemes, are slightly extended into two spectral characterisations. Another characterisation of $t$-designs by Delsarte and Seidel is also extended into a spectral one. These spectral characterisations are then compared with the new spectral characterisation of this paper.
△ Less
Submitted 9 June, 2018; v1 submitted 1 June, 2017;
originally announced June 2017.