Search | arXiv e-print repository

Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng

Abstract: Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates… ▽ More Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates scores for items containing tokens with generation probabilities close to 1 (termed ghost tokens), and 2) homogeneity issue -- generating multiple similar or repetitive items for a user. To tackle these challenges, we introduce a new decoding approach named Debiasing-Diversifying Decoding (D3). D3 disables length normalization for ghost tokens to alleviate amplification bias, and it incorporates a text-free assistant model to encourage tokens less frequently generated by LLMs for counteracting recommendation homogeneity. Extensive experiments on real-world datasets demonstrate the method's effectiveness in enhancing accuracy and diversity. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.11503 [pdf, other]

GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da… ▽ More Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08796 [pdf, other]

Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

Authors: Janghoon Han, Changho Lee, Joongbo Shin, Stanley Jungkyu Choi, Honglak Lee, Kynghoon Bae

Abstract: Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in ins… ▽ More Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually for two distinct language meta-datasets. Subsequently, we assess the performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in language and instruction-format of the template between training and inference within the cross-lingual setting. Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean, outperforming baseline by average scores of 20.7\% and 13.6\%, respectively. Remarkably, these enhancements are comparable to those achieved by monolingual instruction tuning and even surpass them in some tasks. The result underscores the significance of relevant data acquisition across languages over linguistic congruence with unseen tasks during instruction tuning. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024 (Camera-ready), by Janghoon Han and Changho Lee, with equal contribution

arXiv:2406.03210 [pdf, other]

Text-like Encoding of Collaborative Information in Large Language Models for Recommendation

Authors: Yang Zhang, Keqin Bao, Ming Yan, Wenjie Wang, Fuli Feng, Xiangnan He

Abstract: When adapting Large Language Models for Recommendation (LLMRec), it is crucial to integrate collaborative information. Existing methods achieve this by learning collaborative embeddings in LLMs' latent space from scratch or by mapping from external models. However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduc… ▽ More When adapting Large Language Models for Recommendation (LLMRec), it is crucial to integrate collaborative information. Existing methods achieve this by learning collaborative embeddings in LLMs' latent space from scratch or by mapping from external models. However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduce BinLLM, a novel LLMRec method that seamlessly integrates collaborative information through text-like encoding. BinLLM converts collaborative embeddings from external models into binary sequences -- a specific text format that LLMs can understand and operate on directly, facilitating the direct usage of collaborative information in text-like format by LLMs. Additionally, BinLLM provides options to compress the binary sequence using dot-decimal notation to avoid excessively long lengths. Extensive experiments validate that BinLLM introduces collaborative information in a manner better aligned with LLMs, resulting in enhanced performance. We release our code at https://github.com/zyang1580/BinLLM. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024

ACM Class: H.3.3

arXiv:2405.06088 [pdf, other]

A Mixture of Experts Approach to 3D Human Motion Prediction

Authors: Edmund Shieh, Joshua Lee Franco, Kang Min Bae, Tej Lalvani

Abstract: This project addresses the challenge of human motion prediction, a critical area for applications such as au- tonomous vehicle movement detection. Previous works have emphasized the need for low inference times to provide real time performance for applications like these. Our primary objective is to critically evaluate existing model ar- chitectures, identifying their advantages and opportunities… ▽ More This project addresses the challenge of human motion prediction, a critical area for applications such as au- tonomous vehicle movement detection. Previous works have emphasized the need for low inference times to provide real time performance for applications like these. Our primary objective is to critically evaluate existing model ar- chitectures, identifying their advantages and opportunities for improvement by replicating the state-of-the-art (SOTA) Spatio-Temporal Transformer model as best as possible given computational con- straints. These models have surpassed the limitations of RNN-based models and have demonstrated the ability to generate plausible motion sequences over both short and long term horizons through the use of spatio-temporal rep- resentations. We also propose a novel architecture to ad- dress challenges of real time inference speed by incorpo- rating a Mixture of Experts (MoE) block within the Spatial- Temporal (ST) attention layer. The particular variation that is used is Soft MoE, a fully-differentiable sparse Transformer that has shown promising ability to enable larger model capacity at lower inference cost. We make out code publicly available at https://github.com/edshieh/motionprediction △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 16 pages, 6 figures

arXiv:2404.16418 [pdf, other]

Instruction Matters, a Simple yet Effective Task Selection Approach in Instruction Tuning for Specific Tasks

Authors: Changho Lee, Janghoon Han, Seonghyeon Ye, Stanley Jungkyu Choi, Honglak Lee, Kyunghoon Bae

Abstract: Instruction tuning has shown its ability to not only enhance zero-shot generalization across various tasks but also its effectiveness in improving the performance of specific tasks. A crucial aspect in instruction tuning for a particular task is a strategic selection of related tasks that offer meaningful supervision, thereby enhancing efficiency and preventing performance degradation from irrelev… ▽ More Instruction tuning has shown its ability to not only enhance zero-shot generalization across various tasks but also its effectiveness in improving the performance of specific tasks. A crucial aspect in instruction tuning for a particular task is a strategic selection of related tasks that offer meaningful supervision, thereby enhancing efficiency and preventing performance degradation from irrelevant tasks. Our research reveals that leveraging instruction information \textit{alone} enables the identification of pertinent tasks for instruction tuning. This approach is notably simpler compared to traditional methods that necessitate complex measurements of pairwise transferability between tasks or the creation of data samples for the target task. Furthermore, by additionally learning the unique instructional template style of the meta-dataset, we observe an improvement in task selection accuracy, which contributes to enhanced overall performance. Experimental results demonstrate that training on a small set of tasks, chosen solely based on the instructions, leads to substantial performance improvements on benchmarks like P3, Big-Bench, NIV2, and Big-Bench Hard. Significantly, these improvements exceed those achieved by prior task selection methods, highlighting the efficacy of our approach. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 21 pages, 6 figures, 16 tables

arXiv:2403.08978 [pdf, other]

AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents

Authors: Yao Fu, Dong-Ki Kim, Jaekyeom Kim, Sungryull Sohn, Lajanugen Logeswaran, Kyunghoon Bae, Honglak Lee

Abstract: The primary limitation of large language models (LLMs) is their restricted understanding of the world. This poses significant difficulties for LLM-based agents, particularly in domains where pre-trained LLMs lack sufficient knowledge. In this paper, we introduce a novel framework, called AutoGuide, that bridges the knowledge gap in pre-trained LLMs by leveraging implicit knowledge in offline exper… ▽ More The primary limitation of large language models (LLMs) is their restricted understanding of the world. This poses significant difficulties for LLM-based agents, particularly in domains where pre-trained LLMs lack sufficient knowledge. In this paper, we introduce a novel framework, called AutoGuide, that bridges the knowledge gap in pre-trained LLMs by leveraging implicit knowledge in offline experiences. Specifically, AutoGuide effectively extracts knowledge embedded in offline data by extracting a set of state-aware guidelines. Importantly, each state-aware guideline is expressed in concise natural language and follows a conditional structure, clearly describing the state where it is applicable. As such, the resulting guidelines enable a principled way to provide helpful knowledge pertinent to an agent's current decision-making process. We show that our approach outperforms competitive LLM-based baselines by a large margin in sequential decision-making benchmarks. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.18240 [pdf, other]

Prospect Personalized Recommendation on Large Language Model-based Agent Platform

Authors: Jizhi Zhang, Keqin Bao, Wenjie Wang, Yang Zhang, Wentao Shi, Wanhong Xu, Fuli Feng, Tat-Seng Chua

Abstract: The new kind of Agent-oriented information system, exemplified by GPTs, urges us to inspect the information system infrastructure to support Agent-level information processing and to adapt to the characteristics of Large Language Model (LLM)-based Agents, such as interactivity. In this work, we envisage the prospect of the recommender system on LLM-based Agent platforms and introduce a novel recom… ▽ More The new kind of Agent-oriented information system, exemplified by GPTs, urges us to inspect the information system infrastructure to support Agent-level information processing and to adapt to the characteristics of Large Language Model (LLM)-based Agents, such as interactivity. In this work, we envisage the prospect of the recommender system on LLM-based Agent platforms and introduce a novel recommendation paradigm called Rec4Agentverse, comprised of Agent Items and Agent Recommender. Rec4Agentverse emphasizes the collaboration between Agent Items and Agent Recommender, thereby promoting personalized information services and enhancing the exchange of information beyond the traditional user-recommender feedback loop. Additionally, we prospect the evolution of Rec4Agentverse and conceptualize it into three stages based on the enhancement of the interaction and information exchange among Agent Items, Agent Recommender, and the user. A preliminary study involving several cases of Rec4Agentverse validates its significant potential for application. Lastly, we discuss potential issues and promising directions for future research. △ Less

Submitted 5 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15215 [pdf, other]

Item-side Fairness of Large Language Model-based Recommendation System

Authors: Meng Jiang, Keqin Bao, Jizhi Zhang, Wenjie Wang, Zhengyi Yang, Fuli Feng, Xiangnan He

Abstract: Recommendation systems for Web content distribution intricately connect to the information access and exposure opportunities for vulnerable populations. The emergence of Large Language Models-based Recommendation System (LRS) may introduce additional societal challenges to recommendation systems due to the inherent biases in Large Language Models (LLMs). From the perspective of item-side fairness,… ▽ More Recommendation systems for Web content distribution intricately connect to the information access and exposure opportunities for vulnerable populations. The emergence of Large Language Models-based Recommendation System (LRS) may introduce additional societal challenges to recommendation systems due to the inherent biases in Large Language Models (LLMs). From the perspective of item-side fairness, there remains a lack of comprehensive investigation into the item-side fairness of LRS given the unique characteristics of LRS compared to conventional recommendation systems. To bridge this gap, this study examines the property of LRS with respect to item-side fairness and reveals the influencing factors of both historical users' interactions and inherent semantic biases of LLMs, shedding light on the need to extend conventional item-side fairness methods for LRS. Towards this goal, we develop a concise and effective framework called IFairLRS to enhance the item-side fairness of an LRS. IFairLRS covers the main stages of building an LRS with specifically adapted strategies to calibrate the recommendations of LRS. We utilize IFairLRS to fine-tune LLaMA, a representative LLM, on \textit{MovieLens} and \textit{Steam} datasets, and observe significant item-side fairness improvements. The code can be found in https://github.com/JiangM-C/IFairLRS.git. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted by the Proceedings of the ACM Web Conference 2024

arXiv:2401.01884 [pdf, other]

A rewriting-logic-with-SMT-based formal analysis and parameter synthesis framework for parametric time Petri nets

Authors: Jaime Arias, Kyungmin Bae, Carlos Olarte, Peter Csaba Ölveczky, Laure Petrucci

Abstract: This paper presents a concrete and a symbolic rewriting logic semantics for parametric time Petri nets with inhibitor arcs (PITPNs), a flexible model of timed systems where parameters are allowed in firing bounds. We prove that our semantics is bisimilar to the "standard" semantics of PITPNs. This allows us to use the rewriting logic tool Maude, combined with SMT solving, to provide sound and comp… ▽ More This paper presents a concrete and a symbolic rewriting logic semantics for parametric time Petri nets with inhibitor arcs (PITPNs), a flexible model of timed systems where parameters are allowed in firing bounds. We prove that our semantics is bisimilar to the "standard" semantics of PITPNs. This allows us to use the rewriting logic tool Maude, combined with SMT solving, to provide sound and complete formal analyses for PITPNs. We develop and implement a new general folding approach for symbolic reachability, so that Maude-with-SMT reachability analysis terminates whenever the parametric state-class graph of the PITPN is finite. Our work opens up the possibility of using the many formal analysis capabilities of Maude -- including full LTL model checking, analysis with user-defined analysis strategies, and even statistical model checking -- for such nets. We illustrate this by explaining how almost all formal analysis and parameter synthesis methods supported by the state-of-the-art PITPN tool Romeo can be performed using Maude with SMT. In addition, we also support analysis and parameter synthesis from parametric initial markings, as well as full LTL model checking and analysis with user-defined execution strategies. Experiments show that our methods outperform Romeo in many cases. △ Less

Submitted 5 April, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2303.08929

arXiv:2312.08843 [pdf, other]

Diffusion-C: Unveiling the Generative Challenges of Diffusion Models through Corrupted Data

Authors: Keywoong Bae, Suan Lee, Wookey Lee

Abstract: In our contemporary academic inquiry, we present "Diffusion-C," a foundational methodology to analyze the generative restrictions of Diffusion Models, particularly those akin to GANs, DDPM, and DDIM. By employing input visual data that has been subjected to a myriad of corruption modalities and intensities, we elucidate the performance characteristics of those Diffusion Models. The noise component… ▽ More In our contemporary academic inquiry, we present "Diffusion-C," a foundational methodology to analyze the generative restrictions of Diffusion Models, particularly those akin to GANs, DDPM, and DDIM. By employing input visual data that has been subjected to a myriad of corruption modalities and intensities, we elucidate the performance characteristics of those Diffusion Models. The noise component takes center stage in our analysis, hypothesized to be a pivotal element influencing the mechanics of deep learning systems. In our rigorous expedition utilizing Diffusion-C, we have discerned the following critical observations: (I) Within the milieu of generative models under the Diffusion taxonomy, DDPM emerges as a paragon, consistently exhibiting superior performance metrics. (II) Within the vast spectrum of corruption frameworks, the fog and fractal corruptions notably undermine the functional robustness of both DDPM and DDIM. (III) The vulnerability of Diffusion Models to these particular corruptions is significantly influenced by topological and statistical similarities, particularly concerning the alignment between mean and variance. This scholarly work highlights Diffusion-C's core understandings regarding the impacts of various corruptions, setting the stage for future research endeavors in the realm of generative models. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 11 pages

arXiv:2312.00826 [pdf, other]

DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding

Authors: Kyungho Bae, Geo Ahn, Youngrae Kim, Jinwoo Choi

Abstract: When watching a video, humans can naturally extract human actions from the surrounding scene context, even when action-scene combinations are unusual. However, unlike humans, video action recognition models often learn scene-biased action representations from the spurious correlation in training data, leading to poor performance in out-of-context scenarios. While scene-debiased models achieve impr… ▽ More When watching a video, humans can naturally extract human actions from the surrounding scene context, even when action-scene combinations are unusual. However, unlike humans, video action recognition models often learn scene-biased action representations from the spurious correlation in training data, leading to poor performance in out-of-context scenarios. While scene-debiased models achieve improved performance in out-of-context scenarios, they often overlook valuable scene information in the data. Addressing this challenge, we propose Disentangled VIdeo representations of Action and Scene (DEVIAS), which aims to achieve holistic video understanding. Disentangled action and scene representations with our method could provide flexibility to adjust the emphasis on action or scene information depending on downstream task and dataset characteristics. Disentangled action and scene representations could be beneficial for both in-context and out-of-context video understanding. To this end, we employ slot attention to learn disentangled action and scene representations with a single model, along with auxiliary tasks that further guide slot attention. We validate the proposed method on both in-context datasets: UCF-101 and Kinetics-400, and out-of-context datasets: SCUBA and HAT. Our proposed method shows favorable performance across different datasets compared to the baselines, demonstrating its effectiveness in diverse video understanding scenarios. △ Less

Submitted 30 November, 2023; originally announced December 2023.

Comments: 20 pages, 8 figures, 17 tables

arXiv:2311.12467 [pdf, other]

GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap

Authors: Hyogun Lee, Kyungho Bae, Seong Jong Ha, Yumin Ko, Gyeong-Moon Park, Jinwoo Choi

Abstract: In this work, we tackle the challenging problem of unsupervised video domain adaptation (UVDA) for action recognition. We specifically focus on scenarios with a substantial domain gap, in contrast to existing works primarily deal with small domain gaps between labeled source domains and unlabeled target domains. To establish a more realistic setting, we introduce a novel UVDA scenario, denoted as… ▽ More In this work, we tackle the challenging problem of unsupervised video domain adaptation (UVDA) for action recognition. We specifically focus on scenarios with a substantial domain gap, in contrast to existing works primarily deal with small domain gaps between labeled source domains and unlabeled target domains. To establish a more realistic setting, we introduce a novel UVDA scenario, denoted as Kinetics->BABEL, with a more considerable domain gap in terms of both temporal dynamics and background shifts. To tackle the temporal shift, i.e., action duration difference between the source and target domains, we propose a global-local view alignment approach. To mitigate the background shift, we propose to learn temporal order sensitive representations by temporal order learning and background invariant representations by background augmentation. We empirically validate that the proposed method shows significant improvement over the existing methods on the Kinetics->BABEL dataset with a large domain gap. The code is available at https://github.com/KHUVLL/GLAD. △ Less

Submitted 22 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: This is an accepted WACV 2024 paper. Our code is available at https://github.com/KHUVLL/GLAD

arXiv:2310.19488 [pdf, other]

CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation

Authors: Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, Xiangnan He

Abstract: Leveraging Large Language Models as Recommenders (LLMRec) has gained significant attention and introduced fresh perspectives in user preference modeling. Existing LLMRec approaches prioritize text semantics, usually neglecting the valuable collaborative information from user-item interactions in recommendations. While these text-emphasizing approaches excel in cold-start scenarios, they may yield… ▽ More Leveraging Large Language Models as Recommenders (LLMRec) has gained significant attention and introduced fresh perspectives in user preference modeling. Existing LLMRec approaches prioritize text semantics, usually neglecting the valuable collaborative information from user-item interactions in recommendations. While these text-emphasizing approaches excel in cold-start scenarios, they may yield sub-optimal performance in warm-start situations. In pursuit of superior recommendations for both cold and warm start scenarios, we introduce CoLLM, an innovative LLMRec methodology that seamlessly incorporates collaborative information into LLMs for recommendation. CoLLM captures collaborative information through an external traditional model and maps it to the input token embedding space of LLM, forming collaborative embeddings for LLM usage. Through this external integration of collaborative information, CoLLM ensures effective modeling of collaborative information without modifying the LLM itself, providing the flexibility to employ various collaborative information modeling techniques. Extensive experiments validate that CoLLM adeptly integrates collaborative information into LLMs, resulting in enhanced recommendation performance. We release the code and data at https://github.com/zyang1580/CoLLM. △ Less

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2309.01961 [pdf, other]

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh , et al. (17 additional authors not shown)

Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested… ▽ More In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks. △ Less

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2308.08434 [pdf, other]

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems

Authors: Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yancheng Luo, Chong Chen, Fuli Feng, Qi Tian

Abstract: As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, the optimization of LLMs for recommendation purposes (referred to as LLM4Rec) assumes a crucial role in augmenting their effectiveness in providing recommendations. However, existing approaches for LLM4Rec often assess performance using restricted sets of candidates, which may not accurately reflect the models… ▽ More As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, the optimization of LLMs for recommendation purposes (referred to as LLM4Rec) assumes a crucial role in augmenting their effectiveness in providing recommendations. However, existing approaches for LLM4Rec often assess performance using restricted sets of candidates, which may not accurately reflect the models' overall ranking capabilities. In this paper, our objective is to investigate the comprehensive ranking capacity of LLMs and propose a two-step grounding framework known as BIGRec (Bi-step Grounding Paradigm for Recommendation). It initially grounds LLMs to the recommendation space by fine-tuning them to generate meaningful tokens for items and subsequently identifies appropriate actual items that correspond to the generated tokens. By conducting extensive experiments on two datasets, we substantiate the superior performance, capacity for handling few-shot scenarios, and versatility across multiple domains exhibited by BIGRec. Furthermore, we observe that the marginal benefits derived from increasing the quantity of training samples are modest for BIGRec, implying that LLMs possess the limited capability to assimilate statistical information, such as popularity and collaborative filtering, due to their robust semantic priors. These findings also underline the efficacy of integrating diverse statistical information into the LLM4Rec framework, thereby pointing towards a potential avenue for future research. Our code and data are available at https://github.com/SAI990323/Grounding4Rec. △ Less

Submitted 31 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 17 pages

arXiv:2305.16907 [pdf, other]

doi 10.1016/j.compind.2023.103982

CyPhERS: A Cyber-Physical Event Reasoning System providing real-time situational awareness for attack and fault response

Authors: Nils Müller, Kaibin Bao, Jörg Matthes, Kai Heussen

Abstract: Cyber-physical systems (CPSs) constitute the backbone of critical infrastructures such as power grids or water distribution networks. Operating failures in these systems can cause serious risks for society. To avoid or minimize downtime, operators require real-time awareness about critical incidents. However, online event identification in CPSs is challenged by the complex interdependency of numer… ▽ More Cyber-physical systems (CPSs) constitute the backbone of critical infrastructures such as power grids or water distribution networks. Operating failures in these systems can cause serious risks for society. To avoid or minimize downtime, operators require real-time awareness about critical incidents. However, online event identification in CPSs is challenged by the complex interdependency of numerous physical and digital components, requiring to take cyber attacks and physical failures equally into account. The online event identification problem is further complicated through the lack of historical observations of critical but rare events, and the continuous evolution of cyber attack strategies. This work introduces and demonstrates CyPhERS, a Cyber-Physical Event Reasoning System. CyPhERS provides real-time information pertaining the occurrence, location, physical impact, and root cause of potentially critical events in CPSs, without the need for historical event observations. Key novelty of CyPhERS is the capability to generate informative and interpretable event signatures of known and unknown types of both cyber attacks and physical failures. The concept is evaluated and benchmarked on a demonstration case that comprises a multitude of attack and fault events targeting various components of a CPS. The results demonstrate that the event signatures provide relevant and inferable information on both known and unknown event types. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: Article submitted to Computers in Industry

arXiv:2305.16713 [pdf, other]

ReConPatch : Contrastive Patch Representation Learning for Industrial Anomaly Detection

Authors: Jeeho Hyun, Sangyun Kim, Giyoung Jeon, Seung Hwan Kim, Kyunghoon Bae, Byung Jun Kang

Abstract: Anomaly detection is crucial to the advanced identification of product defects such as incorrect parts, misaligned components, and damages in industrial manufacturing. Due to the rare observations and unknown types of defects, anomaly detection is considered to be challenging in machine learning. To overcome this difficulty, recent approaches utilize the common visual representations pre-trained f… ▽ More Anomaly detection is crucial to the advanced identification of product defects such as incorrect parts, misaligned components, and damages in industrial manufacturing. Due to the rare observations and unknown types of defects, anomaly detection is considered to be challenging in machine learning. To overcome this difficulty, recent approaches utilize the common visual representations pre-trained from natural image datasets and distill the relevant features. However, existing approaches still have the discrepancy between the pre-trained feature and the target data, or require the input augmentation which should be carefully designed, particularly for the industrial dataset. In this paper, we introduce ReConPatch, which constructs discriminative features for anomaly detection by training a linear modulation of patch features extracted from the pre-trained model. ReConPatch employs contrastive representation learning to collect and distribute features in a way that produces a target-oriented and easily separable representation. To address the absence of labeled pairs for the contrastive learning, we utilize two similarity measures between data representations, pairwise and contextual similarities, as pseudo-labels. Our method achieves the state-of-the-art anomaly detection performance (99.72%) for the widely used and challenging MVTec AD dataset. Additionally, we achieved a state-of-the-art anomaly detection performance (95.8%) for the BTAD dataset. △ Less

Submitted 10 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted on WACV 2024

arXiv:2305.07609 [pdf, other]

doi 10.1145/3604915.3608860

Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

Authors: Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

Abstract: The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate t… ▽ More The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes1 in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM. △ Less

Submitted 17 October, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted by Recsys 2023 (Short)

arXiv:2305.00447 [pdf, other]

doi 10.1145/3604915.3608857

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevert… ▽ More Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevertheless, the performance of LLMs in recommendation tasks remains suboptimal due to a substantial disparity between the training tasks for LLMs and recommendation tasks, as well as inadequate recommendation data during pre-training. To bridge the gap, we consider building a Large Recommendation Language Model by tunning LLMs with recommendation data. To this end, we propose an efficient and effective Tuning framework for Aligning LLMs with Recommendation, namely TALLRec. We have demonstrated that the proposed TALLRec framework can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with a limited dataset of fewer than 100 samples. Additionally, the proposed framework is highly efficient and can be executed on a single RTX 3090 with LLaMA-7B. Furthermore, the fine-tuned LLM exhibits robust cross-domain generalization. Our code and data are available at https://github.com/SAI990323/TALLRec. △ Less

Submitted 17 October, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems; September 2023 Pages; 1007-1014

arXiv:2303.08929 [pdf, other]

Symbolic Analysis and Parameter Synthesis for Time Petri Nets Using Maude and SMT Solving

Authors: Jaime Arias, Kyungmin Bae, Carlos Olarte, Peter Csaba Ölveczky, Laure Petrucci, Fredrik Rømming

Abstract: Parametric time Petri nets with inhibitor arcs (PITPNs) support flexibility for timed systems by allowing parameters in firing bounds. In this paper we present and prove correct a concrete and a symbolic rewriting logic semantics for PITPNs. We show how this allows us to use Maude combined with SMT solving to provide sound and complete formal analyses for PITPNs. We develop a new general folding a… ▽ More Parametric time Petri nets with inhibitor arcs (PITPNs) support flexibility for timed systems by allowing parameters in firing bounds. In this paper we present and prove correct a concrete and a symbolic rewriting logic semantics for PITPNs. We show how this allows us to use Maude combined with SMT solving to provide sound and complete formal analyses for PITPNs. We develop a new general folding approach for symbolic reachability that terminates whenever the parametric state-class graph of the PITPN is finite. We explain how almost all formal analysis and parameter synthesis supported by the state-of-the-art PITPN tool Roméo can be done in Maude with SMT. In addition, we also support analysis and parameter synthesis from parametric initial markings, as well as full LTL model checking and analysis with user-defined execution strategies. Experiments on three benchmarks show that our methods outperform Roméo in many cases. △ Less

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2302.08975 [pdf, other]

Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors

Authors: Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

Abstract: Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches can not synchronously consider error position and type, failing to integrate the error information of both. In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source… ▽ More Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches can not synchronously consider error position and type, failing to integrate the error information of both. In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source-hypothesis sentence pairs. Besides, we build an FG-TED model to predict the \textbf{addition} and \textbf{omission} errors -- two typical translation accuracy errors. First, we use a word-level classification paradigm to form our model and use the shortcut learning reduction to relieve the influence of monolingual features. Besides, we construct synthetic datasets for model training, and relieve the disagreement of data labeling in authoritative datasets, making the experimental benchmark concordant. Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results on the restored dataset. Our model also delivers more reliable predictions on low-resource and transfer scenarios than existing baselines. The related datasets and the source code will be released in the future. △ Less

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2212.07050 [pdf, other]

Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders

Authors: Jongseong Jang, Daeun Kyung, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae, Edward Choi

Abstract: Deep neural networks have been successfully adopted to diverse domains including pathology classification based on medical images. However, large-scale and high-quality data to train powerful neural networks are rare in the medical domain as the labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large… ▽ More Deep neural networks have been successfully adopted to diverse domains including pathology classification based on medical images. However, large-scale and high-quality data to train powerful neural networks are rare in the medical domain as the labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large-scale general domain data. Specifically, researchers took contrastive image-text encoders (e.g., CLIP) and fine-tuned it with chest X-ray images and paired reports to perform zero-shot pathology classification, thus completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective, and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive pair loss relaxation for improving the downstream zero-shot pathology classification performance, which can be applied to any pre-trained contrastive image-text encoders. Our method consistently showed dramatically improved zero-shot pathology classification performance on four different chest X-ray datasets and 3 different pre-trained models (5.77% average AUROC increase). In particular, fine-tuning CLIP with our method showed much comparable or marginally outperformed to board-certified radiologists (0.619 vs 0.625 in F1 score and 0.530 vs 0.544 in MCC) in zero-shot classification of five prominent diseases from the CheXpert dataset. △ Less

Submitted 16 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2210.10049 [pdf, other]

Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task

Authors: Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

Abstract: In this paper, we present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the framework of UniTE, which combined three types of input formats during training with a pre-trained language model. First, we apply the pseudo-labeled data examples for the continuously pre-training phase.… ▽ More In this paper, we present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the framework of UniTE, which combined three types of input formats during training with a pre-trained language model. First, we apply the pseudo-labeled data examples for the continuously pre-training phase. Notably, to reduce the gap between pre-training and fine-tuning, we use data pruning and a ranking-based score normalization strategy. For the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Finally, we collect the source-only evaluation results, and ensemble the predictions generated by two UniTE models, whose backbones are XLM-R and InfoXLM, respectively. Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings, showing relatively strong performances in this year's quality estimation competition. △ Less

Submitted 17 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: WMT 2022 QE Shared Task. arXiv admin note: text overlap with arXiv:2210.09683

arXiv:2210.09683 [pdf, other]

Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task

Authors: Yu Wan, Keqin Bao, Dayiheng Liu, Baosong Yang, Derek F. Wong, Lidia S. Chao, Wenqiang Lei, Jun Xie

Abstract: In this report, we present our submission to the WMT 2022 Metrics Shared Task. We build our system based on the core idea of UNITE (Unified Translation Evaluation), which unifies source-only, reference-only, and source-reference-combined evaluation scenarios into one single model. Specifically, during the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre… ▽ More In this report, we present our submission to the WMT 2022 Metrics Shared Task. We build our system based on the core idea of UNITE (Unified Translation Evaluation), which unifies source-only, reference-only, and source-reference-combined evaluation scenarios into one single model. Specifically, during the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre-train UNITE. Notably, to reduce the gap between pre-training and fine-tuning, we use data cropping and a ranking-based score normalization strategy. During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Specially, we collect the results from models with different pre-trained language model backbones, and use different ensembling strategies for involved translation directions. △ Less

Submitted 17 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: WMT 2022 Metrics Shared Task

arXiv:2209.01009 [pdf, other]

Physics-informed MTA-UNet: Prediction of Thermal Stress and Thermal Deformation of Satellites

Authors: Zeyu Cao, Wen Yao, Wei Peng, Xiaoya Zhang, Kairui Bao

Abstract: The rapid analysis of thermal stress and deformation plays a pivotal role in the thermal control measures and optimization of the structural design of satellites. For achieving real-time thermal stress and thermal deformation analysis of satellite motherboards, this paper proposes a novel Multi-Task Attention UNet (MTA-UNet) neural network which combines the advantages of both Multi-Task Learning… ▽ More The rapid analysis of thermal stress and deformation plays a pivotal role in the thermal control measures and optimization of the structural design of satellites. For achieving real-time thermal stress and thermal deformation analysis of satellite motherboards, this paper proposes a novel Multi-Task Attention UNet (MTA-UNet) neural network which combines the advantages of both Multi-Task Learning (MTL) and U-Net with attention mechanism. Besides, a physics-informed strategy is used in the training process, where partial differential equations (PDEs) are integrated into the loss functions as residual terms. Finally, an uncertainty-based loss balancing approach is applied to weight different loss functions of multiple training tasks. Experimental results show that the proposed MTA-UNet effectively improves the prediction accuracy of multiple physics tasks compared with Single-Task Learning (STL) models. In addition, the physics-informed method brings less error in the prediction of each task, especially on small data sets. The code can be downloaded at: \url{https://github.com/KomorebiTso/MTA-UNet}. △ Less

Submitted 5 September, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2203.14507 [pdf, other]

ANNA: Enhanced Language Representation for Question Answering

Authors: Changwook Jun, Hansol Jang, Myoseop Sim, Hyun Kim, Jooyoung Choi, Kyungkoo Min, Kyunghoon Bae

Abstract: Pre-trained language models have brought significant improvements in performance in a variety of natural language processing tasks. Most existing models performing state-of-the-art results have shown their approaches in the separate perspectives of data processing, pre-training tasks, neural network modeling, or fine-tuning. In this paper, we demonstrate how the approaches affect performance indiv… ▽ More Pre-trained language models have brought significant improvements in performance in a variety of natural language processing tasks. Most existing models performing state-of-the-art results have shown their approaches in the separate perspectives of data processing, pre-training tasks, neural network modeling, or fine-tuning. In this paper, we demonstrate how the approaches affect performance individually, and that the language model performs the best results on a specific question answering task when those approaches are jointly considered in pre-training models. In particular, we propose an extended pre-training task, and a new neighbor-aware mechanism that attends neighboring tokens more to capture the richness of context for pre-training language modeling. Our best model achieves new state-of-the-art results of 95.7\% F1 and 90.6\% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet on the SQuAD 2.0 benchmark. △ Less

Submitted 3 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: 11 pages, 3 figures

Journal ref: ACL 2022 Workshop RepL4NLP Submission

arXiv:2203.08150 [pdf, other]

A physics and data co-driven surrogate modeling approach for temperature field prediction on irregular geometric domain

Authors: Kairui Bao, Wen Yao, Xiaoya Zhang, Wei Peng, Yu Li

Abstract: In the whole aircraft structural optimization loop, thermal analysis plays a very important role. But it faces a severe computational burden when directly applying traditional numerical analysis tools, especially when each optimization involves repetitive parameter modification and thermal analysis followed. Recently, with the fast development of deep learning, several Convolutional Neural Network… ▽ More In the whole aircraft structural optimization loop, thermal analysis plays a very important role. But it faces a severe computational burden when directly applying traditional numerical analysis tools, especially when each optimization involves repetitive parameter modification and thermal analysis followed. Recently, with the fast development of deep learning, several Convolutional Neural Network (CNN) surrogate models have been introduced to overcome this obstacle. However, for temperature field prediction on irregular geometric domains (TFP-IGD), CNN can hardly be competent since most of them stem from processing for regular images. To alleviate this difficulty, we propose a novel physics and data co-driven surrogate modeling method. First, after adapting the Bezier curve in geometric parameterization, a body-fitted coordinate mapping is introduced to generate coordinate transforms between the irregular physical plane and regular computational plane. Second, a physics-driven CNN surrogate with partial differential equation (PDE) residuals as a loss function is utilized for fast meshing (meshing surrogate); then, we present a data-driven surrogate model based on the multi-level reduced-order method, aiming to learn solutions of temperature field in the above regular computational plane (thermal surrogate). Finally, combining the grid position information provided by the meshing surrogate with the scalar temperature field information provided by the thermal surrogate (combined model), we reach an end-to-end surrogate model from geometric parameters to temperature field prediction on an irregular geometric domain. Numerical results demonstrate that our method can significantly improve accuracy prediction on a smaller dataset while reducing the training time when compared with other CNN methods. △ Less

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2111.11133 [pdf, other]

L-Verse: Bidirectional Generation Between Image and Text

Authors: Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae

Abstract: Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalability. Especially with cross-modal tasks between image and text, vector quantized variational autoencoders (VQ-VAEs) are widely used to make a raw RGB image into a sequence of feature vectors. To better leverage the correlation between im… ▽ More Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalability. Especially with cross-modal tasks between image and text, vector quantized variational autoencoders (VQ-VAEs) are widely used to make a raw RGB image into a sequence of feature vectors. To better leverage the correlation between image and text, we propose L-Verse, a novel architecture consisting of feature-augmented variational autoencoder (AugVAE) and bidirectional auto-regressive transformer (BiART) for image-to-text and text-to-image generation. Our AugVAE shows the state-of-the-art reconstruction performance on ImageNet1K validation set, along with the robustness to unseen images in the wild. Unlike other models, BiART can distinguish between image (or text) as a conditional reference and a generation target. L-Verse can be directly used for image-to-text or text-to-image generation without any finetuning or extra object detection framework. In quantitative and qualitative experiments, L-Verse shows impressive results against previous methods in both image-to-text and text-to-image generation on MS-COCO Captions. We furthermore assess the scalability of L-Verse architecture on Conceptual Captions and present the initial result of bidirectional vision-language representation learning on general domain. △ Less

Submitted 6 April, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted to CVPR 2022 as Oral Presentation (18 pages, 14 figures, 4 tables)

arXiv:2104.00169 [pdf, other]

Improved and efficient inter-vehicle distance estimation using road gradients of both ego and target vehicles

Authors: Muhyun Back, Jinkyu Lee, Kyuho Bae, Sung Soo Hwang, Il Yong Chun

Abstract: In advanced driver assistant systems and autonomous driving, it is crucial to estimate distances between an ego vehicle and target vehicles. Existing inter-vehicle distance estimation methods assume that the ego and target vehicles drive on a same ground plane. In practical driving environments, however, they may drive on different ground planes. This paper proposes an inter-vehicle distance estim… ▽ More In advanced driver assistant systems and autonomous driving, it is crucial to estimate distances between an ego vehicle and target vehicles. Existing inter-vehicle distance estimation methods assume that the ego and target vehicles drive on a same ground plane. In practical driving environments, however, they may drive on different ground planes. This paper proposes an inter-vehicle distance estimation framework that can consider slope changes of a road forward, by estimating road gradients of \emph{both} ego vehicle and target vehicles and using a 2D object detection deep net. Numerical experiments demonstrate that the proposed method significantly improves the distance estimation accuracy and time complexity, compared to deep learning-based depth estimation methods. △ Less

Submitted 31 March, 2021; originally announced April 2021.

Comments: 5 pages, 3 figures, 2 tables, submitted to IEEE ICAS 2021

arXiv:2010.00672 [pdf, other]

Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation

Authors: Sam Sattarzadeh, Mahesh Sudhakar, Anthony Lem, Shervin Mehryar, K. N. Plataniotis, Jongseong Jang, Hyunwoo Kim, Yeonjeong Jeong, Sangmin Lee, Kyunghoon Bae

Abstract: As an emerging field in Machine Learning, Explainable AI (XAI) has been offering remarkable performance in interpreting the decisions made by Convolutional Neural Networks (CNNs). To achieve visual explanations for CNNs, methods based on class activation mapping and randomized input sampling have gained great popularity. However, the attribution methods based on these techniques provide lower reso… ▽ More As an emerging field in Machine Learning, Explainable AI (XAI) has been offering remarkable performance in interpreting the decisions made by Convolutional Neural Networks (CNNs). To achieve visual explanations for CNNs, methods based on class activation mapping and randomized input sampling have gained great popularity. However, the attribution methods based on these techniques provide lower resolution and blurry explanation maps that limit their explanation power. To circumvent this issue, visualization based on various layers is sought. In this work, we collect visualization maps from multiple layers of the model based on an attribution-based input sampling technique and aggregate them to reach a fine-grained and complete explanation. We also propose a layer selection strategy that applies to the whole family of CNN-based models, based on which our extraction framework is applied to visualize the last layers of each convolutional block of the model. Moreover, we perform an empirical analysis of the efficacy of derived lower-level information to enhance the represented attributions. Comprehensive experiments conducted on shallow and deep models trained on natural and industrial datasets, using both ground-truth and model-truth based evaluation metrics validate our proposed algorithm by meeting or outperforming the state-of-the-art methods in terms of explanation ability and visual quality, demonstrating that our method shows stability regardless of the size of objects or instances to be explained. △ Less

Submitted 24 December, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: 9 pages, 9 figures, Accepted at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

arXiv:1912.10687 [pdf, other]

5D Light Field Synthesis from a Monocular Video

Authors: Kyuho Bae, Andre Ivan, Hajime Nagahara, In Kyu Park

Abstract: Commercially available light field cameras have difficulty in capturing 5D (4D + time) light field videos. They can only capture still light filed images or are excessively expensive for normal users to capture the light field video. To tackle this problem, we propose a deep learning-based method for synthesizing a light field video from a monocular video. We propose a new synthetic light field vi… ▽ More Commercially available light field cameras have difficulty in capturing 5D (4D + time) light field videos. They can only capture still light filed images or are excessively expensive for normal users to capture the light field video. To tackle this problem, we propose a deep learning-based method for synthesizing a light field video from a monocular video. We propose a new synthetic light field video dataset that renders photorealistic scenes using UnrealCV rendering engine because no light field dataset is available. The proposed deep learning framework synthesizes the light field video with a full set (9$\times$9) of sub-aperture images from a normal monocular video. The proposed network consists of three sub-networks, namely, feature extraction, 5D light field video synthesis, and temporal consistency refinement. Experimental results show that our model can successfully synthesize the light field video for synthetic and actual scenes and outperforms the previous frame-by-frame methods quantitatively and qualitatively. The synthesized light field can be used for conventional light field applications, namely, depth estimation, viewpoint change, and refocusing. △ Less

Submitted 23 December, 2019; originally announced December 2019.

arXiv:1911.00289 [pdf, other]

Does Adam optimizer keep close to the optimal point?

Authors: Kiwook Bae, Heechang Ryu, Hayong Shin

Abstract: The adaptive optimizer for training neural networks has continually evolved to overcome the limitations of the previously proposed adaptive methods. Recent studies have found the rare counterexamples that Adam cannot converge to the optimal point. Those counterexamples reveal the distortion of Adam due to a small second momentum from a small gradient. Unlike previous studies, we show Adam cannot k… ▽ More The adaptive optimizer for training neural networks has continually evolved to overcome the limitations of the previously proposed adaptive methods. Recent studies have found the rare counterexamples that Adam cannot converge to the optimal point. Those counterexamples reveal the distortion of Adam due to a small second momentum from a small gradient. Unlike previous studies, we show Adam cannot keep closer to the optimal point for not only the counterexamples but also a general convex region when the effective learning rate exceeds the certain bound. Subsequently, we propose an algorithm that overcomes Adam's limitation and ensures that it can reach and stay at the optimal point region. △ Less

Submitted 1 November, 2019; originally announced November 2019.

Comments: Accepted as a workshop paper at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1910.06059 [pdf, other]

The Open Porous Media Flow Reservoir Simulator

Authors: Atgeirr Flø Rasmussen, Tor Harald Sandve, Kai Bao, Andreas Lauser, Joakim Hove, Bård Skaflestad, Robert Klöfkorn, Markus Blatt, Alf Birger Rustad, Ove Sævareid, Knut-Andreas Lie, Andreas Thune

Abstract: The Open Porous Media (OPM) initiative is a community effort that encourages open innovation and reproducible research for simulation of porous media processes. OPM coordinates collaborative software development, maintains and distributes open-source software and open data sets, and seeks to ensure that these are available under a free license in a long-term perspective. In this paper, we presen… ▽ More The Open Porous Media (OPM) initiative is a community effort that encourages open innovation and reproducible research for simulation of porous media processes. OPM coordinates collaborative software development, maintains and distributes open-source software and open data sets, and seeks to ensure that these are available under a free license in a long-term perspective. In this paper, we present OPM Flow, which is a reservoir simulator developed for industrial use, as well as some of the individual components used to make OPM Flow. The descriptions apply to the 2019.10 release of OPM. △ Less

Submitted 4 October, 2019; originally announced October 2019.

Comments: 43 pages, 22 figures

MSC Class: 76S05; 68N01; 97N80

arXiv:1807.02361 [pdf, other]

The Influence of Differential Privacy on Short Term Electric Load Forecasting

Authors: Günther Eibl, Kaibin Bao, Philip-William Grassal, Daniel Bernau, Hartmut Schmeck

Abstract: There has been a large number of contributions on privacy-preserving smart metering with Differential Privacy, addressing questions from actual enforcement at the smart meter to billing at the energy provider. However, exploitation is mostly limited to application of cryptographic security means between smart meters and energy providers. We illustrate along the use case of privacy preserving load… ▽ More There has been a large number of contributions on privacy-preserving smart metering with Differential Privacy, addressing questions from actual enforcement at the smart meter to billing at the energy provider. However, exploitation is mostly limited to application of cryptographic security means between smart meters and energy providers. We illustrate along the use case of privacy preserving load forecasting that Differential Privacy is indeed a valuable addition that unlocks novel information flows for optimization. We show that (i) there are large differences in utility along three selected forecasting methods, (ii) energy providers can enjoy good utility especially under the linear regression benchmark model, and (iii) households can participate in privacy preserving load forecasting with an individual re-identification risk < 60%, only 10% over random guessing. △ Less

Submitted 6 July, 2018; originally announced July 2018.

Comments: This is a pre-print of an article submitted to Springer Open Journal "Energy Informatics"

arXiv:1301.0038 [pdf, other]

doi 10.4204/EPTCS.105.2

PALS-Based Analysis of an Airplane Multirate Control System in Real-Time Maude

Authors: Kyungmin Bae, Joshua Krisiloff, José Meseguer, Peter Csaba Ölveczky

Abstract: Distributed cyber-physical systems (DCPS) are pervasive in areas such as aeronautics and ground transportation systems, including the case of distributed hybrid systems. DCPS design and verification is quite challenging because of asynchronous communication, network delays, and clock skews. Furthermore, their model checking verification typically becomes unfeasible due to the huge state space expl… ▽ More Distributed cyber-physical systems (DCPS) are pervasive in areas such as aeronautics and ground transportation systems, including the case of distributed hybrid systems. DCPS design and verification is quite challenging because of asynchronous communication, network delays, and clock skews. Furthermore, their model checking verification typically becomes unfeasible due to the huge state space explosion caused by the system's concurrency. The PALS ("physically asynchronous, logically synchronous") methodology has been proposed to reduce the design and verification of a DCPS to the much simpler task of designing and verifying its underlying synchronous version. The original PALS methodology assumes a single logical period, but Multirate PALS extends it to deal with multirate DCPS in which components may operate with different logical periods. This paper shows how Multirate PALS can be applied to formally verify a nontrivial multirate DCPS. We use Real-Time Maude to formally specify a multirate distributed hybrid system consisting of an airplane maneuvered by a pilot who turns the airplane according to a specified angle through a distributed control system. Our formal analysis revealed that the original design was ineffective in achieving a smooth turning maneuver, and led to a redesign of the system that satisfies the desired correctness properties. This shows that the Multirate PALS methodology is not only effective for formal DCPS verification, but can also be used effectively in the DCPS design process, even before properties are verified. △ Less

Submitted 31 December, 2012; originally announced January 2013.

Comments: In Proceedings FTSCS 2012, arXiv:1212.6574

Journal ref: EPTCS 105, 2012, pp. 5-21

arXiv:1106.0365 [pdf, ps, other]

Lower Bounds for Sparse Recovery

Authors: Khanh Do Ba, Piotr Indyk, Eric Price, David P. Woodruff

Abstract: We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover x' satisfying ||x-x'||_1 <= C min_{k-sparse} x"} ||x-x"||_1. It is known that there exist matrices A with this property that have only O(k log (n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general /rand… ▽ More We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover x' satisfying ||x-x'||_1 <= C min_{k-sparse} x"} ||x-x"||_1. It is known that there exist matrices A with this property that have only O(k log (n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general /randomized/ version of the problem, where A is a random variable and the recovery algorithm is required to work for any fixed x with constant probability (over A). △ Less

Submitted 2 June, 2011; v1 submitted 2 June, 2011; originally announced June 2011.

Comments: 11 pages. Appeared at SODA 2010

arXiv:1009.4261 [pdf, other]

doi 10.4204/EPTCS.36.3

Extending the Real-Time Maude Semantics of Ptolemy to Hierarchical DE Models

Authors: Kyungmin Bae, Peter Csaba Ölveczky

Abstract: This paper extends our Real-Time Maude formalization of the semantics of flat Ptolemy II discrete-event (DE) models to hierarchical models, including modal models. This is a challenging task that requires combining synchronous fixed-point computations with hierarchical structure. The synthesis of a Real-Time Maude verification model from a Ptolemy II DE model, and the formal verification of… ▽ More This paper extends our Real-Time Maude formalization of the semantics of flat Ptolemy II discrete-event (DE) models to hierarchical models, including modal models. This is a challenging task that requires combining synchronous fixed-point computations with hierarchical structure. The synthesis of a Real-Time Maude verification model from a Ptolemy II DE model, and the formal verification of the synthesized model in Real-Time Maude, have been integrated into Ptolemy II, enabling a model-engineering process that combines the convenience of Ptolemy II DE modeling and simulation with formal verification in Real-Time Maude. △ Less

Submitted 22 September, 2010; originally announced September 2010.

Comments: In Proceedings RTRTS 2010, arXiv:1009.3982

Journal ref: EPTCS 36, 2010, pp. 46-66

arXiv:0904.0292 [pdf, ps, other]

Sublinear Time Algorithms for Earth Mover's Distance

Authors: Khanh Do Ba, Huy L Nguyen, Huy N Nguyen, Ronitt Rubinfeld

Abstract: We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in $[0, Δ]^d$, with sample complexities independent of domain size - permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other… ▽ More We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in $[0, Δ]^d$, with sample complexities independent of domain size - permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing the dependencies on these parameters to be essentially optimal. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual L1 metric, and give optimal algorithms. △ Less

Submitted 1 April, 2009; originally announced April 2009.

Comments: 12 pages

arXiv:cs/0208040 [pdf, ps, other]

Using Hierarchical Data Mining to Characterize Performance of Wireless System Configurations

Authors: Alex Verstak, Naren Ramakrishnan, Kyung Kyoon Bae, William H. Tranter, Layne T. Watson, Jian He, Clifford A. Shaffer, Theodore S. Rappaport

Abstract: This paper presents a statistical framework for assessing wireless systems performance using hierarchical data mining techniques. We consider WCDMA (wideband code division multiple access) systems with two-branch STTD (space time transmit diversity) and 1/2 rate convolutional coding (forward error correction codes). Monte Carlo simulation estimates the bit error probability (BEP) of the system a… ▽ More This paper presents a statistical framework for assessing wireless systems performance using hierarchical data mining techniques. We consider WCDMA (wideband code division multiple access) systems with two-branch STTD (space time transmit diversity) and 1/2 rate convolutional coding (forward error correction codes). Monte Carlo simulation estimates the bit error probability (BEP) of the system across a wide range of signal-to-noise ratios (SNRs). A performance database of simulation runs is collected over a targeted space of system configurations. This database is then mined to obtain regions of the configuration space that exhibit acceptable average performance. The shape of the mined regions illustrates the joint influence of configuration parameters on system performance. The role of data mining in this application is to provide explainable and statistically valid design conclusions. The research issue is to define statistically meaningful aggregation of data in a manner that permits efficient and effective data mining algorithms. We achieve a good compromise between these goals and help establish the applicability of data mining for characterizing wireless systems performance. △ Less

Submitted 25 August, 2002; originally announced August 2002.

ACM Class: I.6.4

arXiv:cs/0202027 [pdf, ps, other]

BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)

Authors: Alex Verstak, Naren Ramakrishnan, Layne T. Watson, Jian He, Clifford A. Shaffer, Kyung Kyoon Bae, Jing Jiang, William H. Tranter, Theodore S. Rappaport

Abstract: We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange proble… ▽ More We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design. △ Less

Submitted 18 February, 2002; originally announced February 2002.

ACM Class: D.2.6; I.2.4

Showing 1–41 of 41 results for author: Bae, K