Showing 1–50 of 102 results for author: Jing, L

Searching in archive cs.
  1. arXiv:2407.03240  [pdf, other]

    cs.CV

    Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking

    Authors: Mingzhe Guo, Zhipeng Zhang, Liping Jing, Yuan He, Ke Wang, Heng Fan

    Abstract: We propose a unified object-aware temporal learning framework for multi-view 3D detection and tracking tasks. Having observed that the efficacy of the temporal fusion strategy in recent multi-view perception methods may be weakened by distractors and background clutters in historical frames, we propose a cyclic learning mechanism to improve the robustness of multi-view representation learning. The…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCV

  2. arXiv:2406.17680  [pdf, other]

    cs.CV

    End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation

    Authors: Mingzhe Guo, Zhipeng Zhang, Yuan He, Ke Wang, Liping Jing

    Abstract: We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD), achieving the best open-loop evaluation performance in nuScenes, meanwhile showing robust closed-loop driving quality in CARLA. Our motivation stems from the observation that current E2EAD models still mimic the modular architecture in typical driving stacks, with carefully designed supervised perception and predictio…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages, 10 figures and 15 tables

  3. arXiv:2406.15695  [pdf, other]

    cs.CL

    SS-Bench: A Benchmark for Social Story Generation and Evaluation

    Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu

    Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timelin…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2405.16571  [pdf, other]

    cs.CL

    A Preliminary Empirical Study on Prompt-based Unsupervised Keyphrase Extraction

    Authors: Mingyang Song, Yi Feng, Liping Jing

    Abstract: Pre-trained large language models can perform natural language processing downstream tasks by conditioning on human-designed prompts. However, a prompt-based approach often requires "prompt engineering" to design different prompts, primarily hand-crafted through laborious trial and error, requiring human intervention and expertise. It is a challenging problem when constructing a prompt-based keyph…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: work in progress

  5. arXiv:2405.04390  [pdf, other]

    cs.CV

    DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

    Authors: Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai

    Abstract: Vision-centric autonomous driving has recently raised wide attention due to its lower cost. Pre-training is essential for extracting a universal representation. However, current vision-centric pre-training typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. In this paper, we address this challenge by i…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  6. arXiv:2405.00236  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    STT: Stateful Tracking with Transformers for Autonomous Driving

    Authors: Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, Congcong Li

    Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying c…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: ICRA 2024

  7. arXiv:2404.16452  [pdf, other]

    cs.CV

    PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

    Authors: Lihua Jing, Rui Wang, Wenqi Ren, Xin Dong, Cong Zou

    Abstract: Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, independent…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  8. The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

    Authors: Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing

    Abstract: Recently, backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs). The attacked model behaves normally on benign samples but outputs a specific result when the trigger is present. However, compared with the rocketing progress of backdoor attacks, existing defenses are difficult to deal with these threats effectively or require benign samples to…

    Submitted 31 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 13 pages, 6 figures, published to ICCV

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 155-164

  9. arXiv:2404.05046  [pdf, other]

    cs.CV cs.CL

    FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback

    Authors: Liqiang Jing, Xinya Du

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination problems, i.e., object existence, object attribute, and object relationship. To tackle this issue, existing methods mainly utilize Reinforcement Learning (RL) to…

    Submitted 7 April, 2024; originally announced April 2024.

  10. arXiv:2403.16788  [pdf, other]

    cs.CV

    HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

    Authors: Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

    Abstract: Event-based semantic segmentation has gained popularity due to its capability to deal with scenarios under high-speed motion and extreme lighting conditions, which cannot be addressed by conventional RGB cameras. Since it is hard to annotate event data, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, this will inevitably introduce noise, and…

    Submitted 25 March, 2024; originally announced March 2024.

  11. arXiv:2403.15715  [pdf, other]

    cs.CL

    EDDA: A Encoder-Decoder Data Augmentation Framework for Zero-Shot Stance Detection

    Authors: Daijun Ding, Li Dong, Zhichao Huang, Guangning Xu, Xu Huang, Bo Liu, Liwen Jing, Bowen Zhang

    Abstract: Stance detection aims to determine the attitude expressed in text towards a given target. Zero-shot stance detection (ZSSD) has emerged to classify stances towards unseen targets during inference. Recent data augmentation techniques for ZSSD increase transferable knowledge between targets through text or target augmentation. However, these methods exhibit limitations. Target augmentation lacks log…

    Submitted 23 March, 2024; originally announced March 2024.

  12. arXiv:2403.02637  [pdf, other]

    cs.CV

    BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection

    Authors: Yu Chen, Liyan Ma, Liping Jing, Jian Yu

    Abstract: Humans can easily distinguish the known and unknown categories and can recognize the unknown object by learning it once instead of repeating it many times without forgetting the learned object. Hence, we aim to make deep learning models simulate the way people learn. We refer to such a learning manner as OnLine Open World Object Detection (OLOWOD). Existing OWOD approaches pay more attention to the…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 29 pages, 12 figures

  13. arXiv:2402.18107  [pdf, other]

    cs.MM

    Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction

    Authors: HongLin Gong, Mengzhao Jia, Liqiang Jing

    Abstract: In line with the latest research, the task of identifying helpful reviews from a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key attributes: consistency and differentiation. Current methods designed for Multimodal Review Helpfulness Prediction (MRHP) face limitations in capturing distinctive i…

    Submitted 25 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 10 pages, 4 figures, 4 tables

  14. arXiv:2402.11414  [pdf, other]

    cs.CL

    Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization

    Authors: Liqiang Jing, Jingxuan Zuo, Yue Zhang

    Abstract: Multimodal summarization aims to generate a concise summary based on the input text and image. However, the existing methods potentially suffer from unfactual output. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios, i.e. reference-based factuality evaluation framework a…

    Submitted 17 February, 2024; originally announced February 2024.

  15. arXiv:2402.03658  [pdf, other]

    cs.CL cs.MM

    Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue

    Authors: Kun Ouyang, Liqiang Jing, Xuemeng Song, Meng Liu, Yupeng Hu, Liqiang Nie

    Abstract: Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for the given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pretrained language model BART, they overlook exploiting the sentiments residing in the utteran…

    Submitted 5 February, 2024; originally announced February 2024.

  16. arXiv:2402.03635  [pdf, ps, other]

    cs.IR

    Retrieval Augmented Cross-Modal Tag Recommendation in Software Q&A Sites

    Authors: Sijin Lu, Pengyu Xu, Bing Liu, Hongjian Sun, Liping Jing, Jian Yu

    Abstract: Posts in software Q&A sites often consist of three main parts: title, description and code, which are interconnected and jointly describe the question. Existing tag recommendation methods often treat different modalities as a whole or inadequately consider the interaction between different modalities. Additionally, they focus on extracting information directly from the post itself, neglecting the…

    Submitted 5 February, 2024; originally announced February 2024.

  17. arXiv:2401.04317  [pdf, other]

    cs.CV cs.CL

    Vision Reimagined: AI-Powered Breakthroughs in WiFi Indoor Imaging

    Authors: Jianyang Shi, Bowen Zhang, Amartansh Dubey, Ross Murch, Liwen Jing

    Abstract: Indoor imaging is a critical task for robotics and internet-of-things. WiFi as an omnipresent signal is a promising candidate for carrying out passive imaging and synchronizing the up-to-date information to all connected devices. This is the first research work to consider WiFi indoor imaging as a multi-modal image generation task that converts the measured WiFi power into a high-resolution indoor…

    Submitted 8 January, 2024; originally announced January 2024.

  18. arXiv:2401.02402  [pdf, other]

    cs.CV

    3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation

    Authors: Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng

    Abstract: 3D panoptic segmentation is a challenging perception task, especially in autonomous driving. It aims to predict both semantic and instance annotations for 3D points in a scene. Although prior 3D panoptic segmentation approaches have achieved great performance on closed-set benchmarks, generalizing these approaches to unseen things and unseen stuff categories remains an open problem. For unseen obj…

    Submitted 2 April, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  19. arXiv:2401.01761  [pdf, other]

    cs.CL

    Cross-target Stance Detection by Exploiting Target Analytical Perspectives

    Authors: Daijun Ding, Rong Chen, Liwen Jing, Bowen Zhang, Xu Huang, Li Dong, Xiaowen Zhao, Ge Song

    Abstract: Cross-target stance detection (CTSD) is an important task, which infers the attitude of the destination target by utilizing annotated data derived from the source target. One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets. However, the analysis of informal and short text structure, and implicit expressions, complicate the ext…

    Submitted 3 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  20. arXiv:2312.16054  [pdf, other]

    cs.CL

    A Logically Consistent Chain-of-Thought Approach for Stance Detection

    Authors: Bowen Zhang, Daijun Ding, Liwen Jing, Hu Huang

    Abstract: Zero-shot stance detection (ZSSD) aims to detect stances toward unseen targets. Incorporating background knowledge to enhance transferability between seen and unseen targets constitutes the primary approach of ZSSD. However, these methods often struggle with a knowledge-task disconnect and lack logical consistency in their predictions. To address these issues, we introduce a novel approach named L…

    Submitted 26 December, 2023; originally announced December 2023.

  21. arXiv:2312.15156  [pdf, other]

    cs.CL

    Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study

    Authors: Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, Liping Jing

    Abstract: Zero-shot keyphrase extraction aims to build a keyphrase extractor without training by human-annotated data, which is challenging due to the limited human intervention involved. Challenging but worthwhile, zero-shot setting efficiently reduces the time and effort that data labeling takes. Recent efforts on pre-trained large language models (e.g., ChatGPT and ChatGLM) show promising performance on…

    Submitted 10 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Technical Report, 6 pages

  22. arXiv:2312.10493  [pdf, other]

    cs.CL cs.MM

    Debiasing Multimodal Sarcasm Detection with Contrastive Learning

    Authors: Mengzhao Jia, Can Xie, Liqiang Jing

    Abstract: Despite commendable achievements made by existing work, prevailing multimodal sarcasm detection studies rely more on textual content over visual information. It unavoidably induces spurious correlations between textual words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm…

    Submitted 19 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

  23. arXiv:2312.10210  [pdf, other]

    cs.CL

    VK-G2T: Vision and Context Knowledge enhanced Gloss2Text

    Authors: Liqiang Jing, Xuemeng Song, Xinxing Zu, Na Zheng, Zhongzhou Zhao, Liqiang Nie

    Abstract: Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text). While previous studies have focused on boosting the performance of the Sign2Gloss stage, we emphasize the optimization of the Gloss2Text stage. Howe…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  24. arXiv:2312.07378  [pdf, other]

    cs.CV

    X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer

    Authors: Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li

    Abstract: The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences. However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point cloud poses a difficulty in aligning temporal information within video sequences. To address these issues, we propose a novel cross-modal knowl…

    Submitted 12 December, 2023; originally announced December 2023.

  25. arXiv:2311.10887  [pdf, other]

    cs.CV cs.AI

    Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder

    Authors: Zhimin Chen, Yingwei Li, Longlong Jing, Liang Yang, Bing Li

    Abstract: In recent years, the field of 3D self-supervised learning has witnessed significant progress, resulting in the emergence of Multi-Modality Masked AutoEncoders (MAE) methods that leverage both 2D images and 3D point clouds for pre-training. However, a notable limitation of these approaches is that they do not fully utilize the multi-view attributes inherent in 3D point clouds, which is crucial for…

    Submitted 17 November, 2023; originally announced November 2023.

  26. arXiv:2311.01477  [pdf, other]

    cs.CV

    FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models

    Authors: Liqiang Jing, Ruosen Li, Yunmo Chen, Mengzhao Jia, Xinya Du

    Abstract: We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of the generated free-form answers from large vision-language models (LVLMs). The FAITHSCORE evaluation first identifies sub-sentences containing descriptive statements that need to be verified, then extracts a comprehensive list of atomic facts fro…

    Submitted 1 November, 2023; originally announced November 2023.

  27. arXiv:2310.08855  [pdf, other]

    cs.LG

    Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation

    Authors: Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu, Liping Jing

    Abstract: Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work in deep neural networks has focused on overcoming catastrophic forgetting of old tasks in gradient-based optimization. However, the normalization layers provide an exception, as they are updated interdependently by the gradient a…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  28. arXiv:2310.07700  [pdf, other]

    cs.CL

    Knowledge-enhanced Memory Model for Emotional Support Conversation

    Authors: Mengzhao Jia, Qianglong Chen, Liqiang Jing, Dawei Fu, Renyu Li

    Abstract: The prevalence of mental disorders has become a significant issue, leading to the increased focus on Emotional Support Conversation as an effective supplement for mental health support. Existing methods have achieved compelling results, however, they still face three challenges: 1) variability of emotions, 2) practicality of the response, and 3) intricate strategy modeling. To address these challe…

    Submitted 11 October, 2023; originally announced October 2023.

  29. arXiv:2310.05423  [pdf, other]

    cs.IR

    Sequential Tag Recommendation

    Authors: Bing Liu, Pengyu Xu, Sijin Lu, Shijing Wang, Hongjian Sun, Liping Jing

    Abstract: With the development of Internet technology and the expansion of social networks, online platforms have become an important way for people to obtain information. The introduction of tags facilitates information categorization and retrieval. Meanwhile, the development of tag recommendation systems not only enables users to input tags more efficiently, but also improves the quality of tags. However,…

    Submitted 9 October, 2023; originally announced October 2023.

  30. arXiv:2308.05983   

    cs.CV cs.AI cs.CR

    Face Encryption via Frequency-Restricted Identity-Agnostic Attacks

    Authors: Xin Dong, Rui Wang, Siyuan Liang, Aishan Liu, Lihua Jing

    Abstract: Billions of people are sharing their daily live images on social media everyday. However, malicious collectors use deep face recognition systems to easily steal their biometric information (e.g., faces) from these images. Some studies are being conducted to generate encrypted face photos using adversarial attacks by introducing imperceptible perturbations to reduce face information leakage. Howeve…

    Submitted 24 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: I noticed something missing in the article's description in subsection 3.2, so I'd like to undo it and re-finalize and describe it

  31. arXiv:2307.10511  [pdf, other]

    cs.CL

    General Debiasing for Multimodal Sentiment Analysis

    Authors: Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie

    Abstract: Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction yet unavoidably suffers from fitting the spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background have positive labels in a dataset, the model will rely on such correlations for prediction, while "blue background" is not a sentiment-r…

    Submitted 7 August, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted at ACM MM 2023

  32. arXiv:2307.10046  [pdf, other]

    cs.CV

    Divert More Attention to Vision-Language Object Tracking

    Authors: Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan

    Abstract: Multimodal vision-language (VL) learning has noticeably pushed the tendency toward generic intelligence owing to emerging large foundation models. However, tracking, as a fundamental vision problem, surprisingly enjoys less bonus from recent flourishing VL learning. We argue that the reasons are two-fold: the lack of large-scale vision-language annotated videos and ineffective vision-language inte…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 16 pages, 9 figures

  33. arXiv:2306.16650  [pdf, other]

    cs.CL cs.AI

    Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

    Authors: Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie

    Abstract: Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the obje…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 main conference

    Journal ref: ACL 2023

  34. arXiv:2305.08776  [pdf, other]

    cs.CV

    Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

    Authors: Zhimin Chen, Longlong Jing, Yingwei Li, Bing Li

    Abstract: Foundation models have achieved remarkable results in 2D and language tasks like image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning is largely untapped due to the existence of the domain gap. In this work, we propose an innovative methodology called Bridge3D to address this gap by pre-training 3D models using…

    Submitted 2 November, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  35. arXiv:2305.03256  [pdf, other]

    cs.CL

    Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain

    Authors: Liqiang Jing, Xuemeng Song, Xuming Lin, Zhongzhou Zhao, Wei Zhou, Liqiang Nie

    Abstract: Existing data-to-text generation efforts mainly focus on generating a coherent text from non-linguistic input data, such as tables and attribute-value pairs, but overlook that different application scenarios may require texts of different styles. Inspired by this, we define a new task, namely stylized data-to-text generation, whose aim is to generate coherent text for the given non-linguistic data…

    Submitted 4 May, 2023; originally announced May 2023.

  36. arXiv:2304.14618  [pdf, other]

    cs.LG stat.ML

    Recognizable Information Bottleneck

    Authors: Yilin Lyu, Xin Liu, Mingyang Song, Xinyue Wang, Yaxin Peng, Tieyong Zeng, Liping Jing

    Abstract: Information Bottlenecks (IBs) learn representations that generalize to unseen data by information compression. However, existing IBs are practically unable to guarantee generalization in real-world scenarios due to the vacuous generalization bound. The recent PAC-Bayes IB uses information complexity instead of information compression to establish a connection with the mutual information generaliza…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: 12 pages. To appear in IJCAI 2023

  37. arXiv:2304.03087  [pdf, other]

    cs.CL

    Investigating Chain-of-thought with ChatGPT for Stance Detection on Social Media

    Authors: Bowen Zhang, Xianghua Fu, Daijun Ding, Hu Huang, Yangyang Li, Liwen Jing

    Abstract: Stance detection predicts attitudes towards targets in texts and has gained attention with the rise of social media. Traditional approaches include conventional machine learning, early deep neural networks, and pre-trained fine-tuning models. However, with the evolution of very large pre-trained language models (VLPLMs) like ChatGPT (GPT-3.5), traditional methods face deployment challenges. The pa…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: text overlap with arXiv:2212.14548

  38. arXiv:2303.13001  [pdf, other]

    cs.CL

    Is ChatGPT A Good Keyphrase Generator? A Preliminary Study

    Authors: Mingyang Song, Haiyun Jiang, Shuming Shi, Songfang Yao, Shilong Lu, Yi Feng, Huafeng Liu, Liping Jing

    Abstract: The emergence of ChatGPT has recently garnered significant attention from the computational linguistics community. To demonstrate its capabilities as a keyphrase generator, we conduct a preliminary evaluation of ChatGPT for the keyphrase generation task. We evaluate its performance in various aspects, including keyphrase generation prompts, keyphrase generation diversity, and long document underst…

    Submitted 22 December, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Technical Report, 6 pages

  39. arXiv:2212.14548  [pdf, other]

    cs.CL

    How would Stance Detection Techniques Evolve after the Launch of ChatGPT?

    Authors: Bowen Zhang, Daijun Ding, Liwen Jing

    Abstract: Stance detection refers to the task of extracting the standpoint (Favor, Against or Neither) towards a target in given texts. Such research gains increasing attention with the proliferation of social media contents. The conventional framework of handling stance detection is converting it into text classification tasks. Deep learning models have already replaced rule-based models and traditional ma…

    Submitted 10 April, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

  40. arXiv:2212.03517  [pdf, other]

    cs.CV

    AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation

    Authors: Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li

    Abstract: The weakly supervised instance segmentation is a challenging task. The existing methods typically use bounding boxes as supervision and optimize the network with a regularization loss term such as pairwise color affinity loss for instance segmentation. Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads…

    Submitted 7 December, 2022; originally announced December 2022.

  41. arXiv:2211.10685  [pdf, other]

    cs.CL

    Pairwise Instance Relation Augmentation for Long-tailed Multi-label Text Classification

    Authors: Lin Xiao, Pengyu Xu, Liping Jing, Xiangliang Zhang

    Abstract: Multi-label text classification (MLTC) is one of the key tasks in natural language processing. It aims to assign multiple target labels to one document. Due to the uneven popularity of labels, the number of documents per label follows a long-tailed distribution in most cases. It is much more challenging to learn classifiers for data-scarce tail labels than for data-rich head labels. The main reaso…

    Submitted 19 November, 2022; originally announced November 2022.

  42. arXiv:2210.10138  [pdf, other]

    cs.CV cs.AI

    Class-Level Confidence Based 3D Semi-Supervised Learning

    Authors: Zhimin Chen, Longlong Jing, Liang Yang, Yingwei Li, Bing Li

    Abstract: Recent state-of-the-art method FlexMatch firstly demonstrated that correctly estimating learning status is crucial for semi-supervised learning (SSL). However, the estimation method proposed by FlexMatch does not take into account imbalanced data, which is the common case for 3D semi-supervised learning. To address this problem, we practically demonstrate that unlabeled data class-level confidence…

    Submitted 21 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

    Journal ref: WACV 2023 (accepted)

  43. arXiv:2210.04135  [pdf, other]

    cs.CV cs.LG cs.MM

    VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

    Authors: Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

    Abstract: Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accu…

    Submitted 29 October, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Published in TMLR 2023

  44. arXiv:2209.11964  [pdf, other]

    cs.LG cs.CV

    Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning

    Authors: Zhengwei Fang, Rui Wang, Tao Huang, Liping Jing

    Abstract: Strong adversarial examples are crucial for evaluating and enhancing the robustness of deep neural networks. However, the performance of popular attacks is usually sensitive, for instance, to minor image transformations, stemming from limited information -- typically only one input example, a handful of white-box source models, and undefined defense strategies. Hence, the crafted adversarial examp…

    Submitted 29 March, 2024; v1 submitted 24 September, 2022; originally announced September 2022.

  45. arXiv:2209.08988  [pdf, ps, other]

    cs.CV cs.LG

    MSA-GCN: Multiscale Adaptive Graph Convolution Network for Gait Emotion Recognition

    Authors: Yunfei Yin, Li Jing, Faliang Huang, Guangchao Yang, Zhuowei Wang

    Abstract: Gait emotion recognition plays a crucial role in the intelligent system. Most of the existing methods recognize emotions by focusing on local actions over time. However, they ignore that the effective distances of different emotions in the time domain are different, and the local actions during walking are quite similar. Thus, emotions should be represented by global states instead of indirect loc…

    Submitted 19 September, 2022; originally announced September 2022.

  46. Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment Analysis

    Authors: Teng Sun, Wenjie Wang, Liqiang Jing, Yiran Cui, Xuemeng Song, Liqiang Nie

    Abstract: Existing studies on multimodal sentiment analysis heavily rely on textual modality and unavoidably induce the spurious correlations between textual words and sentiment labels. This greatly hinders the model generalization ability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sentiment analysis. This task aims to estimate and mitigate the bad effect of textual…

    Submitted 23 July, 2022; originally announced July 2022.

  47. arXiv:2207.07934  [pdf, other]

    cs.CL cs.HC cs.MM

    Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

    Authors: Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, Liqiang Nie

    Abstract: Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) overlook the benefit of generative pre-training, and 2) ignore the textual context related knowledge. T…

    Submitted 11 May, 2024; v1 submitted 16 July, 2022; originally announced July 2022.

  48. arXiv:2207.01076  [pdf, other]

    cs.CV

    Divert More Attention to Vision-Language Tracking

    Authors: Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing

    Abstract: Relying on Transformer for complex visual feature learning, object tracking has witnessed the new standard for state-of-the-arts (SOTAs). However, this advancement is accompanied by larger training data and a longer training period, making tracking increasingly expensive. In this paper, we demonstrate that the Transformer-reliance is not necessary and the pure ConvNets are still competitive and even be…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 18 pages, 7 figures

  49. arXiv:2206.07700  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Siamese ConvNets

    Authors: Li Jing, Jiachen Zhu, Yann LeCun

    Abstract: Self-supervised learning has shown superior performances over supervised methods on various vision benchmarks. The siamese network, which encourages embeddings to be invariant to distortions, is one of the most successful self-supervised visual representation learning approaches. Among all the augmentation methods, masking is the most general and straightforward method that has the potential to be…

    Submitted 15 June, 2022; originally announced June 2022.

  50. arXiv:2206.04831  [pdf, other]

    cs.CV

    R4D: Utilizing Reference Objects for Long-Range Distance Estimation

    Authors: Yingwei Li, Tiffany Chen, Maya Kabkab, Ruichi Yu, Longlong Jing, Yurong You, Hang Zhao

    Abstract: Estimating the distance of objects is a safety-critical task for autonomous driving. Focusing on short-range objects, existing methods and datasets neglect the equally important long-range objects. In this paper, we introduce a challenging and under-explored task, which we refer to as Long-Range Distance Estimation, as well as two datasets to validate new methods developed for this task. We then p…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: ICLR 2022