
Showing 1–50 of 112 results for author: Jiang, G

Searching in archive cs. Search in all archives.
  1. arXiv:2407.16337  [pdf, other]

    cs.LG

    STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments

    Authors: Hao Zhou, Kun Sun, Shaoming Li, Yangfeng Fan, Guibin Jiang, Jiaqi Zheng, Tao Li

    Abstract: Online controlled experiments play a crucial role in enabling data-driven decisions across a wide range of companies. Variance reduction is an effective technique to improve the sensitivity of experiments, achieving higher statistical power while using fewer samples and shorter experimental periods. However, typical variance reduction methods (e.g., regression-adjusted estimators) are built upon t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  2. arXiv:2407.13664  [pdf, other]

    cs.LG

    Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization

    Authors: Hao Zhou, Rongxiao Huang, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Bing Cheng, Wei Lin

    Abstract: Marketing optimization plays an important role in enhancing user engagement on online Internet platforms. Existing studies usually formulate this problem as a budget allocation problem and solve it by utilizing two fully decoupled stages, i.e., machine learning (ML) and operations research (OR). However, the learning objective in ML does not take account of the downstream optimization task in OR, whi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  3. arXiv:2407.10973  [pdf, other]

    cs.AI

    Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

    Authors: Yongyuan Liang, Tingqiang Xu, Kaizhe Hu, Guangqi Jiang, Furong Huang, Huazhe Xu

    Abstract: Can we generate a control policy for an agent using just one demonstration of desired behaviors as a prompt, as effortlessly as creating an image from a textual description? In this paper, we present Make-An-Agent, a novel policy parameter generator that leverages the power of conditional diffusion models for behavior-to-policy generation. Guided by behavior embeddings that encode trajectory infor…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.09756  [pdf, other]

    cs.CL

    LLM-Collaboration on Automatic Science Journalism for the General Audience

    Authors: Gongyao Jiang, Xinran Shi, Qiong Luo

    Abstract: Science journalism reports current scientific discoveries to non-specialists, aiming to enable public comprehension of the state of the art. However, this task can be challenging as the audience often lacks specific knowledge about the presented research. To address this challenge, we propose a framework that integrates three LLMs mimicking the real-world writing-reading-feedback-revision workflow…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Under review

  5. arXiv:2407.04284  [pdf, other]

    cs.MM

    TSC-PCAC: Voxel Transformer and Sparse Convolution Based Point Cloud Attribute Compression for 3D Broadcasting

    Authors: Zixi Guo, Yun Zhang, Linwei Zhu, Hanli Wang, Gangyi Jiang

    Abstract: Point clouds have become the mainstream representation for advanced 3D applications, such as virtual reality and augmented reality. However, the massive data volume of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) for 3D broadcasting. F…

    Submitted 5 July, 2024; originally announced July 2024.

  6. arXiv:2406.12251  [pdf, other]

    cs.CL cs.AI cs.LG

    Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning

    Authors: Chenyuan Wu, Gangwei Jiang, Defu Lian

    Abstract: Lifelong prompt tuning has significantly advanced parameter-efficient lifelong learning with its efficiency and minimal storage demands on various tasks. Our empirical studies, however, highlight certain transferability constraints in the current methodologies: a universal algorithm that guarantees consistent positive transfer across all tasks is currently unattainable, especially when dealing di…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  7. arXiv:2406.12227  [pdf, other]

    cs.AI

    Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

    Authors: Gangwei Jiang, Caigao Jiang, Zhaoyi Li, Siqiao Xue, Jun Zhou, Linqi Song, Defu Lian, Ying Wei

    Abstract: Fine-tuning large language models (LLMs) can cause them to lose their general capabilities. However, the intrinsic mechanisms behind such forgetting remain unexplored. In this paper, we begin by examining this phenomenon by focusing on knowledge understanding and instruction following, with the latter identified as the main contributor to forgetting during fine-tuning. Consequently, we propose the…

    Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11455  [pdf, other]

    cs.CL cs.AI

    Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

    Authors: Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Yanghua Xiao, Jiaqing Liang

    Abstract: Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, exhibiting issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performan…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11434  [pdf, other]

    cs.DB

    DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

    Authors: Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

    Abstract: Large language models (LLMs) have become the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partially attributed to the prohibitively high computational cost. In this paper,…

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.07662  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG q-bio.NC

    Progress Towards Decoding Visual Imagery via fNIRS

    Authors: Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu

    Abstract: We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 2…

    Submitted 22 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.05620  [pdf, other]

    cs.CV

    Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval

    Authors: Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji

    Abstract: Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific individual based on a textual description. Despite considerable efforts to bridge the gap between vision and language, the significant differences between these modalities continue to pose a challenge. Previous methods have attempted to align text and image samples in a modal-shared space, but they face unc…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: ACM MM 2023

  12. arXiv:2406.01341  [pdf, other]

    cs.SI

    Important node identification for complex networks based on improved Electre Multi-Attribute fusion

    Authors: Qi Cao, Yurong Song, Min Li, Ruqi Li, Hongbo Qu, Guo-Ping Jiang, Jinye Xiong

    Abstract: The influence maximization problem involves selecting a subset of seed nodes within a social network to maximize information spread under a given diffusion model, so how to identify the important nodes is the problem considered in this paper. Because real-world networks differ greatly, a class of multi-attribute decision fusion methods is often used to solve this problem. Electre…

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2405.18706  [pdf, other]

    cs.CV

    FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

    Authors: You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji

    Abstract: The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, SAM faces sta…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  14. arXiv:2405.17067  [pdf, other]

    cs.CL cs.AI

    Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization

    Authors: Dixuan Wang, Yanda Li, Junyuan Jiang, Zepeng Ding, Guochao Jiang, Jiaqing Liang, Deqing Yang

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities in language understanding and generation. Nonetheless, it has also been observed that LLMs tend to produce inaccurate responses to specific queries. This deficiency can be traced to the tokenization step LLMs must undergo, which is an inevitable limitation inherent to all LLMs. In fact, incorrect tokenization is the critical point that hi…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 17 pages, 3 figures; submitted to NeurIPS 2024

  15. arXiv:2405.16552  [pdf, other]

    cs.CL cs.AI

    SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation

    Authors: Ziqin Luo, Haixia Han, Haokun Zhao, Guochao Jiang, Chengyu Du, Tingyun Li, Jiaqing Liang, Deqing Yang, Yanghua Xiao

    Abstract: Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to fall into suboptimal options when encountering uncertain tokens, referred to as chaotic points in our work. Many chaotic points exist in texts generated by LLMs,…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: The relevant code will be released in subsequent versions

  16. arXiv:2405.08419  [pdf, other]

    cs.CV

    WaterMamba: Visual State Space Model for Underwater Image Enhancement

    Authors: Meisheng Guan, Haiyong Xu, Gangyi Jiang, Mei Yu, Yeyao Chen, Ting Luo, Yang Song

    Abstract: Underwater imaging often suffers from low quality due to factors affecting light propagation and absorption in water. To improve image quality, some underwater image enhancement (UIE) methods based on convolutional neural networks (CNN) and Transformer have been proposed. However, CNN-based UIE methods are limited in modeling long-range dependencies, and Transformer-based methods involve a large n…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.06098

  17. arXiv:2405.06906  [pdf, other]

    cs.CL

    Finding structure in logographic writing with library learning

    Authors: Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lionel Wong, Joshua B. Tenenbaum, Roger P. Levy

    Abstract: One hallmark of human language is its combinatoriality -- reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures. In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted at CogSci 2024 (Talk)

  18. arXiv:2405.04960  [pdf, other]

    cs.CL

    P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models

    Authors: Guochao Jiang, Zepeng Ding, Yuchen Shi, Deqing Yang

    Abstract: In recent years, the rise of large language models (LLMs) has made it possible to directly achieve named entity recognition (NER) without any demonstration samples or only using a few samples through in-context learning (ICL). However, standard ICL only helps LLMs understand task instructions, format and input-label mapping, but neglects the particularity of the NER task itself. In this paper, we…

    Submitted 17 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  19. arXiv:2405.00578  [pdf, other]

    cs.CL cs.AI

    The Real, the Better: Aligning Large Language Models with Online Human Behaviors

    Authors: Guanying Jiang, Lingyong Yan, Haibo Shi, Dawei Yin

    Abstract: Large language model alignment is widely used and studied to prevent LLMs from producing unhelpful and harmful responses. However, the lengthy training process and predefined preference bias hinder adaptation to diverse online human preferences. To this end, this paper proposes an alignment framework, called Reinforcement Learning with Human Behavior (RLHB), to align LLMs by directly leveraging real onlin…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures

  20. arXiv:2404.19444  [pdf, other]

    cs.CV

    AnomalyXFusion: Multi-modal Anomaly Synthesis with Diffusion

    Authors: Jie Hu, Yawen Huang, Yilin Lu, Guoyang Xie, Guannan Jiang, Yefeng Zheng, Zhichao Lu

    Abstract: Anomaly synthesis is one of the effective methods to augment abnormal samples for training. However, current anomaly synthesis methods predominantly rely on texture information as input, which limits the fidelity of synthesized abnormal samples, because texture information is insufficient to correctly depict the pattern of anomalies, especially logical anomalies. To surmount this obstacle, we…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  21. arXiv:2404.18433  [pdf, other]

    cs.CV

    ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal

    Authors: Zhuohao Li, Guoyang Xie, Guannan Jiang, Zhichao Lu

    Abstract: Transformers have recently emerged as the de facto models for computer vision tasks and have also been successfully applied to shadow removal. However, these existing methods heavily rely on intricate modifications to the attention mechanisms within the transformer blocks while using a generic patch embedding. As a result, it often leads to complex architectural designs requiring additional computation re…

    Submitted 30 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  22. arXiv:2404.11064  [pdf, other]

    cs.CV cs.AI

    Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

    Authors: Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

    Abstract: 3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships. Therefore, existing approaches adopt the two-stage "detect-then-describe/discriminate" pipeline, which relies heavily on the performance of the detector, resulting in suboptimal perform…

    Submitted 17 April, 2024; originally announced April 2024.

  23. arXiv:2404.09145  [pdf, other]

    cs.CL cs.AI

    ToNER: Type-oriented Named Entity Recognition with Generative Language Model

    Authors: Guochao Jiang, Ziqin Luo, Yuchen Shi, Dixuan Wang, Jiaqing Liang, Deqing Yang

    Abstract: In recent years, fine-tuned generative models have proven more powerful than the previous tagging-based or span-based models on the named entity recognition (NER) task. It has also been found that the information related to entities, such as entity types, can prompt a model to achieve NER better. However, it is not easy to determine the entity types that actually exist in the given sentence in ad…

    Submitted 11 June, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024

  24. arXiv:2404.04293  [pdf, other]

    cs.CL cs.AI

    Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding

    Authors: Yanda Li, Dixuan Wang, Jiaqing Liang, Guochao Jiang, Qianyu He, Yanghua Xiao, Deqing Yang

    Abstract: Large Language Models (LLMs) have demonstrated good performance in many reasoning tasks, but they still struggle with some complicated reasoning tasks including logical reasoning. One non-negligible reason for LLMs' suboptimal performance on logical reasoning is their failure to understand logical fallacies correctly. To evaluate LLMs' capability of logical fallacy understanding (LFU), we p…

    Submitted 4 April, 2024; originally announced April 2024.

  25. arXiv:2403.17367  [pdf, other]

    cs.RO

    RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment

    Authors: Guoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe Xu

    Abstract: Combining the mobility of legged robots with the manipulation skills of arms has the potential to significantly expand the operational range and enhance the capabilities of robotic systems in performing various mobile manipulation tasks. Existing approaches are confined to imprecise six degrees of freedom (DoF) manipulation and possess a limited arm workspace. In this paper, we propose a novel fra…

    Submitted 13 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  26. arXiv:2402.15052  [pdf, other]

    cs.CL cs.AI

    ToMBench: Benchmarking Theory of Mind in Large Language Models

    Authors: Zhuang Chen, Jincenzi Wu, Jinfeng Zhou, Bosi Wen, Guanqun Bi, Gongyao Jiang, Yaru Cao, Mengting Hu, Yunghwei Lai, Zexuan Xiong, Minlie Huang

    Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Under review

  27. arXiv:2402.14328  [pdf, other]

    cs.CL

    Understanding and Patching Compositional Reasoning in LLMs

    Authors: Zhaoyi Li, Gangwei Jiang, Hong Xie, Linqi Song, Defu Lian, Ying Wei

    Abstract: LLMs have marked a revolutionary shift, yet they falter when faced with compositional reasoning tasks. Our research embarks on a quest to uncover the root causes of compositional reasoning failures of LLMs, uncovering that most of them stem from the improperly generated or leveraged implicit reasoning results. Inspired by our empirical findings, we resort to Logit Lens and an intervention experimen…

    Submitted 6 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL'2024 Findings

  28. arXiv:2401.01545  [pdf, other]

    cs.CV cs.RO

    DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM

    Authors: Mingrui Li, Yiming Zhou, Guangan Jiang, Tianchen Deng, Yangyang Wang, Hongyu Wang

    Abstract: SLAM systems based on NeRF have demonstrated superior performance in rendering quality and scene reconstruction for static environments compared to traditional dense SLAM. However, they encounter tracking drift and mapping errors in real-world scenarios with dynamic interferences. To address these issues, we introduce DDN-SLAM, the first real-time dense dynamic neural implicit SLAM system integrat…

    Submitted 8 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 11 pages, 4 figures

  29. arXiv:2312.14203  [pdf, other]

    q-fin.PM cs.CL cs.LG

    Shai: A large language model for asset management

    Authors: Zhongyang Guo, Guanran Jiang, Zhongdan Zhang, Peng Li, Zhefeng Wang, Yinchun Wang

    Abstract: This paper introduces "Shai", a 10B-level large language model specifically designed for the asset management industry, built upon an open-source foundational model. With continuous pre-training and fine-tuning using a targeted corpus, Shai demonstrates enhanced performance in tasks relevant to its domain, outperforming baseline models. Our research includes the development of an innovative evaluat…

    Submitted 21 December, 2023; originally announced December 2023.

  30. arXiv:2312.14134  [pdf, other]

    cs.LG cs.CV cs.RO

    Diffusion Reward: Learning Rewards via Conditional Video Diffusion

    Authors: Tao Huang, Guangqi Jiang, Yanjie Ze, Huazhe Xu

    Abstract: Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that lower generative diversity is observed when condi…

    Submitted 18 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page and code: https://diffusion-reward.github.io/

  31. arXiv:2312.06988  [pdf, other]

    cs.CV

    MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving

    Authors: Guangfeng Jiang, Jun Liu, Yuzhi Wu, Wenlong Liao, Tao He, Pai Peng

    Abstract: Instance segmentation is a fundamental research topic in computer vision, especially in autonomous driving. However, manual mask annotation for instance segmentation is quite time-consuming and costly. To address this problem, some prior works attempt to apply weak supervision by exploring 2D or 3D boxes. However, no one has ever successfully segmented 2D and 3D instances simultaneously by only…

    Submitted 17 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  32. arXiv:2312.00085  [pdf, other]

    cs.CV

    X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

    Authors: Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

    Abstract: In recent times, automatic text-to-3D content creation has made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize the 3D representation to ensure that the rendered image aligns well with the given text, as evaluated by the pretrained 2D diffusion model. Nevertheless, a substantial domain gap exists between 2D images an…

    Submitted 30 July, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: ToMM24

  33. arXiv:2311.03768  [pdf, other]

    cs.LG cs.AI

    PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning

    Authors: Hao Liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo

    Abstract: Self-supervised learning has been actively studied in the time series domain recently, especially for masked reconstruction. Most of these methods follow the "Pre-training + Fine-tuning" paradigm in which a new decoder replaces the pre-trained decoder to fit a specific downstream task, leading to inconsistency of upstream and downstream tasks. In this paper, we first point out that the unification…

    Submitted 7 November, 2023; originally announced November 2023.

  34. arXiv:2311.02018  [pdf, other]

    cs.AI cs.CV

    Active Reasoning in an Open-World Environment

    Authors: Manjie Xu, Guangyuan Jiang, Wei Liang, Chi Zhang, Yixin Zhu

    Abstract: Recent advances in vision-language learning have achieved notable success on complete-information question-answering datasets through the integration of extensive world knowledge. Yet, most models operate passively, responding to questions based on pre-stored knowledge. In stark contrast, humans possess the ability to actively explore, accumulate, and reason using both newfound and existing inform…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023

  35. arXiv:2311.00447  [pdf, other]

    cs.AI

    On the Opportunities of Green Computing: A Survey

    Authors: You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, et al. (16 additional authors not shown)

    Abstract: Artificial Intelligence (AI) has achieved significant advancements in technology and research over several decades of development, and is widely used in many areas including computer vision, natural language processing, time-series analysis, speech synthesis, etc. During the age of deep learning, especially with the rise of Large Language Models, a large majority of researchers' attention…

    Submitted 8 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 113 pages, 18 figures

  36. arXiv:2311.00397  [pdf, other]

    cs.CV

    Towards Omni-supervised Referring Expression Segmentation

    Authors: Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun

    Abstract: Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is plagued by the expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled,…

    Submitted 27 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  37. arXiv:2310.13024  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompt

    Authors: Gangwei Jiang, Caigao Jiang, Siqiao Xue, James Y. Zhang, Jun Zhou, Defu Lian, Ying Wei

    Abstract: Continual pre-training has been urgent for adapting a pre-trained model to a multitude of domains and tasks in the fast-evolving world. In practice, a continually pre-trained model is expected to demonstrate not only greater capacity when fine-tuned on pre-trained domains but also a non-decreasing performance on unseen ones. In this work, we first investigate such anytime fine-tuning effectiveness…

    Submitted 19 October, 2023; originally announced October 2023.

  38. arXiv:2310.04993  [pdf, other]

    cs.LG

    Prompt-augmented Temporal Point Process for Streaming Event Sequence

    Authors: Siqiao Xue, Yan Wang, Zhixuan Chu, Xiaoming Shi, Caigao Jiang, Hongyan Hao, Gangwei Jiang, Xiaoyun Feng, James Y. Zhang, Jun Zhou

    Abstract: Neural Temporal Point Processes (TPPs) are the prevalent paradigm for modeling continuous-time event sequences, such as user activities on the web and financial transactions. In real-world applications, event data is typically received in a streaming manner, where the distribution of patterns may shift over time. Additionally, privacy and memory constraints are commonly observed in p…

    Submitted 13 October, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 camera ready version

  39. arXiv:2309.13681  [pdf, other]

    cs.LG

    Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)

    Authors: Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, Wei Lin

    Abstract: As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, a large number of GPUs/TPUs are paralleled as a large batch (LB) to improve training throughput. However, training such LB tasks often suffers from a large generalization gap and degrades final precision, which limits enlarging the batch size. In this work, we develop the vari…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 25 pages, 5 figures

  40. arXiv:2309.10217  [pdf, other]

    cs.CV cs.AI

    An Empirical Study of Attention Networks for Semantic Segmentation

    Authors: Hao Guo, Hongbiao Si, Guilin Jiang, Wei Zhang, Zhiyan Liu, Xuanyi Zhu, Xulong Zhang, Yang Liu

    Abstract: Semantic segmentation is a vital problem in computer vision. Recently, a common solution to semantic segmentation is the end-to-end convolution neural network, which is much more accurate than traditional methods. Recently, decoders based on attention achieve state-of-the-art (SOTA) performance on various datasets. But these networks are always compared with the mIoU of previous SOTA networks t…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by the 7th APWeb-WAIM International Joint Conference on Web and Big Data (APWeb 2023)

  41. arXiv:2308.09493  [pdf, other]

    eess.AS cs.SD

    Generative Machine Listener

    Authors: Guanxin Jiang, Lars Villemoes, Arijit Biswas

    Abstract: We show how a neural network can be trained on individual intrusive listening test scores to predict a distribution of scores for each pair of reference and coded input stereo or binaural signals. We nickname this method the Generative Machine Listener (GML), as it is capable of generating an arbitrary amount of simulated listening test data. Compared to a baseline system using regression over mea…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to 155th Audio Engineering Society (AES) Convention, New York, NY, USA, October 2023

  42. arXiv:2308.08717  [pdf, other]

    cs.CV cs.AI

    EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices

    Authors: Liang Wang, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Kaiyu Hu, Guilin Jiang, Jing Xiao

    Abstract: Real-time video analytics on edge devices for changing scenes remains a difficult task. As edge devices are usually resource-constrained, edge deep neural networks (DNNs) have fewer weights and shallower architectures than general DNNs. As a result, they only perform well in limited scenarios and are sensitive to data drift. In this paper, we introduce EdgeMA, a practical and efficient video analy…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted by 30th International Conference on Neural Information Processing (ICONIP 2023)

  43. arXiv:2308.08446  [pdf, other]

    cs.IR cs.LG

    CSPM: A Contrastive Spatiotemporal Preference Model for CTR Prediction in On-Demand Food Delivery Services

    Authors: Guyu Jiang, Xiaoyun Li, Rongrong Jing, Ruoqi Zhao, Xingliang Ni, Guodong Cao, Ning Hu

    Abstract: Click-through rate (CTR) prediction is a crucial task in the context of an online on-demand food delivery (OFD) platform for precisely estimating the probability of a user clicking on food items. Unlike universal e-commerce platforms such as Taobao and Amazon, user behaviors and interests on the OFD platform are more location- and time-sensitive due to limited delivery ranges and regional commodity…

    Submitted 10 August, 2023; originally announced August 2023.

  44. arXiv:2308.05996  [pdf, other]

    cs.AI

    Deep Task-specific Bottom Representation Network for Multi-Task Recommendation

    Authors: Qi Liu, Zhilong Zhou, Gangwei Jiang, Tiezheng Ge, Defu Lian

    Abstract: Neural-based multi-task learning (MTL) has improved significantly, and it has been successfully applied to recommendation systems (RS). Recent deep MTL methods for RS (e.g. MMoE, PLE) focus on designing soft gating-based parameter-sharing networks that implicitly learn a generalized representation for each task. However, MTL methods may suffer from performance degeneration when dealing with…

    Submitted 17 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: CIKM'23

  45. arXiv:2308.05359  [pdf, other]

    cs.CV

    Pseudo-label Alignment for Semi-supervised Instance Segmentation

    Authors: Jie Hu, Chen Chen, Liujuan Cao, Shengchuan Zhang, Annan Shu, Guannan Jiang, Rongrong Ji

    Abstract: Pseudo-labeling is significant for semi-supervised instance segmentation, which generates instance masks and classes from unannotated images for subsequent training. However, in existing pipelines, pseudo-labels that contain valuable information may be directly filtered out due to mismatches in class and mask quality. To address this issue, we propose a novel framework, called pseudo-label alignin…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  46. arXiv:2308.02606  [pdf, other]

    cs.CV

    Improving Human-Object Interaction Detection via Virtual Image Learning

    Authors: Shuman Fang, Shuai Liu, Jie Li, Guannan Jiang, Xianming Lin, Rongrong Ji

    Abstract: Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects, which plays a crucial role in high-level semantic understanding tasks. However, most works pursue designing better architectures to learn overall features more efficiently, while ignoring the long-tail nature of interaction-object pair categories. In this paper, we propose to alleviate the impa…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  47. arXiv:2307.16376  [pdf, other]

    cs.IR cs.AI cs.CL

    When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities

    Authors: Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, Defu Lian, Enhong Chen

    Abstract: The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With the unprecedented scale of training and model parameters, the capability of large language models has been dramatically improved, leading to human-like performance in understanding, language synthesis, and common-sense reasoning. Such a major leap forward in general AI capacity will cha…

    Submitted 30 July, 2023; originally announced July 2023.

  48. Continual Learning in Predictive Autoscaling

    Authors: Hongyan Hao, Zhixuan Chu, Shiyi Zhu, Gangwei Jiang, Yan Wang, Caigao Jiang, James Zhang, Wei Jiang, Siqiao Xue, Jun Zhou

    Abstract: Predictive Autoscaling is used to forecast the workloads of servers and prepare the resources in advance to ensure service level objectives (SLOs) in dynamic cloud environments. However, in practice, its prediction task often suffers from performance degradation under abnormal traffic caused by external events (such as sales promotional activities and application reconfigurations), for which a…

    Submitted 14 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

  49. arXiv:2307.08222  [pdf]

    cs.HC

    NaMemo2: Facilitating Teacher-Student Interaction with Theory-Based Design and Student Autonomy Consideration

    Authors: Guang Jiang, Jiahui Zhu, Yunsong Li, Pengcheng An, Yunlong Wang

    Abstract: Teacher-student interaction (TSI) is essential for learning efficiency and harmonious teacher-student interpersonal relationships. However, studies on TSI support tools often focus on teacher needs while neglecting student needs and autonomy. To enhance both lecturer competence in delivering interpersonal interaction and student autonomy in TSI, we developed NaMemo2, a novel augmented-reality syst…

    Submitted 18 July, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: This paper has been accepted in July 2023 for publication in Education and Information Technologies

  50. arXiv:2306.15706  [pdf, other]

    cs.CV

    Approximated Prompt Tuning for Vision-Language Pre-trained Models

    Authors: Qiong Wu, Shubin Huang, Yiyi Zhou, Pingyang Dai, Annan Shu, Guannan Jiang, Rongrong Ji

    Abstract: Prompt tuning is a parameter-efficient way to deploy large-scale pre-trained models to downstream tasks by adding task-specific tokens. In terms of vision-language pre-trained (VLP) models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks, which greatly exacerbates the already high computational overhead. In this paper,…

    Submitted 21 August, 2023; v1 submitted 27 June, 2023; originally announced June 2023.