Showing 1–50 of 305 results for author: Zhou, G

Searching in archive cs.
  1. arXiv:2407.00056  [pdf, other]

    cs.IR cs.AI cs.SI

    MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion

    Authors: Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng

    Abstract: Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a…

    Submitted 15 June, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024

  2. arXiv:2406.12186  [pdf, ps, other]

    eess.IV cs.CV

    Unlocking the Potential of Early Epochs: Uncertainty-aware CT Metal Artifact Reduction

    Authors: Xinquan Yang, Guanqun Zhou, Wei Sun, Youjian Zhang, Zhongya Wang, Jiahui He, Zhicheng Zhang

    Abstract: In computed tomography (CT), the presence of metallic implants in patients often leads to disruptive artifacts in the reconstructed images, hindering accurate diagnosis. Recently, a large amount of supervised deep learning-based approaches have been proposed for metal artifact reduction (MAR). However, these methods neglect the influence of initial training weights. In this paper, we have discover…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  3. Contextual Distillation Model for Diversified Recommendation

    Authors: Fan Li, Xu Si, Shisong Tang, Dingmin Wang, Kunyan Han, Bing Han, Guorui Zhou, Yang Song, Hechang Chen

    Abstract: The diversity of recommendation is equally crucial as accuracy in improving user experience. Existing studies, e.g., Determinantal Point Process (DPP) and Maximal Marginal Relevance (MMR), employ a greedy paradigm to iteratively select items that optimize both accuracy and diversity. However, prior methods typically exhibit quadratic complexity, limiting their applications to the re-ranking stage…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: accepted by KDD 2024

  4. arXiv:2406.00276  [pdf]

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin ZHou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  5. arXiv:2405.19610  [pdf, other]

    stat.ML cs.LG stat.ME

    Factor Augmented Tensor-on-Tensor Neural Networks

    Authors: Guanhao Zhou, Yuefeng Han, Xiufan Yu

    Abstract: This paper studies the prediction task of tensor-on-tensor regression in which both covariates and responses are multi-dimensional arrays (a.k.a., tensors) across time with arbitrary tensor order and data dimension. Existing methods either focused on linear models without accounting for possibly nonlinear relationships between covariates and responses, or directly employed black-box deep learning…

    Submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2405.14824  [pdf, other]

    cs.CV cs.RO

    Camera Relocalization in Shadow-free Neural Radiance Fields

    Authors: Shiyao Xu, Caiyun Liu, Yuantao Chen, Zhenxin Zhu, Zike Yan, Yongliang Shi, Hao Zhao, Guyue Zhou

    Abstract: Camera relocalization is a crucial problem in computer vision and robotics. Recent advancements in neural radiance fields (NeRFs) have shown promise in synthesizing photo-realistic images. Several works have utilized NeRFs for refining camera poses, but they do not account for lighting changes that can affect scene appearance and shadow regions, causing a degraded pose optimization process. In thi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by ICRA 2024. 8 pages, 5 figures, 3 tables. Codes and dataset: https://github.com/hnrna/ShadowfreeNeRF-CameraReloc

  7. arXiv:2405.12217  [pdf, other]

    cs.CV cs.AI cs.LG

    Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning

    Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Salman Khan, Xin Gao, Lina Yao

    Abstract: Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this work investigates in-context learning (ICL) as an e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 17 pages, 7 figures, 7 tables

  8. arXiv:2405.11769  [pdf, other]

    q-bio.BM cs.LG physics.bio-ph

    Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction

    Authors: Eric Alcaide, Zhifeng Gao, Guolin Ke, Yaqi Li, Linfeng Zhang, Hang Zheng, Gengmo Zhou

    Abstract: In recent years, machine learning (ML) methods have emerged as promising alternatives for molecular docking, offering the potential for high accuracy without incurring prohibitive computational costs. However, recent studies have indicated that these ML models may overfit to quantitative metrics while neglecting the physical constraints inherent in the problem. In this work, we present Uni-Mol Doc…

    Submitted 20 May, 2024; originally announced May 2024.

  9. arXiv:2405.08423  [pdf, other]

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…

    Submitted 14 May, 2024; originally announced May 2024.

  10. arXiv:2405.06524  [pdf, other]

    cs.CL

    Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

    Authors: Wenyu Huang, Guancheng Zhou, Mirella Lapata, Pavlos Vougiouklis, Sebastien Montella, Jeff Z. Pan

    Abstract: Although Large Language Models (LLMs) are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analysed the effects of different typ…

    Submitted 10 May, 2024; originally announced May 2024.

  11. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  12. arXiv:2405.04840  [pdf, other]

    cs.IR

    Federated Adaptation for Foundation Model-based Recommendations

    Authors: Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

    Abstract: With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted as a regular paper of IJCAI'24

  13. arXiv:2405.03727  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    Large Language Models Synergize with Automated Machine Learning

    Authors: Jinglue Xu, Jialong Li, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Guoyuan Zhou, Jia Guo, Hitoshi Iba, Kenji Tei

    Abstract: Recently, program synthesis driven by large language models (LLMs) has become increasingly popular. However, program synthesis for machine learning (ML) tasks still poses significant challenges. This paper explores a novel form of program synthesis, targeting ML programs, by combining LLMs and automated machine learning (autoML). Specifically, our goal is to fully automate the generation and optim…

    Submitted 11 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  14. arXiv:2405.02880  [pdf, other]

    cs.CV cs.RO

    Blending Distributed NeRFs with Tri-stage Robust Pose Optimization

    Authors: Baijun Ye, Caiyun Liu, Xiaoyu Ye, Yuantao Chen, Yuhai Wang, Zike Yan, Yongliang Shi, Hao Zhao, Guyue Zhou

    Abstract: Due to the limited model capacity, leveraging distributed Neural Radiance Fields (NeRFs) for modeling extensive urban environments has become a necessity. However, current distributed NeRF registration approaches encounter aliasing artifacts, arising from discrepancies in rendering resolutions and suboptimal pose precision. These factors collectively deteriorate the fidelity of pose estimation wit…

    Submitted 5 May, 2024; originally announced May 2024.

  15. arXiv:2404.18192  [pdf, other]

    cs.RO

    Block-Map-Based Localization in Large-Scale Environment

    Authors: Yixiao Feng, Zhou Jiang, Yongliang Shi, Yunlong Feng, Xiangyu Chen, Hao Zhao, Guyue Zhou

    Abstract: Accurate localization is an essential technology for the flexible navigation of robots in large-scale environments. Both SLAM-based and map-based localization will increase the computing load due to the increase in map size, which will affect downstream tasks such as robot navigation and services. To this end, we propose a localization system based on Block Maps (BMs) to reduce the computational l…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures, 4 tables, published to ICRA 2024

  16. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  17. arXiv:2404.15807  [pdf, other]

    cs.CL

    One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion

    Authors: Zhiwen Xie, Yi Zhang, Guangyou Zhou, Jin Liu, Xinhui Tu, Jimmy Xiangji Huang

    Abstract: Knowledge Graph Completion (KGC) has garnered massive research interest recently, and most existing methods are designed following a transductive setting where all entities are observed during training. Despite the great progress on the transductive KGC, these methods struggle to conduct reasoning on emerging KGs involving unseen entities. Thus, inductive KGC, which aims to deduce missing links am…

    Submitted 24 April, 2024; originally announced April 2024.

  18. arXiv:2404.13946  [pdf, other]

    cs.LG

    Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning

    Authors: Rong Wang, Guichen Zhou, Mingjun Gao, Yunpeng Xiao

    Abstract: In recent years, the neural network backdoor hidden in the parameters of the federated learning model has been proved to have great security risks. Considering the characteristics of trigger generation, data poisoning and model training in backdoor attack, this paper designs a backdoor attack method based on federated learning. Firstly, aiming at the concealment of the backdoor trigger, a TrojanGa…

    Submitted 22 April, 2024; originally announced April 2024.

  19. arXiv:2404.13425  [pdf, other]

    cs.CV cs.AI

    AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

    Authors: Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Gang Zhou, Xingwei Zhang, Xinwang Liu, Xiaolong Zheng

    Abstract: Vision-Language Models (VLMs) are a significant technique for Artificial General Intelligence (AGI). With the fast growth of AGI, the security problem become one of the most important challenges for VLMs. In this paper, through extensive experiments, we demonstrate the vulnerability of the conventional adaptation methods for VLMs, which may bring significant security risks. In addition, as the siz…

    Submitted 20 April, 2024; originally announced April 2024.

  20. arXiv:2404.06078  [pdf, other]

    cs.IR

    End-to-end training of Multimodal Model and ranking Model

    Authors: Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang Song, Na Mou, Shen Jiang, Han Li

    Abstract: Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existi…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  21. arXiv:2404.04579  [pdf, other]

    cs.HC

    TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

    Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

    Abstract: Telepresence robots can be used to support users to navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, this does not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduc…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 33 pages, 12 figures

    MSC Class: H.5.2

    Journal ref: IMUWT 2024

  22. arXiv:2404.04167  [pdf, other]

    cs.CL cs.AI

    Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

    Authors: Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  23. arXiv:2404.03634  [pdf, other]

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Guyue Zhou, Yixin Zhu, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation with two-finger grippers is challenged by objects lacking distinct graspable features. Traditional pre-grasping methods, which typically involve repositioning objects or utilizing external aids like table edges, are limited in their adaptability across different object categories and environments. To overcome these limitations, we introduce PreAfford, a novel pre-grasping plan…

    Submitted 4 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  24. Efficient Multi-branch Segmentation Network for Situation Awareness in Autonomous Navigation

    Authors: Guan-Cheng Zhou, Chen Cheng, Yan-zhou Chen

    Abstract: Real-time and high-precision situational awareness technology is critical for autonomous navigation of unmanned surface vehicles (USVs). In particular, robust and fast obstacle semantic segmentation methods are essential. However, distinguishing between the sea and the sky is challenging due to the differences between port and maritime environments. In this study, we built a dataset that captured…

    Submitted 30 March, 2024; originally announced April 2024.

    Journal ref: Ocean Engineering 302 (2024) 117741

  25. arXiv:2403.16535  [pdf, other]

    cs.RO

    Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot

    Authors: Zifan Wang, Yufei Jia, Lu Shi, Haoyu Wang, Haizhou Zhao, Xueyang Li, Jinni Zhou, Jun Ma, Guyue Zhou

    Abstract: Incorporating a robotic manipulator into a wheel-legged robot enhances its agility and expands its potential for practical applications. However, the presence of potential instability and uncertainties presents additional challenges for control objectives. In this paper, we introduce an arm-constrained curriculum learning architecture to tackle the issues introduced by adding the manipulator. Firs…

    Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  26. arXiv:2403.14674  [pdf]

    cs.CY

    Packaging Up Media Mix Modeling: An Introduction to Robyn's Open-Source Approach

    Authors: Gufeng Zhou, Igor Skokan, Julian Runge

    Abstract: While attribution of user behavior across apps and websites had led to unseen levels of determinism in digital advertising measurement, privacy-centric changes to the digital data landscape are bringing probabilistic techniques such as marketing and media mix modeling en vogue again. Many small and midsize advertisers lack the scale and resources to invest in advanced proprietary modeling efforts…

    Submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.12787  [pdf, other]

    cs.CV

    DDSB: An Unsupervised and Training-free Method for Phase Detection in Echocardiography

    Authors: Zhenyu Bu, Yang Liu, Jiayu Huo, Jingjing Peng, Kaini Wang, Guangquan Zhou, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

    Abstract: Accurate identification of End-Diastolic (ED) and End-Systolic (ES) frames is key for cardiac function assessment through echocardiography. However, traditional methods face several limitations: they require extensive amounts of data, extensive annotations by medical experts, significant training resources, and often lack robustness. Addressing these challenges, we proposed an unsupervised and tra…

    Submitted 19 March, 2024; originally announced March 2024.

  28. arXiv:2403.12386  [pdf]

    cs.CL cs.AI

    Pipelined Biomedical Event Extraction Rivaling Joint Learning

    Authors: Pengchao Wu, Xuefeng Li, Jinghang Gu, Longhua Qian, Guodong Zhou

    Abstract: Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules o…

    Submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.10319  [pdf, other]

    cs.NI cs.CR

    NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models

    Authors: Chen Qian, Xiaochang Li, Qineng Wang, Gang Zhou, Huajie Shao

    Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is to process diverse data packets including both cip…

    Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  30. arXiv:2403.07027  [pdf, ps, other]

    cs.LG

    FWin transformer for dengue prediction under climate and ocean influence

    Authors: Nhat Thanh Tran, Jack Xin, Guofa Zhou

    Abstract: Dengue fever is one of the most deadly mosquito-borne tropical infectious diseases. Detailed long range forecast model is vital in controlling the spread of disease and making mitigation efforts. In this study, we examine methods used to forecast dengue cases for long range predictions. The dataset consists of local climate/weather in addition to global climate indicators of Singapore from 2000 to…

    Submitted 10 March, 2024; originally announced March 2024.

  31. arXiv:2403.05326  [pdf, other]

    cs.CL cs.AI

    ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues

    Authors: Yiding Liu, Jingjing Wang, Jiamin Luo, Tao Zeng, Guodong Zhou

    Abstract: Aspect Sentiment Understanding (ASU) in interactive scenarios (e.g., Question-Answering and Dialogue) has attracted ever-more interest in recent years and achieved important progresses. However, existing studies on interactive ASU largely ignore the coreference issue for opinion targets (i.e., aspects), while this phenomenon is ubiquitous in interactive scenarios especially dialogues, limiting the…

    Submitted 10 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  32. arXiv:2403.04789  [pdf, other]

    cs.CL cs.AI cs.LG

    TopicDiff: A Topic-enriched Diffusion Approach for Multimodal Conversational Emotion Detection

    Authors: Jiamin Luo, Jingjing Wang, Guodong Zhou

    Abstract: Multimodal Conversational Emotion (MCE) detection, generally spanning across the acoustic, vision and language modalities, has attracted increasing interest in the multimedia community. Previous studies predominantly focus on learning contextual information in conversations with only a few considering the topic information in single language modality, while always neglecting the acoustic and visio…

    Submitted 10 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  33. arXiv:2403.02942  [pdf, other]

    cs.IT eess.SP

    Tensor Decomposition-based Time Varying Channel Estimation for mmWave MIMO-OFDM Systems

    Authors: Ruizhe Wang, Hong Ren, Cunhua Pan, Gui Zhou, Jiangzhou Wang

    Abstract: In this paper, we consider the time-varying channel estimation in millimeter wave (mmWave) multiple-input multiple-output MIMO systems with hybrid beamforming architectures. Different from the existing contributions that considered single-carrier mmWave systems with high mobility, the wideband orthogonal frequency division multiplexing (OFDM) system is considered in this work. To solve the channel…

    Submitted 5 March, 2024; originally announced March 2024.

  34. arXiv:2403.01820  [pdf, other]

    math.NA cs.LG

    Macroscopic auxiliary asymptotic preserving neural networks for the linear radiative transfer equations

    Authors: Hongyan Li, Song Jiang, Wenjun Sun, Liwei Xu, Guanyu Zhou

    Abstract: We develop a Macroscopic Auxiliary Asymptotic-Preserving Neural Network (MA-APNN) method to solve the time-dependent linear radiative transfer equations (LRTEs), which have a multi-scale nature and high dimensionality. To achieve this, we utilize the Physics-Informed Neural Networks (PINNs) framework and design a new adaptive exponentially weighted Asymptotic-Preserving (AP) loss function, which i…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 24 pages, 29 figures

  35. arXiv:2402.19116  [pdf, other]

    cs.CL cs.AI

    How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding

    Authors: Jiamin Luo, Jianing Zhao, Jingjing Wang, Guodong Zhou

    Abstract: Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring the fine-grained phrase-region matching, while merely leveraging the coarse-grained sentence-image pairs for training. However, existing studies on WPG largely ignore the implicit phrase-region matching relations, which are crucial for evaluating the capability of models in understanding the deep multimodal semantics. To thi…

    Submitted 4 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  36. arXiv:2402.16915  [pdf, other]

    cs.LG cs.AI

    More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

    Authors: Zhipeng Ma, Zheyan Tu, Xinhai Chen, Yan Zhang, Deguo Xia, Guyue Zhou, Yilun Chen, Yu Zheng, Jiangtao Gong

    Abstract: Trajectory representation learning plays a pivotal role in supporting various downstream tasks. Traditional methods in order to filter the noise in GPS trajectories tend to focus on routing-based methods used to simplify the trajectories. However, this approach ignores the motion details contained in the GPS data, limiting the representation capability of trajectory representation learning. To fil…

    Submitted 25 February, 2024; originally announced February 2024.

  37. arXiv:2402.15852  [pdf, other]

    cs.CV cs.RO

    NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

    Authors: Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

    Abstract: Vision-and-language navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions. In this field, generalization is a long-standing challenge, either to out-of-distribution scenes or from Sim to Real. In this paper, we propose NaVid, a video-based large vision language model (VLM), to mitigate such a…

    Submitted 30 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Accepted by Robotics: Science and Systems (RSS 2024)

  38. arXiv:2402.15738  [pdf, other]

    cs.CR eess.SY

    Privacy-Preserving State Estimation in the Presence of Eavesdroppers: A Survey

    Authors: Xinhao Yan, Guanzhong Zhou, Daniel E. Quevedo, Carlos Murguia, Bo Chen, Hailong Huang

    Abstract: Networked systems are increasingly the target of cyberattacks that exploit vulnerabilities within digital communications, embedded hardware, and software. Arguably, the simplest class of attacks -- and often the first type before launching destructive integrity attacks -- are eavesdropping attacks, which aim to infer information by collecting system data and exploiting it for malicious purposes. A…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures, 4 tables

  39. CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge

    Authors: Xiao Lin, Minghao Zhu, Ronghao Dang, Guangliang Zhou, Shaolong Shu, Feng Lin, Chengju Liu, Qijun Chen

    Abstract: Most of existing category-level object pose estimation methods devote to learning the object category information from point cloud modality. However, the scale of 3D datasets is limited due to the high cost of 3D data collection and annotation. Consequently, the category features extracted from these limited point cloud samples may not be comprehensive. This motivates us to investigate whether we…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 14 pages, 4 figures, 9 tables

  40. Understanding Human-AI Collaboration in Music Therapy Through Co-Design with Therapists

    Authors: Jingjing Sun, Jingyi Yang, Guyue Zhou, Yucheng Jin, Jiangtao Gong

    Abstract: The rapid development of musical AI technologies has expanded the creative potential of various musical activities, ranging from music style transformation to music generation. However, little research has investigated how musical AIs can support music therapists, who urgently need new technology support. This study used a mixed method, including semi-structured interviews and a participatory desi…

    Submitted 15 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 20 pages, 7 figures

    MSC Class: H.5.2

    Journal ref: CHI2024

  41. "It Must Be Gesturing Towards Me": Gesture-Based Interaction between Autonomous Vehicles and Pedestrians

    Authors: Xiang Chang, Zihe Chen, Xiaoyan Dong, Yuxin Cai, Tingmin Yan, Haolin Cai, Zherui Zhou, Guyue Zhou, Jiangtao Gong

    Abstract: Interacting with pedestrians understandably and efficiently is one of the toughest challenges faced by autonomous vehicles (AVs) due to the limitations of current algorithms and external human-machine interfaces (eHMIs). In this paper, we design eHMIs based on gestures inspired by the most popular method of interaction between pedestrians and human drivers. Eight common gestures were selected to c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 26 pages, 22 figures

    MSC Class: H.5.2

    Journal ref: CHI2024

  42. arXiv:2402.14399

    cs.IR cs.AI

    Ensure Timeliness and Accuracy: A Novel Sliding Window Data Stream Paradigm for Live Streaming Recommendation

    Authors: Fengqi Liang, Baigong Zheng, Liqin Zhao, Guorui Zhou, Qian Wang, Yanan Niu

    Abstract: A live streaming recommender system is specifically designed to recommend real-time live streams of interest to users. Due to the dynamic changes of live content, improving the timeliness of the live streaming recommender system is a critical problem. Intuitively, the timeliness of the data determines the upper bound of the timeliness that models can learn. However, none of the previous works addr…

    Submitted 22 February, 2024; originally announced February 2024.

  43. arXiv:2402.09055

    cs.CV cs.AI

    Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

    Authors: Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou

    Abstract: The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notabl…

    Submitted 14 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ICMR 2024

  44. arXiv:2402.05869

    cs.CV

    Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images

    Authors: Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, Wenping Wang

    Abstract: We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context. The difficulty of reliably capturing geometric context in existing methods impedes their ability to accurately enforce the consistency between the different geometric properties, thereby leading to a bottleneck of geometric estimation quality. We therefore propose t…

    Submitted 31 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2103.15483

  45. arXiv:2402.04580

    cs.RO cs.AI cs.LG

    A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents

    Authors: Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan

    Abstract: The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily accessible source domains, such as simulation and labora…

    Submitted 6 February, 2024; originally announced February 2024.

  46. arXiv:2402.02456

    cs.LG cs.CL

    tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs)

    Authors: Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, Guoxu Zhou

    Abstract: Tensor networks are efficient for extremely high-dimensional representation, but their model selection, known as tensor network structure search (TN-SS), is a challenging problem. Although several works have targeted TN-SS, most existing algorithms are manually crafted heuristics with poor performance, suffering from the curse of dimensionality and local convergence. In this work, we jump out of t…

    Submitted 1 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024, preprint version

  47. arXiv:2402.00632

    cs.CL

    Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases

    Authors: Giulio Zhou, Tsz Kin Lam, Alexandra Birch, Barry Haddow

    Abstract: Speech-to-Text Translation (S2TT) has typically been addressed with cascade systems, where speech recognition systems generate a transcription that is subsequently passed to a translation model. While there has been a growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively es…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted at Findings of EACL 2024

  48. arXiv:2401.09716

    cs.CV cs.AI

    HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization

    Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Tongliang Liu, Lina Yao, Kun Zhang

    Abstract: Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to a fixed structure or uniform parameterization to encapsulate invariant features can inadvertently blend specific aspects. Such an approach struggles with nuanced differentiation of inter-domain variations and m…

    Submitted 17 January, 2024; originally announced January 2024.

  49. arXiv:2401.05946

    cs.LG cs.AI

    Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments

    Authors: Antoine Dedieu, Wolfgang Lehrach, Guangyao Zhou, Dileep George, Miguel Lázaro-Gredilla

    Abstract: Despite their stellar performance on a wide range of tasks, including in-context tasks only revealed during inference, vanilla transformers and variants trained for next-token prediction (a) do not learn an explicit world model of their environment which can be flexibly queried and (b) cannot be used for planning or navigation. In this paper, we consider partially observed environments (POEs), wh…

    Submitted 11 January, 2024; originally announced January 2024.

  50. arXiv:2401.04942

    cs.CV

    Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

    Authors: Beiwen Tian, Huan-ang Gao, Leiyao Cui, Yupeng Zheng, Lan Luo, Baofeng Wang, Rong Zhi, Guyue Zhou, Hao Zhao

    Abstract: In the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale behind it is straightforward: if an autonomous car can brake before hitting an anomalous object, safety is improved. However, this rationale naturally calls for a temporally informed setting while existing methods and benchmarks are designed in an unr…

    Submitted 10 January, 2024; originally announced January 2024.