Showing 1–50 of 503 results for author: Tang, Z

Searching in archive cs.
  1. arXiv:2408.12598  [pdf, other]

    cs.CV cs.AI

    ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

    Authors: Ziyu Tang, Weicai Ye, Yifan Wang, Di Huang, Hujun Bao, Tong He, Guofeng Zhang

    Abstract: Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover meticulous geometry and preserve smoothness across regions with differing characteristics. To address this issue, previous methods typically employ geometric priors, which are often constrained by the performance of the prior m…

    Submitted 22 August, 2024; originally announced August 2024.

  2. arXiv:2408.09951  [pdf]

    cs.AI eess.SP

    Principle Driven Parameterized Fiber Model based on GPT-PINN Neural Network

    Authors: Yubin Zang, Boyu Hua, Zhenzhou Tang, Zhipeng Lin, Fangzheng Zhang, Simin Li, Zuxing Zhang, Hongwei Chen

    Abstract: To cater to the needs of Beyond 5G communications, large numbers of data driven artificial intelligence based fiber models have been put forward to utilize artificial intelligence's regression ability to predict pulse evolution in fiber transmission at a much faster speed compared with the traditional split step Fourier method. In order to increase the physical interpretability, principle driven fibe…

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.05533  [pdf, other]

    cs.CV

    Radiance Field Learners As UAV First-Person Viewers

    Authors: Liqi Yan, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu

    Abstract: First-Person-View (FPV) holds immense potential for revolutionizing the trajectory of Unmanned Aerial Vehicles (UAVs), offering an exhilarating avenue for navigating complex building structures. Yet, traditional Neural Radiance Field (NeRF) methods face challenges such as sampling single points per iteration and requiring an extensive array of views for supervision. UAV videos exacerbate these iss…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

    Journal ref: European Conference on Computer Vision (ECCV 2024)

  4. arXiv:2408.04600  [pdf, other]

    cs.CV

    Improving Network Interpretability via Explanation Consistency Evaluation

    Authors: Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

    Abstract: While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Transactions on Multimedia

  5. arXiv:2408.03753  [pdf, other]

    cs.CV

    3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting

    Authors: Zhe Jun Tang, Tat-Jen Cham

    Abstract: The use of 3D Gaussians as representation of radiance fields has enabled high quality novel view synthesis at real-time rendering speed. However, the choice of optimising the outgoing radiance of each Gaussian independently as spherical harmonics results in unsatisfactory view dependent effects. In response to these limitations, our work, Factorised Tensorial Illumination for 3D Gaussian Splatting…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: The 18th European Conference on Computer Vision ECCV 2024

  6. arXiv:2408.01803  [pdf, other]

    cs.LG cs.CL

    STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

    Authors: Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo, Xiaowen Chu

    Abstract: In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantifies weights to a mere 1-bit, achieves a milestone in increas…

    Submitted 3 August, 2024; originally announced August 2024.

  7. arXiv:2407.21633  [pdf, other]

    cs.CL

    Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation

    Authors: Xiang Luo, Zhiwen Tang, Jin Wang, Xuejie Zhang

    Abstract: Zero-shot dialogue state tracking (DST) seeks to enable dialogue systems to transition to unfamiliar domains without manual annotation or extensive retraining. Prior research has approached this objective by embedding prompts into language models (LMs). Common methodologies include integrating prompts at the input layer or introducing learnable variables at each transformer layer. Nonetheless, eac…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024

  8. arXiv:2407.19828  [pdf]

    cs.LG cs.CR

    Federated Learning based Latent Factorization of Tensors for Privacy-Preserving QoS Prediction

    Authors: Shuai Zhong, Zengtong Tang, Di Wu

    Abstract: In applications related to big data and service computing, dynamic connections tend to be encountered, especially the dynamic data of user-perspective quality of service (QoS) in Web services. They are transformed into high-dimensional and incomplete (HDI) tensors which include abundant temporal pattern information. Latent factorization of tensors (LFT) is an extremely efficient and typical approa…

    Submitted 29 July, 2024; originally announced July 2024.

  9. arXiv:2407.19548  [pdf, other]

    cs.CV

    Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

    Authors: Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan

    Abstract: Recent 3D large reconstruction models typically employ a two-stage process, including first generate multi-view images by a multi-view diffusion model, and then utilize a feed-forward model to reconstruct images to 3D content. However, multi-view diffusion models often produce low-quality and inconsistent images, adversely affecting the quality of the final 3D reconstruction. To address this issue,…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Project page: https://pku-yuangroup.github.io/Cycle3D/

  10. arXiv:2407.17211  [pdf, other]

    cs.AI cs.NI cs.RO

    Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles

    Authors: Zuoyin Tang, Jianhua He, Dashuai Pei, Kezhong Liu, Tao Gao

    Abstract: Handling long tail corner cases is a major challenge faced by autonomous vehicles (AVs). While large language models (LLMs) hold great potentials to handle the corner cases with excellent generalization and explanation capabilities and received increasing research interest on application to autonomous driving, there are still technical barriers to be tackled, such as strict model performance and h…

    Submitted 24 July, 2024; originally announced July 2024.

  11. arXiv:2407.16521  [pdf, other]

    cs.CL

    AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game

    Authors: Yizhou Chi, Lingjun Mao, Zineng Tang

    Abstract: Strategic social deduction games serve as valuable testbeds for evaluating the understanding and inference skills of language models, offering crucial insights into social science, artificial intelligence, and strategic gaming. This paper focuses on creating proxies of human behavior in simulated environments, with Among Us utilized as a tool for studying simulated human behavior. The study introd…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Wordplay @ ACL 2024

  12. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  13. arXiv:2407.15186  [pdf, other]

    cs.CL

    A Survey on Employing Large Language Models for Text-to-SQL Tasks

    Authors: Liang Shi, Zhengju Tang, Nan Zhang, Xiaotong Zhang, Zhi Yang

    Abstract: The increasing volume of data stored in relational databases has led to the need for efficient querying and utilization of this data in various sectors. However, writing SQL queries requires specialized knowledge, which poses a challenge for non-professional users trying to access and query databases. Text-to-SQL parsing solves this issue by converting natural language queries into SQL queries, th…

    Submitted 11 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  14. arXiv:2407.15026  [pdf, other]

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  15. arXiv:2407.13303  [pdf, other]

    cs.LG

    Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Zhe Tang, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: Wi-Fi fingerprinting is widely applied for indoor localization due to the widespread availability of Wi-Fi devices. However, traditional methods are not ideal for multi-building and multi-floor environments due to the scalability issues. Therefore, more and more researchers have employed deep learning techniques to enable scalable indoor localization. This paper introduces a novel semi-supervised…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, under preparation for a journal publication

  16. arXiv:2407.13288  [pdf, other]

    cs.LG

    Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Kyeong Soo Kim, Zhe Tang, Jeremy S. Smith

    Abstract: In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localizati…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures, under review for journal publication

  17. arXiv:2407.11553  [pdf, other]

    eess.SP cs.AI

    Learning Global and Local Features of Power Load Series Through Transformer and 2D-CNN: An Image-based Multi-step Forecasting Approach Incorporating Phase Space Reconstruction

    Authors: Zihan Tang, Tianyao Ji, Wenhu Tang

    Abstract: As modern power systems continue to evolve, accurate power load forecasting remains a critical issue in energy management. The phase space reconstruction method can effectively retain the inner chaotic property of power load from a system dynamics perspective and thus is a promising knowledge-based preprocessing method for short-term forecasting. In order to fully utilize the capability of PSR met…

    Submitted 28 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  18. arXiv:2407.10408  [pdf, other]

    cs.IT eess.SP

    Latency Minimization for IRS-enhanced Wideband MEC Networks with Practical Reflection Model

    Authors: N. Li, W. Hao, X. Li, Z. Zhu, Z. Tang, S. Yang

    Abstract: Intelligent reflecting surface (IRS) has been considered as an efficient way to boost the computation capability of mobile edge computing (MEC) system, especially when the communication links are blocked or the communication signal is weak. However, most existing works are restricted to narrow-band channel and ideal IRS reflection model, which is not practical and may lead to significant performanc…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  19. arXiv:2407.09553  [pdf, other]

    cs.CV cs.AI

    ResVMUNetX: A Low-Light Enhancement Network Based on VMamba

    Authors: Shuang Wang, Qingchuan Tao, Zhenming Tang

    Abstract: This study presents ResVMUNetX, a novel image enhancement network for low-light conditions, addressing the limitations of existing deep learning methods in capturing long-range image information. Leveraging error regression and an efficient VMamba architecture, ResVMUNetX enhances brightness, recovers structural details, and removes noise through a two-step process involving direct pixel addition…

    Submitted 21 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  20. arXiv:2407.05381  [pdf, ps, other]

    cs.DS

    Online Matching: A Brief Survey

    Authors: Zhiyi Huang, Zhihao Gavin Tang, David Wajc

    Abstract: Matching, capturing allocation of items to unit-demand buyers, or tasks to workers, or pairs of collaborators, is a central problem in economics. Indeed, the growing prevalence of matching-based markets, many of which online in nature, has motivated much research in economics, operations research, computer science, and their intersection. This brief survey is meant as an introduction to the area o…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Also in SIGECOM Exchanges

  21. arXiv:2407.05023  [pdf, other]

    cs.CV

    SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

    Authors: Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

    Abstract: Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-tim…

    Submitted 6 July, 2024; originally announced July 2024.

  22. arXiv:2407.01909  [pdf, other]

    cs.CL cs.SD eess.AS

    Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models

    Authors: Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

    Abstract: Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of the research focuses on the English language. This paper redirects the attention to Chinese. Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chin…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  23. arXiv:2407.01892  [pdf, other]

    cs.AI cs.CL

    GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning

    Authors: Zhisheng Tang, Mayank Kejriwal

    Abstract: Spatial reasoning, an important faculty of human cognition with many practical applications, is one of the core commonsense skills that is not purely language-based and, for satisfying (as opposed to optimal) solutions, requires some minimum degree of planning. Existing benchmarks of Commonsense Spatial Reasoning (CSR) tend to evaluate how Large Language Models (LLMs) interpret text-based spatial…

    Submitted 1 July, 2024; originally announced July 2024.

  24. arXiv:2407.01178  [pdf, other]

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled…

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  25. arXiv:2407.00668  [pdf, other]

    cs.CL

    HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

    Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

    Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-so…

    Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  26. arXiv:2407.00529  [pdf, other]

    cs.LG cs.SD eess.AS math.ST stat.ML

    Detecting and Identifying Selection Structure in Sequential Data

    Authors: Yujia Zheng, Zeyu Tang, Yiwen Qiu, Bernhard Schölkopf, Kun Zhang

    Abstract: We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportun…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: ICML 2024

  27. arXiv:2406.17163  [pdf, other]

    cs.CL cs.AI cs.LG

    Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

    Authors: Vikas Yadav, Zheng Tang, Vijay Srinivasan

    Abstract: Large language models (LLM) have achieved remarkable success in natural language generation but lesser focus has been given to their applicability in decision making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and worse, generate out-of-vocabulary class labels. To address thes…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at SIGIR 2024

  28. arXiv:2406.15192  [pdf, ps, other]

    cs.DS cs.GT

    Setting Targets is All You Need: Improved Order Competitive Ratio for Online Selection

    Authors: Liyan Chen, Nuozhou Sun, Zhihao Gavin Tang

    Abstract: There is a rising interest for studying the online benchmark as an alternative of the classical offline benchmark in online stochastic settings. Ezra, Feldman, Gravin, and Tang (SODA 2023) introduced the notion of order-competitive ratio, defined as the worst-case ratio between the performance of the best order-unaware algorithm and the best order-aware algorithm, to quantify the loss incurred by…

    Submitted 21 June, 2024; originally announced June 2024.

  29. arXiv:2406.13404  [pdf, other]

    cs.DC

    Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing

    Authors: Mengjie Liu, Yihua Li, Fangyi Mou, Zhiqing Tang, Jiong Lou, Jianxiong Guo, Weijia Jia

    Abstract: Meta computing is a new computing paradigm that aims to efficiently utilize all network computing resources to provide fault-tolerant, personalized services with strong security and privacy guarantees. It also seeks to virtualize the Internet as many meta computers. In meta computing, tasks can be assigned to containers at edge nodes for processing, based on container images with multiple layers.…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: to be published in IEEE ICMC 2024

  30. arXiv:2406.13399  [pdf, other]

    cs.AI

    VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

    Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

    Abstract: The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Services (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substanti…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: to be published in IEEE ICWS 2024

  31. arXiv:2406.13124  [pdf, other]

    cs.CL

    Learning to Generate Answers with Citations via Factual Consistency Models

    Authors: Rami Aly, Zhiqiang Tang, Samson Tan, George Karypis

    Abstract: Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verifiability of generations. However, citing passages accurately in answers remains a substantial challenge. This paper proposes a weakly-supervised fine-tuning meth…

    Submitted 15 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024. Code is available at https://github.com/amazon-science/learning-to-generate-answers-with-citations

  32. arXiv:2406.12799  [pdf, ps, other]

    cs.DS

    Sample-Based Matroid Prophet Inequalities

    Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

    Abstract: We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way)…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at EC'24

  33. arXiv:2406.12754  [pdf, other]

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua…

    Submitted 18 June, 2024; originally announced June 2024.

  34. arXiv:2406.12216  [pdf, other]

    cs.CL cs.AI

    Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions

    Authors: Yongyi Ji, Zhisheng Tang, Mayank Kejriwal

    Abstract: Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions. This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information. Utilizing the HEXACO personality framework, our st…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to the ICML 2024 Workshop on Large Language Models and Cognition

  35. arXiv:2406.11528  [pdf, other]

    econ.TH cs.GT

    Optimal Robust Contract Design

    Authors: Bo Peng, Zhihao Gavin Tang

    Abstract: We consider the robust contract design problem when the principal only has limited information about the actions the agent can take. The principal evaluates a contract according to its worst-case performance caused by the uncertain action space. Carroll (AER 2015) showed that a linear contract is optimal among deterministic contracts. Recently, Kambhampati (JET 2023) showed that the principal's pa…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Full version of EC 2024 paper

  36. arXiv:2406.10543  [pdf, other]

    cs.CV cs.AI

    NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

    Authors: Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing

    Abstract: We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages of main paper, CVPR 2024. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024

  37. arXiv:2406.06736  [pdf, other]

    cs.LG cs.AI cs.CY

    Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

    Authors: Usman Gohar, Zeyu Tang, Jialu Wang, Kun Zhang, Peter L. Spirtes, Yang Liu, Lu Cheng

    Abstract: The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairne…

    Submitted 10 June, 2024; originally announced June 2024.

  38. arXiv:2406.04325  [pdf, other]

    cs.CV

    ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

    Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang

    Abstract: We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through carefully designed data filtering and annotating st…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://sharegpt4video.github.io/

  39. arXiv:2406.02924  [pdf, other]

    cs.LG cs.CL cs.NE

    Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

    Authors: Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

    Abstract: Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. How…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML2024, 29 pages, 4 figures

  40. arXiv:2405.18881  [pdf, other]

    cs.LG cs.AI

    Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

    Authors: Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang

    Abstract: In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment appr…

    Submitted 3 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  41. arXiv:2405.17743  [pdf, other]

    cs.CL cs.AI cs.CE cs.LG

    ORLM: Training Large Language Models for Optimization Modeling

    Authors: Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang

    Abstract: Large Language Models (LLMs) have emerged as powerful tools for tackling complex Operations Research (OR) problem by providing the capacity in automating optimization modeling. However, current methodologies heavily rely on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Work in progress

  42. arXiv:2405.15185  [pdf, other]

    cs.CL cs.AI cs.HC

    An Evaluation of Estimative Uncertainty in Large Language Models

    Authors: Zhisheng Tang, Ke Shen, Mayank Kejriwal

    Abstract: Words of estimative probability (WEPs), such as "maybe" or "probably not" are ubiquitous in natural language for communicating estimative uncertainty, compared with direct statements involving numerical probability. Human estimative uncertainty, and its calibration with numerical estimates, has long been an area of study -- including by intelligence agencies like the CIA. This study compares e…

    Submitted 23 May, 2024; originally announced May 2024.

  43. arXiv:2405.13028  [pdf, other]

    cs.CL cs.AI

    DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues

    Authors: Xiang Luo, Zhiwen Tang, Jin Wang, Xuejie Zhang

    Abstract: User Simulators play a pivotal role in training and evaluating task-oriented dialogue systems. Traditional user simulators typically rely on human-engineered agendas, resulting in generated responses that often lack diversity and spontaneity. Although large language models (LLMs) exhibit a remarkable capacity for generating coherent and contextually appropriate utterances, they may fall short when…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by COLING 2024

  44. arXiv:2405.12806  [pdf, other]

    cs.CV

    MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

    Authors: Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu

    Abstract: Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcom…

    Submitted 21 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:1710.03746 by other authors

  45. arXiv:2405.12477  [pdf, other]

    cs.CV

    Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

    Authors: Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Jing Li, Zhanyun Tang, Shengyu Zhang, Fei Wu, Feng Lin

    Abstract: Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach i…

    Submitted 21 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  46. arXiv:2405.10235  [pdf]

    cs.DB physics.data-an

    Novel Data Models for Inter-operable LCA Frameworks

    Authors: Kourosh Malek, Max Dreger, Zirui Tang, Qingshi Tu

    Abstract: Life cycle assessment (LCA) plays a critical role in assessing the environmental impacts of a product, technology, or service throughout its entire life cycle. Nonetheless, many existing LCA tools and methods lack adequate metadata management, which can hinder their further development and wide adoption. In the example of LCA for clean energy technologies, metadata helps monitor data and the envir…

    Submitted 16 May, 2024; originally announced May 2024.

  47. arXiv:2405.07908  [pdf, other]

    cs.RO

    Collaborative Planar Pushing of Polytopic Objects with Multiple Robots in Complex Scenes

    Authors: Zili Tang, Yuming Feng, Meng Guo

    Abstract: Pushing is a simple yet effective skill for robots to interact with and further change the environment. Related work has been mostly focused on utilizing it as a non-prehensile manipulation primitive for a robotic manipulator. However, it can also be beneficial for low-cost mobile robots that are not equipped with a manipulator. This work tackles the general problem of controlling a team of mobile…

    Submitted 1 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Robotics: Science and Systems (RSS) 2024. Videos are available at https://zilitang.github.io/Collaborative-Pushing

  48. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  49. arXiv:2405.05584  [pdf, other]

    cs.CV cs.AI

    A Survey on Backbones for Deep Video Action Recognition

    Authors: Zixuan Tang, Youjun Zhao, Yuhang Wen, Mengyuan Liu

    Abstract: Action recognition is a key technology in building interactive metaverses. With the rapid development of deep learning, methods in action recognition have also achieved great advancement. Researchers design and implement the backbones referring to multiple standpoints, which leads to the diversity of methods and encountering new challenges. This paper reviews several action recognition methods bas…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by ICME workshop

  50. arXiv:2405.05004  [pdf, other]

    cs.CV

    TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking

    Authors: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler

    Abstract: There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB only tracking, without adapting it for the intri…

    Submitted 8 May, 2024; originally announced May 2024.