Showing 1–50 of 572 results for author: He, T

  1. arXiv:2408.08067  [pdf, other]

    cs.CL cs.AI

    RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

    Authors: Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Jiayang Cheng, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, Zheng Zhang

    Abstract: Despite Retrieval-Augmented Generation (RAG) has shown promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Under Review

  2. arXiv:2408.07249  [pdf, other]

    cs.CV cs.IR

    GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

    Authors: Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou

    Abstract: In the rapidly expanding domain of web video content, the task of text-video retrieval has become increasingly critical, bridging the semantic gap between textual queries and video data. This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video, enhancing the effectiveness of text-video retrieval sys…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 18 pages including appendix

  3. arXiv:2408.05765  [pdf, other]

    cs.LG stat.ML

    Scalable and Adaptive Spectral Embedding for Attributed Graph Clustering

    Authors: Yunhui Liu, Tieke He, Qing Wu, Tao Zheng, Jianhua Zhao

    Abstract: Attributed graph clustering, which aims to group the nodes of an attributed graph into disjoint clusters, has made promising advancements in recent years. However, most existing methods face challenges when applied to large graphs due to the expensive computational cost and high memory usage. In this paper, we introduce Scalable and Adaptive Spectral Embedding (SASE), a simple attributed graph clu…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by CIKM 2024 (Short Paper)

  4. arXiv:2408.05087  [pdf, other]

    cs.LG

    Bootstrap Latents of Nodes and Neighbors for Graph Self-Supervised Learning

    Authors: Yunhui Liu, Huaisong Zhang, Tieke He, Tao Zheng, Jianhua Zhao

    Abstract: Contrastive learning is a significant paradigm in graph self-supervised learning. However, it requires negative samples to prevent model collapse and learn discriminative representations. These negative samples inevitably lead to heavy computation, memory overhead and class collision, compromising the representation learning. Recent studies present that methods obviating negative samples can attai…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ECML PKDD 2024

  5. arXiv:2408.04705  [pdf, other]

    cs.LG cs.NI

    Overlay-based Decentralized Federated Learning in Bandwidth-limited Networks

    Authors: Yudi Huang, Tingyang Sun, Ting He

    Abstract: The emerging machine learning paradigm of decentralized federated learning (DFL) has the promise of greatly boosting the deployment of artificial intelligence (AI) by directly learning across distributed agents without centralized coordination. Despite significant efforts on improving the communication efficiency of DFL, most existing solutions were based on the simplistic assumption that neighbor…

    Submitted 8 August, 2024; originally announced August 2024.

  6. arXiv:2408.04562  [pdf, ps, other]

    math.OC

    Discrete nonlinear functions: formulations and applications in retail revenue management

    Authors: Taotao He, Mohit Tawarmalani

    Abstract: This paper examines nonlinear optimization problems that incorporate discrete decisions. We introduce new improved formulation techniques that take advantage of the simplotope structure present in the domain of the binarization variables. Our technique identifies new polynomially solvable instances for price promotion problem initially studied by Cohen et al. (2021) and allows us to develop a line…

    Submitted 8 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  7. arXiv:2408.03765  [pdf, other]

    cs.LG

    Reliable Node Similarity Matrix Guided Contrastive Graph Clustering

    Authors: Yunhui Liu, Xinyi Gao, Tieke He, Tao Zheng, Jianhua Zhao, Hongzhi Yin

    Abstract: Graph clustering, which involves the partitioning of nodes within a graph into disjoint clusters, holds significant importance for numerous subsequent applications. Recently, contrastive learning, known for utilizing supervisory information, has demonstrated encouraging results in deep graph clustering. This methodology facilitates the learning of favorable node representations for clustering by a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

  8. arXiv:2408.02849  [pdf, other]

    cs.LG cs.NI

    Active Learning for WBAN-based Health Monitoring

    Authors: Cho-Chun Chiu, Tuan Nguyen, Ting He, Shiqiang Wang, Beom-Su Kim, Ki-Il Kim

    Abstract: We consider a novel active learning problem motivated by the need of learning machine learning models for health monitoring in wireless body area network (WBAN). Due to the limited resources at body sensors, collecting each unlabeled sample in WBAN incurs a nontrivial cost. Moreover, training health monitoring models typically requires labels indicating the patient's health state that need to be g…

    Submitted 5 August, 2024; originally announced August 2024.

  9. arXiv:2408.01485  [pdf, ps, other]

    hep-th gr-qc hep-ph

    An Infrared On-Shell Action and its Implications for Soft Charge Fluctuations in Asymptotically Flat Spacetimes

    Authors: Temple He, Ana-Maria Raclariu, Kathryn M. Zurek

    Abstract: We study the infrared on-shell action of Einstein gravity in asymptotically flat spacetimes, obtaining an effective, gauge-invariant boundary action for memory and shockwave spacetimes. We show that the phase space is in both cases parameterized by the leading soft variables in asymptotically flat spacetimes, thereby extending the equivalence between shockwave and soft commutators to spacetimes wi…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 38 pages

    Report number: CALT-TH 2024-028

  10. arXiv:2408.00779  [pdf, other]

    cs.LG cs.AI cs.ET cs.IT q-bio.BM

    Learning Structurally Stabilized Representations for Multi-modal Lossless DNA Storage

    Authors: Ben Cao, Tiantian He, Xue Li, Bin Wang, Xiaohu Wu, Qiang Zhang, Yew-Soon Ong

    Abstract: In this paper, we present Reed-Solomon coded single-stranded representation learning (RSRL), a novel end-to-end model for learning representations for multi-modal lossless DNA storage. In contrast to existing learning-based methods, the proposed RSRL is inspired by both error-correction codec and structural biology. Specifically, RSRL first learns the representations for the subsequent storage fro…

    Submitted 17 July, 2024; originally announced August 2024.

  11. arXiv:2407.19422  [pdf, other]

    cs.AI

    A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy

    Authors: Meng Jiang, Qing Zhao, Jianqiang Li, Fan Wang, Tianyu He, Xinyan Cheng, Bing Xiang Yang, Grace W. K. Ho, Guanghui Fu

    Abstract: Cognitive Behavioral Therapy (CBT) is a well-established intervention for mitigating psychological issues by modifying maladaptive cognitive and behavioral patterns. However, delivery of CBT is often constrained by resource limitations and barriers to access. Advancements in artificial intelligence (AI) have provided technical support for the digital transformation of CBT. Particularly, the emerge…

    Submitted 28 July, 2024; originally announced July 2024.

  12. arXiv:2407.17827  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unified Lexical Representation for Interpretable Visual-Language Alignment

    Authors: Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He

    Abstract: Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work. Although CLIP performs well, the typical direct latent feature alignment lacks clarity in its representation and similarity scores. On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse represent…

    Submitted 25 July, 2024; originally announced July 2024.

  13. arXiv:2407.16958  [pdf, other]

    cs.LG cs.AI

    Cheems: Wonderful Matrices More Efficient and More Effective Architecture

    Authors: Jingze Shi, Lu He, Yuhan Wang, Tianyu He, Bingheng Wu, Mingkun Hou

    Abstract: Recent studies have shown that, relative position encoding performs well in selective state space model scanning algorithms, and the architecture that balances SSM and Attention enhances the efficiency and effectiveness of the algorithm, while the sparse activation of the mixture of experts reduces the training cost. I studied the effectiveness of using different position encodings in structured s…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  14. arXiv:2407.15282  [pdf, other]

    cs.CV

    Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation

    Authors: Xiaoyang Wu, Xiang Xu, Lingdong Kong, Liang Pan, Ziwei Liu, Tong He, Wanli Ouyang, Hengshuang Zhao

    Abstract: In this technical report, we detail our first-place solution for the 2024 Waymo Open Dataset Challenge's semantic segmentation track. We significantly enhanced the performance of Point Transformer V3 on the Waymo benchmark by implementing cutting-edge, plug-and-play training and inference technologies. Notably, our advanced version, Point Transformer V3 Extreme, leverages multi-frame training and…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation

  15. arXiv:2407.14908  [pdf, other]

    cs.HC cs.GR

    PREVis: Perceived Readability Evaluation for Visualizations

    Authors: Anne-Flore Cabouat, Tingying He, Petra Isenberg, Tobias Isenberg

    Abstract: We developed and validated an instrument to measure the perceived readability in data visualization: PREVis. Researchers and practitioners can easily use this instrument as part of their evaluations to compare the perceived readability of different visual data representations. Our instrument can complement results from controlled experiments on user task performance or provide additional data duri…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 11 pages, 35 pages appendix, 5 figures in main paper, additional 94 figures in appendix, paper to appear in IEEE Transactions on Visualization and Computer Graphics

  16. arXiv:2407.14488  [pdf, ps, other]

    math.AG math.NT

    Perfectoidness via Sen Theory and Applications to Shimura Varieties

    Authors: Tongmu He

    Abstract: Sen's theorem on the ramification of a $p$-adic analytic Galois extension of $p$-adic local fields shows that its perfectoidness is equivalent to the non-vanishing of its arithmetic Sen operator. By developing $p$-adic Hodge theory for general valuation rings, we establish a geometric analogue of Sen's criterion for any $p$-adic analytic Galois extension of $p$-adic varieties: its (Riemann-Zariski…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 57 pages

    MSC Class: 14G45 (primary); 11F80; 14G35

  17. arXiv:2407.09751  [pdf, other]

    cs.CV

    TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

    Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative tempo…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  18. arXiv:2407.09072  [pdf, other]

    cs.CL

    New Desiderata for Direct Preference Optimization

    Authors: Xiangkun Hu, Tong He, David Wipf

    Abstract: Large language models in the past have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences. However, because of oft-observed instabilities when implementing these RLHF pipelines, various reparameterization techniques have recently been introduced to sidestep the need for separately learning an RL reward model. In…

    Submitted 12 July, 2024; originally announced July 2024.

  19. arXiv:2407.08418  [pdf, other]

    cs.LG cs.CV

    PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

    Authors: ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and…

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  20. arXiv:2407.07356  [pdf, other]

    cs.CV

    Video In-context Learning

    Authors: Wentao Zhang, Junliang Guo, Tianyu He, Li Zhao, Linli Xu, Jiang Bian

    Abstract: In-context learning for vision data has been underexplored compared with that in natural language. Previous works studied image in-context learning, urging models to generate a single image guided by demonstrations. In this paper, we propose and study video in-context learning, where the model starts from an existing video clip and generates diverse potential future sequences, each semantically gu…

    Submitted 10 July, 2024; originally announced July 2024.

  21. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  22. arXiv:2407.00024  [pdf, other]

    cs.CV cs.AI cs.MM

    LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

    Authors: Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari

    Abstract: Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection con…

    Submitted 8 May, 2024; originally announced July 2024.

  23. arXiv:2406.19823  [pdf, ps, other]

    math.CO

    Separable integer partition classes and partitions with congruence conditions

    Authors: Thomas Y. He, C. S. Huang, H. X. Li, X. Zhang

    Abstract: In this article, we first investigate the partitions whose parts are congruent to $a$ or $b$ modulo $k$ with the aid of separable integer partition classes with modulus $k$ introduced by Andrews. Then, we introduce the $(k,r)$-overpartitions in which only parts equivalent to $r$ modulo $k$ may be overlined and we will show that the number of $(k,k)$-overpartitions of $n$ equals the number of parti…

    Submitted 28 June, 2024; originally announced June 2024.

  24. arXiv:2406.17555  [pdf, ps, other]

    physics.plasm-ph

    A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

    Authors: Ji Yan, Jiwei Li, X. T. He, Lifeng Wang, Yaohua Chen, Feng Wang, Xiaoying Han, Kaiqiang Pan, Juxi Liang, Yulong Li, Zanyang Guan, Xiangming Liu, Xingsen Che, Zhongjing Chen, Xing Zhang, Yan Xu, Bin Li, Minging He, Hongbo Cai, Liang. Hao, Zhanjun Liu, Chunyang Zheng, Zhensheng Dai, Zhengfeng Fan, Bin Qiao, et al. (4 additional authors not shown)

    Abstract: A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

    Submitted 25 June, 2024; originally announced June 2024.

  25. arXiv:2406.15992  [pdf, other]

    cs.CL

    Can LLM Graph Reasoning Generalize beyond Pattern Memorization?

    Authors: Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov

    Abstract: Large language models (LLMs) demonstrate great potential for problems with implicit graphical structures, while recent works seek to enhance the graph reasoning capabilities of LLMs through specialized instruction tuning. The resulting 'graph LLMs' are evaluated with in-distribution settings only, thus it remains underexplored whether LLMs are learning generalizable graph reasoning skills or merel…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures, Code and data will be publicly available at https://github.com/MatthewYZhang/NLGift

    ACM Class: I.2.7

  26. arXiv:2406.13392  [pdf, other]

    cs.CV

    Strengthening Layer Interaction via Dynamic Layer Attention

    Authors: Kaishen Wang, Xun Xia, Jian Liu, Zhang Yi, Tao He

    Abstract: In recent years, employing layer attention to enhance interaction among hierarchical layers has proven to be a significant advancement in building network structures. In this paper, we delve into the distinction between layer attention and the general attention mechanism, noting that existing layer attention methods achieve layer interaction on fixed feature maps in a static manner. These static l…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI2024

  27. arXiv:2406.11253  [pdf, other]

    cs.CV

    Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

    Authors: Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu

    Abstract: In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the scope of available 3D motion data in terms of both the size and the diversity. To address these limitations, we exploit extensive availability of 2D motion data…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 11figures, 17 tables

  28. arXiv:2406.10163  [pdf, other]

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  29. arXiv:2406.10111  [pdf, other]

    cs.CV

    GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

    Authors: Xiqian Yu, Hanxin Zhu, Tianyu He, Zhibo Chen

    Abstract: Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-qual…

    Submitted 14 June, 2024; originally announced June 2024.

  30. arXiv:2406.09196  [pdf, other]

    cs.CV cs.LG

    Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

    Abstract: Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  31. arXiv:2406.08858  [pdf, other]

    cs.RO cs.CV cs.LG eess.SY

    OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

    Authors: Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

    Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autono…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://omni.human2humanoid.com/

  32. arXiv:2406.08096  [pdf, other]

    cs.CV

    Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

    Authors: Runyi Yu, Tianyu He, Ailing Zhang, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian

    Abstract: We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appearance synthesis. Current solutions handle the two sub-problems within a single generative model, resulting in a challenging trade-off between lip-sync…

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages of main text, 23 pages in total, 9 figures

  33. arXiv:2406.07120  [pdf, other]

    math.CO

    Total Positivity of Quasi-Riordan Arrays

    Authors: Tian-Xiao He, Roksana Słowik

    Abstract: In this paper the total positivity of quasi-Riordan arrays is investigated with use of the sequence characterization of quasi-Riordan arrays. Due to the correlation between quasi-Riordan arrays and Riordan arrays, this study is an in-depth discussion of the total positivity of Riordan arrays.

    Submitted 11 June, 2024; originally announced June 2024.

    MSC Class: 05A15; 05A05; 15B36; 15A06; 05A19; 11B83

  34. arXiv:2406.06559  [pdf, other]

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users…

    Submitted 2 June, 2024; originally announced June 2024.

  35. arXiv:2406.06005  [pdf, other]

    cs.RO cs.GR eess.SY

    WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts

    Authors: Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi

    Abstract: Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still r…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Website and Videos: https://lecar-lab.github.io/wococo/

  36. arXiv:2406.05374  [pdf, other]

    cs.CL

    Planning Like Human: A Dual-process Framework for Dialogue Planning

    Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Ming Liu, Zerui Chen, Bing Qin

    Abstract: In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to their reactive nature. Traditional approaches to enhance dialogue planning in LLMs, ranging from elaborate prompt engineering to the integration of policy networks, either face efficiency issues or d…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures, ACL 2024 main conference

  37. arXiv:2406.04038  [pdf, other]

    cs.LG

    Road Network Representation Learning with the Third Law of Geography

    Authors: Haicang Zhou, Weiming Huang, Yile Chen, Tiantian He, Gao Cong, Yew-Soon Ong

    Abstract: Road network representation learning aims to learn compressed and effective vectorized representations for road segments that are applicable to numerous tasks. In this paper, we identify the limitations of existing methods, particularly their overemphasis on the distance effect as outlined in the First Law of Geography. In response, we propose to endow road network representation with the principl…

    Submitted 6 June, 2024; originally announced June 2024.

  38. arXiv:2406.03774  [pdf, ps, other]

    math.CO

    Total Positivity of Almost-Riordan Arrays

    Authors: Tian-Xiao He, Roksana Słowik

    Abstract: In this paper we study the total positivity of almost-Riordan arrays $(d(t)|\, g(t), f(t))$ and establish its necessary conditions and sufficient conditions, particularly, for some well used formal power series $d(t)$. We present a semidirect product of an almost-array and use it to transfer a total positivity problem for an almost-Riordan array to the total positivity problem for a quasi-Riordan…

    Submitted 6 June, 2024; originally announced June 2024.

    MSC Class: 05A15; 05A05; 15B36; 15A06; 05A19; 11B83

  39. arXiv:2406.03495  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th math.NT stat.ML

    Grokking Modular Polynomials

    Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

    Abstract: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 7+4 pages, 3 figures, 2 tables

  40. arXiv:2406.03051  [pdf, other]

    cs.CV

    Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

    Authors: Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang

    Abstract: Parameter-efficient fine-tuning (PEFT) has become increasingly important as foundation models continue to grow in both popularity and size. Adapter has been particularly well-received due to their potential for parameter reduction and adaptability across diverse tasks. However, striking a balance between high efficiency and robust generalization across tasks remains a challenge for adapter-based m…

    Submitted 5 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  41. arXiv:2406.02550  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

    Authors: Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

    Abstract: Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 21 pages, 19 figures

  42. arXiv:2406.01597  [pdf, other]

    cs.CV cs.GR

    End-to-End Rate-Distortion Optimized 3D Gaussian Representation

    Authors: Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen

    Abstract: 3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible…

    Submitted 9 April, 2024; originally announced June 2024.

  43. arXiv:2405.20771  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Towards Black-Box Membership Inference Attack for Diffusion Models

    Authors: Jingwei Li, Jing Dong, Tianxing He, Jingzhao Zhang

    Abstract: Identifying whether an artwork was used to train a diffusion model is an important research topic, given the rising popularity of AI-generated art and the associated copyright concerns. The work approaches this problem from the membership inference attack (MIA) perspective. We first identify the limitations of applying existing MIA methods for copyright protection: the required access of internal…

    Submitted 25 May, 2024; originally announced May 2024.

  44. arXiv:2405.20606  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning

    Authors: Yang Chen, Tian He, Junfeng Fu, Ling Wang, Jingcai Guo, Hong Cheng

    Abstract: Supervised and self-supervised learning are two main training paradigms for skeleton-based human action recognition. However, the former one-hot classification requires labor-intensive predefined action categories annotations, while the latter involves skeleton transformations (e.g., cropping) in the pretext tasks that may impair the skeleton structure. To address these challenges, we introduce a…

    Submitted 30 May, 2024; originally announced May 2024.

  45. arXiv:2405.19918  [pdf, ps, other]

    math.CO

    A bijection related to Bressoud's conjecture

    Authors: Y. H. Chen, Thomas Y. He

    Abstract: Bressoud introduced the partition function $B(α_1,\ldots,α_λ;η,k,r;n)$, which counts the number of partitions with certain difference conditions. Bressoud posed a conjecture on the generating function for the partition function $B(α_1,\ldots,α_λ;η,k,r;n)$ in multi-summation form. In this article, we introduce a bijection related to Bressoud's conjecture. As an application, we give a new companion…

    Submitted 30 May, 2024; originally announced May 2024.

  46. arXiv:2405.18104  [pdf, other]

    math.MG

    The Legendre Transform of Convex Lattice Sets

    Authors: Tingting He, Lin Si

    Abstract: The goal of this paper is to study convex lattice sets by the discrete Legendre transform. The definition of the polar of convex lattice sets in $\mathbb{Z}^n$ is provided. It is worth mentioning that the polar of convex lattice sets have the self-dual property similar to that of convex bodies. Some properties of convex lattice sets are established, for instance, the inclusion relation, the union…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 21 pages, 5 figures

    MSC Class: Primary 52C07; Secondary 11H06; 52B20

  47. arXiv:2405.17461  [pdf, other]

    cs.LG cs.CV

    EMR-Merging: Tuning-Free High-Performance Model Merging

    Authors: Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang

    Abstract: The success of pretrain-finetune paradigm brings about the release of numerous model weights. In this case, merging models finetuned on different tasks to enable a single model with multi-task capabilities is gaining increasing attention for its practicability. Existing model merging methods usually suffer from (1) significant performance degradation or (2) requiring tuning by additional data or t…

    Submitted 23 May, 2024; originally announced May 2024.

  48. arXiv:2405.17220  [pdf, other]

    cs.CL

    RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

    Authors: Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

    Abstract: Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor-intensive and time-consuming manual labeling, recent approaches employing models as automatic labelers have shown promising results without human intervention. However, these methods heavily rely on costly proprietary models l…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Website: https://github.com/RLHF-V/RLAIF-V

  49. arXiv:2405.17212  [pdf, ps, other]

    gr-qc astro-ph.CO

    A new parametrization of Hubble function and Hubble tension

    Authors: Tong-Yu He, Jia-Jun Yin, Zhen-Yu Wang, Zhan-Wen Han, Rong-Jia Yang

    Abstract: We present a new Hubble parameterization method and employ observational data from Hubble, Pantheon, and Baryon Acoustic Oscillations to constrain model parameters. The proposed method is thoroughly validated against these datasets, demonstrating a robust fit to the observational data. The obtained best-fit values are $H_0 = 67.5^{+1.3}_{-1.6}$ $\text{km s}^{-1} \text{Mpc}^{-1}$,…

    Submitted 16 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  50. arXiv:2405.15758  [pdf, other]

    cs.CV cs.AI

    InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

    Authors: Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

    Abstract: Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering f…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://wangyuchi369.github.io/InstructAvatar/