
Showing 1–50 of 397 results for author: Shao, Z

  1. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version.

  2. arXiv:2408.13233  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

    Authors: Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: The quadratic computational complexity in the self-attention mechanism of popular transformer architectures poses significant challenges for training and inference, particularly in terms of efficiency and memory requirements. Towards addressing these challenges, this paper introduces a novel fast computation method for gradient calculation in multi-layer transformer models. Our approach enables th…

    Submitted 23 August, 2024; originally announced August 2024.

  3. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method lea…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.09695  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    LightWeather: Harnessing Absolute Positional Encoding to Efficient and Scalable Global Weather Forecasting

    Authors: Yisong Fu, Fei Wang, Zezhi Shao, Chengqing Yu, Yujie Li, Zhao Chen, Zhulin An, Yongjun Xu

    Abstract: Recently, Transformers have gained traction in weather forecasting for their capability to capture long-term spatial-temporal correlations. However, their complex architectures result in large parameter counts and extended training times, limiting their practical application and scalability to global-scale forecasting. This paper aims to explore the key factor for accurate weather forecasting and…

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  6. arXiv:2408.07592  [pdf, other]

    eess.SP

    Multi-periodicity dependency Transformer based on spectrum offset for radio frequency fingerprint identification

    Authors: Jing Xiao, Wenrui Ding, Zeqi Shao, Duona Zhang, Yanan Ma, Yufeng Wang, Jian Wang

    Abstract: Radio Frequency Fingerprint Identification (RFFI) has emerged as a pivotal task for reliable device authentication. Despite advancements in RFFI methods, background noise and intentional modulation features result in weak energy and subtle differences in the RFF features. These challenges diminish the capability of RFFI methods in feature representation, complicating the effective identification o…

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2408.06304  [pdf, other]

    cs.CR cs.AR cs.ET

    Control-Flow Attestation: Concepts, Solutions, and Open Challenges

    Authors: Zhanyu Sha, Carlton Shepherd, Amir Rafi, Konstantinos Markantonakis

    Abstract: Control-flow attestation unifies the worlds of control-flow integrity and platform attestation by measuring and reporting a target's run-time behaviour to a verifier. Trust assurances in the target are provided by testing whether its execution follows an authorised control-flow path. The problem has been explored in various settings, such as assessing the trustworthiness of cyber-physical systems,…

    Submitted 16 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  8. arXiv:2407.20570  [pdf, other]

    cs.HC

    Fine-Tuned Large Language Model for Visualization System: A Study on Self-Regulated Learning in Education

    Authors: Lin Gao, Jing Lu, Zekai Shao, Ziyue Lin, Shengbin Yue, Chiokit Ieong, Yi Sun, Rory James Zauner, Zhongyu Wei, Siming Chen

    Abstract: Large Language Models (LLMs) have shown great potential in intelligent visualization systems, especially for domain-specific applications. Integrating LLMs into visualization systems presents challenges, and we categorize these challenges into three alignments: domain problems with LLMs, visualization with LLMs, and interaction with LLMs. To achieve these alignments, we propose a framework and out…

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.15502  [pdf, other]

    cs.CV

    WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation

    Authors: Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu, Jiajun Bu, Qi Zheng, Cong Yao

    Abstract: In the era of content creation revolution propelled by advancements in generative models, the field of web design remains unexplored despite its critical role in modern digital communication. The web design process is complex and often time-consuming, especially for those with limited expertise. In this paper, we introduce Web Rendering Parameters Generation (WebRPG), a new task that aims at autom…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024. The dataset and code can be accessed at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/WebRPG

  10. arXiv:2407.13621  [pdf, other]

    cs.LG cs.AI cs.CR

    Differential Privacy Mechanisms in Neural Tangent Kernel Regression

    Authors: Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

    Abstract: Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications, such as face recognition, recommendation systems, language generation, and many others, as it may contain sensitive user information related to legal issues. To fundamentally understand how privacy mechanisms work in AI applications, we study differential privacy (DP) in the Neural Tangent Kernel (N…

    Submitted 18 July, 2024; originally announced July 2024.

  11. arXiv:2407.10430  [pdf, other]

    cs.CL cs.AI

    Expanding the Scope: Inductive Knowledge Graph Reasoning with Multi-Starting Progressive Propagation

    Authors: Zhoutian Shao, Yuanning Cui, Wei Hu

    Abstract: Knowledge graphs (KGs) are widely acknowledged as incomplete, and new entities are constantly emerging in the real world. Inductive KG reasoning aims to predict missing facts for these new entities. Among existing models, graph neural networks (GNNs) based ones have shown promising performance for this task. However, they are still challenged by inefficient message propagation due to the distance…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted in the 23rd International Semantic Web Conference (ISWC 2024)

  12. arXiv:2407.09050  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Refusing Safe Prompts for Multi-modal Large Language Models

    Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-…

    Submitted 12 July, 2024; originally announced July 2024.

  13. arXiv:2406.19217  [pdf, other]

    cs.CV cs.AI cs.RO

    Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos

    Authors: Zhimin Shao, Jialang Xu, Danail Stoyanov, Evangelos B. Mazomenos, Yueming Jin

    Abstract: Despite significant advancements in robotic systems and surgical data science, ensuring safe and optimal execution in robot-assisted minimally invasive surgery (RMIS) remains a complex challenge. Current surgical error detection methods involve two parts: identifying surgical gestures and then detecting errors within each gesture clip. These methods seldom consider the rich contextual and semantic…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures

  14. arXiv:2406.18001  [pdf, other]

    cs.DC stat.ML

    Scalable Dual Coordinate Descent for Kernel Methods

    Authors: Zishan Shao, Aditya Devarakonda

    Abstract: Dual Coordinate Descent (DCD) and Block Dual Coordinate Descent (BDCD) are important iterative methods for solving convex optimization problems. In this work, we develop scalable DCD and BDCD methods for the kernel support vector machines (K-SVM) and kernel ridge regression (K-RR) problems. On distributed-memory parallel machines the scalability of these methods is limited by the need to communica…

    Submitted 25 June, 2024; originally announced June 2024.

    MSC Class: 65Y05

    ACM Class: D.1.3; G.4; F.2.1

  15. arXiv:2406.16675  [pdf, ps, other]

    cs.IT eess.SP

    Decentralized and Centralized IDD Schemes for Cell-Free Networks

    Authors: T. Ssettumba, Z. Shao, L. Landau, R. de Lamare

    Abstract: In this paper, we propose iterative interference cancellation schemes with access points selection (APs-Sel) for cell-free massive multiple-input multiple-output (CF-mMIMO) systems. Closed-form expressions for centralized and decentralized linear minimum mean square error (LMMSE) receive filters with APs-Sel are derived assuming imperfect channel state information (CSI). Furthermore, we develop a…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  16. arXiv:2406.16006  [pdf, other]

    cs.LG cs.AI

    Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

    Authors: Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

    Abstract: In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: To appear: Reinforcement Learning Conference (RLC), 2024

  17. arXiv:2406.12847  [pdf, other]

    cs.CV

    ChangeViT: Unleashing Plain Vision Transformers for Change Detection

    Authors: Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

    Abstract: Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper,…

    Submitted 18 June, 2024; originally announced June 2024.

  18. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.10474  [pdf, other]

    cs.DC

    Federated Neural Radiance Field for Distributed Intelligence

    Authors: Yintian Zhang, Ziyu Shao

    Abstract: Novel view synthesis (NVS) is an important technology for many AR and VR applications. The recently proposed Neural Radiance Field (NeRF) approach has demonstrated superior performance on NVS tasks, and has been applied to other related fields. However, certain application scenarios with distributed data storage may pose challenges on acquiring training images for the NeRF approach, due to strict…

    Submitted 14 June, 2024; originally announced June 2024.

  20. arXiv:2406.07996  [pdf, other]

    cs.NI eess.SP

    Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets

    Authors: Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

    Abstract: This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Network (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking i…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Letter. The source code has been released at: https://github.com/qiongwu86/Semantic-Aware-Resource-Allocation-Based-on-Deep-Reinforcement-Learning-for-5G-V2X-HetNets

  21. arXiv:2406.07637  [pdf, other]

    astro-ph.GA

    The destiny of open cluster NGC 6530: past and future

    Authors: Delong Jia, Heng Yu, Zhengyi Shao, Lu Li

    Abstract: Studying the structures of open clusters is crucial for understanding stellar evolution and galactic dynamics. Based on Gaia DR3 data, we apply the hierarchical clustering algorithm to a young open cluster NGC 6530 and group its members into 5 substructures. By linear tracing with the kinematic information of their members, we find that: Sub 1 is the core of the cluster. It is expanding slowly. Su…

    Submitted 14 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures, accepted for publication in AJ

  22. arXiv:2406.07213  [pdf, other]

    cs.LG

    Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning

    Authors: Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement le…

    Submitted 17 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Semantic-Aware-Spectrum-Sharing-in-Internet-of-Vehicles-Based-on-Deep-Reinforcement-Learning

  23. arXiv:2406.05871  [pdf, other]

    cs.CV cs.LG

    OmniControlNet: Dual-stage Integration for Conditional Image Generation

    Authors: Yilin Wang, Haiyang Xu, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu

    Abstract: We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method and incorporating its individually trained image generation processes into a single model. Despite its tremendous success, the ControlNet of a two-stage pipeline bears limitations in being not self-contained (e.g. calls the external condit…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 Workshop: Generative Models for Computer Vision

  24. arXiv:2406.04761  [pdf, other]

    astro-ph.GA

    \texttt{Simba}-\texttt{C}: the evolution of the thermal and chemical properties in the intragroup medium

    Authors: Renier T. Hough, Zhiwei Shao, Weiguang Cui, S. Ilani Loubser, Arif Babul, Romeel Davé, Douglas Rennehan, Chiaki Kobayashi

    Abstract: The newly updated \texttt{GIZMO} and \texttt{Simba} based simulation, \texttt{Simba-C}, with its new stellar feedback, chemical enrichment, and recalibrated AGN feedback, allows for a detailed study of the intragroup medium X-ray properties. We discuss the impact of various physical mechanisms, e.g. stellar and AGN feedback, and chemical enrichment, on the composition and the global scaling relati…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 13 figures, 2 tables, accepted by MNRAS on 6 June 2024

  25. arXiv:2406.04604  [pdf, other]

    cs.CL cs.PL

    Learning Task Decomposition to Assist Humans in Competitive Programming

    Authors: Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang

    Abstract: When using language models (LMs) to solve complex problems, humans might struggle to understand the LM-generated solutions and repair the flawed ones. To assist humans in repairing them, we propose to automatically decompose complex solutions into multiple simpler pieces that correspond to specific subtasks. We introduce a novel objective for learning task decomposition, termed assistive value (As…

    Submitted 23 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Main Conference

  26. arXiv:2406.04423  [pdf, other]

    stat.ME cs.SI physics.soc-ph

    Determining the Number of Communities in Sparse and Imbalanced Settings

    Authors: Zhixuan Shao, Can M. Le

    Abstract: Community structures represent a crucial aspect of network analysis, and various methods have been developed to identify these communities. However, a common hurdle lies in determining the number of communities K, a parameter that often requires estimation in practice. Existing approaches for estimating K face two notable challenges: the weak community signal present in sparse networks and the imb…

    Submitted 6 June, 2024; originally announced June 2024.

  27. arXiv:2406.00585  [pdf, ps, other]

    math.CO

    A note on the Nearly Dispersability of Odd Toroidal Grids

    Authors: Xiaoxiang Yu, Zeling Shao, Zhiguo Li

    Abstract: The \emph{matching book thickness} $mbt(G)$ of $G$ is the minimum integer $m$ such that an $m$-page matching book embedding exists. A graph $G$ is called \emph{dispersable} if $mbt(G)=Δ(G)$, \emph{nearly dispersable} if $mbt(G)=Δ(G)+1$. Recently, the authors determined the nearly dispersability of odd toroidal grids $T_{s,t}$. In this note, we further present a brief proof for this result.

    Submitted 1 June, 2024; originally announced June 2024.

    MSC Class: 05C10

  28. arXiv:2405.19609  [pdf, other]

    cs.CV cs.GR

    SMPLX-Lite: A Realistic and Drivable Avatar Benchmark with Rich Geometry and Texture Annotations

    Authors: Yujiao Jiang, Qingmin Liao, Zhaolong Wang, Xiangru Lin, Zongqing Lu, Yuxi Zhao, Hanqing Wei, Jingrui Ye, Yu Zhang, Zhijing Shao

    Abstract: Recovering photorealistic and drivable full-body avatars is crucial for numerous applications, including virtual reality, 3D games, and tele-presence. Most methods, whether reconstruction or generation, require large numbers of human motion sequences and corresponding textured meshes. To easily learn a drivable avatar, a reasonable parametric body model with unified topology is paramount. However,…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICME 2024; Project page: https://alex-jyj.github.io/SMPLX-Lite/

  29. arXiv:2405.16253  [pdf, ps, other]

    math.CO

    On the dispersability of graph bundles over cycles

    Authors: Zeling Shao, Xiaoxiang Yu, Zhiguo Li

    Abstract: In this paper, the dispersability of the Cartesian graph bundle over two cycles is completely solved. We show the Cartesian graph bundle $G$ over two cycles is dispersable if $G$ is bipartite; otherwise, $G$ is nearly dispersable.

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.06612

    MSC Class: 05C10

  30. arXiv:2405.14333  [pdf, other]

    cs.AI

    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

    Authors: Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang

    Abstract: Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and u…

    Submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.13312  [pdf, other]

    cs.IT eess.SP

    Iterative Detection and Decoding Schemes with LLR Refinements in Cell-Free Massive MIMO Networks

    Authors: T. Ssettumba, Z. Shao, L. Landau, R. C. de Lamare

    Abstract: In this paper, we propose low-complexity local detectors and log-likelihood ratio (LLR) refinement techniques for coded cell-free massive multiple-input multiple-output (CF-mMIMO) systems, where an iterative detection and decoding (IDD) scheme is applied using parallel interference cancellation (PIC) and access point (AP) selection. In particular, we propose three LLR processing schemes based o…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 6 pages, 2 figures

  32. arXiv:2405.12107  [pdf, other]

    cs.CV cs.CL

    Imp: Highly Capable Large Multimodal Models for Mobile Devices

    Authors: Zhenwei Shao, Zhou Yu, Jun Yu, Xuecheng Ouyang, Lihao Zheng, Zhenbiao Gai, Mingyang Wang, Jiajun Ding

    Abstract: By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, thus hindering their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximiz…

    Submitted 29 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Fixed some typos and corrected a few numbers in the tables

  33. arXiv:2405.11333  [pdf, other]

    cs.LG cs.AI

    GinAR: An End-To-End Multivariate Time Series Forecasting Model Suitable for Variable Missing

    Authors: Chengqing Yu, Fei Wang, Zezhi Shao, Tangwen Qian, Zhao Zhang, Wei Wei, Yongjun Xu

    Abstract: Multivariate time series forecasting (MTSF) is crucial for decision-making to precisely forecast the future values/trends, based on the complex relationships identified from historical observations of multiple sequences. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have gradually become the theme of MTSF model as their powerful capability in mining spatial-temporal dependencies, but a…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 (Research track)

  34. arXiv:2405.08348  [pdf, other]

    cs.PL

    Foundational Verification of Smart Contracts through Verified Compilation

    Authors: Vilhelm Sjöberg, Kinnari Dave, Daniel Britten, Maria A Schett, Xinyuan Sun, Qinshi Wang, Sean Noble Anderson, Steve Reeves, Zhong Shao

    Abstract: Programs executed on a blockchain - smart contracts - have high financial stakes; their correctness is crucial. We argue, that this correctness needs to be foundational: correctness needs to be based on the operational semantics of their execution environment. In this work we present a foundational system - the DeepSEA system - targeting the Ethereum blockchain as the largest smart contract platfo…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 27 pages, 6 figures

    ACM Class: F.3.1; F.3.2

  35. arXiv:2405.06175  [pdf, other]

    eess.IV cs.CV

    Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging

    Authors: Zhuchen Shao, Mark A. Anastasio, Hua Li

    Abstract: Purpose: Quantitative phase imaging (QPI) is a label-free technique that provides high-contrast images of tissues and cells without the use of chemicals or dyes. Accurate semantic segmentation of cells in QPI is essential for various biomedical applications. While DM-based segmentation has demonstrated promising results, the requirement for multiple sampling steps reduces efficiency. This study ai…

    Submitted 9 May, 2024; originally announced May 2024.

  36. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  37. arXiv:2405.02564  [pdf]

    cs.CV cs.AI q-bio.NC

    Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness

    Authors: Zhenan Shao, Linjian Ma, Bo Li, Diane M. Beck

    Abstract: Human object recognition exhibits remarkable resilience in cluttered and dynamic visual environments. In contrast, despite their unparalleled performance across numerous visual tasks, Deep Neural Networks (DNNs) remain far less robust than humans, showing, for example, a surprising susceptibility to adversarial attacks involving image perturbations that are (almost) imperceptible to humans. Human…

    Submitted 4 May, 2024; originally announced May 2024.

  38. arXiv:2405.00269  [pdf, other]

    cs.RO

    Adaptive Integral Sliding Mode Control for Attitude Tracking of Underwater Robots With Large Range Pitch Variations in Confined Space

    Authors: Xiaorui Wang, Zeyu Sha, Feitian Zhang

    Abstract: Underwater robots play a crucial role in exploring aquatic environments. The ability to flexibly adjust their attitudes is essential for underwater robots to effectively accomplish tasks in confined space. However, the highly coupled six degrees of freedom dynamics resulting from attitude changes and the complex turbulence within limited spatial areas present significant challenges. To address the…

    Submitted 30 April, 2024; originally announced May 2024.

  39. arXiv:2404.19673  [pdf, ps, other]

    cs.LG

    Neural Controlled Differential Equations with Quantum Hidden Evolutions

    Authors: Lingyi Yang, Zhen Shao

    Abstract: We introduce a class of neural controlled differential equation inspired by quantum mechanics. Neural quantum controlled differential equations (NQDEs) model the dynamics by analogue of the Schrödinger equation. Specifically, the hidden state represents the wave function, and its collapse leads to an interpretation of the classification probability. We implement and compare the results of four var…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Code available at: https://github.com/lingyiyang/NQDE

  40. arXiv:2404.15899  [pdf, other]

    cs.LG cs.AI

    ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction

    Authors: Zhiqi Shao, Xusheng Yao, Ze Wang, Junbin Gao

    Abstract: Accurate traffic flow prediction is crucial for optimizing traffic management, enhancing road safety, and reducing environmental impacts. Existing models face challenges with long sequence data, requiring substantial memory and computational resources, and often suffer from slow inference times due to the lack of a unified summary state. This paper introduces ST-MambaSync, an innovative traffic fl…

    Submitted 9 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257

    MSC Class: 53A45

    ACM Class: I.2.0

  41. arXiv:2404.13714  [pdf, other]

    eess.SY

    Self-Adjusting Prescribed Performance Control for Nonlinear Systems with Input Saturation

    Authors: Zhuwu Shao, Yujuan Wang, Huanyu Yang, Yongduan Song

    Abstract: Among the existing works on enhancing system performance via prescribed performance functions (PPFs), the decay rates of PPFs need to be predetermined by the designer, directly affecting the convergence time of the closed-loop system. However, if only considering accelerating the system convergence by selecting a big decay rate of the performance function, it may lead to the severe consequence of…

    Submitted 21 April, 2024; originally announced April 2024.

  42. arXiv:2404.13257  [pdf, other]

    cs.LG

    ST-Mamba: Spatial-Temporal Selective State Space Model for Traffic Flow Prediction

    Authors: Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Haoning Xi, Junbin Gao

    Abstract: Traffic flow prediction, a critical aspect of intelligent transportation systems, has been increasingly popular in the field of artificial intelligence, driven by the availability of extensive traffic data. The current challenges of traffic flow prediction lie in integrating diverse factors while balancing the trade-off between computational complexity and the precision necessary for effective lon…

    Submitted 18 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 25 pages, 6 figures

    MSC Class: 53A45; ACM Class: I.2.0

  43. arXiv:2404.12257  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Food Portion Estimation via 3D Object Scaling

    Authors: Gautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu

    Abstract: Image-based methods to analyze food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D imag…

    Submitted 18 April, 2024; originally announced April 2024.

  44. arXiv:2403.17753  [pdf, other]

    cs.LG

    CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model

    Authors: Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Xusheng Yao, Junbin Gao

    Abstract: Accurate, and effective traffic forecasting is vital for smart traffic systems, crucial in urban traffic planning and management. Current Spatio-Temporal Transformer models, despite their prediction capabilities, struggle with balancing computational efficiency and accuracy, favoring global over local information, and handling spatial and temporal data separately, limiting insight into complex int…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 18 pages

    ACM Class: I.2.0

  45. arXiv:2403.14738  [pdf]

    cs.LG eess.SP

    A task of anomaly detection for a smart satellite Internet of things system

    Authors: Zilong Shao

    Abstract: When the equipment is working, real-time collection of environmental sensor data for anomaly detection is one of the key links to prevent industrial process accidents and network attacks and ensure system security. However, under the environment with specific real-time requirements, the anomaly detection for environmental sensors still faces the following difficulties: (1) The complex nonlinear co…

    Submitted 21 March, 2024; originally announced March 2024.

  46. arXiv:2403.10719  [pdf]

    cond-mat.mtrl-sci

    X-ray Nano-imaging of a Heterogeneous Structural Phase Transition in V2O3

    Authors: Ziming Shao, Aileen Luo, Eti Barazani, Tao Zhou, Zhonghou Cai, Martin V. Holt, Yoav Kalcheim, Andrej Singer

    Abstract: Controlling the Mott transition through strain engineering is crucial for advancing the development and application of memristive and neuromorphic computing devices. Yet, Mott insulators are heterogeneous due to intrinsic phase boundaries and extrinsic defects, posing significant challenges to fully understanding the impact of local microscopic distortions on the local Mott transition. Addressing…

    Submitted 30 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  47. arXiv:2403.09326  [pdf, other]

    cs.GR cs.AI

    HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

    Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang

    Abstract: We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To tackle the challenges of lacking fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce…

    Submitted 10 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 17 figures

    ACM Class: I.2.6; I.3.8

  48. arXiv:2403.06374  [pdf]

    physics.optics

    Intrinsic polarization conversion and avoided-mode crossing in X-cut lithium niobate microrings

    Authors: Zelin Tan, Jianfa Zhang, Zhihong Zhu, Wei Chen, Zhengzheng Shao, Ken Liu, Shiqiao Qin

    Abstract: Compared with well-developed free space polarization converters, polarization conversion between TE and TM modes in waveguide is generally considered to be caused by shape birefringence, like curvature, morphology of waveguide cross section and scattering. Here, we reveal a hidden polarization conversion mechanism in X-cut lithium niobate microrings, that is the conversion can be implemented by bi…

    Submitted 10 March, 2024; originally announced March 2024.

  49. LEVA: Using Large Language Models to Enhance Visual Analytics

    Authors: Yuheng Zhao, Yixing Zhang, Yu Zhang, Xinyi Zhao, Junjie Wang, Zekai Shao, Cagatay Turkay, Siming Chen

    Abstract: Visual analytics supports data analysis tasks within complex domain problems. However, due to the richness of data types, visual designs, and interaction designs, users need to recall and process a significant amount of information when they visually analyze data. These challenges emphasize the need for more intelligent visual analytics methods. Large language models have demonstrated the ability…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE TVCG 2024

  50. arXiv:2403.05087  [pdf, other]

    cs.GR cs.CV

    SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

    Authors: Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang

    Abstract: We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: [CVPR 2024] Code and data are available at https://github.com/initialneil/SplattingAvatar