
Showing 1–50 of 2,298 results for author: Zhao, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2407.20908  [pdf, other]

    cs.CV

    Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

    Authors: Yanpeng Zhao, Yiwei Hao, Siyu Gao, Yunbo Wang, Xiaokang Yang

    Abstract: Learning object-centric representations from unsupervised videos is challenging. Unlike most previous approaches that focus on decomposing 2D images, we present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning within a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene,…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.20600  [pdf, other]

    cs.CV

    Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning

    Authors: Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu

    Abstract: Image recognition is an essential baseline for deep metric learning. Hierarchical knowledge about image classes depicts inter-class similarities or dissimilarities. Effective fusion of hierarchical knowledge about image classes to enhance image recognition remains a challenging topic to advance. In this paper, we propose a novel deep metric learning based method to effectively fuse hierarchical pr…

    Submitted 30 July, 2024; originally announced July 2024.

  3. arXiv:2407.19941  [pdf, other]

    cs.LG

    Boosting Graph Foundation Model from Structural Perspective

    Authors: Yao Cheng, Yige Zhao, Jianxiang Yu, Xiang Li

    Abstract: Graph foundation models have recently attracted significant attention due to their strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address the problem, in this paper, we boost graph foundation model from structural perspectiv…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.19775  [pdf, other]

    cs.AI cs.CL cs.CR cs.DC

    Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference

    Authors: Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo

    Abstract: The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet required data security and scalability needs, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decent…

    Submitted 29 July, 2024; originally announced July 2024.

  5. arXiv:2407.19711  [pdf, other]

    cs.SE

    TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data

    Authors: Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li

    Abstract: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional fai…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 30 pages

  6. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.

  7. Enhancing CTR Prediction through Sequential Recommendation Pre-training: Introducing the SRP4CTR Framework

    Authors: Ruidong Han, Qianzhong Li, He Jiang, Rui Li, Yurou Zhao, Xiang Li, Wei Lin

    Abstract: Understanding user interests is crucial for Click-Through Rate (CTR) prediction tasks. In sequential recommendation, pre-training from user historical behaviors through self-supervised learning can better comprehend user dynamic preferences, presenting the potential for direct integration with CTR tasks. Previous methods have integrated pre-trained models into downstream tasks with the sole purpos…

    Submitted 28 July, 2024; originally announced July 2024.

  8. arXiv:2407.19414  [pdf, other]

    cs.AI

    Appformer: A Novel Framework for Mobile App Usage Prediction Leveraging Progressive Multi-Modal Data Fusion and Feature Extraction

    Authors: Chuike Sun, Junzhou Chen, Yue Zhao, Hao Han, Ruihai Jing, Guang Tan, Di Wu

    Abstract: This article presents Appformer, a novel mobile application prediction framework inspired by the efficiency of Transformer-like architectures in processing sequential data through self-attention mechanisms. Combining a Multi-Modal Data Progressive Fusion Module with a sophisticated Feature Extraction Module, Appformer leverages the synergies of multi-modal data fusion and data mining techniques wh…

    Submitted 28 July, 2024; originally announced July 2024.

  9. arXiv:2407.19401  [pdf, other]

    cs.CR cs.AI

    Complete Security and Privacy for AI Inference in Decentralized Systems

    Authors: Hongyang Zhang, Yue Zhao, Claudio Angione, Harry Yang, James Buban, Ahmad Farhan, Fielding Johnston, Patrick Colangelo

    Abstract: The need for data security and model integrity has been accentuated by the rapid adoption of AI and ML in data-driven domains including healthcare, finance, and security. Large models are crucial for tasks like diagnosing diseases and forecasting finances but tend to be delicate and not very scalable. Decentralized systems solve this issue by distributing the workload and reducing central points o…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 25 pages, 5 figures

  10. arXiv:2407.19053  [pdf, other]

    cs.SE

    A Study of Using Multimodal LLMs for Non-Crash Functional Bug Detection in Android Apps

    Authors: Bangyan Ju, Jin Yang, Tingting Yu, Tamerlan Abdullayev, Yuanyuan Wu, Dingbang Wang, Yu Zhao

    Abstract: Numerous approaches employing various strategies have been developed to test the graphical user interfaces (GUIs) of mobile apps. However, traditional GUI testing techniques, such as random and model-based testing, primarily focus on generating test sequences that excel in achieving high code coverage but often fail to act as effective test oracles for non-crash functional (NCF) bug detection. To…

    Submitted 26 July, 2024; originally announced July 2024.

  11. arXiv:2407.18827  [pdf]

    cs.IR cs.AI

    Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models

    Authors: Mutahar Safdar, Jiarui Xie, Andrei Mircea, Yaoyao Fiona Zhao

    Abstract: Data-driven research in Additive Manufacturing (AM) has gained significant success in recent years. This has led to a plethora of scientific literature to emerge. The knowledge in these works consists of AM and Artificial Intelligence (AI) contexts that have not been mined and formalized in an integrated way. It requires substantial effort and time to extract scientific information from these work…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 Figures, 3 Tables. This paper has been accepted to be published in the proceedings of IDETC-CIE 2024

  12. arXiv:2407.18637  [pdf, other]

    cs.CV

    DynamicTrack: Advancing Gigapixel Tracking in Crowded Scenes

    Authors: Yunqi Zhao, Yuchen Guo, Zheng Cao, Kai Ni, Ruqi Huang, Lu Fang

    Abstract: Tracking in gigapixel scenarios holds numerous potential applications in video surveillance and pedestrian analysis. Existing algorithms attempt to perform tracking in crowded scenes by utilizing multiple cameras or group relationships. However, their performance significantly degrades when confronted with complex interaction and occlusion inherent in gigapixel images. In this paper, we introduce…

    Submitted 26 July, 2024; originally announced July 2024.

  13. arXiv:2407.17745  [pdf, other]

    cs.CL

    Beyond Entity Alignment: Towards Complete Knowledge Graph Alignment via Entity-Relation Synergy

    Authors: Xiaohan Fang, Chaozhuo Li, Yi Zhao, Qian Zang, Litian Zhang, Jiquan Peng, Xi Zhang, Jibing Gong

    Abstract: Knowledge Graph Alignment (KGA) aims to integrate knowledge from multiple sources to address the limitations of individual Knowledge Graphs (KGs) in terms of coverage and depth. However, current KGA models fall short in achieving a "complete" knowledge graph alignment. Existing models primarily emphasize the linkage of cross-graph entities but overlook aligning relations across KGs, thereby prov…

    Submitted 24 July, 2024; originally announced July 2024.

  14. arXiv:2407.17406  [pdf, other]

    cs.CL cs.AI

    Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models

    Authors: Yida Zhao, Chao Lou, Kewei Tu

    Abstract: Syntactic Transformer language models aim to achieve better generalization through simultaneously modeling syntax trees and sentences. While prior work has been focusing on adding constituency-based structures to Transformers, we introduce Dependency Transformer Grammars (DTGs), a new class of Transformer language model with explicit dependency-based inductive bias. DTGs simulate dependency transi…

    Submitted 24 July, 2024; originally announced July 2024.

  15. arXiv:2407.17229  [pdf, other]

    cs.CV

    LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model

    Authors: Wanggong Yang, Xiaona Wang, Yingrui Qiu, Yifei Zhao

    Abstract: Generating landscape paintings expands the possibilities of artistic creativity and imagination. Traditional landscape painting methods involve using ink or colored ink on rice paper, which requires substantial time and effort. These methods are susceptible to errors and inconsistencies and lack precise control over lines and colors. This paper presents LPGen, a high-fidelity, controllable model f…

    Submitted 25 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  16. arXiv:2407.17190  [pdf, other]

    cs.CE

    Fusing LLMs and KGs for Formal Causal Reasoning behind Financial Risk Contagion

    Authors: Guanyuan Yu, Xv Wang, Qing Li, Yu Zhao

    Abstract: Financial risks tend to spread from one entity to another, ultimately leading to systemic risks. The key to preventing such risks lies in understanding the causal chains behind risk contagion. Despite this, prevailing approaches primarily emphasize identifying risks, overlooking the underlying causal analysis of risk. To address such an issue, we propose a Risk Contagion Causal Reasoning model ca…

    Submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.17140  [pdf, ps, other]

    cs.CV

    RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer

    Authors: Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu

    Abstract: In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR, and opens up a set of bag-of-freebies for flexibility and practicality, as well as optimizing the training strategy to achieve enhanced performance. To improve the flexibility, we suggest setting a distinct number of sampling…

    Submitted 24 July, 2024; originally announced July 2024.

  18. arXiv:2407.16940  [pdf, other]

    cs.LG q-bio.GN

    GV-Rep: A Large-Scale Dataset for Genetic Variant Representation Learning

    Authors: Zehui Li, Vallijah Subasri, Guy-Bart Stan, Yiren Zhao, Bo Wang

    Abstract: Genetic variants (GVs) are defined as differences in the DNA sequences among individuals and play a crucial role in diagnosing and treating genetic diseases. The rapid decrease in next generation sequencing cost has led to an exponential increase in patient-level GV data. This growth poses a challenge for clinicians who must efficiently prioritize patient-specific GVs and integrate them with exist…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Preprint

  19. arXiv:2407.15648  [pdf, other]

    cs.CV

    TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly

    Authors: Mengqi Guo, Chen Li, Yuyang Zhao, Gim Hee Lee

    Abstract: Inferring step-wise actions to assemble 3D objects with primitive bricks from images is a challenging task due to complex constraints and the vast number of possible combinations. Recent studies have demonstrated promising results on sequential LEGO brick assembly through the utilization of LEGO-Graph modeling to predict sequential actions. However, existing approaches are class-specific and requi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  20. arXiv:2407.15569  [pdf, other]

    cs.CL

    An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought

    Authors: Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou

    Abstract: Since the launch of ChatGPT at the end of 2022, generative dialogue models represented by ChatGPT have quickly become essential tools in daily life. As user expectations increase, enhancing the capability of generative dialogue models to solve complex problems has become a focal point of current research. This paper delves into the effectiveness of the RAFT (Retrieval Augmented Fine-Tuning) method…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  21. arXiv:2407.15476  [pdf, other]

    cs.LG cs.IR

    MODRL-TA:A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search

    Authors: Peng Cheng, Huimu Wang, Jinyuan Zhao, Yihao Wang, Enqiang Xu, Yu Zhao, Zhuojian Xiao, Songlin Wang, Guoyu Tang, Lin Liu, Sulong Xu

    Abstract: Traffic allocation is a process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic alloca…

    Submitted 22 July, 2024; originally announced July 2024.

  22. arXiv:2407.15212  [pdf, other]

    cs.CV cs.GR

    Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

    Authors: Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, Shenghua Gao

    Abstract: Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper introduces the Surfel-based Gaussian Inverse Avatar (SGIA) method, which introduces efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Under Review; Project Page: https://GS-IA.github.io

  23. arXiv:2407.15196  [pdf, ps, other]

    eess.SP cs.IT

    Channel Shaping Using Beyond Diagonal Reconfigurable Intelligent Surface: Analysis, Optimization, and Enhanced Flexibility

    Authors: Yang Zhao, Hongyu Li, Massimo Franceschetti, Bruno Clerckx

    Abstract: This paper investigates the capability of a passive Reconfigurable Intelligent Surface (RIS) to redistribute the singular values of point-to-point Multiple-Input Multiple-Output (MIMO) channels for achieving power and rate gains. We depart from the conventional Diagonal (D)-RIS with diagonal phase shift matrix and adopt a Beyond Diagonal (BD) architecture that offers greater wave manipulation flex…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This version fixes render issue on Fig. 3

  24. arXiv:2407.14814  [pdf, other]

    cs.LG

    FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting

    Authors: Shusen Ma, Yu Kang, Peng Bai, Yun-Bo Zhao

    Abstract: In multivariate time-series forecasting (MTSF), extracting the temporal correlations of the input sequences is crucial. While popular Transformer-based predictive models can perform well, their quadratic computational complexity results in inefficiency and high overhead. The recently emerged Mamba, a selective state space model, has shown promising results in many fields due to its strong temporal…

    Submitted 20 July, 2024; originally announced July 2024.

  25. arXiv:2407.14800  [pdf, other]

    eess.AS cs.SD eess.SP

    Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity

    Authors: Tianhua Qi, Shiyan Wang, Cheng Lu, Yan Zhao, Yuan Zong, Wenming Zheng

    Abstract: Realistic emotional voice conversion (EVC) aims to enhance emotional diversity of converted audios, making the synthesized voices more authentic and natural. To this end, we propose Emotional Intensity-aware Network (EINet), dynamically adjusting intonation and rhythm by incorporating controllable emotional intensity. To better capture nuances in emotional intensity, we go beyond mere distance mea…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH2024

  26. arXiv:2407.13734  [pdf, other]

    cs.LG cs.AI q-bio.QM stat.ML

    Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

    Authors: Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, Sergey Levine

    Abstract: This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. While diffusion models are widely known to provide excellent generative modeling capability, practical applications in domains such as biology require generating samples that maximize some desired metric (e.g., translation efficiency in RNA, docking score in molecules,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: We plan to add more content/codes. Please let us know if there are any comments

  27. arXiv:2407.13515  [pdf, other]

    cs.HC

    CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision

    Authors: Jaewook Lee, Andrew D. Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, Yuhang Zhao

    Abstract: Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time…

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  28. arXiv:2407.13274  [pdf, other]

    cs.IR

    Aligning Explanations for Recommendation with Rating and Feature via Maximizing Mutual Information

    Authors: Yurou Zhao, Yiding Sun, Ruidong Han, Fei Jiang, Lu Guan, Xiang Li, Wei Lin, Jiaxin Mao

    Abstract: Providing natural language-based explanations to justify recommendations helps to improve users' satisfaction and gain users' trust. However, as current explanation generation methods are commonly trained with an objective to mimic existing user reviews, the generated explanations are often not aligned with the predicted ratings or some important features of the recommended items, and thus, are su…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: this paper has been accepted by cikm2024, and the camera-ready version will be updated soon

  29. arXiv:2407.13112  [pdf]

    cs.CY cs.AI cs.LG

    Improvement of Applicability in Student Performance Prediction Based on Transfer Learning

    Authors: Yan Zhao

    Abstract: Predicting student performance under varying data distributions is a challenging task. This study proposes a method to improve prediction accuracy by employing transfer learning techniques on the dataset with varying distributions. Using datasets from mathematics and Portuguese language courses, the model was trained and evaluated to enhance its generalization ability and prediction accuracy. The…

    Submitted 1 June, 2024; originally announced July 2024.

  30. arXiv:2407.12973  [pdf, other]

    cs.CV cs.AI

    Temporal Label Hierachical Network for Compound Emotion Recognition

    Authors: Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Tianhua Qi, Hao Yang, Yuan Zong, Wenming Zheng

    Abstract: Emotion recognition has attracted more attention in recent decades. Although significant progress has been made in the recognition technology of the seven basic emotions, existing methods still struggle with compound emotion recognition, which occurs commonly in practical applications. This article introduces our achievements in the 7th Field Emotion Behavior Analysis (ABAW) competition. I…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: draft for abaw7

  31. arXiv:2407.12871  [pdf, other]

    cs.CL cs.AI cs.LG

    MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

    Authors: Xiaohan Wang, Dian Li, Yilin Zhao, Sinbadliu, Hui Wang

    Abstract: Utilizing complex tools with Large Language Models (LLMs) is a critical component for grounding AI agents in various real-world scenarios. The core challenge of manipulating tools lies in understanding their usage and functionality. The prevailing approach involves few-shot prompting with demonstrations or fine-tuning on expert trajectories. However, for complex tools and tasks, mere in-context de…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  32. arXiv:2407.11615  [pdf, other]

    cs.LG cs.AI

    Graph Dimension Attention Networks for Enterprise Credit Assessment

    Authors: Shaopeng Wei, Beni Egressy, Xingyan Chen, Yu Zhao, Fuzhen Zhuang, Roger Wattenhofer, Gang Kou

    Abstract: Enterprise credit assessment is critical for evaluating financial risk, and Graph Neural Networks (GNNs), with their advanced capability to model inter-entity relationships, are a natural tool to get a deeper understanding of these financial networks. However, existing GNN-based methodologies predominantly emphasize entity-level attention mechanisms for contagion risk aggregation, often overlookin…

    Submitted 16 July, 2024; originally announced July 2024.

  33. arXiv:2407.11431  [pdf]

    cs.CV

    MRIo3DS-Net: A Mutually Reinforcing Images to 3D Surface RNN-like framework for model-adaptation indoor 3D reconstruction

    Authors: Chang Li, Jiao Guo, Yufei Zhao, Yongjun Zhang

    Abstract: This paper is the first to propose an end-to-end framework of mutually reinforcing images to 3D surface recurrent neural network-like for model-adaptation indoor 3D reconstruction, where multi-view dense matching and point cloud surface optimization are mutually reinforced by a RNN-like structure rather than being treated as a separate issue. The characteristics are as follows: In the multi-view dens…

    Submitted 16 July, 2024; originally announced July 2024.

  34. arXiv:2407.11253  [pdf, other]

    cs.LG cs.CE

    Separable Operator Networks

    Authors: Xinling Yu, Sean Hooten, Ziyue Liu, Yequan Zhao, Marco Fiorentino, Thomas Van Vaerenbergh, Zheng Zhang

    Abstract: Operator learning has become a powerful tool in machine learning for modeling complex physical systems. Although Deep Operator Networks (DeepONet) show promise, they require extensive data acquisition. Physics-informed DeepONets (PI-DeepONet) mitigate data scarcity but suffer from inefficient training processes. We introduce Separable Operator Networks (SepONet), a novel framework that significant…

    Submitted 15 July, 2024; originally announced July 2024.

  35. arXiv:2407.11060  [pdf]

    cs.LG math-ph

    A review of graph neural network applications in mechanics-related domains

    Authors: Yingxue Zhao, Haoran Li, Haosu Zhou, Hamid Reza Attar, Tobias Pfaff, Nan Li

    Abstract: Mechanics-related problems often present unique challenges in achieving accurate geometric and physical representations, particularly for non-uniform structures. Graph neural networks (GNNs) have emerged as a promising tool to tackle these challenges by adeptly learning from graph data with irregular underlying structures. Consequently, recent years have witnessed a surge in complex mechanics-rela…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 28 pages, 10 figures, 4 tables

  36. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  37. arXiv:2407.10548  [pdf, other]

    cs.IT

    Fluid Antenna Multiple Access Assisted Integrated Data and Energy Transfer: Outage and Multiplexing Gain Analysis

    Authors: Xiao Lin, Yizhe Zhao, Halvin Yang, Jie Hu, Kai-Kit Wong

    Abstract: Fluid antenna multiple access (FAMA) exploits the spatial opportunities in wireless channels to overcome multiuser interference by position (a.k.a. port) switching, which can achieve better performance compared to traditional fixed multiple-input multiple-output (MIMO) systems. Additionally, integrated data and energy transfer (IDET) is capable of providing both wireless data transfer (WDT) and wi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  38. arXiv:2407.10424  [pdf, other]

    cs.PL cs.AI

    CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization

    Authors: Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: The increasing complexity and high costs associated with modern processor design have led to a surge in demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages like Python. However, these methods fail on hardware description languages (HDLs) like Verilo…

    Submitted 20 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 16 pages, 8 figures, conference

  39. arXiv:2407.09344  [pdf, other]

    cs.CV

    Pre-training Point Cloud Compact Model with Partial-aware Reconstruction

    Authors: Yaohua Zha, Yanzi Wang, Tao Dai, Shu-Tao Xia

    Abstract: The pre-trained point cloud model based on Masked Point Modeling (MPM) has exhibited substantial improvements across various tasks. However, two drawbacks hinder their practical application. Firstly, the positional embedding of masked patches in the decoder results in the leakage of their central coordinates, leading to limited 3D representations. Secondly, the excessive model size of existing MPM…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17149

  40. arXiv:2407.08422  [pdf, other]

    cs.CR cs.AI

    On the (In)Security of LLM App Stores

    Authors: Xinyi Hou, Yanjie Zhao, Haoyu Wang

    Abstract: LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we co…

    Submitted 29 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  41. arXiv:2407.08273   

    cs.CL

    RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

    Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

    Abstract: Large language models (LLMs) with in-context learning have significantly improved performance on the text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they mostly struggle to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v…

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Further improvement and modification are needed.

  42. arXiv:2407.07503  [pdf, other]

    cs.CV cs.IR

    Inter and Intra Prior Learning-based Hyperspectral Image Reconstruction Using Snapshot SWIR Metasurface

    Authors: Linqiang Li, Jinglei Hao, Yongqiang Zhao, Pan Liu, Haofang Yan, Ziqin Zhang, Seong G. Kong

    Abstract: Shortwave-infrared (SWIR) spectral information, ranging from 1 μm to 2.5 μm, overcomes the limitations of traditional color cameras in acquiring scene information. However, conventional SWIR hyperspectral imaging systems face challenges due to their bulky setups and low acquisition speeds. This work introduces a snapshot SWIR hyperspectral imaging system based on a metasurface filter and a correspon…

    Submitted 24 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 12 pages,9 figures

  43. arXiv:2407.07311  [pdf]

    cs.LG cs.AI cs.CV

    ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting

    Authors: Luoxiao Yang, Yun Wang, Xinqi Fan, Israel Cohen, Yue Zhao, Zijun Zhang

    Abstract: The success of large pretrained models in natural language processing (NLP) and computer vision (CV) has opened new avenues for constructing foundation models for time series forecasting (TSF). Traditional TSF foundation models rely heavily on numerical data fitting. In contrast, the human brain is inherently skilled at processing visual information, preferring to predict future trends by observing vi…

    Submitted 9 July, 2024; originally announced July 2024.

  44. arXiv:2407.07179  [pdf, other]

    hep-ex cs.LG

    TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era

    Authors: Sascha Caron, Nadezhda Dobreva, Antonio Ferrer Sánchez, José D. Martín-Guerrero, Uraz Odyurt, Roberto Ruiz de Austri Bazan, Zef Wolffs, Yue Zhao

    Abstract: High-Energy Physics experiments are facing a multi-fold data increase with every new iteration. This is certainly the case for the upcoming High-Luminosity LHC upgrade. Such increased data processing requirements force revisions to almost every step of the data processing pipeline. One such step in need of an overhaul is the task of particle track reconstruction, a.k.a., tracking. A Machine Learn…

    Submitted 9 July, 2024; originally announced July 2024.

  45. arXiv:2407.06939  [pdf, other]

    cs.RO cs.CV

    Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

    Authors: Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma, et al. (20 additional authors not shown)

    Abstract: In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface withi…

    Submitted 9 July, 2024; originally announced July 2024.

  46. arXiv:2407.06931  [pdf, other]

    cs.RO

    A Unified Approach to Multi-task Legged Navigation: Temporal Logic Meets Reinforcement Learning

    Authors: Jesse Jiang, Samuel Coogan, Ye Zhao

    Abstract: This study examines the problem of hopping robot navigation planning to achieve simultaneous goal-directed and environment exploration tasks. We consider a scenario in which the robot has mandatory goal-directed tasks defined using Linear Temporal Logic (LTL) specifications as well as optional exploration tasks represented using a reward function. Additionally, there exists uncertainty in the robo…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  47. arXiv:2407.06881  [pdf, other]

    cs.DS

    Efficient Stochastic Routing in Path-Centric Uncertain Road Networks -- Extended Version

    Authors: Chenjuan Guo, Ronghui Xu, Bin Yang, Ye Yuan, Tung Kieu, Yan Zhao, Christian S. Jensen

    Abstract: The availability of massive vehicle trajectory data enables the modeling of road-network constrained movement as travel-cost distributions rather than just single-valued costs, thereby capturing the inherent uncertainty of movement and enabling improved routing quality. Thus, stochastic routing has been studied extensively in the edge-centric model, where such costs are assigned to the edges in a…

    Submitted 9 July, 2024; originally announced July 2024.

  48. arXiv:2407.06833  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Training-free CryoET Tomogram Segmentation

    Authors: Yizhou Zhao, Hengwei Bian, Michael Mu, Mostofa R. Uddin, Zhenyang Li, Xiang Li, Tianyang Wang, Min Xu

    Abstract: Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology that is hindered by its need for manual annotations, especially in particle picking. Recent works have endeavored to remedy this issue with few-shot learning or contrastive learning techniques. However, supervised training is still inevitable for them. We instead choose to leverage the power of existing 2D…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution will be published in MICCAI 2024

  49. arXiv:2407.06573  [pdf, other]

    cs.SE

    LLM for Mobile: An Initial Roadmap

    Authors: Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawendé F. Bissyandé, Jacques Klein, Li Li

    Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to apply LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable nativ…

    Submitted 9 July, 2024; originally announced July 2024.

  50. arXiv:2407.06512  [pdf]

    cs.CV cs.AI

    LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

    Authors: Jiayi Liu, Qianyu Zhang, Xue Wan, Shengyang Zhang, Yaolin Tian, Haodong Han, Yutao Zhao, Baichuan Liu, Zeyuan Zhao, Xubo Luo

    Abstract: With the complexity of lunar exploration missions, the moon needs to have a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration. The development and verification of algorithms require highly reliable data support. Most of the existing lunar datasets are targeted at a single task, lacking diverse scenes a…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 22 pages, 11 figures, 9 tables