
Showing 1–50 of 142 results for author: Fan, M

Searching in archive cs.
  1. arXiv:2409.03504  [pdf, other]

    cs.IR

    HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps

    Authors: Jizhou Huang, Haifeng Wang, Yibo Sun, Miao Fan, Zhengjie Huang, Chunyuan Yuan, Yawen Li

    Abstract: The increasing interest in international travel has raised the demand for retrieving points of interest (POIs) in multiple languages. This need is even more pressing when travelers abroad search for local venues such as restaurants and scenic spots in unfamiliar languages. Multilingual POI retrieval, enabling users to find desired POIs in a demanded language using queries in numerous languages, has become an indispens…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by KDD'21

  2. arXiv:2409.03449  [pdf, other]

    cs.IR

    MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search

    Authors: Miao Fan, Jiacheng Guo, Shuai Zhu, Shuo Miao, Mingming Sun, Ping Li

    Abstract: Baidu runs the largest commercial web search engine in China, serving hundreds of millions of online users every day in response to a great variety of queries. In order to build a high-efficiency sponsored search engine, we used to adopt a three-layer funnel-shaped structure to screen and sort hundreds of ads from billions of ad candidates subject to the requirement of low response latency and the…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by KDD'19

  3. arXiv:2409.03445  [pdf, other]

    cs.RO

    Neural HD Map Generation from Multiple Vectorized Tiles Locally Produced by Autonomous Vehicles

    Authors: Miao Fan, Yi Yao, Jianping Zhang, Xiangbo Song, Daihui Wu

    Abstract: A high-definition (HD) map is a fundamental component of autonomous driving systems, as it can provide precise environmental information about driving scenes. Recent work on vectorized map generation can produce only 65% of the local map elements around the ego-vehicle at runtime from a single tour with onboard sensors, leaving open the question of how to construct a global HD map projected in the world coordinate sys…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by SpatialDI'24

  4. arXiv:2409.03271  [pdf, other]

    cs.AI cs.CL cs.HC

    Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

    Authors: Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu

    Abstract: The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs). However, despite their widespread adoption and success, CoT methods often exhibit instability due to their inability to consistently ensure the quality of generated reasoning paths, leading to sub-optimal reasoning performance. To address this challenge,…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.00587  [pdf, other]

    cs.SD cs.CV eess.AS

    FLUX that Plays Music

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

    Abstract: This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic. Following the design of the advanced Flux model (https://github.com/black-forest-labs/flux), we transfer it into a latent VAE space of mel-spectrum. It involves first applying a sequence of independent attention to the double text-music stream, follo…

    Submitted 31 August, 2024; originally announced September 2024.

  6. arXiv:2409.00116  [pdf, other]

    cs.CL cs.LG

    FedMCP: Parameter-Efficient Federated Learning with Model-Contrastive Personalization

    Authors: Qianyi Zhao, Chen Qu, Cen Chen, Mingyuan Fan, Yanhao Wang

    Abstract: With increasing concerns and regulations on data privacy, fine-tuning pretrained language models (PLMs) in federated learning (FL) has become a common paradigm for NLP tasks. Despite being extensively studied, the existing methods for this problem still face two primary challenges. First, the huge number of parameters in large-scale PLMs leads to excessive communication and computational overhead.…

    Submitted 28 August, 2024; originally announced September 2024.

  7. arXiv:2408.13690  [pdf, other]

    cs.LG

    Understanding Uncertainty-based Active Learning Under Model Mismatch

    Authors: Amir Hossein Rahmati, Mingzhou Fan, Ruida Zhou, Nathan M. Urban, Byung-Jun Yoon, Xiaoning Qian

    Abstract: Instead of randomly acquiring training data points, Uncertainty-based Active Learning (UAL) operates by querying the label(s) of pivotal samples from an unlabeled pool selected based on the prediction uncertainty, thereby aiming at minimizing the labeling cost for model training. The efficacy of UAL critically depends on the model capacity as well as the adopted uncertainty-based acquisition funct…

    Submitted 24 August, 2024; originally announced August 2024.

  8. arXiv:2408.08018  [pdf, other]

    cs.HC

    Investigating Size Congruency Between the Visual Perception of a VR Object and the Haptic Perception of Its Physical World Agent

    Authors: Wenqi Zheng, Dawei Xiong, Cekai Weng, Jiajun Jiang, Junwei Li, Jinni Zhou, Mingming Fan

    Abstract: The perception of physical objects and miniatures enhances the realism and immersion in VR. This work explores the relationship between haptic feedback from real objects and their visual representations in VR. The study examines how users confirm and adjust the sizes of different virtual objects. The results show that as the size of the virtual cubes increases, users are less likely to perceive th…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures, VINCI 2024

  9. arXiv:2408.06907  [pdf, other]

    cs.IT

    An Information Geometry Interpretation for Approximate Message Passing

    Authors: Bingyan Liu, An-An Lu, Mingrui Fan, Jiyuan Yang, Xiqi Gao

    Abstract: In this paper, we propose an information geometry (IG) framework to solve the standard linear regression problem. The proposed framework is an extension of the one for computing the mean of a complex multivariate Gaussian distribution. By applying the proposed framework, the information geometry approach (IGA) and the approximate information geometry approach (AIGA) for basis pursuit de-noising (BPD…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 30 pages, 5 figures

  10. arXiv:2408.06107  [pdf, other]

    cs.HC

    Augmented Library: Toward Enriching Physical Library Experience Using HMD-Based Augmented Reality

    Authors: Qianjie Wei, Jingling Zhang, Pengqi Wang, Xiaofu Jin, Mingming Fan

    Abstract: Despite the rise of digital libraries and online reading platforms, physical libraries still offer unique benefits for education and community engagement. However, due to the convenience of digital resources, physical library visits, especially by college students, have declined. This underscores the need to better engage these users. Augmented Reality (AR) could potentially bridge the gap between…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 5 pages, 3 figures

  11. arXiv:2408.05885  [pdf, other]

    cs.LG stat.ML

    GFlowNet Training by Policy Gradients

    Authors: Puhua Niu, Shili Wu, Mingzhou Fan, Xiaoning Qian

    Abstract: Generative Flow Networks (GFlowNets) have been shown to be effective at generating combinatorial objects with desired properties. We here propose a new GFlowNet training framework, with policy-dependent rewards, that connects keeping the flow balance of GFlowNets to optimizing the expected accumulated reward in traditional Reinforcement Learning (RL). This enables the derivation of new policy-based GFlowNet tr…

    Submitted 11 August, 2024; originally announced August 2024.

  12. arXiv:2408.02679  [pdf, other]

    cs.LG cs.GR cs.HC stat.ME

    Visual Analysis of Multi-outcome Causal Graphs

    Authors: Mengjie Fan, Jinlu Yu, Daniel Weiskopf, Nan Cao, Huai-Yu Wang, Liang Zhou

    Abstract: We introduce a visual analysis method for multiple causal graphs with different outcome variables, namely, multi-outcome causal graphs. Multi-outcome causal graphs are important in healthcare for understanding multimorbidity and comorbidity. To support the visual analysis, we collaborated with medical experts to devise two comparative visualization techniques at different stages of the analysis pr…

    Submitted 25 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  13. arXiv:2408.01014  [pdf, other]

    cs.CV

    EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts

    Authors: Die Chen, Zhiwen Li, Mingyuan Fan, Cen Chen, Wenmeng Zhou, Yaliang Li

    Abstract: Text-to-image diffusion models have shown the ability to learn a diverse range of concepts. However, it is worth noting that they may also generate undesirable outputs, consequently giving rise to significant security concerns. Specifically, issues such as Not Safe for Work (NSFW) content and potential violations of style copyright may be encountered. Since image generation is conditioned on text,…

    Submitted 2 August, 2024; originally announced August 2024.

  14. arXiv:2408.00521  [pdf, other]

    cs.AI

    A new approach for encoding code and assisting code understanding

    Authors: Mengdan Fan, Wei Zhang, Haiyan Zhao, Zhi Jin

    Abstract: Some companies (e.g., Microsoft Research and Google DeepMind) have identified limitations of the next-word-prediction autoregressive paradigm of GPTs, manifested in the models' lack of planning, working memory, backtracking, and reasoning skills. GPTs rely on a local and greedy process of generating the next word, without a global understanding of the task or the output. We have confirmed the abo…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages, 14 figures

  15. arXiv:2407.11633  [pdf, other]

    cs.CV

    Scaling Diffusion Transformers to 16 Billion Parameters

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

    Abstract: In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer, that is scalable and competitive with dense networks while exhibiting highly optimized inference. The DiT-MoE includes two simple designs: shared expert routing and expert-level balance loss, thereby capturing common knowledge and reducing redundancy among the different routed experts. When applied to conditional ima…

    Submitted 16 July, 2024; originally announced July 2024.

  16. arXiv:2407.11073  [pdf, other]

    cs.CR cs.CV cs.LG

    SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

    Authors: Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

    Abstract: Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following bre…

    Submitted 12 July, 2024; originally announced July 2024.

  17. arXiv:2407.06846  [pdf, other]

    cs.HC

    SilverCycling: Exploring the Impact of Bike-Based Locomotion on Spatial Orientation for Older Adults in VR

    Authors: Qiongyan Chen, Zhiqing Wu, Yucheng Liu, Lei Han, Zisu Li, Ge Lin Kan, Mingming Fan

    Abstract: Spatial orientation is essential for people to effectively navigate and interact with the environment in everyday life. With age-related cognitive decline, providing VR locomotion techniques with better spatial orientation performance for older adults becomes important. Such advancements not only make VR more accessible to older adults but also enable them to reap the potential health benefits of…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures

  18. arXiv:2406.08347  [pdf, other]

    cs.RO

    Trajectory optimization of tail-sitter considering speed constraints

    Authors: Mingyue Fan, Fangfang Xie, Tingwei Ji, Yao Zheng

    Abstract: Tail-sitters, with the advantages of both fixed-wing unmanned aerial vehicles (UAVs) and vertical take-off and landing UAVs, have been widely designed and researched in recent years. With the change in modern UAV application scenarios, UAVs are required to have fast, maneuverable three-dimensional flight capabilities. Due to the highly nonlinear aerodynamics produced by the fuselage and win…

    Submitted 23 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.01159  [pdf, other]

    cs.CV

    Dimba: Transformer-Mamba Diffusion Models

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang

    Abstract: This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba's sequentially stacked blocks alternate between Transformer and Mamba layers and integrate conditional information through the cross-attention layer, thus capitalizing on the advantages of both architectural paradigms. We investig…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2405.15780  [pdf, other]

    cs.CV cs.LG

    Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier

    Authors: Aristeidis Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, Moetasim Ashfaq, Ming Fan, Jong Youl Choi, Mohamed Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang

    Abstract: Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text have inspired scaling sequence lengths in ViTs, adapting these techniques to ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to…

    Submitted 17 April, 2024; originally announced May 2024.

  21. arXiv:2405.03782  [pdf, other]

    cs.LG cs.HC

    Interpretable Data Fusion for Distributed Learning: A Representative Approach via Gradient Matching

    Authors: Mengchen Fan, Baocheng Geng, Keren Li, Xueqian Wang, Pramod K. Varshney

    Abstract: This paper introduces a representative-based approach for distributed learning that transforms multiple raw data points into a virtual representation. Unlike traditional distributed learning methods such as Federated Learning, which do not offer human interpretability, our method makes complex machine learning processes accessible and comprehensible. It achieves this by condensing extensive datase…

    Submitted 6 May, 2024; originally announced May 2024.

  22. arXiv:2404.14712  [pdf, other]

    physics.ao-ph cs.AI cs.DC eess.IV physics.geo-ph

    ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

    Authors: Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong-Youl Choi, Ashwin Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

    Abstract: Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitati…

    Submitted 19 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  23. arXiv:2404.13358  [pdf, other]

    cs.SD cs.AI eess.AS

    Music Consistency Models

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. They have proven advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music…

    Submitted 20 April, 2024; originally announced April 2024.

  24. arXiv:2404.04478  [pdf, other]

    cs.CV

    Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

    Abstract: Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context tasks, such as high-resolution image generation. This paper introduces a series of architectures adapted from the RWKV model used in NLP, with requisite modifications tailored for diffusio…

    Submitted 5 April, 2024; originally announced April 2024.

  25. arXiv:2404.00502  [pdf, other]

    cs.LG math.NA

    Conditional Pseudo-Reversible Normalizing Flow for Surrogate Modeling in Quantifying Uncertainty Propagation

    Authors: Minglei Yang, Pengjun Wang, Ming Fan, Dan Lu, Yanzhao Cao, Guannan Zhang

    Abstract: We introduce a conditional pseudo-reversible normalizing flow for constructing surrogate models of a physical model polluted by additive noise to efficiently quantify forward and inverse uncertainty propagation. Existing surrogate modeling approaches usually focus on approximating the deterministic component of the physical model. However, this strategy necessitates knowledge of noise and resorts to a…

    Submitted 30 March, 2024; originally announced April 2024.

  26. arXiv:2403.16107  [pdf, other]

    cs.HC

    Designing Upper-Body Gesture Interaction with and for People with Spinal Muscular Atrophy in VR

    Authors: Jingze Tian, Yingna Wang, Keye Yu, Liyi Xu, Junan Xie, Franklin Mingzhe Li, Yafeng Niu, Mingming Fan

    Abstract: Recent research proposed gaze-assisted gestures to enhance interaction within virtual reality (VR), providing opportunities for people with motor impairments to experience VR. Compared to people with other motor impairments, those with Spinal Muscular Atrophy (SMA) exhibit enhanced distal limb mobility, providing them with more design space. However, it remains unknown what gaze-assisted upper-bod…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  27. arXiv:2403.09326  [pdf, other]

    cs.GR cs.AI

    HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

    Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang

    Abstract: We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To tackle the challenges of lacking fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce…

    Submitted 10 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 17 figures

    ACM Class: I.2.6; I.3.8

  28. Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring

    Authors: Can Liu, Siying Hu, Li Feng, Mingming Fan

    Abstract: Voice dictation is increasingly used for text entry, especially in mobile scenarios. However, the speech-based experience gets disrupted when users must go back to a screen and keyboard to review and edit the text. While existing dictation systems focus on improving transcription and error correction, little is known about how to support speech input for the entire text creation process, including…

    Submitted 8 March, 2024; originally announced March 2024.

    Journal ref: Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 338 (November 2022), 33 pages

  29. To Reach the Unreachable: Exploring the Potential of VR Hand Redirection for Upper Limb Rehabilitation

    Authors: Peixuan Xiong, Yukai Zhang, Nandi Zhang, Shihan Fu, Xin Li, Yadan Zheng, Jinni Zhou, Xiquan Hu, Mingming Fan

    Abstract: Rehabilitation therapies are widely employed to assist people with motor impairments in regaining control over their affected body parts. Nevertheless, factors such as fatigue and low self-efficacy can hinder patient compliance during extensive rehabilitation processes. Utilizing hand redirection in virtual reality (VR) enables patients to accomplish seemingly more challenging tasks, thereby bolst…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  30. arXiv:2403.05087  [pdf, other]

    cs.GR cs.CV

    SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

    Authors: Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang

    Abstract: We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: [CVPR 2024] Code and data are available at https://github.com/initialneil/SplattingAvatar

  31. LightSword: A Customized Virtual Reality Exergame for Long-Term Cognitive Inhibition Training in Older Adults

    Authors: Qiuxin Du, Zhen Song, Haiyan Jiang, Xiaoying Wei, Dongdong Weng, Mingming Fan

    Abstract: The decline of cognitive inhibition significantly impacts older adults' quality of life and well-being, making it a vital public health problem in today's aging society. Previous research has demonstrated that virtual reality (VR) exergames have great potential to enhance cognitive inhibition among older adults. However, existing commercial VR exergames were unsuitable for older adults' long-term…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 23 pages

    Journal ref: Proceedings of the CHI Conference on Human Factors in Computing Systems 2024 (CHI '24)

  32. arXiv:2402.15723  [pdf, other]

    cs.HC

    FetchAid: Making Parcel Lockers More Accessible to Blind and Low Vision People With Deep-learning Enhanced Touchscreen Guidance, Error-Recovery Mechanism, and AR-based Search Support

    Authors: Zhitong Guan, Zeyu Xiong, Mingming Fan

    Abstract: Parcel lockers have become an increasingly prevalent last-mile delivery method. Yet, a recent study revealed their accessibility challenges for blind and low-vision (BLV) people. Informed by the study, we designed FetchAid, a standalone intelligent mobile app assisting BLV people in using a parcel locker in real time by integrating computer vision and augmented reality (AR) technologies. FetchAid first uses…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  33. arXiv:2402.15719  [pdf, other]

    cs.HC

    "It Is Hard to Remove from My Eye": Design Makeup Residue Visualization System for Chinese Traditional Opera (Xiqu) Performers

    Authors: Zeyu Xiong, Shihan Fu, Yanying Zhu, Chenqing Zhu, Xiaojuan Ma, Mingming Fan

    Abstract: Chinese traditional opera (Xiqu) performers often experience skin problems due to the long-term use of heavy-metal-laden face paints. To explore the current skincare challenges encountered by Xiqu performers, we conducted an online survey (N=136) and semi-structured interviews (N=15) as a formative study. We found that incomplete makeup removal is the leading cause of human-induced skin problems,…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA

  34. arXiv:2402.05608  [pdf, other]

    cs.CV cs.MM

    Scalable Diffusion Models with State Space Backbone

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

    Abstract: This paper presents a new exploration into a category of diffusion models built upon state space architecture. We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space. Given its notable efficacy in accommodating long-range dependencies, Diffusion State Space Models (DiS) are dis…

    Submitted 28 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  35. arXiv:2402.04991  [pdf, other]

    cs.HC

    Exploring the Opportunity of Augmented Reality (AR) in Supporting Older Adults Explore and Learn Smartphone Applications

    Authors: Xiaofu Jin, Wai Tong, Xiaoying Wei, Xian Wang, Emily Kuang, Xiaoyu Mo, Huamin Qu, Mingming Fan

    Abstract: The global aging trend compels older adults to navigate the evolving digital landscape, presenting a substantial challenge in mastering smartphone applications. While Augmented Reality (AR) holds promise for enhancing learning and user experience, its role in aiding older adults' smartphone app exploration remains insufficiently explored. Therefore, we conducted a two-phase study: (1) a workshop w…

    Submitted 7 February, 2024; originally announced February 2024.

  36. arXiv:2401.14121  [pdf, other]

    cs.CV

    Incorporating Exemplar Optimization into Training with Dual Networks for Human Mesh Recovery

    Authors: Yongwei Nie, Mingxian Fan, Chengjiang Long, Qing Zhang, Jian Zhu, Xuemiao Xu

    Abstract: We propose a novel optimization-based human mesh recovery method from a single image. Given a test exemplar, previous approaches optimize the pre-trained regression network to minimize the 2D re-projection loss, but they suffer from over-/under-fitting problems. This is because the "exemplar optimization" at testing time is too weakly related to the pre-training process, and the exemplar op…

    Submitted 25 January, 2024; originally announced January 2024.

  37. arXiv:2401.05739  [pdf, other]

    cs.SE cs.CR

    Cross-Inlining Binary Function Similarity Detection

    Authors: Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, Ting Liu

    Abstract: Binary function similarity detection plays an important role in a wide range of security applications. Existing works usually assume that the query function and target function share equal semantics and compare their full semantics to obtain the similarity. However, we find that the function mapping is more complex, especially when function inlining happens. In this paper, we will systematically…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted at ICSE 2024 (Second Cycle). Camera-ready version

  38. arXiv:2401.02037   

    cs.IT

    Simplified Information Geometry Approach for Massive MIMO-OFDM Channel Estimation -- Part II: Convergence Analysis

    Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, Xiqi Gao, Xiang-Gen Xia, Dirk Slock

    Abstract: In Part II of this two-part paper, we prove the convergence of the simplified information geometry approach (SIGA) proposed in Part I. For a general Bayesian inference problem, we first show that the iteration of the common second-order natural parameter (SONP) is separated from that of the common first-order natural parameter (FONP). Hence, the convergence of the common SONP can be checked indepe…

    Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: I'm merging the two parts of this paper (arXiv:2401.02035 and arXiv:2401.02037). The combined paper will appear as v2 of arXiv:2401.02035, so I need to withdraw this paper.

  39. arXiv:2401.02035  [pdf, ps, other]

    cs.IT

    Efficient Information Geometry Approach for Massive MIMO-OFDM Channel Estimation

    Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, An-An Lu, Wen Zhong, Xiqi Gao, Xiaohu You, Xiang-Gen Xia, Dirk Slock

    Abstract: We investigate the channel estimation for massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We revisit the information geometry approach (IGA) for massive MIMO-OFDM channel estimation. By using the constant magnitude property of the entries of the measurement matrix, we find that the second-order natural parameters of the distributions on all th…

    Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  40. arXiv:2312.14611  [pdf, other]

    cs.CV

    Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

    Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, some existing methods fine-tune the entire model or the textual embedding for structural consistency, but they are time-consuming and fail to perform non…

    Submitted 22 December, 2023; originally announced December 2023.

  41. arXiv:2312.03987  [pdf, other]

    cs.CL cs.AI

    Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

    Authors: Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du

    Abstract: Entity resolution (ER) is an important data integration task with a wide spectrum of applications. The state-of-the-art solutions on ER rely on pre-trained language models (PLMs), which require fine-tuning on a lot of labeled matching/non-matching entity pairs. Recently, large language models (LLMs), such as GPT-4, have shown the ability to perform many tasks without tuning model parameters, whic…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 14 pages, 7 figures

  42. arXiv:2311.15830  [pdf, other]

    cs.SD cs.CV eess.AS

    A-JEPA: Joint-Embedding Predictive Architecture Can Listen

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: This paper shows that the masked-modeling principle driving the success of large foundational vision models can be effectively applied to audio by making predictions in a latent space. We introduce Audio-based Joint-Embedding Predictive Architecture (A-JEPA), a simple extension method for self-supervised learning from the audio spectrum. Following the design of I-JEPA, our A-JEPA encodes visibl…

    Submitted 11 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.06405 by other authors

  43. arXiv:2311.11269  [pdf, other]

    cs.HC cs.MM

    OperARtistry: An AR-based Interactive Application to Assist the Learning of Chinese Traditional Opera (Xiqu) Makeup

    Authors: Zeyu Xiong, Shihan Fu, Mingming Fan

    Abstract: Chinese Traditional Opera (Xiqu) is an important type of intangible cultural heritage, and one key characteristic of Xiqu is the facial visual effects achieved through makeup. However, the Xiqu makeup process, especially the eye-area makeup process, is complex and time-consuming, which poses a learning challenge for potential younger inheritors. We introduce OperARtistry, an interactive application based…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 11 pages, 9 figures, In Proceedings of The Eleventh International Symposium of Chinese CHI (Chinese CHI 2023)

  44. arXiv:2311.06423  [pdf, other]

    cs.LG cs.CR cs.CV

    Flatness-aware Adversarial Attack

    Authors: Mingyuan Fan, Xiaodan Li, Cen Chen, Yinggui Wang

    Abstract: The transferability of adversarial examples can be exploited to launch black-box attacks. However, adversarial examples often exhibit poor transferability. To alleviate this issue, input-regularization-based methods have been proposed, motivated by the observation that input diversity can boost transferability; these methods craft adversarial examples by combining several transformed inputs. We reveal that input regula…

    Submitted 10 November, 2023; originally announced November 2023.

  45. CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming

    Authors: Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, Zhicong Lu

    Abstract: Natural language (NL) programming has become more approachable due to the powerful code-generation capability of large language models (LLMs). This shift to using NL to program enhances collaborative programming by reducing communication barriers and context-switching among programmers from varying backgrounds. However, programmers may face challenges during prompt engineering in a collaborative s…

    Submitted 1 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  46. uxSense: Supporting User Experience Analysis with Visualization and Computer Vision

    Authors: Andrea Batch, Yipeng Ji, Mingming Fan, Jian Zhao, Niklas Elmqvist

    Abstract: Analyzing user behavior from usability evaluation can be a challenging and time-consuming task, especially as the number of participants and the scale and complexity of the evaluation grow. We propose uxSense, a visual analytics system using machine learning methods to extract user behavior from audio and video recordings as parallel time-stamped data streams. Our implementation draws on pattern…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 21 pages, 14 figures

    Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2023

  47. arXiv:2310.06959  [pdf, ps, other]

    cs.PL

    Proof Repair across Quotient Type Equivalences

    Authors: Cosmo Viola, Max Fan, Talia Ringer

    Abstract: Proofs in proof assistants like Coq can be brittle, breaking easily in response to changes in the terms and types those proofs depend on. To address this, recent work introduced an algorithm and tool in Coq to automatically repair broken proofs in response to changes that correspond to type equivalences. However, many changes remained out of the scope of this algorithm and tool -- especially chang…

    Submitted 18 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: for associated code, see https://github.com/InnovativeInventor/proof-repair-quotients

  48. arXiv:2310.04610  [pdf, other]

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  49. arXiv:2309.11816  [pdf, other]

    cs.HC

    Designing Loving-Kindness Meditation in Virtual Reality for Long-Distance Romantic Relationships

    Authors: Xian Wang, Xiaoyu Mo, Lik-Hang Lee, Xiaoying Wei, Xiaofu Jin, Mingming Fan, Pan Hui

    Abstract: Loving-kindness meditation (LKM) is used in clinical psychology for couples' relationship therapy, but physical isolation can strain the relationship and make LKM inaccessible. Virtual reality (VR) can provide immersive LKM activities for long-distance couples. However, no suitable commercial VR applications exist for long-distance couples to engage in LKM activities together. This paper organ…

    Submitted 21 September, 2023; originally announced September 2023.

  50. arXiv:2309.07408  [pdf, other]

    cs.RO

    An Explicit Method for Fast Monocular Depth Recovery in Corridor Environments

    Authors: Yehao Liu, Ruoyan Xia, Xiaosu Xu, Zijian Wang, Yiqing Ya, Mingze Fan

    Abstract: Monocular cameras are extensively employed in indoor robotics, but their performance is limited in visual odometry, depth estimation, and related applications due to the absence of scale information. Depth estimation refers to the process of estimating a dense depth map from the corresponding input image; existing research mostly addresses this problem through deep learning-based approaches, yet the…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures. arXiv admin note: text overlap with arXiv:2111.08600 by other authors