Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 2,586 results for author: Yang, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.02371  [pdf, other

    cs.CV

    OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

    Authors: Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai

    Abstract: Text-to-video (T2V) generation has recently garnered significant attention thanks to the large multi-modality model Sora. However, T2V generation still faces two important challenges: 1) Lacking a precise open sourced high-quality dataset. The previous popular video datasets, e.g. WebVid-10M and Panda-70M, are either with low quality or too large for most research institutions. Therefore, it is ch… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 15 pages, 9 figures

  2. arXiv:2407.02353  [pdf, other

    eess.SP cs.AR eess.SY

    Roadmap to Neuromorphic Computing with Emerging Technologies

    Authors: Adnan Mehonic, Daniele Ielmini, Kaushik Roy, Onur Mutlu, Shahar Kvatinsky, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, Sabina Spiga, Sergey Savelev, Alexander G Balanov, Nitin Chawla, Giuseppe Desoli, Gerardo Malavena, Christian Monzio Compagnoni, Zhongrui Wang, J Joshua Yang, Ghazi Sarwat Syed, Abu Sebastian, Thomas Mikolajick, Beatriz Noheda, Stefan Slesazeck, Bernard Dieny, Tuo-Hung, Hou, Akhil Varri , et al. (28 additional authors not shown)

    Abstract: The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining t… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 90 pages, 22 figures, roadmap

  3. arXiv:2407.01960  [pdf, other

    cs.CV cs.LG

    Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

    Authors: Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang

    Abstract: Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various image restoration and enhancement tasks without training. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on a pre-trained i… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 19 pages

  4. arXiv:2407.01945  [pdf, other

    cs.CV

    Indoor 3D Reconstruction with an Unknown Camera-Projector Pair

    Authors: Zhaoshuai Qi, Yifeng Hao, Rui Hu, Wenyou Chang, Jiaqi Yang, Yanning Zhang

    Abstract: Structured light-based method with a camera-projector pair (CPP) plays a vital role in indoor 3D reconstruction, especially for scenes with weak textures. Previous methods usually assume known intrinsics, which are pre-calibrated from known objects, or self-calibrated from multi-view observations. It is still challenging to reliably recover CPP intrinsics from only two views without any known obje… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  5. arXiv:2407.01479  [pdf, other

    cs.RO cs.LG

    EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

    Authors: Jingyun Yang, Zi-ang Cao, Congyue Deng, Rika Antonova, Shuran Song, Jeannette Bohg

    Abstract: Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning. We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning. Our approach combines SIM(3)-equivariant neural network architectures with diffusion models… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  6. arXiv:2407.01262  [pdf, other

    cs.LG

    Complementary Fusion of Deep Network and Tree Model for ETA Prediction

    Authors: YuRui Huang, Jie Zhang, HengDa Bao, Yang Yang, Jian Yang

    Abstract: Estimated time of arrival (ETA) is a very important factor in the transportation system. It has attracted increasing attentions and has been widely used as a basic service in navigation systems and intelligent transportation systems. In this paper, we propose a novel solution to the ETA estimation problem, which is an ensemble on tree models and neural networks. We proved the accuracy and robustne… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.00696  [pdf, other

    cs.LG

    Graph in Graph Neural Network

    Authors: Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song

    Abstract: Existing Graph Neural Networks (GNNs) are limited to process graphs each of whose vertices is represented by a vector or a single value, limited their representing capability to describe complex objects. In this paper, we propose the first GNN (called Graph in Graph Neural (GIG) Network) which can process graph-style data (called GIG sample) whose vertices are further represented by graphs. Given… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    MSC Class: 68T05

  8. UWBAD: Towards Effective and Imperceptible Jamming Attacks Against UWB Ranging Systems with COTS Chips

    Authors: Yuqiao Yang, Zhongjie Wu, Yongzhao Zhang, Ting Chen, Jun Li, Jie Yang, Wenhao Liu, Xiaosong Zhang, Ruicong Shi, Jingwei Li, Yu Jiang, Zhuo Su

    Abstract: UWB ranging systems have been adopted in many critical and security sensitive applications due to its precise positioning and secure ranging capabilities. We present a practical jamming attack, namely UWBAD, against commercial UWB ranging systems, which exploits the vulnerability of the adoption of the normalized cross-correlation process in UWB ranging and can selectively and quickly block rangin… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

  9. arXiv:2406.20081  [pdf, other

    cs.CV cs.LG

    Segment Anything without Supervision

    Authors: XuDong Wang, Jingfeng Yang, Trevor Darrell

    Abstract: The Segmentation Anything Model (SAM) requires labor-intensive data labeling. We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation that does not require human annotations. UnSAM utilizes a divide-and-conquer strategy to "discover" the hierarchical structure of visual scenes. We first leverage top-down clustering methods to partition an unlabeled image into inst… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/frank-xwang/UnSAM

  10. arXiv:2406.20078  [pdf, other

    cs.CV

    GM-DF: Generalized Multi-Scenario Deepfake Detection

    Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

    Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of de… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  11. arXiv:2406.19398  [pdf, other

    cs.CV cs.GR

    Woven Fabric Capture with a Reflection-Transmission Photo Pair

    Authors: Yingjie Tang, Zixuan Li, Miloš Hašan, Jian Yang, Beibei Wang

    Abstract: Digitizing woven fabrics would be valuable for many applications, from digital humans to interior design. Previous work introduces a lightweight woven fabric acquisition approach by capturing a single reflection image and estimating the fabric parameters with a differentiable geometric and shading model. The renderings of the estimated fabric parameters can closely match the photo; however, the ca… ▽ More

    Submitted 1 July, 2024; v1 submitted 4 May, 2024; originally announced June 2024.

    Comments: 10 pages, 16 figures (in the main paper). Accepted by SIGGRAPH 2024 conference

  12. arXiv:2406.19263  [pdf, other

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.18544  [pdf, other

    cs.CV cs.GR

    GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors

    Authors: Zuo-Liang Zhu, Beibei Wang, Jian Yang

    Abstract: 3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. Inspired by previous works, the signed… ▽ More

    Submitted 22 May, 2024; originally announced June 2024.

  14. arXiv:2406.18294  [pdf, other

    cs.CL

    Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

    Authors: Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

    Abstract: Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code completion. However, in real-world development scenarios, simply concatenating the entire code repository often exceeds the context window limits of these Repo-Code L… ▽ More

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.18284  [pdf, other

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  16. arXiv:2406.17960  [pdf, other

    cs.CV cs.AI

    MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Mengjiao Shen, Jingwei Yang, Chengju Liu, Qijun Chen

    Abstract: Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.17588  [pdf, other

    cs.CL

    LongIns: A Challenging Long-context Instruction-based Exam for LLMs

    Authors: Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

    Abstract: The long-context capabilities of large language models (LLMs) have been a hot topic in recent years. To evaluate the performance of LLMs in different scenarios, various assessment benchmarks have emerged. However, as most of these benchmarks focus on identifying key information to answer questions, which mainly requires the retrieval ability of LLMs, these benchmarks can partially represent the re… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  18. arXiv:2406.17289  [pdf, other

    cs.IR cs.AI

    Hyperbolic Knowledge Transfer in Cross-Domain Recommendation System

    Authors: Xin Yang, Heng Chang, Zhijian La, Jinze Yang, Xingrun Li, Yu Lu, Shuaiqiang Wang, Dawei Yin, Erxue Min

    Abstract: Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and it has been gaining more attention in recent years. Although there have been notable advancements in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  19. arXiv:2406.17005  [pdf, other

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo , et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  20. arXiv:2406.16860  [pdf, other

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  21. arXiv:2406.16852  [pdf, other

    cs.CV

    Long Context Transfer from Language to Vision

    Authors: Peiyuan Zhang, Kaichen Zhang, Bo Li, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu

    Abstract: Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backb… ▽ More

    Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Code, demo, and models are available at https://github.com/EvolvingLMMs-Lab/LongVA

  22. arXiv:2406.16564  [pdf, other

    cs.CV

    FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud

    Authors: Yirui Chen, Pengjin Wei, Zhenhuan Liu, Bingchao Wang, Jie Yang, Wei Liu

    Abstract: Producing traversability maps and understanding the surroundings are crucial prerequisites for autonomous navigation. In this paper, we address the problem of traversability assessment using point clouds. We propose a novel pillar feature extraction module that utilizes PointNet to capture features from point clouds organized in vertical volume and a 2D encoder-decoder structure to conduct travers… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to ECAI2023 Our code is publicly available at [this](https://github.com/chenyirui/FASTC)

  23. arXiv:2406.16531  [pdf, other

    cs.CV

    GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

    Authors: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

    Abstract: The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and location(IMDL). However, the lack of a large-scale data foundation makes IMDL task unattainable. In this paper, a local manipulation pipeline is designed… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Code page: https://github.com/chenyirui/GIM

  24. arXiv:2406.16441  [pdf, other

    cs.CL

    UniCoder: Scaling Code Large Language Model via Universal Code

    Authors: Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

    Abstract: Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural lan… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 (Main)

  25. arXiv:2406.16137  [pdf, other

    cs.CV

    MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

    Authors: Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

    Abstract: Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed fo… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  26. arXiv:2406.15501  [pdf

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.14825   

    cs.CL

    TemPrompt: Multi-Task Prompt Learning for Temporal Relation Extraction in RAG-based Crowdsourcing Systems

    Authors: Jing Yang, Yu Zhao, Yang Linyao, Xiao Wang, Long Chen, Fei-Yue Wang

    Abstract: Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus shape the workflow of associated tasks, so it holds promise in helping understand task requests initiated by requesters in crowdsourcing systems. However, existing methods still struggle with limited and unevenly distributed annotated data. Therefore, inspired by the abundant global knowledge stored withi… ▽ More

    Submitted 30 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: I submitted the manuscript without obtaining consent from all co-authors

  28. arXiv:2406.14675  [pdf, other

    cs.CV cs.AI cs.LG

    This Looks Better than That: Better Interpretable Models with ProtoPNeXt

    Authors: Frank Willard, Luke Moffett, Emmanuel Mokel, Jon Donnelly, Stark Guo, Julia Yang, Giyoung Kim, Alina Jade Barnett, Cynthia Rudin

    Abstract: Prototypical-part models are a popular interpretable alternative to black-box deep learning models for computer vision. However, they are difficult to train, with high sensitivity to hyperparameter tuning, inhibiting their application to new datasets and our understanding of which methods truly improve their performance. To facilitate the careful study of prototypical-part networks (ProtoPNets), w… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  29. arXiv:2406.14544  [pdf, other

    cs.CV cs.CL

    Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

    Authors: Yuxuan Qiao, Haodong Duan, Xinyu Fang, Junming Yang, Lin Chen, Songyang Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which requires strong perception and reasoning faculties. Assessing these two competencies independently is crucial for model refinement, despite the inherent difficulty due to the intertwined nature of seeing and reasoning in existing VLMs. To tackle this issue, we present Prism, an in… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  30. arXiv:2406.14401  [pdf, other

    cs.LG cs.AI

    Fair Streaming Feature Selection

    Authors: Zhangling Duan, Tianci Li, Xingyu Wu, Zhaolong Ling, Jingye Yang, Zhaohong Jia

    Abstract: Streaming feature selection techniques have become essential in processing real-time data streams, as they facilitate the identification of the most relevant attributes from continuously updating information. Despite their performance, current algorithms to streaming feature selection frequently fall short in managing biases and avoiding discrimination that could be perpetuated by sensitive attrib… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 30 pages, 10 figures

  31. arXiv:2406.14232  [pdf, other

    cs.LG cs.AI

    Enhancing robustness of data-driven SHM models: adversarial training with circle loss

    Authors: Xiangli Yang, Xijie Deng, Hanwei Zhang, Yang Zou, Jianxi Yang

    Abstract: Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  32. arXiv:2406.13606  [pdf, other

    cs.CV

    DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning

    Authors: Xiaowen Ma, Jiawei Yang, Rui Che, Huanting Zhang, Wei Zhang

    Abstract: Remote sensing change detection (RSCD) aims to identify the changes of interest in a region by analyzing multi-temporal remote sensing images, and has an outstanding value for local development monitoring. Existing RSCD methods are devoted to contextual modeling in the spatial domain to enhance the changes of interest. Despite the satisfactory performance achieved, the lack of knowledge in the fre… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICME 2024 Oral

  33. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  34. arXiv:2406.12809  [pdf, other

    cs.CL

    Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

    Authors: Zhe Yang, Yichang Zhang, Tianyu Liu, Jian Yang, Junyang Lin, Chang Zhou, Zhifang Sui

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g. LLMs can react differently to disturbances like rephrasing or inconsequential order change). In addition to these inconsistencies, we also observe that LLMs, while capable of solving hard problems, can paradoxically fail at easier ones. To evaluate this hard-to-easy inconsistenc… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 25 pages, 12 figures, 10 tables

  35. arXiv:2406.12703  [pdf, other

    eess.IV cs.CV

    Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

    Authors: Jincheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yinping Zhao, Xin Yuan

    Abstract: We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-b… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, Accepted by ICIP2024

  36. arXiv:2406.12588  [pdf, other

    cs.LG cs.AI cs.CR stat.ML

    UIFV: Data Reconstruction Attack in Vertical Federated Learning

    Authors: Jirui Yang, Peng Chen, Zhihui Lu, Qiang Duan, Yubing Bao

    Abstract: Vertical Federated Learning (VFL) facilitates collaborative machine learning without the need for participants to share raw private data. However, recent studies have revealed privacy risks where adversaries might reconstruct sensitive features through data leakage during the learning process. Although data reconstruction methods based on gradient or model information are somewhat effective, they… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  37. arXiv:2406.12297  [pdf, other

    cs.LG cs.AI

    Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

    Authors: Ji Xu, Tianlong Xiao, Jinye Yang, Panpan Zhu

    Abstract: Density peaks clustering (DP) has the ability of detecting clusters of arbitrary shape and clustering non-Euclidean space data, but its quadratic complexity in both computing and storage makes it difficult to scale for big data. Various approaches have been proposed in this regard, including MapReduce based distribution computing, multi-core parallelism, presentation transformation (e.g., kd-tree,… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper presents a novel approach FaithPDP that takes advantages of both hardware (multi-core architecture of CPU) and modern programming language (Python or Matlab for efficient vector and matrix computation) to achieve clustering result identical to vanilla DP algorithm, while the computing complexity is reduced to pseudo-linear

  38. arXiv:2406.12095  [pdf, other

    cs.CV cs.AI cs.RO

    DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

    Authors: Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

    Abstract: We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  39. arXiv:2406.12009  [pdf, other

    cs.CL

    FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure

    Authors: Ziyue Xu, Peilin Zhou, Xinyu Shi, Jiageng Wu, Yikang Jiang, Bin Ke, Jie Yang

    Abstract: Accurate and transparent financial information disclosure is crucial in the fields of accounting and finance, ensuring market efficiency and investor confidence. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors through an online question-and-… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  40. arXiv:2406.11739  [pdf, other

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou , et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  41. arXiv:2406.11519  [pdf, other

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  42. arXiv:2406.11414  [pdf, other

    cs.LO cs.AI

    Formally Certified Approximate Model Counting

    Authors: Yong Kiam Tan, Jiong Yang, Mate Soos, Magnus O. Myreen, Kuldeep S. Meel

    Abstract: Approximate model counting is the task of approximating the number of solutions to an input Boolean formula. The state-of-the-art approximate model counter for formulas in conjunctive normal form (CNF), ApproxMC, provides a scalable means of obtaining model counts with probably approximately correct (PAC)-style guarantees. Nevertheless, the validity of ApproxMC's approximation relies on a careful… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: The extended version, including the appendix, of the paper to be published in CAV24. The associated artifact is available at https://doi.org/10.5281/zenodo.10948839

  43. arXiv:2406.11301  [pdf, other

    cs.AI cs.CL cs.LG

    Optimizing and Testing Instruction-Following: Analyzing the Impact of Fine-Grained Instruction Variants on instruction-tuned LLMs

    Authors: Jiuding Yang, Weidong Guo, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu

    Abstract: The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation techniqu… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  44. arXiv:2406.11214  [pdf, other

    cs.CL

    Global Data Constraints: Ethical and Effectiveness Challenges in Large Language Model

    Authors: Jin Yang, Zhiqiang Wang, Yanbin Lin, Zunduo Zhao

    Abstract: The efficacy and ethical integrity of large language models (LLMs) are profoundly influenced by the diversity and quality of their training datasets. However, the global landscape of data accessibility presents significant challenges, particularly in regions with stringent data privacy laws or limited open-source information. This paper examines the multifaceted challenges associated with acquirin… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, and 5 tables

  45. arXiv:2406.10932  [pdf, other

    cs.SD cs.AI eess.AS

    Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

    Authors: Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Weiping Wen

    Abstract: Speech recognition is an essential start ring of human-computer interaction, and recently, deep learning models have achieved excellent success in this task. However, when the model training and private data provider are always separated, some security threats that make deep neural networks (DNNs) abnormal deserve to be researched. In recent years, the typical backdoor attacks have been researched… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  46. arXiv:2406.10765  [pdf, other

    cs.DC

    PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

    Authors: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu, Hong An, Jinlong Yang

    Abstract: First-principles density functional theory (DFT) with plane wave (PW) basis set is the most widely used method in quantum mechanical material simulations due to its advantages in accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  47. arXiv:2406.10304  [pdf, other

    cs.CL

    Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

    Authors: Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee

    Abstract: Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: to be published in Interspeech 2024

  48. arXiv:2406.10262  [pdf, other

    cs.IR cs.AI math.OC stat.CO

    Fast solution to the fair ranking problem using the Sinkhorn algorithm

    Authors: Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa, Yuichi Takano

    Abstract: In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impac… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  49. arXiv:2406.09798  [pdf, other

    cs.RO cs.CV

    Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

    Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang

    Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VL… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024. The code is available at https://github.com/MrZihan/Sim2Real-VLN-3DFF

  50. arXiv:2406.09601  [pdf, other

    cs.CV

    Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos

    Authors: Qingyuan Liu, Pengyuan Shi, Yun-Yun Tsai, Chengzhi Mao, Junfeng Yang

    Abstract: The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. Recent works to combat Deepfakes videos have developed detectors that are highly accurate at identifying GAN-generated samples. However, the robustness of these detectors on diffusion-generated videos generated from video creation tools (e.g., S… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.