
Showing 1–50 of 68 results for author: Kuang, Z

Searching in archive cs.
  1. arXiv:2407.07351  [pdf, other]

    cs.CV

    Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification

    Authors: Zhenyu Kuang, Hongyang Zhang, Lidong Cheng, Yinhao Liu, Yue Huang, Xinghao Ding

    Abstract: Generalizable vehicle re-identification (ReID) aims to enable the well-trained model in diverse source domains to broadly adapt to unknown target domains without additional fine-tuning or retraining. However, it still faces the challenges of domain shift problem and has difficulty accurately generalizing to unknown target domains. This limitation occurs because the model relies heavily on primary…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2406.17219  [pdf, other]

    cs.CV

    Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction

    Authors: Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen, Chao Hu, Jun Yu

    Abstract: The unprecedented capture and application of face images raise increasing concerns on anonymization to fight against privacy disclosure. Most existing methods may suffer from the problem of excessive change of the identity-independent information or insufficient identity protection. In this paper, we present a new face anonymization approach by distracting the intrinsic and extrinsic identity atte…

    Submitted 6 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen, Chao Hu, Jun Yu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 12406-12415 Date of Conference: 17-21 June 2024 Conference Location: Seattle, USA

  3. arXiv:2406.13919  [pdf, other]

    cs.AI

    SPL: A Socratic Playground for Learning Powered by Large Language Model

    Authors: Liang Zhang, Jionghao Lin, Ziyi Kuang, Sheng Xu, Mohammed Yeasin, Xiangen Hu

    Abstract: Dialogue-based Intelligent Tutoring Systems (ITSs) have significantly advanced adaptive and personalized learning by automating sophisticated human tutoring strategies within interactive dialogues. However, replicating the nuanced patterns of expert human communication remains a challenge in Natural Language Processing (NLP). Recent advancements in NLP, particularly Large Language Models (LLMs) su…

    Submitted 20 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2405.17414  [pdf, other]

    cs.CV cs.GR

    Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

    Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

    Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene…

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2404.17805  [pdf, other]

    cs.LG cs.CV

    From Optimization to Generalization: Fair Federated Learning against Quality Shift via Inter-Client Sharpness Matching

    Authors: Nannan Wu, Zhuo Kuang, Zengqiang Yan, Li Yu

    Abstract: Due to escalating privacy concerns, federated learning has been recognized as a vital approach for training deep neural networks with decentralized medical data. In practice, it is challenging to ensure consistent imaging quality across various institutions, often attributed to equipment malfunctions affecting a minority of clients. This imbalance in image quality can cause the federated model to…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: This paper is accepted at IJCAI'24 (Main Track)

  6. arXiv:2404.10318  [pdf, other]

    cs.CV

    SRGS: Super-Resolution 3D Gaussian Splatting

    Authors: Xiang Feng, Yongbo He, Yubo Wang, Yan Yang, Wen Li, Yifei Chen, Zhenzhong Kuang, Jiajun Ding, Jianping Fan, Yu Jun

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has gained popularity as a novel explicit 3D representation. This approach relies on the representation power of Gaussian primitives to provide a high-quality rendering. However, primitives optimized at low resolution inevitably exhibit sparsity and texture deficiency, posing a challenge for achieving high-resolution novel view synthesis (HRNVS). To address t…

    Submitted 18 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The first to focus on the HRNVS of 3DGS

  7. arXiv:2403.18356  [pdf, other]

    cs.CV

    MonoHair: High-Fidelity Hair Modeling from a Monocular Video

    Authors: Keyu Wu, Lingchen Yang, Zhiyi Kuang, Yao Feng, Xutao Han, Yuefan Shen, Hongbo Fu, Kun Zhou, Youyi Zheng

    Abstract: Undoubtedly, high-fidelity 3D hair is crucial for achieving realism, artistic expression, and immersion in computer graphics. While existing 3D hair modeling methods have achieved impressive performance, the challenge of achieving high-quality hair reconstruction persists: they either require strict capture conditions, making practical applications difficult, or heavily rely on learned prior data,…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE CVPR 2024

  8. arXiv:2403.05574  [pdf, other]

    cs.HC cs.AI cs.CL

    HealMe: Harnessing Cognitive Reframing in Large Language Models for Psychotherapy

    Authors: Mengxi Xiao, Qianqian Xie, Ziyan Kuang, Zhicheng Liu, Kailai Yang, Min Peng, Weiguang Han, Jimin Huang

    Abstract: Large Language Models (LLMs) can play a vital role in psychotherapy by adeptly handling the crucial task of cognitive reframing and overcoming challenges such as shame, distrust, therapist skill variability, and resource scarcity. Previous LLMs in cognitive reframing mainly converted negative emotions to positive ones, but these approaches have limited efficacy, often not promoting clients' self-d…

    Submitted 29 July, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

    Comments: 19 pages, 4 figures

    ACM Class: J.4

  9. arXiv:2402.12659  [pdf, other]

    cs.CL cs.AI cs.CE

    FinBen: A Holistic Financial Benchmark for Large Language Models

    Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, et al. (9 additional authors not shown)

    Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures

  10. arXiv:2401.15365  [pdf, other]

    cs.CV

    An open dataset for oracle bone script recognition and decipherment

    Authors: Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Jinpeng Wan, Haisu Guan, Zhebin Kuang, Lianwen Jin, Xiang Bai, Yuliang Liu

    Abstract: Oracle Bone Script (OBS), one of the earliest known forms of ancient Chinese writing, holds invaluable insights into the humanities and geography of the Shang Dynasty, dating back 3,000 years. The immense historical and cultural significance of these writings cannot be overstated. However, the passage of time has obscured much of their meaning, presenting a significant challenge in deciphering the…

    Submitted 5 June, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

  11. arXiv:2401.12467  [pdf, other]

    cs.AI

    An open dataset for the evolution of oracle bone characters: EVOBC

    Authors: Haisu Guan, Jinpeng Wan, Yuliang Liu, Pengjie Wang, Kaile Zhang, Zhebin Kuang, Xinyu Wang, Xiang Bai, Lianwen Jin

    Abstract: The earliest extant Chinese characters originate from oracle bone inscriptions, which are closely related to other East Asian languages. These inscriptions hold immense value for anthropology and archaeology. However, deciphering oracle bone script remains a formidable challenge, with only approximately 1,600 of the over 4,500 extant characters elucidated to date. Further scholarly investigation i…

    Submitted 13 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  12. arXiv:2401.08123  [pdf, other]

    cs.CV

    The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation

    Authors: Xinni Jiang, Zengsheng Kuang, Chunle Guo, Ruixun Zhang, Lei Cai, Xiao Fan, Chongyi Li

    Abstract: Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene. Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection. In this study, we rethink some essential components in GDSR netwo…

    Submitted 16 January, 2024; originally announced January 2024.

  13. arXiv:2312.14460  [pdf, other]

    cs.CE

    Quantum computing with error mitigation for data-driven computational mechanics

    Authors: Zengtao Kuang, Yongchun Xu, Qun Huang, Jie Yang, Chafik El Kihal, Heng Hu

    Abstract: As a crossover frontier of physics and mechanics, quantum computing is showing its great potential in computational mechanics. However, quantum hardware noise remains a critical barrier to achieving accurate simulation results due to the limitation of the current hardware level. In this paper, we integrate error-mitigated quantum computing in data-driven computational mechanics, where the zero-noi…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: 26 pages, 15 figures

  14. arXiv:2312.12122  [pdf, other]

    cs.CV cs.GR

    ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields

    Authors: Xiang Feng, Yongbo He, Yubo Wang, Chengkai Wang, Zhenzhong Kuang, Jiajun Ding, Feiwei Qin, Jun Yu, Jianping Fan

    Abstract: Neural Radiance Fields (NeRF) have achieved great success in the task of synthesizing novel views that preserve the same resolution as the training views. However, it is challenging for NeRF to synthesize high-quality high-resolution novel views with low-resolution training data. To solve this problem, we propose a zero-shot super-resolution training framework for NeRF. This framework aims to guid…

    Submitted 19 December, 2023; originally announced December 2023.

  15. arXiv:2311.17857  [pdf, other]

    cs.CV cs.GR

    Gaussian Shell Maps for Efficient 3D Human Generation

    Authors: Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein

    Abstract: Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering th…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page: https://rameenabdal.github.io/GaussianShellMaps/

  16. arXiv:2311.13199  [pdf, other]

    cs.CV

    Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image

    Authors: Zijian Kuang, Lihang Ying, Shi Jin, Li Cheng

    Abstract: Although the Pixel-aligned Implicit Function (PIFu) effectively captures subtle variations in body shape within a low-dimensional space through extensive training with human 3D scans, its application to live animals presents formidable challenges due to the difficulty of obtaining animal cooperation for 3D scanning. To address this challenge, we propose the combination of two-stage supervised and self-supervis…

    Submitted 19 February, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  17. arXiv:2311.11590  [pdf]

    cs.CV cs.AI

    Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models

    Authors: Zheyuan Kuang, Jiaxin Zhang, Yiying Huang, Yunqin Li

    Abstract: Urban renewal and transformation processes necessitate the preservation of the historical urban fabric, particularly in districts known for their architectural and historical significance. These regions, with their diverse architectural styles, have traditionally required extensive preliminary research, often leading to subjective results. However, the advent of machine learning models has opened…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: HABITS OF THE ANTHROPOCENE - Proceedings of the 43rd ACADIA Conference - Volume II: Proceedings book one, University of Colorado Denver, Denver, Colorado, USA, 26-28 October 2023, pp. 616-625, CUMINCAD, 2023

  18. arXiv:2311.06697  [pdf, other]

    cs.CL

    Trusted Source Alignment in Large Language Models

    Authors: Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner

    Abstract: Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA eva…

    Submitted 11 November, 2023; originally announced November 2023.

  19. arXiv:2310.16044  [pdf, other]

    cs.CV cs.GR

    Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark

    Authors: Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu

    Abstract: We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark. Recent advances in inverse rendering have enabled a wide range of real-world applications in 3D content generation, moving rapidly from research and commercial use cases to consumer devices. While the results continue to improve, there is no real-world benchmark that can quantitatively assess and compare the perform…

    Submitted 16 January, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track. The first two authors contributed equally to this work. Project page: https://stanfordorb.github.io/

  20. arXiv:2309.15646  [pdf, other]

    cs.IR cs.LG

    Cold & Warm Net: Addressing Cold-Start Users in Recommender Systems

    Authors: Xiangyu Zhang, Zongqiang Kuang, Zehao Zhang, Fan Huang, Xianfeng Tan

    Abstract: Cold-start recommendation is one of the major challenges faced by recommender systems (RS). Herein, we focus on the user cold-start problem. Recently, methods utilizing side information or meta-learning have been used to model cold-start users. However, it is difficult to deploy these methods to industrial RS. There has not been much research that pays attention to the user cold-start problem in t…

    Submitted 27 September, 2023; originally announced September 2023.

  21. MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models

    Authors: Kailai Yang, Tianlin Zhang, Ziyan Kuang, Qianqian Xie, Jimin Huang, Sophia Ananiadou

    Abstract: With the development of web technology, social media texts are becoming a rich source for automatic mental health analysis. As traditional discriminative methods bear the problem of low interpretability, the recent large language models have been explored for interpretable mental health analysis on social media, which aims to provide detailed explanations along with predictions. The results show t…

    Submitted 3 February, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted by WWW 2024

  22. Self-supervised Learning of Rotation-invariant 3D Point Set Features using Transformer and its Self-distillation

    Authors: Takahiko Furuya, Zhoujie Chen, Ryutarou Ohbuchi, Zhenzhong Kuang

    Abstract: Invariance against rotations of 3D objects is an important property in analyzing 3D point set data. Conventional 3D point set DNNs having rotation invariance typically obtain accurate 3D shape features via supervised learning by using labeled 3D point sets as training samples. However, due to the rapid increase in 3D point set data and the high cost of labeling, a framework to learn rotation-invar…

    Submitted 18 April, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted to the CVIU journal

  23. SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding

    Authors: Vasilisa Bashlovkina, Riley Matthews, Zhaobin Kuang, Simon Baumgartner, Michael Bendersky

    Abstract: We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, economically, and politically important domain. We quantify the degree to which social media language differs from conventional language and conclu…

    Submitted 30 June, 2023; originally announced July 2023.

  24. arXiv:2306.08754  [pdf, other]

    cs.LG physics.ao-ph

    ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation

    Authors: Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I. Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Benjamin R. Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahmed, Elizabeth Barnes, et al. (22 additional authors not shown)

    Abstract: Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML…

    Submitted 8 July, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: This manuscript is an expanded version of our paper that received the Outstanding Paper Award at the NeurIPS 2023 conference

  25. arXiv:2306.08305  [pdf, other]

    cs.CE

    Quantum Computing Enhanced Distance-Minimizing Data-Driven Computational Mechanics

    Authors: Yongchun Xu, Jie Yang, Zengtao Kuang, Qun Huang, Wei Huang, Heng Hu

    Abstract: The distance-minimizing data-driven computational mechanics has great potential in engineering applications by eliminating material modeling error and uncertainty. In this computational framework, the solution-seeking procedure relies on minimizing the distance between the constitutive database and the conservation law. However, the distance calculation is time-consuming and often takes up most of…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 22 pages, 14 figures

  26. arXiv:2304.03347  [pdf, other]

    cs.CL

    Towards Interpretable Mental Health Analysis with Large Language Models

    Authors: Kailai Yang, Shaoxiong Ji, Tianlin Zhang, Qianqian Xie, Ziyan Kuang, Sophia Ananiadou

    Abstract: The latest large language models (LLMs) such as ChatGPT, exhibit strong capabilities in automated mental health analysis. However, existing relevant studies bear several limitations, including inadequate evaluations, lack of prompting strategies, and ignorance of exploring LLMs for explainability. To bridge these gaps, we comprehensively evaluate the mental health analysis and emotional reasoning…

    Submitted 11 October, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Accepted by EMNLP 2023 main conference as a long paper

  27. arXiv:2301.06267  [pdf, other]

    cs.CV cs.AI cs.LG cs.SD eess.AS

    Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

    Authors: Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, Deva Ramanan

    Abstract: The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be sufficient to characterize an entire concept class. In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, w…

    Submitted 2 August, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

    Comments: CVPR 2023. Project website: https://linzhiqiu.github.io/papers/cross_modal/

  28. arXiv:2212.10699  [pdf, other]

    cs.CV cs.GR

    PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields

    Authors: Zhengfei Kuang, Fujun Luan, Sai Bi, Zhixin Shu, Gordon Wetzstein, Kalyan Sunkavalli

    Abstract: Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) base…

    Submitted 24 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  29. arXiv:2210.00434  [pdf, other]

    eess.AS cs.AI cs.CL cs.MM cs.SD

    Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings

    Authors: Zhihuan Kuang, Shi Zong, Jianbing Zhang, Jiajun Chen, Hongfu Liu

    Abstract: In this paper, we consider a novel research problem: music-to-text synaesthesia. Different from the classical music tagging problem that classifies a music recording into pre-defined categories, music-to-text synaesthesia aims to generate descriptive texts from music recordings with the same sentiment for further understanding. As existing music-related datasets do not contain the semantic descrip…

    Submitted 7 May, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

  30. arXiv:2205.05150  [pdf, other]

    cs.IT physics.optics

    Bounds on the Coupling Strengths of Communication Channels and Their Information Capacities

    Authors: Zeyu Kuang, David A. B. Miller, Owen D. Miller

    Abstract: The concept of optimal communication channels shapes our understanding of wave-based communication. Its analysis, however, always pertains to specific communication-domain geometries, without a general theory of scaling laws or fundamental limits. In this article, we derive shape-independent bounds on the coupling strengths and information capacities of optimal communication channels for any two d…

    Submitted 10 May, 2022; originally announced May 2022.

  31. arXiv:2201.02533  [pdf, other]

    cs.CV

    NeROIC: Neural Rendering of Objects from Online Image Collections

    Authors: Zhengfei Kuang, Kyle Olszewski, Menglei Chai, Zeng Huang, Panos Achlioptas, Sergey Tulyakov

    Abstract: We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects from photographs with varying cameras, illumination, and backgrounds. This enables various object-centric rendering applications such as novel-view synthesis, relighting, and harmonized background composition from challenging in-the…

    Submitted 1 September, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: SIGGRAPH 2022 (Journal Track). Project page: https://formyfamily.github.io/NeROIC/ Code repository: https://github.com/snap-research/NeROIC/

  32. arXiv:2112.07431  [pdf, other]

    cs.CV

    Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

    Authors: Yi Li, Yiqun Duan, Zhanghui Kuang, Yimin Chen, Wayne Zhang, Xiaomeng Li

    Abstract: Weakly-Supervised Semantic Segmentation (WSSS) segments objects without a heavy burden of dense annotation. As a price, however, the generated pseudo-masks contain obvious noisy pixels, which result in sub-optimal segmentation models trained over these pseudo-masks. Yet few studies notice or work on this problem, even though these noisy pixels remain inevitable after their improvements to the pseudo-masks. So we try to…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI 2022. Code is available at https://github.com/XMed-Lab/URN

  33. arXiv:2112.06910  [pdf, other]

    cs.CV

    DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points

    Authors: Zhengfei Kuang, Jiaman Li, Mingming He, Tong Wang, Yajie Zhao

    Abstract: Establishing dense correspondence between two images is a fundamental computer vision problem, which is typically tackled by matching local feature descriptors. However, without global awareness, such local features are often insufficient for disambiguating similar regions. And computing the pairwise feature correlation across images is both computation-expensive and memory-intensive. To make the…

    Submitted 21 December, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  34. arXiv:2111.06593  [pdf, other]

    q-bio.QM cs.LG

    Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence

    Authors: Yanyi Ding, Zhiyi Kuang, Yuxin Pei, Jeff Tan, Ziyu Zhang, Joseph Konan

    Abstract: SARS-CoV-2 is an upper respiratory system RNA virus that has caused over 3 million deaths and infected over 150 million people worldwide as of May 2021. With thousands of strains sequenced to date, SARS-CoV-2 mutations pose significant challenges to scientists in keeping pace with vaccine development and public health measures. Therefore, an efficient method of identifying the divergence of lab samples…

    Submitted 12 November, 2021; originally announced November 2021.

  35. arXiv:2111.01258  [pdf, other]

    cs.RO eess.SY

    Safe Online Gain Optimization for Variable Impedance Control

    Authors: Changhao Wang, Zhian Kuang, Xiang Zhang, Masayoshi Tomizuka

    Abstract: Smooth behaviors are preferable for many contact-rich manipulation tasks. Impedance control arises as an effective way to regulate robot movements by mimicking a mass-spring-damping system. Consequently, the robot behavior can be determined by the impedance gains. However, tuning the impedance gains for different tasks is tricky, especially for unstructured environments. Moreover, online adapting…

    Submitted 1 November, 2021; originally announced November 2021.

  36. arXiv:2108.12995  [pdf, other]

    cs.CV

    Pseudo-mask Matters in Weakly-supervised Semantic Segmentation

    Authors: Yi Li, Zhanghui Kuang, Liyang Liu, Yimin Chen, Wayne Zhang

    Abstract: Most weakly supervised semantic segmentation (WSSS) methods follow a pipeline that first generates pseudo-masks and then trains the segmentation model with the pseudo-masks in a fully supervised manner. However, we find some matters related to the pseudo-masks, including high quality pseudo-masks generation from class activation maps (CAMs), and training with noisy pseudo-mask supervision. Fo…

    Submitted 7 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021

  37. arXiv:2108.06543  [pdf, other]

    cs.CV

    MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

    Authors: Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

    Abstract: We present MMOCR-an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction. MMOCR implements 14 state-of-the-art algorithms, which is significantly more than all the existing open-source OCR projects we are aware of to date. To facilitate future research and in…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: Accepted to ACM MM (Open Source Competition Track)

  38. arXiv:2108.01684  [pdf, other]

    cs.CV

    Vision Transformer with Progressive Sampling

    Authors: Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

    Abstract: Transformers with powerful global relation modeling abilities have been introduced to fundamental computer vision tasks recently. As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens. However, such nai…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021

  39. arXiv:2108.00708  [pdf, other]

    cs.CV cs.LG

    Group Fisher Pruning for Practical Network Compression

    Authors: Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang

    Abstract: Network compression has been widely studied since it is able to reduce the memory and computation cost during inference. However, previous methods seldom deal with complicated structures like residual connections, group/depth-wise convolution and feature pyramid network, where channels of multiple layers are coupled and need to be pruned simultaneously. In this paper, we present a general channel…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: ICML2021; Code: https://github.com/jshilong/FisherPruning

  40. arXiv:2106.04004  [pdf, other]

    cs.CV cs.GR

    Task-Generic Hierarchical Human Motion Prior using VAEs

    Authors: Jiaman Li, Ruben Villegas, Duygu Ceylan, Jimei Yang, Zhengfei Kuang, Hao Li, Yajie Zhao

    Abstract: A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks, such as providing robustness to video-based human pose estimation, predicting complete body movements for motion capture systems during occlusions, and assisting key frame animation with plausible movements. In this paper, we present a method for learning complex human m…

    Submitted 7 June, 2021; originally announced June 2021.

  41. arXiv:2105.07145  [pdf, other]

    cs.RO

    Development of Soft Tactile Sensor for Force Measurement and Position Detection

    Authors: Wu-Te Yang, Zhian Kuang, Changhao Wang, Masayoshi Tomizuka

    Abstract: As more robots are implemented for contact-rich tasks, tactile sensors are in increasing demand. For many circumstances, the contact is required to be compliant, and soft sensors are in need. This paper introduces a novelly designed soft sensor that can simultaneously estimate the contact force and contact location. Inspired by humans' skin, which contains multi-layers of receptors, the designed t…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: 8 pages, 8 figures

  42. arXiv:2104.10442  [pdf, other]

    cs.CV

    Fourier Contour Embedding for Arbitrary-Shaped Text Detection

    Authors: Yiqin Zhu, Jianyong Chen, Lingyu Liang, Zhanghui Kuang, Lianwen Jin, Wayne Zhang

    Abstract: One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances. Most of existing methods model text instances in image spatial domain via masks or contour point sequences in the Cartesian or the polar coordinate system. However, the mask representation might lead to expensive post-process…

    Submitted 22 April, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Accepted by CVPR 2021

  43. arXiv:2104.09752  [pdf, other]

    cs.CV

    Flow-based Video Segmentation for Human Head and Shoulders

    Authors: Zijian Kuang, Xinran Tie

    Abstract: Video segmentation for the human head and shoulders is essential in creating elegant media for videoconferencing and virtual reality applications. The main challenge is to process high-quality background subtraction in a real-time manner and address the segmentation issues under motion blurs, e.g., shaking the head or waving hands during conference video. To overcome the motion blur problem in vid…

    Submitted 20 April, 2021; originally announced April 2021.

  44. arXiv:2103.14470  [pdf, other]

    cs.CV

    Spatial Dual-Modality Graph Reasoning for Key Information Extraction

    Authors: Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang

    Abstract: Key information extraction from document images is of paramount importance in office automation. Conventional template matching based approaches fail to generalize well to document images of unseen templates, and are not robust against text recognition errors. In this paper, we propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured…

    Submitted 26 March, 2021; originally announced March 2021.

  45. arXiv:2103.13477  [pdf, ps, other]

    cs.MM cs.CV

    A Survey of Multimedia Technologies and Robust Algorithms

    Authors: Zijian Kuang, Xinran Tie

    Abstract: Multimedia technologies are now more practical and deployable in real life, and the algorithms are widely used in various researching areas such as deep learning, signal processing, haptics, computer vision, robotics, and medical multimedia processing. This survey provides an overview of multimedia technologies and robust algorithms in multimedia data processing, medical multimedia processing, hum…

    Submitted 25 March, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: text overlap with arXiv:2010.12968

  46. arXiv:2102.06838  [pdf, other]

    cs.RO

    Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks

    Authors: Xiang Zhang, Liting Sun, Zhian Kuang, Masayoshi Tomizuka

    Abstract: Many manipulation tasks require robots to interact with unknown environments. In such applications, the ability to adapt the impedance according to different task phases and environment constraints is crucial for safety and performance. Although many approaches based on deep reinforcement learning (RL) and learning from demonstration (LfD) have been proposed to obtain variable impedance skills on…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted by IEEE Robotics and Automation Letters. Feb 2020

  47. arXiv:2102.03540  [pdf, other]

    eess.SY cs.RO

    Practical Fractional-Order Variable-Gain Super-Twisting Control with Application to Wafer Stages of Photolithography Systems

    Authors: Zhian Kuang, Liting Sun, Huijun Gao, Masayoshi Tomizuka

    Abstract: In this paper, a practical fractional-order variable-gain super-twisting algorithm (PFVSTA) is proposed to improve the tracking performance of wafer stages for semiconductor manufacturing. Based on the sliding mode control (SMC), the proposed PFVSTA enhances the tracking performance from three aspects: 1) alleviating the chattering phenomenon via super-twisting algorithm and a novel fractional-ord…

    Submitted 6 February, 2021; originally announced February 2021.

    Comments: This paper has been accepted by IEEE Trans. Mechatronics

  48. arXiv:2102.03531  [pdf, other]

    cs.RO eess.SY

    Feedback-based Digital Higher-order Terminal Sliding Mode for 6-DOF Industrial Manipulators

    Authors: Zhian Kuang, Xiang Zhang, Liting Sun, Huijun Gao, Masayoshi Tomizuka

    Abstract: The precise motion control of a multi-degree of freedom (DOF) robot manipulator is always challenging due to its nonlinear dynamics, disturbances, and uncertainties. Because most manipulators are controlled by digital signals, a novel higher-order sliding mode controller in the discrete-time form with time delay estimation is proposed in this paper. The dynamic model of the manipulator used in the…

    Submitted 6 February, 2021; originally announced February 2021.

    Comments: This paper has been accepted by American Control Conference 2021

  49. arXiv:2102.00368  [pdf, other]

    eess.SY cs.RO

    Precise Motion Control of Wafer Stages via Adaptive Neural Network and Fractional-Order Super-Twisting Algorithm

    Authors: Zhian Kuang, Liting Sun, Huijun Gao, Masayoshi Tomizuka

    Abstract: To obtain precise motion control of wafer stages, an adaptive neural network and fractional-order super-twisting control strategy is proposed. Based on sliding mode control (SMC), the proposed controller aims to address two challenges in SMC: 1) reducing the chattering phenomenon, and 2) attenuating the influence of model uncertainties and disturbances. For the first challenge, a fractional-order…

    Submitted 30 January, 2021; originally announced February 2021.

    Comments: Published in IFAC World Congress 2020

  50. arXiv:2012.06737  [pdf, other]

    cs.CV

    Computer Vision and Normalizing Flow-Based Defect Detection

    Authors: Zijian Kuang, Xinran Tie, Lihang Ying, Shi Jin

    Abstract: Visual defect detection is critical to ensure the quality of most products. However, the majority of small and medium-sized manufacturing enterprises still rely on tedious and error-prone human manual inspection. The main reasons include: 1) the existing automated visual defect detection systems require altering production assembly lines, which is time consuming and expensive 2) the existing syste…

    Submitted 13 February, 2022; v1 submitted 12 December, 2020; originally announced December 2020.