
Showing 1–50 of 51 results for author: Niu, M

Searching in archive cs.
  1. arXiv:2407.15435  [pdf, other]

    cs.CV

    Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures

    Authors: Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang

    Abstract: The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. It also plays an important role in urban planning, architectural design, and the city's promotion, especially in protecting historical and cultural relics. The 3D Gaussian Splatting, due to better performance over NeRF, has become a mainstream t…

    Submitted 22 July, 2024; originally announced July 2024.

  2. arXiv:2407.13185  [pdf, other]

    cs.CV

    KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter

    Authors: Yifan Zhan, Zhuoxiao Li, Muyao Niu, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng

    Abstract: We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering. Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions. We introduce a novel plug-in Kalman filter guided d…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: accepted to eccv2024

  3. arXiv:2407.11345  [pdf, other]

    cs.CL cs.SD eess.AS

    Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models

    Authors: Matthew Perez, Aneesha Sampath, Minxue Niu, Emily Mower Provost

    Abstract: Aphasia is a language disorder that can lead to speech errors known as paraphasias, which involve the misuse, substitution, or invention of words. Automatic paraphasia detection can help those with Aphasia by facilitating clinical assessment and treatment planning options. However, most automatic paraphasia detection works have focused solely on binary detection, which involves recognizing only th…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.10267  [pdf, other]

    cs.CV

    RS-NeRF: Neural Radiance Fields from Rolling Shutter Images

    Authors: Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng

    Abstract: Neural Radiance Fields (NeRFs) have become increasingly popular because of their impressive ability for novel view synthesis. However, their effectiveness is hindered by the Rolling Shutter (RS) effects commonly found in most camera systems. To solve this, we present RS-NeRF, a method designed to synthesize normal images from novel views using input with RS distortions. This involves a physical mo…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 ; Codes and data: https://github.com/MyNiuuu/RS-NeRF

  5. arXiv:2407.01583  [pdf, other]

    quant-ph cs.LG math.NA physics.data-an

    Optimal Low-Depth Quantum Signal-Processing Phase Estimation

    Authors: Yulong Dong, Jonathan A. Gross, Murphy Yuezhen Niu

    Abstract: Quantum effects like entanglement and coherent amplification can be used to drastically enhance the accuracy of quantum parameter estimation beyond classical limits. However, challenges such as decoherence and time-dependent errors hinder Heisenberg-limited amplification. We introduce Quantum Signal-Processing Phase Estimation algorithms that are robust against these challenges and achieve optimal…

    Submitted 17 June, 2024; originally announced July 2024.

    Comments: 53 pages, 21 figures. arXiv admin note: substantial text overlap with arXiv:2209.11207

  6. arXiv:2405.20279  [pdf, other]

    cs.CV cs.AI eess.IV

    CV-VAE: A Compatible Video VAE for Latent Generative Video Models

    Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

    Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latent ex…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

  7. arXiv:2405.20222  [pdf, other]

    cs.CV cs.AI

    MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

    Authors: Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng

    Abstract: We present MOFA-Video, an advanced controllable image animation method that generates video from the given image using various additional controllable signals (such as human landmarks reference, manual trajectories, and another even provided video) or their combinations. This is different from previous methods which only can work on a specific motion domain or show weak control abilities with diff…

    Submitted 11 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: ECCV 2024 ; Project Page: https://myniuuu.github.io/MOFA_Video/ ; Codes: https://github.com/MyNiuuu/MOFA-Video

  8. arXiv:2405.19327  [pdf, other]

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl…

    Submitted 10 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  9. arXiv:2405.17042  [pdf, other]

    cs.LG cs.CR

    LabObf: A Label Protection Scheme for Vertical Federated Learning Through Label Obfuscation

    Authors: Ying He, Mingyang Niu, Jingyu Hua, Yunlong Mao, Xu Huang, Chen Li, Sheng Zhong

    Abstract: Split Neural Network, as one of the most common architectures used in vertical federated learning, is popular in industry due to its privacy-preserving characteristics. In this architecture, the party holding the labels seeks cooperation from other parties to improve model performance due to insufficient feature data. Each of these participants has a self-defined bottom model to learn hidden repre…

    Submitted 22 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2405.06545  [pdf, other]

    cs.CL cs.LG

    Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval

    Authors: Mengjia Niu, Hao Li, Jie Shi, Hamed Haddadi, Fan Mo

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various domains, although their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare. To address this issue, retrieving relevant facts from knowledge graphs (KGs) is considered a promising method. Existing KG-augmented approaches tend to be resource-intens…

    Submitted 10 May, 2024; originally announced May 2024.

    ACM Class: I.2.7; H.3.3

  11. arXiv:2404.18369  [pdf, other]

    quant-ph cs.ET

    A SAT Scalpel for Lattice Surgery: Representation and Synthesis of Subroutines for Surface-Code Fault-Tolerant Quantum Computing

    Authors: Daniel Bochen Tan, Murphy Yuezhen Niu, Craig Gidney

    Abstract: Quantum error correction is necessary for large-scale quantum computing. A promising quantum error correcting code is the surface code. For this code, fault-tolerant quantum computing (FTQC) can be performed via lattice surgery, i.e., splitting and merging patches of code. Given the frequent use of certain lattice-surgery subroutines (LaS), it becomes crucial to optimize their design in order to m…

    Submitted 17 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: To appear in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)

  12. arXiv:2402.07200  [pdf, other]

    cs.CV cs.LG cs.NE

    Outlier-Aware Training for Low-Bit Quantization of Structural Re-Parameterized Networks

    Authors: Muqun Niu, Yuan Ren, Boyu Li, Chenchen Ding

    Abstract: Lightweight design of Convolutional Neural Networks (CNNs) requires co-design efforts in the model architectures and compression techniques. As a novel design paradigm that separates training and inference, a structural re-parameterized (SR) network such as the representative RepVGG revitalizes the simple VGG-like network with a high accuracy comparable to advanced and often more complicated netwo…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: 8 pages, 8 figures

  13. arXiv:2401.17207  [pdf, other]

    cs.CV

    Self-Supervised Representation Learning for Nerve Fiber Distribution Patterns in 3D-PLI

    Authors: Alexander Oberstrass, Sascha E. A. Muenzing, Meiqi Niu, Nicola Palomero-Gallagher, Christian Schiffer, Markus Axer, Katrin Amunts, Timo Dickscheid

    Abstract: A comprehensive understanding of the organizational principles in the human brain requires, among other factors, well-quantifiable descriptors of nerve fiber architecture. Three-dimensional polarized light imaging (3D-PLI) is a microscopic imaging technique that enables insights into the fine-grained organization of myelinated nerve fibers with high resolution. Descriptors characterizing the fiber…

    Submitted 30 January, 2024; originally announced January 2024.

  14. arXiv:2312.16486  [pdf, other]

    cs.CV cs.AI

    PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

    Authors: Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu

    Abstract: Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges. However, their reliance on substantial computational resources and extensive data collection remains a bottleneck. On the other hand, the integration of existing diffusion models, each specialized for different controls and oper…

    Submitted 28 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 16 pages, 16 figures

  15. arXiv:2312.10209  [pdf, other]

    cs.HC cs.LG

    Beyond Empirical Windowing: An Attention-Based Approach for Trust Prediction in Autonomous Vehicles

    Authors: Minxue Niu, Zhaobo Zheng, Kumar Akash, Teruhisa Misu

    Abstract: Humans' internal states play a key role in human-machine interaction, leading to the rise of human state estimation as a prominent field. Compared to swift state changes such as surprise and irritation, modeling gradual states like trust and satisfaction are further challenged by label sparsity: long time-series signals are usually associated with a single label, making it difficult to identify th…

    Submitted 16 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  16. arXiv:2311.02181  [pdf, other]

    math.OC cs.AI cs.LG

    Joint Problems in Learning Multiple Dynamical Systems

    Authors: Mengjia Niu, Xiaoyu He, Petr Ryšavý, Quan Zhou, Jakub Marecek

    Abstract: Clustering of time series is a well-studied problem, with applications ranging from quantitative, personalized models of metabolism obtained from metabolite concentrations to state discrimination in quantum information theory. We consider a variant, where given a set of trajectories and a number of parts, we jointly partition the set of trajectories and learn linear dynamical system (LDS) models f…

    Submitted 23 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

  17. arXiv:2310.05900  [pdf, other]

    quant-ph cs.LG

    Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network

    Authors: Johannes Bausch, Andrew W Senior, Francisco J H Heras, Thomas Edlich, Alex Davies, Michael Newman, Cody Jones, Kevin Satzinger, Murphy Yuezhen Niu, Sam Blackwell, George Holland, Dvir Kafri, Juan Atalaya, Craig Gidney, Demis Hassabis, Sergio Boixo, Hartmut Neven, Pushmeet Kohli

    Abstract: Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On di…

    Submitted 9 October, 2023; originally announced October 2023.

    MSC Class: 81P73; 68T07

    ACM Class: I.2.0; J.2

  18. arXiv:2309.05845  [pdf, other]

    cs.LG cs.AI

    Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data

    Authors: Mengjia Niu, Yuchen Zhao, Hamed Haddadi

    Abstract: Multivariate time series (MTS) data collected from multiple sensors provide the potential for accurate abnormal activity detection in smart healthcare scenarios. However, anomalies exhibit diverse patterns and become unnoticeable in MTS data. Consequently, achieving accurate anomaly detection is challenging since we have to capture both temporal dependencies of time series and inter-relationships…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Poster accepted by the 29th Annual International Conference On Mobile Computing And Networking (ACM MobiCom 2023)

    ACM Class: J.3; I.2.6

  19. arXiv:2308.08767  [pdf, other]

    eess.AS cs.SD

    Graph Neural Network Backend for Speaker Recognition

    Authors: Liang He, Ruida Li, Mengqi Niu

    Abstract: Currently, most speaker recognition backends, such as cosine, linear discriminant analysis (LDA), or probabilistic linear discriminant analysis (PLDA), make decisions by calculating similarity or distance between enrollment and test embeddings which are already extracted from neural networks. However, for each embedding, the local structure of itself and its neighbor embeddings in the low-dimensio…

    Submitted 16 August, 2023; originally announced August 2023.

  20. arXiv:2305.18771  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    SFCNeXt: a simple fully convolutional network for effective brain age estimation with small sample size

    Authors: Yu Fu, Yanyan Huang, Shunjie Dong, Yalin Wang, Tianbai Yu, Meng Niu, Cheng Zhuo

    Abstract: Deep neural networks (DNN) have been designed to predict the chronological age of a healthy brain from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as a valuable biomarker for the early detection of development-related or aging-related disorders. Recent DNN models for brain age estimations usually rely too much on large sample sizes and complex network s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: This paper has been accepted by IEEE ISBI 2023

  21. arXiv:2303.11642  [pdf, other]

    cs.CV eess.IV

    Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark

    Authors: Muyao Niu, Zhuoxiao Li, Zhihang Zhong, Yinqiang Zheng

    Abstract: Seeing-in-the-dark is one of the most important and challenging computer vision tasks due to its wide applications and extreme complexities of in-the-wild scenarios. Existing arts can be mainly divided into two threads: 1) RGB-dependent methods restore information using degraded RGB inputs only (e.g., low-light enhancement), 2) RGB-independent methods translate images captured under auxiliary near-…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  22. arXiv:2303.01788  [pdf, other]

    cs.CV

    Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

    Authors: Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

    Abstract: Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time. However, these algorithms are designed for different tasks mostly not within the scope of autonomous driving, thus making it hard to compare multi-task methods in autonomous driving. Aiming to enable the comprehensive evaluation of p…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023

  23. arXiv:2301.12581  [pdf, other]

    stat.ML cs.LG

    Intrinsic Bayesian Optimisation on Complex Constrained Domain

    Authors: Yuan Liu, Mu Niu, Claire Miller

    Abstract: Motivated by the success of Bayesian optimisation algorithms in the Euclidean space, we propose a novel approach to construct Intrinsic Bayesian optimisation (In-BO) on manifolds with a primary focus on complex constrained domains or irregular-shaped spaces arising as submanifolds of $\mathbb{R}^2$, $\mathbb{R}^3$ and beyond. Data may be collected in a spatial domain but restricted to a complex or intricately structured…

    Submitted 29 January, 2023; originally announced January 2023.

  24. arXiv:2301.06533  [pdf, other]

    stat.ML cs.LG stat.ME

    Intrinsic Gaussian Process on Unknown Manifolds with Probabilistic Metrics

    Authors: Mu Niu, Zhenwen Dai, Pokman Cheung, Yizhu Wang

    Abstract: This article presents a novel approach to construct Intrinsic Gaussian Processes for regression on unknown manifolds with probabilistic metrics (GPUM) in point clouds. In many real world applications, one often encounters high dimensional data (e.g. point cloud data) centred around some lower dimensional unknown manifolds. The geometry of manifold is in general different from the usual Euclidean g…

    Submitted 16 January, 2023; originally announced January 2023.

  25. arXiv:2212.13886  [pdf, other]

    math.OC cs.LG

    Extrinsic Bayesian Optimizations on Manifolds

    Authors: Yihao Fang, Mu Niu, Pokman Cheung, Lizhen Lin

    Abstract: We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Ga…

    Submitted 28 December, 2022; v1 submitted 21 December, 2022; originally announced December 2022.

  26. arXiv:2212.01241  [pdf, other]

    cs.PF

    MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications

    Authors: Cheng Xu, Xiaofeng Hou, Jiacheng Liu, Chao Li, Tianhao Huang, Xiaozhi Zhu, Mo Niu, Lingyu Sun, Peng Tang, Tongqiao Xu, Kwang-Ting Cheng, Minyi Guo

    Abstract: The explosive growth of various types of big data and advances in AI technologies have catalyzed a new type of workloads called multi-modal DNNs. Multi-modal DNNs are capable of interpreting and reasoning about information from multiple modalities, making them more applicable to real-world AI scenarios. In recent research, multi-modal DNNs have outperformed the best uni-modal DNN in a wide range o…

    Submitted 28 August, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

  27. arXiv:2210.14231  [pdf]

    eess.IV cs.LG

    NAS-PRNet: Neural Architecture Search generated Phase Retrieval Net for Off-axis Quantitative Phase Imaging

    Authors: Xin Shu, Mengxuan Niu, Yi Zhang, Renjie Zhou

    Abstract: Single neural networks have achieved simultaneous phase retrieval with aberration compensation and phase unwrapping in off-axis Quantitative Phase Imaging (QPI). However, when designing the phase retrieval neural network architecture, the trade-off between computation latency and accuracy has been largely neglected. Here, we propose Neural Architecture Search (NAS) generated Phase Retrieval Net (N…

    Submitted 25 October, 2022; originally announced October 2022.

  28. arXiv:2209.11207  [pdf, other]

    quant-ph cs.LG

    Beyond Heisenberg Limit Quantum Metrology through Quantum Signal Processing

    Authors: Yulong Dong, Jonathan Gross, Murphy Yuezhen Niu

    Abstract: Leveraging quantum effects in metrology such as entanglement and coherence allows one to measure parameters with enhanced sensitivity. However, time-dependent noise can disrupt such Heisenberg-limited amplification. We propose a quantum-metrology method based on the quantum-signal-processing framework to overcome these realistic noise-induced limitations in practical quantum metrology. Our algorit…

    Submitted 22 September, 2022; originally announced September 2022.

  29. arXiv:2207.14482  [pdf, other]

    cs.AR quant-ph

    Domain-Specific Quantum Architecture Optimization

    Authors: Wan-Hsuan Lin, Bochen Tan, Murphy Yuezhen Niu, Jason Kimko, Jason Cong

    Abstract: With the steady progress in quantum computing over recent years, roadmaps for upscaling quantum processors have relied heavily on the targeted qubit architectures. So far, similarly to the early age of classical computing, these designs have been crafted by human experts. These general-purpose architectures, however, leave room for customization and optimization, especially when targeting popular…

    Submitted 29 July, 2022; originally announced July 2022.

  30. arXiv:2205.03983  [pdf, other]

    cs.CL cs.AI cs.LG

    Building Machine Translation Systems for the Next Thousand Languages

    Authors: Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

    Abstract: In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing…

    Submitted 6 July, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: V2: updated with some details from 24-language Google Translate launch in May 2022. V3: spelling corrections, additional acknowledgements

  31. arXiv:2203.13394  [pdf, other]

    cs.CV

    Point2Seq: Detecting 3D Objects as Sequences

    Authors: Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei Zhang, Xiaogang Wang, Xinchao Wang

    Abstract: We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds. In contrast to previous methods that normally predict attributes of 3D objects all at once, we expressively model the interdependencies between attributes of 3D objects, which in turn enables a better detection accuracy. Specifically, we view each 3D object as a sequence of words and reformul…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: To appear in CVPR2022

  32. arXiv:2202.06767  [pdf, other]

    cs.CV cs.LG

    Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

    Authors: Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, Hang Xu

    Abstract: Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success heavily relies on the scale of pre-trained cross-modal datasets. However, the lack of large-scale datasets and benchmarks in Chinese hinders the development of Chinese VLP models and broader multilingual applications. In this work, we release a large-scale Chinese cross-modal data…

    Submitted 28 September, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted by NeurIPS 2022 Track Datasets and Benchmarks

  33. arXiv:2111.07783  [pdf, other]

    cs.CV cs.LG

    FILIP: Fine-grained Interactive Language-Image Pre-Training

    Authors: Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu

    Abstract: Unsupervised large-scale vision-language pre-training has shown promising advances on various downstream tasks. Existing methods often model the cross-modal interaction either via the similarity of the global feature of each modality which misses sufficient information, or finer-grained interactions using cross/self-attention upon visual and textual tokens. However, cross/self-attention suffers fr…

    Submitted 9 November, 2021; originally announced November 2021.

  34. arXiv:2109.02499  [pdf, other]

    cs.CV

    Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

    Authors: Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, Chunjing Xu

    Abstract: We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds. Current approaches generally rely on the points or voxels of interest for RoI feature extraction on the second stage, but cannot effectively handle the sparsity and non-uniform distribution of those points, and this may result in failures in detecting objects that are far…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: To appear at ICCV 2021

  35. arXiv:2109.02497  [pdf, other]

    cs.CV

    Voxel Transformer for 3D Object Detection

    Authors: Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu

    Abstract: We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds. Conventional 3D convolutional backbones in voxel-based 3D detectors cannot efficiently capture large context information, which is crucial for object recognition and localization, owing to the limited receptive fields. In this paper, we resolve the problem by intro…

    Submitted 13 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: To appear at ICCV 2021

  36. arXiv:2106.11037  [pdf, other]

    cs.CV

    One Million Scenes for Autonomous Driving: ONCE Dataset

    Authors: Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu

    Abstract: Current perception models in autonomous driving have become notorious for greatly relying on a mass of annotated data to cover unseen cases and address the long-tail problem. On the other hand, learning from unlabeled large-scale collected data and incrementally self-training powerful recognition models have received increasing attention and may become the solutions of next-generation industry-lev…

    Submitted 25 October, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Datasets and Benchmarks Track

  37. arXiv:2106.00610  [pdf, other]

    eess.SP cs.SD eess.AS

    Deep Learning for Depression Recognition with Audiovisual Cues: A Review

    Authors: Lang He, Mingyue Niu, Prayag Tiwari, Pekka Marttinen, Rui Su, Jiewei Jiang, Chenguang Guo, Hongyu Wang, Songtao Ding, Zhongmin Wang, Wei Dang, Xiaoying Pan

    Abstract: With the acceleration of the pace of work and life, people have to face more and more pressure, which increases the possibility of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor-patient ratio in the world. Promisingly, physiological and psychological studies have indicated some differences in speech and facial express…

    Submitted 27 May, 2021; originally announced June 2021.

  38. arXiv:2104.00426  [pdf, other]

    cs.CL cs.AI

    WakaVT: A Sequential Variational Transformer for Waka Generation

    Authors: Yuka Takeishi, Mingxuan Niu, Jing Luo, Zhong Jin, Xinyu Yang

    Abstract: Poetry generation has long been a challenge for artificial intelligence. In the scope of Japanese poetry generation, many researchers have paid attention to Haiku generation, but few have focused on Waka generation. To further explore the creative potential of natural language generation systems in Japanese poetry creation, we propose a novel Waka generation model, WakaVT, which automatically prod…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: This paper has been submitted to Neural Processing Letters

  39. arXiv:2102.07809  [pdf, ps, other]

    cs.GT econ.TH physics.soc-ph

    Best vs. All: Equity and Accuracy of Standardized Test Score Reporting

    Authors: Sampath Kannan, Mingzi Niu, Aaron Roth, Rakesh Vohra

    Abstract: We study a game theoretic model of standardized testing for college admissions. Students are of two types; High and Low. There is a college that would like to admit the High type students. Students take a potentially costly standardized exam which provides a noisy signal of their type. The students come from two populations, which are identical in talent (i.e. the type distribution is the same),…

    Submitted 15 February, 2021; originally announced February 2021.

  40. arXiv:2012.11960  [pdf, other]

    cs.CL cs.LG

    A Hierarchical Reasoning Graph Neural Network for The Automatic Scoring of Answer Transcriptions in Video Job Interviews

    Authors: Kai Chen, Meng Niu, Qingcai Chen

    Abstract: We address the task of automatically scoring the competency of candidates based on textual features, from the automatic speech recognition (ASR) transcriptions in the asynchronous video job interview (AVI). The key challenge is how to construct the dependency relation between questions and answers, and conduct the semantic level interaction for each question-answer (QA) pair. However, most of the…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: 9 pages, 2 figures

  41. arXiv:2010.13856  [pdf, ps, other]

    cs.CL cs.LG

    Data Troubles in Sentence Level Confidence Estimation for Machine Translation

    Authors: Ciprian Chelba, Junpei Zhou, Yuezhang Li, Hideto Kazawa, Jeff Klingner, Mengmeng Niu

    Abstract: The paper investigates the feasibility of confidence estimation for neural machine translation models operating at the high end of the performance spectrum. As a side product of the data annotation process necessary for building such models we propose sentence level accuracy $SACC$ as a simple, self-explanatory evaluation metric for quality of translation. Experiments on two different annotator…

    Submitted 26 October, 2020; originally announced October 2020.

  42. arXiv:2010.11983  [pdf, other]

    quant-ph cs.CC cs.LG

    Learnability and Complexity of Quantum Samples

    Authors: Murphy Yuezhen Niu, Andrew M. Dai, Li Li, Augustus Odena, Zhengli Zhao, Vadim Smelyanskyi, Hartmut Neven, Sergio Boixo

    Abstract: Given a quantum circuit, a quantum computer can sample the output distribution exponentially faster in the number of bits than classical computers. A similar exponential separation has yet to be established in generative models through quantum sample learning: given samples from an n-qubit computation, can we learn the underlying quantum distribution using models with training parameters that scal…

    Submitted 22 October, 2020; originally announced October 2020.

  43. arXiv:2007.01788  [pdf, ps, other]

    cs.CL cs.DL cs.IR

    TICO-19: the Translation Initiative for Covid-19

    Authors: Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federman, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur

    Abstract: The COVID-19 pandemic is the worst pandemic to strike the world in over a century. Crucial to stemming the tide of the SARS-CoV-2 virus is communicating to vulnerable populations the means by which they can protect themselves. To this end, the collaborators forming the Translation Initiative for COvid-19 (TICO-19) have made test and development data available to AI and MT researchers in 35 differe…

    Submitted 6 July, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

  44. arXiv:2003.02989  [pdf, other]

    quant-ph cond-mat.dis-nn cs.LG cs.PL

    TensorFlow Quantum: A Software Framework for Quantum Machine Learning

    Authors: Michael Broughton, Guillaume Verdon, Trevor McCourt, Antonio J. Martinez, Jae Hyeon Yoo, Sergei V. Isakov, Philip Massey, Ramin Halavati, Murphy Yuezhen Niu, Alexander Zlokapa, Evan Peters, Owen Lockwood, Andrea Skolik, Sofiene Jerbi, Vedran Dunjko, Martin Leib, Michael Streif, David Von Dollen, Hongxiang Chen, Shuxiang Cao, Roeland Wiersema, Hsin-Yuan Huang, Jarrod R. McClean, Ryan Babbush, Sergio Boixo, et al. (4 additional authors not shown)

    Abstract: We introduce TensorFlow Quantum (TFQ), an open source library for the rapid prototyping of hybrid quantum-classical models for classical or quantum data. This framework offers high-level abstractions for the design and training of both discriminative and generative quantum models under TensorFlow and supports high-performance quantum circuit simulators. We provide an overview of the software archi…

    Submitted 26 August, 2021; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: 56 pages, 34 figures, many updates throughout the manuscript, several new sections are added

  45. arXiv:1912.04368  [pdf, other]

    quant-ph cs.LG

    Learning Non-Markovian Quantum Noise from Moiré-Enhanced Swap Spectroscopy with Deep Evolutionary Algorithm

    Authors: Murphy Yuezhen Niu, Vadim Smelyanskyi, Paul Klimov, Sergio Boixo, Rami Barends, Julian Kelly, Yu Chen, Kunal Arya, Brian Burkett, Dave Bacon, Zijun Chen, Ben Chiaro, Roberto Collins, Andrew Dunsworth, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Trent Huang, Evan Jeffrey, David Landhuis, Erik Lucero, Anthony Megrant, Josh Mutus, et al. (8 additional authors not shown)

    Abstract: Two-level-system (TLS) defects in amorphous dielectrics are a major source of noise and decoherence in solid-state qubits. Gate-dependent non-Markovian errors caused by TLS-qubit coupling are detrimental to fault-tolerant quantum computation and have not been rigorously treated in the existing literature. In this work, we derive the non-Markovian dynamics between TLS and qubits during a SWAP-like…

    Submitted 9 December, 2019; originally announced December 2019.

  46. Expression Analysis Based on Face Regions in Real-world Conditions

    Authors: Zheng Lian, Ya Li, Jian-Hua Tao, Jian Huang, Ming-Yue Niu

    Abstract: Facial emotion recognition is an essential aspect of human-machine interaction. Past research on facial emotion recognition has focused on the laboratory environment. However, it faces many challenges in real-world conditions, e.g., illumination changes, large pose variations, and partial or full occlusions. Those challenges lead to different face areas with different degrees…

    Submitted 23 October, 2019; originally announced November 2019.

    Comments: International Journal of Automation and Computing 2018

  47. arXiv:1904.12933  [pdf, other]

    cs.LG quant-ph stat.ML

    Recurrent Neural Networks in the Eye of Differential Equations

    Authors: Murphy Yuezhen Niu, Lior Horesh, Isaac Chuang

    Abstract: To understand the fundamental trade-offs between training stability, temporal dynamics and architectural complexity of recurrent neural networks (RNNs), we directly analyze RNN architectures using numerical methods of ordinary differential equations (ODEs). We define a general family of RNNs, the ODERNNs, by relating the composition rules of RNNs to integration methods of ODEs at discrete time ste…

    Submitted 29 April, 2019; originally announced April 2019.

    Comments: 25 pages, 3 figures

  48. arXiv:1807.00311  [pdf, other]

    cs.IR cs.LG stat.ML

    Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data

    Authors: Yanru Qu, Bohui Fang, Weinan Zhang, Ruiming Tang, Minzhe Niu, Huifeng Guo, Yong Yu, Xiuqiang He

    Abstract: User response prediction is a crucial component of personalized information retrieval and filtering scenarios, such as recommender systems and web search. The data in user response prediction are mostly in a multi-field categorical format and are transformed into sparse representations via one-hot encoding. Due to the sparsity problems in representation and optimization, most research focuses on featur…

    Submitted 1 July, 2018; originally announced July 2018.

  49. arXiv:1803.03502  [pdf, other]

    cs.IR

    Collaborative Filtering with Graph-based Implicit Feedback

    Authors: Minzhe Niu, Weinan Zhang, Yanru Qu, Xuezhi Cao, Ruiming Tang, Xiuqiang He, Yong Yu

    Abstract: By introducing consumed items as users' implicit feedback into the matrix factorization (MF) method, SVD++ has become one of the most effective collaborative filtering methods for personalized recommender systems. Though powerful, SVD++ has two limitations: (i) only user-side implicit feedback is utilized, whereas item-side implicit feedback, which can also enrich item representations, is not leveraged; (ii) in S…

    Submitted 9 March, 2018; originally announced March 2018.

    Comments: 8 pages, 7 figures

  50. arXiv:1801.01061  [pdf, other]

    stat.ML cs.LG

    Intrinsic Gaussian processes on complex constrained domains

    Authors: Mu Niu, Pokman Cheung, Lizhen Lin, Zhenwen Dai, Neil Lawrence, David Dunson

    Abstract: We propose a class of intrinsic Gaussian processes (in-GPs) for interpolation, regression and classification on manifolds, with a primary focus on complex constrained domains or irregularly shaped spaces arising as subsets or submanifolds of $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$ and beyond. For example, in-GPs can accommodate spatial domains arising as complex subsets of Euclidean space. in-GPs respect the potentially complex b…

    Submitted 3 January, 2018; originally announced January 2018.