Showing 1–50 of 142 results for author: Rao, Y

  1. arXiv:2408.15050  [pdf, other]

    cs.CL

    Self-supervised Topic Taxonomy Discovery in the Box Embedding Space

    Authors: Yuyin Lu, Hegang Chen, Pengbo Mao, Yanghui Rao, Haoran Xie, Fu Lee Wang, Qing Li

    Abstract: Topic taxonomy discovery aims at uncovering topics of different abstraction levels and constructing hierarchical relations between them. Unfortunately, most of prior work can hardly model semantic scopes of words and topics by holding the Euclidean embedding space assumption. What's worse, they infer asymmetric hierarchical relations by symmetric distances between topic embeddings. As a result, ex…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: to be published in TACL

  2. arXiv:2408.00754  [pdf, other]

    cs.CV cs.LG

    Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

    Authors: Benlin Liu, Yuhao Dong, Yiqin Wang, Yongming Rao, Yansong Tang, Wei-Chiu Ma, Ranjay Krishna

    Abstract: Multimodal language models (MLLMs) are increasingly being implemented in real-world environments, necessitating their ability to interpret 3D spaces and comprehend temporal dynamics. Despite their potential, current top models within our community still fall short in adequately understanding spatial and temporal dimensions. We introduce Coarse Correspondence, a simple, training-free, effective, an…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: project page: https://coarse-correspondence.github.io

  3. arXiv:2407.18121  [pdf, other]

    cs.CV

    Efficient Inference of Vision Instruction-Following Models with Elastic Cache

    Authors: Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu

    Abstract: In the field of instruction-following large vision-language models (LVLMs), the efficient deployment of these models faces challenges, notably due to the high memory demands of their key-value (KV) caches. Conventional cache management strategies for LLMs focus on cache eviction, which often fails to address the specific needs of multimodal instruction-following models. Recognizing this gap, in th…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  4. arXiv:2407.16883  [pdf, other]

    cs.IR cs.AI cs.CY cs.DB cs.LG

    A Standardized Machine-readable Dataset Documentation Format for Responsible AI

    Authors: Nitisha Jain, Mubashara Akhtar, Joan Giner-Miguelez, Rajat Shinde, Joaquin Vanschoren, Steffen Vogler, Sujata Goswami, Yuhan Rao, Tim Santos, Luis Oala, Michalis Karamousadakis, Manil Maskey, Pierre Marcenac, Costanza Conforti, Michael Kuchnik, Lora Aroyo, Omar Benjelloun, Elena Simperl

    Abstract: Data is critical to advancing AI technologies, yet its quality and documentation remain significant challenges, leading to adverse downstream effects (e.g., potential biases) in AI applications. This paper addresses these issues by introducing Croissant-RAI, a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets. Croissant-R…

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: 10 pages, appendix

  5. arXiv:2407.08349  [pdf]

    cs.CV

    Spine Vision X-Ray Image based GUI Planning of Pedicle Screws Using Enhanced YOLOv5 for Vertebrae Segmentation

    Authors: Yashwanth Rao, Gaurisankar S, Durga R, Aparna Purayath, Vivek Maik, Manojkumar Lakshmanan, Mohanasankar Sivaprakasm

    Abstract: In this paper, we propose an innovative Graphical User Interface (GUI) aimed at improving preoperative planning and intra-operative guidance for precise spinal screw placement through vertebrae segmentation. The methodology encompasses both front-end and back-end computations. The front end comprises a GUI that allows surgeons to precisely adjust the placement of screws on X-Ray images, thereby im…

    Submitted 11 July, 2024; originally announced July 2024.

  6. arXiv:2405.17934  [pdf, other]

    cs.AI

    Proof of Quality: A Costless Paradigm for Trustless Generative AI Model Inference on Blockchains

    Authors: Zhenjie Zhang, Yuyang Rao, Hao Xiao, Xiaokui Xiao, Yin Yang

    Abstract: Generative AI models, such as GPT-4 and Stable Diffusion, have demonstrated powerful and disruptive capabilities in natural language and image tasks. However, deploying these models in decentralized environments remains challenging. Unlike traditional centralized deployment, systematically guaranteeing the integrity of AI model services in fully decentralized environments, particularly on trustles…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 figures

  7. arXiv:2404.19613  [pdf]

    cond-mat.mtrl-sci physics.app-ph physics.comp-ph

    High-throughput discovery of metal oxides with high thermoelectric performance via interpretable feature engineering on small data

    Authors: Shengluo Ma, Yongchao Rao, Xiang Huang, Shenghong Ju

    Abstract: In this work, we have proposed a data-driven screening framework combining the interpretable machine learning with high-throughput calculations to identify a series of metal oxides that exhibit both high-temperature tolerance and high power factors. Aiming at the problem of weak generalization ability of small data with power factors at high temperatures, we employ symbolic regression for feature…

    Submitted 30 April, 2024; originally announced April 2024.

  8. arXiv:2404.15010  [pdf, other]

    cs.CV

    X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

    Authors: Shuofeng Sun, Yongming Rao, Jiwen Lu, Haibin Yan

    Abstract: Numerous prior studies predominantly emphasize constructing relation vectors for individual neighborhood points and generating dynamic kernels for each vector and embedding these into high-dimensional spaces to capture implicit local structures. However, we contend that such implicit high-dimensional structure modeling approach inadequately represents the local geometric structure of point clouds d…

    Submitted 23 April, 2024; originally announced April 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  9. arXiv:2403.13276  [pdf]

    physics.app-ph cond-mat.other

    All-magnonic repeater based on bistability

    Authors: Qi Wang, Roman Verba, Kristyna Davidkova, Bjorn Heinz, Shixian Tian, Yiheng Rao, Mengying Guo, Xueyu Guo, Carsten Dubs, Philipp Pirro, Andrii V. Chumak

    Abstract: Bistability, a universal phenomenon found in diverse fields such as biology, chemistry, and physics, describes a scenario in which a system has two stable equilibrium states and resets to one of the two states. The ability to switch between these two states is the basis for a wide range of applications, particularly in memory and logic operations. Here, we present a universal approach to achieve b…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 13 pages, 4 figures

  10. arXiv:2403.12966  [pdf, other]

    cs.CV

    Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

    Authors: Zuyan Liu, Yuhao Dong, Yongming Rao, Jie Zhou, Jiwen Lu

    Abstract: In the realm of vision-language understanding, the proficiency of models in interpreting and reasoning over visual content has become a cornerstone for numerous applications. However, it is challenging for the visual encoder in Large Vision-Language Models (LVLMs) to extract useful features tailored to questions that aid the language model's response. Furthermore, a common practice among existing…

    Submitted 21 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sites.google.com/view/chain-of-spot/

  11. arXiv:2312.13286  [pdf, other]

    cs.CV

    Generative Multimodal Models are In-Context Learners

    Authors: Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

    Abstract: The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions), is what current multimodal systems have largely struggled to imitate. In this work, we demonstrate that the task-agnostic in-context learning capabilities of large multimodal models can be significantly enhanced by effective scaling-up. We introduce Emu2, a generative multim…

    Submitted 7 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024. Project page: https://baaivision.github.io/emu2

  12. arXiv:2312.10898  [pdf]

    cond-mat.dis-nn physics.optics

    Replica symmetry breaking in 1D Rayleigh scattering system: theory and validations

    Authors: Yifei Qi, Longqun Ni, Zhenyu Ye, Jiaojiao Zhang, Xingyu Bao, Pan Wang, Yunjiang Rao, Ernesto P. Raposo, Anderson S. L. Gomes, Zinan Wang

    Abstract: Spin glass theory, as a paradigm for describing disordered magnetic systems, constitutes a prominent subject of study within statistical physics. Replica symmetry breaking (RSB), as one of the pivotal concepts for the understanding of spin glass theory, means that, under identical conditions disordered systems can yield distinct states with nontrivial correlations. Random fiber laser (RFL) based o…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 15 pages, 9 figures

  13. arXiv:2312.06655  [pdf, other]

    cs.CV cs.GR cs.LG

    Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

    Authors: Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan

    Abstract: Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure great multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion models find a distillation approach that achieves excellent generalization and rich deta…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: https://liuff19.github.io/Sherpa3D/

  14. arXiv:2312.04784  [pdf, other]

    cs.CV

    Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video

    Authors: Yuchen Rao, Eduardo Perez Pellitero, Benjamin Busam, Yiren Zhou, Jifei Song

    Abstract: Recent advancements in 3D avatar generation excel with multi-view supervision for photorealistic models. However, monocular counterparts lag in quality despite broader applicability. We propose ReCaLaB to close this gap. ReCaLaB is a fully-differentiable pipeline that learns high-fidelity 3D human avatars from just a single RGB video. A pose-conditioned deformable NeRF is optimized to volumetrical…

    Submitted 24 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Video link: https://youtu.be/Oz83z1es2J4

  15. arXiv:2309.11857  [pdf, other]

    cs.CV

    TCOVIS: Temporally Consistent Online Video Instance Segmentation

    Authors: Junlong Li, Bingyao Yu, Yongming Rao, Jie Zhou, Jiwen Lu

    Abstract: In recent years, significant progress has been made in video instance segmentation (VIS), with many offline and online methods achieving state-of-the-art performance. While offline methods have the advantage of producing temporally consistent predictions, they are not suitable for real-time scenarios. Conversely, online methods are more practical, but maintaining temporal consistency remains a cha…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 11 pages, 4 figures. This paper has been accepted for ICCV 2023

  16. arXiv:2309.05437  [pdf, ps, other]

    quant-ph

    Generation of three-dimensional cluster entangled state

    Authors: Chan Roh, Geunhee Gwak, Young-Do Yoon, Young-Sik Ra

    Abstract: Measurement-based quantum computing is a promising paradigm of quantum computation, where universal computing is achieved through a sequence of local measurements. The backbone of this approach is the preparation of multipartite entanglement, known as cluster states. While a cluster state with two-dimensional (2D) connectivity is required for universality, a three-dimensional (3D) cluster state is…

    Submitted 16 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

  17. arXiv:2308.14912  [pdf, other]

    astro-ph.SR

    Multi-wavelength observations of a B-class flare using XSM, AIA, and XRT

    Authors: Yamini K. Rao, B. Mondal, Giulio Del Zanna, N. P. S. Mithun, S. V. Vadawale, K. K. Reeves, Helen E. Mason, Anil Bhardwaj

    Abstract: We present multi-wavelength observations by Chandrayaan-2/XSM, SDO/AIA and Hinode/XRT of a B-class flare observed on 25th February, 2021, originating from an active region (AR 12804) near the North-West limb. The microflare lasts for approx 30 mins and is composed of hot loops reaching temperatures of 10 MK. We report excellent agreement (within 20 percent) for the average effective temperatures o…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 18 pages, 18 figures, ApJ, accepted

  18. arXiv:2308.05221  [pdf, other]

    cs.HC cs.AI cs.RO

    Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI

    Authors: Hangjie Shi, Leslie Ball, Govind Thattai, Desheng Zhang, Lucy Hu, Qiaozi Gao, Suhaila Shakiah, Xiaofeng Gao, Aishwarya Padmakumar, Bofei Yang, Cadence Chung, Dinakar Guthy, Gaurav Sukhatme, Karthika Arumugam, Matthew Wen, Osman Ipek, Patrick Lange, Rohan Khanna, Shreyas Pansare, Vasu Sharma, Chao Zhang, Cris Flagg, Daniel Pressel, Lavina Vaz, Luke Dai, et al. (17 additional authors not shown)

    Abstract: The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented wi…

    Submitted 9 August, 2023; originally announced August 2023.

  19. arXiv:2307.14971  [pdf, other]

    cs.CV cs.AI cs.LG

    Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

    Authors: Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

    Abstract: With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-t…

    Submitted 7 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023, project page: https://tap.ivg-research.xyz

  20. arXiv:2306.11175  [pdf]

    physics.soc-ph physics.ao-ph stat.AP

    Developing Digital Twins for Earth Systems: Purpose, Requisites, and Benefits

    Authors: Yuhan Rao, Rob Redmon, Kirstine Dale, Sue E. Haupt, Aaron Hopkinson, Ann Bostrom, Sid Boukabara, Thomas Geenen, David M. Hall, Benjamin D. Smith, Dev Niyogi, V. Ramaswamy, Eric A. Kihn

    Abstract: The accelerated change in our planet due to human activities has led to grand societal challenges including health crises, intensified extreme weather events, food security, environmental injustice, etc. Digital twin systems combined with emerging technologies such as artificial intelligence and edge computing provide opportunities to support planning and decision-making to address these challenge…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: This whitepaper is an outcome of the 4th NOAA AI Workshop

  21. Recovering quantum entanglement after its certification

    Authors: Hyeon-Jin Kim, Ji-Hyeok Jung, Kyung-Jun Lee, Young-Sik Ra

    Abstract: Entanglement is a crucial quantum resource with broad applications in quantum information science. For harnessing entanglement in practice, it is a prerequisite to certify the entanglement of a given quantum state. However, the certification process itself destroys the entanglement, thereby precluding further exploitation of the entanglement. Resolving this conflict, here we present a protocol tha…

    Submitted 11 May, 2023; originally announced May 2023.

    Journal ref: Sci. Adv. 9, eadi5261 (2023)

  22. arXiv:2303.04060  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall physics.app-ph physics.comp-ph

    Large modulation of thermal transport in 2D semimetal triphosphides by doping-induced electron-phonon coupling

    Authors: Yongchao Rao, C. Y. Zhao, Lei Shen, Shenghong Ju

    Abstract: Recent studies demonstrate that novel 2D triphosphides semiconductors possess high carrier mobility and promising thermoelectric performance, while the carrier transport behaviors in 2D semimetal triphosphides have never been elucidated before. Herein, using the first-principles calculations and Boltzmann transport theory, we reveal that the electron-phonon coupling can be significant and thus gre…

    Submitted 7 March, 2023; originally announced March 2023.

    Journal ref: Phys. Rev. B 108, 085413, 2023

  23. arXiv:2303.02153  [pdf, other]

    cs.CV

    Unleashing Text-to-Image Diffusion Models for Visual Perception

    Authors: Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

    Abstract: Diffusion models (DMs) have become the new trend of generative models and have demonstrated a powerful ability of conditional synthesis. Among those, text-to-image diffusion models pre-trained on large-scale image-text pairs are highly controllable by customizable prompts. Unlike the unconditional generative models that focus on low-level attributes and details, text-to-image diffusion models cont…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: project page: https://vpd.ivg-research.xyz

  24. arXiv:2303.01586  [pdf, other]

    cs.HC cs.AI cs.RO

    Alexa Arena: A User-Centric Interactive Platform for Embodied AI

    Authors: Qiaozi Gao, Govind Thattai, Suhaila Shakiah, Xiaofeng Gao, Shreyas Pansare, Vasu Sharma, Gaurav Sukhatme, Hangjie Shi, Bofei Yang, Desheng Zheng, Lucy Hu, Karthika Arumugam, Shui Hu, Matthew Wen, Dinakar Guthy, Cadence Chung, Rohan Khanna, Osman Ipek, Leslie Ball, Kate Bland, Heather Rocker, Yadunandana Rao, Michael Johnston, Reza Ghanadan, Arindam Mandal, et al. (2 additional authors not shown)

    Abstract: We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects, for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks readily accessible to general human users, thus openi…

    Submitted 7 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

  25. arXiv:2302.04867  [pdf, other]

    cs.LG cs.CV

    UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models

    Authors: Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, Jiwen Lu

    Abstract: Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis. However, sampling from a pre-trained DPM is time-consuming due to the multiple evaluations of the denoising network, making it more and more important to accelerate the sampling of DPMs. Despite recent progress in designing fast samplers, existing methods still cannot generate satis…

    Submitted 17 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted by NeurIPS 2023. Project page: https://unipc.ivg-research.xyz

  26. arXiv:2302.00665  [pdf, ps, other]

    stat.ME math.ST stat.AP

    Necessary and sufficient conditions for posterior propriety for generalized linear mixed models

    Authors: Yalin Rao, Vivekananda Roy

    Abstract: Generalized linear mixed models (GLMMs) are commonly used to analyze correlated discrete or continuous response data. In Bayesian GLMMs, the often-used improper priors may yield undesirable improper posterior distributions. Thus, verifying posterior propriety is crucial for valid applications of Bayesian GLMMs with improper priors. Here, we consider the popular improper uniform prior o…

    Submitted 1 February, 2023; originally announced February 2023.

  27. arXiv:2301.07950  [pdf]

    physics.optics

    A Monolithic Graphene-Functionalized Microlaser for Multispecies Gas Detection

    Authors: Yanhong Guo, Zhaoyu Li, Ning An, Yongzheng Guo, Yuchen Wang, Yusen Yuan, Hao Zhang, Teng Tan, Caihao Wu, Bo Peng, Giancarlo Soavi, Yunjiang Rao, Baicheng Yao

    Abstract: Optical microcavity enhanced light-matter interaction offers a powerful tool to develop fast and precise sensing techniques, spurring applications in the detection of biochemical targets ranging from cells, nanoparticles, and large molecules. However, the intrinsic inertness of such pristine microresonators limits their spread in new fields such as gas detection. Here, a functionalized microlaser…

    Submitted 19 January, 2023; originally announced January 2023.

    Journal ref: Advanced Materials 34 (2022) 2207777

  28. arXiv:2301.04545  [pdf, other]

    cs.CV cs.AI

    AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers

    Authors: Xumin Yu, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou

    Abstract: In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion. By representing the point cloud as a set of unordered groups of points with position embeddings, we convert the input data to a sequence of point proxies and emp…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: Extension of our ICCV 2021 work: arXiv:2108.08839. Code is available at https://github.com/yuxumin/PoinTr

  29. Continuous-Variable Nonclassicality Detection under Coarse-Grained Measurement

    Authors: Chan Roh, Young-Do Yoon, Jiyong Park, Young-Sik Ra

    Abstract: Coarse graining is a common imperfection of realistic quantum measurement, obstructing the direct observation of quantum features. Under highly coarse-grained measurement, we experimentally detect the continuous-variable nonclassicality of both Gaussian and non-Gaussian states. Remarkably, we find that this coarse-grained measurement outperforms the conventional fine-grained measurement for noncla…

    Submitted 29 December, 2022; originally announced December 2022.

    Journal ref: Phys. Rev. Research 5, 043057 (2023)

  30. arXiv:2212.04638  [pdf, other]

    cs.CV

    FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

    Authors: Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li

    Abstract: With the continuously thriving popularity around the world, fitness activity analytic has become an emerging research topic in computer vision. While a variety of new tasks and algorithms have been proposed recently, there are growing hunger for data resources involved in high-quality data, fine-grained labels, and diverse environments. In this paper, we present FLAG3D, a large-scale 3D fitness ac…

    Submitted 19 April, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR2023

  31. Intensity Limit in Compact H$^-$ and H$_2^+$ Cyclotrons

    Authors: Thomas Planche, Richard A. Baartman, Hui Wen Koay, Yi-Nong Rao, Lige Zhang

    Abstract: Compact H$^-$ cyclotrons are used all across the globe to produce medical isotopes. Machines with external ion sources have demonstrated average extracted currents on the order of a few mA, although reported operational numbers are typically around 1 mA or below. To explore the possibility of extracting even more current from such cyclotrons, it is important to understand the mechanisms that driv…

    Submitted 4 January, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

  32. Redundant Field Survey Data of Cyclotron with Imperfect Median Plane

    Authors: Lige Zhang, Yi-Nong Rao

    Abstract: An accurate and detailed field map is important for cyclotron beam dynamics studies. During the long history of cyclotron studies, many techniques have been developed by cyclotron pioneers for the treatment of median plane field map. In this paper, we take the TRIUMF 500 MeV cyclotron as an example to study the asymmetric field resulting from the imperfect median plane symmetry. The "Gordon appro…

    Submitted 2 November, 2022; originally announced November 2022.

  33. arXiv:2210.03364  [pdf, other]

    astro-ph.SR astro-ph.HE

    Soft X-ray Spectral Diagnostics of Multi-thermal Plasma in Solar Flares with Chandrayaan-2 XSM

    Authors: N. P. S. Mithun, Santosh V. Vadawale, Giulio Del Zanna, Yamini K. Rao, Bhuwan Joshi, Aveek Sarkar, Biswajit Mondal, P. Janardhan, Anil Bhardwaj, Helen E. Mason

    Abstract: Spectroscopic observations in X-ray wavelengths provide excellent diagnostics of the temperature distribution in solar flare plasma. The Solar X-ray Monitor (XSM) onboard the Chandrayaan-2 mission provides broad-band disk integrated soft X-ray solar spectral measurements in the energy range of 1-15 keV with high spectral resolution and time cadence. In this study, we analyse X-ray spectra of three…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted for publication in ApJ

  34. arXiv:2210.01253  [pdf, other]

    cs.CV cs.CL cs.LG

    PLOT: Prompt Learning with Optimal Transport for Vision-Language Models

    Authors: Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, Kun Zhang

    Abstract: With the increasing attention to large vision-language models such as CLIP, there has been a significant amount of effort dedicated to building efficient prompts. Unlike conventional methods of only learning one single prompt, we propose to learn multiple comprehensive prompts to describe diverse characteristics of categories such as intrinsic attributes or extrinsic contexts. However, directly ma…

    Submitted 9 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: ICLR 2023, Spotlight

  35. arXiv:2209.05555  [pdf]

    cs.CL cs.IR

    An Embedding-Based Grocery Search Model at Instacart

    Authors: Yuqing Xie, Taesik Na, Xiao Xiao, Saurav Manchanda, Young Rao, Zhihong Xu, Guanghua Shu, Esther Vasiete, Tejaswi Tenneti, Haixun Wang

    Abstract: The key to e-commerce search is how to best utilize the large yet noisy log data. In this paper, we present our embedding-based model for grocery search at Instacart. The system learns query and product representations with a two-tower transformer-based encoder architecture. To tackle the cold-start problem, we focus on content-based features. To train the model efficiently on noisy data, we propo…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: Accepted by SIGIR eCom, July 15, 2022

  36. arXiv:2208.02812  [pdf, other]

    cs.CV cs.AI cs.LG

    P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

    Authors: Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

    Abstract: Nowadays, pre-training big models on large-scale datasets has become a crucial topic in deep learning. The pre-trained models with high representation ability and transferability achieve a great success and dominate many downstream tasks in natural language processing and 2D vision. However, it is non-trivial to promote such a pretraining-tuning paradigm to the 3D vision, given the limited trainin…

    Submitted 12 October, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: Accepted to NeurIPS 2022, project page: https://p2p.ivg-research.xyz

  37. arXiv:2207.14284  [pdf, other]

    cs.CV

    HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

    Authors: Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

    Abstract: Recent progress in vision Transformers exhibits great success in various tasks driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the R…

    Submitted 11 October, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: project page: https://hornet.ivg-research.xyz

  38. arXiv:2207.06879  [pdf, ps, other]

    astro-ph.SR astro-ph.HE

    Multi-wavelength observations by XSM, Hinode and SDO of an active region. Chemical abundances and temperatures

    Authors: G. Del Zanna, B. Mondal, Y. K. Rao, N. P. S. Mithun, S. V. Vadawale, K. K. Reeves, H. E. Mason, A. Sarkar, P. Janardhan, A. Bhardwaj

    Abstract: We have reviewed the first year of observations of the Solar X-ray Monitor (XSM) onboard Chandrayaan-2, and the available multi-wavelength observations to complement the XSM data, focusing on Solar Dynamics Observatory AIA and Hinode XRT, EIS observations. XSM has provided disk-integrated solar spectra in the 1–15 keV energy range, observing a large number of microflares. We present an analysis o…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: accepted for publication

  39. arXiv:2207.02409  [pdf]

    physics.optics physics.app-ph physics.bio-ph

    Sub-monolayer Biolasers: Lower Gain, Higher Sensitivity

    Authors: C. Gong, X. Yang, S. J. Tang, Q. Q. Zhang, Y. Wang, Y. L. Liu, Y. C. Chen, G. D. Peng, X. Fan, Y. F. Xiao, Y. J. Rao, Y. Gong

    Abstract: Biomarker detection is the key to identifying health risks. However, designing sensitive biosensors in a single-use mode for disease diagnosis remains a major challenge. Here, we report sub-monolayer biolasers with remarkable repeatability for ultrasensitive and disposable biomarker detection. The biolaser sensors are designed by employing the telecom optical fibers as distributed optical microcav…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: 27 pages, 15 figures

    MSC Class: 78A70

  40. arXiv:2207.01580  [pdf, other]

    cs.CV cs.AI cs.LG

    Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks

    Authors: Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu

    Abstract: In this paper, we present a new approach for model acceleration by exploiting spatial sparsity in visual data. We observe that the final prediction in vision Transformers is only based on a subset of the most informative tokens, which is sufficient for accurate image recognition. Based on this observation, we propose a dynamic token sparsification framework to prune redundant tokens progressively…

    Submitted 2 June, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to T-PAMI. Journal version of our NeurIPS 2021 work: arXiv:2106.02034. Code is available at https://github.com/raoyongming/DynamicViT

  41. arXiv:2206.11228  [pdf, other]

    q-bio.NC cs.LG

    Adversarially trained neural representations may already be as robust as corresponding biological neural representations

    Authors: Chong Guo, Michael J. Lee, Guillaume Leclerc, Joel Dapello, Yug Rao, Aleksander Madry, James J. DiCarlo

    Abstract: Visual systems of primates are the gold standard of robust perception. There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: 10 pages, 6 figures, ICML2022

  42. arXiv:2206.04916  [pdf, other]

    cs.CV

    PatchComplete: Learning Multi-Resolution Patch Priors for 3D Shape Completion on Unseen Categories

    Authors: Yuchen Rao, Yinyu Nie, Angela Dai

    Abstract: While 3D shape representations enable powerful reasoning in many visual and perception applications, learning 3D shape priors tends to be constrained to the specific categories trained on, leading to an inefficient learning process, particularly for general applications with unseen categories. Thus, we propose PatchComplete, which learns effective shape priors based on multi-resolution local patch…

    Submitted 12 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Video link: https://www.youtube.com/watch?v=Ch1rvw2D_Kc ; Project page: https://yuchenrao.github.io/projects/patchComplete/patchComplete.html ; Accepted to NeurIPS'22

  43. arXiv:2206.03385  [pdf]

    physics.optics cond-mat.mes-hall

    Nonlinear co-generation of graphene plasmons for optoelectronic logic operations

    Authors: Y. Li, N. An, Z. Lu, Y. Wang, B. Chang, T. Tan, X. Guo, X. Xu, J. He, H. Xia, Z. Wu, Y. Su, Y. Liu, Y. Rao, G. Soavi, B. Yao

    Abstract: Surface plasmons in graphene provide a compelling strategy for advanced photonic technologies thanks to their tight confinement, fast response and tunability. Recent advances in the field of all-optical generation of graphene plasmons in planar waveguides offer a promising method for high speed signal processing in nanoscale integrated optoelectronic devices. Here, we use two counter-propagating f…

    Submitted 7 June, 2022; originally announced June 2022.

    Journal ref: Nat. Commun. 13, 3138 (2022)

  44. arXiv:2205.13490  [pdf, other]

    cs.CV cs.AI cs.LG

    SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

    Authors: Ziyi Wang, Yongming Rao, Xumin Yu, Jie Zhou, Jiwen Lu

    Abstract: Conventional point cloud semantic segmentation methods usually employ an encoder-decoder architecture, where mid-level features are locally aggregated to extract geometric information. However, the over-reliance on these class-agnostic local geometric representations may raise confusion between local parts from different categories that are similar in appearance or spatially adjacent. To address t…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR 2022

  45. arXiv:2204.03646  [pdf, other]

    cs.CV

    FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

    Authors: Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu

    Abstract: Most existing action quality assessment methods rely on the deep features of an entire video to predict the score, which is less reliable due to the non-transparent inference process and poor interpretability. We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable. To…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: Computer Vision and Pattern Recognition 2022 (Oral presentation)

  46. arXiv:2204.03636  [pdf, other]

    cs.CV

    SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

    Authors: Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

    Abstract: Depth estimation from images serves as the fundamental step of 3D perception for autonomous driving and is an economical alternative to expensive depth sensors like LiDAR. The temporal photometric constraints enable self-supervised depth estimation without labels, further facilitating its application. However, most existing methods predict the depth solely based on each monocular image and ignore…

    Submitted 20 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to CoRL 2022. Project page: https://surrounddepth.ivg-research.xyz Code: https://github.com/weiyithu/SurroundDepth

  47. arXiv:2203.16613  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall physics.app-ph physics.comp-ph

    High thermoelectric performance in metastable phase of silicon: a first-principles study

    Authors: Yongchao Rao, C. Y. Zhao, Shenghong Ju

    Abstract: In this work, both thermal and electrical transport properties of diamond-cubic Si (Si-I) and metastable R8 phase of Si (Si-XII) are comparatively studied by using first-principles calculations combined with Boltzmann transport theory. The metastable Si-XII shows one order of magnitude lower lattice thermal conductivity than stable Si-I from 300 to 500 K, attributed to the stronger phonon sca…

    Submitted 30 March, 2022; originally announced March 2022.

    Journal ref: Appl. Phys. Lett. 120, 163901 (2022)

  48. arXiv:2203.14956  [pdf, other]

    cs.CV cs.RO

    LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

    Authors: Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jie Zhou, Jiwen Lu

    Abstract: In this paper, we propose LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection. In many real-world applications, the LiDAR points used by mass-produced robots and vehicles usually have fewer beams than those in large-scale public datasets. Moreover, as the LiDARs are upgraded to other product models with different numbers of beams, it becomes challengi…

    Submitted 14 August, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022. Code is available at https://github.com/weiyithu/LiDAR-Distillation

  49. arXiv:2203.14101   

    cs.LG cs.AI cs.CL

    A Roadmap for Big Model

    Authors: Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, et al. (75 additional authors not shown)

    Abstract: With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM…

    Submitted 20 April, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

    Comments: This report has been withdrawn by the authors due to critical issues in Section 2.3.1 of Article 2

  50. arXiv:2203.13777  [pdf, other]

    cs.CV cs.LG

    Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion

    Authors: Tianpei Gu, Guangyi Chen, Junlong Li, Chunze Lin, Yongming Rao, Jie Zhou, Jiwen Lu

    Abstract: Human behavior has the nature of indeterminacy, which requires the pedestrian trajectory prediction system to model the multi-modality of future motion states. Unlike existing stochastic trajectory prediction methods which usually use a latent variable to represent multi-modality, we explicitly simulate the process of human motion variation from indeterminate to determinate. In this paper, we pres…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR2022