
Showing 1–50 of 61 results for author: Yu, S X

Searching in archive cs.
  1. arXiv:2406.11608

    cs.CV

    Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations

    Authors: Seulki Park, Youren Zhang, Stella X. Yu, Sara Beery, Jonathan Huang

    Abstract: Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree, where both accuracies at individual levels and consistency across levels matter. We can train classifiers for individual levels, which has accuracy but not consistency, or we can train only the finest level classification and infer higher levels, which has consistency but not…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 34 pages

  2. arXiv:2403.14973

    cs.CV cs.LG

    Trajectory Regularization Enhances Self-Supervised Geometric Representation

    Authors: Jiayun Wang, Stella X. Yu, Yubei Chen

    Abstract: Self-supervised learning (SSL) has proven effective in learning high-quality representations for various downstream tasks, with a primary focus on semantic tasks. However, its application in geometric tasks remains underexplored, partially due to the absence of a standardized evaluation method for geometric representations. To address this gap, we introduce a new pose-estimation benchmark for asse…

    Submitted 22 March, 2024; originally announced March 2024.

  3. arXiv:2312.15393

    cs.CV

    Debiased Learning for Remote Sensing Data

    Authors: Chun-Hsiao Yeh, Xudong Wang, Stella X. Yu, Charles Hill, Zackery Steck, Scott Kangas, Aaron Reite

    Abstract: Deep learning has had remarkable success at analyzing handheld imagery such as consumer photos due to the availability of large-scale human annotations (e.g., ImageNet). However, remote sensing data lacks such extensive annotation and thus potential for supervised learning. To address this, we propose a highly effective semi-supervised approach tailored specifically to remote sensing data. Our app…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2023 MultiEarth Workshop

  4. arXiv:2312.12479

    cs.CV

    Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models

    Authors: Fei Pan, Sangryul Jeon, Brian Wang, Frank Mckenna, Stella X. Yu

    Abstract: Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering the scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building at…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted to WACV 2024, Project Page: https://sites.google.com/view/zobae/home

  5. arXiv:2312.04709

    cs.LG cs.NE

    How to guess a gradient

    Authors: Utkarsh Singhal, Brian Cheung, Kartik Chandra, Jonathan Ragan-Kelley, Joshua B. Tenenbaum, Tomaso A. Poggio, Stella X. Yu

    Abstract: How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. Gradients lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Expl…

    Submitted 7 December, 2023; originally announced December 2023.

  6. arXiv:2309.16672

    cs.CV cs.LG

    Learning to Transform for Generalizable Instance-wise Invariance

    Authors: Utkarsh Singhal, Carlos Esteves, Ameesh Makadia, Stella X. Yu

    Abstract: Computer vision research has long aimed to build systems that are robust to spatial transformations found in natural data. Traditionally, this is done using data augmentation or hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and dependent on the instance. Ideally, the appropriate invariance would be lea…

    Submitted 15 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023

  7. arXiv:2309.06745

    cs.CV cs.HC cs.MM

    VEATIC: Video-based Emotion and Affect Tracking in Context Dataset

    Authors: Zhihang Ren, Jefferson Ortega, Yifan Wang, Zhimin Chen, Yunhui Guo, Stella X. Yu, David Whitney

    Abstract: Human affect recognition has been a significant topic in psychophysics and computer vision. However, the currently published datasets have many limitations. For example, most datasets contain frames that contain only information about facial expressions. Due to the limitations of previous datasets, it is very hard to either understand the mechanisms for affect recognition of humans or generalize w…

    Submitted 14 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

  8. arXiv:2307.01421

    cs.CV cs.AI

    Unsupervised Feature Learning with Emergent Data-Driven Prototypicality

    Authors: Yunhui Guo, Youren Zhang, Yubei Chen, Stella X. Yu

    Abstract: Given an image set without any labels, our goal is to train a model that maps each image to a point in a feature space such that, not only proximity indicates visual similarity, but where it is located directly encodes how prototypical the image is according to the dataset. Our key insight is to perform unsupervised feature learning in hyperbolic instead of Euclidean space, where the distance be…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 17 pages

  9. arXiv:2304.08025

    cs.CV

    Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping

    Authors: Long Lian, Zhirong Wu, Stella X. Yu

    Abstract: We study learning object segmentation from unlabeled videos. Humans can easily segment moving objects without knowing what they are. The Gestalt law of common fate, i.e., what move at the same speed belong together, has inspired unsupervised object discovery based on motion segmentation. However, common fate is not a reliable indicator of objectness: Parts of an articulated / deformable object may…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023. An extension of preprint 2212.08816. 19 pages, 11 figures

  10. The Audio-Visual BatVision Dataset for Research on Sight and Sound

    Authors: Amandine Brunetto, Sascha Hornauer, Stella X. Yu, Fabien Moutarde

    Abstract: Vision research showed remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras supports research in robotics and autonomous driving for at least a decade. However, while visual sensors may fail in some conditions, sound has recently shown potential to complement sensor data. Simulated room impulse responses (RIR) in 3D ap…

    Submitted 1 March, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Project page https://amandinebtto.github.io/Batvision-Dataset/ This version contains camera ready paper

    Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  11. arXiv:2301.11320

    cs.CV cs.AI cs.LG

    Cut and Learn for Unsupervised Object Detection and Instance Segmentation

    Authors: Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra

    Abstract: We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in a…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Tech report. Project page: http://people.eecs.berkeley.edu/~xdwang/projects/CutLER/. Code is available at https://github.com/facebookresearch/CutLER

  12. arXiv:2212.10817

    eess.IV cs.CV

    High-fidelity Direct Contrast Synthesis from Magnetic Resonance Fingerprinting

    Authors: Ke Wang, Mariya Doneva, Jakob Meineke, Thomas Amthor, Ekin Karasan, Fei Tan, Jonathan I. Tamir, Stella X. Yu, Michael Lustig

    Abstract: Magnetic Resonance Fingerprinting (MRF) is an efficient quantitative MRI technique that can extract important tissue and system parameters such as T1, T2, B0, and B1 from a single scan. This property also makes it attractive for retrospectively synthesizing contrast-weighted images. In general, contrast-weighted images like T1-weighted, T2-weighted, etc., can be synthesized directly from parameter…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: 19 pages, 8 figures

  13. arXiv:2212.08816

    cs.CV

    Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy

    Authors: Long Lian, Zhirong Wu, Stella X. Yu

    Abstract: We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference. Previous methods in unsupervised video object segmentation (UVOS) have demonstrated the effectiveness of motion as either input or supervision for segmentation. However, motion signals may be uninformative or even misleading in cases such as deformable objects and objects with…

    Submitted 17 December, 2022; originally announced December 2022.

    Comments: 15 pages, 10 figures

  14. arXiv:2212.03450

    cs.CV

    Tracking the Dynamics of the Tear Film Lipid Layer

    Authors: Tejasvi Kothapalli, Charlie Shou, Jennifer Ding, Jiayun Wang, Andrew D. Graham, Tatyana Svitova, Stella X. Yu, Meng C. Lin

    Abstract: Dry Eye Disease (DED) is one of the most common ocular diseases: over five percent of US adults suffer from DED. Tear film instability is a known factor for DED, and is thought to be regulated in large part by the thin lipid layer that covers and stabilizes the tear film. In order to aid eye related disease diagnosis, this work proposes a novel paradigm in using computer vision techniques to numer…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: NeurIPS Medical Imaging Workshop

  15. arXiv:2211.11797

    cs.CV

    Multi-Spectral Image Classification with Ultra-Lean Complex-Valued Models

    Authors: Utkarsh Singhal, Stella X. Yu, Zackery Steck, Scott Kangas, Aaron A. Reite

    Abstract: Multi-spectral imagery is invaluable for remote sensing due to different spectral signatures exhibited by materials that often appear identical in greyscale and RGB imagery. Paired with modern deep learning methods, this modality has great potential utility in a variety of remote sensing applications, such as humanitarian assistance and disaster recovery efforts. State-of-the-art deep learning met…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: NeuRIPS 2022 HADR workshop submission

  16. arXiv:2210.00314

    cs.CV cs.AI cs.LG

    Learning Hierarchical Image Segmentation For Recognition and By Recognition

    Authors: Tsung-Wei Ke, Sangwoo Mo, Stella X. Yu

    Abstract: Large vision and language models learned directly through image-text associations often lack detailed visual substantiation, whereas image segmentation tasks are treated separately from recognition, supervisedly learned without interconnections. Our key observation is that, while an image can be recognized in multiple ways, each has a consistent part-and-whole visual organization. Segmentation thu…

    Submitted 2 May, 2024; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: ICLR 2024 (spotlight). First two authors contributed equally. Code available at https://github.com/twke18/CAST

    ACM Class: I.4.6; I.4.10; I.5.3

  17. arXiv:2209.02834

    cs.CV

    Unsupervised Scene Sketch to Photo Synthesis

    Authors: Jiayun Wang, Sangryul Jeon, Stella X. Yu, Xi Zhang, Himanshu Arora, Yu Lou

    Abstract: Sketches make an intuitive and powerful visual expression as they are fast executed freehand drawings. We present a method for synthesizing realistic photos from scene sketches. Without the need for sketch and photo pairs, our framework directly learns from readily available large-scale photo datasets in an unsupervised manner. To this end, we introduce a standardization module that provides pseud…

    Submitted 6 September, 2022; originally announced September 2022.

    Journal ref: ECCVW 2022

  18. arXiv:2208.08349

    cs.CV cs.LG

    Open Long-Tailed Recognition in a Dynamic World

    Authors: Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

    Abstract: Real world data often exhibits a long-tailed and open-ended (with unseen classes) distribution. A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes). We define Open Long-Tailed Recognition++ (OLTR++) as learning from such naturally distributed…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. Extended version of our previous CVPR oral paper (arXiv:1904.05160)

  19. arXiv:2204.11432

    cs.CV cs.LG

    Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers

    Authors: Tsung-Wei Ke, Jyh-Jing Hwang, Yunhui Guo, Xudong Wang, Stella X. Yu

    Abstract: Unsupervised semantic segmentation aims to discover groupings within and across images that capture object and view-invariance of a category without external supervision. Grouping naturally has levels of granularity, creating ambiguity in unsupervised segmentation. Existing methods avoid this ambiguity and treat it as a factor outside modeling, whereas we embrace it and desire hierarchical groupin…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: In CVPR 2022. Webpage & Code: https://twke18.github.io/projects/hsg.html

  20. arXiv:2201.01490

    cs.LG cs.CL cs.CV

    Debiased Learning from Naturally Imbalanced Pseudo-Labels

    Authors: Xudong Wang, Zhirong Wu, Long Lian, Stella X. Yu

    Abstract: Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balance…

    Submitted 21 April, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: Accepted by CVPR 2022

  21. arXiv:2112.01525

    cs.CV cs.AI cs.LG

    Co-domain Symmetry for Complex-Valued Deep Learning

    Authors: Utkarsh Singhal, Yifei Xing, Stella X. Yu

    Abstract: We study complex-valued scaling as a type of symmetry natural and unique to complex-valued measurements and representations. Deep Complex Networks (DCN) extends real-valued algebra to the complex domain without addressing complex-valued scaling. SurReal takes a restrictive manifold view of complex numbers, adopting a distance metric to achieve complex-scaling invariance while losing rich complex-v…

    Submitted 2 December, 2021; originally announced December 2021.

  22. arXiv:2111.06394

    cs.CV cs.LG

    The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

    Authors: Runtao Liu, Zhirong Wu, Stella X. Yu, Stephen Lin

    Abstract: Humans can easily segment moving objects without knowing what they are. That objectness could emerge from continuous visual observations motivates us to model grouping and movement concurrently from unlabeled videos. Our premise is that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis whi…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: This paper has been accepted to NeurIPS 2021

  23. arXiv:2110.03006

    cs.LG cs.CV

    Unsupervised Selective Labeling for More Effective Semi-Supervised Learning

    Authors: Xudong Wang, Long Lian, Stella X. Yu

    Abstract: Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. We focus on selecting the right data to label, in addition to usual SSL's propagating labels from labeled data to the rest unlabeled data. This instance selection task is challenging, as with…

    Submitted 23 August, 2023; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted by ECCV 2022; Fixed a few typos

  24. arXiv:2107.11472

    cs.LG cs.CV

    Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers

    Authors: Yunhui Guo, Xudong Wang, Yubei Chen, Stella X. Yu

    Abstract: Hyperbolic space can naturally embed hierarchies, unlike Euclidean space. Hyperbolic Neural Networks (HNNs) exploit such representational power by lifting Euclidean features into hyperbolic space for classification, outperforming Euclidean neural networks (ENNs) on datasets with known semantic hierarchies. However, HNNs underperform ENNs on standard benchmarks without clear hierarchies, greatly re…

    Submitted 13 May, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: CVPR 2022

  25. arXiv:2107.07110

    cs.CV cs.LG

    Compact and Optimal Deep Learning with Recurrent Parameter Generators

    Authors: Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun

    Abstract: Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: We decouple the Degrees of freedom (DoF) and the actual number of parameters of a model, optimize a small DoF with predefined random linear constraints for a large model of arbitrar…

    Submitted 26 October, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

    Journal ref: WACV 2023

  26. Unsupervised Discriminative Learning of Sounds for Audio Event Classification

    Authors: Sascha Hornauer, Ke Li, Stella X. Yu, Shabnam Ghaffarzadegan, Liu Ren

    Abstract: Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet. While this process allows knowledge transfer across different domains, training a model on large-scale visual datasets is time consuming. On several audio event classification benchmarks, we show a fast and effective alternative that pre-trains the model unsuper…

    Submitted 20 May, 2021; v1 submitted 19 May, 2021; originally announced May 2021.

    Comments: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-7605-5/20/$31.00 (c) 2021 IEEE | DOI: 10.1109/ICASSP39728.2021.9413482

  27. Iterative Human and Automated Identification of Wildlife Images

    Authors: Zhongqi Miao, Ziwei Liu, Kaitlyn M. Gaynor, Meredith S. Palmer, Stella X. Yu, Wayne M. Getz

    Abstract: Camera trapping is increasingly used to monitor wildlife, but this technology typically requires extensive data annotation. Recently, deep learning has significantly advanced automatic wildlife recognition. However, current methods are hampered by a dependence on large static data sets when wildlife data is intrinsically dynamic and involves long-tailed distributions. These two drawbacks can be ov…

    Submitted 18 October, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

    Comments: This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. It is published in Nature Machine Intelligence: https://www.nature.com/articles/s42256-021-00393-0

    Journal ref: Nat Mach Intell 3, 885-895 (2021)

  28. arXiv:2105.00957

    cs.CV

    Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning

    Authors: Tsung-Wei Ke, Jyh-Jing Hwang, Stella X. Yu

    Abstract: Weakly supervised segmentation requires assigning a label to every pixel based on training instances with partial annotations such as image-level tags, object bounding boxes, labeled points and scribbles. This task is challenging, as coarse annotations (tags, boxes) lack precise pixel localization whereas sparse annotations (points, scribbles) lack broad region coverage. Existing methods tackle th…

    Submitted 10 May, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: In ICLR 2021. Webpage & Code: https://twke18.github.io/projects/spml.html

  29. arXiv:2104.02921

    cs.RO cs.CV

    Unsupervised Visual Attention and Invariance for Reinforcement Learning

    Authors: Xudong Wang, Long Lian, Stella X. Yu

    Abstract: Vision-based reinforcement learning (RL) is successful, but how to generalize it to unknown test environments remains challenging. Existing methods focus on training an RL policy that is universal to changing visual domains, whereas we focus on extracting visual foreground that is universal, feeding clean invariant vision to the RL policy learner. Our method is completely unsupervised, without man…

    Submitted 16 April, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted at CVPR 2021

  30. arXiv:2103.04003

    eess.IV cs.CV

    Memory-efficient Learning for High-Dimensional MRI Reconstruction

    Authors: Ke Wang, Michael Kellman, Christopher M. Sandino, Kevin Zhang, Shreyas S. Vasanawala, Jonathan I. Tamir, Stella X. Yu, Michael Lustig

    Abstract: Deep learning (DL) based unrolled reconstructions have shown state-of-the-art performance for under-sampled magnetic resonance imaging (MRI). Similar to compressed sensing, DL can leverage high-dimensional data (e.g. 3D, 2D+time, 3D+time) to further improve performance. However, network size and depth are currently limited by the GPU memory required for backpropagation. Here we use a memory-effici…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: 14 pages, 8 figures

  31. arXiv:2010.01809

    cs.CV

    Long-tailed Recognition by Routing Diverse Distribution-Aware Experts

    Authors: Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, Stella X. Yu

    Abstract: Natural data are often long-tail distributed over semantic classes. Existing recognition methods tackle this imbalanced classification by placing more emphasis on the tail data, through class re-balancing/re-weighting or ensembling over different data groups, resulting in increased tail accuracies but reduced head accuracies. We take a dynamic view of the training data and provide a principled m…

    Submitted 1 May, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted at ICLR 2021 (Spotlight); Add experiments on Swin Transformer

  32. arXiv:2009.12021

    cs.CV

    Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters

    Authors: Xudong Wang, Stella X. Yu

    Abstract: Convolution is the main building block of convolutional neural networks (CNN). We observe that an optimized CNN often has highly correlated filters as the number of channels increases with depth, reducing the expressive power of feature representations. We propose Tied Block Convolution (TBC) that shares the same thinner filters over equal blocks of channels and produces multiple responses with a…

    Submitted 24 September, 2020; originally announced September 2020.

    Comments: 13 pages

  33. arXiv:2008.03813

    cs.CV cs.LG stat.ML

    Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination

    Authors: Xudong Wang, Ziwei Liu, Stella X. Yu

    Abstract: Unsupervised feature learning has made great strides with contrastive learning based on instance discrimination and invariant mapping, as benchmarked on curated class-balanced datasets. However, natural data could be highly correlated and long-tail distributed. Natural between-instance similarity conflicts with the presumed instance distinction, causing unstable training and poor performance. Ou…

    Submitted 15 May, 2021; v1 submitted 9 August, 2020; originally announced August 2020.

    Comments: Accepted at CVPR 2021; Project page: http://people.eecs.berkeley.edu/~xdwang/projects/CLD/

  34. arXiv:2006.09694

    cs.CV

    3D Shape Reconstruction from Free-Hand Sketches

    Authors: Jiayun Wang, Jierui Lin, Qian Yu, Runtao Liu, Yubei Chen, Stella X. Yu

    Abstract: Sketches are the most abstract 2D representations of real-world objects. Although a sketch usually has geometrical distortion and lacks visual cues, humans can effortlessly envision a 3D object from it. This suggests that sketches encode the information necessary for reconstructing 3D shapes. Despite great progress achieved in 3D reconstruction from distortion-free line drawings, such as CAD and e…

    Submitted 18 January, 2022; v1 submitted 17 June, 2020; originally announced June 2020.

  35. arXiv:1911.12207

    cs.CV

    Orthogonal Convolutional Neural Networks

    Authors: Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu

    Abstract: Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel instead of usi…

    Submitted 8 April, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: To appear in CVPR 2020, project page: http://pwang.pw/ocnn.html

  36. arXiv:1910.13050

    cs.CV

    POIRot: A rotation invariant omni-directional pointnet

    Authors: Liu Yang, Rudrasis Chakraborty, Stella X. Yu

    Abstract: Point-cloud is an efficient way to represent 3D world. Analysis of point-cloud deals with understanding the underlying 3D geometric structure. But due to the lack of smooth topology, and hence the lack of neighborhood structure, standard correlation can not be directly applied on point-cloud. One of the popular approaches to do point correlation is to partition the point-cloud into voxels and extr…

    Submitted 29 October, 2019; v1 submitted 28 October, 2019; originally announced October 2019.

  37. arXiv:1910.06962

    cs.CV cs.LG eess.IV

    SegSort: Segmentation by Discriminative Sorting of Segments

    Authors: Jyh-Jing Hwang, Stella X. Yu, Jianbo Shi, Maxwell D. Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen

    Abstract: Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. In…

    Submitted 30 October, 2019; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: In ICCV 2019. Webpage & Code: https://jyhjinghwang.github.io/projects/segsort.html

  38. Building Information Modeling and Classification by Visual Learning At A City Scale

    Authors: Qian Yu, Chaofeng Wang, Barbaros Cetiner, Stella X. Yu, Frank Mckenna, Ertugrul Taciroglu, Kincho H. Law

    Abstract: In this paper, we provide two case studies to demonstrate how artificial intelligence can empower civil engineering. In the first case, a machine learning-assisted framework, BRAILS, is proposed for city-scale building information modeling. Building information modeling (BIM) is an efficient way of describing buildings, which is essential to architecture, engineering, and construction. Our propose…

    Submitted 20 July, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  39. arXiv:1909.03403

    cs.CV cs.LG stat.ML

    Open Compound Domain Adaptation

    Authors: Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

    Abstract: A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather). Whether the target contains a single homogeneous domain or multiple heterogeneous domains, existing works always assume that there exist clear distinctions between the domains, which…

    Submitted 29 March, 2020; v1 submitted 8 September, 2019; originally announced September 2019.

    Comments: To appear in CVPR 2020 as an oral presentation. Code, datasets and models are available at: https://liuziwei7.github.io/projects/CompoundDomain.html

  40. Spatial Transformer for 3D Point Clouds

    Authors: Jiayun Wang, Rudrasis Chakraborty, Stella X. Yu

    Abstract: Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. Existing methods adopt the same individual point neighborhoods throughout the network layers, defined by the same metric on the fixed input point coordina…

    Submitted 29 March, 2021; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: To appear in IEEE Transactions on PAMI, 2021

  41. arXiv:1906.10048

    cs.CV

    SurReal: Fréchet Mean and Distance Transform for Complex-Valued Deep Learning

    Authors: Rudrasis Chakraborty, Jiayun Wang, Stella X. Yu

    Abstract: We develop a novel deep learning architecture for naturally complex-valued data, which is often subject to complex scaling ambiguity. We treat each sample as a field in the space of complex numbers. With the polar form of a complex-valued number, the general group that acts in this space is the product of planar rotation and non-zero scaling. This perspective allows us to develop not only a novel…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: IEEE Computer Vision and Pattern Recognition Workshop on Perception Beyond the Visible Spectrum, Long Beach, California, 16 June 2019 Best Paper Award

  42. arXiv:1904.05160  [pdf, other]

    cs.CV cs.LG

    Large-Scale Long-Tailed Recognition in an Open World

    Authors: Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

    Abstract: Real world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced tes…

    Submitted 16 April, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

    Comments: To appear in CVPR 2019 as an oral presentation. Code, datasets and models are available at https://liuziwei7.github.io/projects/LongTail.html

  43. arXiv:1808.04699  [pdf, other]

    cs.CV cs.LG

    Improving Generalization via Scalable Neighborhood Component Analysis

    Authors: Zhirong Wu, Alexei A. Efros, Stella X. Yu

    Abstract: Current major approaches to visual recognition follow an end-to-end formulation that classifies an input image into one of the pre-determined set of semantic categories. Parametric softmax classifiers are a common choice for such a closed world with fixed categories, especially when big labeled data is available during training. However, this becomes problematic for open-set scenarios where new ca…

    Submitted 14 August, 2018; originally announced August 2018.

    Comments: To appear in ECCV 2018

  44. arXiv:1805.07457  [pdf, other]

    cs.CV

    Adversarial Structure Matching for Structured Prediction Tasks

    Authors: Jyh-Jing Hwang, Tsung-Wei Ke, Jianbo Shi, Stella X. Yu

    Abstract: Pixel-wise losses, e.g., cross-entropy or L2, have been widely used in structured prediction tasks as a spatial extension of generic image classification or regression. However, its i.i.d. assumption neglects the structural regularity present in natural images. Various attempts have been made to incorporate structural reasoning mostly through structure priors in a cooperative way where co-occurrin…

    Submitted 21 October, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: In CVPR 2019. Webpage & Code: https://jyhjinghwang.github.io/projects/asm.html

  45. arXiv:1804.00064  [pdf, other]

    cs.CV cs.AI

    Learning Beyond Human Expertise with Generative Models for Dental Restorations

    Authors: Jyh-Jing Hwang, Sergei Azernikov, Alexei A. Efros, Stella X. Yu

    Abstract: Computer vision has advanced significantly that many discriminative approaches such as object recognition are now widely used in real applications. We present another exciting development that utilizes generative models for the mass customization of medical products such as dental crowns. In the dental industry, it takes a technician years of training to design synthetic crowns that restore the fu…

    Submitted 30 March, 2018; originally announced April 2018.

  46. arXiv:1803.10335  [pdf, other]

    cs.CV

    Adaptive Affinity Fields for Semantic Segmentation

    Authors: Tsung-Wei Ke, Jyh-Jing Hwang, Ziwei Liu, Stella X. Yu

    Abstract: Semantic segmentation has made much progress with increasingly powerful pixel-wise classifiers and incorporating structural priors via Conditional Random Fields (CRF) or Generative Adversarial Networks (GAN). We propose a simpler alternative that learns to verify the spatial structure of segmentation during training only. Unlike existing approaches that enforce semantic labels on individual pixels…

    Submitted 21 August, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: To appear in European Conference on Computer Vision (ECCV) 2018

  47. arXiv:1712.01511  [pdf, other]

    cs.CV

    Successive Embedding and Classification Loss for Aerial Image Classification

    Authors: Jiayun Wang, Patrick Virtue, Stella X. Yu

    Abstract: Deep neural networks can be effective means to automatically classify aerial images but is easy to overfit to the training data. It is critical for trained neural networks to be robust to variations that exist between training and test environments. To address the overfitting problem in aerial image classification, we consider the neural network as successive transformations of an input image into…

    Submitted 24 September, 2019; v1 submitted 5 December, 2017; originally announced December 2017.

  48. arXiv:1709.10512  [pdf, other]

    cs.RO

    Learning to Roam Free from Small-Space Autonomous Driving with A Path Planner

    Authors: Sascha Hornauer, Karl Zipser, Stella X. Yu

    Abstract: Modern autonomous driving algorithms often rely on learning the mapping from visual inputs to steering actions from human driving data in a variety of scenarios and visual scenes. The required data collection is not only labor intensive, but such data are often noisy, inconsistent, and inflexible, as there is no differentiation between good and bad drivers, or between different driving intentions.…

    Submitted 16 March, 2018; v1 submitted 29 September, 2017; originally announced September 2017.

    Comments: Changes to previous version: Added further evaluations. Added tests in different environment. Added depth-input validation. Dropped multi-task learning aspect. Currently under review for publication at the ECCV 2018

  49. arXiv:1707.00070  [pdf, other]

    cs.CV

    Better than Real: Complex-valued Neural Nets for MRI Fingerprinting

    Authors: Patrick Virtue, Stella X. Yu, Michael Lustig

    Abstract: The task of MRI fingerprinting is to identify tissue parameters from complex-valued MRI signals. The prevalent approach is dictionary based, where a test MRI signal is compared to stored MRI signals with known tissue parameters and the most similar signals and tissue parameters retrieved. Such an approach does not scale with the number of parameters and is rather slow when the tissue parameter spa…

    Submitted 30 June, 2017; originally announced July 2017.

    Comments: Accepted in Proc. IEEE International Conference on Image Processing (ICIP), 2017

  50. arXiv:1612.08510  [pdf, other]

    cs.CV

    Learning Non-Lambertian Object Intrinsics across ShapeNet Categories

    Authors: Jian Shi, Yue Dong, Hao Su, Stella X. Yu

    Abstract: We consider the non-Lambertian object intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. We build a large-scale object intrinsics database based on existing 3D models in the ShapeNet database. Rendered with realistic environment maps, millions of synthetic images of objects and their corresponding albedo, shading, and specular groun…

    Submitted 27 December, 2016; originally announced December 2016.