Online shopping is difficult for people with motor impairments. Proprietary software can emulate mouse and keyboard input via head tracking, and smartphones are easier to carry indoors and outdoors than bulky laptop or desktop devices. However, head tracking solutions are not common on smartphones. To address this, we implement and open-source a button that is sensitive to head movements tracked by the front camera of the iPhone X. This allows developers to integrate head control into eCommerce applications easily, without requiring specialized knowledge. Other applications include gaming and hands-free situations such as cooking or auto repair. We built a sample online shopping application that allows users to easily browse items from various categories and take relevant actions using head movements alone. We present results of user studies on this sample application, along with sensitivity studies based on two independent tests performed at three different distances from the screen.
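The abstract does not spell out the selection mechanism, so the following is only a minimal Python sketch of one plausible design: head yaw/pitch from a face tracker drives a cursor, and a dwell timer fires the button. The GAIN and DWELL_SECONDS parameters and the pose source are assumptions, not the released iOS implementation.

    import time

    DWELL_SECONDS = 1.0   # hold the cursor on the button this long to "press"
    GAIN = 20.0           # assumed pixels of cursor travel per degree of rotation

    class HeadButton:
        def __init__(self, x, y, w, h):
            self.rect = (x, y, w, h)
            self.dwell_start = None

        def contains(self, cx, cy):
            x, y, w, h = self.rect
            return x <= cx <= x + w and y <= cy <= y + h

        def update(self, yaw_deg, pitch_deg, screen_w, screen_h):
            """Map head pose to a cursor position; return True on a press."""
            cx = screen_w / 2 + GAIN * yaw_deg
            cy = screen_h / 2 + GAIN * pitch_deg
            if self.contains(cx, cy):
                if self.dwell_start is None:
                    self.dwell_start = time.time()
                elif time.time() - self.dwell_start >= DWELL_SECONDS:
                    self.dwell_start = None
                    return True           # dwell complete: fire the button
            else:
                self.dwell_start = None   # cursor left the button: reset dwell
            return False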
In this work, we propose an efficient and effective approach for unconstrained salient object detection in images using deep convolutional neural networks. Instead of generating thousands of candidate bounding boxes and refining them, our network directly learns to generate the saliency map containing the exact number of salient objects. During training, we convert the ground-truth rectangular boxes to Gaussian distributions that better capture the region of interest of each salient object. During inference, the network predicts Gaussian distributions centered at salient objects with an appropriate covariance, from which bounding boxes are easily inferred. Notably, our network performs saliency map prediction without pixel-level annotations, salient object detection without object proposals, and salient object subitizing simultaneously, all in a single pass within a unified framework. Extensive experiments show that our approach outperforms existing methods on various datasets by a large margin, and achieves more than 100 fps with the VGG16 network on a single GPU during inference.
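The box-to-Gaussian conversion described above is concrete enough to sketch. A minimal NumPy version follows; the covariance choice (a quarter of the box extent) is an assumption, not necessarily the paper's exact recipe.

    import numpy as np

    def box_to_gaussian(h, w, box):
        """Render one ground-truth box (x0, y0, x1, y1) as a 2-D Gaussian."""
        x0, y0, x1, y1 = box
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        sx, sy = max(x1 - x0, 1) / 4.0, max(y1 - y0, 1) / 4.0  # assumed spread
        ys, xs = np.mgrid[0:h, 0:w]
        return np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)

    def saliency_target(h, w, boxes):
        """Sum per-object Gaussians, one per annotated salient object."""
        target = np.zeros((h, w), dtype=np.float32)
        for box in boxes:
            target += box_to_gaussian(h, w, box)
        return target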
In this paper, we propose a novel end-to-end approach for a scalable visual search infrastructure. We discuss the challenges we faced for a massive, volatile inventory like eBay's and present our solution to overcome them. We harness the availability of the large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale. A supervised approach that restricts optimized search to the top predicted categories, together with compact binary signatures, is key to scaling up without compromising accuracy and precision. Both use a common deep neural network requiring only a single forward inference. The system architecture is presented with in-depth discussions of its basic components and optimizations for a trade-off between search relevance and latency. This solution is currently deployed in a distributed cloud infrastructure and fuels visual search in eBay ShopBot and Close5. We show benchmarks on the ImageNet dataset on which our approach is faster and more accurate than several unsupervised baselines. We share our learnings with the hope that visual search becomes a first-class citizen for all large-scale search engines rather than an afterthought.
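The compact binary signature idea can be illustrated with a small sketch: binarize a deep feature vector by sign and rank gallery items by Hamming distance. The zero threshold and uint8 packing are assumptions, not eBay's exact scheme.

    import numpy as np

    def to_signatures(features):
        """Binarize a float feature matrix (N, D) into packed uint8 codes."""
        bits = (features > 0).astype(np.uint8)          # assumed sign threshold
        return np.packbits(bits, axis=1)                # (N, D/8) packed codes

    def hamming_search(query_sig, gallery_sigs, k=10):
        """Return indices of the k gallery items closest in Hamming distance."""
        xor = np.bitwise_xor(gallery_sigs, query_sig)   # differing bits
        dists = np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per row
        return np.argsort(dists)[:k]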
There are numerous applications of unmanned aerial vehicles (UAVs) in the management of civil infrastructure assets. A few examples include routine bridge inspections, disaster management, power line surveillance, and traffic surveying. As UAV applications become widespread, increased levels of autonomy and independent decision-making are necessary to improve the safety, efficiency, and accuracy of the devices. Since the accuracy and reliability of CNNs depend on the network's training and the selection of operational parameters, this paper details the procedure and parameters used for training convolutional neural networks (CNNs) on a set of aerial images for efficient and automated object recognition; potential application areas in the transportation field are also highlighted. The object recognition results show that by selecting a proper set of parameters, a CNN can detect and classify objects with a high level of accuracy (97.5%) and computational efficiency. Furthermore, using a convolutional neural network implemented on the "YOLO" ("You Only Look Once") platform, objects can be tracked, detected ("seen"), and classified ("comprehended") from video feeds supplied by UAVs in real time.
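For real-time use on video feeds, the detector sits inside a frame loop like the hypothetical Python sketch below; run_yolo() here is a stand-in for the trained network, not an actual YOLO API.

    import cv2

    def process_uav_feed(video_path, run_yolo):
        """run_yolo(frame) -> [(label, (x, y, w, h), score)] is assumed."""
        cap = cv2.VideoCapture(video_path)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            for label, (x, y, w, h), score in run_yolo(frame):
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, f"{label} {score:.2f}", (x, y - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow("uav", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        cap.release()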
We present a visual object detector based on a deep convolutional neural network that quickly outputs bounding box hypotheses without a separate proposal generation stage [1]. We modify the network for better performance, specialize it for a robotic application involving "bird" and "nest" categories (including the creation of a new dataset for the latter), and extend it to enforce temporal continuity for tracking. The system exhibits very competitive detection accuracy and speed, as well as robust, high-speed tracking on several difficult sequences.
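The paper's temporal-continuity extension is network-internal, but the underlying idea can be approximated by linking detections across frames. A minimal IoU-based association sketch, with an assumed 0.3 threshold:

    def iou(a, b):
        """Intersection-over-union of boxes given as (x0, y0, x1, y1)."""
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter + 1e-9)

    def link_tracks(prev_boxes, cur_boxes, thresh=0.3):
        """Greedily match each current detection to its best previous box."""
        links = {}
        for i, cur in enumerate(cur_boxes):
            best = max(range(len(prev_boxes)),
                       key=lambda j: iou(cur, prev_boxes[j]),
                       default=None)
            if best is not None and iou(cur, prev_boxes[best]) >= thresh:
                links[i] = best   # unmatched detections start new tracks
        return links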
We propose an unsupervised bottom-up saliency detection approach by exploiting a novel graph structure and background priors. The input image is represented as an undirected graph with superpixels as nodes. Feature vectors are extracted from each node to cover regional color, contrast, and texture information. A novel graph model is proposed to effectively capture local and global saliency cues. To obtain more accurate saliency estimations, we optimize the saliency map by using a robust background measure. Comprehensive evaluations on benchmark datasets indicate that our algorithm universally surpasses state-of-the-art unsupervised solutions and performs favorably against supervised approaches.
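A minimal sketch of the graph construction, assuming Gaussian affinities on superpixel feature distance and a background prior built from image-boundary superpixels; sigma and the prior's exact form are assumptions, and the paper's full graph model is richer than this.

    import numpy as np

    def build_affinity(features, sigma=0.1):
        """W[i, j] = exp(-||f_i - f_j||^2 / (2 sigma^2)) over node features."""
        d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def saliency_scores(features, boundary_idx, sigma=0.1):
        """Score nodes by dissimilarity to boundary (likely background) nodes."""
        W = build_affinity(features, sigma)
        bg_affinity = W[:, boundary_idx].mean(axis=1)
        scores = 1.0 - bg_affinity          # low boundary affinity => salient
        lo, hi = scores.min(), scores.max()
        return (scores - lo) / (hi - lo + 1e-9)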
Recent advances in consumer depth sensors have created many opportunities for human body measurement and modeling. Estimation of 3D body shape is particularly useful for fashion e-commerce applications such as virtual try-on or fit personalization. In this paper, we propose a method for capturing accurate human body shape and anthropometrics from a single consumer-grade depth sensor. We first generate a large dataset of synthetic 3D human body models using real-world body size distributions. Next, we estimate key body measurements from a single monocular depth image. We combine body measurement estimates with local geometry features around key joint positions to form a robust multi-dimensional feature vector. This allows us to conduct a fast nearest-neighbor search against every sample in the dataset and return the closest one. Compared to existing methods, our approach is able to predict accurate full-body parameters from a partial view using measurement parameters learned from the synthetic dataset. Furthermore, our system is capable of generating 3D human mesh models in real time, which is significantly faster than methods that attempt to model shape and pose deformations. To validate the efficiency and applicability of our system, we collected a dataset that contains frontal and back scans of 83 clothed people with ground-truth height and weight. Experiments on this real-world dataset show that the proposed method achieves real-time performance with competitive results, attaining an average error of 1.9 cm in estimated measurements.
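The retrieval step lends itself to a short sketch: concatenate the estimated measurements with the joint-geometry features and take the nearest synthetic body under Euclidean distance. Any feature weighting the paper may apply is omitted here.

    import numpy as np

    def nearest_body(measurements, joint_feats, dataset):
        """dataset: (N, D) matrix of the same concatenated features per model."""
        q = np.concatenate([measurements, joint_feats])
        dists = np.linalg.norm(dataset - q, axis=1)
        return int(np.argmin(dists))   # index of the closest synthetic mesh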
We propose a novel approach that jointly removes a reflective or translucent layer from a scene and estimates scene depth. The input data are captured via light field imaging. The problem is couched as minimizing the rank of the transmitted scene layer via Robust Principal Component Analysis (RPCA). We also impose regularization based on piecewise smoothness, gradient sparsity, and layer independence to simultaneously recover the 3D geometry of the transmitted layer. Experimental results on synthetic and real data show that our technique is robust and reliable, and can handle a broad range of challenging layer separation problems.
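The rank-minimization core is standard RPCA by principal component pursuit; a minimal inexact-ALM sketch follows, with the paper's additional smoothness, sparsity, and layer-independence regularizers omitted for brevity.

    import numpy as np

    def shrink(X, tau):
        """Soft-thresholding (shrinkage) operator."""
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def rpca(D, lam=None, mu=None, iters=200):
        """Split D into low-rank L plus sparse S with D ~ L + S."""
        m, n = D.shape
        lam = lam or 1.0 / np.sqrt(max(m, n))
        mu = mu or (m * n) / (4.0 * np.abs(D).sum())
        L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
        for _ in range(iters):
            # singular value thresholding recovers the low-rank layer
            U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
            L = (U * shrink(sig, 1.0 / mu)) @ Vt
            S = shrink(D - L + Y / mu, lam / mu)   # sparse (reflection) layer
            Y = Y + mu * (D - L - S)               # dual ascent on the residual
        return L, S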
This paper describes the hardware and software components of a general-purpose humanoid robot system for autonomously driving several different types of utility vehicles. The robot recognizes which vehicle it is in, localizes itself with respect to the dashboard, and self-aligns in order to interface with the steering wheel and accelerator pedal. Low- and higher-level methods are presented for speed control, environment perception, and trajectory planning and following, suitable for operation in planar areas with discrete obstacles as well as along road-like paths.
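As a flavor of the low-level control involved, a PID speed controller mapping velocity error to a pedal command might look like the sketch below; the gains and the normalized pedal interface are assumptions, not the paper's values.

    class SpeedPID:
        def __init__(self, kp=0.5, ki=0.05, kd=0.1):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_err = 0.0

        def step(self, target_speed, measured_speed, dt):
            """Return a pedal command in [-1, 1] (negative = release/brake)."""
            err = target_speed - measured_speed
            self.integral += err * dt
            deriv = (err - self.prev_err) / dt
            self.prev_err = err
            u = self.kp * err + self.ki * self.integral + self.kd * deriv
            return max(-1.0, min(1.0, u))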
The depth of field (DoF) effect is a useful tool in photography and cinematography because of its aesthetic value. However, capturing and displaying dynamic DoF effects were until recently a quality unique to expensive and bulky movie cameras. A computational approach to generate realistic DoF effects for mobile devices such as tablets is proposed. We first calibrate the rear-facing stereo cameras and rectify the stereo image pairs through the FCam API, then generate a low-resolution disparity map using graph cuts stereo matching and subsequently upsample it via joint bilateral upsampling. Next, we generate a synthetic light field by warping the raw color image to nearby viewpoints, according to the corresponding values in the upsampled high-resolution disparity map. Finally, we render the dynamic DoF effect on the tablet screen with light field rendering. The user can easily capture and generate desired DoF effects with arbitrary aperture sizes or focal depths using the tablet only, with no additional hardware or software required. The system has been examined in a variety of environments with satisfactory results, according to the subjective evaluation tests.

1 Introduction

The dynamic depth of field (DoF) effect is a useful tool in photography and cinematography because of its aesthetic value. Capturing and displaying dynamic DoF effects were until recently a quality unique to expensive and bulky movie cameras. Problems such as radial distortion may also arise if the lens system is not set up properly. Recent advances in computational photography enable the user to refocus an image at any desired depth after it has been taken. The hand-held plenoptic camera [1] places a micro-lens array behind the main lens, so that each microlens image captures the scene from a slightly different viewpoint. By fusing these images together, one can generate photographs focusing at different depths. However, due to the spatial-angular tradeoff [2] of the light field camera, the resolution of the final rendered image is greatly reduced. To overcome this problem, Georgiev and Lumsdaine [3] introduced the focused plenoptic camera and significantly increased spatial resolution near the main lens focal plane. However, angular resolution is reduced, which may introduce aliasing effects in the rendered image. Despite recent advances in computational light field imaging, the costs of plenoptic cameras are still high due to the complicated lens structures. This complicated structure also makes it difficult and expensive to integrate light field cameras into small hand-held devices like smartphones or tablets. Moreover, the huge amount of data generated by the plenoptic camera prohibits it from performing light field rendering on video streams. To address this problem, we develop a light field rendering algorithm on mobile platforms. Because our algorithm works on regular stereo camera systems, it can be directly
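The joint bilateral upsampling step admits a compact (if naive) sketch: upsample the low-resolution disparity map guided by the full-resolution color image. The window radius and sigmas are assumptions, and practical versions vectorize or GPU-accelerate this quadruple loop.

    import numpy as np

    def joint_bilateral_upsample(disp_lo, color_hi, scale,
                                 r=2, sigma_s=1.0, sigma_c=0.1):
        """disp_lo: (h, w); color_hi: (H, W, 3) float in [0, 1]; scale = H / h."""
        H, W = color_hi.shape[:2]
        out = np.zeros((H, W), dtype=np.float32)
        for y in range(H):
            for x in range(W):
                ly, lx = y / scale, x / scale        # position in low-res grid
                wsum = vsum = 0.0
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        qy, qx = int(round(ly)) + dy, int(round(lx)) + dx
                        if not (0 <= qy < disp_lo.shape[0]
                                and 0 <= qx < disp_lo.shape[1]):
                            continue
                        ws = np.exp(-((qy - ly) ** 2 + (qx - lx) ** 2)
                                    / (2 * sigma_s ** 2))          # spatial term
                        gy = min(int(qy * scale), H - 1)
                        gx = min(int(qx * scale), W - 1)
                        dc = color_hi[y, x] - color_hi[gy, gx]
                        wc = np.exp(-np.dot(dc, dc) / (2 * sigma_c ** 2))  # range term
                        wsum += ws * wc
                        vsum += ws * wc * disp_lo[qy, qx]
                out[y, x] = vsum / (wsum + 1e-9)
        return out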
The depth of field (DoF) effect is a useful tool in photography and cinematography because of its aesthetic value. However, capturing and displaying dynamic DoF effects was until recently a quality unique to expensive and bulky movie cameras. In this paper, we propose a computational approach to generate realistic DoF effects for mobile devices such as tablets. We first calibrate the rear-facing stereo cameras and rectify the stereo image pairs through the FCam API, then generate a low-resolution disparity map using graph cuts stereo matching and subsequently upsample it via joint bilateral upsampling. Next, we generate a synthetic light field by warping the raw color image to nearby viewpoints according to the corresponding values in the upsampled high-resolution disparity map. Finally, we render the dynamic DoF effect on the tablet screen with light field rendering. The user can easily capture and generate desired DoF effects with arbitrary aperture sizes or focal depths using the tablet only, with no additional hardware or software required. The system has been tested in a variety of environments with satisfactory results.
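The upsampling step is sketched after the previous entry; the remaining light-field-synthesis and rendering steps can be approximated as below, using a simple gather (inverse) warp along a 1-D synthetic aperture. The aperture sampling and parameters are assumptions, not the paper's renderer.

    import numpy as np

    def render_dof(color, disparity, focal_disp, n_views=8, aperture=2.0):
        """color: (H, W, 3) float; disparity: (H, W); focal_disp: in-focus value."""
        H, W = disparity.shape
        ys, xs = np.mgrid[0:H, 0:W]
        acc = np.zeros_like(color)
        for i in range(n_views):
            # viewpoint offset on the synthetic aperture (1-D for brevity)
            s = aperture * (i / (n_views - 1) - 0.5)
            # shift pixels by disparity relative to the focal plane (gather warp)
            sx = np.clip(np.round(xs + s * (disparity - focal_disp)), 0, W - 1)
            acc += color[ys, sx.astype(int)]
        return acc / n_views   # average of warped views = depth-dependent blur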