Search | arXiv e-print repository

arXiv:2109.06122 [pdf, other]

Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering

Authors: Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao

Abstract: Visual question answering (VQA) is challenging not only because the model has to handle multi-modal information, but also because it is just so hard to collect sufficient training examples -- there are too many questions one can ask about an image. As a result, a VQA model trained solely on human-annotated examples could easily over-fit specific question styles or image contents that are being asked, leaving the model largely ignorant about the sheer diversity of questions. Existing methods address this issue primarily by introducing an auxiliary task such as visual grounding, cycle consistency, or debiasing. In this paper, we take a drastically different approach. We found that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly. For instance, questions asking about the same object in different images are likely paraphrases; the number of detected or annotated objects in an image already provides the answer to the "how many" question, even if the question has not been annotated for that image. Building upon these insights, we present a simple data augmentation pipeline SimpleAug to turn this "known" knowledge into training examples for VQA. We show that these augmented examples can notably improve the learned VQA models' performance, not only on the VQA-CP dataset with language prior shifts but also on the VQA v2 dataset without such shifts. Our method further opens up the door to leverage weakly-labeled or unlabeled images in a principled way to enhance VQA models. Our code and data are publicly available at https://github.com/heendung/simpleAUG.

Submitted 8 November, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted to EMNLP 2021

arXiv:2107.02170 [pdf, other]

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Authors: Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Abstract: Vanilla models for object detection and instance segmentation suffer from the heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or re-weighting. In this paper, we investigate a largely overlooked approach -- post-processing calibration of confidence scores. We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance. On the LVIS dataset, NorCal can effectively improve nearly all the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into various modeling choices and mechanisms of our approach. Our code is publicly available at https://github.com/tydpan/NorCal/.

Submitted 29 November, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2102.11171 [pdf, other]

WLAN-Log-Based Superspreader Detection in the COVID-19 Pandemic

Authors: Cheng Zhang, Yunze Pan, Yunqi Zhang, Adam C. Champion, Zhaohui Shen, Dong Xuan, Zhiqiang Lin, Ness B. Shroff

Abstract: Identifying "superspreaders" of disease is a pressing concern for society during pandemics such as COVID-19. Superspreaders represent a group of people who have much more social contacts than others. The widespread deployment of WLAN infrastructure enables non-invasive contact tracing via people's ubiquitous mobile devices. This technology offers promise for detecting superspreaders. In this paper, we propose a general framework for WLAN-log-based superspreader detection. In our framework, we first use WLAN logs to construct contact graphs by jointly considering human symmetric and asymmetric interactions. Next, we adopt three vertex centrality measurements over the contact graphs to generate three groups of superspreader candidates. Finally, we leverage SEIR simulation to determine groups of superspreaders among these candidates, who are the most critical individuals for the spread of disease based on the simulation results. We have implemented our framework and evaluate it over a WLAN dataset with 41 million log entries from a large-scale university. Our evaluation shows superspreaders exist on university campuses. They change over the first few weeks of a semester, but stabilize throughout the rest of the term. The data also demonstrate that both symmetric and asymmetric contact tracing can discover superspreaders, but the latter performs better with daily contact graphs. Further, the evaluation shows no consistent differences among three vertex centrality measures for long-term (i.e., weekly) contact graphs, which necessitates the inclusion of SEIR simulation in our framework. We believe our proposed framework and these results may provide timely guidance for public health administrators regarding effective testing, intervention, and vaccination policies.

Submitted 29 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted to Elsevier High-Confidence Computing Journal

arXiv:2102.08884 [pdf, other]

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

Authors: Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Abstract: Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e.g., in product images). Yet, these object-centric images are not effectively leveraged for improving object detection in scene-centric images. In this paper, we propose Mosaic of Object-centric images as Scene-centric images (MosaicOS), a simple and novel framework that is surprisingly effective at tackling the challenges of long-tailed object detection. Keys to our approach are three-fold: (i) pseudo scene-centric image construction from object-centric images for mitigating domain differences, (ii) high-quality bounding box imputation using the object-centric images' class labels, and (iii) a multi-stage training procedure. On LVIS object detection (and instance segmentation), MosaicOS leads to a massive 60% (and 23%) relative improvement in average precision for rare object categories. We also show that our framework can be compatibly used with other existing approaches to achieve even further gains. Our pre-trained models are publicly available at https://github.com/czhang0528/MosaicOS/.

Submitted 13 September, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: Accepted to ICCV 2021

arXiv:2006.13362 [pdf, other]

ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact Tracing

Authors: Yuxiang Luo, Cheng Zhang, Yunqi Zhang, Chaoshun Zuo, Dong Xuan, Zhiqiang Lin, Adam C. Champion, Ness Shroff

Abstract: In this paper, we propose a new privacy-preserving, automated contact tracing system, ACOUSTIC-TURF, to fight COVID-19 using acoustic signals sent from ubiquitous mobile devices. At a high level, ACOUSTIC-TURF adaptively broadcasts inaudible ultrasonic signals with randomly generated IDs in the vicinity. Simultaneously, the system receives other ultrasonic signals sent from nearby (e.g., 6 feet) users. In such a system, individual user IDs are not disclosed to others and the system can accurately detect encounters in physical proximity with 6-foot granularity. We have implemented a prototype of ACOUSTIC-TURF on Android and evaluated its performance in terms of acoustic-signal-based encounter detection accuracy and power consumption at different ranges and under various occlusion scenarios. Experimental results show that ACOUSTIC-TURF can detect multiple contacts within a 6-foot range for mobile phones placed in pockets and outside pockets. Furthermore, our acoustic-signal-based system achieves greater precision than wireless-signal-based approaches when contact tracing is performed through walls. ACOUSTIC-TURF correctly determines that people on opposite sides of a wall are not in contact with one another, whereas the Bluetooth-based approaches detect nonexistent contacts among them.

Submitted 23 June, 2020; originally announced June 2020.

arXiv:1907.12133 [pdf, other]

An Empirical Study on Leveraging Scene Graphs for Visual Question Answering

Authors: Cheng Zhang, Wei-Lun Chao, Dong Xuan

Abstract: Visual question answering (Visual QA) has attracted significant attention these years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as multi-modal attention and fusion. In this paper, we investigate an alternative approach inspired by conventional QA systems that operate on knowledge graphs. Specifically, we investigate the use of scene graphs derived from images for Visual QA: an image is abstractly represented by a graph with nodes corresponding to object entities and edges to object relationships. We adapt the recently proposed graph network (GN) to encode the scene graph and perform structured reasoning according to the input question. Our empirical studies demonstrate that scene graphs can already capture essential information of images and graph networks have the potential to outperform state-of-the-art Visual QA algorithms but with a much cleaner architecture. By analyzing the features generated by GNs we can further interpret the reasoning process, suggesting a promising direction towards explainable Visual QA.

Submitted 28 July, 2019; originally announced July 2019.

Comments: Accepted as oral presentation at BMVC 2019

arXiv:1711.07687 [pdf, ps, other]

Optimal Task Allocation in Near-Far Computing Enhanced C-RAN for Wireless Big Data Processing

Authors: Lianming Zhang, Kezhi Wang, Du Xuan, Kun Yang

Abstract: With the increasing popularity of user equipments (UEs), the corresponding UEs' generating big data (UGBD) is also growing substantially, which makes both UEs and current network structures struggling in processing those data and applications. This paper proposes a Near-Far Computing Enhanced C-RAN (NFC-RAN) architecture, which can better process big data and its corresponding applications. NFC-RAN is composed of near edge computing (NEC) and far edge computing (FEC) units. NEC is located in remote radio head (RRH), which can fast respond to delay sensitive tasks from the UEs, while FEC sits next to baseband unit (BBU) pool which can do other computational intensive tasks. The task allocation between NEC or FEC is introduced in this paper. Also WiFi indoor positioning is illustrated as a case study of the proposed architecture. Moreover, simulation and experiment results are provided to show the effectiveness of the proposed task allocation and architecture.

Submitted 21 November, 2017; originally announced November 2017.

Comments: Accepted by IEEE Wireless Communications

arXiv:1501.02484 [pdf, other]

Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices

Authors: Jihun Hamm, Adam Champion, Guoxing Chen, Mikhail Belkin, Dong Xuan

Abstract: Smart devices with built-in sensors, computational capabilities, and network connectivity have become increasingly pervasive. The crowds of smart devices offer opportunities to collectively sense and perform computing tasks in an unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine learning framework for a crowd of smart devices, which can solve a wide range of learning problems for crowdsensing data with differential privacy guarantees. Crowd-ML endows a crowdsensing system with an ability to learn classifiers or predictors online from crowdsensing data privately with minimal computational overheads on devices and servers, suitable for a practical and large-scale employment of the framework. We analyze the performance and the scalability of Crowd-ML, and implement the system with off-the-shelf smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML with real and simulated experiments under various conditions.

Submitted 11 January, 2015; originally announced January 2015.

arXiv:1308.2950

BlueSky: Realizing Buried Potential of Bluetooth to Sustain a Large-scale Multi-hop Network

Authors: Xinfeng Li, Chenshu Wu, Xiaoyuan Wang, Ming Gu, Xiang-Yang Li, Dong Xuan

Abstract: Traditionally, Bluetooth has been deemed unsuitable for sustaining a large-scale multi-hop network. There are two main reasons: severe frequency channel collisions under a large-scale network and high complexity of designing an efficient formation protocol. In this work, we reconsider this viewpoint from a practical usability perspective and aim to realize the buried potential of Bluetooth. Firstly, we find that the collision probability under a low-overhead network is fairly small, which is acceptable for practical applications. Secondly, we propose BlueSky, a complete system solution to provide necessary networking functionalities for Bluetooth. In BlueSky, we develop a connection maintenance mechanism for mitigating the influence of collisions and a network formation protocol for reliable packet transmissions. We implement BlueSky on Windows Mobile using 100 commercial smartphones. Comprehensive usability evaluations demonstrate the negligible overheads of BlueSky and its good network performance. In particular, 90%-95% of the whole 100 nodes can participate in the communication smoothly.

Submitted 22 November, 2013; v1 submitted 13 August, 2013; originally announced August 2013.

Comments: We need to improve this paper further

Showing 1–9 of 9 results for author: Xuan, D