-
Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention
Authors:
Cedric Deslandes Whitney,
Justin Norman
Abstract:
Machine learning systems require representations of the real world for training and testing - they require data, and lots of it. Collecting data at scale has logistical and ethical challenges, and synthetic data promises a solution to these challenges. Instead of needing to collect photos of real people's faces to train a facial recognition system, a model creator could create and use photo-realistic, synthetic faces. The comparative ease of generating this synthetic data rather than relying on collecting data has made it a common practice. We present two key risks of using synthetic data in model development. First, we detail the high risk of false confidence when using synthetic data to increase dataset diversity and representation. We ground this in an examination of a real-world use case of synthetic data, where synthetic datasets were generated for an evaluation of facial recognition technology. Second, we examine how using synthetic data risks circumventing consent for data usage. We illustrate this by considering the importance of consent to the U.S. Federal Trade Commission's regulation of data collection and affected models. Finally, we discuss how these two risks exemplify how synthetic data complicates existing governance and ethical practice; by decoupling data from those it impacts, synthetic data is prone to consolidating power away from those most impacted by algorithmically-mediated harm.
Submitted 2 May, 2024;
originally announced May 2024.
-
VEATIC: Video-based Emotion and Affect Tracking in Context Dataset
Authors:
Zhihang Ren,
Jefferson Ortega,
Yifan Wang,
Zhimin Chen,
Yunhui Guo,
Stella X. Yu,
David Whitney
Abstract:
Human affect recognition has been a significant topic in psychophysics and computer vision. However, the currently published datasets have many limitations. For example, most datasets consist of frames that contain only information about facial expressions. Due to these limitations, it is very hard either to understand the mechanisms of human affect recognition or to train computer vision models that generalize well to common cases. In this work, we introduce a brand new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that can overcome the limitations of the previous datasets. VEATIC has 124 video clips from Hollywood movies, documentaries, and home videos with continuous valence and arousal ratings of each frame via real-time annotation. Along with the dataset, we propose a new computer vision task to infer the affect of the selected character via both context and character information in each video frame. Additionally, we propose a simple model to benchmark this new computer vision task. We also compare the performance of the pretrained model using our dataset with other similar datasets. Experiments show the competitive results of our pretrained model via VEATIC, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.
Submitted 14 September, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Situated and Interactive Multimodal Conversations
Authors:
Seungwhan Moon,
Satwik Kottur,
Paul A. Crook,
Ankita De,
Shivani Poddar,
Theodore Levin,
David Whitney,
Daniel Difranco,
Ahmad Beirami,
Eunjoon Cho,
Rajen Subba,
Alborz Geramifard
Abstract:
Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture (grounded in a shared virtual environment), and (b) fashion (grounded in an evolving set of images). We also provide logs of the items appearing in each scene, and contextual NLU and coreference annotations, using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as Structural API Prediction and Response Generation. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, code, and models are publicly available.
Submitted 10 November, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform
Authors:
Paul A. Crook,
Shivani Poddar,
Ankita De,
Semir Shafi,
David Whitney,
Alborz Geramifard,
Rajen Subba
Abstract:
As digital virtual assistants become ubiquitous, it becomes increasingly important to understand the situated behaviour of users as they interact with these assistants. To this end, we introduce SIMMC, an extension to ParlAI for multi-modal conversational data collection and system evaluation. SIMMC simulates an immersive setup, where crowd workers are able to interact with environments constructed in AI Habitat or Unity while engaging in a conversation. The assistant in SIMMC can be a crowd worker or an Artificial Intelligence (AI) agent. This enables both (i) a multi-player / Wizard of Oz setting for data collection, and (ii) a single player mode for model / system evaluation. We plan to open-source a situated conversational dataset collected on this platform for the Conversational AI research community.
Submitted 30 January, 2020; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Authors:
Ye Xia,
Jinkyu Kim,
John Canny,
Karl Zipser,
David Whitney
Abstract:
Inspired by human vision, we propose a new periphery-fovea multi-resolution driving model that predicts vehicle speed from dash camera videos. The peripheral vision module of the model processes the full video frames in low resolution. Its foveal vision module selects sub-regions and uses high-resolution input from those regions to improve its driving performance. We train the fovea selection module with supervision from driver gaze. We show that adding high-resolution input from predicted human driver gaze locations significantly improves the driving accuracy of the model. Our periphery-fovea multi-resolution model outperforms a uni-resolution periphery-only model that has the same amount of floating-point operations. More importantly, we demonstrate that our driving model achieves a significantly higher performance gain in pedestrian-involved critical situations than in other non-critical situations.
Submitted 24 March, 2019;
originally announced March 2019.
-
Learning Robust Dialog Policies in Noisy Environments
Authors:
Maryam Fazel-Zarandi,
Shang-Wen Li,
Jin Cao,
Jared Casale,
Peter Henderson,
David Whitney,
Alborz Geramifard
Abstract:
Modern virtual personal assistants provide a convenient interface for completing daily tasks via voice commands. An important consideration for these assistants is the ability to recover from automatic speech recognition (ASR) and natural language understanding (NLU) errors. In this paper, we focus on learning robust dialog policies to recover from these errors. To this end, we develop a user simulator which interacts with the assistant through voice commands in realistic scenarios with noisy audio, and use it to learn dialog policies through deep reinforcement learning. We show that dialogs generated by our simulator are indistinguishable from human generated dialogs, as determined by human evaluators. Furthermore, preliminary experimental results show that the learned policies in noisy environments achieve the same execution success rate with fewer dialog turns compared to fixed rule-based policies.
Submitted 11 December, 2017;
originally announced December 2017.
-
Predicting Driver Attention in Critical Situations
Authors:
Ye Xia,
Danqing Zhang,
Jinkyu Kim,
Ken Nakayama,
Karl Zipser,
David Whitney
Abstract:
Robust driver attention prediction for critical situations is a challenging computer vision problem, yet essential for autonomous driving. Because critical driving moments are so rare, collecting enough data for these situations is difficult with the conventional in-car data collection protocol, which tracks eye movements during driving. Here, we first propose a new in-lab driver attention collection protocol and introduce a new driver attention dataset, the Berkeley DeepDrive Attention (BDD-A) dataset, which is built upon braking event videos selected from a large-scale, crowd-sourced driving video dataset. We further propose the Human Weighted Sampling (HWS) method, which uses human gaze behavior to identify crucial frames of a driving dataset and weights them heavily during model training. With our dataset and HWS, we built a driver attention prediction model that outperforms the state-of-the-art and demonstrates sophisticated behaviors, like attending to crossing pedestrians but not giving false alarms to pedestrians safely walking on the sidewalk. Its prediction results are nearly indistinguishable from ground-truth to humans. Although trained only with our in-lab attention data, the model also predicts in-car driver attention data of routine driving with state-of-the-art accuracy. This result not only demonstrates the performance of our model but also proves the validity and usefulness of our dataset and data collection protocol.
Submitted 5 December, 2018; v1 submitted 16 November, 2017;
originally announced November 2017.
-
Communicating Robot Arm Motion Intent Through Mixed Reality Head-mounted Displays
Authors:
Eric Rosen,
David Whitney,
Elizabeth Phillips,
Gary Chien,
James Tompkin,
George Konidaris,
Stefanie Tellex
Abstract:
Efficient motion intent communication is necessary for safe and collaborative work environments with collocated humans and robots. Humans efficiently communicate their motion intent to other humans through gestures, gaze, and social cues. However, robots often have difficulty efficiently communicating their motion intent to humans via these methods. Many existing methods for robot motion intent communication rely on 2D displays, which require the human to continually pause their work and check a visualization. We propose a mixed reality head-mounted display visualization of the proposed robot motion over the wearer's real-world view of the robot and its environment. To evaluate the effectiveness of this system against a 2D display visualization and against no visualization, we asked 32 participants to label different robot arm motions as either colliding or non-colliding with blocks on a table. We found a 16% increase in accuracy with a 62% decrease in the time it took to complete the task compared to the next best system. This demonstrates that a mixed-reality HMD allows a human to more quickly and accurately tell where the robot is going to move than the compared baselines.
Submitted 11 August, 2017;
originally announced August 2017.
-
Asymmetry in in-degree and out-degree distributions of large-scale industrial networks
Authors:
Jianxi Luo,
Daniel E. Whitney
Abstract:
Many natural, physical and social networks commonly exhibit power-law degree distributions. In this paper, we discover previously unreported asymmetrical patterns in the degree distributions of incoming and outgoing links in the investigation of large-scale industrial networks, and provide interpretations. In industrial networks, nodes are firms and links are directed supplier-customer relationships. While both in- and out-degree distributions have "power law" regimes, the out-degree distribution decays faster than the in-degree distribution and crosses it at a consistent nodal degree. This implies that, as link degree increases, the constraints on the capacity for designing, producing and transmitting artifacts out to others grow faster than, and surpass, those for acquiring, absorbing and synthesizing artifacts provided by others. We further discover that this asymmetry in the decay rates of in-degree and out-degree distributions is smaller in networks that process and transmit more decomposable artifacts, e.g. informational artifacts in contrast with physical artifacts. This asymmetry in in-degree and out-degree distributions is likely to hold for other directed networks, but to different degrees, depending on the decomposability of the processed and transmitted artifacts.
Submitted 16 July, 2015;
originally announced July 2015.
-
Curiosity Based Exploration for Learning Terrain Models
Authors:
Yogesh Girdhar,
David Whitney,
Gregory Dudek
Abstract:
We present a robotic exploration technique in which the goal is to learn a visual model and be able to distinguish between different terrains and other visual components in an unknown environment. We use ROST, a realtime online spatiotemporal topic modeling framework, to model these terrains using the observations made by the robot, and then use an information theoretic path planning technique to define the exploration path. We conduct experiments with aerial view and underwater datasets with millions of observations and varying path lengths, and find that paths that are biased towards locations with high topic perplexity produce better terrain models with high discriminative power, especially with paths of length close to the diameter of the world.
Submitted 24 October, 2013;
originally announced October 2013.
-
Growth Patterns of Subway/Metro Systems Tracked by Degree Correlation
Authors:
Daniel E. Whitney
Abstract:
Urban transportation systems grow over time as city populations grow and move and their transportation needs evolve. Typical network growth models, such as preferential attachment, grow the network node by node whereas rail and metro systems grow by adding entire lines with all their nodes. The objective of this paper is to see if any canonical regular network forms such as stars or grids capture the growth patterns of urban metro systems for which we have historical data in terms of old maps. Data from these maps reveal that the systems' Pearson degree correlation grows increasingly from initially negative values toward positive values over time and in some cases becomes decidedly positive. We have derived closed form expressions for degree correlation and clustering coefficient for a variety of canonical forms that might be similar to metro systems. Of all those examined, only a few types patterned after a wide area network (WAN) with a "core-periphery" structure show similar positive-trending degree correlation as network size increases. This suggests that large metro systems either are designed or evolve into the equivalent of message carriers that seek to balance travel between arbitrary node-destination pairs with avoidance of congestion in the central regions of the network.
Keywords: metro, subway, urban transport networks, degree correlation
Submitted 14 February, 2012; v1 submitted 8 February, 2012;
originally announced February 2012.