Search | arXiv e-print repository

arXiv:2407.20980 [pdf, other]

Impact of Conflicting Transactions in Blockchain: Detecting and Mitigating Potential Attacks

Authors: Faisal Haque Bappy, Kamrul Hasan, Joon S. Park, Carlos Caicedo, Tariqul Islam

Abstract: Conflicting transactions within blockchain networks not only pose performance challenges but also introduce security vulnerabilities, potentially facilitating malicious attacks. In this paper, we explore the impact of conflicting transactions on blockchain attack vectors. Through modeling and simulation, we delve into the dynamics of four pivotal attacks - block withholding, double spending, balan… ▽ More Conflicting transactions within blockchain networks not only pose performance challenges but also introduce security vulnerabilities, potentially facilitating malicious attacks. In this paper, we explore the impact of conflicting transactions on blockchain attack vectors. Through modeling and simulation, we delve into the dynamics of four pivotal attacks - block withholding, double spending, balance, and distributed denial of service (DDoS), all orchestrated using conflicting transactions. Our analysis not only focuses on the mechanisms through which these attacks exploit transaction conflicts but also underscores their potential impact on the integrity and reliability of blockchain networks. Additionally, we propose a set of countermeasures for mitigating these attacks. Through implementation and evaluation, we show their effectiveness in lowering attack rates and enhancing overall network performance seamlessly, without introducing additional overhead. Our findings emphasize the critical importance of actively managing conflicting transactions to reinforce blockchain security and performance. △ Less

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.01942 [pdf, other]

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth… ▽ More The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric confidence-weighted accuracy, that is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 26 pages

arXiv:2405.18400 [pdf, other]

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Authors: Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To… ▽ More Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Code and more examples open-sourced at https://github.com/RAIVNLab/SuperposedDecoding. △ Less

Submitted 24 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 22 pages, 15 figures

arXiv:2403.10748 [pdf, other]

A Comprehensive Review of Latent Space Dynamics Identification Algorithms for Intrusive and Non-Intrusive Reduced-Order-Modeling

Authors: Christophe Bonneville, Xiaolong He, April Tran, Jun Sur Park, William Fries, Daniel A. Messenger, Siu Wun Cheung, Yeonjong Shin, David M. Bortz, Debojyoti Ghosh, Jiun-Shyan Chen, Jonathan Belof, Youngsoo Choi

Abstract: Numerical solvers of partial differential equations (PDEs) have been widely employed for simulating physical systems. However, the computational cost remains a major bottleneck in various scientific and engineering applications, which has motivated the development of reduced-order models (ROMs). Recently, machine-learning-based ROMs have gained significant popularity and are promising for addressi… ▽ More Numerical solvers of partial differential equations (PDEs) have been widely employed for simulating physical systems. However, the computational cost remains a major bottleneck in various scientific and engineering applications, which has motivated the development of reduced-order models (ROMs). Recently, machine-learning-based ROMs have gained significant popularity and are promising for addressing some limitations of traditional ROM methods, especially for advection dominated systems. In this chapter, we focus on a particular framework known as Latent Space Dynamics Identification (LaSDI), which transforms the high-fidelity data, governed by a PDE, to simpler and low-dimensional latent-space data, governed by ordinary differential equations (ODEs). These ODEs can be learned and subsequently interpolated to make ROM predictions. Each building block of LaSDI can be easily modulated depending on the application, which makes the LaSDI framework highly flexible. In particular, we present strategies to enforce the laws of thermodynamics into LaSDI models (tLaSDI), enhance robustness in the presence of noise through the weak form (WLaSDI), select high-fidelity training data efficiently through active learning (gLaSDI, GPLaSDI), and quantify the ROM prediction uncertainty through Gaussian processes (GPLaSDI). We demonstrate the performance of different LaSDI approaches on Burgers equation, a non-linear heat conduction problem, and a plasma physics problem, showing that LaSDI algorithms can achieve relative errors of less than a few percent and up to thousands of times speed-ups. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.05848 [pdf, other]

tLaSDI: Thermodynamics-informed latent space dynamics identification

Authors: Jun Sur Richard Park, Siu Wun Cheung, Youngsoo Choi, Yeonjong Shin

Abstract: We propose a latent space dynamics identification method, namely tLaSDI, that embeds the first and second principles of thermodynamics. The latent variables are learned through an autoencoder as a nonlinear dimension reduction model. The latent dynamics are constructed by a neural network-based model that precisely preserves certain structures for the thermodynamic laws through the GENERIC formali… ▽ More We propose a latent space dynamics identification method, namely tLaSDI, that embeds the first and second principles of thermodynamics. The latent variables are learned through an autoencoder as a nonlinear dimension reduction model. The latent dynamics are constructed by a neural network-based model that precisely preserves certain structures for the thermodynamic laws through the GENERIC formalism. An abstract error estimate is established, which provides a new loss formulation involving the Jacobian computation of autoencoder. The autoencoder and the latent dynamics are simultaneously trained to minimize the new loss. Computational examples demonstrate the effectiveness of tLaSDI, which exhibits robust generalization ability, even in extrapolation. In addition, an intriguing correlation is empirically observed between a quantity from tLaSDI in the latent space and the behaviors of the full-state solution. △ Less

Submitted 21 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: 32 pages, 8 figures

arXiv:2401.03568 [pdf, other]

Agent AI: Surveying the Horizons of Multimodal Interaction

Authors: Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

Abstract: Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the a… ▽ More Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment. △ Less

Submitted 25 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.04837 [pdf, other]

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Authors: Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

Abstract: Instruction following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also, for practical applica… ▽ More Instruction following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also, for practical applications that require precise within-image reasoning. We build Localized Visual Commonsense models, which allow users to specify (multiple) regions as input. We train our model by sampling localized commonsense knowledge from a large language model (LLM): specifically, we prompt an LLM to collect commonsense knowledge given a global literal image description and a local literal region description automatically generated by a set of VL models. With a separately trained critic model that selects high-quality examples, we find that training on the localized commonsense corpus can successfully distill existing VL models to support a reference-as-input interface. Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM. △ Less

Submitted 12 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Neurips 2023

arXiv:2311.04287 [pdf, other]

Holistic Evaluation of Text-To-Image Models

Authors: Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

Abstract: The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we… ▽ More The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/v1.1.0 and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023. First three authors contributed equally

arXiv:2308.04453 [pdf, other]

Towards Immutability: A Secure and Efficient Auditing Framework for Cloud Supporting Data Integrity and File Version Control

Authors: Faisal Haque Bappy, Saklain Zaman, Tariqul Islam, Redwan Ahmed Rizvee, Joon S. Park, Kamrul Hasan

Abstract: Although wide-scale integration of cloud services with myriad applications increases quality of services (QoS) for enterprise users, verifying the existence and manipulation of stored cloud information remains an open research problem. Decentralized blockchain-based solutions are becoming more appealing for cloud auditing environments because of the immutable nature of blockchain. However, the dec… ▽ More Although wide-scale integration of cloud services with myriad applications increases quality of services (QoS) for enterprise users, verifying the existence and manipulation of stored cloud information remains an open research problem. Decentralized blockchain-based solutions are becoming more appealing for cloud auditing environments because of the immutable nature of blockchain. However, the decentralized structure of blockchain results in considerable synchronization and communication overhead, which increases maintenance costs for cloud service providers (CSP). This paper proposes a Merkle Hash Tree based architecture named Entangled Merkle Forest to support version control and dynamic auditing of information in centralized cloud environments. We utilized a semi-trusted third-party auditor to conduct the auditing tasks with minimal privacy-preserving file metadata. To the best of our knowledge, we are the first to design a node sharing Merkle Forest to offer a cost-effective auditing framework for centralized cloud infrastructures while achieving the immutable feature of blockchain, mitigating the synchronization and performance challenges of the decentralized architectures. Our proposed scheme outperforms it's equivalent Blockchain-based schemes by ensuring time and storage efficiency with minimum overhead as evidenced by performance analysis. △ Less

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.04280 [pdf, ps, other]

Shaping the Emerging Norms of Using Large Language Models in Social Computing Research

Authors: Hong Shen, Tianshi Li, Toby Jia-Jun Li, Joon Sung Park, Diyi Yang

Abstract: The emergence of Large Language Models (LLMs) has brought both excitement and concerns to social computing research. On the one hand, LLMs offer unprecedented capabilities in analyzing vast amounts of textual data and generating human-like responses, enabling researchers to delve into complex social phenomena. On the other hand, concerns are emerging regarding the validity, privacy, and ethics of… ▽ More The emergence of Large Language Models (LLMs) has brought both excitement and concerns to social computing research. On the one hand, LLMs offer unprecedented capabilities in analyzing vast amounts of textual data and generating human-like responses, enabling researchers to delve into complex social phenomena. On the other hand, concerns are emerging regarding the validity, privacy, and ethics of the research when LLMs are involved. This SIG aims at offering an open space for social computing researchers who are interested in understanding the impacts of LLMs to discuss their current practices, perspectives, challenges when engaging with LLMs in their everyday work and collectively shaping the emerging norms of using LLMs in social computing research. △ Less

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2305.00970 [pdf, other]

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Authors: Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

Abstract: Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite a… ▽ More Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g. GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge-memory to generate scenes in unseen physical world and virtual reality environments. The knowledge interactive emergent ability (Figure 1) is demonstrated as the observation learns i) micro-action of cross-modality: in multi-modality models to collect a large amount of relevant knowledge memory data for each interaction task (e.g., unseen scene understanding) from the physical reality; and ii) macro-behavior of reality-agnostic: in mix-reality environments to improve interactions that tailor to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on the scene generation and editing tasks. We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes, compared to baselines, demonstrating the potential benefit of incorporating ArK in generative AI for applications such as metaverse and gaming simulation. △ Less

Submitted 1 May, 2023; originally announced May 2023.

Report number: EFI-94-11

arXiv:2304.03442 [pdf, other]

Generative Agents: Interactive Simulacra of Human Behavior

Authors: Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

Abstract: Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; t… ▽ More Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior. △ Less

Submitted 5 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2212.09746 [pdf, other]

Evaluating Human-Language Model Interaction

Authors: Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

Abstract: Many real-world applications of language models (LMs), such as writing assistance and code autocomplete, involve human-LM interaction. However, most benchmarks are non-interactive in that a model produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive… ▽ More Many real-world applications of language models (LMs), such as writing assistance and code autocomplete, involve human-LM interaction. However, most benchmarks are non-interactive in that a model produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and dimensions to consider when designing evaluation metrics. Compared to standard, non-interactive evaluation, HALIE captures (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality (e.g., enjoyment and ownership). We then design five tasks to cover different forms of interaction: social dialogue, question answering, crossword puzzles, summarization, and metaphor generation. With four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21 Labs' Jurassic-1), we find that better non-interactive performance does not always translate to better human-LM interaction. In particular, we highlight three cases where the results from non-interactive and interactive metrics diverge and underscore the importance of human-LM interaction for LM evaluation. △ Less

Submitted 5 January, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)

arXiv:2208.13094 [pdf, other]

Measuring the Prevalence of Anti-Social Behavior in Online Communities

Authors: Joon Sung Park, Joseph Seering, Michael S. Bernstein

Abstract: With increasing attention to online anti-social behaviors such as personal attacks and bigotry, it is critical to have an accurate accounting of how widespread anti-social behaviors are. In this paper, we empirically measure the prevalence of anti-social behavior in one of the world's most popular online community platforms. We operationalize this goal as measuring the proportion of unmoderated co… ▽ More With increasing attention to online anti-social behaviors such as personal attacks and bigotry, it is critical to have an accurate accounting of how widespread anti-social behaviors are. In this paper, we empirically measure the prevalence of anti-social behavior in one of the world's most popular online community platforms. We operationalize this goal as measuring the proportion of unmoderated comments in the 97 most popular communities on Reddit that violate eight widely accepted platform norms. To achieve this goal, we contribute a human-AI pipeline for identifying these violations and a bootstrap sampling method to quantify measurement uncertainty. We find that 6.25% (95% Confidence Interval [5.36%, 7.13%]) of all comments in 2016, and 4.28% (95% CI [2.50%, 6.26%]) in 2020-2021, are violations of these norms. Most anti-social behaviors remain unmoderated: moderators only removed one in twenty violating comments in 2016, and one in ten violating comments in 2020. Personal attacks were the most prevalent category of norm violation; pornography and bigotry were the most likely to be moderated, while politically inflammatory comments and misogyny/vulgarity were the least likely to be moderated. This paper offers a method and set of empirical results for tracking these phenomena as both the social practices (e.g., moderation) and technical practices (e.g., design) evolve. △ Less

Submitted 27 August, 2022; originally announced August 2022.

Comments: This work will appear in the Proc. ACM Hum.-Comput. Interact. 6, CSCW (CSCW'22)

arXiv:2208.04024 [pdf, other]

Social Simulacra: Creating Populated Prototypes for Social Computing Systems

Authors: Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

Abstract: Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the s… ▽ More Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer's description of a community's design -- goal, rules, and member personas -- and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of "what if?" scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models' training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra. △ Less

Submitted 8 August, 2022; originally announced August 2022.

Comments: This work will appear in the 35th Annual ACM Symposium on User Interface Software and Technology (UIST '22)

arXiv:2206.02837 [pdf, other]

EVAC+: Multi-scale V-net with Deep Feature CRF Layers for Brain Extraction

Authors: Jong Sung Park, Shreyas Fadnavis, Eleftherios Garyfallidis

Abstract: Brain extraction is one of the first steps of pre-processing 3D brain MRI data and a prerequisite for any forthcoming brain imaging analyses. However, it is not a simple segmentation problem due to the complex structure of the brain and human head. Although multiple solutions have been proposed in the literature, we are still far from having truly robust methods. While previous methods have used m… ▽ More Brain extraction is one of the first steps of pre-processing 3D brain MRI data and a prerequisite for any forthcoming brain imaging analyses. However, it is not a simple segmentation problem due to the complex structure of the brain and human head. Although multiple solutions have been proposed in the literature, we are still far from having truly robust methods. While previous methods have used machine learning with structural/geometric priors, with the development of Deep Learning (DL), there has been an increase in proposed Neural Network architectures. Most models focus on improving the training data and loss functions with little change in the architecture. However, the amount of accessible training data with expert-labelled ground truth vary between groups. Moreover, the labels are created not from scratch but from outputs of non-DL methods. Thus, most DL method's performance depend on the amount and quality of data one has. In this paper, we propose a novel architecture we call EVAC+ to work around this issue. We show that EVAC+ has 3 major advantages compared to other networks: (1) Multi-scale input with limited random augmentation for efficient learning, (2) a unique way of using Conditional Random Fields Recurrent Layer and (3) a loss function specifically created to enhance this architecture. We compare our model to state-of-the-art non-DL and DL methods. Results show that even with little change in the traditional architecture and limited training resources, EVAC+ achieves a high and stable Dice Coefficient and Jaccard Index along with a desirable lower surface distance. Ultimately, our model provides a robust way of accurately reducing segmentation errors in complex multi-tissue interfacing areas of brain. △ Less

Submitted 5 January, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Replaced with advancements in the model and results

arXiv:2202.04800 [pdf, other]

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Authors: Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

Abstract: Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might as… ▽ More Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might assume the street sits in a residential area (rather than on a highway), even if no houses are pictured. Can machines perform similar visual reasoning? We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. We adopt a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue. In total, we collect 363K (clue, inference) pairs, which form a first-of-its-kind abductive visual reasoning dataset. Using our corpus, we test three complementary axes of abductive reasoning. We evaluate the capacity of models to: i) retrieve relevant inferences from a large candidate corpus; ii) localize evidence for inferences via bounding boxes, and iii) compare plausible inferences to match human judgments on a newly-collected diagnostic corpus of 19K Likert-scale judgments. While we find that fine-tuning CLIP-RN50x64 with a multitask objective outperforms strong baselines, significant headroom exists between model performance and human agreement. Data, models, and leaderboard available at http://visualabduction.com/ △ Less

Submitted 25 July, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: code, data, models at http://visualabduction.com/

Journal ref: ECCV 2022

arXiv:2202.02950 [pdf, other]

doi 10.1145/3491102.3502004

Jury Learning: Integrating Dissenting Voices into Machine Learning Models

Authors: Mitchell L. Gordon, Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeffrey T. Hancock, Tatsunori Hashimoto, Michael S. Bernstein

Abstract: Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We intr… ▽ More Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators' models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent. △ Less

Submitted 7 February, 2022; originally announced February 2022.

Comments: To appear at CHI 2022

arXiv:2108.07258 [pdf, other]

On the Opportunities and Risks of Foundation Models

Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh , et al. (89 additional authors not shown)

Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap… ▽ More AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature. △ Less

Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

arXiv:2106.02636 [pdf, other]

MERLOT: Multimodal Neural Script Knowledge Models

Authors: Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

Abstract: As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech -- in an entirely label-free, self-supervised manner. By pretraining with a mix of both frame-level (s… ▽ More As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech -- in an entirely label-free, self-supervised manner. By pretraining with a mix of both frame-level (spatial) and video-level (temporal) objectives, our model not only learns to match images to temporally corresponding words, but also to contextualize what is happening globally over time. As a result, MERLOT exhibits strong out-of-the-box representations of temporal commonsense, and achieves state-of-the-art performance on 12 different video QA datasets when finetuned. It also transfers well to the world of static images, allowing models to reason about the dynamic context behind visual scenes. On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised data (like object bounding boxes). Ablation analyses demonstrate the complementary importance of: 1) training on videos versus static images; 2) scaling the magnitude and diversity of the pretraining video corpus; and 3) using diverse objectives that encourage full-stack multimodal reasoning, from the recognition to cognition level. △ Less

Submitted 21 October, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: project page at https://rowanzellers.com/merlot; NeurIPS 2021 camera ready

arXiv:2106.01487 [pdf, other]

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Authors: Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

Abstract: Learning binary representations of instances and classes is a classical problem with several high potential applications. In modern settings, the compression of high-dimensional neural representations to low-dimensional binary codes is a challenging task and often require large bit-codes to be accurate. In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for ins… ▽ More Learning binary representations of instances and classes is a classical problem with several high potential applications. In modern settings, the compression of high-dimensional neural representations to low-dimensional binary codes is a challenging task and often require large bit-codes to be accurate. In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for instances as well as classes. Our method does not require any side-information, like annotated attributes or label meta-data, and learns extremely low-dimensional binary codes (~20 bits for ImageNet-1K). The learnt codes are super-efficient while still ensuring nearly optimal classification accuracy for ResNet50 on ImageNet-1K. We demonstrate that the learnt codes capture intrinsically important features in the data, by discovering an intuitive taxonomy over classes. We further quantitatively measure the quality of our codes by applying it to the efficient image retrieval as well as out-of-distribution (OOD) detection problems. For ImageNet-100 retrieval problem, our learnt binary codes outperform 16 bit HashNet using only 10 bits and also are as accurate as 10 dimensional real representations. Finally, our learnt binary codes can perform OOD detection, out-of-the-box, as accurately as a baseline that needs ~3000 samples to tune its threshold, while we require none. Code is open-sourced at https://github.com/RAIVNLab/LLC. △ Less

Submitted 6 October, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 Camera Ready. 19 pages, 6 figures

arXiv:2103.09058 [pdf, ps, other]

doi 10.1145/3461702.3462590

Understanding the Representation and Representativeness of Age in AI Data Sets

Authors: Joon Sung Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, Meredith Ringel Morris

Abstract: A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by… ▽ More A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects' ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of their subjects included older adults (defined as older than 65 years), while only one included oldest-old adults (defined as older than 85 years). Additionally, we find that only 24 of the data sets include any age-related information in their documentation or metadata, and that there is no consistent method followed across these data sets to collect and record the subjects' ages. We recognize the unique difficulties in creating representative data sets in terms of age, but raise it as an important dimension that researchers and engineers interested in inclusive AI should consider. △ Less

Submitted 6 May, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: 9 pages

Journal ref: In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES '21)

arXiv:2102.07028 [pdf, other]

ThetA -- fast and robust clustering via a distance parameter

Authors: Eleftherios Garyfallidis, Shreyas Fadnavis, Jong Sung Park, Bramsh Qamar Chandio, Javier Guaje, Serge Koudoro, Nasim Anousheh

Abstract: Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades. This set of problems is often tackled by partitioning the data into K clusters where the number of clusters is chosen apriori. While significant progress has been made on these lines over the years, it is well established that as the number of clusters or dimensions in… ▽ More Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades. This set of problems is often tackled by partitioning the data into K clusters where the number of clusters is chosen apriori. While significant progress has been made on these lines over the years, it is well established that as the number of clusters or dimensions increase, current approaches dwell in local minima resulting in suboptimal solutions. In this work, we propose a new set of distance threshold methods called Theta-based Algorithms (ThetA). Via experimental comparisons and complexity analyses we show that our proposed approach outperforms existing approaches in: a) clustering accuracy and b) time complexity. Additionally, we show that for a large class of problems, learning the optimal threshold is straightforward in comparison to learning K. Moreover, we show how ThetA can infer the sparsity of datasets in higher dimensions. △ Less

Submitted 1 March, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

arXiv:2010.07526 [pdf, other]

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

Authors: Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi

Abstract: Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entai… ▽ More Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is comprehensive image understanding at all levels: not just their explicit content at the pixel level, but their contextual contents at the semantic and pragmatic levels. We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs. Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks. △ Less

Submitted 15 October, 2020; originally announced October 2020.

Comments: Accepted to Findings of EMNLP

arXiv:2008.09791 [pdf, other]

Identity-Aware Multi-Sentence Video Description

Authors: Jae Sung Park, Trevor Darrell, Anna Rohrbach

Abstract: Standard video and movie description tasks abstract away from person identities, thus failing to link identities across sentences. We propose a multi-sentence Identity-Aware Video Description task, which overcomes this limitation and requires to re-identify persons locally within a set of consecutive clips. We introduce an auxiliary task of Fill-in the Identity, that aims to predict persons' IDs c… ▽ More Standard video and movie description tasks abstract away from person identities, thus failing to link identities across sentences. We propose a multi-sentence Identity-Aware Video Description task, which overcomes this limitation and requires to re-identify persons locally within a set of consecutive clips. We introduce an auxiliary task of Fill-in the Identity, that aims to predict persons' IDs consistently within a set of clips, when the video descriptions are given. Our proposed approach to this task leverages a Transformer architecture allowing for coherent joint prediction of multiple IDs. One of the key components is a gender-aware textual representation as well an additional gender prediction objective in the main model. This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description. We first generate multi-sentence video descriptions, and then apply our Fill-in the Identity model to establish links between the predicted person entities. To be able to tackle both tasks, we augment the Large Scale Movie Description Challenge (LSMDC) benchmark with new annotations suited for our problem statement. Experiments show that our proposed Fill-in the Identity model is superior to several baselines and recent works, and allows us to generate descriptions with locally re-identified people. △ Less

Submitted 22 August, 2020; originally announced August 2020.

Comments: Project link at https://sites.google.com/site/describingmovies/lsmdc-2019/

arXiv:2006.03193 [pdf, other]

LSTM-based Anomaly Detection for Non-linear Dynamical System

Authors: Yue Tan, Chunjing Hu, Kuan Zhang, Kan Zheng, Ethan A. Davis, Jae Sung Park

Abstract: Anomaly detection for non-linear dynamical system plays an important role in ensuring the system stability. However, it is usually complex and has to be solved by large-scale simulation which requires extensive computing resources. In this paper, we propose a novel anomaly detection scheme in non-linear dynamical system based on Long Short-Term Memory (LSTM) to capture complex temporal changes of… ▽ More Anomaly detection for non-linear dynamical system plays an important role in ensuring the system stability. However, it is usually complex and has to be solved by large-scale simulation which requires extensive computing resources. In this paper, we propose a novel anomaly detection scheme in non-linear dynamical system based on Long Short-Term Memory (LSTM) to capture complex temporal changes of the time sequence and make multi-step predictions. Specifically, we first present the framework of LSTM-based anomaly detection in non-linear dynamical system, including data preprocessing, multi-step prediction and anomaly detection. According to the prediction requirement, two types of training modes are explored in multi-step prediction, where samples in a wall shear stress dataset are collected by an adaptive sliding window. On the basis of the multi-step prediction result, a Local Average with Adaptive Parameters (LAAP) algorithm is proposed to extract local numerical features of the time sequence and estimate the upcoming anomaly. The experimental results show that our proposed multi-step prediction method can achieve a higher prediction accuracy than traditional method in wall shear stress dataset, and the LAAP algorithm performs better than the absolute value-based method in anomaly detection task. △ Less

Submitted 4 June, 2020; originally announced June 2020.

Comments: 8 pages, 6 figures

arXiv:2006.00424 [pdf, other]

HMPO: Human Motion Prediction in Occluded Environments for Safe Motion Planning

Authors: Jae Sung Park, Dinesh Manocha

Abstract: We present a novel approach to generate collision-free trajectories for a robot operating in close proximity with a human obstacle in an occluded environment. The self-occlusions of the robot can significantly reduce the accuracy of human motion prediction, and we present a novel deep learning-based prediction algorithm. Our formulation uses CNNs and LSTMs and we augment human-action datasets with… ▽ More We present a novel approach to generate collision-free trajectories for a robot operating in close proximity with a human obstacle in an occluded environment. The self-occlusions of the robot can significantly reduce the accuracy of human motion prediction, and we present a novel deep learning-based prediction algorithm. Our formulation uses CNNs and LSTMs and we augment human-action datasets with synthetically generated occlusion information for training. We also present an occlusion-aware planner that uses our motion prediction algorithm to compute collision-free trajectories. We highlight performance of the overall approach (HMPO) in complex scenarios and observe upto 68% performance improvement in motion prediction accuracy, and 38% improvement in terms of error distance between the ground-truth and the predicted human joint positions. △ Less

Submitted 30 May, 2020; originally announced June 2020.

Comments: 11 pages, 5 figures, 2 tables

arXiv:2004.10796 [pdf, other]

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

Authors: Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi

Abstract: Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, the intent of that man at the moment is to stay alive, and he will need help in the near future or else he will get washed… ▽ More Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, the intent of that man at the moment is to stay alive, and he will need help in the near future or else he will get washed away. We propose VisualComet, the novel framework of visual commonsense reasoning tasks to predict events that might have happened before, events that might happen next, and the intents of the people at present. To support research toward visual commonsense reasoning, we introduce the first large-scale repository of Visual Commonsense Graphs that consists of over 1.4 million textual descriptions of visual commonsense inferences carefully annotated over a diverse set of 60,000 images, each paired with short video summaries of before and after. In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text. We establish strong baseline performances on this task and demonstrate that integration between visual and textual commonsense reasoning is the key and wins over non-integrative alternatives. △ Less

Submitted 1 August, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: Project Page: http://visualcomet.xyz (ECCV 2020 Spotlight)

arXiv:1905.09616 [pdf, other]

A Comparative Study of Analog/Digital Self-Interference Cancellation for Full Duplex Radios

Authors: Jong Woo Kwak, Min Soo Sim, In-Woong Kang, Jong Sung Park, Jaedon Park, Chan-Byoung Chae

Abstract: Self-interference (SI) is the main obstacle to full-duplex radios. To overcome the SI, researchers have proposed several analog and digital domain self-interference cancellation (SIC) techniques. How well the digital cancellation works depends on the results of analog cancellation. Therefore, to analyze overall SIC performance, one should do so in an integrated manner. In this paper, we build a si… ▽ More Self-interference (SI) is the main obstacle to full-duplex radios. To overcome the SI, researchers have proposed several analog and digital domain self-interference cancellation (SIC) techniques. How well the digital cancellation works depends on the results of analog cancellation. Therefore, to analyze overall SIC performance, one should do so in an integrated manner. In this paper, we build a simulator that can analyze the performance of analog and digital SIC techniques. Through this simulator, we can analyze the overall SIC performance within various system parameters such as the resolution of an analog-to-digital converter (ADC) and/or nonlinearity of a power amplifier (PA). With our simulator, we expect that configurations and tuning algorithms of an active analog canceller can be optimized before real hardware implementation. △ Less

Submitted 23 May, 2019; originally announced May 2019.

arXiv:1905.00933 [pdf, other]

Joint High Dynamic Range Imaging and Super-Resolution from a Single Image

Authors: Jae Woong Soh, Jae Sung Park, Nam Ik Cho

Abstract: This paper presents a new framework for jointly enhancing the resolution and the dynamic range of an image, i.e., simultaneous super-resolution (SR) and high dynamic range imaging (HDRI), based on a convolutional neural network (CNN). From the common trends of both tasks, we train a CNN for the joint HDRI and SR by focusing on the reconstruction of high-frequency details. Specifically, the high-fr… ▽ More This paper presents a new framework for jointly enhancing the resolution and the dynamic range of an image, i.e., simultaneous super-resolution (SR) and high dynamic range imaging (HDRI), based on a convolutional neural network (CNN). From the common trends of both tasks, we train a CNN for the joint HDRI and SR by focusing on the reconstruction of high-frequency details. Specifically, the high-frequency component in our work is the reflectance component according to the Retinex-based image decomposition, and only the reflectance component is manipulated by the CNN while another component (illumination) is processed in a conventional way. In training the CNN, we devise an appropriate loss function that contributes to the naturalness quality of resulting images. Experiments show that our algorithm outperforms the cascade implementation of CNN-based SR and HDRI. △ Less

Submitted 2 May, 2019; originally announced May 2019.

Comments: 11 pages

MSC Class: 68T45

arXiv:1903.04271 [pdf, ps, other]

CloudSafe: A Tool for an Automated Security Analysis for Cloud Computing

Authors: Seoungmo An, Taehoon Eom, Jong Sou Park, Jin B. Hong, Armstrong Nhlabatsi, Noora Fetais, Khaled M. Khan, Dong Seong Kim

Abstract: Cloud computing has been adopted widely, providing on-demand computing resources to improve perfornance and reduce the operational costs. However, these new functionalities also bring new ways to exploit the cloud computing environment. To assess the security of the cloud, graphical security models can be used, such as Attack Graphs and Attack Trees. However, existing models do not consider all ty… ▽ More Cloud computing has been adopted widely, providing on-demand computing resources to improve perfornance and reduce the operational costs. However, these new functionalities also bring new ways to exploit the cloud computing environment. To assess the security of the cloud, graphical security models can be used, such as Attack Graphs and Attack Trees. However, existing models do not consider all types of threats, and also automating the security assessment functions are difficult. In this paper, we propose a new security assessment tool for the cloud named CloudSafe, an automated security assessment for the cloud. The CloudSafe tool collates various tools and frameworks to automate the security assessment process. To demonstrate the applicability of the CloudSafe, we conducted security assessment in Amazon AWS, where our experimental results showed that we can effectively gather security information of the cloud and carry out security assessment to produce security reports. Users and cloud service providers can use the security report generated by the CloudSafe to understand the security posture of the cloud being used/provided. △ Less

Submitted 11 March, 2019; originally announced March 2019.

arXiv:1902.10252 [pdf, other]

Efficient Probabilistic Collision Detection for Non-Gaussian Noise Distributions

Authors: Jae Sung Park, Dinesh Manocha

Abstract: We present an efficient algorithm to compute tight upper bounds of collision probability between two objects with positional uncertainties, whose error distributions are represented with non-Gaussian forms. Our approach can handle noisy datasets from depth sensors, whose distributions may correspond to Truncated Gaussian, Weighted Samples, or Truncated Gaussian Mixture Model. We derive tight proba… ▽ More We present an efficient algorithm to compute tight upper bounds of collision probability between two objects with positional uncertainties, whose error distributions are represented with non-Gaussian forms. Our approach can handle noisy datasets from depth sensors, whose distributions may correspond to Truncated Gaussian, Weighted Samples, or Truncated Gaussian Mixture Model. We derive tight probability bounds for convex shapes and extend them to non-convex shapes using hierarchical representations. We highlight the benefits of our approach over prior probabilistic collision detection algorithms in terms of tighter bounds ($10$x) and improved running time ($3$x). Moreover, we use our tight bounds to design an efficient and accurate motion planning algorithm for a 7-DOF robot arm operating in tight scenarios with sensor and motion uncertainties. △ Less

Submitted 15 December, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: 11 pages, 6 figures, 1 table

arXiv:1812.05634 [pdf, other]

Adversarial Inference for Multi-Sentence Video Description

Authors: Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

Abstract: While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video. Recently, reinforcement and adversarial learning… ▽ More While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video. Recently, reinforcement and adversarial learning based methods have been explored to improve the image captioning models; however, both types of methods suffer from a number of issues, e.g. poor readability and high redundancy for RL and stability issues for GANs. In this work, we instead propose to apply adversarial techniques during inference, designing a discriminator which encourages better multi-sentence video description. In addition, we find that a multi-discriminator "hybrid" design, where each discriminator targets one aspect of a description, leads to the best results. Specifically, we decouple the discriminator to evaluate on three criteria: 1) visual relevance to the video, 2) language diversity and fluency, and 3) coherence across sentences. Our approach results in more accurate, diverse, and coherent multi-sentence video descriptions, as shown by automatic as well as human evaluation on the popular ActivityNet Captions dataset. △ Less

Submitted 15 April, 2019; v1 submitted 13 December, 2018; originally announced December 2018.

Comments: Accepted to Computer Vision and Pattern Recognition (CVPR) 2019

arXiv:1811.10673 [pdf, other]

Adversarial Video Compression Guided by Soft Edge Detection

Authors: Sungsoo Kim, Jin Soo Park, Christos G. Bampis, Jaeseong Lee, Mia K. Markey, Alexandros G. Dimakis, Alan C. Bovik

Abstract: We propose a video compression framework using conditional Generative Adversarial Networks (GANs). We rely on two encoders: one that deploys a standard video codec and another which generates low-level maps via a pipeline of down-sampling, a newly devised soft edge detector, and a novel lossless compression scheme. For decoding, we use a standard video decoder as well as a neural network based one… ▽ More We propose a video compression framework using conditional Generative Adversarial Networks (GANs). We rely on two encoders: one that deploys a standard video codec and another which generates low-level maps via a pipeline of down-sampling, a newly devised soft edge detector, and a novel lossless compression scheme. For decoding, we use a standard video decoder as well as a neural network based one, which is trained using a conditional GAN. Recent "deep" approaches to video compression require multiple videos to pre-train generative networks to conduct interpolation. In contrast to this prior work, our scheme trains a generative decoder on pairs of a very limited number of key frames taken from a single video and corresponding low-level maps. The trained decoder produces reconstructed frames relying on a guidance of low-level maps, without any interpolation. Experiments on a diverse set of 131 videos demonstrate that our proposed GAN-based compression engine achieves much higher quality reconstructions at very low bitrates than prevailing standard codecs such as H.264 or HEVC. △ Less

Submitted 26 November, 2018; originally announced November 2018.

arXiv:1708.00636 [pdf, ps, other]

Generation of High Dynamic Range Illumination from a Single Image for the Enhancement of Undesirably Illuminated Images

Authors: Jae Sung Park, Nam Ik Cho

Abstract: This paper presents an algorithm that enhances undesirably illuminated images by generating and fusing multi-level illuminations from a single image.The input image is first decomposed into illumination and reflectance components by using an edge-preserving smoothing filter. Then the reflectance component is scaled up to improve the image details in bright areas. The illumination component is scal… ▽ More This paper presents an algorithm that enhances undesirably illuminated images by generating and fusing multi-level illuminations from a single image.The input image is first decomposed into illumination and reflectance components by using an edge-preserving smoothing filter. Then the reflectance component is scaled up to improve the image details in bright areas. The illumination component is scaled up and down to generate several illumination images that correspond to certain camera exposure values different from the original. The virtual multi-exposure illuminations are blended into an enhanced illumination, where we also propose a method to generate appropriate weight maps for the tone fusion. Finally, an enhanced image is obtained by multiplying the equalized illumination and enhanced reflectance. Experiments show that the proposed algorithm produces visually pleasing output and also yields comparable objective results to the conventional enhancement methods, while requiring modest computational loads. △ Less

Submitted 2 August, 2017; originally announced August 2017.

arXiv:1707.02387 [pdf, other]

Efficient Generation of Motion Plans from Attribute-Based Natural Language Instructions Using Dynamic Constraint Mapping

Authors: Jae Sung Park, Biao Jia, Mohit Bansal, Dinesh Manocha

Abstract: We present an algorithm for combining natural language processing (NLP) and fast robot motion planning to automatically generate robot movements. Our formulation uses a novel concept called Dynamic Constraint Mapping to transform complex, attribute-based natural language instructions into appropriate cost functions and parametric constraints for optimization-based motion planning. We generate a fa… ▽ More We present an algorithm for combining natural language processing (NLP) and fast robot motion planning to automatically generate robot movements. Our formulation uses a novel concept called Dynamic Constraint Mapping to transform complex, attribute-based natural language instructions into appropriate cost functions and parametric constraints for optimization-based motion planning. We generate a factor graph from natural language instructions called the Dynamic Grounding Graph (DGG), which takes latent parameters into account. The coefficients of this factor graph are learned based on conditional random fields (CRFs) and are used to dynamically generate the constraints for motion planning. We map the cost function directly to the motion parameters of the planner and compute smooth trajectories in dynamic scenes. We highlight the performance of our approach in a simulated environment and via a human interacting with a 7-DOF Fetch robot using intricate language commands including negation, orientation specification, and distance constraints. △ Less

Submitted 15 October, 2018; v1 submitted 7 July, 2017; originally announced July 2017.

Comments: 12 pages, 8 figures, 4 tables

arXiv:1610.03651 [pdf, other]

Efficient Probabilistic Collision Detection for Non-Convex Shapes

Authors: Jae Sung Park, Chonhyon Park, Dinesh Manocha

Abstract: We present new algorithms to perform fast probabilistic collision queries between convex as well as non-convex objects. Our approach is applicable to general shapes, where one or more objects are represented using Gaussian probability distributions. We present a fast new algorithm for a pair of convex objects, and extend the approach to non-convex models using hierarchical representations. We high… ▽ More We present new algorithms to perform fast probabilistic collision queries between convex as well as non-convex objects. Our approach is applicable to general shapes, where one or more objects are represented using Gaussian probability distributions. We present a fast new algorithm for a pair of convex objects, and extend the approach to non-convex models using hierarchical representations. We highlight the performance of our algorithms with various convex and non-convex shapes on complex synthetic benchmarks and trajectory planning benchmarks for a 7-DOF Fetch robot arm. △ Less

Submitted 12 October, 2016; originally announced October 2016.

Comments: 9 pages, 6 figures

arXiv:1608.04837 [pdf, other]

I-Planner: Intention-Aware Motion Planning Using Learning Based Human Motion Prediction

Authors: Jae Sung Park, Chonhyon Park, Dinesh Manocha

Abstract: We present a motion planning algorithm to compute collision-free and smooth trajectories for high-DOF robots interacting with humans in a shared workspace. Our approach uses offline learning of human actions along with temporal coherence to predict the human actions. Our intention-aware online planning algorithm uses the learned database to compute a reliable trajectory based on the predicted acti… ▽ More We present a motion planning algorithm to compute collision-free and smooth trajectories for high-DOF robots interacting with humans in a shared workspace. Our approach uses offline learning of human actions along with temporal coherence to predict the human actions. Our intention-aware online planning algorithm uses the learned database to compute a reliable trajectory based on the predicted actions. We represent the predicted human motion using a Gaussian distribution and compute tight upper bounds on collision probabilities for safe motion planning. We also describe novel techniques to account for noise in human motion prediction. We highlight the performance of our planning algorithm in complex simulated scenarios and real world benchmarks with 7-DOF robot arms operating in a workspace with a human performing complex tasks. We demonstrate the benefits of our intention-aware planner in terms of computing safe trajectories in such uncertain environments. △ Less

Submitted 27 November, 2017; v1 submitted 16 August, 2016; originally announced August 2016.

Comments: 14 pages, 8 figures, 3 tables

arXiv:1607.04788 [pdf, other]

Fast and Bounded Probabilistic Collision Detection in Dynamic Environments for High-DOF Trajectory Planning

Authors: Chonhyon Park, Jae Sung Park, Dinesh Manocha

Abstract: We present a novel approach to perform probabilistic collision detection between a high-DOF robot and high-DOF obstacles in dynamic, uncertain environments. In dynamic environments with a high-DOF robot and moving obstacles, our approach efficiently computes accurate collision probability between the robot and obstacles with upper error bounds. Furthermore, we describe a prediction algorithm for f… ▽ More We present a novel approach to perform probabilistic collision detection between a high-DOF robot and high-DOF obstacles in dynamic, uncertain environments. In dynamic environments with a high-DOF robot and moving obstacles, our approach efficiently computes accurate collision probability between the robot and obstacles with upper error bounds. Furthermore, we describe a prediction algorithm for future obstacle position and motion that accounts for both spatial and temporal uncertainties. We present a trajectory optimization algorithm for high-DOF robots in dynamic, uncertain environments based on probabilistic collision detection. We highlight motion planning performance in challenging scenarios with robot arms operating in environments with dynamically moving human obstacles. △ Less

Submitted 16 July, 2016; originally announced July 2016.

Showing 1–39 of 39 results for author: Park, J S