-
Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
Authors:
Junha Song,
Tae Soo Kim,
Junha Kim,
Gunhee Nam,
Thijs Kooi,
Jaegul Choo
Abstract:
This paper aims to adapt the source model to the target environment, leveraging small user feedback (i.e., labeled target data) readily available in real-world applications. We find that existing semi-supervised domain adaptation (SemiSDA) methods often suffer from poorly improved adaptation performance when directly utilizing such feedback data, as shown in Figure 1. We analyze this phenomenon via a novel concept called Negatively Biased Feedback (NBF), which stems from the observation that user feedback is more likely for data points where the model produces incorrect predictions. To leverage this feedback while avoiding the issue, we propose a scalable adapting approach, Retrieval Latent Defending. This approach helps existing SemiSDA methods to adapt the model with a balanced supervised signal by utilizing latent defending samples throughout the adaptation process. We demonstrate the problem caused by NBF and the efficacy of our approach across various benchmarks, including image classification, semantic segmentation, and a real-world medical imaging application. Our extensive experiments reveal that integrating our approach with multiple state-of-the-art SemiSDA methods leads to significant performance improvements.
Submitted 22 July, 2024;
originally announced July 2024.
-
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Authors:
Yoonjoo Lee,
Kihoon Son,
Tae Soo Kim,
Jisu Kim,
John Joon Young Chung,
Eytan Adar,
Juho Kim
Abstract:
As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or alternatives. However, it is not obvious how the user will interpret conflicts or inconsistencies. To this end, we investigate how users perceive the AI model and comprehend the generated information when they receive multiple, potentially inconsistent, outputs. Through a preliminary study, we identified five types of output inconsistencies. Based on these categories, we conducted a study (N=252) in which participants were given one or more LLM-generated passages to an information-seeking question. We found that inconsistency within multiple LLM-generated outputs lowered the participants' perceived AI capacity, while also increasing their comprehension of the given information. Specifically, we observed that this positive effect of inconsistencies was most significant for participants who read two passages, compared to those who read three. Based on these findings, we present design implications that, instead of regarding LLM output inconsistencies as a drawback, we can reveal the potential inconsistencies to transparently indicate the limitations of these models and promote critical LLM usage.
Submitted 9 May, 2024;
originally announced May 2024.
-
Unveiling Disparities in Web Task Handling Between Human and Web Agent
Authors:
Kihoon Son,
Jinhyeon Kwon,
DaEun Choi,
Tae Soo Kim,
Young-Ho Kim,
Sangdoo Yun,
Juho Kim
Abstract:
With the advancement of Large-Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, or code generation. Recently, there has been a surge in research on web agents, capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizability of these agents. This study investigates the disparities between human and web agents' performance in web tasks (e.g., information search) by concentrating on planning, action, and reflection aspects during task execution. We conducted a web task study with a think-aloud protocol, revealing distinct cognitive actions and operations on websites employed by humans. Comparative examination of existing agent structures and human behavior with thought processes highlighted differences in knowledge updating and ambiguity handling when performing the task. Humans demonstrated a propensity for exploring and modifying plans based on additional information and investigating reasons for failure. These findings offer insights into designing planning, reflection, and information discovery modules for web agents and designing the capturing method for implicit human knowledge in a web task.
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Demystifying Tacit Knowledge in Graphic Design: Characteristics, Instances, Approaches, and Guidelines
Authors:
Kihoon Son,
DaEun Choi,
Tae Soo Kim,
Juho Kim
Abstract:
Despite the growing demand for professional graphic design knowledge, the tacit nature of design inhibits knowledge sharing. However, there is a limited understanding of the characteristics and instances of tacit knowledge in graphic design. In this work, we build a comprehensive set of tacit knowledge characteristics through a literature review. Through interviews with 10 professional graphic designers, we collected 123 tacit knowledge instances and labeled their characteristics. By qualitatively coding the instances, we identified the prominent elements, actions, and purposes of tacit knowledge. To identify which instances have been addressed the least, we conducted a systematic literature review of prior system support for graphic design. By understanding the reasons for the lack of support for these instances based on their characteristics, we propose design guidelines for capturing and applying tacit knowledge in design tools. This work takes a step towards understanding tacit knowledge, and how this knowledge can be communicated.
Submitted 10 March, 2024;
originally announced March 2024.
-
GenQuery: Supporting Expressive Visual Search with Generative Models
Authors:
Kihoon Son,
DaEun Choi,
Tae Soo Kim,
Young-Ho Kim,
Juho Kim
Abstract:
Designers rely on visual search to explore and develop ideas in early design stages. However, designers can struggle to identify suitable text queries to initiate a search or to discover images for similarity-based search that can adequately express their intent. We propose GenQuery, a novel system that integrates generative models into the visual search process. GenQuery can automatically elaborate on users' queries and surface concrete search directions when users only have abstract ideas. To support precise expression of search intents, the system enables users to generatively modify images and use these in similarity-based search. In a comparative user study (N=16), designers felt that they could more accurately express their intents and find more satisfactory outcomes with GenQuery compared to a tool without generative features. Furthermore, the unpredictability of generations allowed participants to uncover more diverse outcomes. By supporting both convergence and divergence, GenQuery led to a more creative experience.
Submitted 4 March, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Authors:
Tae Soo Kim,
Yoonjoo Lee,
Jamin Shin,
Young-Ho Kim,
Juho Kim
Abstract:
By simply composing prompts, developers can prototype novel generative applications with Large Language Models (LLMs). To refine prototypes into products, however, developers must iteratively revise prompts by evaluating outputs to diagnose weaknesses. Formative interviews (N=8) revealed that developers invest significant effort in manually evaluating outputs as they assess context-specific and subjective criteria. We present EvalLM, an interactive system for iteratively refining prompts by evaluating multiple outputs on user-defined criteria. By describing criteria in natural language, users can employ the system's LLM-based evaluator to get an overview of where prompts excel or fail, and improve these based on the evaluator's feedback. A comparative study (N=12) showed that EvalLM, when compared to manual evaluation, helped participants compose more diverse criteria, examine twice as many outputs, and reach satisfactory prompts with 59% fewer revisions. Beyond prompts, our work can be extended to augment model evaluation and alignment in specific application contexts.
Submitted 27 February, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Papeos: Augmenting Research Papers with Talk Videos
Authors:
Tae Soo Kim,
Matt Latzke,
Jonathan Bragg,
Amy X. Zhang,
Joseph Chee Chang
Abstract:
Research consumption has been traditionally limited to the reading of academic papers -- a static, dense, and formally written format. Alternatively, pre-recorded conference presentation videos, which are more dynamic, concise, and colloquial, have recently become more widely available but potentially under-utilized. In this work, we explore the design space and benefits for combining academic papers and talk videos to leverage their complementary nature to provide a rich and fluid research consumption experience. Based on formative and co-design studies, we present Papeos, a novel reading and authoring interface that allows authors to augment their papers by segmenting and localizing talk videos alongside relevant paper passages with automatically generated suggestions. With Papeos, readers can visually skim a paper through clip thumbnails, and fluidly switch between consuming dense text in the paper or visual summaries in the video. In a comparative lab study (n=16), Papeos reduced mental load, scaffolded navigation, and facilitated more comprehensive reading of papers.
Submitted 29 August, 2023;
originally announced August 2023.
-
Atomic-Scale Tailoring of Chemisorbed Atomic Oxygen on Epitaxial Graphene for Graphene-Based Electronic Devices
Authors:
Tae Soo Kim,
Taemin Ahn,
Tae-Hwan Kim,
Hee Cheul Choi,
Han Woong Yeom
Abstract:
Graphene, with its unique band structure, mechanical stability, and high charge mobility, holds great promise for next-generation electronics. Nevertheless, its zero band gap challenges the control of current flow through electrical gating, consequently limiting its practical applications. Recent research indicates that atomic oxygen can oxidize epitaxial graphene in a vacuum without causing unwanted damage. In this study, we have investigated the effects of chemisorbed atomic oxygen on the electronic properties of epitaxial graphene, using scanning tunneling microscopy (STM). Our findings reveal that oxygen atoms effectively modify the electronic states of graphene, resulting in a band gap at its Dirac point. Furthermore, we demonstrate that it is possible to selectively induce desorption or hopping of oxygen atoms with atomic precision by applying appropriate bias sweeps with an STM tip. These results suggest the potential for atomic-scale tailoring of graphene oxide, enabling the development of graphene-based atomic-scale electronic devices.
Submitted 26 June, 2023;
originally announced June 2023.
-
ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity
Authors:
Sumin Seo,
JaeWoong Shin,
Jaewoo Kang,
Tae Soo Kim,
Thijs Kooi
Abstract:
Deep learning has shown great potential in assisting radiologists in reading chest X-ray (CXR) images, but its need for expensive annotations for improving performance prevents widespread clinical application. Visual language pre-training (VLP) can alleviate the burden and cost of annotation by leveraging routinely generated reports for radiographs, which exist in large quantities as well as in paired form (image-text pairs). Additionally, extensions to localization-aware VLPs are being proposed to address the needs for accurate localization of abnormalities for computer-aided diagnosis (CAD) in CXR. However, we find that the formulation proposed by locality-aware VLP literature actually leads to a loss in spatial relationships required for downstream localization tasks. Therefore, we propose Empowering Locality of VLP with Intra-modal Similarity, ELVIS, a VLP aware of intra-modal locality, to better preserve the locality within radiographs or reports, which enhances the ability to comprehend location references in text reports. Our locality-aware VLP method significantly outperforms state-of-the-art baselines in multiple segmentation tasks and the MS-CXR phrase grounding task. Qualitatively, we show that ELVIS focuses well on regions of interest described in the report text compared to prior approaches, allowing for enhanced interpretability.
Submitted 23 July, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
LMCanvas: Object-Oriented Interaction to Personalize Large Language Model-Powered Writing Environments
Authors:
Tae Soo Kim,
Arghya Sarkar,
Yoonjoo Lee,
Minsuk Chang,
Juho Kim
Abstract:
Large language models (LLMs) can enhance writing by automating or supporting specific tasks in writers' workflows (e.g., paraphrasing, creating analogies). Leveraging this capability, a collection of interfaces have been developed that provide LLM-powered tools for specific writing tasks. However, these interfaces provide limited support for writers to create personal tools for their own unique tasks, and may not comprehensively fulfill a writer's needs -- requiring them to continuously switch between interfaces during writing. In this work, we envision LMCanvas, an interface that enables writers to create their own LLM-powered writing tools and arrange their personal writing environment by interacting with "blocks" in a canvas. In this interface, users can create text blocks to encapsulate writing and LLM prompts, model blocks for model parameter configurations, and connect these to create pipeline blocks that output generations. In this workshop paper, we discuss the design for LMCanvas and our plans to develop this concept.
Submitted 27 March, 2023;
originally announced March 2023.
-
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Authors:
Kyle Lo,
Joseph Chee Chang,
Andrew Head,
Jonathan Bragg,
Amy X. Zhang,
Cassidy Trier,
Chloe Anastasiades,
Tal August,
Russell Authur,
Danielle Bragg,
Erin Bransom,
Isabel Cachola,
Stefan Candra,
Yoganand Chandrasekhar,
Yen-Sung Chen,
Evie Yu-Yen Cheng,
Yvonne Chou,
Doug Downey,
Rob Evans,
Raymond Fok,
Fangzhou Hu,
Regan Huff,
Dongyeop Kang,
Tae Soo Kim,
Rodney Kinney
, et al. (30 additional authors not shown)
Abstract:
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.
Submitted 23 April, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Did You Get What You Paid For? Rethinking Annotation Cost of Deep Learning Based Computer Aided Detection in Chest Radiographs
Authors:
Tae Soo Kim,
Geonwoon Jang,
Sanghyup Lee,
Thijs Kooi
Abstract:
As deep networks require large amounts of accurately labeled training data, a strategy to collect sufficiently large and accurate annotations is as important as innovations in recognition methods. This is especially true for building Computer Aided Detection (CAD) systems for chest X-rays where domain expertise of radiologists is required to annotate the presence and location of abnormalities on X-ray images. However, there is a lack of concrete evidence that provides guidance on how much resource to allocate for data annotation such that the resulting CAD system reaches desired performance. Without this knowledge, practitioners often fall back to the strategy of collecting as much detail as possible on as much data as possible, which is cost-inefficient. In this work, we investigate how the cost of data annotation ultimately impacts the CAD model performance on classification and segmentation of chest abnormalities in frontal-view X-ray images. We define the cost of annotation with respect to the following three dimensions: quantity, quality and granularity of labels. Throughout this study, we isolate the impact of each dimension on the resulting CAD model performance on detecting 10 chest abnormalities in X-rays. On large-scale training data with over 120K X-ray images with gold-standard annotations, we find that cost-efficient annotations provide great value when collected in large amounts and lead to competitive performance when compared to models trained with only gold-standard annotations. We also find that combining large amounts of cost-efficient annotations with only small amounts of expensive labels leads to competitive CAD models at a much lower cost.
Submitted 30 September, 2022;
originally announced September 2022.
-
Video-based assessment of intraoperative surgical skill
Authors:
Sanchit Hira,
Digvijay Singh,
Tae Soo Kim,
Shobhit Gupta,
Gregory Hager,
Shameema Sikder,
S. Swaroop Vedula
Abstract:
Purpose: The objective of this investigation is to provide a comprehensive analysis of state-of-the-art methods for video-based assessment of surgical skill in the operating room. Methods: Using a data set of 99 videos of capsulorhexis, a critical step in cataract surgery, we evaluate feature based methods previously developed for surgical skill assessment mostly under benchtop settings. In addition, we present and validate two deep learning methods that directly assess skill using RGB videos. In the first method, we predict instrument tips as keypoints, and learn surgical skill using temporal convolutional neural networks. In the second method, we propose a novel architecture for surgical skill assessment that includes a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network), both of which are augmented by visual attention mechanisms. We report the area under the receiver operating characteristic curve, sensitivity, specificity, and predictive values with each method through 5-fold cross-validation. Results: For the task of binary skill classification (expert vs. novice), deep neural network based methods exhibit higher AUC than the classical spatiotemporal interest point based methods. The neural network approach using attention mechanisms also showed high sensitivity and specificity. Conclusion: Deep learning methods are necessary for video-based assessment of surgical skill in the operating room. Our findings of internal validity of a network using attention mechanisms to assess skill directly using RGB videos should be evaluated for external validity in other data sets.
Submitted 12 May, 2022;
originally announced May 2022.
-
Motion Guided Attention Fusion to Recognize Interactions from Videos
Authors:
Tae Soo Kim,
Jonathan Jones,
Gregory D. Hager
Abstract:
We present a dual-pathway approach for recognizing fine-grained interactions from videos. We build on the success of prior dual-stream approaches, but make a distinction between the static and dynamic representations of objects and their interactions explicit by introducing separate motion and object detection pathways. Then, using our new Motion-Guided Attention Fusion module, we fuse the bottom-up features in the motion pathway with features captured from object detections to learn the temporal aspects of an action. We show that our approach can generalize across appearance effectively and recognize actions where an actor interacts with previously unseen objects. We validate our approach using the compositional action recognition task from the Something-Something-v2 dataset where we outperform existing state-of-the-art methods. We also show that our method can generalize well to real world tasks by showing state-of-the-art performance on recognizing humans assembling various IKEA furniture on the IKEA-ASM dataset.
Submitted 1 April, 2021;
originally announced April 2021.
-
SAFCAR: Structured Attention Fusion for Compositional Action Recognition
Authors:
Tae Soo Kim,
Gregory D. Hager
Abstract:
We present a general framework for compositional action recognition -- i.e. action recognition where the labels are composed out of simpler components such as subjects, atomic-actions and objects. The main challenge in compositional action recognition is that there is a combinatorially large set of possible actions that can be composed using basic components. However, compositionality also provides a structure that can be exploited. To do so, we develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism to combine information from object detections, which capture the time-series structure of an action, with visual cues that capture contextual information. We show that our approach recognizes novel verb-noun compositions more effectively than current state of the art systems, and it generalizes to unseen action categories quite efficiently from only a few labeled examples. We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset. We further show that our framework is flexible and can generalize to a new domain by showing competitive results on the Charades-Fewshot dataset.
Submitted 17 December, 2020; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Analysis on the Pricing model for a Discrete Coupon Bond with Early redemption provision by the Structural Approach
Authors:
Hyong Chol O,
Tae Song Kim
Abstract:
In this paper, using the structural approach, we derive a mathematical model of the discrete coupon bond with a provision that allows the holder to demand early redemption at any coupon date prior to maturity, and based on this model we provide some analysis, including min-max and gradient estimates of the bond price. Using these estimates, the existence and uniqueness of the default boundaries and some relationships between the design parameters of the discrete coupon bond with early redemption provision are described. Then, under some assumptions, the existence and uniqueness of the early redemption boundaries are proved and an analytic formula for the bond price is provided using higher binary options. Finally, we provide an analysis of the duration and credit spread of our bond, which are widely used in financial practice. Our work provides a design guide for the discrete coupon bond with the early redemption provision.
Submitted 3 July, 2020;
originally announced July 2020.
-
DASZL: Dynamic Action Signatures for Zero-shot Learning
Authors:
Tae Soo Kim,
Jonathan D. Jones,
Michael Peven,
Zihao Xiao,
Jin Bai,
Yi Zhang,
Weichao Qiu,
Alan Yuille,
Gregory D. Hager
Abstract:
There are many realistic applications of activity recognition where the set of potential activity descriptions is combinatorially large. This makes end-to-end supervised training of a recognition system impractical as no training set is practically able to encompass the entire label set. In this paper, we present an approach to fine-grained recognition that models activities as compositions of dynamic action signatures. This compositional approach allows us to reframe fine-grained recognition as zero-shot activity recognition, where a detector is composed "on the fly" from simple first-principles state machines supported by deep-learned components. We evaluate our method on the Olympic Sports and UCF101 datasets, where our model establishes a new state of the art under multiple experimental paradigms. We also extend this method to form a unique framework for zero-shot joint segmentation and classification of activities in video and demonstrate the first results in zero-shot decoding of complex action sequences on a widely-used surgical dataset. Lastly, we show that we can use off-the-shelf object detectors to recognize activities in completely de-novo settings with no additional training.
Submitted 17 November, 2020; v1 submitted 7 December, 2019;
originally announced December 2019.
-
Gate tunable optical absorption and band structure of twisted bilayer graphene
Authors:
Kwangnam Yu,
Van Luan Nguyen,
Tae Soo Kim,
Jiwon Jeon,
Jiho Kim,
Pilkyung Moon,
Young Hee Lee,
E. J. Choi
Abstract:
We report the infrared transmission measurement on electrically gated twisted bilayer graphene. The optical absorption spectrum clearly manifests dramatic changes such as the splitting of the inter-linear-band absorption step, the shift of the inter-van Hove singularity transition peak, and the emergence of a very strong intra-valence (intra-conduction) band transition. These anomalous optical behaviors consistently demonstrate the non-rigid band structure modification created by the ion-gel gating through the layer-dependent Coulomb screening. We propose that this screening-driven band modification is a universal phenomenon that persists in other bilayer crystals in general, establishing electrical gating as a versatile technique to engineer band structures and to create new types of optical absorption that can be exploited in electro-optical device applications.
Submitted 7 November, 2019;
originally announced November 2019.
-
Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action Recognition
Authors:
Jingxuan Hou,
Tae Soo Kim,
Austin Reiter
Abstract:
Despite the growing discriminative capabilities of modern deep learning methods for recognition tasks, the inner workings of state-of-the-art models remain mostly black boxes. In this paper, we propose a systematic interpretation of model parameters and hidden representations of Residual Temporal Convolutional Networks (Res-TCN) for action recognition in time-series data. We also propose a Feature Map Decoder as part of the interpretation analysis, which outputs a representation of the model's hidden variables in the same domain as the input. Such analysis empowers us to expose the model's characteristic learning patterns in an interpretable way. For example, through the diagnosis analysis, we discovered that our model has learned to achieve view-point invariance by implicitly learning to perform rotational normalization of the input to a more discriminative view. Based on the findings from the model interpretation analysis, we propose a targeted refinement technique, which can generalize to various other recognition models. The proposed work introduces a three-stage paradigm for model learning: training, interpretable diagnosis and targeted refinement. We validate our approach on the skeleton-based 3D human action recognition benchmark NTU RGB+D. We show that the proposed workflow is an effective model learning strategy and the resulting Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves state-of-the-art performance on NTU RGB+D.
Submitted 22 November, 2017;
originally announced November 2017.
-
Interpretable 3D Human Action Analysis with Temporal Convolutional Networks
Authors:
Tae Soo Kim,
Austin Reiter
Abstract:
The discriminative power of modern deep learning models for 3D human action recognition is growing ever more potent. In conjunction with the recent resurgence of 3D human action representation with 3D skeletons, the quality and the pace of recent progress have been significant. However, the inner workings of state-of-the-art learning-based methods in 3D human action recognition still remain mostly a black box. In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition. Compared to popular LSTM-based Recurrent Neural Network models, given interpretable input such as 3D skeletons, TCN provides us a way to explicitly learn readily interpretable spatio-temporal representations for 3D human action recognition. We describe our strategy for re-designing the TCN with interpretability in mind and how such characteristics of the model are leveraged to construct a powerful 3D activity recognition method. Through this work, we wish to take a step towards a spatio-temporal model that is easier to understand, explain and interpret. The resulting model, Res-TCN, achieves state-of-the-art results on the largest 3D human action recognition dataset, NTU-RGBD.
Submitted 14 April, 2017;
originally announced April 2017.
-
A Rad-hard CMOS Active Pixel Sensor for Electron Microscopy
Authors:
Marco Battaglia,
Devis Contarato,
Peter Denes,
Dionisio Doering,
Piero Giubilato,
Tae Sung Kim,
Serena Mattiazzo,
Velimir Radmilovic,
Sarah Zalusky
Abstract:
Monolithic CMOS pixel sensors offer unprecedented opportunities for fast nano-imaging through direct electron detection in transmission electron microscopy. We present the design and a full characterisation of a CMOS pixel test structure able to withstand doses in excess of 1 MRad. Data collected with electron beams at various energies of interest in electron microscopy are compared to predictions of simulation and to 1.5 GeV electron data to disentangle the effect of multiple scattering. The point spread function measured with 300 keV electrons is (8.1 +/- 1.6) micron for 10 micron pixels and (10.9 +/- 2.3) micron for 20 micron pixels, respectively, which agrees well with the values of 8.4 micron and 10.5 micron predicted by our simulation.
Submitted 17 November, 2008;
originally announced November 2008.
-
Dynamical Response of Nanomechanical Resonators to Biomolecular Interactions
Authors:
Kilho Eom,
Tae Yun Kwon,
Dae Sung Yoon,
Hong Lim Lee,
Tae Song Kim
Abstract:
We studied the dynamical response of a nanomechanical resonator to biomolecular (e.g. DNA) adsorption on the resonator's surface by using a theoretical model, which considers the Hamiltonian H such that the potential energy consists of the elastic bending energy of the resonator and the potential energy for biomolecular interactions. It was shown that the resonant frequency shift of a resonator due to biomolecular adsorption depends not only on the mass of adsorbed biomolecules but also on the biomolecular interactions. Specifically, for dsDNA adsorption on a resonator's surface, the resonant frequency shift also depends on the ionic strength of the solvent, implying a role of molecular interactions in the dynamic behavior of a resonator. This indicates that nanomechanical resonators may enable one to quantify the biomolecular mass, implying the enumeration of biomolecules, as well as gain insight into intermolecular interactions between adsorbed biomolecules on the surface.
Submitted 17 August, 2007; v1 submitted 25 June, 2007;
originally announced June 2007.
-
A Homogeneous Sample of Sub-DLAs III: Total Gas Mass Omega_(HI+HeII) at z>2
Authors:
Celine Peroux,
Miroslava Dessauges-Zavadsky,
Sandro D'Odorico,
Tae Sun Kim,
Richard G. McMahon
Abstract:
Absorbers seen in the spectrum of background quasars are a unique tool to select HI-rich galaxies at all redshifts. In turn, these allow one to determine the cosmological evolution of the HI gas, Omega_HI+HeII, a possible indicator of gas consumption as star formation proceeds. The Damped Lyman-alpha systems (DLAs with N(HI) > 10^20.3 cm^-2), in particular, are believed to contain a large fraction of the HI gas but there are also indications that lower column density systems, named "sub-Damped Lyman-alpha" systems, play a role at high redshift. Here we present the discovery of high-redshift sub-DLAs based on 17 z>4 quasar spectra observed with the Ultraviolet-Visual Echelle Spectrograph (UVES) on VLT. This sample is composed of 21 new sub-DLAs which, together with another 10 systems from previous ESO archive studies, make up a homogeneous sample. The redshift evolution of the number density of several classes of absorbers is derived and shows that all systems seem to be evolving in the redshift range from z=5 to z~3. This is further used to estimate the redshift evolution of the characteristic radius of these classes of absorbers assuming a Holmberg relation and one unique underlying parent population. DLAs are found to have R_* ~ 20 h_100^-1 kpc, while sub-DLAs have R_* ~ 40 h_100^-1 kpc. The redshift evolution of the column density distribution, f(N,z), down to N(HI) = 10^19 cm^-2 is also presented. A departure from a power law due to a flattening of f(N,z) in the sub-DLA regime is present in the data. f(N,z) is further used to determine the HI gas mass contained in sub-DLAs at z>2. The complete sample shows that sub-DLAs are important at all redshifts from z=5 to z=2. Finally, the possibility that sub-DLAs are less affected by dust obscuration than classical DLAs is discussed.
Submitted 14 July, 2005;
originally announced July 2005.
-
A Homogeneous Sample of Sub-DLAs II: Statistical, Kinematic and Chemical Properties
Authors:
Celine Peroux,
Miroslava Dessauges-Zavadsky,
Sandro D'Odorico,
Tae Sun Kim,
Richard G. McMahon
Abstract:
Damped Ly-alpha Systems (DLAs), with N(HI)>2*10^20 cm^{-2}, observed in quasars have allowed us to quantify the chemical content of the Universe over cosmological scales. Such studies can be extended to lower N(HI), in the sub-DLA range (10^19<N(HI)<2*10^20 cm^{-2}), which are systems believed to contain a large fraction of the neutral hydrogen at z>3.5. In this paper, we use a homogeneous sample of sub-DLAs from the ESO UVES archives presented in Paper I (Dessauges-Zavadsky et al. 2003), to observationally determine for the first time the column density distribution, f(N), down to N(HI)=10^19 cm^{-2}. The results are in good agreement with the predictions from Peroux et al. (2003). We present the kinematic and clustering properties of this survey. We compare low- and high-ionization transition widths and find that the sub-DLAs properties roughly span the parameter space of DLAs. We analyse the chemical content of this sample in conjunction with abundances from 72 DLAs taken from the literature. We compute the HI column density-weighted mean abundance which is believed to be an indicator of the Universe's metallicity. Although the number statistics is limited, the results suggest a slightly stronger evolution in the sub-DLA range. The evolution we probe is not due to their lower dust content. Therefore these systems might be associated with a different class of objects which better trace the overall chemical evolution of the Universe. Finally, we present abundance ratios of [Si/Fe], [O/Fe], [C/Fe] and [Al/Fe], but it is difficult to decipher whether the observed values are the effect of nucleosynthesis or are due to differential dust depletion. The metallicities are compared with two different sets of models of galaxy evolution in order to provide constraints on the morphology of the absorbers [abridged].
Submitted 2 July, 2003;
originally announced July 2003.
-
Dynamical surface structures in multi-particle-correlated surface growths
Authors:
Yup Kim,
T. S. Kim,
Hyunggyu Park
Abstract:
We investigate the scaling properties of the interface fluctuation width for the $Q$-mer and $Q$-particle-correlated deposition-evaporation models. These models are constrained with a global conservation law that the particle number at each height is conserved modulo $Q$. In equilibrium, the stationary roughness is anomalous but universal with roughness exponent $α=1/3$, while the early time evolution shows nonuniversal behavior with growth exponent $β$ varying with models and $Q$. Nonequilibrium surfaces display diverse growing/stationary behavior. The $Q$-mer model shows a faceted structure, while the $Q$-particle-correlated model a macroscopically grooved structure.
Submitted 21 May, 2002;
originally announced May 2002.
-
The Dynamical Behaviors in (2+1)-Dimensional Gross-Neveu Model with a Thirring Interaction
Authors:
Tae Seong Kim,
Won-Ho Kye,
Jae Kwan Kim
Abstract:
We analyze the (2+1)-dimensional Gross-Neveu model with a Thirring interaction, where a vector-vector type four-fermi interaction is on equal terms with a scalar-scalar type one. The Dyson-Schwinger equation for the fermion self-energy function is constructed up to next-to-leading order in the 1/N expansion. We determine the critical surface which is the boundary between a broken phase and an unbroken one in ($α_c,~ β_c,~ N_c$) space. It is observed that the critical behavior is mainly controlled by the Gross-Neveu coupling $α_c$ and the region of the broken phase is separated into two parts by the line $α_c=α_c^*(=\frac{8}{π^2})$. The mass function is strongly dependent upon the flavor number N for $α> α_c^*$, but only weakly for $α< α_c^*$. For $α> α_c^*$, the critical flavor number $N_c$ increases as the Thirring coupling $β$ decreases. By deriving the CJT effective potential, we show that the broken phase is energetically preferred to the symmetric one. We discuss the gauge dependence of the mass function and the ultra-violet property of the composite operators.
Submitted 13 September, 2000;
originally announced September 1995.