Showing 1–19 of 19 results for author: Park, D H

Searching in archive cs.
  1. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  2. arXiv:2312.08472  [pdf, other]

    cs.NE cs.LG math.NA

    AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions

    Authors: Esteban Real, Yao Chen, Mirko Rossini, Connal de Souza, Manav Garg, Akhil Verghese, Moritz Firsching, Quoc V. Le, Ekin Dogus Cubuk, David H. Park

    Abstract: Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on few limited precision types, su…

    Submitted 13 December, 2023; originally announced December 2023.

    ACM Class: I.2.2; I.2.6; G.1.2

  3. arXiv:2305.14334  [pdf, other]

    cs.CV

    Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

    Authors: Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell

    Abstract: Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread not only over layers of the network, but also over diffusion timesteps, making it challenging to extract useful descriptors. We propose Diffusion Hype…

    Submitted 1 April, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  4. arXiv:2212.00210  [pdf, other]

    cs.CV cs.AI cs.LG

    Shape-Guided Diffusion with Inside-Outside Attention

    Authors: Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell

    Abstract: We introduce precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion. Our training-free method uses an Inside-Outside Attention mechanism during the inversion and generation process to apply a shape constraint to the cross- and self-attention maps. Our mechanism designates which spatial region is the object (inside) vs. backgro…

    Submitted 1 April, 2024; v1 submitted 30 November, 2022; originally announced December 2022.

    Comments: WACV 2024

  5. arXiv:2112.05744  [pdf, other]

    cs.CV cs.GR

    More Control for Free! Image Synthesis with Semantic Diffusion Guidance

    Authors: Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

    Abstract: Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We investigate fine-grained, continuous control of this m…

    Submitted 5 December, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: WACV 2023. Project page https://xh-liu.github.io/sdg/

  6. arXiv:2110.15797  [pdf, other]

    cs.CL cs.AI cs.LG

    Discovering Non-monotonic Autoregressive Orderings with Variational Inference

    Authors: Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao

    Abstract: The predominant approach for language modeling is to process sequences from left to right, but this eliminates a source of information: the order by which the sequence was generated. One strategy to recover this information is to decode both the content and ordering of tokens. Existing approaches supervise content and ordering by designing problem-specific loss functions and pre-training with an o…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: updated from ICLR 2021, first two authors contributed equally

  7. arXiv:2108.05887  [pdf, other]

    cs.CV cs.AI cs.LG

    Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations

    Authors: Josh Beal, Hao-Yu Wu, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

    Abstract: Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems have been relatively unexplored. We consider the case of a popular visual discovery product, where these representations are trained with multi-task learning, from use-case specifi…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: Accepted by WACV 2022

  8. arXiv:2012.09958  [pdf, other]

    cs.CV cs.AI cs.LG

    Toward Transformer-Based Object Detection

    Authors: Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

    Abstract: Transformers have become the dominant model in natural language processing, owing to their ability to pretrain on massive amounts of data, then transfer to smaller, more specific tasks via fine-tuning. The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that as compared to convolutional networks, transformer-based architec…

    Submitted 17 December, 2020; originally announced December 2020.

  9. arXiv:1909.10616  [pdf, other]

    cs.LG cs.NE stat.ML

    Compiler-Level Matrix Multiplication Optimization for Deep Learning

    Authors: Huaqing Zhang, Xiaolin Cheng, Hui Zang, Dae Hoon Park

    Abstract: An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level optimization of GEMM has significant performance impact on training and executing deep learning models. However, most deep learning frameworks rely on hardware-specific o…

    Submitted 23 September, 2019; originally announced September 2019.

  10. arXiv:1908.01707  [pdf, other]

    cs.CV

    Learning a Unified Embedding for Visual Search at Pinterest

    Authors: Andrew Zhai, Hao-Yu Wu, Eric Tzeng, Dong Huk Park, Charles Rosenberg

    Abstract: At Pinterest, we utilize image embeddings throughout our search and recommendation systems to help our users navigate through visual content by powering experiences like browsing of related content and searching for exact products for shopping. In this work we describe a multi-task deep metric learning system to learn a single unified image embedding which can be used to power our multiple visual…

    Submitted 5 August, 2019; originally announced August 2019.

    Comments: in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019

  11. arXiv:1901.02527  [pdf, other]

    cs.CV cs.AI

    Robust Change Captioning

    Authors: Dong Huk Park, Trevor Darrell, Anna Rohrbach

    Abstract: Describing what has changed in a scene can be useful to a user, but only if generated text focuses on what is semantically relevant. It is thus important to distinguish distractors (e.g. a viewpoint change) from relevant changes (e.g. an object has moved). We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning. Our model learns to distinguish distractors from se…

    Submitted 16 April, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

  12. arXiv:1811.08056  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Gradient-Coherent Strong Regularization for Deep Neural Networks

    Authors: Dae Hoon Park, Chiu Man Ho, Yi Chang, Huaqing Zhang

    Abstract: Regularization plays an important role in generalization of deep neural networks, which are often prone to overfitting with their numerous parameters. L1 and L2 regularizers are common regularization tools in machine learning with their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization with stochastic gradient descent on deep neural networks easily fail…

    Submitted 17 October, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

  13. Adversarial Sampling and Training for Semi-Supervised Information Retrieval

    Authors: Dae Hoon Park, Yi Chang

    Abstract: Ad-hoc retrieval models with implicit feedback often have problems, e.g., the imbalanced classes in the data set. Too few clicked documents may hurt generalization ability of the models, whereas too many non-clicked documents may harm effectiveness of the models and efficiency of training. In addition, recent neural network-based models are vulnerable to adversarial examples due to the linear natu…

    Submitted 17 October, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: Published in WWW 2019

  14. arXiv:1810.08322  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Sequenced-Replacement Sampling for Deep Learning

    Authors: Chiu Man Ho, Dae Hoon Park, Wei Yang, Yi Chang

    Abstract: We propose sequenced-replacement sampling (SRS) for training deep neural networks. The basic idea is to assign a fixed sequence index to each sample in the dataset. Once a mini-batch is randomly drawn in each training iteration, we refill the original dataset by successively adding samples according to their sequence index. Thus we carry out replacement sampling but in a batched and sequenced way.…

    Submitted 18 October, 2018; originally announced October 2018.

  15. arXiv:1803.04042  [pdf, other]

    cs.LG stat.ML

    Interpreting Deep Classifier by Visual Distillation of Dark Knowledge

    Authors: Kai Xu, Dae Hoon Park, Chang Yi, Charles Sutton

    Abstract: Interpreting black box classifiers, such as deep networks, allows an analyst to validate a classifier before it is deployed in a high-stakes setting. A natural idea is to visualize the deep network's representations, so as to "see what the network sees". In this paper, we demonstrate that standard dimension reduction methods in this setting can yield uninformative or even misleading visualizations…

    Submitted 11 March, 2018; originally announced March 2018.

  16. arXiv:1802.08129  [pdf, other]

    cs.AI cs.CL cs.CV

    Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: arXiv admin note: text overlap with arXiv:1612.04757

  17. arXiv:1711.07373  [pdf, other]

    cs.CV

    Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. On the other hand, their opaqueness has led to a surge of interest in explainable systems. In this work, we emphasize the importance of model explanation in various forms such as visual pointing and textual justification. The lack of data with justification annotati…

    Submitted 17 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: text overlap with arXiv:1612.04757

  18. arXiv:1612.04757  [pdf, other]

    cs.CV cs.AI cs.CL

    Attentive Explanations: Justifying Decisions and Pointing to the Evidence

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models are the de facto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models can d…

    Submitted 25 July, 2017; v1 submitted 14 December, 2016; originally announced December 2016.

  19. arXiv:1606.01847  [pdf, other]

    cs.CV cs.AI cs.CL

    Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

    Authors: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach

    Abstract: Modeling textual or visual information with vector representations trained from large language or visual datasets has been successfully explored in recent years. However, tasks such as visual question answering require combining these vector representations with each other. Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual repr…

    Submitted 23 September, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

    Comments: Accepted to EMNLP 2016