
Showing 1–22 of 22 results for author: Chae, D

Searching in archive cs. Search in all archives.
  1. arXiv:2407.05238  [pdf, other]

    cs.CV

    P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds

    Authors: Jiahao Nie, Fei Xie, Sifan Zhou, Xueyi Zhou, Dong-Kyu Chae, Zhiwei He

    Abstract: 3D single object tracking (SOT) methods based on appearance matching has long suffered from insufficient appearance information incurred by incomplete, textureless and semantically deficient LiDAR point clouds. While motion paradigm exploits motion cues instead of appearance matching for tracking, it incurs complex multi-stage processing and segmentation module. In this paper, we first provide in-…

    Submitted 8 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: The source code and pre-trained models are available at https://github.com/haooozi/P2P

  2. arXiv:2407.04994  [pdf, other]

    cs.CV cs.LG

    The Solution for Language-Enhanced Image New Category Discovery

    Authors: Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

    Abstract: Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on textual labels to store visual information is insufficient for representing the diversity of visual objects. In this paper, we propose reversing the training proce…

    Submitted 6 July, 2024; originally announced July 2024.

  3. arXiv:2407.01907  [pdf, other]

    cs.CV cs.LG

    The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA

    Authors: Hailiang Zhang, Dian Chao, Zhihao Guan, Yang Yang

    Abstract: In this paper, we introduce a grounded video question-answering solution. Our research reveals that the fixed official baseline method for video question answering involves two main steps: visual grounding and object tracking. However, a significant challenge emerges during the initial step, where selected frames may lack clearly identifiable target objects. Furthermore, single images cannot addre…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2404.03528  [pdf, other]

    cs.CL cs.IR cs.LG cs.NE cs.SI

    BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering

    Authors: Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Dong-Kyu Chae

    Abstract: Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and give context-rich information, supporting efficient information retrieval and knowledge discovery; presenting information flow in a very effective manner. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehens…

    Submitted 5 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 7 pages, 3 figures. Accepted to LREC-COLING 2024. Read in ACL Anthology: https://aclanthology.org/2024.lrec-main.189/

    Journal ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  5. arXiv:2404.01104  [pdf, other]

    cs.CL

    SentiCSE: A Sentiment-aware Contrastive Sentence Embedding Framework with Sentiment-guided Textual Similarity

    Authors: Jaemin Kim, Yohan Na, Kangmin Kim, Sang Rak Lee, Dong-Kyu Chae

    Abstract: Recently, sentiment-aware pre-trained language models (PLMs) demonstrate impressive results in downstream sentiment analysis tasks. However, they neglect to evaluate the quality of their constructed sentiment representations; they just focus on improving the fine-tuning performance, which overshadows the representation quality. We argue that without guaranteeing the representation quality, their d…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 14 pages, 8 figures

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: LREC-COLING2024

  6. arXiv:2403.17639  [pdf, other]

    eess.IV cs.CV

    High-Resolution Image Translation Model Based on Grayscale Redefinition

    Authors: Xixian Wu, Dian Chao, Yang Yang

    Abstract: Image-to-image translation is a technique that focuses on transferring images from one domain to another while maintaining the essential content representations. In recent years, image-to-image translation has gained significant attention and achieved remarkable advancements due to its diverse applications in computer vision and image processing tasks. In this work, we propose an innovative method…

    Submitted 1 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  7. arXiv:2403.17342  [pdf, other]

    cs.CV cs.AI

    The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge

    Authors: Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang

    Abstract: In this paper, we propose a solution for improving the quality of captions generated for figures in papers. We adopt the approach of summarizing the textual content in the paper to generate image captions. Throughout our study, we encounter discrepancies in the OCR information provided in the official dataset. To rectify this, we employ the PaddleOCR toolkit to extract OCR information from all ima…

    Submitted 25 March, 2024; originally announced March 2024.

  8. arXiv:2403.17210  [pdf, other]

    cs.LG cs.AI cs.IR q-bio.BM q-bio.MN

    CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions

    Authors: Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Serbetar Karlo, Dong-Kyu Chae

    Abstract: Examining Drug-Drug Interactions (DDIs) is a pivotal element in the process of drug development. DDIs occur when one drug's properties are affected by the inclusion of other drugs. Detecting favorable DDIs has the potential to pave the way for creating and advancing innovative medications applicable in practical settings. However, existing DDI prediction models continue to face challenges related…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 8 Pages, 4 Figures; In review

  9. arXiv:2403.12984  [pdf, other]

    q-bio.BM cs.CL cs.IR cs.LG stat.ML

    When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings

    Authors: Azmine Toushik Wasi, Šerbetar Karlo, Raima Islam, Taki Hasan Rafi, Dong-Kyu Chae

    Abstract: Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classifi…

    Submitted 27 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 7 pages, 2 figures, 5 tables. Accepted (invited to present) to The Second Tiny Papers Track at ICLR 2024 (https://openreview.net/forum?id=VUYCyH8fCw)

    Journal ref: The Second Tiny Papers Track at ICLR 2024, Tiny Papers @ ICLR 2024, Vienna Austria, May 11, 2024

  10. arXiv:2402.16040  [pdf, other]

    cs.CL

    EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

    Authors: Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwanghyun Kim, Jeewon Yang, Seunghyun Won, Edward Choi

    Abstract: Discharge summaries in Electronic Health Records (EHRs) are crucial for clinical decision-making, but their length and complexity make information extraction challenging, especially when dealing with accumulated summaries across multiple patient admissions. Large Language Models (LLMs) show promise in addressing this challenge by efficiently analyzing vast and complex data. Existing benchmarks, ho…

    Submitted 27 June, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Under Review

  11. arXiv:2401.11204  [pdf, other]

    cs.CV

    Towards Category Unification of 3D Single Object Tracking on Point Clouds

    Authors: Jiahao Nie, Zhiwei He, Xudong Lv, Xueyi Zhou, Dong-Kyu Chae, Fei Xie

    Abstract: Category-specific models are provenly valuable methods in 3D single object tracking (SOT) regardless of Siamese or motion-centric paradigms. However, such over-specialized model designs incur redundant parameters, thus limiting the broader applicability of 3D SOT task. This paper first introduces unified models that can simultaneously track objects across all categories using a single network with…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR2024 (poster)

  12. arXiv:2312.03011  [pdf, other]

    cs.CV cs.AI

    InstructBooth: Instruction-following Personalized Text-to-Image Generation

    Authors: Daewon Chae, Nokyung Park, Jinkyu Kim, Kimin Lee

    Abstract: Personalizing text-to-image models using a limited set of images for a specific object has been explored in subject-specific image generation. However, existing methods often face challenges in aligning with text prompts due to overfitting to the limited training images. In this work, we introduce InstructBooth, a novel method designed to enhance image-text alignment in personalized text-to-image…

    Submitted 15 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  13. arXiv:2310.02692  [pdf, other]

    cs.CV cs.AI

    Clustering-based Image-Text Graph Matching for Domain Generalization

    Authors: Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim

    Abstract: Learning domain-invariant visual representations is important to train a model that can generalize well to unseen target task domains. Recent works demonstrate that text descriptions contain high-level class-discriminative information and such auxiliary semantic cues can be used as effective pivot embedding for domain generalization problem. However, they use pivot embedding in global manner (i.e.…

    Submitted 15 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  14. arXiv:2306.08402  [pdf, other]

    cs.CR

    Fairness and Privacy-Preserving in Federated Learning: A Survey

    Authors: Taki Hasan Rafi, Faiza Anan Noor, Tahmid Hussain, Dong-Kyu Chae

    Abstract: Federated learning (FL) as distributed machine learning has gained popularity as privacy-aware Machine Learning (ML) systems have emerged as a technique that prevents privacy leakage by building a global model and by conducting individualized training of decentralized edge clients on their own private data. The existing works, however, employ privacy mechanisms such as Secure Multiparty Computing…

    Submitted 14 July, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 23 pages; 2 figures

  15. arXiv:2305.01486  [pdf, other]

    cs.CV cs.AI

    ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning

    Authors: Azmine Toushik Wasi, Karlo Šerbetar, Raima Islam, Taki Hasan Rafi, Dong-Kyu Chae

    Abstract: In this paper, we introduce a framework ARBEx, a novel attentive feature extraction framework driven by Vision Transformer with reliability balancing to cope against poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We reinforce several data pre-processing and refinement methods along with a window-based cross-attention ViT to squeeze the best of the dat…

    Submitted 14 July, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 12 pages, 7 figures. Code: https://github.com/takihasan/ARBEx

  16. arXiv:2303.14787  [pdf, other]

    cs.DC

    A Generalized Look at Federated Learning: Survey and Perspectives

    Authors: Taki Hasan Rafi, Faiza Anan Noor, Tahmid Hussain, Dong-Kyu Chae, Zhaohui Yang

    Abstract: Federated learning (FL) refers to a distributed machine learning framework involving learning from several decentralized edge clients without sharing local dataset. This distributed strategy prevents data leakage and enables on-device training as it updates the global model based on the local model updates. Despite offering several advantages, including data privacy and scalability, FL poses chall…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: 9 pages, 2 figures

  17. arXiv:2209.09475  [pdf, other]

    cs.CV

    Revisiting Image Pyramid Structure for High Resolution Salient Object Detection

    Authors: Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim

    Abstract: Salient object detection (SOD) has been in the spotlight recently, yet has been studied less for high-resolution (HR) images. Unfortunately, HR images and their pixel-level annotations are certainly more labor-intensive and time-consuming compared to low-resolution (LR) images and annotations. Therefore, we propose an image pyramid-based SOD framework, Inverse Saliency Pyramid Reconstruction Netwo…

    Submitted 16 November, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: 27 pages, 15 figures, 7 tables. To appear in the 16th Asian Conference on Computer Vision (ACCV2022), December 4-8, 2022, Macau SAR, China. DOI will be added soon. Results on DIS5K are added in appendices which will not be in the published version

  18. arXiv:2204.09442  [pdf, ps, other]

    cs.CV eess.IV

    DAM-GAN : Image Inpainting using Dynamic Attention Map based on Fake Texture Detection

    Authors: Dongmin Cha, Daijin Kim

    Abstract: Deep neural advancements have recently brought remarkable image synthesis performance to the field of image inpainting. The adaptation of generative adversarial networks (GAN) in particular has accelerated significant progress in high-quality image reconstruction. However, although many notable GAN-based networks have been proposed for image inpainting, still pixel artifacts or color inconsistency…

    Submitted 20 April, 2022; originally announced April 2022.

  19. arXiv:2101.06371  [pdf, other]

    cs.LG cs.AI cs.SE

    NNStreamer: Efficient and Agile Development of On-Device AI Systems

    Authors: MyungJoo Ham, Jijoong Moon, Geunsik Lim, Jaeyun Jung, Hyoungjoo Ahn, Wook Song, Sangjung Woo, Parichay Kapoor, Dongju Chae, Gichan Jang, Yongjoo Ahn, Jihoon Lee

    Abstract: We propose NNStreamer, a software system that handles neural networks as filters of stream pipelines, applying the stream processing paradigm to deep neural network applications. A new trend with the wide-spread of deep neural network applications is on-device AI. It is to process neural networks on mobile devices or edge/IoT devices instead of cloud servers. Emerging privacy issues, data transmis…

    Submitted 15 January, 2021; originally announced January 2021.

    Comments: IEEE/ACM ICSE 2021 SEIP (preprint)

  20. arXiv:2003.09085  [pdf, other]

    cs.CV cs.LG

    Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network

    Authors: Jakaria Rabbi, Nilanjan Ray, Matthias Schubert, Subir Chowdhury, Dennis Chao

    Abstract: The detection performance of small objects in remote sensing images is not satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) shows remarkable image enhancement performance, but reconstructed images miss high-frequency edge information. Therefore, object detection pe…

    Submitted 28 April, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: This paper contains 27 pages and accepted for publication in MDPI remote sensing journal. GitHub Repository: https://github.com/Jakaria08/EESRGAN (Implementation)

  21. arXiv:1908.08151  [pdf, other]

    cs.DS cs.GR

    Multi-level Graph Drawing using Infomap Clustering

    Authors: Seok-Hee Hong, Peter Eades, Marnijati Torkel, Ziyang Wang, David Chae, Sungpack Hong, Daniel Langerenken, Hassan Chafi

    Abstract: Infomap clustering finds the community structures that minimize the expected description length of a random walk trajectory; algorithms for infomap clustering run fast in practice for large graphs. In this paper we leverage the effectiveness of Infomap clustering combined with the multi-level graph drawing paradigm. Experiments show that our new Infomap based multi-level algorithm produces good vi…

    Submitted 21 August, 2019; originally announced August 2019.

    Comments: Appears in the Proceedings of the 27th International Symposium on Graph Drawing and Network Visualization (GD 2019)

  22. arXiv:1811.00143  [pdf, other]

    cs.CV cs.DC cs.LG

    Democratizing Production-Scale Distributed Deep Learning

    Authors: Minghuang Ma, Hadi Pouransari, Daniel Chao, Saurabh Adya, Santiago Akle Serrano, Yi Qin, Dan Gimnicher, Dominic Walsh

    Abstract: The interest and demand for training deep neural networks have been experiencing rapid growth, spanning a wide range of applications in both academia and industry. However, training them distributed and at scale remains difficult due to the complex ecosystem of tools and hardware involved. One consequence is that the responsibility of orchestrating these complex components is often left to one-off…

    Submitted 3 November, 2018; v1 submitted 31 October, 2018; originally announced November 2018.