-
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance
Authors:
Younghyun Kim,
Geunmin Hwang,
Eunbyung Park
Abstract:
The recent surge in large-scale generative models has spurred the development of vast fields in computer vision. In particular, text-to-image diffusion models have garnered widespread adoption across diverse domains due to their potential for high-fidelity image generation. Nonetheless, existing large-scale diffusion models are limited to generating images of up to 1K resolution, which is far from meeting the demands of contemporary commercial applications. Directly sampling higher-resolution images often yields results marred by artifacts such as object repetition and distorted shapes. Addressing these issues typically necessitates training or fine-tuning models on higher-resolution datasets. However, this undertaking poses a formidable challenge due to the difficulty of collecting large-scale high-resolution content and the substantial computational resources required. While several preceding works have proposed alternatives, they often fail to produce convincing results. In this work, we probe the generative ability of diffusion models at resolutions beyond their original capability and propose a novel progressive approach that fully utilizes a generated low-resolution image to guide the generation of a higher-resolution image. Our method obviates the need for additional training or fine-tuning, which significantly lowers the computational cost. Extensive experiments and results validate the efficiency and efficacy of our method.
Submitted 26 June, 2024;
originally announced June 2024.
-
HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization
Authors:
Seonggon Kim,
Eunhyeok Park
Abstract:
With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which is a highly challenging task. In this study, we introduce a novel optimization strategy called Hadamard Low-rank Quantization (HLQ), focusing on reducing the cost of backpropagation in convolutional and linear layers. We first analyze the sensitivity of gradient computation with respect to activation and weight, and judiciously design the HLQ pipeline to apply 4-bit Hadamard quantization to the activation gradient and Hadamard low-rank approximation to the weight gradient. This combination was found to be the best for maximizing benefits, and our extensive experiments demonstrate the outstanding performance of HLQ in both training from scratch and fine-tuning, achieving significant memory savings and acceleration on real GPUs with negligible quality degradation.
Submitted 21 June, 2024;
originally announced June 2024.
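The HLQ abstract above mentions applying 4-bit Hadamard quantization to the activation gradient. The following is only a minimal sketch of the general idea (rotate a vector with an orthonormal Hadamard transform, then quantize uniformly to 4-bit codes), not the authors' pipeline. All function names are hypothetical, and the symmetric [-7, 7] code range and per-vector scale are assumptions made for illustration.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1]]
    while len(h) < n:
        # H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def hadamard_transform(vec):
    """Orthonormal Hadamard transform; it is its own inverse."""
    n = len(vec)
    h = hadamard(n)
    scale = n ** -0.5
    return [scale * sum(h[i][j] * vec[j] for j in range(n)) for i in range(n)]


def quantize_int4(vec):
    """Symmetric uniform quantization to integer codes in [-7, 7] (assumed scheme)."""
    max_abs = max(abs(x) for x in vec) or 1.0
    step = max_abs / 7
    codes = [max(-7, min(7, round(x / step))) for x in vec]
    return codes, step


def hadamard_quantize_roundtrip(grad):
    """Transform, quantize, dequantize, and transform back."""
    codes, step = quantize_int4(hadamard_transform(grad))
    recon = hadamard_transform([c * step for c in codes])
    return recon, codes, step
```

Because the transform is orthonormal, the reconstruction error in the original domain has the same L2 norm as the quantization error in the transformed domain, which is what makes the rotation cheap to reason about.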
-
Freq-Mip-AA: Frequency Mip Representation for Anti-Aliasing Neural Radiance Fields
Authors:
Youngin Park,
Seungtae Nam,
Cheul-hee Hahm,
Eunbyung Park
Abstract:
Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images at camera distances that differ from those of the training views. To address the issue, Mip-NeRF proposed using volumetric frustums to render a pixel and suggested integrated positional encoding (IPE). While effective, this approach requires long training times due to its reliance on an MLP architecture. In this work, we propose a novel anti-aliasing technique that utilizes grid-based representations, which usually show significantly faster training times. In addition, inspired by the sampling theorem, we exploit a frequency-domain representation to handle the aliasing problem. The proposed method, FreqMipAA, utilizes scale-specific low-pass filtering (LPF) and learnable frequency masks. Scale-specific low-pass filters prevent aliasing and prioritize important image details, and learnable masks effectively remove problematic high-frequency elements while retaining essential information. By employing a scale-specific LPF and trainable masks, FreqMipAA can effectively eliminate the aliasing factor while retaining important details. We validated the proposed technique by incorporating it into a widely used grid-based method. The experimental results show that FreqMipAA effectively resolves the aliasing issues and achieves state-of-the-art results on the multi-scale Blender dataset. Our code is available at https://github.com/yi0109/FreqMipAA.
Submitted 19 June, 2024;
originally announced June 2024.
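The FreqMipAA abstract above describes low-pass filtering in the frequency domain. As a toy illustration of that general operation only (not the authors' grid-based method, and with a fixed rather than learnable mask), the sketch below filters a 1D signal by masking its discrete Fourier transform. The function names and the cutoff value are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass_mask(n, cutoff):
    """1.0 for bins whose folded frequency index min(k, n - k) is <= cutoff, else 0.0."""
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


def filter_signal(signal, mask):
    """Multiply the spectrum by the mask and return the real part of the result."""
    spectrum = dft(signal)
    filtered = idft([s * m for s, m in zip(spectrum, mask)])
    return [x.real for x in filtered]
```

A coarser scale would correspond to a smaller cutoff in this picture; a learnable mask would replace the fixed 0/1 values with trainable weights.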
-
Global bases for Bosonic extensions of quantum unipotent coordinate rings
Authors:
Masaki Kashiwara,
Myungho Kim,
Se-jin Oh,
Euiyong Park
Abstract:
In this paper, we establish the global basis theory for the bosonic extension $\widehat{\mathcal{A}}$ associated with an arbitrary generalized Cartan matrix. When $\widehat{\mathcal{A}}$ is of simply-laced finite type, it is isomorphic to the quantum Grothendieck ring of the Hernandez-Leclerc category over a quantum affine algebra. In this case, we show that the $(t,q)$-characters of simple modules in the Hernandez-Leclerc category correspond to the normalized global basis of $\widehat{\mathcal{A}}$.
Submitted 18 June, 2024;
originally announced June 2024.
-
Frustrated phonon with charge density wave in vanadium Kagome metal
Authors:
Seung-Phil Heo,
Choongjae Won,
Heemin Lee,
Hanbyul Kim,
Eunyoung Park,
Sung Yun Lee,
Junha Hwang,
Hyeongi Choi,
Sang-Youn Park,
Byungjune Lee,
Woo-Suk Noh,
Hoyoung Jang,
Jae-Hoon Park,
Dongbin Shin,
Changyong Song
Abstract:
Crystals with unique ionic arrangements and strong electronic correlations serve as a fertile ground for the emergence of exotic phases, as evidenced by the coexistence of charge density wave (CDW) and superconductivity in vanadium Kagome metals, specifically AV3Sb5 (where A represents K, Rb, or Cs). The formation of a star-of-David CDW superstructure, resulting from the coordinated displacements of vanadium ions on a corner-sharing triangular lattice, has garnered significant attention in efforts to comprehend the influence of electron-phonon interaction within this geometrically intricate lattice. However, understanding of the underlying mechanism behind CDW formation, coupled with symmetry-protected lattice vibrations, remains elusive. In this study, we employed time-resolved X-ray scattering experiments utilising an X-ray free-electron laser. Our findings reveal that the phonon mode associated with the out-of-plane motion of Cs ions becomes frustrated in the CDW phase. Furthermore, we observed the photoinduced emergence of a metastable CDW phase, facilitated by the alleviation of frustration through nonadiabatic changes in free energy. By elucidating the longstanding puzzle surrounding the intervention of phonons in CDW ordering, this research offers fresh insights into the competition between phonons and periodic lattice distortions, a phenomenon widespread in other correlated quantum materials including layered high-Tc superconductors.
Submitted 10 June, 2024;
originally announced June 2024.
-
Unipotent quantum coordinate ring and cominuscule prefundamental representations
Authors:
Il-Seung Jang,
Jae-Hoon Kwon,
Euiyong Park
Abstract:
We continue the study of realization of the prefundamental modules $L_{r,a}^{\pm}$, introduced by Hernandez and Jimbo, in terms of unipotent quantum coordinate rings as in [J-Kwon-Park, Int. Math. Res. Not., 2023]. We show that the ordinary character of $L_{r,a}^{\pm}$ is equal to that of the unipotent quantum coordinate ring $U_q^-(w_r)$ associated to fundamental $r$-th coweight. When $r$ is cominuscule, we prove that there exists a $U_q(\mathfrak{b})$-module structure on $U_q^-(w_r)$, which is isomorphic to $L_{r,aη_r}^\pm$ for some $η_r \in \mathbb{C}^\times$.
Submitted 4 June, 2024;
originally announced June 2024.
-
Electric-Field Control of Magnetic Skyrmion Chirality in a Centrosymmetric 2D van der Waals Magnet
Authors:
Myung-Geun Han,
Joachim Dahl Thomsen,
John P. Philbin,
Junsik Mun,
Eugene Park,
Fernando Camino,
Lukáš Děkanovský,
Chuhang Liu,
Zdenek Sofer,
Prineha Narang,
Frances M. Ross,
Yimei Zhu
Abstract:
Two-dimensional van der Waals magnets hosting topological magnetic textures, such as skyrmions, show promise for applications in spintronics and quantum computing. Electrical control of these topological spin textures would enable novel devices with enhanced performance and functionality. Here, using electron microscopy combined with in situ electric and magnetic biasing, we show that the skyrmion chirality, whether left-handed or right-handed, in insulating Cr2Ge2Te6 is controlled by the direction of the external electric field applied during the magnetic field cooling process. The electric-field-tuned chirality remains stable, even amid variations in magnetic and electric fields. Our theoretical investigation reveals that nonzero Dzyaloshinskii-Moriya interactions between the nearest neighbors, induced by the external electric field, change their sign upon reversing the electric field direction, thereby facilitating chirality selection. The electrical control of magnetic chirality demonstrated in this study can be extended to other non-metallic centrosymmetric skyrmion-hosting magnets, opening avenues for future device designs in topological spintronics and quantum computing.
Submitted 2 June, 2024;
originally announced June 2024.
-
F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting
Authors:
Xiangyu Sun,
Joo Chan Lee,
Daniel Rho,
Jong Hwan Ko,
Usman Ali,
Eunbyung Park
Abstract:
The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering speeds while maintaining excellent image quality. However, as it represents objects and scenes using a myriad of Gaussians, it requires substantial storage to achieve high-quality representation. To mitigate the storage overhead, we propose Factorized 3D Gaussian Splatting (F-3DGS), a novel approach that drastically reduces storage requirements while preserving image quality. Inspired by classical matrix and tensor factorization techniques, our method represents and approximates dense clusters of Gaussians with significantly fewer Gaussians through efficient factorization. We aim to efficiently represent dense 3D Gaussians by approximating them with a limited amount of information for each axis and their combinations. This method allows us to encode a substantially large number of Gaussians along with their essential attributes -- such as color, scale, and rotation -- necessary for rendering using a relatively small number of elements. Extensive experimental results demonstrate that F-3DGS achieves a significant reduction in storage costs while maintaining comparable quality in rendered images.
Submitted 28 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
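The F-3DGS abstract above refers to representing dense Gaussians with a limited amount of information per axis and their combinations. The storage argument can be illustrated with a toy sketch: if Gaussian centers lie on a Cartesian product of per-axis coordinates, only the per-axis lists need to be stored. This is an assumption-laden illustration of factorized coordinates in general, not the authors' representation; all function names are hypothetical.

```python
from itertools import product


def expand_factorized_coordinates(xs, ys, zs):
    """Recover every 3D center as a combination of per-axis coordinates."""
    return list(product(xs, ys, zs))


def stored_floats_factorized(xs, ys, zs):
    """Numbers stored when only the per-axis coordinate lists are kept."""
    return len(xs) + len(ys) + len(zs)


def stored_floats_dense(xs, ys, zs):
    """Numbers stored when every center keeps its own (x, y, z) triple."""
    return 3 * len(xs) * len(ys) * len(zs)
```

With 3, 2, and 4 coordinates along the three axes, the factorized form stores 9 numbers while the dense form stores 72 for the same 24 centers; the gap widens as the per-axis counts grow.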
-
Parameter-Efficient Instance-Adaptive Neural Video Compression
Authors:
Hyunmo Yang,
Seungjun Oh,
Eunbyung Park
Abstract:
Learning-based Neural Video Codecs (NVCs) have emerged as a compelling alternative to the standard video codecs, demonstrating promising performance and simple, easily maintainable pipelines. However, NVCs often fall short in compression performance and occasionally exhibit poor generalization capability due to their inference-only compression scheme and their dependence on training data. Instance-adaptive video compression techniques have recently been suggested as a viable solution, fine-tuning the encoder or decoder networks for a particular test instance video. However, fine-tuning all the model parameters incurs high computational costs, increases the bitrates, and often leads to unstable training. In this work, we propose a parameter-efficient instance-adaptive video compression framework. Inspired by the remarkable success of parameter-efficient fine-tuning on large-scale neural network models, we propose to use a lightweight adapter module that can be easily attached to the pretrained NVCs and fine-tuned for test video sequences. The resulting algorithm significantly improves compression performance and reduces the encoding time compared to the existing instance-adaptive video compression algorithms. Furthermore, the suggested fine-tuning method enhances the robustness of the training process, allowing the proposed method to be widely used in many practical settings. We conducted extensive experiments on various standard benchmark datasets, including UVG, MCL-JVC, and HEVC sequences, and the experimental results show a significant improvement in rate-distortion (RD) curves (up to 5 dB PSNR improvements) and BD rates compared to the baseline NVCs. Our code is available at https://github.com/ohsngjun/PEVC.
Submitted 11 June, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
Authors:
Hyungkyu Ham,
Jeongmin Hong,
Geonwoo Park,
Yunseon Shin,
Okkyun Woo,
Wonhyuk Yang,
Jinhoon Bae,
Eunhyeok Park,
Hyojin Sung,
Euicheol Lim,
Gwangsun Kim
Abstract:
To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overcome such limitations, prior works propose application-specific HW units that are not suitable for practical CXL memory-based systems that should support various applications. On the other hand, existing CPU or GPU cores are not cost-effective for NDP because they are not optimized for memory-bound applications. In addition, the communication between the host processor and CXL controller for NDP offloading should achieve low latency, but the CXL.io (or PCIe) protocol incurs $μ$s-scale latency and is not suitable for fine-grain NDP.
To achieve high-performance NDP end-to-end, we propose a low-overhead general-purpose NDP architecture for CXL memory referred to as Memory-Mapped NDP (M$^2$NDP), which comprises memory-mapped functions (M$^2$func) and memory-mapped $μ$threading (M$^2μ$thr). The M$^2$func is a CXL.mem-compatible low-overhead communication mechanism between the host processor and NDP controller in the CXL memory. The M$^2μ$thr enables low-cost, general-purpose NDP unit design by introducing lightweight $μ$threads that support highly concurrent execution of NDP kernels with minimal resource wastage. By combining them, our M$^2$NDP achieves significant speedups for various applications, including in-memory OLAP, key-value store, large language model, recommendation model, and graph analytics by up to 128$\times$ (11.5$\times$ overall) and reduces energy by up to 87.9\% (80.1\% overall) compared to a baseline CPU or GPU host with passive CXL memory.
Submitted 30 April, 2024;
originally announced April 2024.
-
Pegasus-v1 Technical Report
Authors:
Raehyuk Jung,
Hyojun Go,
Jaehyuk Yi,
Jiho Jang,
Daniel Kim,
Jay Suh,
Aiden Lee,
Cooper Han,
Jae Lee,
Jeff Kim,
Jin-Young Kim,
Junwan Kim,
Kyle Park,
Lucas Lee,
Mars Ha,
Minjoon Seo,
Abraham Jo,
Ed Park,
Hassan Kianinejad,
SJ Kim,
Tony Moon,
Wade Jeong,
Andrei Popescu,
Esther Kim,
EK Yoon
, et al. (19 additional authors not shown)
Abstract:
This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction.
Submitted 22 April, 2024;
originally announced April 2024.
-
CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis
Authors:
Gyeongjin Kang,
Younggeun Lee,
Seungjun Oh,
Eunbyung Park
Abstract:
Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, several factors have impeded its further proliferation as next-generation 3D media. To establish a ubiquitous presence in everyday media formats, such as images and videos, it is imperative to devise a solution that effectively fulfills three key objectives: fast encoding and decoding time, compact model sizes, and high-quality renderings. Despite significant advancements, a comprehensive algorithm that adequately addresses all objectives has yet to be fully realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of a novel encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we develop a novel finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 150x and 20x reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets, such as ShapeNet and Objaverse.
Submitted 28 May, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Some remarks on the $\mathcal{K}_{p,1}$ Theorem
Authors:
Yeongrak Kim,
Hyunsuk Moon,
Euisung Park
Abstract:
Let $X$ be a non-degenerate projective irreducible variety of dimension $n \ge 1$, degree $d$, and codimension $e \ge 2$ over an algebraically closed field $\mathbb{K}$ of characteristic $0$. Let $β_{p,q} (X)$ be the $(p,q)$-th graded Betti number of $X$. M. Green proved the celebrated $\mathcal K_{p,1}$-theorem about the vanishing of $β_{p,1} (X)$ for high values of $p$ and potential examples of nonvanishing graded Betti numbers. Later, Nagel-Pitteloud and Brodmann-Schenzel classified varieties with nonvanishing $β_{e-1,1}(X)$. It is clear that $β_{e-1,1}(X) \neq 0$ when there is an $(n+1)$-dimensional variety of minimal degree containing $X$; however, this is not always the case, as seen in the example of the triple Veronese surface in $\mathbb{P}^9$. In this paper, we completely classify varieties $X$ with $β_{e-1,1}(X) \neq 0$ such that $X$ does not lie on an $(n+1)$-dimensional variety of minimal degree. They are exactly cones over smooth del Pezzo varieties whose Picard number is $\le n-1$.
Submitted 4 April, 2024;
originally announced April 2024.
-
Unleash the Potential of CLIP for Video Highlight Detection
Authors:
Donghoon Han,
Seunghyeon Seo,
Eunhwan Park,
Seong-Uk Nam,
Nojun Kwak
Abstract:
Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potential across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we achieve, to the best of our knowledge, state-of-the-art performance in the highlight detection task on the QVHighlight Benchmark.
Submitted 2 April, 2024;
originally announced April 2024.
-
Can AI Outperform Human Experts in Creating Social Media Creatives?
Authors:
Eunkyung Park,
Raymond K. Wong,
Junbum Kwon
Abstract:
Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, a topic on which little research has been conducted so far. We propose a novel Prompt-for-Prompt to generate social media creatives via prompt augmentation by Large Language Models. We take the most popular Instagram posts (those with the largest number of likes) in top brands' Instagram accounts to create social media creatives. We give GPT-4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL-E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities by adding objectives, engagement strategy, lighting, and brand consistency for social media image creation. We conduct an extensive human evaluation experiment and find that AI outperforms human experts and that Midjourney is better than the other text-to-image generators. Surprisingly, contrary to conventional wisdom in the social media industry, prompt instructions including "eye-catching" perform much worse than those including "natural". Regarding the type of creatives, AI improves creatives with animals or products but less so those with real people. Also, AI improves creatives with short text descriptions more than those with long text descriptions, because there is more room for AI to augment prompts with shorter descriptions.
Submitted 19 March, 2024;
originally announced April 2024.
-
MOGAM: A Multimodal Object-oriented Graph Attention Model for Depression Detection
Authors:
Junyeop Cha,
Seoyun Kim,
Dongjae Kim,
Eunil Park
Abstract:
Early detection plays a crucial role in the treatment of depression. Therefore, numerous studies have focused on social media platforms, where individuals express their emotions, aiming to achieve early detection of depression. However, the majority of existing approaches often rely on specific features, leading to limited scalability across different types of social media datasets, such as text, images, or videos. To overcome this limitation, we introduce a Multimodal Object-Oriented Graph Attention Model (MOGAM), which can be applied to diverse types of data, offering a more scalable and versatile solution. Furthermore, to ensure that our model can capture authentic symptoms of depression, we only include vlogs from users with a clinical diagnosis. To leverage the diverse features of vlogs, we adopt a multimodal approach and collect additional metadata such as the title, description, and duration of the vlogs. To effectively aggregate these multimodal features, we employed a cross-attention mechanism. MOGAM achieved an accuracy of 0.871 and an F1-score of 0.888. Moreover, to validate the scalability of MOGAM, we evaluated its performance with a benchmark dataset and achieved comparable results with prior studies (0.61 F1-score). In conclusion, we believe that the proposed model, MOGAM, is an effective solution for detecting depression in social media, offering potential benefits in the early detection and treatment of this mental health condition.
Submitted 21 March, 2024;
originally announced March 2024.
-
Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)
Authors:
Yimeng Fan,
Pedram Agand,
Mo Chen,
Edward J. Park,
Allison Kennedy,
Chanwoo Bae
Abstract:
The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of a ferry operating on the west coast of Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI.
Submitted 20 March, 2024;
originally announced March 2024.
-
Separable Physics-informed Neural Networks for Solving the BGK Model of the Boltzmann Equation
Authors:
Jaemin Oh,
Seung Yeon Cho,
Seok-Bae Yun,
Eunbyung Park,
Youngjoon Hong
Abstract:
In this study, we introduce a method based on Separable Physics-Informed Neural Networks (SPINNs) for effectively solving the BGK model of the Boltzmann equation. While the mesh-free nature of PINNs offers significant advantages in handling high-dimensional partial differential equations (PDEs), challenges arise when applying quadrature rules for accurate integral evaluation in the BGK operator, which can compromise the mesh-free benefit and increase computational costs. To address this, we leverage the canonical polyadic decomposition structure of SPINNs and the linear nature of moment calculation, achieving a substantial reduction in computational expense for quadrature rule application. The multi-scale nature of the particle density function poses difficulties in precisely approximating macroscopic moments using neural networks. To improve SPINN training, we introduce the integration of Gaussian functions into SPINNs, coupled with a relative loss approach. This modification enables SPINNs to decay as rapidly as Maxwellian distributions, thereby enhancing the accuracy of macroscopic moment approximations. The relative loss design further ensures that both large and small-scale features are effectively captured by the SPINNs. The efficacy of our approach is demonstrated through a series of five numerical experiments, including the solution to a challenging 3D Riemann problem. These results highlight the potential of our novel method in efficiently and accurately addressing complex challenges in computational physics.
Submitted 10 March, 2024;
originally announced March 2024.
-
Continuous Memory Representation for Anomaly Detection
Authors:
Joo Chan Lee,
Taejune Kim,
Eunbyung Park,
Simon S. Woo,
Jong Hwan Ko
Abstract:
There have been significant advancements in anomaly detection in an unsupervised manner, where only normal images are available for training. Several recent methods aim to detect anomalies based on a memory, comparing or reconstructing the input with directly stored normal features (or trained features with normal images). However, such memory-based approaches operate on a discrete feature space implemented by the nearest neighbor or attention mechanism, suffering from poor generalization or an identity shortcut issue outputting the same as input, respectively. Furthermore, the majority of existing methods are designed to detect single-class anomalies, resulting in unsatisfactory performance when presented with multiple classes of objects. To tackle all of the above challenges, we propose CRAD, a novel anomaly detection method for representing normal features within a "continuous" memory, enabled by transforming spatial features into coordinates and mapping them to continuous grids. Furthermore, we carefully design the grids tailored for anomaly detection, representing both local and global normal features and fusing them effectively. Our extensive experiments demonstrate that CRAD successfully generalizes the normal features and mitigates the identity shortcut. Furthermore, CRAD effectively handles diverse classes in a single model thanks to the high-granularity continuous representation. In an evaluation using the MVTec AD dataset, CRAD significantly outperforms the previous state-of-the-art method by reducing 65.0% of the error for multi-class unified anomaly detection. The project page is available at https://tae-mo.github.io/crad/.
Submitted 10 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Embeddings and near-neighbor searching with constant additive error for hyperbolic spaces
Authors:
Eunku Park,
Antoine Vigneron
Abstract:
We give an embedding of the Poincaré halfspace $H^D$ into a discrete metric space based on a binary tiling of $H^D$, with additive distortion $O(\log D)$. It yields the following results. We show that any subset $P$ of $n$ points in $H^D$ can be embedded into a graph-metric with $2^{O(D)}n$ vertices and edges, and with additive distortion $O(\log D)$. We also show how to construct, for any $k$, an $O(k\log D)$-purely additive spanner of $P$ with $2^{O(D)}n$ Steiner vertices and $2^{O(D)}n \cdot \lambda_k(n)$ edges, where $\lambda_k(n)$ is the $k$th-row inverse Ackermann function. Finally, we show how to construct an approximate Voronoi diagram for $P$ of size $2^{O(D)}n$. It allows us to answer approximate near-neighbor queries in $2^{O(D)}+O(\log n)$ time, with additive error $O(\log D)$. These constructions can be done in $2^{O(D)}n \log n$ time.
Submitted 1 April, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Mip-Grid: Anti-aliased Grid Representations for Neural Radiance Fields
Authors:
Seungtae Nam,
Daniel Rho,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Despite the remarkable achievements of neural radiance fields (NeRF) in representing 3D scenes and generating novel view images, the aliasing issue, rendering "jaggies" or "blurry" images at varying camera distances, remains unresolved in most existing approaches. The recently proposed mip-NeRF has addressed this challenge by rendering conical frustums instead of rays. However, it relies on an MLP architecture to represent the radiance fields, missing out on the fast training speed offered by the latest grid-based methods. In this work, we present mip-Grid, a novel approach that integrates anti-aliasing techniques into grid-based representations for radiance fields, mitigating the aliasing artifacts while enjoying fast training time. The proposed method generates multi-scale grids by applying simple convolution operations over a shared grid representation and uses the scale-aware coordinate to retrieve features at different scales from the generated multi-scale grids. To test its effectiveness, we integrated the proposed method into two recent representative grid-based methods, TensoRF and K-Planes. Experimental results demonstrate that mip-Grid greatly improves the rendering performance of both methods and even outperforms mip-NeRF on multi-scale datasets while achieving significantly faster training time. For code and demo videos, please see https://stnamjef.github.io/mipgrid.github.io/.
Submitted 21 February, 2024;
originally announced February 2024.
-
No-exclaves percolation on random networks
Authors:
Byungjoon Min,
Eun-Kyu Park,
Sang-Hwan Gwak,
K. -I. Goh
Abstract:
No-exclaves percolation (NExP) is a nonlocal percolation process in which the components are formed not only by the connected occupied nodes but also by the agglomeration of empty nodes completely surrounded by the occupied nodes. It has been studied in low dimensions, displaying such novel phenomena as the discontinuous transition to complete percolation. However, its characteristics in complex networks are still unexplored. In this paper, we study the NExP on random networks by developing mean-field solutions using the generating function formalism. Our theory allows us to determine the size of the giant no-exclaves component as well as the percolation threshold, which are in excellent agreement with Monte Carlo simulations on random networks and some real-world networks. We show that on random networks NExP exhibits three phases and two transitions between them: the phases are characterized by the presence or absence of not only the giant NExP component but also the giant unoccupied component, which is the giant connected component composed solely of unoccupied nodes. This work offers a theoretical understanding of the anatomy of phase transitions in the NExP process.
Submitted 6 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Task-Oriented Diffusion Model Compression
Authors:
Geonung Kim,
Beomsu Kim,
Eunhyeok Park,
Sunghyun Cho
Abstract:
As recent advancements in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper, we explore the compression potential of these I2I models in a task-oriented manner and introduce a novel method for reducing both model size and the number of timesteps. Through extensive experiments, we observe key insights and use our empirical knowledge to develop practical solutions that aim for near-optimal results with minimal exploration costs. We validate the effectiveness of our method by applying it to InstructPix2Pix for image editing and StableSR for image restoration. Our approach achieves satisfactory output quality with 39.2% and 56.4% reductions in model footprint and 81.4% and 68.7% decreases in latency for InstructPix2Pix and StableSR, respectively.
Submitted 30 January, 2024;
originally announced January 2024.
-
PBW theory for Bosonic extensions of quantum groups
Authors:
Se-jin Oh,
Euiyong Park
Abstract:
In this paper, we develop the PBW theory for the bosonic extension $\qbA{\g}$ of a quantum group $\mathcal{U}_q(\g)$ of \emph{any} finite type. When $\g$ belongs to the class of \emph{simply-laced type}, the algebra $\qbA{\g}$ arises from the quantum Grothendieck ring of the Hernandez-Leclerc category over quantum affine algebras of untwisted affine types. We introduce and investigate a symmetric bilinear form $\pair{\ , \ }$ on $\qbA{\g}$ which is invariant under the braid group actions $\bT_i$ on $\qbA{\g}$, and study the adjoint operators $\Ep_{i,p}$ and $\Es_{i,p}$ with respect to $\pair{\ , \ }$. It turns out that the adjoint operators $\Ep_{i,p}$ and $\Es_{i,p}$ are analogues of the $q$-derivations $e_i'$ and $\es_i$ on the negative half $\calU_q^-(\g)$ of $\calU_q(\g)$. Following this, we introduce a new family of subalgebras denoted as $\qbA{\mathfrak{g}}(\ttb)$ in $\qbA{\mathfrak{g}}$. These subalgebras are defined for any elements $\ttb$ in the positive submonoid $\bg^+$ of the (generalized) braid group $\ttB$ of $\g$. We prove that $\qbA{\mathfrak{g}}(\ttb)$ exhibits PBW root vectors and PBW bases defined by $\bT_\ii$ for any sequence $\ii$ of $\ttb$. The PBW root vectors satisfy a Levendorskii-Soibelman formula and the PBW bases are orthogonal with respect to $\pair{\ , \ }$. The algebras $\qbA{\g} (\ttb)$ can be understood as a natural extension of quantum unipotent coordinate rings.
Submitted 7 February, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
Deblurring 3D Gaussian Splatting
Authors:
Byeonghyeon Lee,
Howoong Lee,
Xiangyu Sun,
Usman Ali,
Eunbyung Park
Abstract:
Recent studies in Radiance Fields have paved the way for robust novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately, a 3D Gaussian splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to lens defocusing, object motion, and camera shake, and it inevitably interferes with clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, Deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While Deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Qualitative results are available at https://benhenryl.github.io/Deblurring-3D-Gaussian-Splatting/
Submitted 26 May, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
Sharp-NeRF: Grid-based Fast Deblurring Neural Radiance Fields Using Sharpness Prior
Authors:
Byeonghyeon Lee,
Howoong Lee,
Usman Ali,
Eunbyung Park
Abstract:
Neural Radiance Fields (NeRF) have shown remarkable performance in neural rendering-based novel view synthesis. However, NeRF suffers from severe visual quality degradation when the input images have been captured under imperfect conditions, such as poor illumination, defocus blurring, and lens aberrations. In particular, defocus blur is quite common in images captured using cameras. Although a few recent studies have proposed rendering sharp images of considerably high quality, they still face many key challenges. In particular, those methods have employed a Multi-Layer Perceptron (MLP) based NeRF, which requires tremendous computational time. To overcome these shortcomings, this paper proposes a novel technique, Sharp-NeRF -- a grid-based NeRF that renders clean and sharp images from the input blurry images within half an hour of training. To do so, we used several grid-based kernels to accurately model the sharpness/blurriness of the scene. The sharpness level of the pixels is computed to learn the spatially varying blur kernels. We have conducted experiments on the benchmarks consisting of blurry images and have evaluated full-reference and non-reference metrics. The qualitative and quantitative results have revealed that our approach renders sharp novel views with vivid colors and fine details, and it has considerably faster training time than the previous works. Our project page is available at https://benhenryl.github.io/SharpNeRF/
Submitted 1 January, 2024;
originally announced January 2024.
-
A scalable two-stage Bayesian approach accounting for exposure measurement error in environmental epidemiology
Authors:
Changwoo J. Lee,
Elaine Symanski,
Amal Rammah,
Dong Hun Kang,
Philip K. Hopke,
Eun Sug Park
Abstract:
Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, which take into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as spatial misalignment between the exposure and health outcome data. While two-stage Bayesian analyses are often regarded as a good alternative to fully Bayesian analyses when joint estimation is not feasible, there has been minimal research on how to properly propagate uncertainty from the first-stage exposure model to the second-stage health model, especially in the case of a large number of participant locations along with spatially correlated exposures. We propose a scalable two-stage Bayesian approach, called a sparse multivariate normal (sparse MVN) prior approach, based on the Vecchia approximation for assessing associations between exposure and health outcomes in environmental epidemiology. We compare its performance with existing approaches through simulation. Our sparse MVN prior approach shows comparable performance with the fully Bayesian approach, which is a gold standard but is impossible to implement in some cases. We investigate the association between source-specific exposures and pollutant (nitrogen dioxide (NO$_2$))-specific exposures and birth outcomes for 2012 in Harris County, Texas, using several approaches, including the newly developed method.
Submitted 13 January, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
Cluster algebras and monotone Lagrangian tori
Authors:
Yunhyung Cho,
Myungho Kim,
Yoosik Kim,
Euiyong Park
Abstract:
Motivated by recent developments in the construction of Newton--Okounkov bodies and toric degenerations via cluster algebras in [GHKK18, FO20], we consider a family of Newton--Okounkov polytopes of a complex smooth projective variety $X$ related by a composition of tropicalized cluster mutations. According to the work of [HK15], the toric degeneration associated with each Newton--Okounkov polytope $Δ$ in the family produces a Lagrangian torus fibration of $X$ over $Δ$. We investigate circumstances in which each Lagrangian torus fibration possesses a monotone Lagrangian torus fiber. We provide a sufficient condition, based on the data of tropical integer points and exchange matrices, for the family of constructed monotone Lagrangian tori to contain infinitely many monotone Lagrangian tori, no two of which are related by any symplectomorphisms. By employing this criterion and exploiting the correspondence between the tropical integer points and the dual canonical basis elements, we generate infinitely many distinct monotone Lagrangian tori on flag manifolds of arbitrary type except in a few cases.
Submitted 30 December, 2023;
originally announced January 2024.
-
Plant Disease Recognition Datasets in the Age of Deep Learning: Challenges and Opportunities
Authors:
Mingle Xu,
Ji Eun Park,
Jaehwan Lee,
Jucheng Yang,
Sook Yoon
Abstract:
Plant disease recognition has witnessed a significant improvement with deep learning in recent years. Although plant disease datasets are essential and many relevant datasets are publicly available, two fundamental questions exist. First, how to differentiate datasets and further choose suitable public datasets for specific applications? Second, what kinds of characteristics of datasets are desired to achieve promising performance in real-world applications? To address these questions, this study explicitly proposes an informative taxonomy to describe potential plant disease datasets. We further provide several directions for future work, such as creating challenge-oriented datasets and the ultimate objective of deploying deep learning in real-world applications with satisfactory performance. In addition, existing related public RGB image datasets are summarized. We believe that this study will contribute to making better datasets and that it will contribute beyond plant disease recognition, for example to plant species recognition. To facilitate the community, our project is public at https://github.com/xml94/PPDRD with the information of relevant public datasets.
Submitted 13 December, 2023;
originally announced December 2023.
-
FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models
Authors:
Junhyuk So,
Jungwon Lee,
Eunhyeok Park
Abstract:
The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denoising iterations misses the opportunity to update fine details, resulting in noticeable quality degradation. In our work, we introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to harness the advantages of both reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency trade-offs in various generative tasks.
Submitted 2 April, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Coordinate-Aware Modulation for Neural Fields
Authors:
Joo Chan Lee,
Daniel Rho,
Seungtae Nam,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Neural fields, mapping low-dimensional input coordinates to corresponding signals, have shown promising results in representing various signals. Numerous methodologies have been proposed, and techniques employing MLPs and grid representations have achieved substantial success. MLPs allow compact and high expressibility, yet often suffer from spectral bias and slow convergence speed. On the other hand, methods using grids are free from spectral bias and achieve fast training speed, however, at the expense of high spatial complexity. In this work, we propose a novel way for exploiting both MLPs and grid representations in neural fields. Unlike the prevalent methods that combine them sequentially (extract features from the grids first and feed them to the MLP), we inject spectral bias-free grid representations into the intermediate features in the MLP. More specifically, we suggest a Coordinate-Aware Modulation (CAM), which modulates the intermediate features using scale and shift parameters extracted from the grid representations. This can maintain the strengths of MLPs while mitigating any remaining potential biases, facilitating the rapid learning of high-frequency components. In addition, we empirically found that the feature normalizations, which have not been successful in neural field literature, proved to be effective when applied in conjunction with the proposed CAM. Experimental results demonstrate that CAM enhances the performance of neural representation and improves learning stability across a range of signals. Especially in the novel view synthesis task, we achieved state-of-the-art performance with the least number of parameters and fast training speed for dynamic scenes and the best performance under 1MB memory for static scenes. CAM also outperforms the best-performing video compression methods using neural fields by a large margin.
Submitted 25 November, 2023;
originally announced November 2023.
-
Compact 3D Gaussian Representation for Radiance Field
Authors:
Joo Chan Lee,
Daniel Rho,
Xiangyu Sun,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussian by vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25$\times$ reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/.
Submitted 15 February, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Purcell modified Doppler cooling of quantum emitters inside optical cavities
Authors:
Julian Lyne,
Nico S. Bassler,
Seong eun Park,
Guido Pupillo,
Claudiu Genes
Abstract:
Standard cavity cooling of atoms or dielectric particles is based on the action of dispersive optical forces in high-finesse cavities. We investigate here a complementary regime characterized by large cavity losses, resembling the standard Doppler cooling technique. For a single two-level emitter a modification of the cooling rate is obtained from the Purcell enhancement of spontaneous emission in the large cooperativity limit. This mechanism is aimed at cooling of quantum emitters without closed transitions, which is the case for molecular systems, where the Purcell effect can mitigate the loss of population from the cooling cycle. We extend our analytical formulation to the many particle case governed by weak individual coupling but exhibiting collective strong Purcell enhancement to a cavity mode.
Submitted 7 March, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Dynamics of the Reynolds Shear Stress in Adverse Pressure-Gradient Flows, from the Lagrangian Transport Formalism
Authors:
T. W. Lee,
J. E. Park
Abstract:
Using the Lagrangian transport analysis for the turbulence momentum, the Reynolds stress gradient can be expressed as a function of the local momentum flux and force terms. From this perspective of an observer moving at the local mean velocity, the Reynolds stress gradient represents the lateral transport of streamwise momentum, balanced by the $u'^2$ transport, pressure and viscous force terms. Data from direct numerical simulations (DNS) have been used to validate this approach for adverse pressure-gradient boundary layer flows at Clauser parameters of 1.4 and 39 (Kitsios et al., 2017), with a good degree of consistency and agreement. Minute fluctuations and attributes in the Reynolds shear stress profile are replicated through the Lagrangian momentum equation. Gradient analysis also leads to scaling at the first- and second-derivative levels, for $u'^2$, $v'^2$ and $u'v'$.
Submitted 7 November, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Fuel Consumption Prediction for a Passenger Ferry using Machine Learning and In-service Data: A Comparative Study
Authors:
Pedram Agand,
Allison Kennedy,
Trevor Harris,
Chanwoo Bae,
Mo Chen,
Edward J Park
Abstract:
As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships require accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the operational data in real-time. This paper presents models that can predict fuel consumption using in-service data collected from a passenger ship. Statistical and domain-knowledge methods were used to select the proper input variables for the models. These methods prevent over-fitting, missing data, and multicollinearity while providing practical applicability. Prediction models that were investigated include multiple linear regression (MLR), decision tree approach (DT), an artificial neural network (ANN), and ensemble methods. The best predictive performance was from a model developed using the XGBoost technique, which is a boosting ensemble approach. Our code is available on GitHub at https://github.com/pagand/model_optimze_vessel/tree/OE for future research.
Submitted 23 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
An MCTS-DRL Based Obstacle and Occlusion Avoidance Methodology in Robotic Follow-Ahead Applications
Authors:
Sahar Leisiazar,
Edward J. Park,
Angelica Lim,
Mo Chen
Abstract:
We propose a novel methodology for robotic follow-ahead applications that addresses the critical challenge of obstacle and occlusion avoidance. Our approach effectively navigates the robot while ensuring avoidance of collisions and occlusions caused by surrounding objects. To achieve this, we developed a high-level decision-making algorithm that generates short-term navigational goals for the mobile robot. Monte Carlo Tree Search is integrated with a Deep Reinforcement Learning method to enhance the performance of the decision-making process and generate more reliable navigational goals. Through extensive experimentation and analysis, we demonstrate the effectiveness and superiority of our proposed approach in comparison to the existing follow-ahead human-following robotic methods. Our code is available at https://github.com/saharLeisiazar/follow-ahead-ros.
Submitted 28 September, 2023;
originally announced September 2023.
-
Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning
Authors:
Sanghyeon Kim,
Hyunmo Yang,
Younghyun Kim,
Youngjoon Hong,
Eunbyung Park
Abstract:
The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and no additional inference latency. This paper investigates a more general form of adapter module based on the analysis that parallel and sequential adaptation branches learn novel and general features during fine-tuning, respectively. The proposed method, named Hydra, due to its multi-head computational branches, combines parallel and sequential branches to integrate capabilities, which is more expressive than existing single branch methods and enables the exploration of a broader range of optimal points in the fine-tuning process. In addition, the proposed adaptation method explicitly leverages the pre-trained weights by performing a linear combination of the pre-trained features. It allows the learned features to have better generalization performance across diverse downstream tasks. Furthermore, we perform a comprehensive analysis of the characteristics of each adaptation branch with empirical evidence. Through an extensive range of experiments, encompassing comparisons and ablation studies, we substantiate the efficiency and demonstrate the superior performance of Hydra. This comprehensive evaluation underscores the potential impact and effectiveness of Hydra in a variety of applications. Our code is available at https://github.com/extremebird/Hydra.
Submitted 13 September, 2023;
originally announced September 2023.
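As a rough illustration of the Hydra abstract above, the following sketch combines a parallel low-rank branch (applied to the input) with a sequential low-rank branch (applied to the frozen pre-trained features). The exact formulation is not given in the abstract, so the formula, the function names, and the matrix names (`w0`, `a_par`, `b_par`, `a_seq`, `b_seq`) are all assumptions made for illustration only.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def hydra_forward(x, w0, a_par, b_par, a_seq, b_seq):
    """Hypothetical two-branch adapter, assumed form:
    h = W0 x + B_par A_par x + B_seq A_seq (W0 x).
    """
    h0 = matvec(w0, x)                              # frozen pre-trained features
    parallel = matvec(b_par, matvec(a_par, x))      # low-rank branch on the input
    sequential = matvec(b_seq, matvec(a_seq, h0))   # low-rank branch on W0 x
    return [p + q + r for p, q, r in zip(h0, parallel, sequential)]


# Tiny example: 2-d input and output, rank-1 branches (all values made up).
w0 = [[1.0, 0.0], [0.0, 2.0]]
x = [1.0, 1.0]
out = hydra_forward(
    x, w0,
    a_par=[[1.0, 1.0]], b_par=[[0.5], [0.0]],
    a_seq=[[1.0, 0.0]], b_seq=[[0.0], [1.0]],
)
print(out)
```

If both up-projections (`b_par`, `b_seq`) are zero, the output reduces to the pre-trained features `W0 x`, which is the usual starting point for low-rank adapters.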
-
Localizations for quiver Hecke algebras III
Authors:
Masaki Kashiwara,
Myungho Kim,
Se-jin Oh,
Euiyong Park
Abstract:
Let $R$ be a quiver Hecke algebra, and let $\mathcal{C}_{w,v}$ be the category of finite-dimensional graded $R$-modules categorifying a $q$-deformation of the doubly-invariant algebra $^{N'(w)} \mathbb{C}[N] ^{N(v)}$. In this paper, we prove that the localization $\tilde{\mathcal{C}}_{w,v}$ of the category $\mathcal{C}_{w,v}$ can be obtained as the localization by right braiders arising from determinantial modules. As an application, we show several interesting properties of the localized category $\tilde{\mathcal{C}}_{w,v}$, including the right rigidity.
Submitted 17 August, 2023;
originally announced August 2023.
-
Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers
Authors:
Jaeyoung Kim,
Kyuheon Jung,
Dongbin Na,
Sion Jang,
Eunbin Park,
Sungchul Choi
Abstract:
For real-world language applications, detecting an out-of-distribution (OOD) sample is helpful to alert users or reject such unreliable samples. However, modern over-parameterized language models often produce overconfident predictions for both in-distribution (ID) and OOD samples. In particular, language models suffer from OOD samples with a similar semantic representation to ID samples since these OOD samples lie near the ID manifold. A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples, but explicitly collecting auxiliary OOD datasets brings an additional burden for data collection. In this paper, we propose a simple but effective method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes. The surrogate OOD sample introduced by POE shows a similar representation to ID data, which is most effective in training a rejection network. Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers. A comprehensive comparison with state-of-the-art algorithms demonstrates POE's competitiveness on several text classification benchmarks.
Submitted 19 July, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Principal Component Analysis and Hidden Markov Model for Forecasting Stock Returns
Authors:
Eugene W. Park
Abstract:
This paper presents a method for predicting stock returns using principal component analysis (PCA) and the hidden Markov model (HMM) and tests the results of trading stocks based on this approach. Principal component analysis is applied to the covariance matrix of stock returns for companies listed in the S&P 500 index, and, interpreting principal components as factor returns, we apply the HMM to them. Then we use the transition probability matrix and state conditional means to forecast the factor returns. Reverting the factor return forecasts to stock returns using eigenvectors, we obtain forecasts for the stock returns. We find that, with the right hyperparameters, our model yields a strategy that outperforms the buy-and-hold strategy in terms of the annualized Sharpe ratio.
Submitted 1 July, 2023;
originally announced July 2023.
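The forecasting step described in the PCA and HMM abstract above can be sketched as follows. This is a minimal illustration, assuming the HMM has already been fitted: the one-step factor forecast is the state-conditional mean weighted by the next-step state probabilities, and stock forecasts are obtained by mapping factor forecasts back through the eigenvectors. The function names and all numbers are hypothetical, and the PCA and HMM fitting steps are omitted.

```python
def hmm_factor_forecast(state_probs, transition, state_means):
    """One-step-ahead factor return: E[f] = (p A) . mu."""
    n = len(state_probs)
    next_probs = [
        sum(state_probs[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]
    return sum(p * m for p, m in zip(next_probs, state_means))


def revert_to_stock_returns(eigenvectors, factor_forecasts):
    """Map factor forecasts back to stocks.

    eigenvectors[k][s] is the loading of stock s on principal component k.
    """
    n_stocks = len(eigenvectors[0])
    return [
        sum(eigenvectors[k][s] * factor_forecasts[k]
            for k in range(len(factor_forecasts)))
        for s in range(n_stocks)
    ]


# Made-up two-state HMM for a single factor, and two stocks.
factor = hmm_factor_forecast(
    state_probs=[1.0, 0.0],
    transition=[[0.9, 0.1], [0.2, 0.8]],
    state_means=[0.01, -0.02],
)
stocks = revert_to_stock_returns([[0.6, 0.8]], [factor])
print(factor, stocks)
```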
-
Separable Physics-Informed Neural Networks
Authors:
Junwoo Cho,
Seungtae Nam,
Hyunmo Yang,
Seok-Bae Yun,
Youngjoon Hong,
Eunbyung Park
Abstract:
Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but it is severely limited due to the expensive computational costs and heavy memory overhead. To overcome this issue, we propose a network architecture and training algorithm for PINNs. The proposed method, separable PINN (SPINN), operates on a per-axis basis to significantly reduce the number of network propagations in multi-dimensional PDEs unlike point-wise processing in conventional PINNs. We also propose using forward-mode automatic differentiation to reduce the computational cost of computing PDE residuals, enabling a large number of collocation points (>10^7) on a single commodity GPU. The experimental results show drastically reduced computational costs (62x in wall-clock time, 1,394x in FLOPs given the same number of collocation points) in multi-dimensional PDEs while achieving better accuracy. Furthermore, we present that SPINN can solve a chaotic (2+1)-d Navier-Stokes equation significantly faster than the best-performing prior method (9 minutes vs 10 hours in a single GPU), maintaining accuracy. Finally, we showcase that SPINN can accurately obtain the solution of a highly nonlinear and multi-dimensional PDE, a (3+1)-d Navier-Stokes equation. For visualized results and code, please see https://jwcho5576.github.io/spinn.github.io/.
Submitted 31 October, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
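The per-axis idea in the SPINN abstract above can be illustrated with a small sketch. It assumes a separable form $u(x, y) \approx \sum_r f_r(x)\, g_r(y)$, so that a grid of $N \times N$ points needs only $2N$ per-axis evaluations instead of $N^2$ point-wise ones. The function `axis_net` is a hypothetical stand-in for a per-axis network (fixed sine features here), and `RANK` is an arbitrary choice, not a value from the paper.

```python
import math

RANK = 3  # number of separable components (arbitrary for this sketch)


def axis_net(coord, axis):
    """Hypothetical per-axis 'network': one coordinate -> RANK features."""
    return [math.sin((r + 1) * coord + axis) for r in range(RANK)]


def separable_grid(xs, ys):
    """Evaluate u on the full grid from per-axis features only."""
    fx = [axis_net(x, 0) for x in xs]   # len(xs) evaluations
    fy = [axis_net(y, 1) for y in ys]   # len(ys) evaluations
    grid = [[sum(a * b for a, b in zip(f, g)) for g in fy] for f in fx]
    return grid, len(xs) + len(ys)


def pointwise(x, y):
    """Reference: evaluate the same function one grid point at a time."""
    return sum(a * b for a, b in zip(axis_net(x, 0), axis_net(y, 1)))


xs = [0.0, 0.25, 0.5, 0.75]
ys = [0.0, 0.25, 0.5, 0.75]
grid, n_evals = separable_grid(xs, ys)
print(n_evals, "per-axis evaluations for", len(xs) * len(ys), "grid points")
```

The gap widens with dimension: for $d$ axes and $N$ points per axis, the separable evaluation uses $dN$ per-axis passes for $N^d$ collocation points.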
-
Laurent family of simple modules over quiver Hecke algebra
Authors:
Masaki Kashiwara,
Myungho Kim,
Se-jin Oh,
Euiyong Park
Abstract:
We introduce the notions of quasi-Laurent and Laurent families of simple modules over quiver Hecke algebras of arbitrary symmetrizable types. We prove that such a family plays a role similar to that of a cluster in the quantum cluster algebra theory and exhibits a quantum Laurent positivity phenomenon for the basis of the quantum unipotent coordinate ring $\mathcal{A}_q(\mathfrak{n}(w))$, coming from the categorification. Then we show that the families of simple modules categorifying GLS-clusters are Laurent families by using the PBW-decomposition vector of a simple module $X$ and categorical interpretation of (co-)degree of $[X]$. As applications of such $\mathbb{Z}$-vectors, we define several skew symmetric pairings on arbitrary pairs of simple modules, and investigate the relationships among the pairings and $\Lambda$-invariants of R-matrices in the quiver Hecke algebra theory.
Submitted 26 June, 2023;
originally announced June 2023.
-
Temporal Dynamic Quantization for Diffusion Models
Authors:
Junhyuk So,
Jungwon Lee,
Daehyun Ahn,
Hyungjun Kim,
Eunhyeok Park
Abstract:
The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even in 8-bit precision due to the diffusion model's unique property of temporal variation in activation. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
Submitted 11 December, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
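To make the idea of a time-step-dependent quantization interval in the abstract above concrete, here is a minimal sketch. It is not the paper's method: the abstract does not say how intervals are produced, so this example assumes a table of per-time-step intervals derived from hypothetical calibration statistics (`max_abs_per_step`). Looking up a precomputed interval at inference costs nothing beyond static quantization, which is the property the abstract emphasizes.

```python
def make_interval_table(max_abs_per_step, bits=8):
    """One quantization interval (step size) per diffusion time step."""
    qmax = 2 ** (bits - 1) - 1
    return [m / qmax for m in max_abs_per_step]


def quantize(value, interval, bits=8):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    q = round(value / interval)
    q = max(-qmax - 1, min(qmax, q))  # clamp to the signed integer range
    return q * interval


def quantize_activations(acts, t, table, bits=8):
    """Quantize activations with the interval assigned to time step t."""
    return [quantize(a, table[t], bits) for a in acts]


# Made-up calibration: activations are ten times larger at step 1.
table = make_interval_table([127.0, 1270.0])
early = quantize_activations([3.4, -200.0], 0, table)
late = quantize_activations([3.4, -200.0], 1, table)
print(early, late)
```

With a single shared interval, either the small-range step loses resolution or the large-range step clips; the per-step table avoids both in this toy case.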
-
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Authors:
Changhun Lee,
Jungyu Jin,
Taesu Kim,
Hyungjun Kim,
Eunhyeok Park
Abstract:
Large language models (LLMs) with hundreds of billions of parameters require powerful server-grade GPUs for inference, limiting their practical deployment. To address this challenge, we introduce the outlier-aware weight quantization (OWQ) method, which aims to minimize LLM's footprint through low-precision representation. OWQ prioritizes a small subset of structured weights sensitive to quantization, storing them in high-precision, while applying highly tuned quantization to the remaining dense weights. This sensitivity-aware mixed-precision scheme reduces the quantization error notably, and extensive experiments demonstrate that 3.1-bit models using OWQ perform comparably to 4-bit models optimized by OPTQ. Furthermore, OWQ incorporates a parameter-efficient fine-tuning for task-specific adaptation, called weak column tuning (WCT), enabling accurate task-specific LLM adaptation with minimal memory overhead in the optimized format. OWQ represents a notable advancement in the flexibility, efficiency, and practicality of LLM optimization literature. The source code is available at https://github.com/xvyaward/owq
Submitted 23 January, 2024; v1 submitted 4 June, 2023;
originally announced June 2023.
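The mixed-precision scheme in the OWQ abstract above (a few sensitive weight columns kept in high precision, the rest quantized to low precision) can be sketched roughly as below. How OWQ scores sensitivity is not stated in the abstract, so the sensitivity scores are taken as an input here; the function names and the simple min-max uniform quantizer are assumptions for illustration, not the paper's tuned quantizer.

```python
def quantize_column(col, bits):
    """Min-max uniform quantization of one weight column (dequantized)."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return list(col)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in col]


def mixed_precision_quantize(weight_cols, sensitivity, n_keep, bits=3):
    """Keep the n_keep most sensitive columns as-is; quantize the rest."""
    order = sorted(range(len(weight_cols)),
                   key=lambda j: sensitivity[j], reverse=True)
    keep = set(order[:n_keep])
    quantized = [
        list(col) if j in keep else quantize_column(col, bits)
        for j, col in enumerate(weight_cols)
    ]
    return quantized, keep


# Made-up weights: column 1 is marked as the sensitive (outlier) column.
cols = [[0.0, 1.0, 0.4], [5.0, -3.0, 0.1]]
result, kept = mixed_precision_quantize(cols, sensitivity=[0.1, 9.0],
                                        n_keep=1, bits=1)
print(result, kept)
```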
-
A Framework for Ductility in Metallic Glasses
Authors:
Sungwoo Sohn,
Naijia Liu,
Geun Hee Yoo,
Aya Ochiai,
Jade Chen,
Callie Levitt,
Guannan Liu,
Samuel Charles Schroers,
Ethen Lund,
Eun Soo Park,
Jan Schroers
Abstract:
The understanding and quantification of ductility in crystalline metals, which has led to their widespread and effective usage as a structural material, is lacking in metallic glasses (MGs). Here, we introduce such a framework for ductility. This very practical framework is based on an MG's ability to support stable shear band growth, quantified in a stress gradient, gradSDB, which we measure and calculate for a range of MGs. Whether an MG behaves ductile or brittle in an application is determined by the comparison between gradsDB and the applied stress field gradient, gradsapp. If gradsDB > gradsapp, the MG will behave brittle; if gradsDB < gradsapp, the MG will behave ductile, and gradsapp - gradsDB indicates how ductile. This framework can explain observed plastic properties of MGs and their apparent contradicting brittle and ductile characteristics. Looking forward, the proposed framework provides the constitutive relation to quantitatively model their plastic behavior in any application, a requirement to use MGs as structural materials.
Submitted 15 April, 2023;
originally announced April 2023.
-
SMPConv: Self-moving Point Representations for Continuous Convolution
Authors:
Sanghyeon Kim,
Eunbyung Park
Abstract:
Continuous convolution has recently gained prominence due to its ability to handle irregularly sampled data and model long-term dependency. Also, the promising experimental results of using large convolutional kernels have catalyzed the development of continuous convolution since they can construct large kernels very efficiently. Leveraging neural networks, more specifically multilayer perceptrons (MLPs), is by far the most prevalent approach to implementing continuous convolution. However, there are a few drawbacks, such as high computational costs, complex hyperparameter tuning, and limited descriptive power of filters. This paper suggests an alternative approach to building a continuous convolution without neural networks, resulting in better computational efficiency and improved performance. We present self-moving point representations where weight parameters freely move, and interpolation schemes are used to implement continuous functions. When applied to construct convolutional kernels, the experimental results have shown improved performance with drop-in replacement in the existing frameworks. Due to its lightweight structure, we are the first to demonstrate the effectiveness of continuous convolution in a large-scale setting, e.g., ImageNet, presenting the improvements over the prior art. Our code is available at https://github.com/sangnekim/SMPConv
Submitted 5 April, 2023;
originally announced April 2023.
-
Affinizations, R-matrices and reflection functors
Authors:
Masaki Kashiwara,
Myungho Kim,
Se-jin Oh,
Euiyong Park
Abstract:
In this paper we establish affinizations and R-matrices in the language of pro-objects, and as an application, we construct reflection functors over the localizations of quiver Hecke algebras of arbitrary finite types. This reflection functor categorifies the braid group action on the half of a quantum group and the Saito reflection.
Submitted 2 February, 2024; v1 submitted 1 April, 2023;
originally announced April 2023.
-
Enhancing Breast Cancer Risk Prediction by Incorporating Prior Images
Authors:
Hyeonsoo Lee,
Junha Kim,
Eunkyung Park,
Minjeong Kim,
Taesoo Kim,
Thijs Kooi
Abstract:
Recently, deep learning models have shown the potential to predict breast cancer risk and enable targeted screening strategies, but current models do not consider the change in the breast over time. In this paper, we present a new method, PRIME+, for breast cancer risk prediction that leverages prior mammograms using a transformer decoder, outperforming a state-of-the-art risk prediction method that only uses mammograms from a single time point. We validate our approach on a dataset with 16,113 exams and further demonstrate that it effectively captures patterns of changes from prior mammograms, such as changes in breast density, resulting in improved short-term and long-term breast cancer risk prediction. Experimental results show that our model achieves a statistically significant improvement in performance over the state-of-the-art based model, with a C-index increase from 0.68 to 0.73 (p < 0.05) on held-out test sets.
Submitted 28 August, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Estimating marginal treatment effects from observational studies and indirect treatment comparisons: When are standardization-based methods preferable to those based on propensity score weighting?
Authors:
Harlan Campbell,
Julie E Park,
Jeroen P Jansen,
Shannon Cope
Abstract:
In light of newly developed standardization methods, we evaluate, via simulation study, how propensity score weighting and standardization-based approaches compare for obtaining estimates of the marginal odds ratio and the marginal hazard ratio. Specifically, we consider how the two approaches compare in two different scenarios: (1) in a single observational study, and (2) in an anchored indirect treatment comparison (ITC) of randomized controlled trials. We present the material in such a way that the matching-adjusted indirect comparison (MAIC) and the (novel) simulated treatment comparison (STC) methods in the ITC setting may be viewed as analogous to the propensity score weighting and standardization methods in the single observational study setting. Our results suggest that current recommendations for conducting ITCs can be improved and underscore the importance of adjusting for purely prognostic factors.
Submitted 7 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos
Authors:
Joo Chan Lee,
Daniel Rho,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability of representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos inspired by the standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using the group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
Submitted 6 August, 2023; v1 submitted 23 December, 2022;
originally announced December 2022.