
Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

Authors: Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-wha Kim, Seungryong Kim

Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into a 3D representation, has recently brought significant advancements in the text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency and therefore geometry awareness into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution, geometry-based gradient warping for identifying correspondences between predicted gradients of different viewpoints, and a novel gradient consistency loss to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in the text-to-3D generation task with minimal computation cost and being compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16042

Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data augmentation; however, they rely on human poses already present in the training dataset, failing to effectively reduce the human pose bias in the dataset. We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment a training dataset that enables existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. Using the SMPL model, we simultaneously capture both the desired human poses and camera viewpoints, enabling realistic human rendering. The depth information provided by the SMPL model indirectly conveys the camera viewpoints. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate realistic images with diverse human poses and camera viewpoints. Qualitative results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches. The performance gains achieved by training Re-ID models on our offline augmented dataset highlight the potential of our proposed framework in improving the scalability and generalizability of person Re-ID models.

Submitted 23 June, 2024; originally announced June 2024.

Comments: The project page is available at https://ku-cvlab.github.io/Diff-ID/

arXiv:2402.02972

Retrieval-Augmented Score Distillation for Text-to-3D Generation

Authors: Junyoung Seo, Susung Hong, Wooseok Jang, Inès Hyeonsu Kim, Minseop Kwak, Doyup Lee, Seungryong Kim

Abstract: Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to the inconsistency of 3D geometry. Recently, since large-scale multi-view datasets have been released, fine-tuning the diffusion model on the multi-view datasets has become a mainstream approach to solving the 3D inconsistency problem. However, it has been confronted with fundamental difficulties regarding the limited quality and diversity of 3D data, compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both expressiveness of 2D diffusion models and geometric consistency of 3D assets can be fully leveraged by employing the semantically relevant assets directly within the optimization process. To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model's 2D prior toward view consistency, achieving drastic improvements in both geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. Project page is available at https://ku-cvlab.github.io/ReDream/.

Submitted 2 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted to ICML 2024 / Project Page: https://ku-cvlab.github.io/ReDream/

arXiv:1906.00476

Noise reduction using past causal cones in variational quantum algorithms

Authors: Omar Shehab, Isaac H. Kim, Nhung H. Nguyen, Kevin Landsman, Cinthia H. Alderete, Daiwei Zhu, C. Monroe, Norbert M. Linke

Abstract: We introduce an approach to improve the accuracy and reduce the sample complexity of near term quantum-classical algorithms. We construct a simpler initial parameterized quantum state, or ansatz, based on the past causal cone of each observable, generally yielding fewer qubits and gates. We implement this protocol on a trapped ion quantum computer and demonstrate improvement in accuracy and time-to-solution at an arbitrary point in the variational search space. We report a $\sim 27\%$ improvement in the accuracy of the calculation of the deuteron binding energy and $\sim 40\%$ improvement in the accuracy of the quantum approximate optimization of the MAXCUT problem applied to the dragon graph $T_{3,2}$. When the time-to-solution is prioritized over accuracy, the former requires $\sim 71\%$ fewer measurements and the latter requires $\sim 78\%$ fewer measurements.

Submitted 12 June, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

Comments: Added data availability statement, additional affiliation and grant acknowledgement

MSC Class: 68Q12; 81P68; 81P45

arXiv:0802.3253

On the Capacity and Design of Limited Feedback Multiuser MIMO Uplinks

Authors: Il Han Kim, David J. Love

Abstract: The theory of multiple-input multiple-output (MIMO) technology has been well-developed to increase fading channel capacity over single-input single-output (SISO) systems. This capacity gain can often be leveraged by utilizing channel state information at the transmitter and the receiver. Users make use of this channel state information for transmit signal adaptation. In this correspondence, we derive the capacity region for the MIMO multiple access channel (MIMO MAC) when partial channel state information is available at the transmitters, where we assume a synchronous MIMO multiuser uplink. The partial channel state information feedback has a cardinality constraint and is fed back from the basestation to the users using a limited rate feedback channel. Using this feedback information, we propose a finite codebook design method to maximize sum-rate. In this correspondence, the codebook is a set of transmit signal covariance matrices. We also derive the capacity region and codebook design methods in the case that the covariance matrix is rank-one (i.e., beamforming). This is motivated by the fact that beamforming is optimal in certain conditions. The simulation results show that when the number of feedback bits increases, the capacity also increases. Even with a small number of feedback bits, the performance of the proposed system is close to an optimal solution with the full feedback.

Submitted 21 February, 2008; originally announced February 2008.

Comments: 25 pages, submitted to the IEEE Transactions on Information Theory

ACM Class: H.1.1

Showing 1–5 of 5 results for author: Kim, I H