Search | arXiv e-print repository

DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models for TTS. In this work, we present an efficient and scalable Diffusion Transformer (DiT) that utilizes off-the-shelf pre-trained text and speech encoders. Our approach addresses the challenge of text-speech alignment via cross-attention mechanisms with the prediction of the total length of speech representations. To achieve this, we enhance the DiT architecture to suit TTS and improve the alignment by incorporating semantic guidance into the latent space of speech. We scale the training dataset and the model size to 82K hours and 790M parameters, respectively. Our extensive experiments demonstrate that the large-scale diffusion model for TTS without domain-specific modeling not only simplifies the training pipeline but also yields superior or comparable zero-shot performance to state-of-the-art TTS models in terms of naturalness, intelligibility, and speaker similarity. Our speech samples are available at https://ditto-tts.github.io.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2309.11988

Relaxed Conditions for Parameterized Linear Matrix Inequality in the Form of Nested Fuzzy Summations

Authors: Do Wan Kim, Donghwan Lee

Abstract: The aim of this study is to investigate less conservative conditions for parameterized linear matrix inequalities (PLMIs) that are formulated as nested fuzzy summations. Such PLMIs are commonly encountered in stability analysis and control design problems for Takagi-Sugeno (T-S) fuzzy systems. Utilizing the weighted inequality of arithmetic and geometric means (AM-GM inequality), we develop new, less conservative linear matrix inequalities for the PLMIs. This methodology enables us to efficiently handle the product of membership functions that have intersecting indices. Through empirical case studies, we demonstrate that our proposed conditions produce less conservative results compared to existing approaches in the literature.

Submitted 18 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: This work has been submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems for possible publication

arXiv:2309.06841

On the Local Quadratic Stability of T-S Fuzzy Systems in the Vicinity of the Origin

Authors: Donghwan Lee, Do Wan Kim

Abstract: The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying nonlinear system in the vicinity of the origin. As a result, the proposed conditions are proved to be less conservative compared to existing methods using fuzzy Lyapunov functions in the literature. Moreover, we establish that the proposed methods offer necessary and sufficient conditions for the local exponential stability of T-S fuzzy systems. The paper also includes discussions on the inherent limitations associated with fuzzy Lyapunov approaches. To demonstrate the theoretical results, we provide comprehensive examples that elucidate the core concepts and validate the efficacy of the proposed conditions.

Submitted 13 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2307.16706

Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

Authors: Donghwan Lee, Han-Dong Lim, Do Wan Kim

Abstract: The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards, lacking insights into the rewards of other agents. Moreover, each agent has the ability to share its parameters with neighboring agents through a communication network, represented by a graph. We first introduce a novel distributed DP, inspired by the distributed optimization method of Wang and Elia. Next, a new distributed DP is introduced through a decoupling process. The convergence of the DP algorithms is proved through systems and control perspectives. The study in this paper sets the stage for new distributed temporal difference learning algorithms.

Submitted 13 June, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

arXiv:2204.10479

Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective

Authors: Donghwan Lee, Do Wan Kim

Abstract: TD-learning is a fundamental algorithm in the field of reinforcement learning (RL) that is employed to evaluate a given policy by estimating the corresponding value function for a Markov decision process. While significant progress has been made in the theoretical analysis of TD-learning, recent research has uncovered guarantees concerning its statistical efficiency by developing finite-time error bounds. This paper aims to contribute to the existing body of knowledge by presenting a novel finite-time analysis of tabular temporal difference (TD) learning, which makes direct and effective use of discrete-time stochastic linear system models and leverages Schur matrix properties. The proposed analysis can cover both on-policy and off-policy settings in a unified manner. By adopting this approach, we hope to offer new and straightforward templates that not only shed further light on the analysis of TD-learning and related RL algorithms but also provide valuable insights for future research in this domain.

Submitted 2 June, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: arXiv admin note: text overlap with arXiv:2112.14417

arXiv:2112.14417

Control Theoretic Analysis of Temporal Difference Learning

Authors: Donghwan Lee, Do Wan Kim

Abstract: The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the value function associated with a given policy in a Markov Decision Process. Despite several existing works that have contributed to the theoretical understanding of TD-learning, it is only in recent years that researchers have been able to establish concrete guarantees on its statistical efficiency. In this paper, we introduce a finite-time, control-theoretic framework for analyzing TD-learning, leveraging established concepts from the field of linear systems control. Consequently, this paper provides additional insights into the mechanics of TD-learning and the broader landscape of reinforcement learning, all while employing straightforward analytical tools derived from control theory.

Submitted 8 September, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: The contents of this paper have some overlaps with some other arxiv paper we have submitted. Therefore, this paper is redundant in my opinion

arXiv:2109.09088

Relaxed Conditions for Parameterized Linear Matrix Inequality in the Form of Double Sum

Authors: Do Wan Kim, Dong Hwan Lee

Abstract: The aim of this study is to investigate less conservative conditions for a parameterized linear matrix inequality (PLMI) expressed in the form of a double convex sum. This type of PLMI frequently appears in T-S fuzzy control system analysis and design problems. In this letter, we derive new, less conservative linear matrix inequalities (LMIs) for the PLMI by employing the proposed sum relaxation method based on Young's inequality. The derived LMIs are proven to be less conservative than the existing conditions related to this topic in the literature. The proposed technique is applicable to various stability analysis and control design problems for T-S fuzzy systems, which are formulated as solving the PLMIs in the form of a double convex sum. Furthermore, examples are provided to illustrate the reduced conservatism of the derived LMIs.

Submitted 13 July, 2023; v1 submitted 19 September, 2021; originally announced September 2021.

arXiv:2106.02391

Data-Driven Control Design with LMIs and Dynamic Programming

Authors: Donghwan Lee, Do Wan Kim

Abstract: The goal of this paper is to develop data-driven control design and evaluation strategies based on linear matrix inequalities (LMIs) and dynamic programming. We consider deterministic discrete-time LTI systems, where the system model is unknown. We propose efficient data collection schemes from the state-input trajectories together with data-driven LMIs to design state-feedback controllers for stabilization and linear quadratic regulation (LQR) problems. In addition, we investigate theoretically guaranteed exploration schemes to acquire valid data from the trajectories under different scenarios. In particular, we prove that as more and more data is accumulated, the collected data becomes valid for the proposed algorithms with higher probability. Finally, data-driven dynamic programming algorithms with convergence guarantees are discussed.

Submitted 16 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2105.14760

Multi-Objective LQG Design with Primal-Dual Method

Authors: Donghwan Lee, Do Wan Kim

Abstract: The goal of this paper is to study a multi-objective linear quadratic Gaussian (LQG) control problem. In particular, we consider an optimal control problem minimizing a quadratic cost over a finite time horizon for linear stochastic systems subject to control energy constraints. To solve the problem, we suggest a bisection line search algorithm that is computationally efficient compared to other approaches such as semidefinite programming. The main idea is to use the Lagrangian function and Karush-Kuhn-Tucker (KKT) optimality conditions to solve the constrained optimization problem. The Lagrange multiplier is searched using the bisection line search. Numerical examples are given to demonstrate the effectiveness of the proposed methods.

Submitted 31 May, 2021; originally announced May 2021.

Showing 1–9 of 9 results for author: Kim, D W