-
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Authors:
Kohei Matsuura,
Takanori Ashihara,
Takafumi Moriya,
Masato Mimura,
Takatomo Kano,
Atsunori Ogawa,
Marc Delcroix
Abstract:
This paper introduces a novel approach called sentence-wise speech summarization (Sen-SSum), which generates text summaries from a spoken document in a sentence-by-sentence manner. Sen-SSum combines the real-time processing of automatic speech recognition (ASR) with the conciseness of speech summarization. To explore this approach, we present two datasets for Sen-SSum: Mega-SSum and CSJ-SSum. Using these datasets, our study evaluates two types of Transformer-based models: 1) cascade models that combine ASR and strong text summarization models, and 2) end-to-end (E2E) models that directly convert speech into a text summary. While E2E models are appealing for building compute-efficient systems, they perform worse than cascade models. Therefore, we propose knowledge distillation for E2E models using pseudo-summaries generated by the cascade models. Our experiments show that this proposed knowledge distillation effectively improves the performance of the E2E model on both datasets.
Submitted 31 July, 2024;
originally announced August 2024.
-
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
Authors:
Hiroshi Sato,
Takafumi Moriya,
Masato Mimura,
Shota Horiguchi,
Tsubasa Ochiai,
Takanori Ashihara,
Atsushi Ando,
Kentaro Shinayama,
Marc Delcroix
Abstract:
Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner. Implementing real-time TSE is challenging as the computational complexity must be reduced to provide real-time operation. This work introduces to Conv-TasNet-based TSE a new architecture based on state space modeling (SSM) that has been shown to model long-term dependency effectively. Owing to SSM, fewer dilated convolutional layers are required to capture temporal dependency in Conv-TasNet, resulting in the reduction of model complexity. We also enlarge the window length and shift of the convolutional (TasNet) frontend encoder to reduce the computational cost further; the performance decline is compensated by over-parameterization of the frontend encoder. The proposed method reduces the real-time factor by 78% from the conventional causal Conv-TasNet-based TSE while matching its performance.
Submitted 1 July, 2024;
originally announced July 2024.
-
Invariant quasimorphisms and generalized mixed Bavard duality
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Shuhei Maruyama,
Takahiro Matsushita,
Masato Mimura
Abstract:
This article provides an expository account of the celebrated duality theorem of Bavard and three of its strengthenings. The Bavard duality theorem connects scl (stable commutator length) and quasimorphisms on a group. Calegari extended the framework from a group element to a chain on the group, and established the generalized Bavard duality. Kawasaki, Kimura, Matsushita and Mimura studied the setting of a pair of a group and its normal subgroup, and obtained the mixed Bavard duality. The first half of the present article is devoted to an introduction to these three Bavard dualities. In the latter half, we present a new strengthening, the generalized mixed Bavard duality, and provide a self-contained proof of it. This third strengthening recovers all of the Bavard dualities treated in the first half; thus, we supply complete proofs of these four Bavard dualities in a unified manner. In addition, we state several results on the space $\mathrm{W}(G,N)$ of non-extendable quasimorphisms, which is related to the comparison problem between scl and mixed scl via the mixed Bavard duality.
Submitted 6 June, 2024;
originally announced June 2024.
-
Coarse group theoretic study on stable mixed commutator length
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Shuhei Maruyama,
Takahiro Matsushita,
Masato Mimura
Abstract:
Let $G$ be a group and $N$ a normal subgroup of $G$. We study the large-scale behavior, not the exact values themselves, of the stable mixed commutator length $scl_{G,N}$ on the mixed commutator subgroup $[G,N]$; when $N=G$, $scl_{G,N}$ equals the stable commutator length $scl_G$ on the commutator subgroup $[G,G]$. For this purpose, we regard $scl_{G,N}$ not only as a function from $[G,N]$ to $\mathbb{R}_{\geq 0}$, but also as a bi-invariant metric function $d^+_{scl_{G,N}}$ from $[G,N]\times [G,N]$ to $\mathbb{R}_{\geq 0}$. Our main focus is coarse group theoretic structures of $([G,N],d^+_{scl_{G,N}})$. Our preliminary result (the absolute version) connects, via the Bavard duality, $([G,N],d^+_{scl_{G,N}})$ and the quotient vector space of the space of $G$-invariant quasimorphisms on $N$ over the space of $G$-invariant homomorphisms on $N$. In particular, we prove that the dimension of this vector space equals the asymptotic dimension of $([G,N],d^+_{scl_{G,N}})$.
Our main result is the comparative version: we connect the coarse kernel, formulated by Leitner and Vigolo, of the coarse homomorphism $ι_{G,N}\colon ([G,N],d^+_{scl_{G,N}})\to ([G,N],d^+_{scl_{G}})$; $y\mapsto y$, and a certain quotient vector space $W(G,N)$ of the space of invariant quasimorphisms. Assume that $N=[G,G]$ and that $W(G,N)$ is finite dimensional with dimension $\ell$. Then we prove that the coarse kernel of $ι_{G,N}$ is isomorphic to $\mathbb{Z}^{\ell}$ as a coarse group. In contrast to the absolute version, the space $W(G,N)$ is finite dimensional in many cases, including all $(G,N)$ with finitely generated $G$ and nilpotent $G/N$. As an application of our result, given a group homomorphism $\varphi\colon G\to H$ between finitely generated groups, we define an $\mathbb{R}$-linear map `inside' the groups, which is dual to the naturally defined $\mathbb{R}$-linear map from $W(H,[H,H])$ to $W(G,[G,G])$ induced by $\varphi$.
Submitted 16 August, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder
Authors:
Hao Shi,
Masato Mimura,
Longbiao Wang,
Jianwu Dang,
Tatsuya Kawahara
Abstract:
Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals, and it is challenging to learn multi-resolution frequency losses simultaneously with only one output. For better use of multi-resolution frequency information, we supplement multiple spectrograms in different frame lengths into the time-domain encoders. They extract stationary frequency information in both narrowband and wideband. We also adopt multiple decoder outputs, each of which computes its corresponding resolution frequency loss. Experimental results show that (1) it is more effective to fuse stationary frequency features than non-stationary features in the encoder, and (2) the multiple outputs consistent with the frequency loss improve performance. Experiments on the Voice-Bank dataset show that the proposed method obtained a 0.14 PESQ improvement.
Submitted 25 March, 2023;
originally announced March 2023.
-
Survey on invariant quasimorphisms and stable mixed commutator length
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Shuhei Maruyama,
Takahiro Matsushita,
Masato Mimura
Abstract:
A homogeneous quasimorphism $φ$ on a normal subgroup $N$ of $G$ is said to be $G$-invariant if $φ(gxg^{-1}) = φ(x)$ for every $g \in G$ and for every $x \in N$. Invariant quasimorphisms have naturally appeared in symplectic geometry and the extension problem of quasimorphisms. Moreover, it is known that the existence of non-extendable invariant quasimorphisms is closely related to the behavior of the stable mixed commutator length $\mathrm{scl}_{G,N}$, which is a certain generalization of the stable commutator length $\mathrm{scl}_G$.
In this survey, we review the history and recent developments of invariant quasimorphisms and stable mixed commutator length. The topics we treat include several examples of invariant quasimorphisms, Bavard's duality theorem for invariant quasimorphisms, Aut-invariant quasimorphisms, and the estimation of the dimension of spaces of non-extendable quasimorphisms. We also mention the extension problem of partial quasimorphisms.
Submitted 28 January, 2024; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Compiler Provenance Recovery for Multi-CPU Architectures Using a Centrifuge Mechanism
Authors:
Yuhei Otsubo,
Akira Otsuka,
Mamoru Mimura
Abstract:
Bit-stream recognition (BSR) has many applications, such as forensic investigations, detection of copyright infringement, and malware analysis. We propose the first BSR that takes a bare input bit-stream and outputs a class label without any preprocessing. To achieve our goal, we propose a centrifuge mechanism, where the upstream layers (sub-net) capture global features and tell the downstream layers (main-net) to switch the focus, even if a part of the input bit-stream has the same value. We applied the centrifuge mechanism to compiler provenance recovery, a type of BSR, and achieved excellent classification. Additionally, downstream transfer learning (DTL), one of the learning methods we propose for the centrifuge mechanism, pre-trains the main-net using the sub-net's ground truth instead of the sub-net's output. We found that sub-predictions made by DTL tend to be highly accurate when the sub-label classification contributes to the essence of the main prediction.
Submitted 23 November, 2022; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Authors:
Hayato Futami,
Hirofumi Inaguma,
Sei Ueno,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature. To take advantage of text-only data, language model (LM) integration approaches such as rescoring and shallow fusion have been widely used for CTC. However, they lose CTC's non-autoregressive nature because of the need for beam search, which slows down inference. In this study, we propose an error correction method with a phone-conditioned masked LM (PC-MLM). In the proposed method, less confident word tokens in a greedy decoded output from CTC are masked. PC-MLM then predicts these masked word tokens given unmasked words and phones supplementally predicted from CTC. We further extend it to Deletable PC-MLM in order to address insertion errors. Since both CTC and PC-MLM are non-autoregressive models, the method enables fast LM integration. Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and TED-LIUM2 in a domain adaptation setting show that our proposed method outperformed rescoring and shallow fusion in terms of inference speed, and also in terms of recognition accuracy on CSJ.
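The masking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the blank index, the confidence definition (maximum frame posterior over the frames merged into a token), the threshold value, and the function names are all assumptions made for illustration, and the PC-MLM that would fill in the masks is not modelled.

```python
from typing import Dict, List, Tuple

BLANK = 0          # assumed index of the CTC blank symbol
MASK = "<mask>"    # hypothetical mask token


def ctc_greedy_decode(frame_probs: List[List[float]]) -> List[Tuple[int, float]]:
    """Greedy CTC decoding: pick the arg-max label per frame, merge
    consecutive repeats, and drop blanks.

    Returns (token_id, confidence) pairs. The confidence used here is the
    highest frame posterior among the merged frames (an assumption).
    """
    tokens: List[Tuple[int, float]] = []
    prev = BLANK
    for probs in frame_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        p = probs[best]
        if best != BLANK:
            if best == prev:
                # Same token continues: keep the larger confidence.
                tok, conf = tokens[-1]
                tokens[-1] = (tok, max(conf, p))
            else:
                tokens.append((best, p))
        prev = best
    return tokens


def mask_low_confidence(tokens: List[Tuple[int, float]],
                        vocab: Dict[int, str],
                        threshold: float = 0.7) -> List[str]:
    """Replace tokens whose confidence is below the threshold with MASK."""
    return [vocab[t] if c >= threshold else MASK for t, c in tokens]


# Toy example: four labels (0 = blank) over five frames.
frames = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.85, 0.03, 0.02],
    [0.90, 0.05, 0.03, 0.02],
    [0.20, 0.10, 0.60, 0.10],
    [0.05, 0.00, 0.00, 0.95],
]
vocab = {1: "the", 2: "cat", 3: "sat"}
decoded = ctc_greedy_decode(frames)
masked = mask_low_confidence(decoded, vocab, threshold=0.7)
```

In this toy input the middle token has a posterior of only 0.60, so it is the one replaced by the mask; a masked LM conditioned on the remaining words and on predicted phones would then propose a replacement in a single non-autoregressive pass.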
Submitted 8 September, 2022;
originally announced September 2022.
-
Distilling the Knowledge of BERT for CTC-based ASR
Authors:
Hayato Futami,
Hirofumi Inaguma,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
Connectionist temporal classification (CTC)-based models are attractive because of their fast inference in automatic speech recognition (ASR). Language model (LM) integration approaches such as shallow fusion and rescoring can improve the recognition accuracy of CTC-based ASR by taking advantage of the knowledge in text corpora. However, they significantly slow down the inference of CTC. In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR. CTC-based ASR learns the knowledge of BERT during training and does not use BERT during testing, which maintains the fast inference of CTC. Unlike attention-based models, CTC-based models make frame-level predictions, so they need to be aligned with the token-level predictions of BERT for distillation. We propose to obtain alignments by calculating the most plausible CTC paths. Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and TED-LIUM2 show that our method improves the performance of CTC-based ASR without sacrificing inference speed.
Submitted 5 September, 2022;
originally announced September 2022.
-
Invariant quasimorphisms for groups acting on the circle and non-equivalence of SCL
Authors:
Shuhei Maruyama,
Takahiro Matsushita,
Masato Mimura
Abstract:
We construct invariant quasimorphisms for groups acting on the circle. Furthermore, we provide a criterion for the non-extendability of the resulting quasimorphisms and an explicit formula which relates the values of our quasimorphisms to those of the Poincaré translation number. By using them, we show that the stable commutator length $\mathrm{scl}_G$ and the stable mixed commutator length $\mathrm{scl}_{G,N}$ are not bi-Lipschitzly equivalent for the surface group $G=π_1(Σ_{\ell})$ of genus at least $2$ and its commutator subgroup $N = [π_1(Σ_{\ell}), π_1(Σ_{\ell})]$. We also show the non-equivalence for a pair $(G,N)$ such that $G$ is the fundamental group of a $3$-dimensional closed hyperbolic mapping torus. These pairs serve as the first family of examples of such $(G,N)$ in which $G$ is finitely generated.
Submitted 7 February, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Mixed commutator lengths, wreath products and general ranks
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Shuhei Maruyama,
Takahiro Matsushita,
Masato Mimura
Abstract:
In the present paper, for a pair $(G,N)$ of a group $G$ and its normal subgroup $N$, we consider the mixed commutator length $\mathrm{cl}_{G,N}$ on the mixed commutator subgroup $[G,N]$. We focus on the setting of wreath products: $ (G,N)=(\mathbb{Z}\wr Γ, \bigoplus_Γ\mathbb{Z})$. Then we determine mixed commutator lengths in terms of the general rank in the sense of Malcev. As a byproduct, when an abelian group $Γ$ is not locally cyclic, the ordinary commutator length $\mathrm{cl}_G$ does not coincide with $\mathrm{cl}_{G,N}$ on $[G,N]$ for the above pair. On the other hand, we prove that if $Γ$ is locally cyclic, then for every pair $(G,N)$ such that $1\to N\to G\to Γ\to 1$ is exact, $\mathrm{cl}_{G}$ and $\mathrm{cl}_{G,N}$ coincide on $[G,N]$. We also study the case of permutational wreath products when the group $Γ$ belongs to a certain class related to surface groups.
Submitted 11 January, 2023; v1 submitted 8 March, 2022;
originally announced March 2022.
-
An aggregation model of cockroaches with fast-or-slow motion dichotomy
Authors:
Jan Elias,
Hirofumi Izuhara,
Masayasu Mimura,
Bao Quoc Tang
Abstract:
We propose a mathematical model, namely a reaction-diffusion system, to describe the social behaviour of cockroaches. An essential new aspect in our model is that the dispersion behaviour due to the overcrowding effect is taken into account as a counterpart to commonly studied aggregation. This consideration leads to an intriguing new phenomenon which has not been observed in the literature. Namely, due to the competition between aggregation towards areas of higher concentration of pheromone and dispersion avoiding overcrowded areas, the cockroaches aggregate more at the transition area of pheromone. Moreover, we also consider the fast reaction limit where the switching rate between active and inactive subpopulations tends to infinity. By utilising improved duality and energy methods, together with the regularisation of the heat operator, we prove that the weak solution of the reaction-diffusion system converges to that of a reaction-cross-diffusion system.
Submitted 15 November, 2021;
originally announced November 2021.
-
ASR Rescoring and Confidence Estimation with ELECTRA
Authors:
Hayato Futami,
Hirofumi Inaguma,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors should be selected from the n-best list using a language model (LM). However, LMs are usually trained to maximize the likelihood of correct word sequences, not to detect ASR errors. We propose an ASR rescoring method for directly detecting errors with ELECTRA, which is originally a pre-training method for NLP tasks. ELECTRA is pre-trained to predict whether each word is replaced by BERT or not, which can simulate ASR error detection on large text corpora. To make this pre-training closer to ASR error detection, we further propose an extended version of ELECTRA called phone-attentive ELECTRA (P-ELECTRA). In the pre-training of P-ELECTRA, each word is replaced by a phone-to-word conversion model, which leverages phone information to generate acoustically similar words. Since our rescoring method is optimized for detecting errors, it can also be used for word-level confidence estimation. Experimental evaluations on the Librispeech and TED-LIUM2 corpora show that our rescoring method with ELECTRA is competitive with conventional rescoring methods with faster inference. ELECTRA also performs better in confidence estimation than BERT because it can learn to detect inappropriate words not only in fine-tuning but also in pre-training.
Submitted 5 October, 2021;
originally announced October 2021.
-
The space of non-extendable quasimorphisms
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Shuhei Maruyama,
Takahiro Matsushita,
Masato Mimura
Abstract:
For a pair $(G,N)$ of a group $G$ and its normal subgroup $N$, we consider the space of quasimorphisms and quasi-cocycles on $N$ non-extendable to $G$. To treat this space, we establish the five-term exact sequence of cohomology relative to the bounded subcomplex. As its application, we study the spaces associated with the kernel of the (volume) flux homomorphism, the IA-automorphism group of a free group, and certain normal subgroups of Gromov-hyperbolic groups.
Furthermore, we employ this space to prove that the stable commutator length is equivalent to the stable mixed commutator length for certain pairs of a group and its normal subgroup.
Submitted 16 August, 2023; v1 submitted 18 July, 2021;
originally announced July 2021.
-
Commuting symplectomorphisms on a surface and the flux homomorphism
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Takahiro Matsushita,
Masato Mimura
Abstract:
Let $(S,ω)$ be a closed connected oriented surface whose genus $l$ is at least two equipped with a symplectic form. Then we show the vanishing of the cup product of the fluxes of commuting symplectomorphisms. This result may be regarded as an obstruction for commuting symplectomorphisms. In particular, the image of an abelian subgroup of $\mathrm{Symp}_0^c(S, ω)$ under the flux homomorphism is isotropic with respect to the natural intersection form on $H^1(S;\mathbb{R})$. The key to the proof is a refinement of the non-extendability result, previously given by the first-named and second-named authors, for Py's Calabi quasimorphism $μ_P$ on $\mathrm{Ham}(S, ω)$.
Submitted 18 June, 2023; v1 submitted 24 February, 2021;
originally announced February 2021.
-
Constellations in prime elements of number fields
Authors:
Wataru Kai,
Masato Mimura,
Akihiro Munemasa,
Shin-ichiro Seki,
Kiyoto Yoshino
Abstract:
Given any number field, we prove that there exist arbitrarily shaped constellations consisting of pairwise non-associate prime elements of the ring of integers. This result extends the celebrated Green-Tao theorem on arithmetic progressions of rational primes and Tao's theorem on constellations of Gaussian primes. Furthermore, we prove a constellation theorem on prime representations of binary quadratic forms with integer coefficients. More precisely, for a non-degenerate primitive binary quadratic form $F$ which is not negative definite, there exist arbitrarily shaped constellations consisting of pairs of integers $(x,y)$ for which $F(x,y)$ is a rational prime. The latter theorem is obtained by extending the framework from the ring of integers to the pair of an order and its invertible fractional ideal.
Submitted 4 April, 2022; v1 submitted 31 December, 2020;
originally announced December 2020.
-
On the spectrum and linear programming bound for hypergraphs
Authors:
Sebastian M. Cioabă,
Jack H. Koolen,
Masato Mimura,
Hiroshi Nozaki,
Takayuki Okuda
Abstract:
The spectrum of a graph is closely related to many graph parameters. In particular, the spectral gap of a regular graph, which is the difference between its valency and second eigenvalue, is widely seen as an algebraic measure of connectivity and plays a key role in the theory of expander graphs. In this paper, we extend previous work done for graphs and bipartite graphs and present a linear programming method for obtaining an upper bound on the order of a regular uniform hypergraph with prescribed distinct eigenvalues. Furthermore, we obtain a general upper bound on the order of a regular uniform hypergraph whose second eigenvalue is bounded by a given value. Our results improve and extend previous work done by Feng-Li (1996) on Alon-Boppana theorems for regular hypergraphs and by Dinitz-Schapira-Shahaf (2020) on the Moore or degree-diameter problem. We also determine the largest order of an $r$-regular $u$-uniform hypergraph with second eigenvalue at most $θ$ for several parameters $(r,u,θ)$. In particular, orthogonal arrays give the structure of the largest hypergraphs with second eigenvalue at most $1$ for every sufficiently large $r$. Moreover, we show that a generalized Moore geometry has the largest spectral gap among all hypergraphs of that order and degree.
Submitted 5 April, 2022; v1 submitted 7 September, 2020;
originally announced September 2020.
-
End-to-end Music-mixed Speech Recognition
Authors:
Jeongwoo Woo,
Masato Mimura,
Kazuyoshi Yoshii,
Tatsuya Kawahara
Abstract:
Automatic speech recognition (ASR) in multimedia content is one of the promising applications, but speech data in this kind of content are frequently mixed with background music, which is harmful for the performance of ASR. In this study, we propose a method for improving ASR with background music based on time-domain source separation. We utilize Conv-TasNet as a separation network, which has achieved state-of-the-art performance for multi-speaker source separation, to extract the speech signal from a speech-music mixture in the waveform domain. We also propose joint fine-tuning of a pre-trained Conv-TasNet front-end with an attention-based ASR back-end using both separation and ASR objectives. We evaluated our method through ASR experiments using speech data mixed with background music from a wide variety of Japanese animations. We show that time-domain speech-music separation drastically improves ASR performance of the back-end model trained with mixture data, and the joint optimization yielded a further significant WER reduction. The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings. We also demonstrate that our method works robustly for music interference from classical, jazz and popular genres.
Submitted 27 August, 2020;
originally announced August 2020.
-
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR
Authors:
Hayato Futami,
Hirofumi Inaguma,
Sei Ueno,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
Attention-based sequence-to-sequence (seq2seq) models have achieved promising results in automatic speech recognition (ASR). However, as these models decode in a left-to-right way, they do not have access to context on the right. We leverage both left and right context by applying BERT as an external language model to seq2seq ASR through knowledge distillation. In our proposed method, BERT generates soft labels to guide the training of seq2seq ASR. Furthermore, we leverage context beyond the current utterance as input to BERT. Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ). Knowledge distillation from BERT outperforms that from a transformer LM that only looks at left context. We also show the effectiveness of leveraging context beyond the current utterance. Our method outperforms other LM application approaches such as n-best rescoring and shallow fusion, while it does not require extra inference cost.
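As a rough illustration of training with teacher-generated soft labels, the sketch below computes a per-token distillation loss. It is a generic knowledge-distillation formulation, not the paper's exact objective: the interpolation weight `alpha`, the function names, and the toy distributions are assumptions, and the teacher probabilities stand in for whatever BERT would output.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def distillation_loss(student_logits: List[float],
                      teacher_probs: List[float],
                      target: int,
                      alpha: float = 0.5) -> float:
    """Interpolate a soft-label loss with the usual hard-label loss.

    soft: cross-entropy between the teacher distribution and the student.
    hard: negative log-likelihood of the reference token.
    alpha: hypothetical weight on the soft-label term.
    """
    logp = log_softmax(student_logits)
    soft = -sum(q * lp for q, lp in zip(teacher_probs, logp))
    hard = -logp[target]
    return alpha * soft + (1.0 - alpha) * hard


# Toy check over a 4-word vocabulary.
teacher = [0.7, 0.2, 0.05, 0.05]              # stand-in for teacher soft labels
uniform_student = [0.0, 0.0, 0.0, 0.0]        # knows nothing yet
matched_student = [math.log(q) for q in teacher]

loss_uniform = distillation_loss(uniform_student, teacher, target=0, alpha=1.0)
loss_matched = distillation_loss(matched_student, teacher, target=0, alpha=1.0)
```

Because the teacher is consulted only when computing this training loss, the student decodes at inference time exactly as it would without distillation, which is consistent with the abstract's remark about no extra inference cost.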
Submitted 9 August, 2020;
originally announced August 2020.
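The soft-label idea in the abstract above can be illustrated with a minimal sketch. It assumes the training loss interpolates a hard-label cross-entropy with a cross-entropy against teacher probabilities; the function names, the weight `alpha`, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs, target_index, alpha=0.5):
    """Interpolate hard-label and soft-label cross-entropy for one token.

    alpha is a hypothetical interpolation weight: 0 uses only the
    reference label, 1 uses only the teacher's soft labels.
    """
    student_probs = softmax(student_logits)
    hard = -math.log(student_probs[target_index])
    soft = -sum(t * math.log(p)
                for t, p in zip(teacher_probs, student_probs) if t > 0)
    return (1 - alpha) * hard + alpha * soft

# Toy example: a 3-word vocabulary, reference label at index 1.
loss = distillation_loss([1.0, 2.0, 0.5], [0.1, 0.8, 0.1], target_index=1)
```

Because the teacher is only consulted when computing the training loss, a scheme of this shape adds no cost at inference time, which matches the claim in the abstract.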
-
Bavard's duality theorem for mixed commutator length
Authors:
Morimichi Kawasaki,
Mitsuaki Kimura,
Takahiro Matsushita,
Masato Mimura
Abstract:
Let $N$ be a normal subgroup of a group $G$. A quasimorphism $f$ on $N$ is $G$-invariant if $f(gxg^{-1}) = f(x)$ for every $g \in G$ and every $x \in N$. The goal in this paper is to establish Bavard's duality theorem of $G$-invariant quasimorphisms, which was previously proved by Kawasaki and Kimura in the case $N = [G,N]$.
Our duality theorem provides a connection between $G$-invariant quasimorphisms and $(G,N)$-commutator lengths. Here for $x \in [G,N]$, the $(G,N)$-commutator length $\mathrm{cl}_{G,N}(x)$ of $x$ is the minimum number $n$ such that $x$ is a product of $n$ commutators of the form $[g,h]$ with $g \in G$ and $h \in N$. In the proof, we give a geometric interpretation of $(G,N)$-commutator lengths.
As an application of our Bavard duality, we obtain a sufficient condition on a pair $(G,N)$ under which $\mathrm{scl}_G$ and $\mathrm{scl}_{G,N}$ are bi-Lipschitzly equivalent on $[G,N]$.
Submitted 22 March, 2022; v1 submitted 5 July, 2020;
originally announced July 2020.
-
Enhancing Monotonic Multihead Attention for Streaming ASR
Authors:
Hirofumi Inaguma,
Masato Mimura,
Tatsuya Kawahara
Abstract:
We investigate a monotonic multihead attention (MMA) by extending hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications. For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries. However, we found not all MA heads learn alignments with a naïve implementation. To encourage every head to learn alignments properly, we propose HeadDrop regularization by masking out a part of heads stochastically during training. Furthermore, we propose to prune redundant heads to improve consensus among heads for boundary detection and prevent delayed token generation caused by such heads. Chunkwise attention on each MA head is extended to the multihead counterpart. Finally, we propose head-synchronous beam search decoding to guarantee stable streaming inference.
Submitted 30 September, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
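The HeadDrop regularization mentioned in the abstract above (stochastically masking out some attention heads during training) can be sketched as follows. This is only an illustration: keeping at least one head and rescaling the surviving heads are assumptions made here, and all names are hypothetical.

```python
import random

def head_drop_mask(num_heads, drop_prob, rng):
    """Return a 0/1 keep-mask over heads; at least one head is always kept.

    The "keep at least one head" rule is an assumption of this sketch.
    """
    mask = [0 if rng.random() < drop_prob else 1 for _ in range(num_heads)]
    if sum(mask) == 0:
        mask[rng.randrange(num_heads)] = 1
    return mask

def apply_head_drop(head_outputs, mask):
    """Zero out dropped heads and rescale the surviving ones.

    Rescaling by num_heads / num_kept is also an assumption; it keeps the
    summed magnitude comparable to the unmasked case.
    """
    scale = len(mask) / sum(mask)
    return [[v * m * scale for v in head]
            for head, m in zip(head_outputs, mask)]

# Toy example: two heads with 2-dimensional outputs, second head dropped.
masked = apply_head_drop([[1.0, 2.0], [3.0, 4.0]], [1, 0])
```

At inference time no mask would be applied; the point of the random masking is that no single head can rely on the others to learn the alignment.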
-
Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition
Authors:
Kohei Matsuura,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
It is important to transcribe and archive speech data of endangered languages to preserve the heritage of verbal culture, and automatic speech recognition (ASR) is a powerful tool to facilitate this process. However, since endangered languages do not generally have large corpora with many speakers, the performance of ASR models trained on them is generally poor. Nevertheless, we are often left with a lot of recordings of spontaneous speech data that have to be transcribed. In this work, to mitigate this speaker sparsity problem, we propose to convert the whole training speech data and make it sound like the test speaker in order to develop a highly accurate ASR system for this speaker. For this purpose, we utilize a CycleGAN-based non-parallel voice conversion technology to forge labeled training data that is close to the test speaker's speech. We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi. We obtained 35-60% relative improvement in phone error rate on the Ainu corpus, and 40% relative improvement was attained on the Mboshi corpus. This approach outperformed two conventional methods, namely unsupervised adaptation and multilingual training, with these two corpora.
Submitted 31 July, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
CTC-synchronous Training for Monotonic Attention Model
Authors:
Hirofumi Inaguma,
Masato Mimura,
Tatsuya Kawahara
Abstract:
Monotonic chunkwise attention (MoChA) has been studied for the online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework. In contrast to connectionist temporal classification (CTC), backward probabilities cannot be leveraged in the alignment marginalization process during training due to left-to-right dependency in the decoder. This results in the error propagation of alignments to subsequent token generation. To address this problem, we propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments. Reference CTC alignments are extracted from a CTC branch sharing the same encoder with the decoder. The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments. Experimental evaluations of the TEDLIUM release-2 and Librispeech corpora show that the proposed method significantly improves recognition, especially for long utterances. We also show that CTC-ST can bring out the full potential of SpecAugment for MoChA.
Submitted 6 August, 2020; v1 submitted 10 May, 2020;
originally announced May 2020.
-
Mathematical treatment of PDE model describing chemotactic E. coli colonies
Authors:
Rafał Celiński,
Danielle Hilhorst,
Grzegorz Karch,
Masayasu Mimura,
Pierre Roux
Abstract:
We consider an initial-boundary value problem describing the formation of colony patterns of bacteria Escherichia coli. This model consists of reaction-diffusion equations coupled with the Keller-Segel system from the chemotaxis theory in a bounded domain, supplemented with zero-flux boundary conditions and with non-negative initial data. We answer questions on the global in time existence of solutions as well as on their large time behaviour. Moreover, we show that solutions of a related model may blow up in a finite time.
Submitted 6 January, 2021; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
Authors:
Kohei Matsuura,
Sei Ueno,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
Ainu is an unwritten language that has been spoken by the Ainu people, who are one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documentation of its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to save their culture, only a very limited part of them has been transcribed so far. Thus, we started a project of automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report speech corpus development and the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85% respectively in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% were achieved in a speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.
Submitted 16 May, 2020; v1 submitted 16 February, 2020;
originally announced February 2020.
-
Avoiding a shape, and the slice rank method for a system of equations
Authors:
Masato Mimura,
Norihide Tokushige
Abstract:
Fix a vector space over a finite field and a system of linear equations. We provide estimates, in terms of the dimension of the vector space, of the maximum of the sizes of subsets of the space that do not admit solutions of the system consisting of more than one point. The estimate from above is derived by the slice rank method of Tao; to obtain one from below, we define the notion of 'dominant reductions' of the system. Furthermore, by adapting a recent argument of Sauermann, we estimate the maximum of the sizes of subsets that are 'W shape'-free, meaning that they contain no five distinct points forming two overlapping parallelograms.
Submitted 23 September, 2019;
originally announced September 2019.
-
Avoiding a star of three-term arithmetic progressions
Authors:
Masato Mimura,
Norihide Tokushige
Abstract:
We provide an upper bound of the size of a subset A of F_p^n that does not admit a k-star of 3-APs (three-term arithmetic progressions). Namely, the subset A is assumed to contain no configuration of k 3-APs, sharing the middle term, such that all 2k+1 terms are distinct. In the proof, we adapt a new method in the recent work of Sauermann.
Submitted 23 September, 2019;
originally announced September 2019.
-
Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR
Authors:
Hirofumi Inaguma,
Masato Mimura,
Shinsuke Sakai,
Tatsuya Kawahara
Abstract:
Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) systems have attracted attention because of an extremely simplified architecture and fast decoding. To alleviate data sparseness issues due to infrequent words, the combination with an acoustic-to-character (A2C) model is investigated. Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words. A2W models learn contexts from both acoustics and transcripts; therefore they tend to falsely recognize OOV words as words in the vocabulary. In this paper, we tackle this problem by using external language models (LMs), which are trained only with transcriptions and have better linguistic information to detect OOV words. The A2C model is used to resolve these OOV words. Experimental evaluations show that external LMs have the effects of not only reducing errors but also increasing the number of detected OOV words, and the proposed method significantly improves performance on English conversational and Japanese lecture corpora, especially in out-of-domain scenarios. We also investigate the impact of the vocabulary size of A2W models and the data size for training LMs. Moreover, our approach can reduce the vocabulary size several times with marginal performance degradation.
Submitted 22 September, 2019;
originally announced September 2019.
-
Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition
Authors:
Kazuki Shimada,
Yoshiaki Bando,
Masato Mimura,
Katsutoshi Itoyama,
Kazuyoshi Yoshii,
Tatsuya Kawahara
Abstract:
This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, the minimum variance distortionless response (MVDR) beamforming has widely been used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin into noise or speech by training a deep neural network (DNN). The performance of ASR, however, is degraded in an unknown noisy environment. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper we propose online MVDR beamforming by effectively initializing and incrementally updating the parameters of MNMF. Another main contribution is to comprehensively investigate the performances of ASR obtained by various types of spatial filters, i.e., time-invariant and variant versions of MVDR beamformers and those of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match training data.
Submitted 31 March, 2019; v1 submitted 21 March, 2019;
originally announced March 2019.
-
An extreme counterexample to the Lubotzky--Weiss conjecture
Authors:
Masato Mimura
Abstract:
In 1993, Lubotzky and Weiss conjectured that if a compact group admits two finitely generated dense subgroups, one of which is amenable and the other has Kazhdan's property (T), then it would be finite. This conjecture was resolved in the negative by Ershov and Jaikin-Zapirain, and by Kassabov around 2010. In the present paper, we provide an extreme counterexample to this conjecture. More precisely, the latter dense group with property (T) may contain a given countable residually finite group; in particular, it can be non-exact by a result of Osajda. We may construct these counterexamples with a compact group common for all countable residually finite groups.
Submitted 25 April, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.
-
o-glasses: Visualizing x86 Code from Binary Using a 1d-CNN
Authors:
Yuhei Otsubo,
Akira Otsuka,
Mamoru Mimura,
Takeshi Sakaki,
Atsuhiro Goto
Abstract:
Malicious document files used in targeted attacks often contain a small program called shellcode. It is often hard to prepare a runnable environment for dynamic analysis of these document files because they exploit specific vulnerabilities. In these cases, it is necessary to identify the position of the shellcode in each document file to analyze it. If the exploit code uses executable scripts such as JavaScript and Flash, it is not so hard to locate the shellcode. On the other hand, it is sometimes almost impossible to locate the shellcode when it does not contain any JavaScript or Flash but consists of native x86 code only.
Binary fragment classification is often applied to visualize the location of regions of interest, and shellcode must contain at least a small fragment of x86 native code even if most of it is obfuscated, such as a decoder for the obfuscated body of the shellcode. In this paper, we propose a novel method, o-glasses, to visualize the shellcode by recognizing the x86 native code using a specially designed one-dimensional convolutional neural network (1d-CNN). The fragment size needs to be as small as the minimum size of the x86 native code in the whole shellcode. Our results show that a 16-instruction sequence (approximately 48 bytes on average) is sufficient for the code fragment visualization. Our method, o-glasses (1d-CNN), outperforms other methods in that it recognizes x86 native code with a surprisingly high F-measure (about 99.95%).
Submitted 13 June, 2018;
originally announced June 2018.
-
Amenability versus non-exactness of dense subgroups of a compact group
Authors:
Masato Mimura
Abstract:
Given a countable residually finite group, we construct a compact group K and two elements w and u of K with the following properties: The group generated by w and the cube of u is amenable, the group generated by w and u contains a copy of the given group, and these two groups are dense in K. By combining it with a construction of non-exact groups that are LEF by Osajda and Arzhantseva--Osajda and formation of diagonal products, we construct an example for which the latter dense group is non-exact. Our proof employs approximations in the space of marked groups of LEF ("Locally Embeddable into Finite groups") groups.
Submitted 13 March, 2019; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Group approximation in Cayley topology and coarse geometry, Part II: Fibered coarse embeddings
Authors:
Masato Mimura,
Hiroki Sako
Abstract:
The objective of this series is to study metric geometric properties of disjoint unions of amenable Cayley graphs by group properties of the Cayley accumulation points in the space of marked groups. In this Part II, we prove that a disjoint union admits a fibred coarse embedding into a Hilbert space (in a generalized sense) if and only if the Cayley boundary of the sequence in the space of marked groups is uniformly a-T-menable. We furthermore extend this result to ones with other target spaces. By combining our main results with constructions of Arzhantseva--Osajda and Osajda, we construct two systems of markings of a certain sequence of finite groups with two opposite extreme behaviors of the resulting two disjoint unions: With respect to one marking, the space has property A. On the other hand, with respect to the other, the space does not admit fibred coarse embeddings into Banach spaces with non-trivial type (for instance, uniformly convex Banach spaces) or Hadamard manifolds; the Cayley limit group is, furthermore, non-exact.
Submitted 14 March, 2019; v1 submitted 27 April, 2018;
originally announced April 2018.
-
Complex pattern formation driven by the interaction of stable fronts in a competition-diffusion system
Authors:
Lorenzo Contento,
Masayasu Mimura
Abstract:
The ecological invasion problem in which a weaker exotic species invades an ecosystem inhabited by two strongly competing native species is modelled by a three-species competition-diffusion system. It is known that for a certain range of parameter values competitor-mediated coexistence occurs and complex spatio-temporal patterns are observed in two spatial dimensions. In this paper we uncover the mechanism which generates such patterns. Under some assumptions on the parameters the three-species competition-diffusion system admits two planarly stable travelling waves. Their interaction in one spatial dimension may result in either reflection or merging into a single homoclinic wave, depending on the strength of the invading species. This transition can be understood by studying the bifurcation structure of the homoclinic wave. In particular, a time-periodic homoclinic wave (breathing wave) is born from a Hopf bifurcation and its unstable branch acts as a separator between the reflection and merging regimes. The same transition occurs in two spatial dimensions: the stable regular spiral associated to the homoclinic wave destabilizes, giving rise first to an oscillating breathing spiral and then breaking up producing a dynamic pattern characterized by many spiral cores. We find that these complex patterns are generated by the interaction of two planarly stable travelling waves, in contrast with many other well known cases of pattern formation where planar instability plays a central role.
Submitted 31 October, 2018; v1 submitted 11 April, 2018;
originally announced April 2018.
-
Ecological invasion in competition-diffusion systems when the exotic species is either very strong or very weak
Authors:
Lorenzo Contento,
Danielle Hilhorst,
Masayasu Mimura
Abstract:
Reaction-diffusion systems with a Lotka-Volterra-type reaction term, also known as competition-diffusion systems, have been used to investigate the dynamics of the competition among $m$ ecological species for a limited resource necessary to their survival and growth. Notwithstanding their rather simple mathematical structure, such systems may display quite interesting behaviours. In particular, while for $m=2$ no coexistence of the two species is usually possible, if $m \ge 3$ we may observe coexistence of all or a subset of the species, sensitively depending on the parameter values. Such coexistence can take the form of very complex spatio-temporal patterns and oscillations.
Unfortunately, at the moment there are no known tools for a complete analytical study of such systems for $m \ge 3$. This means that establishing general criteria for the occurrence of coexistence appears to be very hard. In this paper we will instead give some criteria for the non-coexistence of species, motivated by the ecological problem of the invasion of an ecosystem by an exotic species. We will show that when the environment is very favourable to the invading species the invasion will always be successful and the native species will be driven to extinction. On the other hand, if the environment is not favourable enough, the invasion will always fail.
Submitted 30 January, 2018;
originally announced January 2018.
-
Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization
Authors:
Yoshiaki Bando,
Masato Mimura,
Katsutoshi Itoyama,
Kazuyoshi Yoshii,
Tatsuya Kawahara
Abstract:
This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. Although this supervised approach requires a very large amount of paired data for training, it is not robust against unknown environments. Another approach is to use non-negative matrix factorization (NMF) based on basis spectra trained on clean speech in advance and those adapted to noise on the fly. This semi-supervised approach, however, causes considerable signal distortion in enhanced speech due to the unrealistic assumption that speech spectrograms are linear combinations of the basis spectra. Replacing the poor linear generative model of clean speech in NMF with a VAE (a powerful nonlinear deep generative model) trained on clean speech, we formulate a unified probabilistic generative model of noisy speech. Given noisy speech as observed data, we can sample clean speech from its posterior distribution. The proposed method outperformed the conventional DNN-based method in unseen noisy environments.
Submitted 19 March, 2018; v1 submitted 31 October, 2017;
originally announced October 2017.
-
An alternative proof of Kazhdan property for elementary groups
Authors:
Masato Mimura
Abstract:
In 2010, Invent. Math., Ershov and Jaikin-Zapirain proved Kazhdan's property (T) for elementary groups. This expository article focuses on presenting an alternative simpler proof of that. Unlike the original one, our proof supplies no estimate of Kazhdan constants. It may be regarded as a specific example of the results in the paper "Upgrading fixed points without bounded generation" (arXiv:1505.06728, forthcoming version) by the author.
Submitted 26 January, 2018; v1 submitted 1 November, 2016;
originally announced November 2016.
-
On strong property (T) and fixed point properties for Lie groups
Authors:
Tim de Laat,
Masato Mimura,
Mikael de la Salle
Abstract:
We consider certain strengthenings of property (T) relative to Banach spaces that are satisfied by high rank Lie groups. Let X be a Banach space for which, for all k, the Banach--Mazur distance to a Hilbert space of all k-dimensional subspaces is bounded above by a power of k strictly less than one half. We prove that every connected simple Lie group of sufficiently large real rank depending on X has strong property (T) of Lafforgue with respect to X. As a consequence, we obtain that every continuous affine isometric action of such a high rank group (or a lattice in such a group) on X has a fixed point. This result corroborates a conjecture of Bader, Furman, Gelander and Monod. For the special linear Lie groups, we also present a more direct approach to fixed point properties, or, more precisely, to the boundedness of quasi-cocycles. Without appealing to strong property (T), we prove that given a Banach space X as above, every special linear group of sufficiently large rank satisfies the following property: every quasi-1-cocycle with values in an isometric representation on X is bounded.
Submitted 4 February, 2016; v1 submitted 24 August, 2015;
originally announced August 2015.
-
Strong algebraization of fixed point properties
Authors:
Masato Mimura
Abstract:
The following natural question arises from Shalom's innovational work (1999, Publ. IHES): "Can we establish an intrinsic criterion to synthesize relative fixed point properties into the whole fixed point property without assuming Bounded Generation?" This paper resolves this question in the affirmative. Our criterion works for ones with respect to certain classes of Busemann NPC spaces. It, moreover, suggests a further step toward constructing super-expanders from finite simple groups of Lie type.
Submitted 15 November, 2016; v1 submitted 25 May, 2015;
originally announced May 2015.
-
Superrigidity from Chevalley groups into acylindrically hyperbolic groups via quasi-cocycles
Authors:
Masato Mimura
Abstract:
We prove that every homomorphism from the elementary Chevalley group over a finitely generated unital commutative ring associated with a reduced irreducible classical root system of rank at least 2, and ME analogues of such groups, into acylindrically hyperbolic groups has an absolutely elliptic image. This result provides a non-arithmetic generalization of the homomorphism superrigidity of Farb--Kaimanovich--Masur and Bridson--Wade.
Submitted 26 January, 2018; v1 submitted 12 February, 2015;
originally announced February 2015.
-
Multi-way expanders and imprimitive group actions on graphs
Authors:
Masato Mimura
Abstract:
For n at least 2, the concept of n-way expanders was defined by various researchers. A larger n gives a weaker notion in general, and 2-way expanders coincide with expanders in the usual sense. Koji Fujiwara asked whether these concepts are equivalent to that of ordinary expanders for all n for a sequence of Cayley graphs. In this paper, we answer his question in the affirmative. Furthermore, we obtain universal inequalities on multi-way isoperimetric constants on any finite connected vertex-transitive graph, and show that gaps between these constants imply the imprimitivity of the group action on the graph.
Submitted 25 June, 2015; v1 submitted 10 March, 2014;
originally announced March 2014.
-
Group approximation in Cayley topology and coarse geometry, Part III: Geometric property (T)
Authors:
Masato Mimura,
Narutaka Ozawa,
Hiroki Sako,
Yuhei Suzuki
Abstract:
In this series of papers, we study the correspondence between the following: (1) the large scale structure of the metric space $\bigsqcup_m \mathrm{Cay}(G(m))$ consisting of Cayley graphs of finite groups with k generators; (2) the structure of groups which appear in the boundary of the set $\{G(m)\}_m$ in the space of k-marked groups. In this third part of the series, we show the correspondence among the metric properties 'geometric property (T)' and 'cohomological property (T)' and the group property 'Kazhdan's property (T)'. Geometric property (T) of Willett--Yu is stronger than being expander graphs. Cohomological property (T) is stronger than geometric property (T) for general coarse spaces.
Submitted 3 March, 2014; v1 submitted 20 February, 2014;
originally announced February 2014.
-
Sphere equivalence, Banach expanders, and extrapolation
Authors:
Masato Mimura
Abstract:
We study the Banach spectral gap $\lambda_1(G;X,p)$ of finite graphs G for pairs (X,p) of Banach spaces and exponents. We define the notion of sphere equivalence between Banach spaces and show a generalization of Matoušek's extrapolation for Banach spaces sphere equivalent to uniformly convex ones. As a byproduct, we prove that expanders are automatically expanders with respect to (X,p) for any X sphere equivalent to a uniformly curved Banach space and for any p strictly bigger than 1.
Submitted 25 May, 2014; v1 submitted 17 October, 2013;
originally announced October 2013.
-
Group approximation in Cayley topology and coarse geometry, Part I: Coarse embeddings of amenable groups
Authors:
Masato Mimura,
Hiroki Sako
Abstract:
The objective of this series is to study metric geometric properties of (coarse) disjoint unions of amenable Cayley graphs. We employ the Cayley topology and observe connections between large scale structure of metric spaces and group properties of Cayley accumulation points. In this Part I, we prove that a disjoint union has property A of G. Yu if and only if all groups appearing as Cayley accumulation points in the space of marked groups are amenable. As an application, we construct two disjoint unions of finite special linear groups (and unimodular linear groups) with respect to two systems of generators that look similar such that one has property A and the other does not admit (fibred) coarse embeddings into any Banach space with non-trivial type (for instance, any uniformly convex Banach space).
Submitted 12 March, 2019; v1 submitted 17 October, 2013;
originally announced October 2013.
-
Weyl group invariants
Authors:
Masaki Kameko,
Mamoru Mimura
Abstract:
For any odd prime $p$, we prove that the induced homomorphism from the mod $p$ cohomology of the classifying space of a compact simply-connected simple connected Lie group to the Weyl group invariants of the mod $p$ cohomology of the classifying space of its maximal torus is an epimorphism except for the case $p=3$, $G=E_8$.
Submitted 29 February, 2012;
originally announced February 2012.
-
Cohomology mod 3 of the classifying space of the exceptional Lie group $E_6$, II : The Weyl group invariants
Authors:
Mamoru Mimura,
Yuriko Sambe,
Michishige Tezuka
Abstract:
We calculate the Weyl group invariants with respect to a maximal torus of the exceptional Lie group $E_6$.
Submitted 16 January, 2012;
originally announced January 2012.
-
Cohomology mod 3 of the classifying space of the exceptional Lie group $E_6$, I : structure of Cotor
Authors:
Mamoru Mimura,
Yuriko Sambe,
Michishige Tezuka
Abstract:
We study the structure of the $E_2$-term of the Rothenberg-Steenrod spectral sequence converging to the mod 3 cohomology of the classifying space of the compact, connected, simply connected, exceptional Lie group of rank 6.
Submitted 25 January, 2012; v1 submitted 26 December, 2011;
originally announced December 2011.
-
Property $(TT)$ modulo $T$ and homomorphism superrigidity into mapping class groups
Authors:
Masato Mimura
Abstract:
Every homomorphism from finite index subgroups of universal lattices to mapping class groups of orientable surfaces (possibly with punctures), or to outer automorphism groups of finitely generated nonabelian free groups, must have finite image. Here the universal lattice denotes the special linear group G=SL_m(Z[x1,...,xk]) with m at least 3 and k finite. Moreover, the same results hold true if universal lattices are replaced with symplectic universal lattices Sp_{2m}(Z[x1,...,xk]) with m at least 2. These results can be regarded as a non-arithmetization of the theorems of Farb--Kaimanovich--Masur and Bridson--Wade. A certain measure equivalence analogue is also established. To show the statements above, we introduce a notion of property (TT)/T ("/T" stands for "modulo trivial part"), which is a weakening of property (TT) of N. Monod. Furthermore, symplectic universal lattices Sp_{2m}(Z[x1,...,xk]) with m at least 3 have the fixed point property for L^p-spaces for any p in (1,infinity).
Submitted 19 June, 2011;
originally announced June 2011.
-
Fixed point property for universal lattice on Schatten classes
Authors:
Masato Mimura
Abstract:
The special linear group G=SL_n(Z[x1,...,xk]) (n at least 3 and k finite) is called the universal lattice. Let n be at least 4 and let p be any real number in (1,\infty). The main result is the following: any finite index subgroup of G has the fixed point property with respect to every affine isometric action on the space of p-Schatten class operators. It is in addition shown that higher rank lattices have the same property. These results are generalizations of previous theorems, respectively of the author and of Bader--Furman--Gelander--Monod, which treated the commutative L^p-setting.
Submitted 7 June, 2011; v1 submitted 21 October, 2010;
originally announced October 2010.
-
On Quasi-homomorphisms and Commutators in the Special Linear Group over a Euclidean Ring
Authors:
Masato Mimura
Abstract:
We prove that for any Euclidean ring R and n at least 6, Gamma=SL_n(R) has no unbounded quasi-homomorphisms. By Bavard's duality theorem, this means that the stable commutator length vanishes on Gamma. The result is particularly interesting for R = F[x] for a certain field F (such as the field C of complex numbers), because in this case the commutator length on Gamma is known to be unbounded. This answers a question of M. Abért and N. Monod for n at least 6.
Submitted 16 February, 2010; v1 submitted 6 November, 2009;
originally announced November 2009.