Search | arXiv e-print repository

Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks

Authors: Takumi Komatsu, Motonari Kambara, Shumpei Hatanaka, Haruka Matsuo, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

Abstract: Domestic service robots (DSRs) that support people in everyday environments have been widely investigated. However, their ability to predict and describe future risks resulting from their own actions remains insufficient. In this study, we focus on the linguistic explainability of DSRs. Most existing methods do not explicitly model the region of possible collisions; thus, they do not properly generate descriptions of these regions. In this paper, we propose the Nearest Neighbor Future Captioning Model that introduces the Nearest Neighbor Language Model for future captioning of possible collisions, which enhances the model output with a nearest neighbors retrieval mechanism. Furthermore, we introduce the Collision Attention Module that attends regions of possible collisions, which enables our model to generate descriptions that adequately reflect the objects associated with possible collisions. To validate our method, we constructed a new dataset containing samples of collisions that can occur when a DSR places an object in a simulation environment. The experimental results demonstrated that our method outperformed baseline methods, based on the standard metrics. In particular, on CIDEr-D, the baseline method obtained 25.09 points, whereas our method obtained 33.08 points.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted for presentation at Advanced Robotics 24

arXiv:2406.13139 [pdf, other]

Audio Fingerprinting with Holographic Reduced Representations

Authors: Yusuke Fujita, Tatsuya Komatsu

Abstract: This paper proposes an audio fingerprinting model with holographic reduced representation (HRR). The proposed method reduces the number of stored fingerprints, whereas conventional neural audio fingerprinting requires many fingerprints for each audio track to achieve high accuracy and time resolution. We utilize HRR to aggregate multiple fingerprints into a composite fingerprint via circular convolution and summation, resulting in fewer fingerprints with the same dimensional space as the original. Our search method efficiently finds a combined fingerprint in which a query fingerprint exists. Using HRR's inverse operation, it can recover the relative position within a combined fingerprint, retaining the original time resolution. Experiments show that our method can reduce the number of fingerprints with modest accuracy degradation while maintaining the time resolution, outperforming simple decimation and summation-based aggregation methods.

Submitted 18 June, 2024; originally announced June 2024.

Comments: accepted at Interspeech 2024
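
The aggregation described in the abstract above (binding by circular convolution, then summation, with an inverse operation to recover position) can be illustrated with a small pure-Python sketch. This is only a generic HRR toy under stated assumptions, not the authors' model: the random "position keys", the Gaussian fingerprints, and the helper names `random_vector`, `bind`, `unbind`, `aggregate`, and `recover_position` are all hypothetical, and circular correlation is used as the standard approximate HRR inverse.

```python
import random


def random_vector(dim, rng):
    """Draw a vector with i.i.d. N(0, 1/dim) entries, so its norm is close to 1."""
    scale = dim ** -0.5
    return [rng.gauss(0.0, scale) for _ in range(dim)]


def bind(a, b):
    """Circular convolution: c[i] = sum_j a[j] * b[(i - j) mod d]."""
    d = len(a)
    return [sum(a[j] * b[(i - j) % d] for j in range(d)) for i in range(d)]


def unbind(composite, cue):
    """Circular correlation of cue with composite (approximate inverse of bind)."""
    d = len(cue)
    return [sum(cue[j] * composite[(j + i) % d] for j in range(d)) for i in range(d)]


def aggregate(position_keys, fingerprints):
    """Bind each fingerprint to its (hypothetical) position key and sum the results."""
    composite = [0.0] * len(position_keys[0])
    for key, fingerprint in zip(position_keys, fingerprints):
        bound = bind(key, fingerprint)
        composite = [c + x for c, x in zip(composite, bound)]
    return composite


def recover_position(composite, query, position_keys):
    """Index of the position key most similar to unbind(composite, query)."""
    estimate = unbind(composite, query)
    scores = [sum(e * k for e, k in zip(estimate, key)) for key in position_keys]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: four toy fingerprints stored in one composite of the same dimension.
rng = random.Random(0)
keys = [random_vector(512, rng) for _ in range(4)]
fingerprints = [random_vector(512, rng) for _ in range(4)]
composite = aggregate(keys, fingerprints)
print(recover_position(composite, fingerprints[2], keys))
```

The composite has the same dimension as a single fingerprint, which is the storage saving the abstract refers to; the price is crosstalk noise that grows with the number of fingerprints summed together.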

arXiv:2406.12194 [pdf, other]

Universal Score-based Speech Enhancement with High Content Preservation

Authors: Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu

Abstract: We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we introduce an adversarial loss to promote learning high quality speech features. Third, we propose a low-rank adaptation scheme with a phoneme fidelity loss to improve content preservation in the enhanced speech. In the experiments, we train a universal enhancement model on a large scale dataset of speech degraded by noise, reverberation, and various distortions. The results on multiple public benchmark datasets demonstrate that UNIVERSE++ compares favorably to both discriminative and generative baselines for a wide range of qualitative and intelligibility metrics.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 5 pages, 5 figures, accepted at Interspeech 2024

arXiv:2403.13816 [pdf, other]

Combined X-ray diffraction, electrical resistivity, and $ab$ $initio$ study of (TMTTF)$_2$PF$_6$ under pressure: implications to the unified phase diagram

Authors: Miho Itoi, Kazuyoshi Yoshimi, Hanming Ma, Takahiro Misawa, Takao Tsumuraya, Dilip Bhoi, Tokutaro Komatsu, Hatsumi Mori, Yoshiya Uwatoko, Hitoshi Seo

Abstract: We present a combined experimental and theoretical study on the quasi-one-dimensional organic conductor (TMTTF)$_2$PF$_6$, and elucidate the variation of its physical properties under pressure. We fully resolve the crystal structure by single crystal x-ray diffraction measurements using a diamond anvil cell up to 8 GPa, and based on the structural data, we perform first-principles density-functional theory calculations and derive the $ab$ $initio$ extended Hubbard-type Hamiltonians. Furthermore, we compare the behavior of the resistivity measured up to 3 GPa using a BeCu clamp-type cell and the ground state properties of the obtained model numerically calculated by the many-variable variational Monte Carlo method. Our main findings are as follows: i) The crystal was rapidly compressed up to about 3 GPa where the volume drops to 80% and gradually varies down to 70% at 8 GPa. The transfer integrals increase following such behavior whereas the screened Coulomb interactions decrease, resulting in a drastic reduction of correlation effect. ii) The degree of dimerization in the intrachain transfer integrals, as the result of the decrease in structural dimerization together with the change in the intermolecular configuration, almost disappears above 4 GPa; the interchain transfer integrals also show characteristic variations under pressure. iii) The results of identifying the characteristic temperatures in the resistivity and the charge and spin orderings in the calculations show an overall agreement: The charge ordering sensitively becomes unstable above 1 GPa, while the spin ordering survives up to higher pressures. These results shed light on the similarities and differences between applying external pressure and substituting the chemical species (chemical pressure).

Submitted 18 February, 2024; originally announced March 2024.

Comments: 13 pages, 10 figures

arXiv:2403.07534 [pdf, ps, other]

Frobenius numbers associated with Diophantine triples of $x^2+y^2=z^r$ (extended version)

Authors: Takao Komatsu, Neha Gupta, Manoj Upreti

Abstract: We give an explicit formula for the $p$-Frobenius number of triples associated with Diophantine equations $x^2+y^2=z^r$, that is, the largest positive integer that can only be represented in $p$ ways by combining the three integers of the solutions of Diophantine equations $x^2+y^2=z^r$. When $r=2$, the Frobenius number has already been given.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2401.11700 [pdf, other]

Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers

Authors: Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita

Abstract: This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers. To distil the teacher's knowledge, we use an attention decoder that learns from BERT's token probabilities. Our method shows that language model (LM) information can be more effectively distilled into an ASR model using both the intermediate layers and the final layer. By using the intermediate layers as distillation target, we can more effectively distil LM knowledge into the lower network layers. Using our method, we achieve better recognition accuracy than with shallow fusion of an external LM, allowing us to maintain fast parallel decoding. Experiments on the LibriSpeech dataset demonstrate the effectiveness of our approach in enhancing greedy decoding with connectionist temporal classification (CTC).

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted at ICASSP 2024

arXiv:2312.02510 [pdf, other]

doi 10.1080/01691864.2021.1974942

Estimation of articulated angle in six-wheeled dump trucks using multiple GNSS receivers for autonomous driving

Authors: Taro Suzuki, Kazunori Ohno, Syotaro Kojima, Naoto Miyamoto, Takahiro Suzuki, Tomohiro Komatsu, Yukinori Shibata, Kimitaka Asano, Keiji Nagatani

Abstract: Due to the declining birthrate and aging population, the shortage of labor in the construction industry has become a serious problem, and increasing attention has been paid to automation of construction equipment. We focus on the automatic operation of articulated six-wheel dump trucks at construction sites. For the automatic operation of the dump trucks, it is important to estimate the position and the articulated angle of the dump trucks with high accuracy. In this study, we propose a method for estimating the state of a dump truck by using four global navigation satellite systems (GNSSs) installed on an articulated dump truck and a graph optimization method that utilizes the redundancy of multiple GNSSs. By adding real-time kinematic (RTK)-GNSS constraints and geometric constraints between the four antennas, the proposed method can robustly estimate the position and articulation angle even in environments where GNSS satellites are partially blocked. As a result of evaluating the accuracy of the proposed method through field tests, it was confirmed that the articulated angle could be estimated with an accuracy of 0.1$^\circ$ in an open-sky environment and 0.7$^\circ$ in a mountainous area simulating an elevation angle of 45$^\circ$ where GNSS satellites are blocked.

Submitted 5 December, 2023; originally announced December 2023.

Comments: This is an electronic version of an article published in ADVANCED ROBOTICS, 35:23, 1376-1387, 2021. ADVANCED ROBOTICS is available online at: www.tandfonline.com/Article DOI; 10.1080/01691864.2019.1619622

Journal ref: Advanced Robotics, 35:23, 1376-1387, 2021

arXiv:2311.03997 [pdf, ps, other]

On a conjecture of Ramírez Alfonsín and Skałba III

Authors: Yuchen Ding, Takao Komatsu

Abstract: Let $1<c<d$ be two relatively prime integers. For a non-negative integer $\ell$, let $g_\ell(c,d)$ be the largest integer $n$ such that $n=c x+d y$ has at most $\ell$ non-negative solutions $(x,y)$. In this paper we prove that $$ π_{\ell,c,d}\sim\frac{π\bigl(g_\ell(c,d)\bigr)}{2 \ell+2}\quad(\text{as}~ c\to\infty)\,, $$ where $π_{\ell,c,d}$ is the number of primes $n$ having more than $\ell$ distinct non-negative solutions to $n=c x+d y$ with $n\le g_\ell(c,d)$, and $π(x)$ denotes the number of all primes up to $x$. The case where $\ell=0$ has been proved by Ding, Zhai and Zhao recently, which was conjectured formerly by Ramírez Alfonsín and Skałba.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.03273 [pdf, other]

Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning

Authors: Takayuki Komatsu, Yoshiyuki Ohmura, Yasuo Kuniyoshi

Abstract: Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects. Representation learning methods have often used unsupervised learning to segment an input image into individual objects and encode these objects into each latent vector. However, it is not clear how previous methods have achieved the appropriate segmentation of individual objects. Additionally, most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE). Therefore, it is not clear whether VAE regularization contributes to appropriate object segmentation. To elucidate the mechanism of object segmentation in multi-object representation learning, we conducted an ablation study on MONet, which is a typical method. MONet represents multiple objects using pairs that consist of an attention mask and the latent vector corresponding to the attention mask. Each latent vector is encoded from the input image and attention mask. Then, the component image and attention mask are decoded from each latent vector. The loss function of MONet consists of 1) the sum of reconstruction losses between the input image and decoded component image, 2) the VAE regularization loss of the latent vector, and 3) the reconstruction loss of the attention mask to explicitly encode shape information. We conducted an ablation study on these three loss functions to investigate the effect on segmentation performance. Our results showed that the VAE regularization loss did not affect segmentation performance and the other losses did affect it. Based on this result, we hypothesize that it is important to maximize the attention mask of the image region best represented by a single latent vector corresponding to the attention mask. We confirmed this hypothesis by evaluating a new loss function with the same mechanism as the hypothesis.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.09491 [pdf, ps, other]

Polynomial identities and Fermat quotients

Authors: Takao Komatsu, B. Sury

Abstract: We prove some polynomial identities from which we deduce congruences modulo $p^2$ for the Fermat quotient $\frac{2^p-2}{p}$ for any odd prime $p$ (Proposition 1 and Theorem 1). These congruences are simpler than the one obtained by Jothilingam in 1985 which involves listing quadratic residues in some order. On the way, we also observe some more congruences for the Fermat quotient that generalize Eisenstein's classical congruence. Using such polynomial identities, we obtain some sums involving harmonic numbers. We also prove formulae for binomial sums of harmonic numbers of higher order.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.08141 [pdf, other]

Audio Difference Learning for Audio Captioning

Authors: Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda

Abstract: This study introduces a novel training paradigm, audio difference learning, for improving audio captioning. The fundamental concept of the proposed learning method is to create a feature representation space that preserves the relationship between audio, enabling the generation of captions that detail intricate audio information. This method employs a reference audio along with the input audio, both of which are transformed into feature representations via a shared encoder. Captions are then generated from these differential features to describe their differences. Furthermore, a unique technique is proposed that involves mixing the input audio with additional audio, and using the additional audio as a reference. This results in the difference between the mixed audio and the reference audio reverting back to the original input audio. This allows the original input's caption to be used as the caption for their difference, eliminating the need for additional annotations for the differences. In the experiments using the Clotho and ESC50 datasets, the proposed method demonstrated an improvement in the SPIDEr score by 7% compared to conventional methods.

Submitted 15 September, 2023; originally announced September 2023.

Comments: submitted to ICASSP2024

arXiv:2309.08140 [pdf, other]

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions

Authors: Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana

Abstract: We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions. To control speaker identity within the prompt-based TTS framework, we introduce the concept of speaker prompt, which describes voice characteristics (e.g., gender-neutral, young, old, and muffled) designed to be approximately independent of speaking style. Since there is no large-scale dataset containing speaker prompts, we first construct a dataset based on the LibriTTS-R corpus with manually annotated speaker prompts. We then employ a diffusion-based acoustic model with mixture density networks to model diverse speaker factors in the training data. Unlike previous studies that rely on style prompts describing only a limited aspect of speaker individuality, such as pitch, speaking speed, and energy, our method utilizes an additional speaker prompt to effectively learn the mapping from natural language descriptions to the acoustic features of diverse speakers. Our subjective evaluation results show that the proposed method can better control speaker characteristics than the methods without the speaker prompt. Audio samples are available at https://reppy4620.github.io/demo.promptttspp/.

Submitted 27 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024

arXiv:2307.08998 [pdf, ps, other]

$p$-numerical semigroups of Pell triples

Authors: Takao Komatsu, Jiaxin Mu

Abstract: For a nonnegative integer $p$, the $p$-numerical semigroup $S_p$ is defined as the set of integers whose nonnegative integral linear combinations of given positive integers $a_1,a_2,\dots,a_κ$ with $\gcd(a_1,a_2,\dots,a_κ)=1$ are expressed in more than $p$ ways. When $p=0$, $S=S_0$ is the original numerical semigroup. The largest element and the cardinality of $\mathbb N_0\backslash S_p$ are called the $p$-Frobenius number and the $p$-genus, respectively. Their explicit formulas are known for $κ=2$, but those for $κ\ge 3$ have been found only in some special cases. For some known cases, such as the Fibonacci and the Jacobsthal triplets, similar techniques could be applied and explicit formulas such as the $p$-Frobenius number could be found. In this paper, we give explicit formulas for the $p$-Frobenius number and the $p$-genus of Pell numerical semigroups $\bigl(P_i(u),P_{i+2}(u),P_{i+k}(u)\bigr)$. Here, for a given positive integer $u$, Pell-type numbers $P_n(u)$ satisfy the recurrence relation $P_n(u)=u P_{n-1}(u)+P_{n-2}(u)$ ($n\ge 2$) with $P_0(u)=0$ and $P_1(u)=1$. The $p$-Apéry set is used to find the formulas, but it shows a different pattern from those in the known results, and some case by case discussions are necessary.

Submitted 3 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: J. Ramanujan Math. Soc. arXiv admin note: text overlap with arXiv:2304.00443

arXiv:2304.00443 [pdf, ps, other]

$p$-numerical semigroup of generalized Fibonacci triples

Authors: Takao Komatsu, Shanta Laishram, Pooja Punyani

Abstract: For a nonnegative integer $p$, we give explicit formulas for the $p$-Frobenius number and the $p$-genus of generalized Fibonacci numerical semigroups. Here, the $p$-numerical semigroup $S_p$ is defined as the set of integers whose nonnegative integral linear combinations of given positive integers $a_1,a_2,\dots,a_k$ are expressed in more than $p$ ways. When $p=0$, $S_0$ with the $0$-Frobenius number and the $0$-genus is the original numerical semigroup with the Frobenius number and the genus. In this paper, we consider the $p$-numerical semigroup involving Jacobsthal polynomials, which include Fibonacci numbers as special cases. We can also treat the Jacobsthal-Lucas polynomials, including Lucas numbers accordingly. One of the applications on the $p$-Hilbert series is mentioned.

Submitted 2 April, 2023; originally announced April 2023.

MSC Class: 11D07; 20M14; 05A17; 05A19; 11D04; 11B68; 11P81

arXiv:2303.06806 [pdf, other]

Neural Diarization with Non-autoregressive Intermediate Attractors

Authors: Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

Abstract: End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers with these labels. While the proposed model works in a non-autoregressive manner, the speaker labels are refined by referring to the whole sequence of intermediate labels. The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance. The proposed method with the deeper network benefits more from the intermediate labels, resulting in better performance and training throughput than EEND-EDA.

Submitted 12 March, 2023; originally announced March 2023.

Comments: ICASSP 2023

arXiv:2302.09583 [pdf, ps, other]

Alternating Walk/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We consider the alternating zeta function and the alternating $L$-function of a graph $G$, and express them by using the Ihara zeta function of $G$. Next, we define a generalized alternating zeta function of a graph, and express the generalized alternating zeta function of a vertex-transitive regular graph by spectra of the transition probability matrix of the symmetric simple random walk on it and its Laplacian. Furthermore, we present an integral expression for the limit of the generalized alternating zeta functions of a series of vertex-transitive regular graphs. As an example, we treat the generalized alternating zeta functions of a finite torus. Finally, we treat the relation between the Mahler measure and the alternating zeta function of a graph.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 24 pages. arXiv admin note: text overlap with arXiv:2205.00457; text overlap with arXiv:1905.13182 by other authors

MSC Class: 05C50; 15A15

arXiv:2212.13704 [pdf, other]

Ronkin/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato, Kohei Sato

Abstract: The Ronkin function was defined by Ronkin in the consideration of the zeros of almost periodic functions. Recently, this function has been used in various research fields in mathematics, physics and so on. Especially in mathematics, it has close connections with tropical geometry, amoebas, Newton polytopes and dimer models. On the other hand, we have investigated a new class of zeta functions for various kinds of walks including quantum walks in a series of our previous work on Zeta Correspondence. The quantum walk is a quantum counterpart of the random walk. In this paper, we present a new relation between the Ronkin function and our zeta function for random walks and quantum walks. Firstly we consider this relation in the case of one-dimensional random walks. Afterwards we deal with higher-dimensional random walks. For comparison with the case of the quantum walk, we also treat the case of one-dimensional quantum walks. Our results bridge between the Ronkin function and the zeta function via quantum walks for the first time.

Submitted 12 February, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 19 pages. arXiv admin note: substantial text overlap with arXiv:2202.05966

arXiv:2210.17019 [pdf, ps, other]

doi 10.1142/9789811272608_0005

Sylvester sums on the Frobenius set in arithmetic progression with initial gaps

Authors: Takao Komatsu

Abstract: Let $a_1,a_2,\dots,a_k$ be positive integers with $\gcd(a_1,a_2,\dots,a_k)=1$. The Frobenius number is the largest positive integer that is NOT representable in terms of $a_1,a_2,\dots,a_k$. When $k\ge 3$, there is no explicit formula in general, but some formulae may exist for special sequences $a_1,a_2,\dots,a_k$, including those forming arithmetic progressions and their modifications. In this paper we give explicit formulae for the sum of nonrepresentable positive integers (Sylvester sum) as well as Frobenius numbers and the number of nonrepresentable positive integers (Sylvester number) for $a_1,a_2,\dots,a_k$ forming arithmetic progressions with initial gaps.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2209.13118 [pdf, ps, other]

The Frobenius number for shifted geometric sequences associated with the number of solutions

Authors: Takao Komatsu

Abstract: For a non-negative integer $p$, one of the generalized Frobenius numbers, called the $p$-Frobenius number, is the largest integer that is represented in at most $p$ ways as a linear combination with nonnegative integer coefficients of a given set of positive integers whose greatest common divisor is one. The famous so-called Frobenius number proposed by Frobenius is reduced to the $0$-Frobenius number when $p=0$. The explicit formula for the Frobenius number with two variables was found in the 19th century, but a formula with more than two variables is very difficult to find, and closed formulas of Frobenius numbers have been found only in special cases such as geometric, Thabit, Mersenne, and so on. The case of $p>0$ was even more difficult, and not a single formula was known. However, most recently, we have finally succeeded in giving the $p$-Frobenius numbers as closed-form expressions of the triangular number triplet, repunits, Fibonacci triplet and Jacobsthal triplet. In this paper, we give closed-form expressions of the $p$-Frobenius number for the finite sequence $\{a b^n-c\}_n$, where $a$, $b$ and $c$ are integers with $a\ge 1$, $b\ge 2$ and $c\ne 0$. This sequence includes the cases for geometric, Thabit and Mersenne as well as their variations.

Submitted 9 February, 2024; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: Publicationes Mathematicae Debrecen (in the second half of 2024 in volume 105)

arXiv:2209.06674 [pdf, ps, other]

Analytic aspects of $q,r$-analogue of poly-Stirling numbers of both kinds

Authors: Takao Komatsu, Eli Bagno, David Garber

Abstract: The Stirling numbers of type $B$ of the second kind count signed set partitions. In this paper we provide new combinatorial and analytical identities regarding these numbers as well as Broder's $r$-version of these numbers. Among these identities one can find recursions, explicit formulas based on the inclusion-exclusion principle, and also exponential generating functions. These Stirling numbers can be considered as members of a wider family of triangles of numbers that are characterized using results of Comtet and Lancaster. We generalize these theorems, which present equivalent conditions for a triangle of numbers to be a triangle of generalized Stirling numbers, to the case of the $q,r$-poly Stirling numbers, which are $q$-analogues of the restricted Stirling numbers defined by Broder and having a polynomial value appearing in their defining recursion. There are two ways to do this and these ways are related by a nice identity.

Submitted 5 April, 2024; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Revised version. 37 pages, no figures; submitted

MSC Class: Primary: 05A15; Secondary: 05A18; 05A19; 05A30; 11B73

arXiv:2207.08962 [pdf, ps, other]

$p$-numerical semigroups with $p$-symmetric properties

Authors: Takao Komatsu, Haotian Ying

Abstract: The so-called Frobenius number in the famous linear Diophantine problem of Frobenius is the largest integer such that the linear equation $a_1 x_1+\cdots+a_k x_k=n$ ($a_1,\dots,a_k$ are given positive integers with $\gcd(a_1,\dots,a_k)=1$) does not have a non-negative integer solution $(x_1,\dots,x_k)$. The generalized Frobenius number (called the $p$-Frobenius number) is the largest integer such that this linear equation has at most $p$ solutions. That is, when $p=0$, the $0$-Frobenius number is the original Frobenius number. In this paper, we introduce and discuss $p$-numerical semigroups by developing a generalization of the theory of numerical semigroups based on this flow of the number of representations. That is, for a certain non-negative integer $p$, $p$-gaps, $p$-symmetric semigroups, $p$-pseudo-symmetric semigroups, and the like are defined, and their properties are obtained. When $p=0$, they correspond to the original gaps, symmetric semigroups, and pseudo-symmetric semigroups, respectively.

Submitted 18 June, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: Journal of Algebra and its Applications (2024)

MSC Class: 20M14; 11D07; 20M05; 05A15; 11B25

arXiv:2206.13054 [pdf, ps, other]

doi 10.1007/s00026-022-00594-3

The Frobenius number for sequences of triangular numbers associated with number of solutions

Authors: Takao Komatsu

Abstract: The famous linear Diophantine problem of Frobenius is the problem of determining the largest integer (Frobenius number) whose number of representations in terms of $a_1,\dots,a_k$ is at most zero, that is, the largest integer that is not representable. In other words, all the integers greater than this number can be represented in at least one way. One of the natural generalizations of this problem is to find the largest integer (generalized Frobenius number) whose number of representations is at most a given nonnegative integer $p$. It is easy to find the explicit form of this number in the case of two variables. However, no explicit form has been known even in any special case of three variables. In this paper we succeed in showing explicit forms of the generalized Frobenius numbers of the triples of triangular numbers. When $p=0$, their Frobenius number was given by Robles-Pérez and Rosales in 2018.

Submitted 1 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Annals of Combinatorics

arXiv:2206.13052 [pdf, ps, other]

The $p$-numerical semigroup of the triple of arithmetic progressions

Authors: Takao Komatsu, Haotian Ying

Abstract: For given positive integers $a_1,a_2,\dots,a_k$ with $\gcd(a_1,a_2,\dots,a_k)=1$, the denumerant $d(n)=d(n;a_1,a_2,\dots,a_k)$ is the number of nonnegative solutions $(x_1,x_2,\dots,x_k)$ of the linear equation $a_1 x_1+a_2 x_2+\dots+a_k x_k=n$ for a positive integer $n$. For a given nonnegative integer $p$, let $S_p=S_p(a_1,a_2,\dots,a_k)$ be the set of all nonnegative integers $n$ such that $d(n)>p$. In this paper, we are interested in the $p$-Frobenius number, which is the maximum of the set of gaps $\mathbb N_0\backslash S_p$. Here $\mathbb N_0$ denotes the set of nonnegative integers. When $p=0$, $S=S_0$ is the original numerical semigroup, and the $0$-Frobenius number is the original Frobenius number. The explicit formula for two variables is known not only for $p=0$ but also for $p>0$, but when there are three or more variables, it is difficult even in the special case of $p=0$. For $p>0$, it is not only more difficult, but no explicit formula had been found. In this paper, explicit formulas of the $p$-Frobenius number and related values are given for the triple of arithmetic progressions. The main tool is to determine the elements of the $p$-Apéry set.

Submitted 27 June, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Symmetry Vol.15 (2023)
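
The definitions in this abstract (the denumerant $d(n)$, the set $S_p$, and the $p$-Frobenius number as the largest $n$ with $d(n)\le p$) translate directly into a small dynamic program. The sketch below is illustrative only and is not the $p$-Apéry-set method of the paper. The names `denumerants`, `p_frobenius` and the search bound `limit` are hypothetical, and the result is only correct if `limit` is chosen large enough that $d(n)>p$ for every $n$ beyond it.

```python
def denumerants(gens, limit):
    """d[n] = number of nonnegative integer solutions (x_1, ..., x_k) of
    a_1*x_1 + ... + a_k*x_k = n, for 0 <= n <= limit."""
    d = [0] * (limit + 1)
    d[0] = 1  # only the all-zero solution represents 0
    # Processing one generator at a time counts each solution tuple once.
    for a in gens:
        for n in range(a, limit + 1):
            d[n] += d[n - a]
    return d


def p_frobenius(gens, p, limit):
    """Largest n <= limit with d(n) <= p, i.e. the largest element of the
    gap set N_0 \\ S_p within the search range (assumes limit is large
    enough); returns -1 if no such n is found in the range."""
    d = denumerants(gens, limit)
    return max((n for n in range(limit + 1) if d[n] <= p), default=-1)


# Illustrative call: the pair (3, 5) with p = 1 and an assumed bound of 200.
g1 = p_frobenius((3, 5), 1, 200)
```

For two generators $a,b$ the output can be compared against the known closed form $(p+1)ab-a-b$, which gives a quick check that `limit` was large enough.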

arXiv:2206.05660 [pdf, ps, other]

The $p$-Frobenius and $p$-Sylvester numbers for Fibonacci and Lucas triplets

Authors: Takao Komatsu, Haotian Ying

Abstract: In this paper we study a certain kind of generalized linear Diophantine problem of Frobenius. Let $a_1,a_2,\dots,a_l$ be positive integers such that their greatest common divisor is one. For a nonnegative integer $p$, denote the $p$-Frobenius number by $g_p(a_1,a_2,\dots,a_l)$, which is the largest integer that can be represented in at most $p$ ways by a linear combination with nonnegative integer coefficients of $a_1,a_2,\dots,a_l$. When $p=0$, the $0$-Frobenius number is the classical Frobenius number. When $l=2$, the $p$-Frobenius number is explicitly given. However, when $l=3$ or larger, even in special cases, it is not easy to give the Frobenius number explicitly, and it is even more difficult when $p>0$, and no specific example has been known. However, very recently, we have succeeded in giving explicit formulas for the case where the sequence is of triangular numbers or of repunits for the case where $l=3$. In this paper, we show the explicit formula for the Fibonacci triple when $p>0$. In addition, we give an explicit formula for the $p$-Sylvester number, that is, the total number of nonnegative integers that can be represented in at most $p$ ways. Furthermore, explicit formulas are shown concerning the Lucas triple.

Submitted 3 December, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: Mathematical Biosciences and Engineering

MSC Class: 11D07; 05A15; 05A17; 05A19; 11B68; 11D04; 11P81; 20M14

arXiv:2205.00457 [pdf, ps, other]

Metzler/Zeta Correspondence

Authors: Yusuke Ide, Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We present an explicit formula for the determinant on the Metzler matrix of a digraph $D$. Furthermore, we introduce a walk-type zeta function with respect to this Metzler matrix of the symmetric digraph of a finite torus, and express its limit formula by using the integral expression.

Submitted 1 July, 2022; v1 submitted 1 May, 2022; originally announced May 2022.

Comments: 16 pages

MSC Class: 05C50; 15A15

arXiv:2204.07325 [pdf, ps, other]

Sylvester power and weighted sums on the Frobenius set in arithmetic progression

Authors: Takao Komatsu

Abstract: Let $a_1,a_2,\dots,a_k$ be positive integers with $\gcd(a_1,a_2,\dots,a_k)=1$. The Frobenius number is the largest positive integer that is NOT representable in terms of $a_1,a_2,\dots,a_k$. When $k\ge 3$, there is no explicit formula in general, but some formulae may exist for special sequences $a_1,a_2,\dots,a_k$, including those forming arithmetic progressions and their modifications. In this paper, we give formulae for the power and weighted sum of nonrepresentable positive integers. As applications, we show explicit expressions of these sums for $a_1,a_2,\dots,a_k$ forming arithmetic progressions.

Submitted 15 April, 2022; originally announced April 2022.

Comments: arXiv admin note: text overlap with arXiv:2111.11021, arXiv:2203.12238, arXiv:2101.04298

MSC Class: 11D07; 05A15; 05A17; 05A19; 11B68; 11D04; 11P81

Journal ref: Discrete Applied Mathematics 315 (2022), 110-126

arXiv:2204.02279 [pdf, ps, other]

How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks

Authors: Keisuke Imoto, Yuka Komatsu, Shunsuke Tsubaki, Tatsuya Komatsu

Abstract: Acoustic scene classification (ASC) and sound event detection (SED) are fundamental tasks in environmental sound analysis, and many methods based on deep learning have been proposed. Considering that information on acoustic scenes and sound events helps SED and ASC mutually, some researchers have proposed a joint analysis of acoustic scenes and sound events by multitask learning (MTL). However, conventional works have not investigated in detail how acoustic scenes and sound events mutually benefit SED and ASC. We, therefore, investigate the impact of information on acoustic scenes and sound events on the performance of SED and ASC by using domain adversarial training based on a gradient reversal layer (GRL) or model training with fake labels. Experimental results obtained using the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 show that pieces of information on acoustic scenes and sound events are effectively used to detect sound events and classify acoustic scenes, respectively. Moreover, upon comparing GRL- and fake-label-based methods with single-task-based ASC and SED methods, single-task-based methods are found to achieve better performance. This result implies that even when using single-task-based ASC and SED methods, information on acoustic scenes may be implicitly utilized for SED and vice versa.

Submitted 5 April, 2022; originally announced April 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2204.00176 [pdf, other]

Better Intermediates Improve CTC Inference

Authors: Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

Abstract: This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning. The paper first formulates self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and provides a tractable conditioning framework. We then propose two new conditioning methods based on the new formulation: (1) Searched intermediate conditioning that refines intermediate predictions with beam-search, (2) Multi-pass conditioning that uses predictions of previous inference for conditioning the next inference. These new approaches enable better conditioning than the original self-conditioned CTC during inference and improve the final performance. Experiments with the LibriSpeech dataset show relative 3%/12% performance improvement at the maximum in test clean/other sets compared to the original self-conditioned CTC.

Submitted 31 March, 2022; originally announced April 2022.

Comments: 5 pages, submitted INTERSPEECH2022

arXiv:2204.00175 [pdf, other]

doi 10.1109/SLT54892.2023.10022466

Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

Authors: Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida

Abstract: End-to-end automatic speech recognition directly maps input speech to characters. However, the mapping can be problematic when several different pronunciations should be mapped into one character or when one pronunciation is shared among many different characters. Japanese ASR suffers the most from such many-to-one and one-to-many mapping problems due to Japanese kanji characters. To alleviate the problems, we introduce explicit interaction between characters and syllables using Self-conditioned connectionist temporal classification (CTC), in which the upper layers are ``self-conditioned'' on the intermediate predictions from the lower layers. The proposed method utilizes character-level and syllable-level intermediate predictions as conditioning features to deal with mutual dependency between characters and syllables. Experimental results on Corpus of Spontaneous Japanese show that the proposed method outperformed the conventional multi-task and Self-conditioned CTC methods.

Submitted 12 March, 2023; v1 submitted 31 March, 2022; originally announced April 2022.

Comments: SLT 2022

arXiv:2204.00174 [pdf, other]

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Authors: Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

Abstract: This paper proposes InterAug: a novel training method for CTC-based ASR using augmented intermediate representations for conditioning. The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions. During the training, intermediate predictions are changed to incorrect intermediate predictions, and fed into the next layer for conditioning. The subsequent layers are trained to correct the incorrect intermediate predictions with the intermediate losses. By repeating the augmentation and the correction, iterative refinements, which generally require a special decoder, can be realized only with the audio encoder. To produce noisy intermediate predictions, we also introduce new augmentation: intermediate feature space augmentation and intermediate token space augmentation that are designed to simulate typical errors. The combination of the proposed InterAug framework with new augmentation allows explicit training of the robust audio encoders. In experiments using augmentations simulating deletion, insertion, and substitution error, we confirmed that the trained model acquires robustness to each error, boosting the speech recognition performance of the strong self-conditioned CTC baseline.

Submitted 31 March, 2022; originally announced April 2022.

Comments: This paper was submitted to INTERSPEECH2022

arXiv:2203.12238 [pdf, ps, other]

Sylvester sums on the Frobenius set in arithmetic progression

Authors: Takao Komatsu

Abstract: Let $a_1,a_2,\dots,a_k$ be positive integers with $\gcd(a_1,a_2,\dots,a_k)=1$. The concept of the weighted sum $\sum_{n\in{\rm NR}}λ^{n}$ is introduced in \cite{KZ0,KZ}, where ${\rm NR}={\rm NR}(a_1,a_2,\dots,a_k)$ denotes the set of positive integers nonrepresentable in terms of $a_1,a_2,\dots,a_k$. When $λ=1$, such a sum is often called the Sylvester sum. The main purpose of this paper is to give explicit expressions of the Sylvester sum ($λ=1$) and the weighted sum ($λ\ne 1$), where $a_1,a_2,\dots,a_k$ form an arithmetic progression. As applications, various other cases are also considered, including weighted sums, almost arithmetic sequences, arithmetic sequences with an additional term, and geometric-like sequences. Several examples illustrate and confirm our results.

Submitted 23 March, 2022; originally announced March 2022.

Comments: In: F. Yilmaz et al. (eds.), Mathematical Methods for Engineering Applications, Springer Proceedings in Mathematics & Statistics, vol. 384. Springer, Cham., 2022 May. (to appear)

MSC Class: 11D07; 05A15; 05A17; 05A19; 11B68; 11D04; 11P81

arXiv:2202.08474 [pdf, other]

Non-Autoregressive ASR with Self-Conditioned Folded Encoders

Authors: Tatsuya Komatsu

Abstract: This paper proposes CTC-based non-autoregressive ASR with self-conditioned folded encoders. The proposed method realizes non-autoregressive ASR with fewer parameters by folding the conventional stack of encoders into only two blocks; base encoders and folded encoders. The base encoders convert the input audio features into a neural representation suitable for recognition. This is followed by the folded encoders applied repeatedly for further refinement. Applying the CTC loss to the outputs of all encoders enforces the consistency of the input-output relationship. Thus, folded encoders learn to perform the same operations as an encoder with deeper distinct layers. In experiments, we investigate how to set the number of layers and the number of iterations for the base and folded encoders. The results show that the proposed method achieves a performance comparable to that of the conventional method using only 38% as many parameters. Furthermore, it outperforms the conventional method when increasing the number of iterations.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 5 pages, accepted at ICASSP2022

arXiv:2202.08470 [pdf, other]

doi 10.21437/Interspeech.2021-2218

Acoustic Event Detection with Classifier Chains

Authors: Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi

Abstract: This paper proposes acoustic event detection (AED) with classifier chains, a new classifier based on the probabilistic chain rule. The proposed AED with classifier chains consists of a gated recurrent unit and performs iterative binary detection of each event one by one. In each iteration, the event's activity is estimated and used to condition the next output based on the probabilistic chain rule to form classifier chains. Therefore, the proposed method can handle the interdependence among events upon classification, while the conventional AED methods with multiple binary classifiers with a linear layer and sigmoid function have placed an assumption of conditional independence. In the experiments with a real-recording dataset, the proposed method demonstrates superior AED performance, with a relative improvement of 14.80% over a convolutional recurrent neural network baseline system with multiple binary classifiers.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 5 pages, presented at Interspeech2021

arXiv:2202.08456 [pdf, other]

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

Authors: Jin Sakuma, Tatsuya Komatsu, Robin Scheibler

Abstract: We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length. The first one uses a circular convolution applied in the Fourier domain, the second applies a depthwise convolution, and the final relies on a shift operation. We evaluate the proposed architectures on an automatic speech recognition task with the Librispeech and Tedlium2 corpora. The best proposed MLP-based architecture improves WER by 1.0 / 0.9%, 0.9 / 0.5% on Librispeech dev-clean/dev-other, test-clean/test-other set, and 0.8 / 1.1% on Tedlium2 dev/test set while using 86.4% of the size of the self-attention-based architecture.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 8 pages, 4 figures

arXiv:2202.05966 [pdf, ps, other]

Mahler/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato, Shunya Tamura

Abstract: The Mahler measure was introduced by Mahler in the study of number theory. It is known that the Mahler measure appears in different areas of mathematics and physics. On the other hand, we have been investigating a new class of zeta functions for various kinds of walks including quantum walks in a series of our previous works on "Zeta Correspondence". The quantum walk is a quantum counterpart of the random walk. In this paper, we present a new relation between the Mahler measure and our zeta function for quantum walks. Firstly we consider this relation in the case of one-dimensional quantum walks. Afterwards we deal with higher-dimensional quantum walks. For comparison with the case of the quantum walk, we also treat the case of higher-dimensional random walks. Our results bridge between the Mahler measure and the zeta function via quantum walks for the first time.

Submitted 11 February, 2022; originally announced February 2022.

Comments: 27 pages. arXiv admin note: text overlap with arXiv:2109.07664, arXiv:2104.10287

arXiv:2201.03973 [pdf, ps, other]

A Generalized Grover/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato, Shunya Tamura

Abstract: We introduce a generalized Grover matrix of a graph and present an explicit formula for its characteristic polynomial. As a corollary, we give the spectra for the generalized Grover matrix of a regular graph. Next, we define a zeta function and a generalized zeta function of a graph $G$ with respect to its generalized Grover matrix as an analog of the Ihara zeta function and present explicit formulas for their zeta functions for a vertex-transitive graph. As applications, we express the limit on the generalized zeta functions of a family of finite vertex-transitive regular graphs by an integral. Furthermore, we give the limit on the generalized zeta functions of a family of finite tori as an integral expression.

Submitted 9 January, 2022; originally announced January 2022.

Comments: 10 pages. arXiv admin note: text overlap with arXiv:2011.14162

MSC Class: 60F05; 05C50; 15A15; 05C25

arXiv:2112.15310 [pdf, ps, other]

Cameron's operator in terms of determinants, and hypergeometric numbers

Authors: Narakorn Rompurk Kanasri, Takao Komatsu, Vichian Laohakosol

Abstract: By studying Cameron's operator in terms of determinants, two kinds of "integer" sequences of incomplete numbers were introduced. One was the sequence of restricted numbers, including $s$-step Fibonacci sequences. Another was the sequence of associated numbers, including Lamé sequences of higher order. By the classical Trudi's formula and the inverse relation, more expressions were able to be obtained. These relations and identities can be extended to those of sequence of negative integers or rational numbers. As applications, we consider hypergeometric Bernoulli, Cauchy and Euler numbers with some modifications.

Submitted 31 December, 2021; originally announced December 2021.

Journal ref: Boletín de la Sociedad Matemática Mexicana, Third Series 28 (2022), issue 1, Article 9, 23 pp

arXiv:2111.11021 [pdf, ps, other]

On the determination of $p$-Frobenius and related numbers using the $p$-Apéry set

Authors: Takao Komatsu

Abstract: In this paper, we give convenient formulas in order to obtain explicit expressions of a generalized Frobenius number called the $p$-Frobenius number as well as its related values. Here, for a non-negative integer $p$, the $p$-Frobenius number is the largest integer whose number of solutions of the linear Diophantine equation in terms of positive integers $a_1,a_2,\dots,a_k$ with $\gcd(a_1,a_2,\dots,a_k)=1$ is at most $p$. When $p=0$, the problem is reduced to the famous and classical linear Diophantine problem of Frobenius. The $0$-Frobenius number is the classical Frobenius number. Our formula is not only a natural extension of the existing classical formulas, but also has the great advantage that the explicit expressions of values such as the $p$-Frobenius and related numbers can be obtained systematically. The concept and formula of the weighted sum has been given recently. We also give a $p$-generalized formula for such weighted sums. The central role is played by the $p$-Apéry set, which is a generalization of the classical Apéry set.

Submitted 12 January, 2024; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Rev. R. Acad. Cienc. Exactas Fis. Nat., Ser. A Mat., RACSAM (The title is changed from the original manuscript)

MSC Class: 11D07; 05A15; 05A17; 05A19; 11B68; 11D04; 11P81; 20M14
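
The definition in the abstract above can be checked numerically on small inputs. The following is a minimal brute-force sketch, not the paper's $p$-Apéry-set formula: it counts non-negative integer solutions by dynamic programming and returns the largest integer with at most $p$ of them. The function name `p_frobenius` and the `limit` parameter are hypothetical, and the result is only meaningful when `limit` is assumed to exceed the true $p$-Frobenius number.

```python
def p_frobenius(gens, p, limit):
    """Largest n in [0, limit] with at most p representations
    n = x_1*a_1 + ... + x_k*a_k in non-negative integers x_i.

    Assumption: `limit` is larger than the true p-Frobenius number;
    otherwise the returned value is just the largest hit below `limit`.
    """
    # counts[n] = number of non-negative solutions for n (coin-change style DP).
    counts = [0] * (limit + 1)
    counts[0] = 1
    for a in gens:
        for n in range(a, limit + 1):
            counts[n] += counts[n - a]
    # default=-1 covers the case where every n in range has more than p solutions.
    return max((n for n in range(limit + 1) if counts[n] <= p), default=-1)


if __name__ == "__main__":
    print(p_frobenius((3, 5), 0, 200))  # classical Frobenius number of 3 and 5
    print(p_frobenius((3, 5), 1, 200))  # largest integer with at most one solution
```

For two generators the output can be compared against the known closed form $(p+1)ab-a-b$; for three or more generators no such simple formula is available in general, which is the setting the abstract addresses.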

arXiv:2110.05249 [pdf, other]

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Authors: Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Abstract: Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines. Showing great potential for real-time applications, an increasing number of NAR models have been explored in different fields to mitigate the performance gap against AR models. In this work, we conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). Experiments are performed in the state-of-the-art setting using ESPnet. The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances. We also show that the techniques can be combined for further improvement and applied to NAR end-to-end speech translation. All the implementations are publicly available to encourage further research in NAR speech processing.

Submitted 11 October, 2021; originally announced October 2021.

Comments: Accepted to ASRU2021

arXiv:2107.03590 [pdf, ps, other]

CTM/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: In our previous work, we investigated the relation between zeta functions and discrete-time models including random and quantum walks. In this paper, we introduce a zeta function for the continuous-time model (CTM) and consider CTMs including the corresponding random and quantum walks on the d-dimensional torus.

Submitted 15 March, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 9 pages, minor corrections, Quantum Studies: Mathematics and Foundations, Volume 9, pp.165-173 (2022)

arXiv:2107.03300 [pdf, ps, other]

Vertex-Face/Zeta correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: We present the characteristic polynomial for the transition matrix of a vertex-face walk on a graph, and obtain its spectra. Furthermore, we express the characteristic polynomial for the transition matrix of a vertex-face walk on the 2-dimensional torus by using its adjacency matrix, and obtain its spectra. As an application, we define a new walk-type zeta function with respect to the transition matrix of a vertex-face walk on the 2-dimensional torus, and present its explicit formula.

Submitted 4 July, 2021; originally announced July 2021.

Comments: 14 pages. arXiv admin note: text overlap with arXiv:2103.12971, arXiv:2011.14162

MSC Class: 60F05; 05C10; 05C50; 15A15

arXiv:2106.03439 [pdf, other]

doi 10.1093/ptep/ptac068

Possibility of multi-step electroweak phase transition in the two Higgs doublet models

Authors: Mayumi Aoki, Takatoshi Komatsu, Hiroto Shibuya

Abstract: We discuss whether a multi-step electroweak phase transition (EWPT) occurs in two Higgs doublet models (2HDMs). The EWPT is related to interesting phenomena such as baryogenesis and a gravitational wave from it. We examine parameter regions in CP-conserving 2HDMs and find certain areas where the multi-step EWPTs occur. The parameter search shows the multi-step EWPT prefers the scalar potential with the approximate $Z_2$ symmetry and a mass hierarchy between the neutral CP-odd and CP-even extra scalar bosons $m_A<m_H$. By contrast, the multi-step EWPT whose first step is strongly first order favors a mass hierarchy $m_A>m_H$. In addition, we compute the Higgs trilinear coupling in the parameter region where the multi-step EWPTs occur, which can be observed at future colliders. We also discuss a multi-peaked gravitational wave from a multi-step EWPT.

Submitted 15 June, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: 34 pages, 20 figures, 5 tables; Published version

Report number: KANAZAWA-21-08

Journal ref: Prog Theor Exp Phys (2022)

arXiv:2105.14270 [pdf, ps, other]

A discontinuity of the energy of quantum walk in impurities

Authors: Kenta Higuchi, Takashi Komatsu, Norio Konno, Hisashi Morioka, Etsuo Segawa

Abstract: We consider the discrete-time quantum walk whose local dynamics is denoted by $C$ at the perturbed region $\{0,1,\dots,M-1\}$ and free at the other positions. We obtain the stationary state with a bounded initial state. The initial state is set so that the perturbed region receives the inflow $ω^n$ at time $n$ $(|ω|=1)$. From this expression, we compute the scattering on the surface of $-1$ and $M$, and also compute the quantity describing how the quantum walker accumulates in the perturbed region, namely the energy of the quantum walk, in the long time limit. We find a discontinuity of the energy with respect to the frequency of the inflow.

Submitted 29 May, 2021; originally announced May 2021.

Comments: 16 pages. arXiv admin note: text overlap with arXiv:1912.11555

arXiv:2105.08277 [pdf, ps, other]

Asymmetric Circular Graph with Hosoya Index and Negative Continued Fractions

Authors: Takao Komatsu

Abstract: It has been known that the Hosoya index of a caterpillar graph can be calculated as the numerator of the simple continued fraction. Recently, the author \cite{Komatsu2020} introduced a more general graph called the caterpillar-bond graph and showed that its Hosoya index can be calculated as the numerator of the general continued fraction. In this paper, we show how the Hosoya index of the graph with non-uniform ring structure can be calculated from the negative continued fraction. We also give the relation between some radial graphs and multidimensional continued fractions in the sense of the Hosoya index.

Submitted 21 November, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Journal ref: Carpathian Math. Publ. 10 (2021), No.3, 608--618

arXiv:2105.08274 [pdf, ps, other]

Weighted Sylvester sums on the Frobenius set

Authors: Takao Komatsu, Yuan Zhang

Abstract: Let $a$ and $b$ be relatively prime positive integers. In this paper the weighted sum $\sum_{n\in{\rm NR}(a,b)}λ^{n-1}n^m$ is given explicitly or in terms of the Apostol-Bernoulli numbers, where $m$ is a nonnegative integer, and ${\rm NR}(a,b)$ denotes the set of positive integers nonrepresentable in terms of $a$ and $b$.

Submitted 18 May, 2021; originally announced May 2021.

Comments: Irish Math. Soc. Bull. (to appear)
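
The weighted sum in the abstract above can be evaluated directly for small $a$ and $b$. The sketch below is a brute-force illustration only, not the paper's closed form in Apostol-Bernoulli numbers; the function names `nonrepresentable` and `weighted_sylvester_sum` are hypothetical. It uses the classical fact that every integer above $ab-a-b$ is representable as $ax+by$ with non-negative integers $x,y$ when $\gcd(a,b)=1$.

```python
from math import gcd


def nonrepresentable(a, b):
    """NR(a, b): positive integers not of the form a*x + b*y with x, y >= 0."""
    if gcd(a, b) != 1:
        raise ValueError("a and b must be relatively prime")
    top = a * b - a - b  # classical Frobenius number; everything above is representable
    return [
        n
        for n in range(1, top + 1)
        if not any((n - a * x) % b == 0 for x in range(n // a + 1))
    ]


def weighted_sylvester_sum(a, b, lam, m):
    """Brute-force value of sum over n in NR(a, b) of lam**(n-1) * n**m."""
    return sum(lam ** (n - 1) * n ** m for n in nonrepresentable(a, b))


if __name__ == "__main__":
    print(nonrepresentable(3, 5))
    print(weighted_sylvester_sum(3, 5, 1, 1))
```

With $λ=1$ and $m=0$ the sum counts the elements of ${\rm NR}(a,b)$, which can be compared against Sylvester's count $(a-1)(b-1)/2$ as a sanity check.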

arXiv:2105.04056 [pdf, ps, other]

IPS/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Our previous works presented zeta functions by the Konno-Sato theorem or the Fourier analysis for one-particle models including random walks, correlated random walks, quantum walks, and open quantum random walks. This paper introduces a new zeta function for multi-particle models with probabilistic or quantum interactions, called the interacting particle system (IPS). We compute the zeta function for some tensor-type IPSs.

Submitted 6 February, 2022; v1 submitted 9 May, 2021; originally announced May 2021.

Comments: 19 pages

Journal ref: Quantum Information and Computation, Vol.22, No.3 & 4, pp.251-269 (2022)

arXiv:2105.02678 [pdf, ps, other]

The trace formula with respect to the twisted Grover matrix of a mixed digraph

Authors: Takashi Komatsu, Sho Kubota, Norio Konno, Iwao Sato

Abstract: We define a zeta function with respect to the twisted Grover matrix of a mixed digraph, and present an exponential expression and a determinant expression of this zeta function. As an application, we give a trace formula with respect to the twisted Grover matrix of a mixed digraph.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 17 pages

MSC Class: 05C50; 11M06

arXiv:2105.02677 [pdf, ps, other]

The scattering matrix with respect to an Hermitian matrix of a graph

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Recently, Gnutzmann and Smilansky presented a formula for the bond scattering matrix of a graph with respect to a Hermitian matrix. We present another proof of Gnutzmann and Smilansky's formula by a technique used in the zeta function of a graph. Furthermore, we generalize Gnutzmann and Smilansky's formula to a regular covering of a graph. Finally, we define an $L$-function of a graph, and present a determinant expression. As a corollary, we express the generalization of Gnutzmann and Smilansky's formula to a regular covering of a graph by using its $L$-functions.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 21 pages. arXiv admin note: substantial text overlap with arXiv:1211.4719

MSC Class: 05C50; 15A15

arXiv:2104.10328 [pdf, ps, other]

Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers

Authors: Yusuke Kida, Tatsuya Komatsu, Masahito Togami

Abstract: This paper proposes a novel label-synchronous speech-to-text alignment technique for automatic speech recognition (ASR). Speech-to-text alignment is the problem of splitting long audio recordings with un-aligned transcripts into utterance-wise pairs of speech and text. Unlike conventional methods based on frame-synchronous prediction, the proposed method re-defines speech-to-text alignment as a label-synchronous text mapping problem. This enables an accurate alignment benefiting from the strong inference ability of state-of-the-art attention-based encoder-decoder models, which cannot be applied to the conventional methods. Two different Transformer models, named forward Transformer and backward Transformer, are respectively used for estimating the initial and final tokens of a given speech segment based on end-of-sentence prediction with teacher-forcing. Experiments using the Corpus of Spontaneous Japanese (CSJ) demonstrate that the proposed method provides an accurate utterance-wise alignment that matches the manually annotated alignment with as few as 0.2% errors. It is also confirmed that a Transformer-based hybrid CTC/Attention ASR model using the aligned speech and text pairs as additional training data reduces character error rates by up to 59.0% relative, which is significantly better than the 39.0% reduction achieved by a conventional alignment method based on a connectionist temporal classification model.

Submitted 20 April, 2021; originally announced April 2021.

Comments: Submitted to INTERSPEECH 2021

arXiv:2104.10287 [pdf, ps, other]

Walk/Zeta Correspondence

Authors: Takashi Komatsu, Norio Konno, Iwao Sato

Abstract: Our previous work presented explicit formulas for the generalized zeta function and the generalized Ihara zeta function corresponding to the Grover walk and the positive-support version of the Grover walk on the regular graph via the Konno-Sato theorem, respectively. This paper extends these walks to a class of walks including random walks, correlated random walks, quantum walks, and open quantum random walks on the torus by the Fourier analysis.

Submitted 20 December, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 31 pages

Journal ref: Journal of Statistical Physics, volume 190, Article number: 36 (2023)

Showing 1–50 of 146 results for author: Komatsu, T