Search | arXiv e-print repository

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling

Authors: Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix

Abstract: Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner. Implementing real-time TSE is challenging as the computational complexity must be reduced to provide real-time operation. This work introduces to Conv-TasNet-based TSE a new architecture based on state space modeling (SSM) that has been shown to model long-term dependency effectively. Owing to SSM, fewer dilated convolutional layers are required to capture temporal dependency in Conv-TasNet, resulting in the reduction of model complexity. We also enlarge the window length and shift of the convolutional (TasNet) frontend encoder to reduce the computational cost further; the performance decline is compensated by over-parameterization of the frontend encoder. The proposed method reduces the real-time factor by 78% from the conventional causal Conv-TasNet-based TSE while matching its performance.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.06597 [pdf, other]

doi 10.1109/TrustCom60117.2023.00376

1-D CNN-Based Online Signature Verification with Federated Learning

Authors: Lingfeng Zhang, Yuheng Guo, Yepeng Ding, Hiroyuki Sato

Abstract: Online signature verification plays a pivotal role in security infrastructures. However, conventional online signature verification models pose significant risks to data privacy, especially during training processes. To mitigate these concerns, we propose a novel federated learning framework that leverages 1-D Convolutional Neural Networks (CNN) for online signature verification. Furthermore, our experiments demonstrate the effectiveness of our framework regarding 1-D CNN and federated learning. Particularly, the experiment results highlight that our framework 1) minimizes local computational resources; 2) enhances transfer effects with substantial initialization data; 3) presents remarkable scalability. The centralized 1-D CNN model achieves an Equal Error Rate (EER) of 3.33% and an accuracy of 96.25%. Meanwhile, configurations with 2, 5, and 10 agents yield EERs of 5.42%, 5.83%, and 5.63%, along with accuracies of 95.21%, 94.17%, and 94.06%, respectively.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 8 pages, 11 figures, 1 table

arXiv:2406.03713 [pdf]

Gait-Adaptive Navigation and Human Searching in field with Cyborg Insect

Authors: Phuoc Thanh Tran-Ngoc, Huu Duoc Nguyen, Duc Long Le, Rui Li, Bing Sheng Chong, Hirotaka Sato

Abstract: This study focuses on improving the ability of cyborg insects to navigate autonomously during search and rescue missions in outdoor environments. We propose an algorithm that leverages data from an IMU to calculate orientation and position based on the insect's walking gait. These computed factors serve as essential feedback channels across 3 phases of our exploration. Our method functions without relying on external systems. The results of our trials, carried out in both indoor (4.8 x 6.6 m^2) and outdoor (3.5 x 6.0 m^2) settings, show that the cyborg insect is capable of seeking a human without knowing the human's position. This exploration strategy would help to bring terrestrial cyborg insects closer to practical application in real-life search and rescue (SAR) missions.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 35 pages, 9 figures

arXiv:2406.00620 [pdf, other]

doi 10.1109/TrustCom60117.2023.00230

Model-Driven Security Analysis of Self-Sovereign Identity Systems

Authors: Yepeng Ding, Hiroyuki Sato

Abstract: Best practices of self-sovereign identity (SSI) are being intensively explored in academia and industry. Reusable solutions obtained from best practices are generalized as architectural patterns for systematic analysis and design reference, which significantly boosts productivity and increases the dependability of future implementations. For security-sensitive projects, architects make architectural decisions with careful consideration of security issues and solutions based on formal analysis and experiment results. In this paper, we propose a model-driven security analysis framework for analyzing architectural patterns of SSI systems with respect to a threat model built on our investigation of real-world security concerns. Our framework mechanizes a modeling language to formalize patterns and threats with security properties in temporal logic and automatically generates programs for verification via model checking. Besides, we present typical vulnerable patterns verified by SecureSSI, a standalone integrated development environment, integrating commonly used pattern and attacker models to practicalize our framework.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.09033 [pdf, other]

Accelerating Decision Diagram-based Multi-node Quantum Simulation with Ring Communication and Automatic SWAP Insertion

Authors: Yusuke Kimura, Shaowen Li, Hiroyuki Sato, Masahiro Fujita

Abstract: An N-bit quantum state requires a vector of length $2^N$, leading to an exponential increase in the required memory with N in conventional statevector-based quantum simulators. A proposed solution to this issue is the decision diagram-based quantum simulator, which can significantly decrease the necessary memory and is expected to operate faster for specific quantum circuits. However, decision diagram-based quantum simulators are not easily parallelizable because data must be manipulated dynamically, and most implementations run on one thread. This paper introduces ring communication-based optimal parallelization and automatic swap insertion techniques for multi-node implementation of decision diagram-based quantum simulators. The ring communication approach is designed so that each node communicates with its neighboring nodes, which can facilitate faster and more parallel communication than broadcasting where one node needs to communicate with all nodes simultaneously. The automatic swap insertion method, an approach to minimize inter-node communication, has been employed in existing multi-node state vector-based simulators, but this paper proposes two methods specifically designed for decision diagram-based quantum simulators. These techniques were implemented and evaluated using the Shor algorithm and random circuits with up to 38 qubits using a maximum of 256 nodes. The experimental results have revealed that multi-node implementation can reduce run-time by up to 26 times. For example, Shor circuits that need 38 qubits can finish simulation in 147 seconds. Additionally, it was shown that ring communication has a higher speed-up effect than broadcast communication, and the importance of selecting the appropriate automatic swap insertion method was revealed.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted at IEEE QSW 2024

arXiv:2404.14860 [pdf, other]

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

Authors: Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with a single-channel speech enhancement (SE) front-end. This is generally attributed to the processing distortions caused by the nonlinear processing of single-channel SE front-ends. However, the causes of such degraded ASR performance have not been fully investigated. How to design single-channel SE front-ends in a way that significantly improves ASR performance remains an open research question. In this study, we investigate a signal-level numerical metric that can explain the cause of degradation in ASR performance. To this end, we propose a novel analysis scheme based on the orthogonal projection-based decomposition of SE errors. This scheme manually modifies the ratio of the decomposed interference, noise, and artifact errors, and it enables us to directly evaluate the impact of each error type on ASR performance. Our analysis reveals the particularly detrimental effect of artifact errors on ASR performance compared to the other types of errors. This provides us with a more principled definition of processing distortions that cause the ASR performance degradation. Then, we study two practical approaches for reducing the impact of artifact errors. First, we prove that the simple observation adding (OA) post-processing (i.e., interpolating the enhanced and observed signals) can monotonically improve the signal-to-artifact ratio. Second, we propose a novel training objective, called artifact-boosted signal-to-distortion ratio (AB-SDR), which forces the model to estimate the enhanced signals with fewer artifact errors. Through experiments, we confirm that both the OA and AB-SDR approaches are effective in decreasing artifact errors caused by single-channel SE front-ends, allowing them to significantly improve ASR performance.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures, Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing

arXiv:2404.10376 [pdf, other]

Hunting DeFi Vulnerabilities via Context-Sensitive Concolic Verification

Authors: Yepeng Ding, Arthur Gervais, Roger Wattenhofer, Hiroyuki Sato

Abstract: Decentralized finance (DeFi) is revolutionizing the traditional centralized finance paradigm with its attractive features such as high availability, transparency, and tamper-proofing. However, attacks targeting DeFi services have severely damaged the DeFi market, as evidenced by our investigation of 80 real-world DeFi incidents from 2017 to 2022. Existing methods, based on symbolic execution, model checking, semantic analysis, and fuzzing, fall short in identifying the most DeFi vulnerability types. To address the deficiency, we propose Context-Sensitive Concolic Verification (CSCV), a method of automating the DeFi vulnerability finding based on user-defined properties formulated in temporal logic. CSCV builds and optimizes contexts to guide verification processes that dynamically construct context-carrying transition systems in tandem with concolic executions. Furthermore, we demonstrate the effectiveness of CSCV through experiments on real-world DeFi services and qualitative comparison. The experiment results show that our CSCV prototype successfully detects 76.25% of the vulnerabilities from the investigated incidents with an average time of 253.06 seconds.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.17496 [pdf, other]

Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments

Authors: Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato, Zihao Zhu, Bo Zheng

Abstract: In the film and gaming industries, achieving a realistic hair appearance typically involves the use of strands originating from the scalp. However, reconstructing these strands from observed surface images of hair presents significant challenges. The difficulty in acquiring Ground Truth (GT) data has led state-of-the-art learning-based methods to rely on pre-training with manually prepared synthetic CG data. This process is not only labor-intensive and costly but also introduces complications due to the domain gap when compared to real-world data. In this study, we propose an optimization-based approach that eliminates the need for pre-training. Our method represents hair strands as line segments growing from the scalp and optimizes them using a novel differentiable rendering algorithm. To robustly optimize a substantial number of slender explicit geometries, we introduce 3D orientation estimation utilizing global optimization, strand initialization based on Laplace's equation, and reparameterization that leverages geometric connectivity and spatial proximity. Unlike existing optimization-based methods, our method is capable of reconstructing internal hair flow in an absolute direction. Our method exhibits robust and accurate inverse rendering, surpassing the quality of existing methods and significantly improving processing speed.

Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.17392 [pdf, other]

Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain

Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

Abstract: Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable robotic-like programmable control, and proposing a novel control algorithm for swarming. Although these creatures, called cyborg insects, have the ability to instinctively avoid collisions with neighbors and obstacles while adapting to complex terrains, there is a lack of literature on the control of multi-cyborg systems. This research gap is due to the difficulty in coordinating the movements of a cyborg system under the presence of insects' inherent individual variability in their reactions to control input. In response to this issue, we propose a novel swarm navigation algorithm addressing these challenges. The effectiveness of the algorithm is demonstrated through an experimental validation in which a cyborg swarm was successfully navigated through an unknown sandy field with obstacles and hills. This research contributes to the domain of swarm robotics and showcases the potential of integrating biological organisms with robotics and control theory to create more intelligent autonomous systems with real-world applications.

Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2401.17053 [pdf, other]

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

Authors: Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji

Abstract: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios.

Submitted 23 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: ACM Transactions on Graphics (SIGGRAPH'24). Code: https://yang-l1.github.io/blockfusion

arXiv:2401.05111 [pdf, other]

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

Authors: Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima

Abstract: The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately. However, this approach suffers from degradation in speech synthesis quality when the reference speech contains noise. In this paper, we propose a noise-robust zero-shot TTS method. We incorporated adapters into the SSL model, which we fine-tuned with the TTS model using noisy reference speech. In addition, to further improve performance, we adopted a speech enhancement (SE) front-end. With these improvements, our proposed SSL-based zero-shot TTS achieved high-quality speech synthesis with noisy reference speech. Through the objective and subjective evaluations, we confirmed that the proposed method is highly robust to noise in reference speech, and effectively works in combination with SE.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 5 pages, 3 figures, Accepted to IEEE ICASSP 2024

arXiv:2312.14511 [pdf]

3D Programming of Patterned Heterogeneous Interface for 4D Smart Robotics

Authors: Kewei Song, Chunfeng Xiong, Ze Zhang, Kunlin Wu, Weiyang Wan, Yifan Wang, Shinjiro Umezu, Hirotaka Sato

Abstract: Shape memory structures are playing an important role in many cutting-edge intelligent fields. However, the existing technologies can only realize 4D printing of a single polymer or metal, which limits practical applications. Here, we report a construction strategy for TSMP/M heterointerface, which uses Pd2+-containing shape memory polymer (AP-SMR) to induce electroless plating reaction and relies on molecular dynamics, which has both shape memory properties and metal activity and information processing power. Through multi-material DLP 3D printing technology, the interface can be 3D selectively programmed on functional substrate parts of arbitrary shapes to become 4D electronic smart devices (Robotics). Microscopically, this type of interface appears as a composite structure with a nanometer-micrometer interface height, which is composed of a pure substrate layer (smart materials), an intermediate layer (a composite structure in which metal particles are embedded in a polymer cross-linked network) and a pure metal layer. The structure programmed by TSMP/M heterointerface exhibits both SMA characteristics and metal properties, thus having more intelligent functions (electroactive, electrothermal deformation, electronically controlled denaturation) and higher performance (selectivity of shape memory structures can be realized control, remote control, inline control and low voltage control). This is expected to provide a more flexible manufacturing process as platform technology for designing, manufacturing and applying smart devices with new concepts, and promote the development of cutting-edge industries such as smart robots and smart electronics.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 37 Pages, 11 Figures

arXiv:2312.01570 [pdf, other]

doi 10.1109/QSW59989.2023.00026

Parallelizing quantum simulation with decision diagrams

Authors: Shaowen Li, Yusuke Kimura, Hiroyuki Sato, Junwei Yu, Masahiro Fujita

Abstract: Recent technological advancements show promise in leveraging quantum mechanical phenomena for computation. This brings substantial speed-ups to problems that are once considered to be intractable in the classical world. However, the physical realization of quantum computers is still far away from us, and a majority of research work is done using quantum simulators running on classical computers. Classical computers face a critical obstacle in simulating quantum algorithms. Quantum states reside in a Hilbert space whose size grows exponentially to the number of subsystems, i.e., qubits. As a result, the straightforward statevector approach does not scale due to the exponential growth of the memory requirement. Decision diagrams have gained attention in recent years for representing quantum states and operations in quantum simulations. The main advantage of this approach is its ability to exploit redundancy. However, mainstream quantum simulators still rely on statevectors or tensor networks. We consider the absence of decision diagrams due to the lack of parallelization strategies. This work explores several strategies for parallelizing decision diagram operations, specifically for quantum simulations. We propose optimal parallelization strategies. Based on the experiment results, our parallelization strategy achieves a 2-3 times faster simulation of Grover's algorithm and random circuits than the state-of-the-art single-thread DD-based simulator DDSIM.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2309.14364 [pdf, other]

Automata Quest: NCAs as a Video Game Life Mechanic

Authors: Hiroki Sato, Tanner Lund, Takahide Yoshida, Atsushi Masumori

Abstract: We study life over the course of video game history as represented by their mechanics. While there have been some variations depending on genre or "character type", we find that most games converge to a similar representation. We also examine the development of Conway's Game of Life (one of the first zero player games) and related automata that have developed over the years. With this history in mind, we investigate the viability of one popular form of automata, namely Neural Cellular Automata, as a way to more fully express life within video game settings and innovate new game mechanics or gameplay loops.

Submitted 23 September, 2023; originally announced September 2023.

Comments: This article was submitted to and presented at Alife for and from Video Games Workshop at ALIFE2023, Sapporo (Japan)

Journal ref: Alife for and from Video Games Workshop at ALIFE2023

arXiv:2306.03616 [pdf, other]

doi 10.1109/SII55687.2023.10039450

Online Estimation of Self-Body Deflection With Various Sensor Data Based on Directional Statistics

Authors: Hiroya Sato, Kento Kawaharazuka, Tasuku Makabe, Kei Okada, Masayuki Inaba

Abstract: In this paper, we propose a method for online estimation of the robot's posture. Our method uses von Mises and Bingham distributions as probability distributions of joint angles and 3D orientation, which are used in directional statistics. We constructed a particle filter using these distributions and configured a system to estimate the robot's posture from various sensor information (e.g., joint encoders, IMU sensors, and cameras). Furthermore, unlike tangent space approximations, these distributions can handle global features and represent sensor characteristics as observation noises. As an application, we show that the yaw drift of a 6-axis IMU sensor can be represented probabilistically to prevent adverse effects on attitude estimation. For the estimation, we used an approximate model that assumes the actual robot posture can be reproduced by correcting the joint angles of a rigid body model. In the experiment part, we tested the estimator's effectiveness by examining that the joint angles generated with the approximate model can be estimated using the link pose of the same model. We then applied the estimator to the actual robot and confirmed that the gripper position could be estimated, thereby verifying the validity of the approximate model in our situation.

Submitted 6 June, 2023; originally announced June 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2306.02273 [pdf, ps, other]

End-to-End Joint Target and Non-Target Speakers ASR

Authors: Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando

Abstract: This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information. However, in conversational ASR applications, transcribing both the target speaker's speech and non-target speakers' ones is often required to understand interactive information. To naturally consider both target and non-target speakers in a single ASR model, our idea is to extend autoregressive modeling-based multi-talker ASR systems to utilize the enrollment speech of the target speaker. Our proposed ASR is performed by recursively generating both textual tokens and tokens that represent target or non-target speakers. Our experiments demonstrate the effectiveness of our proposed method.

Submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted at Interspeech 2023

arXiv:2305.18947 [pdf, other]

A Probabilistic Rotation Representation for Symmetric Shapes With an Efficiently Computable Bingham Loss Function

Authors: Hiroya Sato, Takuya Ikeda, Koichi Nishiwaki

Abstract: In recent years, a deep learning framework has been widely used for object pose estimation. While quaternion is a common choice for rotation representation, it cannot represent the ambiguity of the observation. In order to handle the ambiguity, the Bingham distribution is one promising solution. However, it requires complicated calculation when yielding the negative log-likelihood (NLL) loss. An alternative easy-to-implement loss function has been proposed to avoid complex computations but has difficulty expressing symmetric distribution. In this paper, we introduce a fast-computable and easy-to-implement NLL loss function for Bingham distribution. We also create the inference network and show that our loss function can capture the symmetric property of target objects from their point clouds.

Submitted 30 May, 2023; originally announced May 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: substantial text overlap with arXiv:2203.04456

arXiv:2305.14723 [pdf, other]

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Authors: Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo

Abstract: Self-supervised learning (SSL) is the latest breakthrough in speech processing, especially for label-scarce downstream tasks by leveraging massive unlabeled audio data. The noise robustness of the SSL is one of the important challenges to expanding its application. We can use speech enhancement (SE) to tackle this issue. However, the mismatch between the SE model and SSL models potentially limits its effect. In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch. We expect that the loss in the SSL domain could guide SE training to preserve or enhance various levels of characteristics of the speech signals that may be required for high-level downstream tasks. Experiments show that our proposal improves the performance of an SE and SSL pipeline on five downstream tasks with noisy input while maintaining the SE performance.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 4 pages, 2 figures, Accepted to Interspeech 2023

arXiv:2303.11641 [pdf, other]

doi 10.1109/SDS57574.2022.10062918

Leveraging Self-Sovereign Identity in Decentralized Data Aggregation

Authors: Yepeng Ding, Hiroyuki Sato, Maro G. Machizawa

Abstract: Data aggregation has been widely implemented as an infrastructure of data-driven systems. However, a centralized data aggregation model requires a set of strong trust assumptions to ensure security and privacy. In recent years, decentralized data aggregation has become realizable based on distributed ledger technology. Nevertheless, the lack of appropriate centralized mechanisms like identity management mechanisms carries risks such as impersonation and unauthorized access. In this paper, we propose a novel decentralized data aggregation framework by leveraging self-sovereign identity, an emerging identity model, to lift the trust assumptions in centralized models and eliminate identity-related risks. Our framework formulates the aggregation protocol regarding data persistence and acquisition aspects, considering security, efficiency, flexibility, and compatibility. Furthermore, we demonstrate the applicability of our framework via a use case study where we concretize and apply our framework in a decentralized neuroscience data aggregation scenario.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.10990 [pdf]

Resilient conductive membrane synthesized by in-situ polymerisation for wearable non-invasive electronics on moving appendages of cyborg insect

Authors: Qifeng Lin, Rui Li, Feilong Zhang, Kai Kazuki, Ong Zong Chen, Xiaodong Chen, Hirotaka Sato

Abstract: By leveraging their high mobility and small size, insects have been combined with microcontrollers to build up cyborg insects for various practical applications. Unfortunately, all current cyborg insects rely on implanted electrodes to control their movement, which causes irreversible damage to their organs and muscles. Here, we develop a non-invasive method for cyborg insects to address the above issues, using a conformal electrode with an in-situ polymerized ion-conducting layer and an electron-conducting layer. The neural and locomotion responses to the electrical inductions verify the efficient communication between insects and controllers by the non-invasive method. The precise "S" line following of the cyborg insect further demonstrates its potential in practical navigation. The conformal non-invasive electrodes keep the insects intact while controlling their motion. With the antennae, important olfactory organs of insects, preserved, the cyborg insect may in the future be endowed with abilities to detect the surrounding environment.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 27 pages

arXiv:2210.15937 [pdf, other]

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Authors: Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

Abstract: This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA). Although the effectiveness of pre-trained encoders in various fields has been reported, conventional MSA methods employ them for only linguistic modality, and their application has not been investigated. This paper compares the features yielded by large-scale pre-trained encoders with conventional heuristic features. One of the largest publicly available pre-trained encoders is used for each modality: CLIP-ViT, WavLM, and BERT for visual, acoustic, and linguistic modalities, respectively. Experiments on two datasets reveal that methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios. We also find it better to use the outputs of the intermediate layers of the encoders than those of the output layer. The codes are available at https://github.com/ando-hub/MSA_Pretrain.

Submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted to SLT 2022

arXiv:2209.04175 [pdf, other]

Streaming Target-Speaker ASR with Neural Transducer

Authors: Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

Abstract: Although recent advances in deep learning technology have boosted automatic speech recognition (ASR) performance in the single-talker case, it remains difficult to recognize multi-talker speech in which many voices overlap. One conventional approach to tackle this problem is to use a cascade of a speech separation or target speech extraction front-end with an ASR back-end. However, the extra computation costs of the front-end module are a critical barrier to quick response, especially for streaming ASR. In this paper, we propose a target-speaker ASR (TS-ASR) system that implicitly integrates the target speech extraction functionality within a streaming end-to-end (E2E) ASR system, i.e. recurrent neural network-transducer (RNNT). Our system uses a similar idea as adopted for target speech extraction, but implements it directly at the level of the encoder of RNNT. This allows TS-ASR to be realized without placing extra computation costs on the front-end. Note that this study presents two major differences between prior studies on E2E TS-ASR; we investigate streaming models and base our study on Conformer models, whereas prior studies used RNN-based systems and considered only offline processing. We confirm in experiments that our TS-ASR achieves comparable recognition performance with conventional cascade systems in the offline setting, while reducing computation costs and realizing streaming TS-ASR.

Submitted 19 September, 2022; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: Accepted to Interspeech 2022

arXiv:2206.09628 [pdf, other]

Diversified Adversarial Attacks based on Conjugate Gradient Method

Authors: Keiichiro Yamamura, Haruki Sato, Nariaki Tateiwa, Nozomi Hata, Toru Mitsutake, Issa Oe, Hiroki Ishikura, Katsuki Fujisawa

Abstract: Deep learning models are vulnerable to adversarial examples, and adversarial attacks used to generate such examples have attracted considerable research interest. Although existing methods based on the steepest descent have achieved high attack success rates, ill-conditioned problems occasionally reduce their performance. To address this limitation, we utilize the conjugate gradient (CG) method, which is effective for this type of problem, and propose a novel attack algorithm inspired by the CG method, named the Auto Conjugate Gradient (ACG) attack. The results of large-scale evaluation experiments conducted on the latest robust models show that, for most models, ACG was able to find more adversarial examples with fewer iterations than the existing SOTA algorithm Auto-PGD (APGD). We investigated the difference in search performance between ACG and APGD in terms of diversification and intensification, and define a measure called Diversity Index (DI) to quantify the degree of diversity. From the analysis of the diversity using this index, we show that the more diverse search of the proposed method remarkably improves its attack success rate.

Submitted 19 July, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: Proceedings of the 39th International Conference on Machine Learning (ICML 2022)

arXiv:2206.08174 [pdf, other]

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Authors: Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Abstract: Target speech extraction is a technique to extract the target speaker's voice from mixture signals using a pre-recorded enrollment utterance that characterizes the voice characteristics of the target speaker. One major difficulty of target speech extraction lies in handling variability in "intra-speaker" characteristics, i.e., characteristics mismatch between target speech and an enrollment utterance. While most conventional approaches focus on improving average performance given a set of enrollment utterances, here we propose to guarantee the worst performance, which we believe is of great practical importance. In this work, we propose an evaluation metric called worst-enrollment source-to-distortion ratio (SDR) to quantitatively measure the robustness towards enrollment variations. We also introduce a novel training scheme that aims at directly optimizing the worst-case performance by focusing on training with difficult enrollment cases where extraction does not perform well. In addition, we investigate the effectiveness of auxiliary speaker identification loss (SI-loss) as another way to improve robustness over enrollments. Experimental validation reveals the effectiveness of both worst-enrollment target training and SI-loss training to improve robustness against enrollment variations, by increasing speaker discriminability.

Submitted 16 June, 2022; originally announced June 2022.

Comments: 5 pages, 2 figures, 3 tables. Submitted to Interspeech 2022

arXiv:2206.07319 [pdf]

Toward the smooth mesh climbing of a miniature robot using bioinspired soft and expandable claws

Authors: Hong Wang, Peng Liu, Phuoc Thanh Tran Ngoc, Bing Li, Yao Li, Hirotaka Sato

Abstract: While most micro-robots face difficulty traveling on rugged and uneven terrain, beetles can walk smoothly on the complex substrate without slipping or getting stuck on the surface due to their stiffness-variable tarsi and expandable hooks on the tip of tarsi. In this study, we found that beetles actively bent and expanded their claws regularly to crawl freely on mesh surfaces. Inspired by the crawling mechanism of the beetles, we designed an 8-cm miniature climbing robot equipping artificial claws to open and bend in the same cyclic manner as natural beetles. The robot can climb freely with a controllable gait on the mesh surface, steep incline of the angle of 60°, and even transition surface. To our best knowledge, this is the first micro-scale robot that can climb both the mesh surface and cliffy incline.

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2205.08314 [pdf, other]

doi 10.1109/COMPSAC54236.2022.00244

Self-Sovereign Identity as a Service: Architecture in Practice

Authors: Yepeng Ding, Hiroyuki Sato

Abstract: Self-sovereign identity (SSI) has gained a large amount of interest. It enables physical entities to retain ownership and control of their digital identities, which naturally forms a conceptual decentralized architecture. With the support of the distributed ledger technology (DLT), it is possible to implement this conceptual decentralized architecture in practice and further bring technical advantages such as privacy protection, security enhancement, high availability. However, developing such a relatively new identity model has high costs and risks with uncertainty. To facilitate the use of the DLT-based SSI in practice, we formulate Self-Sovereign Identity as a Service (SSIaaS), a concept that enables a system, especially a system cluster, to readily adopt SSI as its identity model for identification, authentication, and authorization. We propose a practical architecture by elaborating the service concept, SSI, and DLT to implement SSIaaS platforms and SSI services. Besides, we present an architecture for constructing and customizing SSI services with a set of architectural patterns and provide corresponding evaluations. Furthermore, we demonstrate the feasibility of our proposed architecture in practice with Selfid, an SSIaaS platform based on our proposed architecture.

Submitted 2 June, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2204.13281 [pdf]

doi 10.1016/j.snb.2022.132988

Efficient Autonomous Navigation for Terrestrial Insect-Machine Hybrid Systems

Authors: Huu Duoc Nguyen, Van Than Dung, Hirotaka Sato, T. Thang Vo-Doan

Abstract: While bio-inspired and biomimetic systems draw inspiration from living materials, biohybrid systems incorporate them with synthetic devices, allowing the exploitation of both organic and artificial advantages inside a single entity. In the challenging development of centimeter-scaled mobile robots serving unstructured territory navigations, biohybrid systems appear as a potential solution in the forms of terrestrial insect-machine hybrid systems, which are the fusion of living ambulatory insects and miniature electronic devices. Although their maneuver can be deliberately controlled via artificial electrical stimulation, these hybrid systems still inherit the insects' outstanding locomotory skills, orchestrated by a sophisticated central nervous system and various sensory organs, favoring their maneuvers in complex terrains. However, efficient autonomous navigation of these hybrid systems is challenging. The struggle to optimize the stimulation parameters for individual insects limits the reliability and accuracy of navigation control. This study overcomes this problem by implementing a feedback control system with an insight view of tunable navigation control for an insect-machine hybrid system based on a living darkling beetle. Via a thrust controller for acceleration and a proportional controller for turning, the system regulates the stimulation parameters based on the instantaneous status of the hybrid robot. While the system can provide an overall success rate of ~71% for path-following navigations, fine-tuning its control parameters could further improve the outcome's reliability and precision to up to ~94% success rate and ~1/2 body length accuracy, respectively. Such tunable performance of the feedback control system provides flexibility to navigation applications of insect-machine hybrid systems.

Submitted 19 November, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Demonstration video can be found at http://youtu.be/p00mfxFo7VY

Journal ref: Sensors and Actuators B: Chemical 376(A) (2023) 132988

arXiv:2204.04811 [pdf, other]

Listen only to me! How well can target speech extraction handle false alarms?

Authors: Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato, Tomohiro Nakatani

Abstract: Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary clues characterizing the speaker, such as an enrollment utterance. TSE addresses thus the challenging problem of simultaneously performing separation and speaker identification. There has been much progress in extraction performance following the recent development of neural networks for speech enhancement and separation. Most studies have focused on processing mixtures where the target speaker is actively speaking. However, the target speaker is sometimes silent in practice, i.e., inactive speaker (IS). A typical TSE system will tend to output a signal in IS cases, causing false alarms. It is a severe problem for the practical deployment of TSE systems. This paper aims at understanding better how well TSE systems can handle IS cases. We consider two approaches to deal with IS, (1) training a system to directly output zero signals or (2) detecting IS with an extra speaker verification module. We perform an extensive experimental comparison of these schemes in terms of extraction performance and IS detection using the LibriMix dataset and reveal their pros and cons.

Submitted 14 July, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: Accepted to Interspeech 2022

arXiv:2204.01386 [pdf, other]

Dressi: A Hardware-Agnostic Differentiable Renderer with Reactive Shader Packing and Soft Rasterization

Authors: Yusuke Takimoto, Hiroyuki Sato, Hikari Takehara, Keishiro Uragaki, Takehiro Tawara, Xiao Liang, Kentaro Oku, Wataru Kishimoto, Bo Zheng

Abstract: Differentiable rendering (DR) enables various computer graphics and computer vision applications through gradient-based optimization with derivatives of the rendering equation. Most rasterization-based approaches are built on general-purpose automatic differentiation (AD) libraries and DR-specific modules handcrafted using CUDA. Such a system design mixes DR algorithm implementation and algorithm building blocks, resulting in hardware dependency and limited performance. In this paper, we present a practical hardware-agnostic differentiable renderer called Dressi, which is based on a new full AD design. The DR algorithms of Dressi are fully written in our Vulkan-based AD for DR, Dressi-AD, which supports all primitive operations for DR. Dressi-AD and our inverse UV technique inside it bring hardware independence and acceleration by graphics hardware. Stage packing, our runtime optimization technique, can adapt hardware constraints and efficiently execute complex computational graphs of DR with reactive cache considering the render pass hierarchy of Vulkan. HardSoftRas, our novel rendering process, is designed for inverse rendering with a graphics pipeline. Under the limited functionalities of the graphics pipeline, HardSoftRas can propagate the gradients of pixels from the screen space to far-range triangle attributes. Our experiments and applications demonstrate that Dressi establishes hardware independence, high-quality and robust optimization with fast speed, and photorealistic rendering.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 13 pages, 17 figures, EUROGRAPHICS 2022

arXiv:2203.10918 [pdf]

doi 10.1088/1748-3190/ac78b5

A robotic leg inspired from an insect leg

Authors: P. Thanh Tran-Ngoc, Leslie Ziqi Lim, Jia Hui Gan, Hong Wang, T. Thang Vo-Doan, Hirotaka Sato

Abstract: While most insect-inspired robots come with a simple tarsus such as a hemispherical foot tip, insect legs have complex tarsal structures and claws, which enable them to walk on complex terrain. Their sharp claws can smoothly attach and detach on plant surfaces by actuating a single muscle. Thus, installing insect-inspired tarsus on legged robots would improve their locomotion on complex terrain. This paper shows that the tendon-driven ball-socket structure provides the tarsus both flexibility and rigidity, which is necessary for the beetle to walk on a complex substrate such as a mesh surface. Disabling the tarsus' rigidity by removing the socket and elastic membrane of a tarsal joint, the claws could not attach to the mesh securely. Meanwhile, the beetle struggled to draw the claws out of the substrate when we turned the tarsus rigid by tubing. We then developed a cable-driven bio-inspired tarsus structure to validate the function of the tarsus as well as to show its potential application in the legged robot. With the tarsus, the robotic leg was able to attach and retract smoothly from the mesh substrate when performing a walking cycle.

Submitted 11 May, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: 17 pages, 10 figures

Journal ref: Bioinspir. Biomim. 17 (2022) 056008

arXiv:2203.04456 [pdf, other]

Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation

Authors: Hiroya Sato, Takuya Ikeda, Koichi Nishiwaki

Abstract: In recent years, a deep learning framework has been widely used for object pose estimation. While quaternion is a common choice for rotation representation of 6D pose, it cannot represent an uncertainty of the observation. In order to handle the uncertainty, Bingham distribution is one promising solution because this has suitable features, such as a smooth representation over SO(3), in addition to the ambiguity representation. However, it requires the complex computation of the normalizing constants. This is the bottleneck of loss computation in training neural networks based on Bingham representation. As such, we propose a fast-computable and easy-to-implement loss function for Bingham distribution. We also show not only to examine the parametrization of Bingham distribution but also an application based on our loss function.

Submitted 8 March, 2022; originally announced March 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2201.06685 [pdf, other]

How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

Authors: Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the enhanced speech) can monotonically increase the signal-to-artifact ratio under a mild condition. Accordingly, we experimentally confirm that OA improves ASR performance for both simulated and real recordings. The findings of this paper provide a better understanding of the influence of SE errors on ASR and open the door to future research on novel approaches for designing effective single-channel SE front-ends for ASR.

Submitted 30 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: 5 pages, 5 figures, submitted to Interspeech 2022
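
The observation adding (OA) step described in the abstract above, adding a scaled copy of the observed signal to the enhanced speech, can be illustrated with a minimal sketch. The function name `observation_adding`, the list-of-floats signal representation, and the value of the scale `alpha` are assumptions made here for illustration; the listing does not state how the authors choose the scale or implement the step.

```python
def observation_adding(enhanced, observed, alpha):
    """Return enhanced + alpha * observed, sample by sample.

    enhanced, observed: equal-length sequences of float samples.
    alpha: non-negative scale for the observed signal (hypothetical
    parameter name; its value here is arbitrary).
    """
    if len(enhanced) != len(observed):
        raise ValueError("signals must have the same length")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return [e + alpha * o for e, o in zip(enhanced, observed)]


if __name__ == "__main__":
    enhanced = [1.0, -1.0, 0.5]
    observed = [0.5, 0.5, -0.5]
    print(observation_adding(enhanced, observed, alpha=0.2))
```

With `alpha = 0` the output is the enhanced signal unchanged, so the scale acts as a single knob between the enhanced signal alone and an enhanced-plus-observed blend.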

arXiv:2201.03881 [pdf, other]

doi 10.1109/ICASSP43922.2022.9746347

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition

Authors: Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya

Abstract: The combination of a deep neural network (DNN)-based speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end is a widely used approach to implement overlapping speech recognition. However, the SE front-end generates processing artifacts that can degrade the ASR performance. We previously found that such performance degradation can occur even under fully overlapping conditions, depending on the signal-to-interference ratio (SIR) and signal-to-noise ratio (SNR). To mitigate the degradation, we introduced a rule-based method to switch the ASR input between the enhanced and observed signals, which showed promising results. However, the rule's optimality was unclear because it was heuristically designed and based only on SIR and SNR values. In this work, we propose a DNN-based switching method that directly estimates whether ASR will perform better on the enhanced or observed signals. We also introduce soft-switching that computes a weighted sum of the enhanced and observed signals for ASR input, with weights given by the switching model's output posteriors. The proposed learning-based switching showed performance comparable to that of rule-based oracle switching. The soft-switching further improved the ASR performance and achieved a relative character error rate reduction of up to 23% as compared with the conventional method.

Submitted 11 January, 2022; originally announced January 2022.

Comments: 5 pages, 2 figures

Journal ref: In 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6287-6291

arXiv:2112.12530 [pdf, other]

Long-Term Optimal Delivery Planning for Replacing the Liquefied Petroleum Gas Cylinder

Authors: Akihiro Yoshida, Haruki Sato, Shiori Uchiumi, Nariaki Tateiwa, Daisuke Kataoka, Akira Tanaka, Nozomi Hata, Yousuke Yatsushiro, Ayano Ide, Hiroki Ishikura, Shingo Egi, Miyu Fujii, Hiroki Kai, Katsuki Fujisawa

Abstract: In the daily operation of liquefied petroleum gas service, gas providers visit customers and replace cylinders if the gas is about to run out. For a long time, frequent visits to customers were required because they could not determine the amount of remaining gas without a staff visit and observation. To solve this problem, smart meters have started to be employed to acquire gas consumption more frequently without visiting customers. In this study, we construct a system to optimize plans for cylinder replacement, and evaluate it with a large-scale field test. We propose an algorithm to create a replacement plan with three steps: estimating the replacement date, acquiring the customer list for replacement, and determining the delivery route. A more accurate estimation of the replacement date can be acquired with a smart meter, which is used for making a customer list for replacement. The formulation for making a customer list enables the gas provider to replace cylinders some days before the date when the gas would run out. It can suppress the concentration of replacements on certain days. Large-scale verification experiments were performed with more than 1,000 customers in Chiba prefecture in Japan. In the field test, the gas provider incorporated the system into its replacement operations. Moreover, the replacement plans developed by the proposed system were compared with those by the gas provider. Our system reduced the number of gas cylinders with gas shortage, the number of visits without replacement due to plenty of gas remaining, and the working duration per customer, which shows that our system benefits both gas providers and customers.

Submitted 20 June, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: 25 pages

MSC Class: 90C90 (Primary) 90C27; 90C15 (Secondary) ACM Class: G.1.6; G.2.3

arXiv:2112.11661 [pdf]

New metal-plastic hybrid additive manufacturing strategy: Fabrication of arbitrary metal-patterns on external and even internal surfaces of 3D plastic structures

Authors: Kewei Song, Yue Cui, Tiannan Tao, Xiangyi Meng, Michinari Sone, Masahiro Yoshino, Shinjiro Umezu, Hirotaka Sato

Abstract: Constructing precise micro-nano metal patterns on complex three-dimensional (3D) plastic parts allows the fabrication of functional devices for advanced applications. However, this patterning is currently expensive and requires complex processes with long manufacturing lead time. The present work demonstrates a process for the fabrication of micro-nano 3D metal-plastic composite structures with arbitrarily complex shapes. In this approach, a light-cured resin is modified to prepare an active precursor capable of allowing subsequent electroless plating (ELP). A multi-material digital light processing 3D printer was newly developed to enable the fabrication of parts containing regions made of either standard resin or active precursor resin nested within each other. Selective 3D ELP processing of such parts provided various metal-plastic composite parts having complicated hollow micro-nano structures with specific topological relationships on a size scale as small as 40 µm. Using this technique, 3D metal topologies that cannot be manufactured by traditional methods are possible, and metal patterns can be produced inside plastic parts as a means of further miniaturizing electronic devices. The proposed method can also generate metal coatings exhibiting improved adhesion of metal to plastic substrate. Based on this technique, several sensors composed of different functional nonmetallic materials and specific metal patterns were designed and fabricated. The present results demonstrate the viability of the proposed method and suggest potential applications in the fields of smart 3D micro-nano electronics, 3D wearable devices, micro/nano-sensors, and health care.

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2111.14314 [pdf]

doi 10.34133/2022/9780504

Braking and Body Angles Control of an Insect-Computer Hybrid Robot by Electrical Stimulation of Beetle Flight Muscle in Free Flight

Authors: T. Thang Vo-Doan, V. Than Dung, Hirotaka Sato

Abstract: While engineers put a lot of effort, resources, and time into building insect-scale micro aerial vehicles (MAVs) that fly like insects, insects themselves are the real masters of flight. What if we used a living insect as the platform for an MAV instead? Here, we report flight control via electrical stimulation of a flight muscle of an insect-computer hybrid robot, which is the interface of a mountable wireless backpack controller and a living beetle. The beetle uses indirect flight muscles to drive wing flapping and three major direct flight muscles (basalar, subalar, and third axillary (3Ax) muscles) to control the kinematics of the wings for flight maneuvers. While turning control was already achieved by stimulating basalar and 3Ax muscles, electrical stimulation of subalar muscles resulted in braking and elevation control in flight. We also demonstrated around 20 degrees of contralateral yaw and roll by stimulating an individual subalar muscle. Stimulating both subalar muscles led to an increase of 20 degrees in pitch, decelerated the flight by 1.5 m/s², and induced an elevation of 2 m/s².

Submitted 28 November, 2021; originally announced November 2021.

Comments: 9 pages, 7 figures, supplemental video: https://youtu.be/P9dxsSf14LY . Cyborg and Bionic Systems 2022

Journal ref: Cyborg and Bionic Systems, vol. 2022, Article ID 9780504, 11 pages

arXiv:2111.03865 [pdf, other]

doi 10.1007/978-3-030-95384-3_43

Sunspot: A Decentralized Framework Enabling Privacy for Authorizable Data Sharing on Transparent Public Blockchains

Authors: Yepeng Ding, Hiroyuki Sato

Abstract: Blockchain technologies have been boosting the development of data-driven decentralized services in a wide range of fields. However, in the spirit of full transparency, many public blockchains, such as Ethereum, expose all types of data to the public. Besides, the on-chain persistence of large data is significantly expensive technically and economically. These issues make it difficult to share fairly large private data while preserving the attractive properties of public blockchains. Although direct encryption for on-chain data persistence can introduce confidentiality, new challenges such as key sharing, access control, and legal rights proving are still open. Meanwhile, cross-chain collaboration still requires secure and effective protocols, though decentralized storage systems such as IPFS bring the possibility for fairly large data persistence. In this paper, we propose Sunspot, a decentralized framework for privacy-preserving data sharing with access control on transparent public blockchains, to solve these issues. We also show the practicality and applicability of Sunspot by MyPub, a decentralized privacy-preserving publishing platform based on Sunspot. Furthermore, we evaluate the security, privacy, and performance of Sunspot through theoretical analysis and experiments.

Submitted 12 May, 2022; v1 submitted 6 November, 2021; originally announced November 2021.

arXiv:2106.00949 [pdf, other]

doi 10.21437/Interspeech.2021-2253

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

Authors: Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo

Abstract: Although recent advances in deep learning technology improved automatic speech recognition (ASR), it remains difficult to recognize speech when it overlaps other people's voices. Speech separation or extraction is often used as a front-end to ASR to handle such overlapping speech. However, deep neural network-based speech enhancement can generate 'processing artifacts' as a side effect of the enhancement, which degrades ASR performance. For example, it is well known that single-channel noise reduction for non-speech noise (non-overlapping speech) often does not improve ASR. Likewise, the processing artifacts may also be detrimental to ASR in some conditions when processing overlapping speech with a separation/extraction method, although it is usually believed that separation/extraction improves ASR. In order to answer the question 'Do we always have to separate/extract speech from mixtures?', we analyze ASR performance on observed and enhanced speech at various noise and interference conditions, and show that speech enhancement degrades ASR under some conditions even for overlapping speech. Based on these findings, we propose a simple switching algorithm between observed and enhanced speech based on the estimated signal-to-interference ratio and signal-to-noise ratio. We demonstrated experimentally that such a simple switching mechanism can improve recognition performance when processing artifacts are detrimental to ASR.

Submitted 2 June, 2021; originally announced June 2021.

Comments: 5 pages, 1 figure

Journal ref: in Proc. Interspeech 2021, 1149-1153

arXiv:2105.10869 [pdf]

Insect-Computer Hybrid System for Autonomous Search and Rescue Mission

Authors: P. Thanh Tran-Ngoc, D. Long Le, Bing Sheng Chong, H. Duoc Nguyen, V. Than Dung, Feng Cao, Yao Li, Kazuki Kai, Jia Hui Gan, T. Thang Vo-Doan, T. Luan Nguyen, Hirotaka Sato

Abstract: There is still a long way to go before artificial mini robots are really used for search and rescue missions in disaster-hit areas due to hindrances in power consumption, computation load of the locomotion, and obstacle-avoidance system. The insect-computer hybrid system, which is the fusion of a living insect platform and a microcontroller, emerges as an alternative solution. This study demonstrates the first-ever insect-computer hybrid system conceived for search and rescue missions, which is capable of autonomous navigation and human presence detection in an unstructured environment. A customized navigation control algorithm utilizing the insect's intrinsic navigation capability achieved exploration and negotiation of complex terrains. On-board high-accuracy human presence detection using an infrared camera was achieved with a custom machine learning model. Low power consumption suggests system suitability for hour-long operations and its potential for realization in real-life missions.

Submitted 21 June, 2021; v1 submitted 23 May, 2021; originally announced May 2021.

Comments: Videos are available at https://hirosatontu.wordpress.com/research/

arXiv:2103.02587 [pdf]

Reconstructed spatial receptive field structures by reverse correlation technique explains the visual feature selectivity of units in deep convolutional neural networks

Authors: Yoshiyuki R Shiraishi, Hiromichi Sato, Takahisa M Sanada, Tomoyuki Naito

Abstract: An important issue in dealing with Deep Convolutional Neural Networks (DCNN) is the 'black box problem', which represents the unknowns about internal information representation and processing, especially in the middle and higher layers. In this study, we adopted a systems neuroscience methodology to measure the visual feature selectivity and visualize the spatial receptive field of the units in VGG16. Orientation and spatial frequency tunings of each unit were measured using sinusoidal grating stimuli. The image category selectivity of each unit was also measured using natural image stimuli. The spatial structures of the receptive fields of all convolutional units were estimated by activation-weighted average (AWA) and activation-weighted covariance (AWC) analyses. In the middle layers (convolutional layers in block3 and block4), AWC analysis successfully reconstructed the receptive field that predicted the visual feature selectivity of the unit. Those results suggested the possibility that analyzing the reconstructed receptive field structure can be used to interpret the functional significance of the units and layers of a DCNN.

Submitted 3 March, 2021; originally announced March 2021.

Comments: 28 pages, 7 figures, 1 table

arXiv:2102.01326 [pdf, other]

Multimodal Attention Fusion for Target Speaker Extraction

Authors: Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Abstract: Target speaker extraction, which aims at extracting a target speaker's voice from a mixture of voices using audio, visual or locational clues, has received much interest. Recently, audio-visual target speaker extraction has been proposed, which extracts target speech by using complementary audio and visual clues. Although audio-visual target speaker extraction offers more stable performance than single-modality methods on simulated data, its adaptation to realistic situations and its evaluation on real recorded mixtures have not been fully explored. One of the major issues in handling realistic situations is how to make the system robust to clue corruption, because in real recordings both clues may not be equally reliable, e.g., visual clues may be affected by occlusions. In this work, we propose a novel attention mechanism for multi-modal fusion and training methods for it that make it possible to effectively capture the reliability of the clues and weight the more reliable ones. Our proposals improve the signal-to-distortion ratio (SDR) by 1.0 dB over conventional fusion mechanisms on simulated data. Moreover, we also record an audio-visual dataset of simultaneous speech with realistic visual clue corruption and show that audio-visual target speaker extraction with our proposals works successfully on real data.

Submitted 2 February, 2021; originally announced February 2021.

Comments: 7 pages, 5 figures

Journal ref: in IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784

arXiv:2012.04185 [pdf, other]

Formalism-Driven Development of Decentralized Systems

Authors: Yepeng Ding, Hiroyuki Sato

Abstract: Decentralized systems have been widely developed and applied to address security and privacy issues in centralized systems, especially since the advancement of distributed ledger technology. However, it is challenging to ensure their correct functioning with respect to their designs and minimize the technical risk before the delivery. Although formal methods have made significant progress over the past decades, a feasible solution based on formal methods from a development process perspective has not been well developed. In this paper, we formulate an iterative and incremental development process, named formalism-driven development (FDD), for developing provably correct decentralized systems under the guidance of formal methods. We also present a framework named Seniz, to practicalize FDD with a new modeling language and scaffolds. Furthermore, we conduct case studies to demonstrate the effectiveness of FDD in practice with the support of Seniz.

Submitted 30 January, 2022; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: To appear in ICECCS 2022

arXiv:2008.08245 [pdf, other]

Formalizing and Verifying Decentralized Systems with Extended Concurrent Separation Logic

Authors: Yepeng Ding, Hiroyuki Sato

Abstract: Decentralized techniques are becoming crucial and ubiquitous with the rapid advancement of distributed ledger technologies such as the blockchain. Numerous decentralized systems have been developed to address security and privacy issues with great dependability and reliability via these techniques. Meanwhile, formalization and verification of the decentralized systems is the key to ensuring correctness of the design and security properties of the implementation. In this paper, we propose a novel method of formalizing and verifying decentralized systems with a kind of extended concurrent separation logic. Our logic extends the standard concurrent separation logic with new features including communication encapsulation, environment perception, and node-level reasoning, which enhances modularity and expressiveness. Besides, we develop our logic with unitarity and compatibility to facilitate implementation. Furthermore, we demonstrate the effectiveness and versatility of our method by applying our logic to formalize and verify critical techniques in decentralized systems including the consensus mechanism and the smart contract.

Submitted 18 August, 2020; originally announced August 2020.

arXiv:2007.13685 [pdf, ps, other]

Extending Concurrent Separation Logic to Enhance Modular Formalization

Authors: Yepeng Ding, Hiroyuki Sato

Abstract: Nowadays, numerous services based on large-scale distributed systems have been developed to boost the convenience of human life. On the other hand, it becomes a significant challenge to ensure the correctness and properties of these systems due to the complex and nested architecture. Although concurrent separation logic (CSL) has partially tackled the problem by specifying systems and verifying their correctness, it faces modularity issues. In this paper, we propose an extended concurrent separation logic (ECSL) to address the modularity issues of CSL with the support of the temporal extension, communication extension, environment extension, and nest extension. ECSL is capable of formalizing systems at different abstraction levels from memory management to architecture and protocol design with great modularity. Furthermore, we stick to unitarity and compatibility principles while developing ECSL.

Submitted 27 July, 2020; originally announced July 2020.

arXiv:1703.04890 [pdf, other]

Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis

Authors: Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

Abstract: Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite number of loss functions. The present paper proposes a Riemannian stochastic quasi-Newton algorithm with variance reduction (R-SQN-VR). The key challenges of averaging, adding, and subtracting multiple gradients are addressed with notions of retraction and vector transport. We present convergence analyses of R-SQN-VR on both non-convex and retraction-convex functions under retraction and vector transport operators. The proposed algorithm is evaluated on the Karcher mean computation on the symmetric positive-definite manifold and the low-rank matrix completion on the Grassmann manifold. In all cases, the proposed algorithm outperforms the state-of-the-art Riemannian batch and stochastic gradient algorithms.

Submitted 16 September, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

arXiv:1702.05594 [pdf, ps, other]

doi 10.1137/17M1116787

Riemannian stochastic variance reduced gradient algorithm with retraction and vector transport

Authors: Hiroyuki Sato, Hiroyuki Kasai, Bamdev Mishra

Abstract: In recent years, stochastic variance reduction algorithms have attracted considerable attention for minimizing the average of a large but finite number of loss functions. This paper proposes a novel Riemannian extension of the Euclidean stochastic variance reduced gradient (R-SVRG) algorithm to a manifold search space. The key challenges of averaging, adding, and subtracting multiple gradients are addressed with retraction and vector transport. For the proposed algorithm, we present a global convergence analysis with a decaying step size as well as a local convergence rate analysis with a fixed step size under some natural assumptions. In addition, the proposed algorithm is applied to the computation problem of the Riemannian centroid on the symmetric positive definite (SPD) manifold as well as the principal component analysis and low-rank matrix completion problems on the Grassmann manifold. The results show that the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm in each case.

Submitted 31 May, 2019; v1 submitted 18 February, 2017; originally announced February 2017.

Comments: Published in SIAM Journal on Optimization. Extended and revised version of arXiv:1605.07367

Journal ref: SIAM Journal on Optimization 29 (2019) 1444-1472

arXiv:1605.07367 [pdf, other]

Riemannian stochastic variance reduced gradient on Grassmann manifold

Authors: Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

Abstract: Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite, number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a compact manifold search space. To this end, we show the developments on the Grassmann manifold. The key challenges of averaging, addition, and subtraction of multiple gradients are addressed with notions like logarithm mapping and parallel translation of vectors on the Grassmann manifold. We present a global convergence analysis of the proposed algorithm with decay step-sizes and a local convergence rate analysis under fixed step-size with some natural assumptions. The proposed algorithm is applied on a number of problems on the Grassmann manifold like principal components analysis, low-rank matrix completion, and the Karcher mean computation. In all these cases, the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm.

Submitted 9 April, 2017; v1 submitted 24 May, 2016; originally announced May 2016.

arXiv:1402.1865 [pdf, ps, other]

Some properties of $τ$-adic expansions on hyperelliptic Koblitz curves

Authors: Keisuke Hakuta, Hisayoshi Sato, Tsuyoshi Takagi

Abstract: This paper explores two techniques on a family of hyperelliptic curves that have been proposed to accelerate computation of scalar multiplication for hyperelliptic curve cryptosystems. In elliptic curve cryptosystems, it is known that Koblitz curves admit fast scalar multiplication, namely, the $τ$-adic non-adjacent form ($τ$-NAF). It is shown that the $τ$-NAF has the three properties: (1) existence, (2) uniqueness, and (3) minimality of the Hamming weight. These properties are not only of intrinsic mathematical interest, but also desirable in some cryptographic applications. On the other hand, Günther, Lange, and Stein have proposed two generalizations of $τ$-NAF for a family of hyperelliptic curves, called hyperelliptic Koblitz curves. However, to our knowledge, it is not known whether the three properties are true or not. We provide an answer to the question. Our investigation shows that the first one has only the existence and the second one has the existence and uniqueness. Furthermore, we shall prove that there exist 16 digit sets so that one can achieve the second one.

Submitted 8 February, 2014; originally announced February 2014.

Comments: 100 pages

MSC Class: 11A63 (Primary); 94A60 (Secondary)

arXiv:cs/0306092 [pdf]

Building A High Performance Parallel File System Using Grid Datafarm and ROOT I/O

Authors: Y. Morita, H. Sato, Y. Watase, O. Tatebe, S. Sekiguchi, S. Matsuoka, N. Soda, A. Dell'Acqua

Abstract: The sheer amount of petabyte-scale data foreseen in the LHC experiments requires a careful consideration of the persistency design and the system design in the world-wide distributed computing. Event parallelism of the HENP data analysis enables us to take maximum advantage of the high performance cluster computing and networking when we keep the parallelism in the data processing phase, in the data management phase, and in the data transfer phase. A modular architecture of FADS/Goofy, a versatile detector simulation framework for Geant4, enables an easy choice of plug-in facilities for persistency technologies such as Objectivity/DB and ROOT I/O. The framework is designed to work naturally with the parallel file system of Grid Datafarm (Gfarm). FADS/Goofy is proven to generate 10^6 Geant4-simulated Atlas Mockup events using a 512 CPU PC cluster. The data in ROOT I/O files is replicated using Gfarm file system. The histogram information is collected from the distributed ROOT files. During the data replication it has been demonstrated to achieve more than 2.3 Gbps data transfer rate between the PC clusters over seven participating PC clusters in the United States and in Japan.

Submitted 14 June, 2003; originally announced June 2003.

Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 4 pages, PDF. PSN TUDT010

ACM Class: J.2

arXiv:cs/0306051 [pdf, ps, other]

A data Grid testbed environment in Gigabit WAN with HPSS

Authors: Atsushi Manabe, Kohki Ishikawa, Yoshihiko Itoh, Setsuya Kawabata, Tetsuro Mashimo, Youhei Morita, Hiroshi Sakamoto, Takashi Sasaki, Hiroyuki Sato, Junichi Tanaka, Ikuo Ueda, Yoshiyuki Watase, Satomi Yamamoto, Shigeo Yashiro

Abstract: For data analysis of large-scale experiments such as LHC Atlas and other Japanese high energy and nuclear physics projects, we have constructed a Grid test bed at ICEPP and KEK. These institutes are connected to a national scientific gigabit network backbone called SuperSINET. In our test bed, we have installed NorduGrid middleware based on Globus, and connected a 120 TB HPSS at KEK as a large-scale data store. Atlas simulation data at ICEPP has been transferred and accessed using SuperSINET. We have tested the performance and characteristics of HPSS through this high-speed WAN. The measurements include a comparison between computing and storage resources tightly coupled over a low-latency LAN and resources connected over a long-distance WAN.

Submitted 3 September, 2003; v1 submitted 12 June, 2003; originally announced June 2003.

Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 5 pages, LaTeX, 9 figures, PSN THCT002

ACM Class: C.2.4; J.2; H.3.4

Showing 1–50 of 50 results for author: Sato, H