
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Authors: LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.15359

Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models

Authors: Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe

Abstract: Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still preliminary because existing datasets do not permit a comprehensive evaluation of the fine-grained visual linguistic abilities of VLMs across multiple languages. To further explore the strengths of VLMs, such as GPT-4V (OpenAI, 2023), we developed new datasets for the systematic and qualitative analysis of VLMs. Our contribution is four-fold: 1) we introduced nine vision-and-language (VL) tasks (including object recognition, image-text matching, and more) and constructed multilingual visual-text datasets in four languages (English, Japanese, Swahili, and Urdu) by using templates containing questions and prompting GPT-4V to generate the answers and the rationales; 2) we introduced a new VL task named unrelatedness; 3) we introduced rationales to enable human understanding of the VLM reasoning process; and 4) we employed human evaluation to measure the suitability of the proposed datasets for VL tasks. We show that VLMs can be fine-tuned on our datasets. Our work is the first to conduct such analyses in Swahili and Urdu. It also introduces rationales in VL analysis, which played a vital role in the evaluation.

Submitted 29 March, 2024; originally announced June 2024.

arXiv:2405.14629

Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences

Authors: Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka

Abstract: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence of these experiences is valuable for various purposes, such as identifying experiences that negatively influence poorly performing RL agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. However, this method is usually computationally prohibitive. In this paper, we present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences. We evaluate how accurately PIToD estimates the influence of experiences and its efficiency compared to LOO. We then apply PIToD to amend poorly performing RL agents, i.e., we use PIToD to estimate negatively influential experiences for the RL agents and to delete the influence of these experiences. We show that RL agents' performance is significantly improved via amendments with PIToD.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Source code: https://github.com/TakuyaHiraoka/Which-Experiences-Are-Influential-for-RL-Agents
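The turn-over dropout idea that PIToD builds on can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: each experience receives a fixed binary dropout mask derived from a hypothetical integer `experience_id`, the sub-network selected by that mask is the one trained on the experience, and the flipped ("turned-over") mask selects a sub-network that never saw it. `evaluate` is a hypothetical callback that returns a performance score (higher is better) for a given mask.

```python
import random
from typing import Callable, List

Mask = List[int]


def experience_mask(experience_id: int, width: int, keep_prob: float = 0.5) -> Mask:
    """Fixed binary dropout mask for one experience (assumption: seeded by its ID)."""
    rng = random.Random(experience_id)  # same ID -> same mask on every call
    return [1 if rng.random() < keep_prob else 0 for _ in range(width)]


def turn_over(mask: Mask) -> Mask:
    """Flip the mask: units used when training on the experience are dropped, and vice versa."""
    return [1 - m for m in mask]


def estimate_influence(experience_id: int, width: int,
                       evaluate: Callable[[Mask], float]) -> float:
    """Score of the sub-network trained with the experience minus the one trained without it.

    A negative value flags an experience that appears to hurt performance.
    """
    mask = experience_mask(experience_id, width)
    return evaluate(mask) - evaluate(turn_over(mask))
```

In this sketch, estimating one experience's influence costs two evaluations of a single trained network rather than retraining an agent without that experience, which is the kind of saving over leave-one-out that the abstract refers to.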

arXiv:2404.00397

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation

Authors: Marco Cognetta, Tatsuya Hiraoka, Naoaki Okazaki, Rico Sennrich, Yuval Pinter

Abstract: We explore threshold vocabulary trimming in Byte-Pair Encoding subword tokenization, a postprocessing step that replaces rare subwords with their component subwords. The technique is available in popular tokenization libraries but has not been subjected to rigorous scientific scrutiny. While the removal of rare subwords is suggested as best practice in machine translation implementations, both as a means to reduce model size and to improve model performance through robustness, our experiments indicate that, across a large space of hyperparameter settings, vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 15 pages
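The trimming step described above (replacing rare subwords with their component subwords) can be sketched as follows. This is an illustration under assumptions, not the behavior of any specific library: `merges` is a hypothetical table mapping each merged subword to the pair it was built from, `freq` holds subword frequencies from the training data, and any subword below `threshold` is split recursively.

```python
from typing import Dict, List, Tuple

# Hypothetical merge table: each merged subword -> the pair it was built from.
Merges = Dict[str, Tuple[str, str]]


def trim_vocabulary(tokens: List[str], merges: Merges,
                    freq: Dict[str, int], threshold: int) -> List[str]:
    """Replace subwords rarer than `threshold` with their component subwords.

    Splitting is recursive: a component that is itself rare is split again.
    Base symbols (not present in `merges`) are always kept.
    """
    out: List[str] = []
    stack = list(reversed(tokens))  # pop from the end, so tokens come out left to right
    while stack:
        tok = stack.pop()
        if tok in merges and freq.get(tok, 0) < threshold:
            left, right = merges[tok]
            stack.append(right)  # pushed first so that `left` is emitted first
            stack.append(left)
        else:
            out.append(tok)
    return out
```

For example, with merges `lo = l+o`, `low = lo+w`, `er = e+r`, `lower = low+er`, frequencies `lo=50, low=40, er=30, lower=2`, and a threshold of 5, the token `lower` becomes `low`, `er`; raising the threshold to 35 also splits `er` into `e`, `r`.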

arXiv:2402.09808

Knowledge of Pretrained Language Models on Surface Information of Tokens

Authors: Tatsuya Hiraoka, Naoaki Okazaki

Abstract: Do pretrained language models have knowledge regarding the surface information of tokens? We examined the surface information stored in word or subword embeddings acquired by pretrained language models from the perspectives of token length, substrings, and token constitution. Additionally, we evaluated the ability of models to generate knowledge regarding token surfaces. We focused on 12 pretrained language models that were mainly trained on English and Japanese corpora. Experimental results demonstrate that pretrained language models have knowledge regarding token length and substrings but not token constitution. Additionally, the results imply that there is a bottleneck on the decoder side in terms of effectively utilizing acquired knowledge.

Submitted 22 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2312.16553

doi 10.7566/JPSJ.93.023702

Sublattice-selective inverse Faraday effect in ferrimagnetic rare-earth iron garnet

Authors: Toshiki Hiraoka, Ryo Kainuma, Keita Matsumoto, Kihiro T. Yamada, Takuya Satoh

Abstract: We performed time-resolved pump-probe measurements using rare-earth iron garnet Gd$_{3/2}$Yb$_{1/2}$BiFe$_5$O$_{12}$ as a two-sublattice ferrimagnet. We measured the initial phases of the magnetic resonance modes below and above the magnetization compensation temperature to clarify the sublattice selectivity of the inverse Faraday effect in ferrimagnets. A comparison of the time evolution of magnetization estimated using the equations of motion revealed that the inverse Faraday effect occurring in ferrimagnetic materials has sublattice selectivity. This is in striking contrast to antiferromagnets, in which the inverse Faraday effect acts on each sublattice identically. The initial phase analysis can be applied to other ferrimagnets with compensation temperatures.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 4 pages, 5 figures

Journal ref: J. Phys. Soc. Jpn. 93, 023702 (2024)

arXiv:2312.05787

Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization

Authors: Takuya Hiraoka

Abstract: Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency. However, these methods have mainly been developed for dense-reward tasks. In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021), an RL method with a high RR and regularization. To apply REDQ to sparse-reward goal-conditioned tasks, we make the following modifications to it: (i) using hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ with these modifications on 12 sparse-reward goal-conditioned tasks of Robotics (Plappert et al., 2018), and show that it achieves about $2 \times$ better sample efficiency than previous state-of-the-art (SoTA) RL methods. Furthermore, we reconsider the necessity of specific components of REDQ and simplify it by removing unnecessary ones. The simplified REDQ with our modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA methods in 4 Fetch tasks of Robotics.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Source code: https://github.com/TakuyaHiraoka/Efficient-SRGC-RL-with-a-High-RR-and-Regularization Demo video: https://drive.google.com/file/d/1UHd7JVPCwFLNFhy1QcycQfwU_nll_yII/view?usp=drive_link
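One plausible reading of "bounding target Q-values" is sketched below; it is an assumption, not the paper's exact formulation. If every reward lies in $[r_{\min}, r_{\max}]$, any discounted return lies in $[r_{\min}/(1-\gamma),\ r_{\max}/(1-\gamma)]$, so the bootstrapped target can be clipped to that interval. The defaults assume a sparse reward of -1 until the goal is reached and 0 otherwise.

```python
from typing import Tuple


def q_bounds(r_min: float, r_max: float, gamma: float) -> Tuple[float, float]:
    """Range of any discounted return when every reward lies in [r_min, r_max]."""
    return r_min / (1.0 - gamma), r_max / (1.0 - gamma)


def bounded_target(reward: float, next_q: float, gamma: float, done: bool,
                   r_min: float = -1.0, r_max: float = 0.0) -> float:
    """TD target r + gamma * Q(s', a'), clipped to the achievable return range.

    Assumed sparse reward: -1 per step before success, 0 otherwise.
    """
    lower, upper = q_bounds(r_min, r_max, gamma)
    target = reward + (0.0 if done else gamma * next_q)
    return min(max(target, lower), upper)
```

With $\gamma = 0.98$ the assumed bounds are $[-50, 0]$, so an overestimated next-state value (e.g. +10) cannot push the target above 0, and an underestimated one cannot push it below -50.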

arXiv:2309.14225

HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation

Authors: Annan Tang, Takuma Hiraoka, Naoki Hiraoka, Fan Shi, Kento Kawaharazuka, Kunio Kojima, Kei Okada, Masayuki Inaba

Abstract: Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system that allows humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting method to mitigate morphological differences between arbitrary human demonstrators and humanoid robots. An adversarial critic component is integrated with reinforcement learning (RL) to guide the control policy to produce behaviors aligned with the data distribution of mixed reference motions. Additionally, we employ a specific integral probability metric (IPM), namely the Wasserstein-1 distance with a novel soft boundary constraint, to stabilize the training process and prevent mode collapse. Our system is evaluated on the full-sized humanoid JAXON in simulation. The resulting control policy demonstrates a wide range of locomotion patterns, including standing, push-recovery, squat walking, human-like straight-leg walking, and dynamic running. Notably, even in the absence of transition motions in the demonstration dataset, the robots show an emergent ability to transition naturally between distinct locomotion patterns as the desired speed changes.

Submitted 23 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2307.04700

Strength and weakness of disease-induced herd immunity in networks

Authors: Takayuki Hiraoka, Zahra Ghadiri, Abbas K. Rizi, Mikko Kivelä, Jari Saramäki

Abstract: When a fraction of a population becomes immune to an infectious disease, the population-wide infection risk decreases nonlinearly due to collective protection known as herd immunity. Studies based on mean-field models suggest that natural infection in a heterogeneous population may induce herd immunity more efficiently than homogeneous immunization. Here, we use network epidemic models to show that the opposite can also be the case. We identify two competing mechanisms driving disease-induced herd immunity in networks: the high density of immunity among socially active individuals enhances the herd immunity effect, while the topological localization of immune individuals weakens it. The effect of localization is stronger in networks embedded in low-dimensional space, which can make disease-induced immunity less effective than random immunization. Our results highlight the role of networks in shaping herd immunity and call for careful examination of model predictions that inform public health policies.

Submitted 3 July, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Main text: 11 pages, 4 figures. Supplementary Materials: 8 pages, 2 figures

arXiv:2306.01328

Enhancing the Driver's Comprehension of ADS's System Limitations: An HMI for Providing Request-to-Intervene Trigger Information

Authors: Ryuji Matsuo, Hailong Liu, Toshihiro Hiraoka, Takahiro Wada

Abstract: Level 3 automated driving systems (ADS) have attracted significant attention and are being commercialized. A Level 3 ADS prompts the driver to take control by issuing a request to intervene (RtI) when its operational design domain (ODD) or system limitations are exceeded. However, complex traffic situations may lead drivers to perceive multiple potential triggers of RtI simultaneously, causing hesitation or confusion during take-over. Therefore, drivers must clearly understand the ADS's system limitations to recognize the triggers of RtI and ensure safe take-over. In this study, we propose a voice-based HMI that provides RtI trigger cues to help drivers understand the ADS's system limitations. The results of a between-group experiment using a driving simulator showed that incorporating effective trigger cues into the RtI enabled drivers to better comprehend the ADS's system limitations and reduced collisions. It also improved drivers' subjective evaluations, such as the comprehensibility of system limitations, hesitation in response to RtI, and acceptance of ADS behaviors when encountering RtI while using the ADS. Therefore, the enhanced comprehension resulting from trigger cues is essential for promoting a safer and better user experience with ADS during RtI.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.14377

Unsupervised Discovery of Continuous Skills on a Sphere

Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

Abstract: Recently, methods for learning diverse skills to generate various behaviors without external rewards have been actively studied as a form of unsupervised reinforcement learning. However, most of the existing methods learn a finite number of discrete skills, and thus the variety of behaviors that can be exhibited with the learned skills is limited. In this paper, we propose a novel method for learning a potentially infinite number of different skills, named discovery of continuous skills on a sphere (DISCS). In DISCS, skills are learned by maximizing mutual information between skills and states, and each skill corresponds to a continuous value on a sphere. Because the representations of skills in DISCS are continuous, infinitely diverse skills can be learned. We examine existing methods and DISCS in the MuJoCo Ant robot control environments and show that DISCS can learn much more diverse skills than the other methods.

Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 14 pages, 12 figures
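Because each DISCS skill corresponds to a continuous value on a sphere, drawing a skill means drawing a point on the unit sphere. A minimal sketch of one standard way to do that (normalizing an isotropic Gaussian vector) is shown below; whether DISCS samples skills exactly this way is an assumption, and the skill dimension `dim` is a hypothetical parameter.

```python
import math
import random
from typing import List


def sample_skill(dim: int, rng: random.Random) -> List[float]:
    """Draw a skill vector uniformly from the unit sphere in R^dim.

    Normalizing an isotropic Gaussian vector yields a uniformly distributed direction.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return [x / norm for x in v]
```

Unlike a one-hot code over a fixed number of discrete skills, every call can return a different point, which is the sense in which the set of skills is unbounded.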

arXiv:2304.10813

Tokenization Preference for Human and Machine Learning Model: An Annotation Study

Authors: Tatsuya Hiraoka, Tomoya Iwakura

Abstract: Is the tokenization preferred by humans also preferred by machine-learning (ML) models? This study examines the relation between the tokenization preferred by humans (in terms of appropriateness and readability) and that preferred by ML models (in terms of performance on an NLP task). The question texts of the Japanese commonsense question-answering dataset are tokenized with six different tokenizers, and the performances of human annotators and ML models are compared. Furthermore, we analyze the relations among the answer performance of humans and ML models, the appropriateness of tokenization for humans, and human response time to questions. This study provides quantitative evidence that the tokenizations preferred by humans and by ML models are not always the same. The results also imply that existing methods using language models for tokenization could be a good compromise for both humans and ML models.

Submitted 16 February, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.10808

Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing

Authors: Tatsuya Hiraoka, Tomoya Iwakura

Abstract: This paper proposes a method to optimize tokenization for the performance improvement of already trained downstream models. Our method generates tokenization results that attain lower loss values for a given downstream model on the training data under a restricted vocabulary, and it trains a tokenizer to reproduce these tokenization results. Therefore, our method can be applied to a variety of tokenization methods, whereas existing work cannot, because it learns the tokenizer and the downstream model simultaneously. This paper proposes an example of a BiLSTM-based tokenizer with vocabulary restriction, which can capture wider contextual information for the tokenization process than the non-neural tokenization methods used in existing work. Experimental results on Japanese, Chinese, and English text classification tasks show that the proposed method improves performance compared to existing methods for tokenization optimization.

Submitted 21 April, 2023; originally announced April 2023.
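The target-generation step described above (choosing, among the tokenizations allowed by a restricted vocabulary, the one with the lowest downstream loss) might look like the following sketch. The exhaustive enumeration and the `loss` callback, which stands in for the trained downstream model, are illustrative assumptions; the paper instead trains a BiLSTM-based tokenizer to reproduce such results, which this sketch does not cover.

```python
from typing import Callable, Iterator, List, Optional, Set


def segmentations(text: str, vocab: Set[str]) -> Iterator[List[str]]:
    """Yield every way to split `text` into tokens drawn from `vocab`.

    Exhaustive and exponential in the worst case: only practical for short strings.
    """
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in segmentations(text[end:], vocab):
                yield [piece] + rest


def best_tokenization(text: str, vocab: Set[str],
                      loss: Callable[[List[str]], float]) -> Optional[List[str]]:
    """Lowest-loss tokenization under the restricted vocabulary (None if none exists)."""
    return min(segmentations(text, vocab), key=loss, default=None)
```

The selected tokenizations would then serve as training targets for a tokenizer, so the downstream model itself never has to be retrained.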

arXiv:2301.11168

Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout

Authors: Takuya Hiraoka, Takashi Onishi, Yoshimasa Tsuruoka

Abstract: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence is valuable for various purposes, including experience cleansing and analysis. One method for estimating the influence of individual experiences is agent comparison, but it is prohibitively expensive when there is a large number of experiences. In this paper, we present PI+ToD as a method for efficiently estimating the influence of experiences. PI+ToD is a policy iteration that efficiently estimates the influence of experiences by utilizing turn-over dropout. We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.

Submitted 22 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: The paper is withdrawn because an error that affects the main results of the experiments has been found

arXiv:2209.04126

MaxMatch-Dropout: Subword Regularization for WordPiece

Authors: Tatsuya Hiraoka

Abstract: We present a subword regularization method for WordPiece, which uses a maximum matching algorithm for tokenization. The proposed method, MaxMatch-Dropout, randomly drops words during the search performed by the maximum matching algorithm. It enables fine-tuning with subword regularization for popular pretrained language models such as BERT-base. The experimental results demonstrate that MaxMatch-Dropout improves performance on text classification and machine translation tasks, as other subword regularization methods do. Moreover, we provide a comparative analysis of subword regularization methods: subword regularization with SentencePiece (Unigram), BPE-Dropout, and MaxMatch-Dropout.

Submitted 9 September, 2022; originally announced September 2022.

Comments: Accepted to appear at COLING2022
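A minimal sketch of maximum matching with random dropping, as described above, follows. It assumes BERT-style WordPiece conventions (a `##` prefix for word-internal pieces and `[UNK]` when a word cannot be covered) and assumes that single-character pieces are never dropped, so the search can always finish; these details are assumptions rather than statements from the abstract. With `p = 0` the function reduces to ordinary greedy longest-match-first WordPiece.

```python
import random
from typing import List, Set


def maxmatch_dropout(word: str, vocab: Set[str], p: float,
                     rng: random.Random, unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match tokenization where each matched vocabulary
    entry longer than one character is skipped with probability p."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # word-internal pieces carry the ## prefix
            if cand in vocab:
                # Keep single characters unconditionally (assumption) so the
                # search terminates; drop longer matches with probability p.
                if end - start == 1 or rng.random() >= p:
                    piece = cand
                    break
            end -= 1  # try the next shorter candidate
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens
```

Sampling a different tokenization of the same word on each training pass is what provides the regularization effect; at inference time one would use `p = 0`.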

arXiv:2203.13528

Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation

Authors: Sho Takase, Tatsuya Hiraoka, Naoaki Okazaki

Abstract: Subword regularization methods use multiple subword segmentations during training to improve the robustness of neural machine translation models. Previous subword regularization methods use multiple segmentations during training but only one segmentation during inference. In this study, we propose an inference strategy to address this discrepancy. The proposed strategy approximates the marginalized likelihood by using multiple segmentations, including the most plausible segmentation and several sampled segmentations. Because the proposed strategy aggregates predictions from several segmentations, we can regard it as a single model ensemble that does not require any additional cost for training. Experimental results show that the proposed strategy improves the performance of models trained with subword regularization in low-resource machine translation tasks.

Submitted 25 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022
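The aggregation idea can be sketched as reranking candidate outputs by their probability averaged over several segmentations of the same input, computed stably in log space. The averaging rule, the `log_prob` callback, and the candidate list are assumptions for illustration; the paper's exact scoring may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def log_mean_exp(xs: Sequence[float]) -> float:
    """log(mean(exp(x))) without overflow: an approximate log marginal likelihood."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def ensemble_rerank(candidates: List[str], segmentations: List[str],
                    log_prob: Callable[[str, str], float]) -> Tuple[str, Dict[str, float]]:
    """Score each candidate by averaging its probability over input segmentations.

    `log_prob(y, s)` is a hypothetical callback returning log p(y | segmentation s).
    """
    scores = {y: log_mean_exp([log_prob(y, s) for s in segmentations])
              for y in candidates}
    return max(candidates, key=lambda y: scores[y]), scores
```

Only one trained model is queried, once per segmentation, which is why the abstract can describe this as an ensemble with no extra training cost.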

arXiv:2112.07538

doi 10.1103/PhysRevE.105.L052301

Herd Immunity and Epidemic Size in Networks with Vaccination Homophily

Authors: Takayuki Hiraoka, Abbas K. Rizi, Mikko Kivelä, Jari Saramäki

Abstract: We study how the herd immunity threshold and the expected epidemic size depend on homophily with respect to vaccine adoption. We find that the presence of homophily considerably increases the critical vaccine coverage needed for herd immunity and that strong homophily can push the threshold entirely out of reach. The epidemic size monotonically increases as a function of homophily strength for a perfect vaccine, while it is maximized at a nontrivial level of homophily when the vaccine efficacy is limited. Our results highlight the importance of vaccination homophily in epidemic modeling.

Submitted 28 March, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 12 pages, 9 figures

arXiv:2110.02034

Dropout Q-Functions for Doubly Efficient Reinforcement Learning

Authors: Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, Takashi Onishi, Yoshimasa Tsuruoka

Abstract: Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose DroQ, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connection and layer normalization. Despite its simplicity of implementation, our experimental results indicate that DroQ is doubly (sample and computationally) efficient. It achieved sample efficiency comparable to REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to SAC.

Submitted 16 March, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: ICLR 2022. Source code: https://github.com/TakuyaHiraoka/Dropout-Q-Functions-for-Doubly-Efficient-Reinforcement-Learning Poster: https://drive.google.com/file/d/1_JSuwlUsMjzo6zRaAIcXXj3__AmOvu2t/view?usp=sharing Slides: https://drive.google.com/file/d/1ecq9SQ2KSNpfeblCkr6TYPz5gRk_Y4S8/view?usp=sharing
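A dropout Q-function (a plain Q-function with dropout and layer normalization) can be sketched in pure Python. The layer order (linear, dropout, layer normalization, ReLU), the omission of learnable layer-norm gain and bias, the ensemble of two Q-functions combined by a minimum, and all names here (`init_q`, `dropout_q`, `ensemble_min`) are assumptions made for this sketch, not taken from the released source code.

```python
import math
import random
from typing import Any, Dict, List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def linear(x: List[float], layer: Layer) -> List[float]:
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def dropout(x: List[float], rate: float, rng: random.Random, training: bool) -> List[float]:
    """Inverted dropout: zero units with probability `rate` and rescale survivors."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize to zero mean and unit variance (no learnable gain/bias here)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def init_q(in_dim: int, hidden: int, rng: random.Random) -> Dict[str, Any]:
    """Hypothetical random initialization: two hidden layers and a scalar output."""
    def layer(n_in: int, n_out: int) -> Layer:
        return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
    return {"hidden": [layer(in_dim, hidden), layer(hidden, hidden)], "out": layer(hidden, 1)}


def dropout_q(sa: List[float], params: Dict[str, Any], rate: float,
              rng: random.Random, training: bool = True) -> float:
    """Q(s, a) for a concatenated state-action vector `sa`."""
    h = sa
    for layer in params["hidden"]:
        h = linear(h, layer)
        h = dropout(h, rate, rng, training)
        h = layer_norm(h)
        h = [max(0.0, v) for v in h]  # ReLU
    return linear(h, params["out"])[0]


def ensemble_min(sa: List[float], ensemble: List[Dict[str, Any]], rate: float,
                 rng: random.Random, training: bool = True) -> float:
    """Pessimistic value estimate: minimum over the small ensemble."""
    return min(dropout_q(sa, p, rate, rng, training) for p in ensemble)
```

The point of the design, as the abstract frames it, is that dropout supplies the randomness a large REDQ ensemble would otherwise provide, so only a couple of networks need to be evaluated per update.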

arXiv:2110.00821

doi 10.12792/jjiiae.6.2.95

Relation Analysis between Hotel Review Rating Scores and Sentiment Analysis of Reviews by Chinese Tourists Visiting Japan

Authors: Elisa Claire Alemán Carreón, Hirofumi Nonaka, Toru Hiraoka

Abstract: The importance of online hotel review sites has become increasingly apparent. Users' reading of reviews on these sites strongly influences their purchase behavior, and as such, reviews are important to companies and researchers alike. The majority of review sites offer both text reviews and numerical hotel ratings, and both information sources are widely used by researchers as a representation of a customer's sentiment and opinion. However, an opinion is a difficult concept to measure, and the relation between these two sources determines whether it is safe to consider them equally in research. In this study, we utilize an entropy-based Support Vector Machine to classify positive and negative sentiments in hotel reviews from the site Ctrip, then calculate the ratio of positive and negative sentiment in each review and examine its correlation with that review's rating score using the Spearman and Kendall correlation coefficients and the Maximal Information Coefficient (MIC).

Submitted 2 October, 2021; originally announced October 2021.

Comments: Translation of the original in Japanese

Journal ref: The Japanese Journal of the Institute of Industrial Applications Engineers (JJIIAE), 2018, Vol. 6, No. 2. pp. 95-99
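The correlation step in the Ctrip study can be illustrated with a minimal, dependency-free sketch of Spearman's rho and Kendall's tau between a per-review sentiment ratio and its rating score. Assumptions: the function names are hypothetical, the tau-b variant (which corrects for the many ties typical of star ratings) is chosen here for illustration because the abstract does not say which Kendall variant was used, and MIC is not covered.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_b(xs, ys):
    """Kendall's tau-b, which corrects the denominator for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(xs)), 2):
        dx, dy = xs[i] - xs[j], ys[i] - ys[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded entirely
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + tied_x_only) * (pairs + tied_y_only))
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical data: positive-sentiment ratio and star rating per review.
    positive_ratio = [0.9, 0.7, 0.4, 0.8, 0.2]
    rating = [5, 4, 2, 4, 1]
    print(spearman_rho(positive_ratio, rating))
    print(kendall_tau_b(positive_ratio, rating))
```

Both coefficients depend only on ranks, which suits ordinal star ratings better than a plain Pearson correlation would.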

arXiv:2109.12864 [pdf, other]

doi 10.7566/JPSJ.90.123703

Strongly electron-correlated semimetal RuI$_3$ with a layered honeycomb structure

Authors: Kazuhiro Nawa, Yoshinori Imai, Youhei Yamaji, Hideyuki Fujihara, Wakana Yamada, Ryotaro Takahashi, Takumi Hiraoka, Masato Hagihala, Shuki Torii, Takuya Aoyama, Takamasa Ohashi, Yasuhiro Shimizu, Hirotada Gotou, Masayuki Itoh, Kenya Ohgushi, Taku J Sato

Abstract: A polymorph of RuI$_3$ synthesized under high pressure was found to have a two-layered honeycomb structure. The resistivity of RuI$_3$ exhibits a semimetallic behavior, in contrast to insulating properties in $α$-RuCl$_3$. In addition, Pauli paramagnetic behavior was observed in the temperature dependence of a magnetic susceptibility and a nuclear spin-lattice relaxation rate 1/$T_1$. The band structure calculations indicate that contribution of the I 5$p$ components to the low-energy $t_\mathrm{2g}$ bands effectively decreases Coulomb repulsion, leading to semimetallic properties. The physical properties also suggest strong electron correlations in RuI$_3$.

Submitted 29 September, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: 4 Figures

Journal ref: J. Phys. Soc. Jpn. 90, 123703 (2021)

arXiv:2107.14681 [pdf, other]

doi 10.1007/s40558-021-00203-8

Differences in Chinese and Western tourists faced with Japanese hospitality: A natural language processing approach

Authors: Elisa Claire Alemán Carreón, Hugo Alberto Mendoza España, Hirofumi Nonaka, Toru Hiraoka

Abstract: Since culture influences expectations, perceptions, and satisfaction, a cross-culture study is necessary to understand the differences between Japan's biggest tourist populations, Chinese and Western tourists. However, with ever-increasing customer populations, this is hard to accomplish without extensive customer base studies. There is a need for an automated method for identifying these expectations at a large scale. For this, we used a data-driven approach to our analysis. Our study analyzed their satisfaction factors comparing soft attributes, such as service, with hard attributes, such as location and facilities, and studied different price ranges. We collected hotel reviews and extracted keywords to classify the sentiment of sentences with an SVC. We then used dependency parsing and part-of-speech tagging to extract nouns tied to positive adjectives. We found that Chinese tourists consider room quality more than hospitality, whereas Westerners are delighted more by staff behavior. Furthermore, the lack of a Chinese-friendly environment for Chinese customers and cigarette smell for Western ones can be disappointing factors of their stay. As one of the first studies in the tourism field to use the high-standard Japanese hospitality environment for this analysis, our cross-cultural study contributes to both the theoretical understanding of satisfaction and suggests practical applications and strategies for hotel managers.

Submitted 30 July, 2021; originally announced July 2021.

Comments: Published Online at: https://link.springer.com/article/10.1007%2Fs40558-021-00203-8

Journal ref: 2021. Information Technology & Tourism, Vol. 23, No. 3, pp. 281-438

arXiv:2107.01763 [pdf]

Exploration of increasing drivers trust in a semi-autonomous vehicle through real time visualizations of collaborative driving dynamic

Authors: A. Koegel, C. Furet, T. Suzuki, Y. Klebanov, J. Hu, T. Kappeler, D. Okazaki, K. Matsui, T. Hiraoka, K. Shimono, K. Nakano, K. Honma, M. Pennington

Abstract: The Thinking Wave is an ongoing development of visualization concepts showing the real-time effort and confidence of semi-autonomous vehicle (AV) systems. Offering drivers access to this information can inform their decision making and enable them to handle the situation accordingly and take over when necessary. Two different visualizations have been designed. Concept one, Tidal, demonstrates the AV system's effort through intensified activity of a simple graphic that fluctuates in speed and frequency. Concept two, Tandem, displays the effort of the AV system as well as the handling dynamic and shared responsibility between the driver and the vehicle system. Working collaboratively with mobility research teams at the University of Tokyo, we are prototyping and refining the Thinking Wave and its embodiments as we work towards building a testable version integrated into a driving simulator. The development of the Thinking Wave aims to calibrate trust by increasing the driver's knowledge and understanding of the vehicle's handling capacity. By enabling transparent communication of the AV system's capacity, we hope to empower AV-skeptic drivers and keep over-trusting drivers alert in the case of an emergency takeover situation, in order to create a safer autonomous driving experience.

Submitted 4 July, 2021; originally announced July 2021.

Comments: 8 pages, 11 figures, 2021 IEEE Intelligent Vehicles Symposium (IV21)

arXiv:2106.16106 [pdf]

How can design help enhance trust calibration in public autonomous vehicles?

Authors: Yuri Klebanov, Romi Mikulinsky, Tom Reznikov, Miles Pennington, Yoshihiro Suda, Toshihiro Hiraoka, Shoichi Kanzaki

Abstract: Trust is a multilayered concept with critical relevance when it comes to introducing new technologies. Understanding how humans will interact with complex vehicle systems and preparing for the functional, societal and psychological aspects of autonomous vehicles' entry into our cities is a pressing concern. Design tools can help calibrate the adequate and affordable level of trust needed for a safe and positive experience. This study focuses on passenger interactions capable of enhancing the system trustworthiness and data accuracy in future shared public transportation.

Submitted 30 June, 2021; originally announced June 2021.

Comments: 4 pages, 5 figures, IV 2021 Nagoya, Trust Calibration Workshop

arXiv:2105.12410 [pdf, other]

Joint Optimization of Tokenization and Downstream Model

Authors: Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki

Abstract: Since traditional tokenizers are isolated from a downstream task and model, they cannot output an appropriate tokenization depending on the task and model, although recent studies imply that the appropriate tokenization improves the performance. In this paper, we propose a novel method to find an appropriate tokenization to a given downstream model by jointly optimizing a tokenizer and the model. The proposed method has no restriction except for using loss values computed by the downstream model to train the tokenizer, and thus, we can apply the proposed method to any NLP task. Moreover, the proposed method can be used to explore the appropriate tokenization for an already trained model as post-processing. Therefore, the proposed method is applicable to various situations. We evaluated whether our method contributes to improving performance on text classification in three languages and machine translation in eight language pairs. Experimental results show that our proposed method improves the performance by determining appropriate tokenizations.

Submitted 26 May, 2021; originally announced May 2021.

Comments: Accepted at ACL-IJCNLP 2021 Findings

arXiv:2105.11562 [pdf, other]

doi 10.1371/journal.pcbi.1009974

Adaptive and optimized COVID-19 vaccination strategies across geographical regions and age groups

Authors: Jeta Molla, Alejandro Ponce de León Chávez, Takayuki Hiraoka, Tapio Ala-Nissila, Mikko Kivelä, Lasse Leskelä

Abstract: We evaluate the efficiency of various heuristic strategies for allocating vaccines against COVID-19 and compare them to strategies found using optimal control theory. Our approach is based on a mathematical model which tracks the spread of disease among different age groups and across different geographical regions, and we introduce a method to combine age-specific contact data with geographical movement data. As a case study, we model the epidemic in the population of mainland Finland utilizing mobility data from a major telecom operator. Our approach allows us to determine which geographical regions and age groups should be targeted first in order to minimize the number of deaths. In the scenarios that we test, we find that distributing vaccines demographically and in an age-descending order is not optimal for minimizing deaths and the burden of disease. Instead, more lives could potentially be saved by using strategies which emphasize high-incidence regions and distribute vaccines in parallel to multiple age groups. The level of emphasis that high-incidence regions should be given depends on the overall transmission rate in the population. This observation highlights the importance of updating the vaccination strategy when the effective reproduction number changes due to the general contact patterns changing and new virus variants entering.

Submitted 3 December, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

Comments: Revision

arXiv:2101.04828 [pdf]

doi 10.1063/5.0033459

Injection Locking and Noise Reduction of Resonant Tunneling Diode Terahertz Oscillator

Authors: Tomoki Hiraoka, Takashi Arikawa, Hiroaki Yasuda, Yuta Inose, Norihiko Sekine, Iwao Hosako, Hiroshi Ito, Koichiro Tanaka

Abstract: We studied the injection-locking properties of a resonant-tunneling-diode terahertz oscillator in the small-signal injection regime with a frequency-stabilized continuous THz wave. As a result of the injection locking, the linewidth of the emission spectrum dramatically decreased from 4.4 MHz in the free-running state to less than 120 mHz (HWHM). We experimentally determined the amplitude of the injection voltage at the antenna caused by the injected THz wave. The locking range was proportional to the injection amplitude and consistent with Adler's model. As the injection amplitude increased, we observed a decrease in the noise component of the power spectrum, which represents the free-running state, and a corresponding increase in the injection-locked component. The noise component and the injection-locked component had the same power at a threshold injection amplitude as small as $5\times10^{-4}$ of the oscillation amplitude. This threshold behavior can be qualitatively explained by Maffezzoni's model of noise reduction in general limit-cycle oscillators.

Submitted 12 January, 2021; originally announced January 2021.

Comments: The following article has been submitted to APL Photonics

Journal ref: APL Photonics 6, 021301 (2021)
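For reference, the linear dependence of the locking range on injection amplitude mentioned above follows from the textbook form of Adler's locking condition. This is stated from general knowledge rather than taken from the paper, whose prefactor conventions may differ. With $f_0$ the free-running frequency, $f_\mathrm{inj}$ the injected frequency, $Q$ the quality factor of the resonator, $V_\mathrm{inj}$ the injection amplitude and $V_\mathrm{osc}$ the oscillation amplitude, the oscillator locks when

$$|f_\mathrm{inj} - f_0| \le \frac{f_0}{2Q}\,\frac{V_\mathrm{inj}}{V_\mathrm{osc}}, \qquad \Delta f_\mathrm{lock} = \frac{f_0}{Q}\,\frac{V_\mathrm{inj}}{V_\mathrm{osc}},$$

so at fixed $f_0$, $Q$ and $V_\mathrm{osc}$, the full locking range $\Delta f_\mathrm{lock}$ grows in direct proportion to $V_\mathrm{inj}$.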

arXiv:2101.01883 [pdf, other]

Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces

Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

Abstract: Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for a new task to be solved. However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse the data collected by past policies, which hinders the improvement of sample efficiency. To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE). An ELUE agent is characterized by the learning of a feature embedding space shared among tasks. It learns a belief model over the embedding space and a belief-conditional policy and Q-function. Then, for a new task, it collects data by the pretrained policy, and updates its belief based on the belief model. Thanks to the belief update, the performance can be improved with a small amount of data. In addition, it updates the parameters of the neural networks to adjust the pretrained relationships when there are enough data. We demonstrate that ELUE outperforms state-of-the-art meta RL methods through experiments on meta-RL benchmarks.

Submitted 6 January, 2021; originally announced January 2021.

Comments: 14 pages

arXiv:2011.01562 [pdf, other]

doi 10.1103/PhysRevE.104.014312

Individual-driven versus interaction-driven burstiness in human dynamics: The case of Wikipedia edit history

Authors: Jeehye Choi, Takayuki Hiraoka, Hang-Hyun Jo

Abstract: The origin of non-Poissonian or bursty temporal patterns observed in various datasets for human social dynamics has been extensively studied, yet its understanding still remains incomplete. Considering the fact that humans are social beings, a fundamental question arises: Is the bursty human dynamics dominated by individual characteristics or by interaction between individuals? In this paper we address this question by analyzing the Wikipedia edit history to see how spontaneous individual editors are in initiating bursty periods of editing, i.e., individual-driven burstiness, and to what extent such editors' behaviors are driven by interaction with other editors in those periods, i.e., interaction-driven burstiness. We quantify the degree of initiative (DoI) of an editor of interest in each Wikipedia article by using the statistics of bursty periods containing the editor's edits. The integrated value of the DoI over all relevant timescales reveals which is dominant between individual-driven and interaction-driven burstiness. We empirically find that this value tends to be larger for weaker temporal correlations in the editor's editing behavior and/or stronger editorial correlations. These empirical findings are successfully confirmed by deriving an analytic form of the DoI from a model capturing the essential features of the edit sequence. Thus our approach provides a deeper insight into the origin and underlying mechanisms of bursts in human social dynamics.

Submitted 27 July, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: 12 pages, 5 figures

Journal ref: Phys. Rev. E 104, 014312 (2021)

arXiv:2010.07522 [pdf, other]

Named Entity Recognition and Relation Extraction using Enhanced Table Filling by Contextualized Representations

Authors: Youmi Ma, Tatsuya Hiraoka, Naoaki Okazaki

Abstract: In this study, a novel method for extracting named entities and relations from unstructured text based on the table representation is presented. By using contextualized word embeddings, the proposed method computes representations for entity mentions and long-range dependencies without complicated hand-crafted features or neural-network architectures. We also adapt a tensor dot-product to predict relation labels all at once without resorting to history-based predictions or search strategies. These advances significantly simplify the model and algorithm for the extraction of named entities and relations. Despite its simplicity, the experimental results demonstrate that the proposed method outperforms the state-of-the-art methods on the CoNLL04 and ACE05 English datasets. We also confirm that the proposed method achieves a comparable performance with the state-of-the-art NER models on the ACE05 datasets when multiple sentences are provided for context aggregation.

Submitted 26 January, 2022; v1 submitted 15 October, 2020; originally announced October 2020.

Comments: An extended version of this paper has been accepted at Journal of Natural Language Processing
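The idea of predicting every relation label "all at once" with a tensor dot-product can be sketched as a single contraction that scores every cell of the token-by-token table. This is only one plausible reading, assuming a bilinear score $h_i^\top W_r h_j$ per relation label $r$; the paper's exact parameterization may differ, and `relation_scores` / `predict_labels` are hypothetical names.

```python
import numpy as np


def relation_scores(token_reprs, relation_weights):
    """Score every (i, j, r) table cell with one tensor contraction.

    token_reprs:      (n, d) contextualized token representations.
    relation_weights: (R, d, d), one bilinear matrix per relation label.
    Returns an (n, n, R) array with scores[i, j, r] = h_i^T W_r h_j.
    """
    return np.einsum("id,rde,je->ijr", token_reprs, relation_weights, token_reprs)


def predict_labels(token_reprs, relation_weights):
    # Argmax over labels for all cells simultaneously: no left-to-right
    # history-based decoding and no search over partial tables.
    return relation_scores(token_reprs, relation_weights).argmax(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(5, 8))     # 5 tokens, hypothetical 8-dim embeddings
    w = rng.normal(size=(3, 8, 8))  # 3 hypothetical relation labels
    print(predict_labels(h, w).shape)  # (5, 5)
```

Because the whole table comes out of one `einsum` call, decoding reduces to an argmax, which is the simplification the abstract emphasizes.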

arXiv:2006.02608 [pdf, ps, other]

Meta-Model-Based Meta-Policy Optimization

Authors: Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

Abstract: Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.

Submitted 11 October, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: ACML 2021. Video demo: https://drive.google.com/file/d/1DRA-pmIWnHGNv5G_gFrml8YzKCtMcGnu/view?usp=sharing Source code: https://github.com/TakuyaHiraoka/Meta-Model-Based-Meta-Policy-Optimization

arXiv:2004.14081 [pdf, other]

doi 10.22191/nejcs/vol2/iss1/1

Waiting-time paradox in 1922

Authors: Naoki Masuda, Takayuki Hiraoka

Abstract: We present an English translation and discussion of an essay that a Japanese physicist, Torahiko Terada, wrote in 1922. In the essay, he described the waiting-time paradox, also called the bus paradox, which is a known mathematical phenomenon in queuing theory, stochastic processes, and modern temporal network analysis. He also observed and analyzed data on Tokyo City trams to verify the relevance of the waiting-time paradox to busy passengers in Tokyo at the time. This essay seems to be one of the earliest documentations of the waiting-time paradox in a sufficiently scientific manner.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 5 figures, 2 tables

Journal ref: Northeast Journal of Complex Systems, 2(1), 1 (2020)
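The waiting-time paradox can be made concrete with a short calculation. A passenger who arrives at a uniformly random time falls into a gap of length $T_i$ with probability $T_i / \sum_j T_j$ and then waits $T_i / 2$ on average, so the mean wait is $\sum_i T_i^2 / (2 \sum_i T_i)$. This is never less than half the mean headway, and equals it only when all gaps are equal. The sketch below assumes a fixed, repeating list of headways (the numbers are hypothetical, not Terada's tram data).

```python
def mean_wait_random_arrival(headways):
    """Mean wait for a passenger arriving at a uniformly random time.

    Longer gaps are proportionally more likely to be hit, so each gap T
    contributes with weight T / sum(T) and an average wait of T / 2.
    """
    total = sum(headways)
    return sum(t * t for t in headways) / (2 * total)


def naive_wait(headways):
    """Half the mean headway: the intuitive but biased estimate."""
    return sum(headways) / len(headways) / 2


if __name__ == "__main__":
    # Hypothetical tram headways in minutes: bunched into a 2-min and an 8-min gap.
    gaps = [2, 8]
    print(mean_wait_random_arrival(gaps))  # 3.4
    print(naive_wait(gaps))                # 2.5
```

Bunching of trams (irregular headways) therefore lengthens the average wait even when the mean headway is unchanged.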

arXiv:1912.11212 [pdf, other]

doi 10.1103/PhysRevResearch.2.023073

Modeling temporal networks with bursty activity patterns of nodes and links

Authors: Takayuki Hiraoka, Naoki Masuda, Aming Li, Hang-Hyun Jo

Abstract: The concept of temporal networks provides a framework to understand how the interaction between system components changes over time. In empirical communication data, we often detect non-Poissonian, so-called bursty behavior in the activity of nodes as well as in the interaction between nodes. However, such reconciliation between node burstiness and link burstiness cannot be explained if the interaction processes on different links are independent of each other. This is because the activity of a node is the superposition of the interaction processes on the links incident to the node and the superposition of independent bursty point processes is not bursty in general. Here we introduce a temporal network model based on bursty node activation and show that it leads to heavy-tailed inter-event time distributions for both node dynamics and link dynamics. Our analysis indicates that activation processes intrinsic to nodes give rise to dynamical correlations across links. Our framework offers a way to model competition and correlation between links, which is key to understanding dynamical processes in various systems.

Submitted 24 December, 2019; originally announced December 2019.

Comments: 9 pages, 5 figures

Journal ref: Phys. Rev. Research 2, 023073 (2020)

arXiv:1907.13556 [pdf, other]

doi 10.1038/s41598-020-68157-1

Burst-tree decomposition of time series reveals the structure of temporal correlations

Authors: Hang-Hyun Jo, Takayuki Hiraoka, Mikko Kivelä

Abstract: Comprehensive characterization of non-Poissonian, bursty temporal patterns observed in various natural and social processes is crucial to understand the underlying mechanisms behind such temporal patterns. Among them, bursty event sequences have been studied mostly in terms of interevent times (IETs), while the higher-order correlation structure between IETs has gained very little attention due to the lack of a proper characterization method. In this paper we propose a method of decomposing an event sequence into a set of IETs and a burst tree, which exactly captures the structure of temporal correlations that is entirely missing in the analysis of IET distributions. We apply the burst-tree decomposition method to various datasets and analyze the structure of the revealed burst trees. In particular, we observe that event sequences show similar burst-tree structure, such as heavy-tailed burst size distributions, despite very different IET distributions. The burst trees allow us to directly characterize the preferential and assortative mixing structure of bursts responsible for the higher-order temporal correlations. We also show how to use the decomposition method for the systematic investigation of such higher-order correlations captured by the burst trees in the framework of randomized reference models. Finally, we devise a simple kernel-based model for generating event sequences showing appropriate higher-order temporal correlations. Our method is a tool to make the otherwise overwhelming analysis of higher-order correlations in bursty time series tractable by turning it into the analysis of a tree structure.

Submitted 31 July, 2019; originally announced July 2019.

Comments: 10 pages, 4 figures

Journal ref: Scientific Reports 10, 12202 (2020)
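A minimal sketch of how a burst tree can be built from an event sequence, assuming the construction merges adjacent bursts in order of increasing inter-event time (IET), so each IET becomes one internal node joining the two bursts it separates. Assumptions not taken from the abstract: ties between equal IETs are broken by taking the earlier gap first, which yields a strictly binary tree, and `burst_tree` is a hypothetical name. Together with the sorted IETs, the tree records which events cluster together at every timescale.

```python
def burst_tree(timestamps):
    """Build a binary burst tree from sorted event timestamps.

    Leaves are event indices; each internal node is a (left, right) tuple
    created when the smallest remaining IET merges two adjacent bursts.
    """
    if not timestamps:
        return None
    nodes = list(range(len(timestamps)))  # current bursts, left to right
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]  # gaps[k] sits between nodes[k] and nodes[k+1]
    while gaps:
        # min() returns the first index on ties, i.e. the earlier gap.
        k = min(range(len(gaps)), key=gaps.__getitem__)
        nodes[k] = (nodes[k], nodes[k + 1])
        del nodes[k + 1]
        del gaps[k]
    return nodes[0]


if __name__ == "__main__":
    # Hypothetical event times: IETs are 1, 2 and 7.
    print(burst_tree([0, 1, 3, 10]))  # (((0, 1), 2), 3)
```

The quadratic loop keeps the sketch short; a union-find over gap indices sorted once would scale better for long sequences.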

arXiv:1907.12558 [pdf, other]

Bursty time series analysis for temporal networks

Authors: Hang-Hyun Jo, Takayuki Hiraoka

Abstract: Characterizing bursty temporal interaction patterns of temporal networks is crucial to investigate the evolution of temporal networks as well as various collective dynamics taking place in them. The temporal interaction patterns have been described by a series of interaction events or event sequences, often showing non-Poissonian or bursty nature. Such bursty event sequences can be understood not only by heterogeneous interevent times (IETs) but also by correlations between IETs. The heterogeneities of IETs have been extensively studied in recent years, while the correlations between IETs are far from being fully explored. In this Chapter, we introduce various measures for bursty time series analysis, such as the IET distribution, the burstiness parameter, the memory coefficient, the bursty train sizes, and the autocorrelation function, to discuss the relation between those measures. Then we show that the correlations between IETs can affect the speed of spreading taking place in temporal networks. Finally, we discuss possible research topics regarding bursty time series analysis for temporal networks.

Submitted 28 July, 2019; originally announced July 2019.

Comments: 9 pages, 6 figures, a chapter to appear in Temporal Network Theory edited by P. Holme and J. Saramaki (https://www.springer.com/gp/book/9783030234942). arXiv admin note: text overlap with arXiv:1807.03169
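Two of the measures listed in this chapter abstract can be sketched directly from a list of IETs. The sketch assumes the commonly used definitions: the burstiness parameter $B = (\sigma - \mu)/(\sigma + \mu)$, with $\mu$ and $\sigma$ the mean and standard deviation of the IETs, and the memory coefficient $M$ as the Pearson correlation between consecutive IETs $(\tau_i, \tau_{i+1})$. The chapter may use finite-size corrected variants, and the function names are illustrative.

```python
from statistics import mean, pstdev


def burstiness(iets):
    """B = (sigma - mu) / (sigma + mu).

    B = -1 for a perfectly regular sequence, B is about 0 for a Poisson
    process, and B approaches 1 for extremely bursty sequences.
    """
    mu, sigma = mean(iets), pstdev(iets)
    return (sigma - mu) / (sigma + mu)


def memory_coefficient(iets):
    """Pearson correlation between consecutive IETs (tau_i, tau_{i+1}).

    Undefined (division by zero) if either shifted series is constant.
    """
    first, second = iets[:-1], iets[1:]
    m1, m2 = mean(first), mean(second)
    s1, s2 = pstdev(first), pstdev(second)
    n = len(first)  # number of consecutive pairs
    return sum((a - m1) * (b - m2) for a, b in zip(first, second)) / (n * s1 * s2)


if __name__ == "__main__":
    print(burstiness([3, 3, 3]))                # -1.0: periodic events
    print(memory_coefficient([1, 3, 1, 3, 1]))  # -1.0: short/long alternation
```

$B$ only sees the IET distribution, whereas $M$ captures correlations between successive IETs, which is why the chapter treats them as complementary.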

arXiv:1906.11075 [pdf, other]

Optimistic Proximal Policy Optimization

Authors: Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

Abstract: Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where rewards are rare. We propose a method, optimistic proximal policy optimization (OPPO) to alleviate this difficulty. OPPO considers the uncertainty of the estimated total return and optimistically evaluates the policy based on that amount. We show that OPPO outperforms the existing methods in a tabular task.

Submitted 25 June, 2019; originally announced June 2019.

Comments: Exploration in RL (workshop @ ICML2019)

arXiv:1906.03831 [pdf, other]

Explicit behaviors affected by driver's trust in a driving automation system

Authors: Hailong Liu, Toshihiro Hiraoka, Seiya Tanaka

Abstract: As driving automation systems (DAS) become commonly used in vehicles, over-trust in a DAS may put the driver at risk. To prevent over-trust while driving, the driver's trust state should be recognized. However, the variables that describe the trust state are not distinct. This paper assumes that a driver's outward expressions can represent his or her trust state, and treats the explicit behaviors shown while driving with a DAS as those outward expressions. In the experiment, a driving simulator with a driver monitoring system was used to simulate a vehicle with adaptive cruise control (ACC) and to observe the driver's motion information. Results show that if the driver completely trusted the ACC, then 1) the participants were likely to place their feet far away from the pedals, and 2) the driver's operational intervention was delayed in dangerous situations. In future work, we will try to predict the trust state with a machine learning model that uses the driver's motion information.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 6 pages, 9 figures, accepted by the 5th International Symposium on Future Active Safety Technology toward Zero Accidents (FAST-zero-19)

arXiv:1905.09191 [pdf, ps, other]

Learning Robust Options by Conditional Value at Risk Optimization

Authors: Takuya Hiraoka, Takahisa Imagawa, Tatsuya Mori, Takashi Onishi, Yoshimasa Tsuruoka

Abstract: Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces options that do not work well in the unconsidered case. In this paper, we propose a conditional value at risk (CVaR)-based method to learn options that work well in both the average and worst cases. We extend the CVaR-based policy gradient method proposed by Chow and Ghavamzadeh (2014) to deal with robust Markov decision processes and then apply the extended method to learning robust options. We conduct experiments to evaluate our method in multi-joint robot control tasks (HopperIceBlock, Half-Cheetah, and Walker2D). Experimental results show that our method produces options that 1) give better worst-case performance than the options learned only to minimize the average-case loss, and 2) give better average-case performance than the options learned only to minimize the worst-case loss.

Submitted 31 October, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019. Video demo: https://drive.google.com/open?id=1xXgSeEa_nNG397ZkIayk3CwYPy_BPy8X Source codes: https://github.com/TakuyaHiraoka/Learning-Robust-Options-by-Conditional-Value-at-Risk-Optimization
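For readers unfamiliar with CVaR, the quantity being traded off between the average and worst cases can be illustrated with a generic empirical estimator. This is a minimal sketch, not the authors' CVaR policy-gradient method: it assumes a sample of episode losses (for example, one per sampled model parameter) and returns the mean of the worst alpha-fraction of them. The function name `empirical_cvar` is hypothetical, and the ceiling rule for the tail size is one of several conventions.

```python
import math


def empirical_cvar(losses, alpha):
    """Mean of the worst (largest) alpha-fraction of sampled losses.

    alpha = 1 gives the ordinary average; small alpha approaches the worst case.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not losses:
        raise ValueError("need at least one loss sample")
    worst_first = sorted(losses, reverse=True)
    # Number of tail samples to average; at least one, so the worst case is kept.
    k = max(1, math.ceil(alpha * len(worst_first)))
    return sum(worst_first[:k]) / k


losses = [1.0, 2.0, 3.0, 4.0, 10.0]
print(empirical_cvar(losses, 1.0))  # 4.0, the average-case loss
print(empirical_cvar(losses, 0.2))  # 10.0, the worst-case loss
print(empirical_cvar(losses, 0.4))  # 7.0, mean of the two worst losses
```

With alpha = 1 the estimate equals the average-case loss, and as alpha shrinks it approaches the worst-case loss, which is why a CVaR objective can sit between the two objectives contrasted in the abstract.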

arXiv:1905.05601 [pdf, other]

doi 10.1016/j.ifacol.2019.12.073

Saliency difference based objective evaluation method for a superimposed screen of the HUD with various background

Authors: Hailong Liu, Toshihiro Hiraoka, Takatsugu Hirayama, Dongmin Kim

Abstract: The head-up display (HUD) is an emerging device that can project information onto a transparent screen. The HUD has been used in airplanes and vehicles, and it is usually placed in front of the operator's view. In a vehicle, the driver can see not only the information on the HUD but also the background (driving environment) through it. However, because the HUD is transparent, the projected information may interfere with colors in the background. For example, a red message on the HUD will be less noticeable when it overlaps the red brake light of the vehicle ahead. As a first step toward solving this issue, it is important to evaluate the mutual interference between the information on the HUD and the background. Therefore, this paper proposes a method to evaluate the mutual interference based on saliency: the HUD region cut from a saliency map of a measured image is compared with the HUD image.

Submitted 13 May, 2019; originally announced May 2019.

Comments: 10 pages, 5 figures, 1 table, accepted by IFAC-HMS 2019

arXiv:1905.01314 [pdf]

doi 10.1126/sciadv.aay1977

Selective excitation of multipolar spoof plasmons using orbital angular momentum of light

Authors: Takashi Arikawa, Tomoki Hiraoka, Shohei Morimoto, Francois Blanchard, Shuntaro Tani, Tomoko Tanaka, Kyosuke Sakai, Hiroki Kitajima, Keiji Sasaki, Koichiro Tanaka

Abstract: The nature of light-matter interaction is governed by the spatio-temporal structures of a light field and material wavefunctions. The emergence of light beams with a transverse phase vortex, or equivalently orbital angular momentum (OAM), has opened intriguing possibilities to induce unconventional optical transitions beyond the framework of the electric dipole interaction. This uniqueness stems from the transfer of OAM from light to matter, as demonstrated using the bound electron of a single trapped ion. However, many aspects of vortex light-matter interaction remain unexplored, especially in solids with extended electronic states. Here, we unambiguously visualized dipole-forbidden multipolar excitations in a solid-state electron system, namely spoof localized surface plasmons, selectively induced by a terahertz vortex beam. The results obey the selection rules governed by the conservation of total angular momentum, which is numerically confirmed by electromagnetic field analysis. Our results show that light's OAM can be efficiently transferred to an elementary excitation in solids.

Submitted 3 May, 2019; originally announced May 2019.

Journal ref: Science Advances 6, eaay1977 (2020)

arXiv:1905.00185 [pdf, other]

doi 10.5954/ICAROB.2018.OS5-3

Emotional Contribution Analysis of Online Reviews

Authors: Elisa Claire Alemán Carreón, Hirofumi Nonaka, Toru Hiraoka, Minoru Kumano, Takao Ito, Masaharu Hirota

Abstract: In response to the constant increase in population and tourism worldwide, there is a need for cross-language market research tools that are more cost- and time-effective than surveys or interviews. Focusing on the Chinese tourism boom and the hotel industry in Japan, we extracted the most influential keywords in emotional judgement from Chinese online reviews of Japanese hotels on the portal site Ctrip. Using an entropy-based mathematical model and a machine learning algorithm, we determined the words that most closely represent the demands and emotions of this customer base.

Submitted 1 May, 2019; originally announced May 2019.

Journal ref: In proceedings of the 2018 International Conference on Artificial Life and Robotics (ICAROB2018). pp. 259 - 362. Beppu, Japan (2018, February 1-4)

arXiv:1904.13214 [pdf, other]

doi 10.6084/m9.figshare.7831853

Analysis of Chinese Tourists in Japan by Text Mining of a Hotel Portal Site

Authors: Elisa Claire Alemán Carreón, Hirofumi Nonaka, Toru Hiraoka

Abstract: With an increasingly large number of Chinese tourists in Japan, the hotel industry needs an affordable market research tool that does not rely on expensive and time-consuming surveys or interviews. Because this problem is real and relevant to the hotel industry in Japan, yet unexplored in other studies, we extracted a list of potential keywords from Chinese reviews of Japanese hotels on the hotel portal site Ctrip using a mathematical model, and then used them in a sentiment analysis with a machine learning classifier. While most studies that use information collected from the Internet rely on pre-existing data analysis tools, in our study we designed the mathematical model to achieve the highest possible classification performance, while also exploring the potential business implications of the results.

Submitted 1 May, 2019; v1 submitted 24 April, 2019; originally announced April 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.11797, arXiv:1904.13213, arXiv:1904.12039

Journal ref: In proceedings of the 18th International Symposium on Advanced Intelligent Systems (ISIS2017), pp. 191 - 198. Daegu, South Korea (2017, October 12)

arXiv:1904.13213 [pdf]

doi 10.6084/m9.figshare.8026778

Topic Classification Method for Analyzing Effect of eWOM on Consumer Game Sales

Authors: Yoshiki Horii, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Hiroki Horino, Toru Hiraoka

Abstract: Electronic word-of-mouth (eWOM) has become an important resource for marketing research. In this study, to analyze user needs for consumer game software, we focus on tweet data and propose a topic extraction method that uses feature expansion based on entropy-based feature selection. We also applied it to classifying data extracted from tweets using an SVM. As a result, we achieved an F-measure of 0.63.

Submitted 23 April, 2019; originally announced April 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.11797, arXiv:1904.12039, arXiv:1904.13214

Journal ref: In proceedings of the Joint 10th International Conference on Soft Computing and Intelligent Systems and 19th International Symposium on Advanced Intelligent Systems in conjunction with Intelligent Systems Workshop 2018 (2018 SCIS-ISIS2018)

arXiv:1904.12986 [pdf]

doi 10.1109/IEEM.2018.8607487

Community Detection and Growth Potential Prediction Using the Stochastic Block Model and the Long Short-term Memory from Patent Citation Networks

Authors: Kensei Nakai, Hirofumi Nonaka, Asahi Hentona, Yuki Kanai, Takeshi Sakumoto, Shotaro Kataoka, Elisa Claire Alemán Carreón, Toru Hiraoka

Abstract: Scoring patent documents is very useful for technology management. However, conventional methods are based on static models and thus do not reflect the growth potential of the technology cluster to which a patent belongs: even if a patent's cluster has no prospect of growth, the patent is regarded as important whenever its PageRank or another ranking score is high. Therefore, it is necessary to develop citation network clustering and prediction of future citations. In our research, patent citation networks were clustered with the Stochastic Block Model (SBM), with the aim of enabling corporate managers and investors to evaluate the scale and life cycle of technologies. As a result, we confirmed that the nested SBM is appropriate for graph clustering of patent citation networks. Also, a high MAPE value was obtained, and the direction accuracy exceeded 50% when predicting the growth potential of each cluster using LSTM.

Submitted 23 April, 2019; originally announced April 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.12040

Journal ref: In Proceedings of the 2018 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM2018). pp. 1884 - 1888. Bangkok, Thailand. December 16-19, 2018

arXiv:1904.12040 [pdf]

doi 10.1145/3281375.3281396

Community Detection and Growth Potential Prediction from Patent Citation Networks

Authors: Asahi Hentona, Takeshi Sakumoto, Hugo Alberto Mendoza España, Hirofumi Nonaka, Shotaro Kataoka, Toru Hiraoka, Kensei Nakai, Elisa Claire Alemán Carreón, Masaharu Hirota

Abstract: The scoring of patents is useful for technology management analysis. Therefore, there is a need to develop citation network clustering and prediction of future citations for practical patent scoring. In this paper, we propose a community detection method using Node2vec. To analyze growth potential, we compare three time series analysis methods: Long Short-Term Memory (LSTM), the ARIMA model, and the Hawkes process. In our experiments, we found common technical points within the clusters obtained by Node2vec. Furthermore, the prediction accuracy of the ARIMA model was higher than that of the other models.

Submitted 23 April, 2019; originally announced April 2019.

Comments: arXiv admin note: text overlap with arXiv:1607.00653 by other authors

Journal ref: In Proceedings of the 10th International Conference on Management of Emergent Digital EcoSystems (MEDES'18). pp. 204 - 211. Tokyo, Japan. September 25-28, 2018

arXiv:1904.12039 [pdf, other]

doi 10.1145/3281375.3281395

Causal relationship between eWOM topics and profit of rural tourism at Japanese Roadside Stations "MICHINOEKI"

Authors: Elisa Claire Alemán Carreón, Tetsuro Ito, Hirofumi Nonaka, Minoru Kumano, Toru Hiraoka, Masaharu Hirota

Abstract: Affected by urbanization, centralization, and overall population decline, Japan has been making efforts to revitalize its rural areas. One particular effort is to increase tourism to these areas via regional branding, using local farm products as tourist attractions across Japan. In particular, a government-subsidized program called Michinoeki, which stands for 'roadside station', was created 20 years ago; it strives to provide a safe and comfortable space for cultural interaction between road travelers and the local community, as well as offering refreshments and relevant information to travelers. However, despite its importance in the revitalization of the Japanese economy, studies using newer technologies and methodologies are lacking. Using sales data from establishments in the Kyushu area of Japan, we used a Support Vector Machine to classify Twitter content into relevant topics and studied the causal relationship of these topics to the sales of each establishment using LiNGAM, a linear non-Gaussian acyclic model built for causal structure analysis, to perform an improved market analysis that considers more than just correlation. Under the hypotheses stated by the LiNGAM model, we discovered a positive causal relationship between sales and the number of tweets mentioning those establishments, especially tweets mentioning desserts, as well as a need for better access and traffic options and a potentially untapped customer base in motorcycle biker groups.

Submitted 1 May, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

Journal ref: In Proceedings of the 10th International Conference on Management of Emergent Digital EcoSystems (MEDES'18). pp. 212 - 218. Tokyo, Japan. September 25-28, 2018

arXiv:1904.11797 [pdf]

doi 10.1109/IEEM.2017.8290312

Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate

Authors: Hiroki Horino, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Toru Hiraoka

Abstract: In recent years, the amount of data posted about real estate on the Internet has been increasing. In this study, to analyze user needs for real estate, we focus on "Mansion Community", a Japanese bulletin board system (hereinafter referred to as BBS) about Japanese real estate. In our study, keywords are extracted based on the entropy value calculated for each word, and we used them as features in a machine learning classifier to analyze 6 million posts on "Mansion Community". As a result, we achieved an F-measure of 0.69 and found that customers are particularly concerned about apartment facilities, access, and price.

Submitted 23 April, 2019; originally announced April 2019.

Journal ref: In proceedings of the 2017 IEEE International Conference on Industrial Engineering & Engineering Management (2017 IEEE IEEM). pp. 2351 - 2355. Singapore, (2017, December, 12)
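Entropy-based keyword extraction of the kind described above can be sketched as follows. This is an illustration under our own assumption, not necessarily the authors' model: each word's entropy is computed over the distribution of its occurrences across class labels, so low-entropy words are concentrated in one class and are kept as classifier features. The function names, labels, and thresholds are hypothetical, and tokens are assumed to be pre-segmented (Japanese text would first need a morphological analyzer).

```python
import math
from collections import Counter, defaultdict


def word_label_entropy(docs):
    """Return {word: (entropy_bits, count)} for each word's label distribution.

    docs is a list of (tokens, label) pairs with tokens already segmented.
    """
    by_word = defaultdict(Counter)
    for tokens, label in docs:
        for tok in tokens:
            by_word[tok][label] += 1
    stats = {}
    for word, label_counts in by_word.items():
        total = sum(label_counts.values())
        # Shannon entropy (in bits) of p(label | word).
        h = -sum((c / total) * math.log2(c / total) for c in label_counts.values())
        stats[word] = (h, total)
    return stats


def select_keywords(docs, max_entropy, min_count=1):
    """Keep frequent, low-entropy words (concentrated in few labels) as features."""
    stats = word_label_entropy(docs)
    chosen = [w for w, (h, n) in stats.items() if h <= max_entropy and n >= min_count]
    return sorted(chosen, key=lambda w: (stats[w][0], w))


docs = [
    (["access", "station"], "pos"),
    (["price", "high"], "neg"),
    (["access", "good"], "pos"),
    (["price", "station"], "neg"),
]
print(select_keywords(docs, max_entropy=0.5, min_count=2))  # ['access', 'price']
```

Here "station" appears equally under both labels (entropy 1 bit) and is dropped, while "access" and "price" each occur in only one class and are kept; the `min_count` filter guards against rare words that have zero entropy merely because they were seen once.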

arXiv:1904.08795 [pdf, other]

doi 10.1103/PhysRevE.100.022307

Copula-based algorithm for generating bursty time series

Authors: Hang-Hyun Jo, Byoung-Hwa Lee, Takayuki Hiraoka, Woo-Sung Jung

Abstract: Dynamical processes in various natural and social phenomena have been described by a series of events or event sequences showing non-Poissonian, bursty temporal patterns. Temporal correlations in such bursty time series can be understood not only by heterogeneous interevent times (IETs) but also by correlations between IETs. Modeling and simulating various dynamical processes requires us to generate event sequences with a heavy-tailed IET distribution and memory effects between IETs. For this, we propose a Farlie-Gumbel-Morgenstern copula-based algorithm for generating event sequences with correlated IETs when the IET distribution and the memory coefficient between two consecutive IETs are given. We successfully apply our algorithm to the cases with heavy-tailed IET distributions. We also compare our algorithm to the existing shuffling method and find that our algorithm outperforms the shuffling method in some cases. Our copula-based algorithm is expected to be used for more realistic modeling of various dynamical processes.

Submitted 14 August, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. E 100, 022307 (2019)
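The conditional-sampling idea behind an FGM copula generator can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: consecutive uniforms are chained through the FGM copula C(u, v) = uv[1 + theta(1 - u)(1 - v)] with theta in [-1, 1], and each uniform is mapped to an IET through a Pareto inverse CDF. The Pareto tail, its parameters `tau_min` and `alpha`, and all function names are hypothetical choices; the mapping from theta to the paper's memory coefficient is not reproduced here.

```python
import math
import random


def fgm_conditional_sample(u, w, theta):
    """Draw v given u from the FGM copula by solving C(v | u) = w.

    For C(u, v) = u*v*(1 + theta*(1-u)*(1-v)), the conditional CDF is
    C(v | u) = v*(1 + a*(1 - v)) with a = theta*(1 - 2u). Solving the
    quadratic a*v**2 - (1 + a)*v + w = 0 for the root in [0, 1] gives the
    numerically stable form below, which also covers a = 0 (v = w).
    """
    if w <= 0.0:
        return 0.0
    a = theta * (1.0 - 2.0 * u)
    disc = (1.0 + a) ** 2 - 4.0 * a * w
    return 2.0 * w / ((1.0 + a) + math.sqrt(disc))


def pareto_iet(u, tau_min, alpha):
    """Map a uniform u in [0, 1) to a Pareto IET (hypothetical heavy tail)."""
    return tau_min * (1.0 - u) ** (-1.0 / alpha)


def generate_iets(n, theta, tau_min=1.0, alpha=1.5, seed=None):
    """Generate n IETs whose consecutive pairs are coupled by an FGM copula."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("FGM requires theta in [-1, 1]")
    if n <= 0:
        return []
    rng = random.Random(seed)
    u = rng.random()
    iets = [pareto_iet(u, tau_min, alpha)]
    for _ in range(n - 1):
        # Each new uniform depends only on the previous one (a Markov chain);
        # its marginal stays uniform because the copula has uniform margins.
        u = fgm_conditional_sample(u, rng.random(), theta)
        iets.append(pareto_iet(u, tau_min, alpha))
    return iets


print(generate_iets(5, theta=0.8, seed=42))
```

Working on uniforms and transforming afterwards keeps the IET distribution exactly the chosen one while the copula alone controls the dependence. One known limitation of the FGM family is that its Spearman correlation equals theta/3, so it can only express weak memory between consecutive IETs.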

arXiv:1812.08996 [pdf, other]

Driving behavior model considering driver's over-trust in driving automation system

Authors: Hailong Liu, Toshihiro Hiraoka

Abstract: Levels one to three of driving automation systems (DAS) are spreading fast. However, as DAS functions become more sophisticated, not only will drivers' driving skills decline, but the problem of over-trust will also become serious. A driver who over-trusts the DAS may not become aware of hazards in time. To prevent driver over-trust in the DAS, this paper discusses the following: 1) the definition of over-trust in the DAS, 2) a hypothesis on the occurrence conditions and process of over-trust in the DAS, and 3) a driving behavior model based on trust in the DAS, the risk homeostasis theory, and an over-trust prevention human-machine interface.

Submitted 19 June, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: 10 pages, 1 table, 4 figures

arXiv:1811.10728 [pdf, other]

Optimization of Information-Seeking Dialogue Strategy for Argumentation-Based Dialogue System

Authors: Hisao Katsumi, Takuya Hiraoka, Koichiro Yoshino, Kazeto Yamamoto, Shota Motoura, Kunihiko Sadamasa, Satoshi Nakamura

Abstract: Argumentation-based dialogue systems, which can handle and exchange arguments through dialogue, have been widely researched. These systems need sufficient supporting information to argue their claims rationally; however, in realistic situations they often lack such information. One way to fill this gap is to acquire the missing information from dialogue partners (information-seeking dialogue). Existing information-seeking dialogue systems are based on handcrafted dialogue strategies that exhaustively examine missing information. However, these strategies are not specialized in collecting information for constructing rational arguments. Moreover, the number of the system's inquiry candidates grows with the size of the argument set the system deals with. In this paper, we formalize the process of information-seeking dialogue as Markov decision processes (MDPs) and apply deep reinforcement learning (DRL) to automatically optimize the dialogue strategy. By utilizing DRL, our dialogue strategy can successfully minimize the objective function, namely the number of turns it takes for our system to collect the necessary information in a dialogue. We conducted dialogue experiments using two datasets from different domains of argumentative dialogue. Experimental results show that the proposed MDP-based formalization works well, and the policy optimized by DRL outperformed existing heuristic dialogue strategies.

Submitted 26 November, 2018; originally announced November 2018.

Comments: Accepted by AAAI2019 DEEP-DIAL 2019 workshop

arXiv:1810.00177 [pdf, ps, other]

Refining Manually-Designed Symbol Grounding and High-Level Planning by Policy Gradients

Authors: Takuya Hiraoka, Takashi Onishi, Takahisa Imagawa, Yoshimasa Tsuruoka

Abstract: Hierarchical planners that produce interpretable and appropriate plans are desirable, especially in applications that support human decision making. In the typical development of hierarchical planners, higher-level planners and symbol grounding functions are created manually, which requires much human effort. In this paper, we propose a framework that automatically refines symbol grounding functions and a high-level planner to reduce the human effort needed to design these modules. In our framework, symbol grounding and high-level planning, which are based on manually designed knowledge bases, are modeled as semi-Markov decision processes. A policy gradient method is then applied to refine the modules, with two terms considered for updating them. The first term, called the reinforcement term, updates the modules to improve the overall performance of the hierarchical planner so that it produces appropriate plans. The second term, called the penalty term, keeps the refined modules consistent with the manually designed original modules; that is, it ensures that the planner using the refined modules still produces interpretable plans. We performed preliminary experiments on the Mountain car problem, and the results show that a manually designed high-level planner and symbol grounding function were successfully refined by our framework.

Submitted 29 September, 2018; originally announced October 2018.

Comments: presented at the IJCAI-ECAI 2018 workshop on Learning & Reasoning (L&R 2018)

Showing 1–50 of 58 results for author: Hiraoka, T