-
Towards Temporal Change Explanations from Bi-Temporal Satellite Images
Authors:
Ryo Tsujimoto,
Hiroki Ouchi,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring. However, manual dataset construction for the task is costly, so human-AI collaboration is promising. To this end, we investigate the ability of Large-scale Vision-Language Models (LVLMs) to explain temporal changes between satellite images. While LVLMs are known to generate good image captions, they receive only a single image as input. To deal with a pair of satellite images as input, we propose three prompting methods. Through human evaluation, we found that our step-by-step reasoning based prompting is effective.
Submitted 27 June, 2024;
originally announced July 2024.
-
Commissioning of a compact multibend achromat lattice: A new 3 GeV synchrotron radiation facility
Authors:
Shuhei Obara,
Kota Ueshima,
Takao Asaka,
Yuji Hosaka,
Koichi Kan,
Nobuyuki Nishimori,
Toshitaka Aoki,
Hiroyuki Asano,
Koichi Haga,
Yuto Iba,
Akira Ihara,
Katsumasa Ito,
Taiki Iwashita,
Masaya Kadowaki,
Rento Kanahama,
Hajime Kobayashi,
Hideki Kobayashi,
Hideo Nishihara,
Masaaki Nishikawa,
Haruhiko Oikawa,
Ryota Saida,
Keisuke Sakuraba,
Kento Sugimoto,
Masahiro Suzuki,
Kouki Takahashi
, et al. (57 additional authors not shown)
Abstract:
NanoTerasu, a new 3 GeV synchrotron light source in Japan, began user operation in April 2024. It provides high-brilliance soft to tender X-rays and covers a wide spectral range from ultraviolet to tender X-rays. Its compact storage ring, with a circumference of 349 m, is based on a four-bend achromat lattice that provides two straight sections in each cell for insertion devices and a natural horizontal emittance of 1.14 nm rad, which is small enough for soft X-ray users. The NanoTerasu accelerator incorporates several innovative technologies, including a full-energy injector C-band linear accelerator with a length of 110 m, an in-vacuum off-axis injection system, a four-bend achromat with B-Q combined bending magnets, and a TM020 mode accelerating cavity with built-in higher-order-mode dampers in the storage ring. This paper presents the accelerator machine commissioning over a half-year period and our model-consistent ring optics correction. The first user operation with a stored beam current of 160 mA is also reported. We summarize the storage ring parameters obtained from the commissioning, which are helpful for estimating the effective optical properties of synchrotron radiation at NanoTerasu.
Submitted 11 July, 2024;
originally announced July 2024.
-
Comprehensive Gyrokinetic Study of Eigenstate Transitions in Fast Ion-Driven Electrostatic Drift Instabilities
Authors:
ByungJun Kang,
Hideo Sugama,
Tomo-Hiko Watanabe,
Masanori Nunami
Abstract:
This study comprehensively investigates fast ion-driven drift instability, extending the theory in [B. J. Kang and T. S. Hahm, Phys. Plasmas 26, 042501 (2019)]. The eigenmode equation, including the resonant contribution of passing fast ions, is derived and solved using the shooting method. Passing fast ions significantly affect the instability in weak negative shear or moderate positive shear plasmas. Eigenstate transitions to non-ground states occur more readily in weak magnetic shear, high safety factor, and long wavelength perturbations. Linear gyrokinetic simulations using the GKV code verify the theory, showing good agreement with shooting method results. The estimated quasilinear transport indicates that the net energy flux can be inward, without contradicting the second law of thermodynamics. These findings have important implications for heating efficiency and plasma confinement in the heating process, such as Ion Cyclotron Resonance Heating (ICRH) in future fusion devices.
Submitted 10 July, 2024;
originally announced July 2024.
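The abstract above mentions solving an eigenmode equation with the shooting method. The following is only a minimal sketch of that general technique on a toy problem (y'' + λy = 0 with y(0) = y(1) = 0, whose eigenvalues are n²π²); it is not the gyrokinetic eigenmode equation of the paper, and the function names `endpoint_value` and `shoot` are hypothetical.

```python
import math


def endpoint_value(lam, n_steps=1000):
    """Integrate y'' = -lam * y on [0, 1] with y(0)=0, y'(0)=1 (RK4); return y(1)."""
    h = 1.0 / n_steps
    y, v = 0.0, 1.0
    for _ in range(n_steps):
        k1y, k1v = v, -lam * y
        k2y, k2v = v + 0.5 * h * k1v, -lam * (y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, -lam * (y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, -lam * (y + h * k3y)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y


def shoot(lo, hi, tol=1e-10):
    """Bisect on lam until the far boundary condition y(1)=0 is met.

    Assumes endpoint_value changes sign exactly once inside [lo, hi].
    """
    f_lo = endpoint_value(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = endpoint_value(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


ground = shoot(5.0, 15.0)    # bracket around pi^2
excited = shoot(30.0, 45.0)  # bracket around 4 * pi^2
```

Choosing a different bracket selects a different eigenstate, which is the toy analogue of distinguishing ground and non-ground states.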
-
Time evolutions of information entropies in a one-dimensional Vlasov-Poisson system
Authors:
K. Maekaku,
H. Sugama,
T. -H. Watanabe
Abstract:
A one-dimensional Vlasov-Poisson system is considered to elucidate how the information entropies of the probability distribution functions of the electron position and velocity variables evolve in the Landau damping process. Considering the initial condition given by the Maxwellian velocity distribution with a spatial density perturbation in the form of a cosine function of the position, we derive linear and quasilinear analytical solutions that accurately describe both early and late time behaviors of the distribution function and the electric field. The validity of these solutions is confirmed by comparison with numerical simulations based on contour dynamics. Using the quasilinear analytical solution, the time evolutions of the velocity distribution function and its kurtosis, which indicates deviation from the Gaussian distribution, are evaluated to the accuracy of the squared perturbation amplitude. We also determine the time evolutions of the information entropies of the electron position and velocity variables and their mutual information. We further consider Coulomb collisions, which relax the late-time state of the collisionless process to the thermal equilibrium state. In this collisional relaxation process, the mutual information of the position and velocity variables decreases to zero, while the total information entropy of the phase-space distribution function increases by the amount of the decrease in the mutual information, which demonstrates the validity of Boltzmann's H-theorem.
Submitted 10 July, 2024;
originally announced July 2024.
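The relation between the marginal entropies, the joint entropy, and the mutual information used in the abstract above can be illustrated with a small discrete example. This is a sketch under stated assumptions: the paper works with continuous distribution functions, whereas the code below uses a discrete joint probability table, and the names `entropy` and `mutual_information` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;V) = S_x + S_v - S_xv for a joint table joint[x][v]."""
    p_x = [sum(row) for row in joint]          # marginal over v
    p_v = [sum(col) for col in zip(*joint)]    # marginal over x
    p_xv = [p for row in joint for p in row]   # flattened joint
    return entropy(p_x) + entropy(p_v) - entropy(p_xv)


independent = [[0.25, 0.25], [0.25, 0.25]]  # x and v uncorrelated
correlated = [[0.5, 0.0], [0.0, 0.5]]       # x fully determines v
```

With the marginals held fixed, the joint entropy S_xv = S_x + S_v - I rises exactly as I falls, which mirrors the statement that the total entropy increases by the decrease in the mutual information.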
-
Imaging reconstruction method on X-ray data of CMOS polarimeter combined with coded aperture
Authors:
Tsubasa Tamba,
Hirokazu Odaka,
Taihei Watanabe,
Toshiya Iwata,
Tomoaki Kasuga,
Atsushi Tanimoto,
Satoshi Takashima,
Masahiro Ichihashi,
Hiromasa Suzuki,
Aya Bamba
Abstract:
X-ray polarization is a powerful tool for unveiling the anisotropic characteristics of high-energy celestial objects. We present a novel imaging reconstruction method designed for hard X-ray polarimeters employing a Si CMOS sensor and coded apertures, which function as a photoelectron tracker and imaging optics, respectively. Faced with challenges posed by substantial artifacts and background noise in the coded aperture imaging associated with the conventional balanced correlation method, we adopt the Expectation-Maximization (EM) algorithm as the foundation of our imaging reconstruction method. The newly developed imaging reconstruction method is validated with imaging polarimetry and a series of X-ray beam experiments. The method demonstrates the capability to accurately reproduce an extended source comprising multiple segments with distinct polarization degrees. Comparative analysis exhibits a significant enhancement in imaging reconstruction accuracy compared to the balanced correlation method, with the background noise levels reduced to 17%. The outcomes of this study enhance the feasibility of Cube-Sat imaging polarimetry missions in the hard X-ray band, as the combination of Si CMOS sensors and coded apertures is a promising approach for realizing it.
Submitted 7 July, 2024;
originally announced July 2024.
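The abstract above adopts the Expectation-Maximization (EM) algorithm as the basis for coded-aperture image reconstruction. The sketch below shows only the textbook ML-EM update for Poisson counts with a known system matrix; it is not the authors' polarimetric reconstruction, and the tiny system matrix, the counts, and the name `mlem` are illustrative assumptions.

```python
def mlem(system, counts, n_iter=50):
    """Textbook ML-EM: system[i][j] is the response of detector bin i to source bin j."""
    n_det, n_src = len(system), len(system[0])
    # Sensitivity of each source bin (column sums of the system matrix).
    sens = [sum(system[i][j] for i in range(n_det)) for j in range(n_src)]
    est = [1.0] * n_src  # flat, strictly positive starting image
    for _ in range(n_iter):
        # Forward-project the current estimate onto the detector.
        expected = [sum(system[i][j] * est[j] for j in range(n_src))
                    for i in range(n_det)]
        ratio = [counts[i] / expected[i] if expected[i] > 0 else 0.0
                 for i in range(n_det)]
        # Back-project the measured/expected ratio and rescale the estimate.
        est = [est[j] / sens[j] * sum(system[i][j] * ratio[i] for i in range(n_det))
               if sens[j] > 0 else 0.0
               for j in range(n_src)]
    return est


# Hypothetical 3-bin detector viewing a 2-bin source through a mask pattern.
system = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = [3.0, 1.0, 4.0]  # noiseless data generated by the source [3, 1]
estimate = mlem(system, counts)
```

The multiplicative update keeps the estimate non-negative, which is one common reason to prefer EM-type methods over correlation-based decoding when artifacts and background are a concern.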
-
Unified Interpretation of Smoothing Methods for Negative Sampling Loss Functions in Knowledge Graph Embedding
Authors:
Xincan Feng,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Knowledge Graphs (KGs) are fundamental resources in knowledge-intensive tasks in NLP. Due to the limitation of manually creating KGs, KG Completion (KGC) has an important role in automatically completing KGs by scoring their links with KG Embedding (KGE). To handle many entities in training, KGE relies on Negative Sampling (NS) loss that can reduce the computational cost by sampling. Since the appearance frequencies for each link are at most one in KGs, sparsity is an essential and inevitable problem. The NS loss is no exception. As a solution, the NS loss in KGE relies on smoothing methods like Self-Adversarial Negative Sampling (SANS) and subsampling. However, it is uncertain what kind of smoothing method is suitable for this purpose due to the lack of theoretical understanding. This paper provides theoretical interpretations of the smoothing methods for the NS loss in KGE and induces a new NS loss, Triplet Adaptive Negative Sampling (TANS), that can cover the characteristics of the conventional smoothing methods. Experimental results of TransE, DistMult, ComplEx, RotatE, HAKE, and HousE on FB15k-237, WN18RR, and YAGO3-10 datasets and their sparser subsets show the soundness of our interpretation and performance improvement by our TANS.
Submitted 5 July, 2024;
originally announced July 2024.
-
Change My Frame: Reframing in the Wild in r/ChangeMyView
Authors:
Arturo Martínez Peguero,
Taro Watanabe
Abstract:
Recent work in reframing, within the scope of text style transfer, has so far made use of out-of-context, task-prompted utterances in order to produce neutralizing or optimistic reframes. Our work aims to generalize reframing based on the subreddit r/ChangeMyView (CMV). We build a dataset that leverages the CMV community's interactions and conventions to identify high-value, community-recognized utterances that produce changes of perspective. With this data, we widen the scope of the direction of reframing, since the changes in perspective do not only occur in neutral or positive directions. We fine-tune transformer-based models, make use of a modern LLM to refine our dataset, and explore challenges in the dataset creation and evaluation around this type of reframing.
Submitted 2 July, 2024;
originally announced July 2024.
-
Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks
Authors:
Wataru Hashimoto,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Trustworthy prediction in Deep Neural Networks (DNNs), including Pre-trained Language Models (PLMs), is important for safety-critical applications in the real world. However, DNNs often suffer from poor uncertainty estimation, such as miscalibration. In particular, approaches that require multiple stochastic inferences can mitigate this problem, but the expensive cost of inference makes them impractical. In this study, we propose $k$-Nearest Neighbor Uncertainty Estimation ($k$NN-UE), which is an uncertainty estimation method that uses the distances from the neighbors and the label-existence ratio of neighbors. Experiments on sentiment analysis, natural language inference, and named entity recognition show that our proposed method outperforms the baselines or recent density-based methods in confidence calibration, selective prediction, and out-of-distribution detection. Moreover, our analyses indicate that introducing dimension reduction or approximate nearest neighbor search inspired by recent $k$NN-LM studies reduces the inference overhead without significantly degrading estimation performance when they are combined appropriately.
Submitted 2 July, 2024;
originally announced July 2024.
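The abstract above names two signals for $k$NN-UE: distances to the nearest neighbors and the ratio of neighbors carrying the predicted label. The sketch below shows one possible way to turn those two signals into a confidence value. The combination rule (an exponential distance penalty multiplied by the label ratio), the parameter `alpha`, and the function name `knn_confidence` are assumptions for illustration and are not taken from the paper.

```python
import math


def knn_confidence(query, train_points, train_labels, predicted_label, k=3, alpha=1.0):
    """Hypothetical confidence score from neighbor distances and label agreement."""
    # Rank training points by Euclidean distance to the query representation.
    ranked = sorted(
        ((math.dist(query, p), y) for p, y in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    neighbors = ranked[:k]
    mean_dist = sum(d for d, _ in neighbors) / len(neighbors)
    # Fraction of neighbors whose label matches the model's prediction.
    label_ratio = sum(1 for _, y in neighbors if y == predicted_label) / len(neighbors)
    # Far-away or disagreeing neighbors both lower the confidence.
    return math.exp(-alpha * mean_dist) * label_ratio


train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
train_labels = ["A", "A", "B", "B"]
near = knn_confidence((0.0, 0.0), train_points, train_labels, "A")
far = knn_confidence((10.0, 10.0), train_points, train_labels, "A")
```

A query that lies far from all training points, as an out-of-distribution input typically would, receives a much lower score than one surrounded by same-label neighbors.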
-
Are Data Augmentation Methods in Named Entity Recognition Applicable for Uncertainty Estimation?
Authors:
Wataru Hashimoto,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
This work investigates the impact of data augmentation on confidence calibration and uncertainty estimation in Named Entity Recognition (NER) tasks. For the future advance of NER in safety-critical fields like healthcare and finance, it is essential to achieve accurate predictions with calibrated confidence when applying Deep Neural Networks (DNNs), including Pre-trained Language Models (PLMs), in real-world applications. However, DNNs are prone to miscalibration, which limits their applicability. Moreover, existing methods for calibration and uncertainty estimation are computationally expensive. Our investigation in NER found that data augmentation improves calibration and uncertainty estimation in cross-genre and cross-lingual settings, and especially in the in-domain setting. Furthermore, we showed that the calibration for NER tends to be more effective when the perplexity of the sentences generated by data augmentation is lower, and that increasing the size of the augmentation further improves calibration and uncertainty estimation.
Submitted 2 July, 2024;
originally announced July 2024.
-
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Authors:
Deng Cai,
Huayang Li,
Tingchen Fu,
Siheng Li,
Weiwen Xu,
Shuaiyi Li,
Bowen Cao,
Zhisong Zhang,
Xinting Huang,
Leyang Cui,
Yan Wang,
Lemao Liu,
Taro Watanabe,
Shuming Shi
Abstract:
Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.
Submitted 24 June, 2024;
originally announced June 2024.
-
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Authors:
Jesse Atuhurra,
Iqra Ali,
Tatsuya Hiraoka,
Hidetaka Kamigaito,
Tomoya Iwakura,
Taro Watanabe
Abstract:
Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still preliminary because existing datasets do not permit a comprehensive evaluation of the fine-grained visual linguistic abilities of VLMs across multiple languages. To further explore the strengths of VLMs, such as GPT-4V, we developed new datasets for the systematic and qualitative analysis of VLMs. Our contribution is four-fold: 1) we introduced nine vision-and-language (VL) tasks (including object recognition, image-text matching, and more) and constructed multilingual visual-text datasets in four languages (English, Japanese, Swahili, and Urdu) by utilizing templates containing questions and prompting GPT-4V to generate the answers and the rationales, 2) introduced a new VL task named unrelatedness, 3) introduced rationales to enable human understanding of the VLM reasoning process, and 4) employed human evaluation to measure the suitability of proposed datasets for VL tasks. We show that VLMs can be fine-tuned on our datasets. Our work is the first to conduct such analyses in Swahili and Urdu. Also, it introduces rationales in VL analysis, which played a vital role in the evaluation.
Submitted 29 March, 2024;
originally announced June 2024.
-
Introducing Syllable Tokenization for Low-resource Languages: A Case Study with Swahili
Authors:
Jesse Atuhurra,
Hiroyuki Shindo,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Many attempts have been made in multilingual NLP to ensure that pre-trained language models, such as mBERT or GPT2, improve and become applicable to low-resource languages. To achieve multilingualism for pre-trained language models (PLMs), we need techniques to create word embeddings that capture the linguistic characteristics of any language. Tokenization is one such technique because it allows words to be split based on characters or subwords, creating word embeddings that best represent the structure of the language. Creating such word embeddings is essential to applying PLMs to languages on which the model was not trained, enabling multilingual NLP. However, most PLMs use generic tokenization methods like BPE, WordPiece, or unigram, which may not suit specific languages. We hypothesize that tokenization based on syllables within the input text, which we call syllable tokenization, should facilitate the development of syllable-aware language models. The syllable-aware language models make it possible to apply PLMs to languages that are rich in syllables, for instance, Swahili. Previous works introduced subword tokenization. Our work extends such efforts. Notably, we propose a syllable tokenizer and adopt an experiment-centric approach to validate the proposed tokenizer based on the Swahili language. We conducted text-generation experiments with GPT2 to evaluate the effectiveness of the syllable tokenizer. Our results show that the proposed syllable tokenizer generates syllable embeddings that effectively represent the Swahili language.
Submitted 26 March, 2024;
originally announced June 2024.
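To make the idea of syllable tokenization in the abstract above concrete, here is a deliberately crude sketch. It assumes that every syllable is a run of consonants followed by a single vowel, which is a simplification (for example, it does not treat syllabic nasals separately), and it is not the tokenizer proposed in the paper. The function name `syllabify` is hypothetical.

```python
import re

# Assumption: a syllable is zero or more non-vowel characters followed by one vowel;
# any trailing consonants with no vowel form a final piece so no characters are lost.
_SYLLABLE = re.compile(r"[^aeiou]*[aeiou]|[^aeiou]+$")


def syllabify(word):
    """Split a lowercase word into open-syllable-like pieces."""
    return _SYLLABLE.findall(word.lower())


pieces = syllabify("kitabu")
```

Because the pieces always concatenate back to the input word, such a splitter can be used as a pre-tokenization step before building a vocabulary.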
-
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Authors:
Zhiyu Guo,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Scaling the context size of large language models (LLMs) enables them to perform various new tasks, e.g., book summarization. However, the memory cost of the Key and Value (KV) cache in attention significantly limits the practical applications of LLMs. Recent works have explored token pruning for KV cache reduction in LLMs, relying solely on attention scores as a token importance indicator. However, our investigation into value vector norms revealed a notably non-uniform pattern, which calls into question the reliance on attention scores alone. Inspired by this, we propose a new method, Value-Aware Token Pruning (VATP), which uses both attention scores and the $ \ell_{1} $ norm of value vectors to evaluate token importance. Extensive experiments on LLaMA2-7B-chat and Vicuna-v1.5-7B across 16 LongBench tasks demonstrate VATP's superior performance.
Submitted 18 June, 2024;
originally announced June 2024.
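The VATP abstract above states that token importance uses both attention scores and the $\ell_{1}$ norm of value vectors. A minimal sketch of that idea follows. It assumes the two quantities are combined by multiplying the accumulated attention a token receives by its value-vector $\ell_{1}$ norm; that combination rule, the toy numbers, and the names `value_aware_scores` and `keep_top_k` are illustrative assumptions rather than the paper's exact procedure.

```python
def value_aware_scores(attn, values):
    """attn[q][j]: attention weight from query q to token j; values[j]: value vector of token j."""
    n_tokens = len(values)
    # Accumulated attention received by each token across all queries.
    accumulated = [sum(row[j] for row in attn) for j in range(n_tokens)]
    # l1 norm of each token's value vector.
    l1_norms = [sum(abs(x) for x in v) for v in values]
    return [a * n for a, n in zip(accumulated, l1_norms)]


def keep_top_k(scores, k):
    """Indices of the k highest-scoring tokens, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


attn = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]
values = [[0.1, -0.1], [2.0, 1.0], [0.5, 0.5]]
kept_value_aware = keep_top_k(value_aware_scores(attn, values), 2)
kept_attention_only = keep_top_k([sum(row[j] for row in attn) for j in range(3)], 2)
```

In this toy case token 0 receives the most attention but has a near-zero value vector, so attention-only pruning keeps it while the value-aware score drops it.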
-
Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation
Authors:
Zhi Qu,
Chenchen Ding,
Taro Watanabe
Abstract:
Understanding representation transfer in multilingual neural machine translation can reveal the representational issue causing the zero-shot translation deficiency. In this work, we introduce the identity pair, a sentence translated into itself, to address the lack of the base measure in multilingual investigations, as the identity pair represents the optimal state of representation among any language transfers. In our analysis, we demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state. Thus, the zero-shot translation deficiency arises because representations are entangled with other languages and are not transferred effectively to the target language. Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder. The experimental results on Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance the performance of zero-shot translations by improving language transfer capacity, thereby providing practical evidence to support our conclusions.
Submitted 12 June, 2024;
originally announced June 2024.
-
Cross-sectional shape analysis for risk assessment and prognosis of patients with true lumen narrowing after type-A aortic dissection surgery
Authors:
J V Ramana Reddy,
Toshitaka Watanabe,
Taro Hayashi,
Hiroshi Suito
Abstract:
Background: For acute type-A aortic dissection (ATAAD) surgery, early post-surgery assessment is crucially important for effective treatment plans, underscoring the need for a framework to identify the risk level of aortic dissection cases. We examined true-lumen narrowing during follow-up examinations, collected morphological data 14 days (early stages) after surgery, and assessed patient risk levels over 2.8 years.
Purpose: To establish an implementable framework supported by mathematical techniques to predict the risk of aortic dissection patients experiencing true-lumen narrowing after ATAAD surgery.
Materials and Methods: This retrospective study analyzed CT data from 21 ATAAD patients. Forty uniformly distributed cross-sectional shapes (CSSs) are derived from each lumen to account for gradual changes in shape. We introduced the form factor (FF) to assess CSS morphology. Linear discriminant analysis (LDA) is used for the risk classification of aortic dissection patients. Leave-one-patient-out cross-validation (LOPO-CV) is used for risk prediction.
Results: For this investigation, we examined data of 21 ATAAD patients categorized into high-risk, medium-risk, and low-risk cases based on clinical observations of the range of true-lumen narrowing. Our risk classification machine-learning (ML) model preserving the model's generalizability. The model's predictions reliably identified low-risk patients, thereby potentially reducing hospital visits. It also demonstrated proficiency in accurately predicting the risk for all high-risk patients.
Conclusion: The suggested method anticipates the risk linked to aortic enlargement in patients with a narrowing true lumen in the early stage following ATAAD surgery, thereby aiding follow-up doctors in enhancing patient care.
Submitted 7 June, 2024;
originally announced June 2024.
-
mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
Authors:
Yusuke Sakai,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
It is very challenging to curate a dataset for language-specific knowledge and common sense in order to evaluate natural language understanding capabilities of language models. Due to the limitation in the availability of annotators, most current multilingual datasets are created through translation, which cannot evaluate such language-specific aspects. Therefore, we propose Multilingual CommonsenseQA (mCSQA) based on the construction process of CSQA but leveraging language models for a more efficient construction, e.g., by asking LMs to generate questions/answers, refine answers, and verify QAs, followed by reduced human efforts for verification. The constructed dataset is a benchmark for cross-lingual language-transfer capabilities of multilingual LMs, and experimental results showed high language-transfer capabilities for questions that LMs could easily solve, but lower transfer capabilities for questions requiring deep knowledge or commonsense. This highlights the necessity of language-specific datasets for evaluation and training. Finally, our method demonstrated that multilingual LMs could create QA including language-specific knowledge, significantly reducing the dataset creation cost compared to manual creation. The datasets are available at https://huggingface.co/datasets/yusuke1997/mCSQA.
Submitted 6 June, 2024;
originally announced June 2024.
-
On the $(\varphi,Γ)$-modules corresponding to crystalline representations
Authors:
Takumi Watanabe
Abstract:
Let $K$ be a mixed characteristic complete discrete valuation field with perfect residue field of characteristic $p$. We construct a new linear category called the category of crystalline $(\varphi,Γ)$-modules over $\widetilde{\mathbb{A}}_K^{+}$ and show that it is equivalent to the category of crystalline $\mathbb{Z}_p$-representations of the absolute Galois group of $K$. In other words, we determine the $(\varphi,Γ)$-modules over $\widetilde{\mathbb{A}}_K$ which correspond to crystalline representations. In a sense, this can be seen as a generalization of Wach modules in the ramified case.
Submitted 30 May, 2024;
originally announced May 2024.
-
Refining Coded Image in Human Vision Layer Using CNN-Based Post-Processing
Authors:
Takahiro Shindo,
Yui Tatsumi,
Taiju Watanabe,
Hiroshi Watanabe
Abstract:
Scalable image coding for both humans and machines is a technique that has gained a lot of attention recently. This technology enables the hierarchical decoding of images for human vision and image recognition models. It is a highly effective method when images need to serve both purposes. However, no research has yet incorporated the post-processing commonly used in popular image compression sche…
▽ More
Scalable image coding for both humans and machines is a technique that has gained a lot of attention recently. This technology enables the hierarchical decoding of images for human vision and image recognition models. It is a highly effective method when images need to serve both purposes. However, no research has yet incorporated the post-processing commonly used in popular image compression schemes into scalable image coding method for humans and machines. In this paper, we propose a method to enhance the quality of decoded images for humans by integrating post-processing into scalable coding scheme. Experimental results show that the post-processing improves compression performance. Furthermore, the effectiveness of the proposed method is validated through comparisons with traditional methods.
Submitted 16 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Controlling $^{229}$Th isomeric state population in a VUV transparent crystal
Authors:
Takahiro Hiraki,
Koichi Okai,
Michael Bartokos,
Kjeld Beeks,
Hiroyuki Fujimoto,
Yuta Fukunaga,
Hiromitsu Haba,
Yoshitaka Kasamatsu,
Shinji Kitao,
Adrian Leitner,
Takahiko Masuda,
Guan Ming,
Nobumoto Nagasawa,
Ryoichiro Ogake,
Martin Pimon,
Martin Pressler,
Noboru Sasao,
Fabian Schaden,
Thorsten Schumm,
Makoto Seto,
Yudai Shigekawa,
Koutaro Shimizu,
Tomas Sikorsky,
Kenji Tamasaku,
Sayuri Takatori
, et al. (5 additional authors not shown)
Abstract:
The radioisotope Th-229 is renowned for its extraordinarily low-energy, long-lived nuclear first-excited state. This isomeric state can be excited by VUV lasers and the transition from the ground state has been proposed as a reference transition for ultra-precise nuclear clocks. Such nuclear clocks will find multiple applications, ranging from fundamental physics studies to practical implementations. Recent investigations extracted valuable constraints on the nuclear transition energy and lifetime, populating the isomer in stochastic nuclear decay of U-233 or Ac-229.
However, to assess the feasibility and performance of the (solid-state) nuclear clock concept, time-controlled excitation and depopulation of the $^{229}$Th isomer together with time-resolved monitoring of the radiative decay are imperative.
Here we report the population of the $^{229}$Th isomeric state through resonant X-ray pumping and detection of the radiative decay in a VUV transparent $^{229}$Th-doped CaF$_2$ crystal. The decay half-life is measured to be $447\pm 25$ s, with a transition wavelength of $148.18 \pm 0.42$ nm and a radiative decay fraction consistent with unity. Furthermore, we report a new ``X-ray quenching'' effect which allows the isomer to be de-populated on demand and effectively reduces the half-life by at least a factor of 50. Such controlled quenching can be used to significantly speed up the interrogation cycle in future nuclear clock schemes.
Our results show that full control over the $^{229}$Th nuclear isomer population can be achieved in a crystal environment. In particular, non-radiative decay processes that might lead to a broadening of the isomer transition linewidth are negligible, paving the way for the development of a compact and robust solid-state nuclear clock. Further studies are needed to reveal the underlying physical mechanism of the X-ray quenching effect.
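As a rough worked check of the figures quoted in this abstract, assuming simple exponential decay and using only the central value of the measured half-life (the symbols $\lambda$ and $T_{1/2}^{\mathrm{q}}$ are introduced here for illustration and do not appear in the abstract), the unquenched decay constant and the upper bound on the quenched half-life are:

$$\lambda = \frac{\ln 2}{T_{1/2}} \approx \frac{0.693}{447\ \mathrm{s}} \approx 1.55\times10^{-3}\ \mathrm{s}^{-1}, \qquad T_{1/2}^{\mathrm{q}} \le \frac{447\ \mathrm{s}}{50} \approx 8.9\ \mathrm{s}.$$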
Submitted 14 May, 2024;
originally announced May 2024.
-
Scalable Image Coding for Humans and Machines Using Feature Fusion Network
Authors:
Takahiro Shindo,
Taiju Watanabe,
Yui Tatsumi,
Hiroshi Watanabe
Abstract:
As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model, providing additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.
Submitted 16 June, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Effects of vortex and anti-vortex excitations in underdoped Bi-2223 bulk single crystals
Authors:
Takao Watanabe,
Kenta Kosugi,
Nae Sasaki,
Shunpei Yamaguchi,
Takenori Fujii,
Itsuhiro Kakeya,
Toshimitsu Ito
Abstract:
To gain insights into the mechanisms underlying the superconducting transition in copper oxide high-transition temperature ($T_c$) superconductors, we studied transport properties of underdoped Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+δ}$ (Bi-2223) bulk single crystals. The power exponent $α$ ($V \propto I^α$) reached 3 just below $T_c$, and the temperature dependence of in-plane resistivity ($ρ_{ab}$) exhibited typical tailing behavior, consistent with Kosterlitz--Thouless transition characteristics. Thus, with increasing temperature, copper oxide high-$T_c$ superconductors undergo a transition to the normal state because of the destruction of their phase correlations, although a finite Cooper pair density exists at $T_c$.
Submitted 10 May, 2024;
originally announced May 2024.
-
Context-Aware Machine Translation with Source Coreference Explanation
Authors:
Huy Hien Vu,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Despite significant improvements in enhancing the quality of translation, context-aware machine translation (MT) models underperform in many cases. One of the main reasons is that they fail to utilize the correct features from context when the context is too long or their models are overly complex. This can lead to the explain-away effect, wherein the models consider only the features that most easily explain their predictions, resulting in inaccurate translations. To address this issue, we propose a model that explains the decisions made for translation by predicting coreference features in the input. We construct a model for input coreference by exploiting contextual features from both the input and translation output representations on top of an existing MT model. We evaluate and analyze our method on the English-German dataset of the WMT document-level translation task, the English-Russian dataset, and the multilingual TED talk dataset, demonstrating an improvement of over 1.0 BLEU score when compared with other context-aware models.
Submitted 30 April, 2024;
originally announced April 2024.
-
Two-level adiabatic transition probability for small avoided crossings generated by tangential intersections
Authors:
Kenta Higuchi,
Takuya Watanabe
Abstract:
In this paper, the asymptotic behaviors of the transition probability for two-level avoided crossings are studied under the limit where two parameters (adiabatic parameter and energy gap parameter) tend to zero. This is a continuation of our previous works where avoided crossings are generated by tangential intersections and obey a non-adiabatic regime. The main results elucidate not only the asymptotic expansion of transition probability but also a quantum interference caused by several avoided crossings and a coexistence of two-parameter regimes arising from different vanishing orders.
Submitted 28 May, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
-
Domain Adaptation in Intent Classification Systems: A Review
Authors:
Jesse Atuhurra,
Hidetaka Kamigaito,
Taro Watanabe,
Eric Nichols
Abstract:
Dialogue agents, which perform specific tasks, are part of the long-term goal of NLP researchers to build intelligent agents that communicate with humans in natural language. Such systems should adapt easily from one domain to another to assist users in completing tasks. Researchers have developed a broad range of techniques, objectives, and datasets for intent classification to achieve such systems. Despite the progress in developing intent classification systems (ICS), a systematic review of the progress from a technical perspective is yet to be conducted. In effect, important implementation details of intent classification remain restricted and unclear, making it hard for natural language processing (NLP) researchers to develop new methods. To fill this gap, we review contemporary works in intent classification. Specifically, we conduct a thorough technical review of the datasets, domains, tasks, and methods needed to train the intent classification part of dialogue systems. Our structured analysis describes why intent classification is difficult and studies the limitations to domain adaptation while presenting opportunities for future work.
Submitted 26 March, 2024;
originally announced April 2024.
-
Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair
Authors:
Yusuke Sakai,
Mana Makinae,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems. However, it is very challenging to curate such a corpus due to limitations in the abilities of annotators, and hence, existing SI corpora are limited. Therefore, we propose a method to convert existing speech translation corpora into interpretation-style data, maintaining the original word order and preserving the entire source content using Large Language Models (LLM-SI-Corpus). We demonstrate that fine-tuning SiMT models in text-to-text and speech-to-text settings with the LLM-SI-Corpus reduces latencies while maintaining the same level of quality as the models trained with offline datasets. The LLM-SI-Corpus is available at \url{https://github.com/yusuke1997/LLM-SI-Corpus}.
Submitted 18 April, 2024;
originally announced April 2024.
-
Sharing Parameter by Conjugation for Knowledge Graph Embeddings in Complex Space
Authors:
Xincan Feng,
Zhi Qu,
Yuchang Cheng,
Taro Watanabe,
Nobuhiro Yugami
Abstract:
A Knowledge Graph (KG) is the directed graphical representation of entities and relations in the real world. KG can be applied in diverse Natural Language Processing (NLP) tasks where knowledge is required. The need to scale up and complete KG automatically yields Knowledge Graph Embedding (KGE), a shallow machine learning model that is suffering from memory and training time consumption issues. To mitigate the computational load, we propose a parameter-sharing method, i.e., using conjugate parameters for complex numbers employed in KGE models. Our method improves memory efficiency by 2x in relation embedding while achieving comparable performance to the state-of-the-art non-conjugate models, with faster, or at least comparable, training time. We demonstrated the generalizability of our method on two best-performing KGE models $5^{\bigstar}\mathrm{E}$ and $\mathrm{ComplEx}$ on five benchmark datasets.
Submitted 17 April, 2024;
originally announced April 2024.
-
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
Authors:
Eri Onami,
Shuhei Kurita,
Taiki Miyanishi,
Taro Watanabe
Abstract:
Document question answering is a task of question answering on given documents such as reports, slides, pamphlets, and websites, and it is a truly demanding task as paper and electronic forms of documents are so common in our society. This is known as a quite challenging task because it requires not only text understanding but also understanding of figures and tables, and hence visual question answering (VQA) methods are often examined in addition to textual approaches. We introduce Japanese Document Question Answering (JDocQA), a large-scale document-based QA dataset, essentially requiring both visual and textual information to answer questions, which comprises 5,504 documents in PDF format and 11,600 annotated question-and-answer instances in Japanese. Each QA instance includes references to the document pages and bounding boxes for the answer clues. We incorporate multiple categories of questions and unanswerable questions from the document for realistic question-answering applications. We empirically evaluate the effectiveness of our dataset with text-based large language models (LLMs) and multimodal models. Incorporating unanswerable questions in finetuning may contribute to harnessing the so-called hallucination generation.
Submitted 28 March, 2024;
originally announced March 2024.
-
Bowen's formula for a rational graph-directed Markov system
Authors:
Tadashi Arimitsu,
Johannes Jaerisch,
Hiroki Sumi,
Takayuki Watanabe
Abstract:
We establish Bowen's formula for the Julia set of a non-elementary, expanding, irreducible and aperiodic rational graph-directed Markov system satisfying the backward separating condition. Towards this end, we shall prove that the associated skew product map is topologically exact on the skew product Julia set, and satisfies the density of repelling periodic points. Moreover, we give a criterion for expandingness in terms of hyperbolicity.
Submitted 27 March, 2024;
originally announced March 2024.
-
Cross-lingual Contextualized Phrase Retrieval
Authors:
Huayang Li,
Deng Cai,
Zhi Qu,
Qu Cui,
Hidetaka Kamigaito,
Lemao Liu,
Taro Watanabe
Abstract:
Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of specific training data and models is the primary challenge to achieving our goal. To address this, we extract pairs of cross-lingual phrases using word alignment information automatically induced from parallel sentences. Subsequently, we train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning, which encourages the hidden representations of phrases with similar contexts and semantics to align closely. Comprehensive experiments on both the cross-lingual phrase retrieval task and a downstream task, i.e., machine translation, demonstrate the effectiveness of CCPR. On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher. When utilizing CCPR to augment the large-language-model-based translator, it achieves average gains of 0.7 and 1.5 in BERTScore for translations from X=>En and vice versa, respectively, on the WMT16 dataset. Our code and data are available at \url{https://github.com/ghrua/ccpr_release}.
Submitted 25 March, 2024;
originally announced March 2024.
-
Single-Motor Robotic Gripper with Multi-Surface Fingers for Variable Grasping Configurations
Authors:
Toshihiro Nishimura,
Yosuke Suzuki,
Tokuo Tsuj,
Tetsuyou Watanabe
Abstract:
This study proposes a novel robotic gripper with variable grasping configurations for grasping various objects. The fingers of the developed gripper incorporate multiple different surfaces. The gripper can alter the finger surfaces facing a target object by rotating the fingers about their longitudinal direction. In the proposed design equipped with two fingers, the two fingers incorporate three and four surfaces, respectively, resulting in the nine available grasping configurations by the combination of these finger surfaces. The developed gripper is equipped with the functions of opening/closing its fingers for grasping and rotating its fingers to alter the grasping configuration, all achieved with a single motor. To enable the two motions using a single motor, this study introduces a self-motion switching mechanism utilizing magnets. This mechanism automatically transitions between gripper motions based on the direction of the motor rotation when the gripper is fully opened. In this state, rotating the motor towards closing initiates the finger closing action, while further opening the fingers from the fully opened state activates the finger rotation. This letter presents the gripper design, the mechanics of the self-motion switching mechanism, the control method, and the grasping configuration selection strategy. The performance of the gripper is experimentally demonstrated.
Submitted 24 March, 2024;
originally announced March 2024.
-
Distilling Named Entity Recognition Models for Endangered Species from Large Language Models
Authors:
Jesse Atuhurra,
Seiveright Cargill Dujohn,
Hidetaka Kamigaito,
Hiroyuki Shindo,
Taro Watanabe
Abstract:
Natural language processing (NLP) practitioners are leveraging large language models (LLM) to create structured datasets from semi-structured and unstructured data sources such as patents, papers, and theses, without having domain-specific knowledge. At the same time, ecological experts are searching for a variety of means to preserve biodiversity. To contribute to these efforts, we focused on endangered species and, through in-context learning, distilled knowledge from GPT-4. In effect, we created datasets for both named entity recognition (NER) and relation extraction (RE) via a two-stage process: 1) we generated synthetic data from GPT-4 of four classes of endangered species, 2) humans verified the factual accuracy of the synthetic data, resulting in gold data. Eventually, our novel dataset contains a total of 3.6K sentences, evenly divided between 1.8K NER and 1.8K RE sentences. The constructed dataset was then used to fine-tune both general BERT and domain-specific BERT variants, completing the knowledge distillation process from GPT-4 to BERT, because GPT-4 is resource intensive. Experiments show that our knowledge transfer approach is effective at creating an NER model suitable for detecting endangered species from texts.
Submitted 13 March, 2024;
originally announced March 2024.
-
Image Coding for Machines with Edge Information Learning Using Segment Anything
Authors:
Takahiro Shindo,
Kein Yamada,
Taiju Watanabe,
Hiroshi Watanabe
Abstract:
Image Coding for Machines (ICM) is an image compression technique for image recognition.
This technique is essential due to the growing demand for image recognition AI.
In this paper, we propose a method for ICM that focuses on encoding and decoding only the edge information of object parts in an image, which we call SA-ICM.
This is a Learned Image Compression (LIC) model trained using edge information created by Segment Anything.
Our method can be used for image recognition models with various tasks.
SA-ICM is also robust to changes in input data, making it effective for a variety of use cases.
Additionally, our method provides benefits from a privacy point of view, as it removes human facial information on the encoder's side, thus protecting one's privacy.
Furthermore, this LIC model training method can be used to train Neural Representations for Videos (NeRV), which is a video compression model.
By training NeRV using edge information created by Segment Anything, it is possible to create a NeRV that is effective for image recognition (SA-NeRV).
Experimental results confirm the advantages of SA-ICM, presenting the best performance in image compression for image recognition.
We also show that SA-NeRV is superior to ordinary NeRV in video compression for machines.
Code is available at https://github.com/final-0/SA-ICM.
Submitted 7 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Artwork Explanation in Large-scale Vision Language Models
Authors:
Kazuki Hayashi,
Yusuke Sakai,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Large-scale vision-language models (LVLMs) output text from images and instructions, demonstrating advanced capabilities in text generation and comprehension. However, it has not been clarified to what extent LVLMs understand the knowledge necessary for explaining images, the complex relationships between various pieces of knowledge, and how they integrate these understandings into their explanations. To address this issue, we propose a new task: the artwork explanation generation task, along with its evaluation dataset and metric for quantitatively assessing the understanding and utilization of knowledge about artworks. This task is apt for image description based on the premise that LVLMs are expected to have pre-existing knowledge of artworks, which are often subjects of wide recognition and documented information. It consists of two parts: generating explanations from both images and titles of artworks, and generating explanations using only images, thus evaluating the LVLMs' language-based and vision-based knowledge. Alongside, we release a training dataset for LVLMs to learn explanations that incorporate knowledge about artworks. Our findings indicate that LVLMs not only struggle with integrating language and visual information but also exhibit a more pronounced limitation in acquiring knowledge from images alone. The datasets (ExpArt=Explain Artworks) are available at https://huggingface.co/datasets/naist-nlp/ExpArt.
Submitted 29 February, 2024;
originally announced March 2024.
-
Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
Authors:
Seiji Gobara,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Education that suits the individual learning level is necessary to improve students' understanding. The first step in achieving this purpose by using large language models (LLMs) is to adjust the textual difficulty of the response to students. This work analyzes how LLMs can implicitly adjust text difficulty between user input and its generated text. To conduct the experiments, we created a new dataset from Stack-Overflow to explore the performance of question-answering-based conversation. Experimental results on the Stack-Overflow dataset and the TSCC dataset, including multi-turn conversation, show that LLMs can implicitly handle text difficulty between user input and its generated response. We also observed that some LLMs can surpass humans in handling text difficulty, and that instruction-tuning is important.
Submitted 22 February, 2024;
originally announced February 2024.
-
Energy exchange between electrons and ions in ion temperature gradient turbulence
Authors:
T. Kato,
H. Sugama,
T. -H. Watanabe,
M. Nunami
Abstract:
Microturbulence in magnetically confined plasmas contributes to energy exchange between particles of different species as well as the particle and heat fluxes. Although the effect of turbulent energy exchange has not been considered significant in previous studies, it is anticipated to have a greater impact than collisional energy exchange in low collisional plasmas such as those in future fusion reactors. In this study, gyrokinetic simulations are performed to evaluate the energy exchange in ion temperature gradient (ITG) turbulence. The energy exchange due to the ITG turbulence mainly consists of the cooling of ions in the $\nabla B$-curvature drift motion and the heating of electrons streaming along a field line. It is found that the ITG turbulence transfers energy from ions to electrons regardless of whether the ions or electrons are hotter, which is in marked contrast to the energy transfer by Coulomb collisions. This implies that the ITG turbulence should be suppressed from the viewpoint of sustaining the high ion temperature required for fusion reactions since it prevents energy transfer from alpha-heated electrons to ions as well as enhancing ion heat transport toward the outside of the reactor. Furthermore, linear and nonlinear simulation analyses confirm the feasibility of quasilinear modeling for predicting the turbulent energy exchange in addition to the particle and heat fluxes.
Submitted 16 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Evaluating Image Review Ability of Vision Language Models
Authors:
Shigeki Saito,
Kazuki Hayashi,
Yusuke Ide,
Yusuke Sakai,
Kazuma Onishi,
Toma Suzuki,
Seiji Gobara,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Large-scale vision language models (LVLMs) are language models that are capable of processing images and text inputs by a single model. This paper explores the use of LVLMs to generate review texts for images. The ability of LVLMs to review images is not fully understood, highlighting the need for a methodical evaluation of their review abilities. Unlike image captions, review texts can be written from various perspectives such as image composition and exposure. This diversity of review perspectives makes it difficult to uniquely determine a single correct review for an image. To address this challenge, we introduce an evaluation method based on rank correlation analysis, in which review texts are ranked by humans and by LVLMs, and the correlation between these rankings is then measured. We further validate this approach by creating a benchmark dataset aimed at assessing the image review ability of recent LVLMs. Our experiments with the dataset reveal that LVLMs, particularly those with proven superiority in other evaluative contexts, excel at distinguishing between high-quality and substandard image reviews.
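The rank-correlation idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which coefficient is used, so Spearman's rho is an assumption here, and the function name `spearman_rho` and the two example rankings are hypothetical, not taken from the paper.

```python
import numpy as np


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings (1..n) of the same items."""
    a = np.asarray(rank_a, dtype=float)
    b = np.asarray(rank_b, dtype=float)
    n = len(a)
    # Sum of squared rank differences between the two rankings.
    d2 = ((a - b) ** 2).sum()
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


# Hypothetical rankings of five review texts for one image.
human_ranks = [1, 2, 3, 4, 5]   # ranking assigned by human annotators
model_ranks = [2, 1, 3, 5, 4]   # ranking assigned by a model
rho = spearman_rho(human_ranks, model_ranks)
print(rho)
```

A value near 1 means the model orders the reviews much as the humans do; for the toy rankings above the squared rank differences sum to 4, giving rho = 0.8.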
Submitted 19 February, 2024;
originally announced February 2024.
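As a rough illustration of the rank-correlation evaluation described in the abstract above, the sketch below computes Spearman's rank correlation between a human ranking and a model ranking of the same review texts. The choice of Spearman's coefficient, the function name `spearman_rho`, and the example rankings are assumptions made for illustration; the abstract does not say which coefficient the authors use.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items (no ties assumed)."""
    if len(rank_a) != len(rank_b) or len(rank_a) < 2:
        raise ValueError("rankings must have equal length >= 2")
    n = len(rank_a)
    # Sum of squared rank differences, item by item.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical rankings of five review texts (1 = best).
human_ranks = [1, 2, 3, 4, 5]
model_ranks = [2, 1, 3, 4, 5]
rho = spearman_rho(human_ranks, model_ranks)
```

A value of `rho` close to 1 would indicate that the model orders the reviews much as the human annotators do; a value near 0 would indicate no relationship between the two rankings.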
-
Centroid-Based Efficient Minimum Bayes Risk Decoding
Authors:
Hiroyuki Deguchi,
Yusuke Sakai,
Hidetaka Kamigaito,
Taro Watanabe,
Hideki Tanaka,
Masao Utiyama
Abstract:
Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation. However, MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations. We propose centroid-based MBR (CBMBR) decoding to improve the speed of MBR decoding. Our method clusters the reference translations in the feature space, and then calculates the score using the centroids of each cluster. The experimental results show that our CBMBR not only improved the decoding speed of the expected score calculation 5.7 times, but also outperformed vanilla MBR decoding in translation quality by up to 0.5 COMET in the WMT'22 En$\leftrightarrow$Ja, En$\leftrightarrow$De, En$\leftrightarrow$Zh, and WMT'23 En$\leftrightarrow$Ja translation tasks.
Submitted 11 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
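The contrast between vanilla MBR decoding and the centroid-based variant in the abstract above can be sketched on toy data. Everything concrete here is an assumption made for illustration: the 1-D feature vectors stand in for sentence embeddings, the utility is a negative squared Euclidean distance rather than COMET, the clustering is plain k-means with a deterministic initialization, and the function names are hypothetical. It is not the authors' implementation.

```python
import math


def utility(h, r):
    """Toy utility (assumption): negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(h, r))


def mbr_select(hyps, refs):
    """Vanilla MBR: every hypothesis is scored against every reference."""
    scores = [sum(utility(h, r) for r in refs) / len(refs) for h in hyps]
    return max(range(len(hyps)), key=scores.__getitem__)


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return centroids


def centroid_mbr_select(hyps, refs, k):
    """Centroid-based variant: score hypotheses against k centroids only."""
    centroids = kmeans(refs, k)
    scores = [sum(utility(h, c) for c in centroids) / k for h in hyps]
    return max(range(len(hyps)), key=scores.__getitem__)


# Toy features; the hypotheses double as pseudo-references.
feats = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
best_vanilla = mbr_select(feats, feats)
best_centroid = centroid_mbr_select(feats, feats, k=2)
```

With |H| hypotheses and |R| references, the vanilla scorer makes |H|·|R| utility calls, while the centroid scorer makes |H|·k after clustering, which is where a speed-up would come from when k is much smaller than |R|.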
-
Generating Diverse Translation with Perturbed kNN-MT
Authors:
Yuto Nishida,
Makoto Morishita,
Hidetaka Kamigaito,
Taro Watanabe
Abstract:
Generating multiple translation candidates would enable users to choose the one that satisfies their needs. Although there has been work on diversified generation, there exists room for improving the diversity mainly because the previous methods do not address the overcorrection problem -- the model underestimates a prediction that is largely different from the training data, even if that prediction is likely. This paper proposes methods that generate more diverse translations by introducing perturbed k-nearest neighbor machine translation (kNN-MT). Our methods expand the search space of kNN-MT and help incorporate diverse words into candidates by addressing the overcorrection problem. Our experiments show that the proposed methods drastically improve candidate diversity and control the degree of diversity by tuning the perturbation's magnitude.
Submitted 14 February, 2024;
originally announced February 2024.
-
Deriving two dualities simultaneously from a family of identities for multiple harmonic sums
Authors:
Takumi Maesaka,
Shin-ichiro Seki,
Taiki Watanabe
Abstract:
We give a new expression of the multiple harmonic sum, which serves as a refinement of the iterated integral expression of the multiple zeta value, and prove it using the so-called connected sum method. Based on this fact, by taking two kinds of limit operations, we obtain new proofs of both the duality for multiple zeta values and the duality for finite multiple zeta values.
Submitted 29 February, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Realized Stochastic Volatility Model with Skew-t Distributions for Improved Volatility and Quantile Forecasting
Authors:
Makoto Takahashi,
Yuta Yamauchi,
Toshiaki Watanabe,
Yasuhiro Omori
Abstract:
Forecasting volatility and quantiles of financial returns is essential for accurately measuring financial tail risks, such as value-at-risk and expected shortfall. The critical elements in these forecasts involve understanding the distribution of financial returns and accurately estimating volatility. This paper introduces an advancement to the traditional stochastic volatility model, termed the realized stochastic volatility model, which integrates realized volatility as a precise estimator of volatility. To capture the well-known characteristics of return distribution, namely skewness and heavy tails, we incorporate three types of skew-t distributions. Among these, two distributions include the skew-normal feature, offering enhanced flexibility in modeling the return distribution. We employ a Bayesian estimation approach using the Markov chain Monte Carlo method and apply it to major stock indices. Our empirical analysis, utilizing data from US and Japanese stock indices, indicates that the inclusion of both skewness and heavy tails in daily returns significantly improves the accuracy of volatility and quantile forecasts.
Submitted 23 January, 2024;
originally announced January 2024.
-
Nearly homogeneous and isotropic turbulence generated by the interaction of supersonic jets
Authors:
Takahiro Mori,
Tomoaki Watanabe,
Koji Nagata
Abstract:
This study reports the development and characterization of a multiple-supersonic-jet wind tunnel designed to investigate the decay of nearly homogeneous and isotropic turbulence in a compressible regime. The interaction of 36 supersonic jets generates turbulence that decays in the streamwise direction. The velocity field is measured with particle image velocimetry by seeding tracer particles with ethanol condensation. Various velocity statistics are evaluated to diagnose decaying turbulence generated by the supersonic jet interaction. The flow is initially inhomogeneous and anisotropic and possesses intermittent large-scale velocity fluctuations. The flow evolves into a statistically homogeneous and isotropic state as the mean velocity profile becomes uniform. In the nearly homogeneous and isotropic region, the ratio of root-mean-squared velocity fluctuations in the streamwise and vertical directions is about 1.08, the longitudinal integral scales are also similar in these directions, and the large-scale intermittency becomes insignificant. The turbulent kinetic energy per unit mass decays according to a power law with an exponent of about 2, larger than those reported for incompressible grid turbulence. The energy spectra in the inertial subrange agree well with other turbulent flows when normalized by the dissipation rate and kinematic viscosity. The non-dimensional dissipation rate is within a range of 0.51--0.87, which is also consistent with incompressible grid turbulence. These results demonstrate that the multiple-supersonic-jet wind tunnel is helpful in the investigation of decaying homogeneous isotropic turbulence whose generation process is strongly influenced by fluid compressibility.
Submitted 14 January, 2024;
originally announced January 2024.
-
Backtracking New Q-Newton's method, Newton's flow, Voronoi's diagram and Stochastic root finding
Authors:
John Erik Fornaess,
Mi Hu,
Tuyen Trung Truong,
Takayuki Watanabe
Abstract:
A new variant of Newton's method - named Backtracking New Q-Newton's method (BNQN) - which has strong theoretical guarantees, is easy to implement, and has good experimental performance, was recently introduced by the third author.
Experiments performed previously showed some remarkable properties of the basins of attraction for finding roots of polynomials and meromorphic functions with BNQN. In general, they look smoother than those of Newton's method.
In this paper, we continue to experimentally explore in depth this remarkable phenomenon, and connect BNQN to Newton's flow and Voronoi's diagram. This link poses a couple of challenging puzzles to be explained. Experiments also indicate that BNQN is more robust against random perturbations than Newton's method and Random Relaxed Newton's method.
Submitted 8 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Backtracking New Q-Newton's method, Schröder's theorem, and Linear Conjugacy
Authors:
John Erik Fornaess,
Mi Hu,
Tuyen Trung Truong,
Takayuki Watanabe
Abstract:
A new variant of Newton's method - named Backtracking New Q-Newton's method (BNQN) - which has strong theoretical guarantees, is easy to implement, and has good experimental performance, was recently introduced by the third author.
Experiments performed previously showed some remarkable properties of the basins of attraction for finding roots of polynomials and meromorphic functions using BNQN. In particular, it seems that for finding roots of polynomials of degree 2, the basins of attraction of the dynamics for BNQN are the same as those for Newton's method (the latter is the classical Schröder's result in Complex Dynamics).
In this paper, we show that indeed the picture we obtain when finding roots of polynomials of degree 2 is the same as that in Schröder's result, with a remarkable difference: on the boundary line of the basins, the dynamics of Newton's method is chaotic, while the dynamics of BNQN is smoother. On the way to proving the result, we show that BNQN (in any dimension) is invariant under conjugation by linear operators of the form $A=cR$, where $R$ is unitary and $c>0$ a constant. This again illustrates the similarity-difference relation between BNQN and Newton's method.
Submitted 19 December, 2023;
originally announced December 2023.
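Schröder's classical result mentioned in the abstract above can be checked numerically for the classical Newton's method; BNQN itself is not sketched, since its update rule is not given in this listing. For the degree-2 polynomial p(z) = z² - 1, the two basins of attraction are the open half-planes Re z > 0 and Re z < 0. The polynomial, the starting points, and the function names are choices made for illustration.

```python
def newton_iterate(z, steps=50):
    """Classical Newton's method for p(z) = z**2 - 1 from a complex start z."""
    for _ in range(steps):
        # Newton update z - p(z) / p'(z), with p'(z) = 2z.
        z = z - (z * z - 1) / (2 * z)
    return z


def basin(z0):
    """Return the root (+1 or -1) that the iteration started at z0 approaches."""
    z = newton_iterate(z0)
    return 1 if abs(z - 1) < abs(z + 1) else -1


right = basin(0.5 + 2j)   # start in the half-plane Re z > 0
left = basin(-0.3 - 1j)   # start in the half-plane Re z < 0
```

Starting points on the imaginary axis, the boundary between the two basins, are deliberately avoided here: that is where the abstract describes the dynamics of Newton's method as chaotic, and z = 0 would cause a division by zero.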
-
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
Authors:
Yusuke Sakai,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Knowledge graphs (KGs) consist of links that describe relationships between entities. Due to the difficulty of manually enumerating all relationships between entities, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, HousE, etc., infer missing links using only the knowledge from training data. In contrast, the recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training. Therefore, PLM-based KGC can estimate missing links between entities by reusing memorized knowledge from pre-training without inference. This approach is problematic because building KGC models aims to infer unseen links between entities. However, conventional evaluations in KGC do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method, which achieves high performance in current KGC evaluations, may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets specified in this analysis and conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from textual information of entities and relations.
Submitted 6 June, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Eigenvalue analysis of three-state quantum walks with general coin matrices
Authors:
Jirô Akahori,
Chusei Kiumi,
Norio Konno,
Takuya Watanabe
Abstract:
Mathematical analysis on the existence of eigenvalues is vital, as it corresponds to the occurrence of localization, an exceptionally important property of quantum walks. Previous studies have demonstrated that eigenvalue analysis utilizing the transfer matrix proves beneficial for space inhomogeneous three-state quantum walks with a specific class of coin matrices, including Grover matrices. In this research, we turn our attention to the transfer matrix of three-state quantum walks with a general coin matrix. Building upon previous research methodologies, we dive deeper into investigating the properties of the transfer matrix and employ numerical analysis to derive eigenvalues for models that were previously unanalyzable.
Submitted 10 November, 2023;
originally announced November 2023.
-
Lightweight High-Speed and High-Force Gripper for Assembly
Authors:
Toshihiro Nishimura,
Takeshi Takaki,
Yosuke Suzuki,
Tokuo Tsuji,
Tetsuyou Watanabe
Abstract:
This paper presents a novel industrial robotic gripper with a high grasping speed (maximum: 1396 mm/s), high tip force (maximum: 80 N) for grasping, large motion range, and lightweight design (0.3 kg). To realize these features, the high-speed section of the quick-return mechanism and load-sensitive continuously variable transmission mechanism are installed in the gripper. The gripper is also equipped with a self-centering function. The high grasping speed and self-centering function improve the cycle time in robotic operations. In addition, the high tip force is advantageous for stably grasping and assembling heavy objects. Moreover, the design of the gripper reduces the gripper's proportion of the manipulator's payload, thus increasing the weight of the object that can be grasped. The gripper performance was validated through kinematic and static analyses as well as experimental evaluations. This paper also presents the analysis of the self-centering function of the developed gripper.
Submitted 26 October, 2023;
originally announced October 2023.
-
Single-Motor Robotic Gripper With Three Functional Modes for Grasping in Confined Spaces
Authors:
Toshihiro Nishimura,
Tetsuyou Watanabe
Abstract:
This study proposes a novel robotic gripper driven by a single motor. The main task is to pick up objects in confined spaces. For this purpose, the developed gripper has three operating modes: grasping, finger-bending, and pull-in modes. Using these three modes, the developed gripper can rotate and translate a grasped object, i.e., can perform in-hand manipulation. This in-hand manipulation is effective for grasping in extremely confined spaces, such as the inside of a box on a shelf, to avoid interference between the grasped object and obstacles. To achieve the three modes using a single motor, the developed gripper is equipped with two novel self-motion switching mechanisms. These mechanisms switch their motions automatically when the motion being generated is prevented. An analysis of the mechanism and the control methodology used to achieve the desired behavior are presented. Furthermore, the validity of the analysis and methodology is experimentally demonstrated. The gripper performance is also evaluated through grasping tests.
Submitted 26 October, 2023;
originally announced October 2023.
-
knn-seq: Efficient, Extensible kNN-MT Framework
Authors:
Hiroyuki Deguchi,
Hayate Hirano,
Tomoki Hoshino,
Yuto Nishida,
Justin Vasselli,
Taro Watanabe
Abstract:
k-nearest-neighbor machine translation (kNN-MT) boosts the translation quality of a pre-trained neural machine translation (NMT) model by utilizing translation examples during decoding. Translation examples are stored in a vector database, called a datastore, which contains one entry for each target token from the parallel data it is made from. Due to its size, it is computationally expensive both to construct the datastore and to retrieve examples from it. In this paper, we present an efficient and extensible kNN-MT framework, knn-seq, for researchers and developers that is carefully designed to run efficiently, even with a billion-scale datastore. knn-seq is developed as a plug-in for fairseq and makes it easy to switch models and kNN indexes. Experimental results show that our implemented kNN-MT achieves a comparable gain to the original kNN-MT, and the billion-scale datastore construction took 2.21 hours in the WMT'19 German-to-English translation task. We publish our knn-seq as an MIT-licensed open-source project and the code is available at https://github.com/naist-nlp/knn-seq . The demo video is available at https://youtu.be/zTDzEOq80m0 .
Submitted 18 October, 2023;
originally announced October 2023.
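The way kNN-MT uses retrieved translation examples during decoding, as described in the abstract above, can be sketched for a single decoding step. The sketch follows the commonly used kNN-MT formulation, in which retrieved neighbors are turned into a token distribution through a softmax over negative distances and then interpolated with the NMT distribution. It is not the knn-seq API: the function names, the temperature, the interpolation weight, and the toy neighbors are all hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=10.0):
    """Turn retrieved (distance, target_token) pairs into a token distribution."""
    weights = defaultdict(float)
    for dist, token in neighbors:
        # Closer neighbors receive larger weights.
        weights[token] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_mt, p_knn, lam=0.5):
    """Mix the NMT and kNN distributions with interpolation weight lam."""
    vocab = set(p_mt) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_mt.get(t, 0.0)
        for t in vocab
    }


# Hypothetical retrieval result for one decoding step.
neighbors = [(1.0, "cat"), (2.0, "cat"), (5.0, "dog")]
p_mt = {"cat": 0.4, "dog": 0.6}
p_final = interpolate(p_mt, knn_distribution(neighbors))
```

In this toy step the retrieved examples favor "cat", so the interpolated probability of "cat" rises above the NMT model's own estimate. In a real system the neighbors would come from a nearest-neighbor search over the datastore, which is the expensive part the abstract refers to.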
-
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Authors:
Huayang Li,
Tian Lan,
Zihao Fu,
Deng Cai,
Lemao Liu,
Nigel Collier,
Taro Watanabe,
Yixuan Su
Abstract:
There are a number of diverging hypotheses about the neural text degeneration problem, i.e., generating repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data. Subsequent experiments also demonstrate that by selectively dropping out the attention to repetitive words in training data, degeneration can be significantly minimized. Furthermore, our empirical analysis illustrates that prior works addressing the degeneration issue from various standpoints, such as the high-inflow words, the likelihood objective, and the self-reinforcement phenomenon, can be interpreted by one simple explanation. That is, penalizing the repetitions in training data is a common and fundamental factor for their effectiveness. Moreover, our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
Submitted 16 October, 2023;
originally announced October 2023.
-
Model-based Subsampling for Knowledge Graph Completion
Authors:
Xincan Feng,
Hidetaka Kamigaito,
Katsuhiko Hayashi,
Taro Watanabe
Abstract:
Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity in Knowledge Graph (KG) datasets. However, current subsampling approaches consider only frequencies of queries that consist of entities and their relations. Thus, the existing subsampling potentially underestimates the appearance probabilities of infrequent queries even if the frequencies of their entities or relations are high. To address this problem, we propose Model-based Subsampling (MBS) and Mixed Subsampling (MIX) to estimate their appearance probabilities through predictions of KGE models. Evaluation results on datasets FB15k-237, WN18RR, and YAGO3-10 showed that our proposed subsampling methods actually improved the KG completion performances for popular KGE models, RotatE, TransE, HAKE, ComplEx, and DistMult.
Submitted 17 September, 2023;
originally announced September 2023.