PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected Complexity

Abstract: Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is experimentally faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose PCF Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $O(n \log \log n)$ under mild assumptions on the data distribution. We also confirm experimentally that PCF Learned Sort has a computational complexity of $O(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the experimental success of Learned Sort, and provides evidence for why Learned Sort is fast.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.01039 [pdf, other]

A first efficient algorithm for enumerating all the extreme points of a bisubmodular polyhedron

Authors: Yasuko Matsui, Takeshi Naitoh, Ping Zhan

Abstract: Efficiently enumerating all the extreme points of a polytope identified by a system of linear inequalities is a well-known challenge. We consider a special case and present an algorithm that enumerates all the extreme points of a bisubmodular polyhedron in $\mathcal{O}(n^4|V|)$ time and $\mathcal{O}(n^2)$ space complexity, where $n$ is the dimension of the underlying space and $V$ is the set of outputs. We use reverse search and a signed poset linked to the extreme points to avoid redundant search. Our algorithm is a generalization of enumerating all the extreme points of a base polyhedron, which comprises some combinatorial enumeration problems.

Submitted 3 July, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 2 pages, 3 figures

MSC Class: G.2.1

arXiv:2404.16398 [pdf, other]

Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval

Authors: Ryoya Nara, Yu-Chieh Lin, Yuji Nozawa, Youyang Ng, Goh Itoh, Osamu Torii, Yusuke Matsui

Abstract: Many image retrieval studies use metric learning to train an image encoder. However, metric learning cannot handle differences in users' preferences, and requires data to train an image encoder. To overcome these limitations, we revisit relevance feedback, a classic technique for interactive retrieval systems, and propose an interactive CLIP-based image retrieval system with relevance feedback. Our retrieval system first executes the retrieval, collects each user's unique preferences through binary feedback, and returns images the user prefers. Even when users have various preferences, our retrieval system learns each user's preference through the feedback and adapts to the preference. Moreover, our retrieval system leverages CLIP's zero-shot transferability and achieves high accuracy without training. We empirically show that our retrieval system competes well with state-of-the-art metric learning in category-based image retrieval, despite not training image encoders specifically for each dataset. Furthermore, we set up two additional experimental settings where users have various preferences: one-label-based image retrieval and conditioned image retrieval. In both cases, our retrieval system effectively adapts to each user's preferences, resulting in improved accuracy compared to image retrieval without feedback. Overall, our work highlights the potential benefits of integrating CLIP with classic relevance feedback techniques to enhance image retrieval.

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 20 pages, 8 figures

arXiv:2404.13993 [pdf, other]

Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

Authors: Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui

Abstract: Recognizing characters and predicting speakers of dialogue are critical for comic processing tasks, such as voice generation or translation. However, because characters vary by comic title, supervised learning approaches, such as training character classifiers that require specific annotations for each comic title, are infeasible. This motivates us to propose a novel zero-shot approach, allowing machines to identify characters and predict speaker names based solely on unannotated comic images. In spite of their importance in real-world applications, these tasks have largely remained unexplored due to challenges in story comprehension and multimodal integration. Recent large language models (LLMs) have shown great capability for text understanding and reasoning, while their application to multimodal content analysis is still an open problem. To address this problem, we propose an iterative multimodal framework, the first to employ multimodal information for both character identification and speaker prediction tasks. Our experiments demonstrate the effectiveness of the proposed framework, establishing a robust baseline for these tasks. Furthermore, since our method requires no training data or annotations, it can be used as-is on any comic series.

Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13710 [pdf, other]

SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities

Authors: Kunato Nishina, Yusuke Matsui

Abstract: Text-to-image models have shown progress in recent years. Along with this progress, generating vector graphics from text has also advanced. SVG is a popular format for vector graphics, and SVG represents a scene with XML text. Therefore, Large Language Models can directly process SVG code. Taking this into account, we focused on editing SVG with LLMs. For quantitative evaluation of LLMs' ability to edit SVG, we propose SVGEditBench. SVGEditBench is a benchmark for assessing the LLMs' ability to edit SVG code. We also show the GPT-4 and GPT-3.5 results when evaluated on the proposed benchmark. In the experiments, GPT-4 showed superior performance to GPT-3.5 both quantitatively and qualitatively. The dataset is available at https://github.com/mti-lab/SVGEditBench.

Submitted 21 April, 2024; originally announced April 2024.

Comments: Accepted to Workshop on Graphic Design Understanding and Generation (GDUG), a CVPR2024 workshop. Dataset: https://github.com/mti-lab/SVGEditBench

arXiv:2403.19259 [pdf, other]

J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution

Authors: Nobuhiro Ueda, Hideko Habe, Yoko Matsui, Akishige Yuguchi, Seiya Kawano, Yasutomo Kawanishi, Sadao Kurohashi, Koichiro Yoshino

Abstract: Understanding expressions that refer to the physical world is crucial for human-assisting systems in the real world, such as robots that must perform actions expected by users. In real-world reference resolution, a system must ground the verbal information that appears in user interactions to the visual information observed in egocentric views. To this end, we propose a multimodal reference resolution task and construct a Japanese Conversation dataset for Real-world Reference Resolution (J-CRe3). Our dataset contains egocentric video and dialogue audio of real-world conversations between two people acting as a master and an assistant robot at home. The dataset is annotated with crossmodal tags between phrases in the utterances and the object bounding boxes in the video frames. These tags include indirect reference relations, such as predicate-argument structures and bridging references, as well as direct reference relations. We also constructed an experimental model and clarified the challenges in multimodal reference resolution tasks.

Submitted 28 March, 2024; originally announced March 2024.

Comments: LREC-COLING 2024

arXiv:2403.13652 [pdf, other]

ZoDi: Zero-Shot Domain Adaptation with Diffusion-Based Image Transfer

Authors: Hiroki Azuma, Yusuke Matsui, Atsuto Maki

Abstract: Deep learning models achieve high accuracy in segmentation tasks among others, yet domain shift often degrades the models' performance, which can be critical in real-world scenarios where no target images are available. This paper proposes a zero-shot domain adaptation method based on diffusion models, called ZoDi, which is two-fold by design: zero-shot image transfer and model adaptation. First, we utilize an off-the-shelf diffusion model to synthesize target-like images by transferring the domain of source images to the target domain. In doing so, we specifically try to maintain the layout and content by utilizing layout-to-image diffusion models with stochastic inversion. Second, we train the model using both source images and synthesized images with the original segmentation maps while maximizing the feature similarity of images from the two domains to learn domain-robust representations. Through experiments, we show the benefits of ZoDi in the task of image segmentation over state-of-the-art methods. It is also more applicable than existing CLIP-based methods because it assumes no specific backbone or models, and it enables estimating the model's performance without target images by inspecting generated images. Our implementation will be publicly available.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.04713 [pdf, other]

Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search

Authors: Yutaro Oguri, Yusuke Matsui

Abstract: We present a theoretical and empirical analysis of the adaptive entry point selection for graph-based approximate nearest neighbor search (ANNS). We introduce novel concepts: $b\textit{-monotonic path}$ and $B\textit{-MSNET}$, which better capture an actual graph in practical algorithms than existing concepts like MSNET. We prove that adaptive entry point selection offers a better performance upper bound than the fixed central entry point under more general conditions than previous work. Empirically, we validate the method's effectiveness in accuracy, speed, and memory usage across various datasets, especially in challenging scenarios with out-of-distribution data and hard instances. Our comprehensive study provides deeper insights into optimizing entry points for graph-based ANNS for real-world high-dimensional data applications.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.10806 [pdf, other]

Cross-Lingual Learning in Multilingual Scene Text Recognition

Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

Abstract: In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages. To do so, we first examine whether two general insights about CLL discussed in previous works apply to multilingual STR: (1) Joint learning with high- and low-resource languages may reduce performance on low-resource languages, and (2) CLL works best between typologically similar languages. Through extensive experiments, we show that these two general insights may not apply to multilingual STR. After that, we show that the crucial condition for CLL is the dataset size of high-resource languages, regardless of the kind of high-resource languages. Our code, data, and models are available at https://github.com/ku21fan/CLL-STR.

Submitted 17 December, 2023; originally announced December 2023.

Comments: Accepted at ICASSP2024, 5 pages, 2 figures

arXiv:2311.15994 [pdf, other]

Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights

Authors: Ryoya Nara, Yusuke Matsui

Abstract: DNN-based image classification models are susceptible to adversarial attacks. Most previous adversarial attacks do not focus on the interpretability of the generated adversarial examples, and we cannot gain insights into the mechanism of the target classifier from the attacks. Therefore, we propose Adversarial Doodles, which have interpretable shapes. We optimize black Bézier curves to fool the target classifier by overlaying them onto the input image. By introducing random perspective transformation and regularizing the doodled area, we obtain compact attacks that cause misclassification even when humans replicate them by hand. Adversarial doodles provide describable and intriguing insights into the relationship between our attacks and the classifier's output. We utilize adversarial doodles and discover the bias inherent in the target classifier, such as "We add two strokes on its head, a triangle onto its body, and two lines inside the triangle on a bird image. Then, the classifier misclassifies the image as a butterfly."

Submitted 27 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.20419 [pdf, other]

Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search

Authors: Naoki Ono, Yusuke Matsui

Abstract: Approximate Nearest Neighbor Search (ANNS) is the task of finding the database vector that is closest to a given query vector. Graph-based ANNS is the family of methods with the best balance of accuracy and speed for million-scale datasets. However, graph-based methods have the disadvantage of long index construction time. Recently, many researchers have improved the tradeoff between accuracy and speed during a search. However, there is little research on accelerating index construction. We propose a fast graph construction algorithm, Relative NN-Descent (RNN-Descent). RNN-Descent combines NN-Descent, an algorithm for constructing approximate K-nearest neighbor graphs (K-NN graphs), and RNG Strategy, an algorithm for selecting edges effective for search. This algorithm allows the direct construction of graph-based indexes without ANNS. Experimental results demonstrate that the proposed method has the fastest index construction speed, while its search performance is comparable to existing state-of-the-art methods such as NSG. For example, in experiments on the GIST1M dataset, the construction of the proposed method is 2x faster than NSG. Additionally, its construction is even faster than that of NN-Descent.

Submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted by ACMMM 2023

arXiv:2309.00472 [pdf, other]

General and Practical Tuning Method for Off-the-Shelf Graph-Based Index: SISAP Indexing Challenge Report by Team UTokyo

Authors: Yutaro Oguri, Yusuke Matsui

Abstract: Despite the efficacy of graph-based algorithms for Approximate Nearest Neighbor (ANN) searches, the optimal tuning of such systems remains unclear. This study introduces a method to tune the performance of off-the-shelf graph-based indexes, focusing on the dimension of vectors, database size, and entry points of graph traversal. We utilize a black-box optimization algorithm to perform integrated tuning to meet the required levels of recall and Queries Per Second (QPS). We applied our approach to Task A of the SISAP 2023 Indexing Challenge and got second place in the 10M and 30M tracks. It improves performance substantially compared to brute force methods. This research offers a universally applicable tuning method for graph-based indexes, extending beyond the specific conditions of the competition to broader uses.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Accepted paper on 2nd place solution of SISAP 2023 Indexing Challenge Task A

arXiv:2308.16426 [pdf, other]

Enumerating minimal vertex covers and dominating sets with capacity and/or connectivity constraints

Authors: Yasuaki Kobayashi, Kazuhiro Kurita, Yasuko Matsui, Hirotaka Ono

Abstract: In this paper, we consider the problems of enumerating minimal vertex covers and minimal dominating sets with capacity and/or connectivity constraints. We develop polynomial-delay enumeration algorithms for these problems on bounded-degree graphs. For the case of minimal connected vertex cover, our algorithm runs in polynomial delay even on the class of $d$-claw free graphs, which extends the result on bounded-degree graphs. To complement these algorithmic results, we show that the problems of enumerating minimal connected vertex covers and minimal capacitated vertex covers in bipartite graphs are at least as hard as enumerating minimal transversals in hypergraphs.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 13 pages

arXiv:2306.17469 [pdf, other]

Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

Authors: Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

Abstract: The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characters' personalities, and inference of character relationships and stories. To deal with the problem of insufficient speaker-to-text annotations, we created a new annotation dataset Manga109Dialog based on Manga109. Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided our dataset into different levels by prediction difficulties to evaluate speaker detection methods more appropriately. Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models. Due to the unique features of comics, we enhance the performance of our proposed model by considering the frame reading order. We conducted experiments using Manga109Dialog and other datasets. Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.

Submitted 22 April, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: Accepted to ICME2024

arXiv:2306.02846 [pdf, other]

Fast Partitioned Learned Bloom Filter

Authors: Atsuki Sato, Yusuke Matsui

Abstract: A Bloom filter is a memory-efficient data structure for approximate membership queries used in numerous fields of computer science. Recently, learned Bloom filters that achieve better memory efficiency using machine learning models have attracted attention. One such filter, the partitioned learned Bloom filter (PLBF), achieves excellent memory efficiency. However, PLBF requires $O(N^3k)$ time to construct the data structure, where $N$ and $k$ are the hyperparameters of PLBF. One can improve memory efficiency by increasing $N$, but the construction time becomes extremely long. Thus, we propose two methods that can reduce the construction time while maintaining the memory efficiency of PLBF. First, we propose fast PLBF, which can construct the same data structure as PLBF with a smaller time complexity $O(N^2k)$. Second, we propose fast PLBF++, which can construct the data structure with even smaller time complexity $O(Nk\log N + Nk^2)$. Fast PLBF++ does not necessarily construct the same data structure as PLBF. Still, it is almost as memory efficient as PLBF, and it is proved that fast PLBF++ has the same data structure as PLBF when the distribution satisfies a certain constraint. Our experimental results from real-world datasets show that (i) fast PLBF and fast PLBF++ can construct the data structure up to 233 and 761 times faster than PLBF, (ii) fast PLBF can achieve the same memory efficiency as PLBF, and (iii) fast PLBF++ can achieve almost the same memory efficiency as PLBF. The code is available at https://github.com/atsukisato/FastPLBF.

Submitted 28 October, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023

arXiv:2304.04512 [pdf, other]

Defense-Prefix for Preventing Typographic Attacks on CLIP

Authors: Hiroki Azuma, Yusuke Matsui

Abstract: Vision-language pre-training models (VLPs) have exhibited revolutionary improvements in various vision-language tasks. In VLP, some adversarial attacks fool a model into false or absurd classifications. Previous studies addressed these attacks by fine-tuning the model or changing its architecture. However, these methods risk losing the original model's performance and are difficult to apply to downstream tasks. In particular, their applicability to other tasks has not been considered. In this study, we addressed the reduction of the impact of typographic attacks on CLIP without changing the model parameters. To achieve this, we expand the idea of "prefix learning" and introduce our simple yet effective method: Defense-Prefix (DP), which inserts the DP token before a class name to make words "robust" against typographic attacks. Our method can be easily applied to downstream tasks, such as object detection, because the proposed method is independent of the model parameters. Our method significantly improves the accuracy of classification tasks for typographic attack datasets, while maintaining the zero-shot capabilities of the model. In addition, we leverage our proposed method for object detection, demonstrating its high applicability and effectiveness. The codes and datasets are available at https://github.com/azuma164/Defense-Prefix.

Submitted 6 September, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: ICCV2023 Workshop

arXiv:2212.04931 [pdf, ps, other]

doi 10.1088/1751-8121/accee7

Runge-Lenz Vector as a 3d Projection of SO(4) Moment Map in $\mathbb{R}^{4}\times\mathbb{R}^{4}$ Phase Space

Authors: Hitoshi Ikemori, Shinsaku Kitakado, Yoshimitsu Matsui, Toshiro Sato

Abstract: We show, using the methods of geometric algebra, that the Runge-Lenz vector in the Kepler problem is a 3-dimensional projection of the SO(4) moment map that acts on the phase space of 4-dimensional particle motion. Thus, the RL vector is a consequence of the geometric symmetry of the $\mathbb{R}^4\times \mathbb{R}^4$ phase space.

Submitted 1 June, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: Corrected typographical error. The final version is published in J. Phys. A

Journal ref: J. Phys. A: Math. Theor. 56 225204 (2023)

arXiv:2210.00920 [pdf, other]

Unbiased Scene Graph Generation using Predicate Similarities

Authors: Misaki Ohashi, Yusuke Matsui

Abstract: Scene Graphs are widely applied in computer vision as a graphical representation of relationships between objects shown in images. However, these applications have not yet reached a practical stage of development owing to biased training caused by long-tailed predicate distributions. In recent years, many studies have tackled this problem. In contrast, relatively few works have considered predicate similarities as a unique dataset feature which also leads to the biased prediction. Due to the feature, infrequent predicates (e.g., parked on, covered in) are easily misclassified as closely-related frequent predicates (e.g., on, in). Utilizing predicate similarities, we propose a new classification scheme that branches the process to several fine-grained classifiers for similar predicate groups. The classifiers aim to capture the differences among similar predicates in detail. We also introduce the idea of transfer learning to enhance the features for the predicates which lack sufficient training samples to learn the descriptive representations. The results of extensive experiments on the Visual Genome dataset show that the combination of our method and an existing debiasing approach greatly improves performance on tail predicates in challenging SGCls/SGDet tasks. Nonetheless, the overall performance of the proposed approach does not reach that of the current state of the art, so further analysis remains necessary as future work.

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2208.12077 [pdf, ps, other]

doi 10.1103/PhysRevB.107.045141

Pressure suppression of the excitonic insulator state in Ta2NiSe5 observed by optical conductivity

Authors: H. Okamura, T. Mizokawa, K. Miki, Y. Matsui, N. Noguchi, N. Katayama, H. Sawa, M. Nohara, Y. Lu, H. Takagi, Y. Ikemoto, T. Moriwaki

Abstract: The layered chalcogenide Ta2NiSe5 has recently attracted much interest as a strong candidate for the long-sought excitonic insulator (EI). Since the physical properties of an EI are expected to depend sensitively on the external pressure, it is important to clarify the pressure evolution of the microscopic electronic state in Ta2NiSe5. Here we report the optical conductivity [$\sigma(\omega)$] of Ta2NiSe5 measured at high pressures to 10 GPa and at low temperatures to 8 K. With cooling at ambient pressure, $\sigma(\omega)$ develops an energy gap of about 0.17 eV and a pronounced excitonic peak at 0.38 eV, as already reported in the literature. Upon increasing pressure, the energy gap becomes narrower and the excitonic peak is broadened. Above a structural transition at $P_s \sim 3$ GPa, the energy gap becomes partially filled, indicating that Ta2NiSe5 is a semimetal after the EI state is suppressed by pressure. At higher pressures, $\sigma(\omega)$ exhibits metallic characteristics with no energy gap. The detailed pressure evolution of $\sigma(\omega)$ is presented, and discussed mainly in terms of a weakening of excitonic correlation with pressure.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 8 pages, 7 figures, submitted

Journal ref: Phys. Rev. B 107, 045141 (2023)

arXiv:2207.04675 [pdf, other]

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

Abstract: Recognizing irregular texts has been a challenging topic in text recognition. To encourage research on this topic, we provide a novel comic onomatopoeia dataset (COO), which consists of onomatopoeia texts in Japanese comics. COO has many arbitrary texts, such as extremely curved, partially shrunk, or arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each part is a truncated text and is not meaningful by itself. These parts should be linked to represent the intended meaning. Thus, we propose a novel task that predicts the link between truncated texts. We conduct three tasks to detect the onomatopoeia region and capture its intended meaning: text detection, text recognition, and link prediction. Through extensive experiments, we analyze the characteristics of COO. Our data and code are available at https://github.com/ku21fan/COO-Comic-Onomatopoeia.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted at ECCV 2022. 25 pages, 16 figures

arXiv:2203.02505 [pdf, other]

ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM

Authors: Yusuke Matsui, Yoshiki Imaizumi, Naoya Miyamoto, Naoki Yoshifuji

Abstract: We accelerate 4-bit product quantization (PQ) on the ARM architecture. Notably, the remarkable performance of conventional 4-bit PQ relies strongly on x64-specific SIMD registers, such as AVX2; hence, we cannot yet achieve such good performance on ARM. To fill this gap, we first bundle two 128-bit registers as one 256-bit component. We then apply shuffle operations to each using ARM-specific NEON instructions. By making this simple but critical modification, we achieve a dramatic speedup for 4-bit PQ on the ARM architecture. Experiments show that the proposed method consistently achieves a 10x improvement over the naive PQ with the same accuracy.

Submitted 3 March, 2022; originally announced March 2022.

Comments: ICASSP 2022

arXiv:2110.12204 [pdf, other]

Cascading Feature Extraction for Fast Point Cloud Registration

Authors: Yoichiro Hisadome, Yusuke Matsui

Abstract: We propose a method for speeding up 3D point cloud registration through cascading feature extraction. The current approach with the highest accuracy is realized by iteratively executing feature extraction and registration using deep features. However, iterative feature extraction takes time. Our proposed method significantly reduces the computational cost using cascading shallow layers. Our idea is to omit redundant computations that do not always contribute to the final accuracy. The proposed approach is approximately three times faster than the existing methods without a loss of accuracy.

Submitted 23 October, 2021; originally announced October 2021.

Comments: BMVC 2021

arXiv:2103.04400 [pdf, other]

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

Abstract: The scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models only on fewer real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically and for languages other than English for which we do not always have synthetic data. However, there has been implicit common knowledge that training STR models on real data is nearly impossible because real data is insufficient. We consider that this common knowledge has obstructed the study of STR with fewer labels. In this work, we would like to reactivate STR with fewer labels by disproving the common knowledge. We consolidate recently accumulated public real data and show that we can train STR models satisfactorily only with real labeled data. Subsequently, we find simple data augmentation to fully exploit real data. Furthermore, we improve the models by collecting unlabeled data and introducing semi- and self-supervised methods. As a result, we obtain a model competitive with state-of-the-art methods. To the best of our knowledge, this is the first study that 1) shows sufficient performance by only using real labels and 2) introduces semi- and self-supervised methods into STR with fewer labels. Our code and data are available at https://github.com/ku21fan/STR-Fewer-Labels

Submitted 5 June, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

Comments: CVPR 2021

arXiv:2012.14271 [pdf, other]

Towards Fully Automated Manga Translation

Authors: Ryota Hinami, Shonosuke Ishiwatari, Kazuhiko Yasuda, Yusuke Matsui

Abstract: We tackle the problem of machine translation of manga, Japanese comics. Manga translation involves two important problems in machine translation: context-aware and multimodal translation. Since text and images are mixed up in an unstructured fashion in manga, obtaining context from the image is essential for manga translation. However, it is still an open problem how to extract context from the image and integrate it into MT models. In addition, corpora and benchmarks to train and evaluate such models are currently unavailable. In this paper, we make the following four contributions that establish the foundation of manga translation research. First, we propose a multimodal context-aware translation framework. We are the first to incorporate context information obtained from manga images. It enables us to translate texts in speech bubbles that cannot be translated without using context information (e.g., texts in other speech bubbles, gender of speakers, etc.). Second, for training the model, we propose an approach to automatic corpus construction from pairs of original manga and their translations, by which a large parallel corpus can be constructed without any manual labeling. Third, we created a new benchmark to evaluate manga translation. Finally, on top of our proposed methods, we devised the first comprehensive system for fully automated manga translation.

Submitted 9 January, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: Accepted to AAAI 2021

arXiv:2011.09653 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2021.122139

Mixing Angle of $\varXi_Q-\varXi_Q'$ in Heavy Quark Effective Theory

Authors: Yoshimitsu Matsui

Abstract: The Heavy Quark Effective Theory provides a systematic method to estimate a mixing angle of hadron states containing a heavy quark, such as the charm quark (c) or the bottom quark (b). By using this method, the mixing angle of the baryons $\varXi_Q-\varXi_Q'$ can be estimated. It is found that the mixing angle between $\varXi_Q$ and $\varXi_Q'$ is given by $\theta_b = 4.51^{\circ} \pm 0.79^{\circ}$ for the $Q = b$ case and $\theta_c = 8.12^{\circ} \pm 0.80^{\circ}$ for the $Q = c$ case.

Submitted 18 November, 2020; originally announced November 2020.

Comments: 7 pages, no figures

arXiv:2005.04425 [pdf, other]

doi 10.1109/MMUL.2020.2987895

Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

Authors: Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, Hikaru Ikuta

Abstract: Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.

Submitted 12 May, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

Comments: 10 pages, 8 figures

ACM Class: I.4

Journal ref: IEEE MultiMedia 2020

arXiv:2001.01241 [pdf, ps, other]

doi 10.1088/1475-7516/2020/11/039

Gravitational wave spectrum from kinks on infinite cosmic superstrings with Y-junctions

Authors: Yuka Matsui, Koichiro Horiguchi, Daisuke Nitta, Sachiko Kuroyanagi

Abstract: We calculate the gravitational wave (GW) background spectra from kink propagation and kink-kink collisions on infinite cosmic superstrings. We take into account two characteristics of the cosmic superstring network: a small reconnection probability and Y-junctions. First, a small reconnection probability increases the number of infinite strings inside the horizon and enhances the kink production, which leads to a larger amplitude of the GW background. Second, a kink going through a Y-junction transforms into three daughter kinks. In this way, the existence of Y-junctions also increases the number of kinks on cosmic superstrings. However, at the same time, it smooths out the sharpness of kinks rapidly and reduces the number of sharp kinks, which are responsible for the emissions of strong GW bursts. We compute the number distribution of kinks as a function of the sharpness by taking into account the above two effects, and translate it to the amplitude of the GW background spectra. We first investigate the case of the string network with equal string tensions, and find that the effect of Y-junctions to smooth out kink sharpness dominates that of the enhancement of the kink number by a small reconnection probability, and the GW amplitude turns out to be smaller than in the ordinary cosmic string case. On the other hand, for non-equal string tensions, we find that there is a parameter space where the GW amplitude is slightly enhanced by the effect of a small reconnection probability.

Submitted 5 January, 2020; originally announced January 2020.

Comments: 26 pages, 36 figures

arXiv:1902.09120 [pdf, ps, other]

doi 10.1103/PhysRevD.100.123515

Gravitational wave background from kink-kink collisions on infinite cosmic strings

Authors: Yuka Matsui, Sachiko Kuroyanagi

Abstract: We calculate the power spectrum of the stochastic gravitational wave (GW) background expected from kink-kink collisions on infinite cosmic strings. Intersections in the cosmic string network continuously generate kinks, which emit GW bursts by their propagation on curved strings as well as by their collisions. First, we show that the GW background from kink-kink collisions is much larger than the one from propagating kinks at high frequencies because of the higher event rate. We then propose a method to take into account the energy loss of the string network by GW emission as well as the decrease of kink number due to the GW backreaction. We find that, even though these effects reduce the amplitude of the GW background, we can obtain a constraint on the string tension $G\mu \lesssim 2 \times 10^{-7}$ using the current upper bound on the GW background by Advanced LIGO, which is as competitive as the constraint from cusps on string loops.

Submitted 26 December, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: 7 pages, 7 figures

Journal ref: Phys. Rev. D 100, 123515 (2019)

arXiv:1811.10907 [pdf, other]

Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing

Authors: Fan Yang, Ryota Hinami, Yusuke Matsui, Steven Ly, Shin'ichi Satoh

Abstract: Diffusion is commonly used as a ranking or re-ranking method in retrieval tasks to achieve higher retrieval performance, and has attracted a lot of attention in recent years. A downside to diffusion is that it performs slowly in comparison to the naive k-NN search, which causes a non-trivial online computational cost on large datasets. To overcome this weakness, we propose a novel diffusion technique in this paper. In our work, instead of applying diffusion to the query, we pre-compute the diffusion results of each element in the database, making the online search a simple linear combination on top of the k-NN search process. Our proposed method becomes 10~ times faster in terms of online search speed. Moreover, we propose to use late truncation instead of the early truncation used in previous works to achieve better retrieval performance.

Submitted 4 January, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

Comments: Accepted by AAAI 2019

arXiv:1808.03969 [pdf, other]

Reconfigurable Inverted Index

Authors: Yusuke Matsui, Ryota Hinami, Shin'ichi Satoh

Abstract: Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well for the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion concerning the performance degradation after many items have been newly added to a system. We develop a reconfigurable inverted index (Rii) to resolve these two issues. Based on the standard IVFADC system, we design a data layout such that items are stored linearly. This enables us to efficiently run a subset search by switching the search method to a linear PQ scan if the size of a subset is small. Owing to the linear layout, the data structure can be dynamically adjusted after new items are added, maintaining the fast speed of the system. Extensive comparisons show that Rii achieves performance comparable to state-of-the-art systems such as Faiss.

Submitted 12 August, 2018; originally announced August 2018.

Comments: ACMMM 2018 (oral). Code: https://github.com/matsui528/rii

arXiv:1804.02555 [pdf, ps, other]

Drive Video Analysis for the Detection of Traffic Near-Miss Incidents

Authors: Hirokatsu Kataoka, Teppei Suzuki, Shoko Oikawa, Yasuhiro Matsui, Yutaka Satoh

Abstract: Because of their recent introduction, self-driving cars and advanced driver assistance system (ADAS)-equipped vehicles have had little opportunity to learn the dangerous traffic scenarios (including near-miss incidents) that provide normal drivers with strong motivation to drive safely. Accordingly, as a means of providing learning depth, this paper presents a novel traffic database that contains information on a large number of traffic near-miss incidents that were obtained by mounting driving recorders in more than 100 taxis over the course of a decade. The study makes the following two main contributions: (i) In order to assist automated systems in detecting near-miss incidents based on database instances, we created a large-scale traffic near-miss incident database (NIDB) that consists of video clips of dangerous events captured by monocular driving recorders. (ii) To illustrate the applicability of NIDB traffic near-miss incidents, we provide two primary database-related improvements: parameter fine-tuning using various near-miss scenes from NIDB, and foreground/background separation into motion representation. Then, using our new database in conjunction with a monocular driving recorder, we developed a near-miss recognition method that provides automated systems with a performance level that is comparable to a human-level understanding of near-miss incidents (64.5% vs. 68.4% at near-miss recognition, 61.3% vs. 78.7% at near-miss detection).

Submitted 7 April, 2018; originally announced April 2018.

Comments: Accepted to ICRA 2018

arXiv:1803.08670 [pdf, ps, other]

Object Detection for Comics using Manga109 Annotations

Authors: Toru Ogawa, Atsushi Otsubo, Rei Narita, Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa

Abstract: With the growth of digitized comics, image understanding techniques are becoming important. In this paper, we focus on object detection, which is a fundamental task of image understanding. Although convolutional neural network (CNN)-based methods have achieved good performance in object detection for naturalistic images, there are two problems in applying these methods to the comic object detection task. First, there is no large-scale annotated comics dataset. The CNN-based methods require large-scale annotations for training. Second, the objects in comics are highly overlapped compared to naturalistic images. This overlap causes the assignment problem in the existing CNN-based methods. To solve these problems, we proposed a new annotation dataset and a new CNN model. We annotated an existing image dataset of comics and created the largest annotation dataset, named Manga109-annotations. For the assignment problem, we proposed a new CNN-based detector, SSD300-fork. We compared SSD300-fork with other detection methods using Manga109-annotations and confirmed that our model outperformed them based on the mAP score.

Submitted 26 March, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

Comments: http://www.manga109.org/en/

arXiv:1803.08244 [pdf, other]

Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations

Authors: Yasunori Kudo, Keisuke Ogaki, Yusuke Matsui, Yuri Odagiri

Abstract: The task of three-dimensional (3D) human pose estimation from a single image can be divided into two parts: (1) two-dimensional (2D) human joint detection from the image and (2) estimating a 3D pose from the 2D joints. Herein, we focus on the second part, i.e., 3D pose estimation from 2D joint locations. The problem with existing methods is that they require either (1) a 3D pose dataset or (2) 2D joint locations in consecutive frames taken from a video sequence. We aim to solve these problems. For the first time, we propose a method that learns a 3D human pose without any 3D datasets. Our method can predict a 3D pose from 2D joint locations in a single image. Our system is based on generative adversarial networks, and the networks are trained in an unsupervised manner. Our primary idea is that, if the network can predict a 3D human pose correctly, the 3D pose that is projected onto a 2D plane should not collapse even if it is rotated perpendicularly. We evaluated the performance of our method using Human3.6M and the MPII dataset and showed that our network can predict a 3D pose well even if the 3D dataset is not available during training.

Submitted 22 March, 2018; originally announced March 2018.

arXiv:1803.04158 [pdf]

doi 10.1038/s41467-018-06312-z

Probing ultrafast spin-relaxation and precession dynamics in a cuprate Mott insulator with 7-fs optical pulses

Authors: T. Miyamoto, Y. Matsui, T. Terashige, T. Morimoto, N. Sono, H. Yada, S. Ishihara, Y. Watanabe, S. Adachi, T. Ito, K. Oka, A. Sawa, H. Okamoto

Abstract: A charge excitation in a two-dimensional Mott insulator is strongly coupled with the surrounding spins, which is observed as magnetic-polaron formations of doped carriers and a magnon sideband in the Mott-gap transition spectrum. However, the dynamics related to the spin sector are difficult to measure. Here, we show that pump-probe reflection spectroscopy with 7-fs laser pulses can detect the optically induced spin dynamics in Nd$_2$CuO$_4$, a cuprate Mott insulator. The bleaching signal at the Mott-gap transition is enhanced at $\sim$18 fs, which corresponds to the spin-relaxation time in magnetic-polaron formations and is characterized by the exchange interaction. More importantly, ultrafast coherent oscillations appear in the time evolutions of the reflectivity changes, and their frequencies (1400-2700 cm$^{-1}$) are equal to the probe energy measured from the Mott-gap transition peak. These oscillations originate from interferences between charge excitations with two magnons and provide direct evidence for charge-spin coupling.

Submitted 12 March, 2018; originally announced March 2018.

Comments: 20 pages including 4 figures (Supplementary materials: 11 pages including 4 figures)

Journal ref: Nat. Commun. 9 (2018) 3948

arXiv:1709.09106 [pdf, other]

Region-Based Image Retrieval Revisited

Authors: Ryota Hinami, Yusuke Matsui, Shin'ichi Satoh

Abstract: The region-based image retrieval (RBIR) technique is revisited. In early attempts at RBIR in the late 90s, researchers found many ways to specify region-based queries and spatial relationships; however, the ways to characterize the regions, such as by using color histograms, were very poor at that time. Here, we revisit RBIR by incorporating semantic specification of objects and intuitive specification of spatial relationships. Our contributions are the following. First, to support multiple aspects of semantic object specification (category, instance, and attribute), we propose a multitask CNN feature that allows us to use deep learning techniques and to jointly handle multi-aspect object specification. Second, to help users specify spatial relationships among objects in an intuitive way, we propose recommendation techniques for spatial relationships. In particular, by mining the search results, a system can recommend feasible spatial relationships among the objects. The system can also recommend likely spatial relationships from the assigned object category names based on a language prior. Moreover, object-level inverted indexing supports very fast shortlist generation, and re-ranking based on spatial constraints provides users with instant RBIR experiences.

Submitted 26 September, 2017; originally announced September 2017.

Comments: To appear in ACM Multimedia 2017 (Oral)

arXiv:1709.03708 [pdf, other]

PQk-means: Billion-scale Clustering for Product-quantized Codes

Authors: Yusuke Matsui, Keisuke Ogaki, Toshihiko Yamasaki, Kiyoharu Aizawa

Abstract: Data clustering is a fundamental operation in data analysis. For handling large-scale data, the standard k-means clustering method is not only slow, but also memory-inefficient. We propose an efficient clustering method for billion-scale feature vectors, called PQk-means. By first compressing input vectors into short product-quantized (PQ) codes, PQk-means achieves fast and memory-efficient clustering, even for high-dimensional vectors. Similar to k-means, PQk-means repeats the assignment and update steps, both of which can be performed in the PQ-code domain. Experimental results show that even short (32-bit) PQ codes can produce competitive results compared with k-means. This result is of practical importance for clustering in memory-restricted environments. Using the proposed PQk-means scheme, the clustering of one billion 128D SIFT features with $K = 10^5$ is achieved within 14 hours, using just 32 GB of memory on a single computer.

Submitted 12 September, 2017; originally announced September 2017.

Comments: To appear in ACMMM 2017

arXiv:1704.06556 [pdf, other]

PQTable: Non-exhaustive Fast Search for Product-quantized Codes using Hash Tables

Authors: Yusuke Matsui, Toshihiko Yamasaki, Kiyoharu Aizawa

Abstract: In this paper, we propose a product quantization table (PQTable), a fast search method for product-quantized codes via hash tables. An identifier of each database vector is associated with the slot of a hash table by using its PQ code as a key. For querying, an input vector is PQ-encoded and hashed, and the items associated with that code are then retrieved. The proposed PQTable produces the same results as a linear PQ scan, and is $10^2$ to $10^5$ times faster. Although state-of-the-art performance can be achieved by previous inverted-indexing-based approaches, such methods require manually designed parameter settings and significant training; our PQTable is free of these limitations, and therefore offers a practical and effective solution for real-world problems. Specifically, when the vectors are highly compressed, our PQTable achieves one of the fastest search performances on a single CPU to date with significantly efficient memory usage (0.059 ms per query over $10^9$ data points with just 5.5 GB memory consumption). Finally, we show that our proposed PQTable can naturally handle the codes of an optimized product quantization (OPQTable).

Submitted 21 April, 2017; originally announced April 2017.

arXiv:1607.02108 [pdf, other]

doi 10.1007/s11214-016-0260-5

Solar Coronal Jets: Observations, Theory, and Modeling

Authors: N. E. Raouafi, S. Patsourakos, E. Pariat, P. R. Young, A. C. Sterling, A. Savcheva, M. Shimojo, F. Moreno-Insertis, C. R. DeVore, V. Archontis, T. Török, H. Mason, W. Curdt, K. Meyer, K. Dalmasse, Y. Matsui

Abstract: Coronal jets represent important manifestations of ubiquitous solar transients, which may be the source of significant mass and energy input to the upper solar atmosphere and the solar wind. While the energy involved in a jet-like event is smaller than that of "nominal" solar flares and coronal mass ejections (CMEs), jets share many common properties with these phenomena, in particular, the explosive magnetically driven dynamics. Studies of jets could, therefore, provide critical insight for understanding the larger, more complex drivers of the solar activity. On the other side of the size-spectrum, the study of jets could also supply important clues on the physics of transients close to or at the limit of the current spatial resolution, such as spicules. Furthermore, jet phenomena may hint at basic processes for heating the corona and accelerating the solar wind; consequently their study gives us the opportunity to attack a broad range of solar-heliospheric problems.

Submitted 7 July, 2016; originally announced July 2016.

Comments: 53 pages, 24 figures

Journal ref: Space Science Reviews, 04 July 2016

arXiv:1605.08768 [pdf, ps, other]

doi 10.1088/1475-7516/2016/11/005

Improved calculation of the gravitational wave spectrum from kinks on infinite cosmic strings

Authors: Yuka Matsui, Koichiro Horiguchi, Daisuke Nitta, Sachiko Kuroyanagi

Abstract: Gravitational wave observations provide unique opportunities to search for cosmic strings. One of the strongest sources of gravitational waves is discontinuities of cosmic strings, called kinks, which are generated at points of intersection. Kinks on infinite strings are known to generate a gravitational wave background over a wide range of frequencies. In this paper, we calculate the spectrum of the gravitational wave background by numerically solving the evolution equation for the distribution function of the kink sharpness. We find that the number of kinks for small sharpness is larger than the analytical estimate used in a previous work, which makes a difference in the spectral shape. Our numerical approach also helps to avoid the use of analytic approximations, and enables us to make a more precise prediction on the spectral amplitude for future gravitational wave experiments.

Submitted 8 November, 2016; v1 submitted 27 May, 2016; originally announced May 2016.

Comments: 15 pages, 7 figures, 2 tables, accepted for Journal of Cosmology and Astroparticle Physics

arXiv:1510.04389 [pdf, other]

doi 10.1007/s11042-016-4020-z

Sketch-based Manga Retrieval using Manga109 Dataset

Authors: Yusuke Matsui, Kota Ito, Yuji Aramaki, Toshihiko Yamasaki, Kiyoharu Aizawa

Abstract: Manga (Japanese comics) are popular worldwide. However, current e-manga archives offer very limited search support, including keyword-based search by title or author, or tag-based categorization. To make the manga search experience more intuitive, efficient, and enjoyable, we propose a content-based manga retrieval system. First, we propose a manga-specific image-describing framework. It consists of efficient margin labeling, edge orientation histogram feature description, and approximate nearest-neighbor search using product quantization. Second, we propose a sketch-based interface as a natural way to interact with manga content. The interface provides sketch-based querying, relevance feedback, and query retouch. For evaluation, we built a novel dataset of manga images, Manga109, which consists of 109 comic books of 21,142 pages drawn by professional manga artists. To the best of our knowledge, Manga109 is currently the biggest dataset of manga images available for research. We conducted a comparative study, a localization evaluation, and a large-scale qualitative study. From the experiments, we verified that: (1) the retrieval accuracy of the proposed method is higher than those of previous methods; (2) the proposed method can localize an object instance with reasonable runtime and accuracy; and (3) sketch querying is useful for manga search.

Submitted 14 October, 2015; originally announced October 2015.

Comments: 13 pages

Journal ref: Multimedia Tools and Applications, Volume 76, Issue 20, 2017

arXiv:1504.04185 [pdf, ps, other]

Hyperbolic localization and Lefschetz fixed point formulas for higher-dimensional fixed point sets

Authors: Yuichi Ike, Yutaka Matsui, Kiyoshi Takeuchi

Abstract: We study Lefschetz fixed point formulas for constructible sheaves with higher-dimensional fixed point sets. Under fairly weak assumptions, we prove that the local contributions from them are expressed by some constructible functions associated to hyperbolic localizations. This gives an affirmative answer to a conjecture of Goresky-MacPherson in particular for smooth fixed point components. In the course of the proof, the new Lagrangian cycles introduced in our previous paper will be effectively used. Moreover we show various examples for which local contributions can be explicitly determined by our method.

Submitted 24 May, 2015; v1 submitted 16 April, 2015; originally announced April 2015.

Comments: 38 pages, revised. arXiv admin note: substantial text overlap with arXiv:0812.4480

MSC Class: 14C17; 14C40; 32C38; 35A27; 37C25; 55N33

arXiv:1504.01484 [pdf, ps, other]

Equivariance on Discrete Space and Yang-Mills-Higgs Model

Authors: Hitoshi Ikemori, Shinsaku Kitakado, Yoshimitsu Matsui, Hideharu Otsu, Toshiro Sato

Abstract: We introduce the basic equivariant quantity $Q$ in the gauge theory on the noncommutative discrete $Z_{2}$ space, which plays an important role for the equivariant dimensional reduction. If the gauge configuration of the ground state on the extra dimensional space is described by the equivariant $Q$, then the extra dimensional space is invisible. In particular, using the equivariance principle, we show that the Yang-Mills theory on $R^{2}\times Z_{2}$ space is equivalent to the Yang-Mills-Higgs model on $R^{2}$ space. It can be said that this model is the simplest model of this type.

Submitted 2 February, 2016; v1 submitted 7 April, 2015; originally announced April 2015.

Comments: Some discussion added

arXiv:1501.00958 [pdf]

doi 10.1088/0953-2048/28/1/015004

Investigation of all niobium Nano-SQUIDs based on sub-micrometer cross-type Josephson junctions

Authors: M. Schmelz, Y. Matsui, R. Stolz, V. Zakosarenko, T. Schönau, S. Anders, S. Linzen, H. Itozaki, H. -G. Meyer

Abstract: We report on the development of highly sensitive SQUIDs featuring sub-micrometer loop dimensions. The integration of high quality and low capacitance SIS Nb/AlOx/Nb cross-type Josephson tunnel junctions results in white flux noise levels as low as 66 n$Φ_0$/Hz$^{1/2}$, well below state-of-the-art values of their Nb-based counterparts based on constriction type junctions. Estimation of the spin sensitivity of the best SQUIDs yields $S_μ^{1/2} < 7 μ_B$/Hz$^{1/2}$ in the white noise region, suitable for the investigation of small spin systems. We discuss fabrication challenges, show results on the electrical characterization of devices with various pickup loops, and describe options for further improvement, which may push the sensitivity of such devices even to single spin resolution.

Submitted 5 January, 2015; originally announced January 2015.

Comments: 8 pages, 3 figures

Journal ref: Supercond. Sci. Technol. 28, 015004 (2015)

arXiv:1406.5339 [pdf, ps, other]

doi 10.7566/JPSJ.83.094703

Coexistence of Antiferromagnetism and Superconductivity in Iron-Based Superconductors

Authors: Yasunori Matsui, Takao Morinari, Takami Tohyama

Abstract: We theoretically investigate the coexistence of antiferromagnetism and superconductivity in the iron-based superconductors by using the mean-field theory for two- and three-orbital models. We find that both the s_{+-}-wave and s_{++}-wave superconductivity can coexist with antiferromagnetism in the two models. On Dirac Fermi surfaces emerging in the antiferromagnetic phase, a superconducting-gap function has a node for s_{++} wave but is nodeless for s_{+-} wave. On the other hand, the gap function on non-Dirac Fermi surfaces is either nodeless or accidentally nodal, depending on the parameters of pairing interaction, which is independent of pairing symmetry.

Submitted 24 July, 2014; v1 submitted 20 June, 2014; originally announced June 2014.

Comments: 5 pages, 5 figures, to appear in J. Phys. Soc. Jpn

Journal ref: J. Phys. Soc. Jpn., 83, 094703 (2014)

arXiv:1302.2319 [pdf, ps, other]

doi 10.1038/nnano.2013.174

Skyrmions with varying size and helicity in composition-spread helimagnetic alloys

Authors: K. Shibata, X. Z. Yu, T. Hara, D. Morikawa, N. Kanazawa, K. Kimoto, S. Ishiwata, Y. Matsui, Y. Tokura

Abstract: The chirality, i.e. left or right handedness, is an important notion in a broad range of science. In condensed matter, this occurs not only in molecular or crystal forms but also in magnetic structures. A magnetic skyrmion, a topologically-stable spin vortex structure, as observed in chiral-lattice helimagnets is one such example; the spin swirling direction (skyrmion helicity) should be closely related to the underlying lattice chirality via the relativistic spin-orbit coupling (SOC). Here, we report on the correlation between skyrmion helicity and crystal chirality as observed by Lorentz transmission electron microscopy (TEM) and convergent-beam electron diffraction (CBED) on the composition-spread alloys of helimagnets Mn1-xFexGe over a broad range (x = 0.3 - 1.0) of the composition. The skyrmion lattice constant or the skyrmion size shows non-monotonous variation with the composition x, accompanying a divergent behavior around x = 0.8, where the correlation between magnetic helicity and crystal chirality is reversed. The underlying mechanism is a continuous x-variation of the SOC strength accompanying sign reversal in the metallic alloys. This may offer a promising way to tune the skyrmion size and helicity.

Submitted 10 February, 2013; originally announced February 2013.

Comments: 15 pages, 4 figures, 1 table

Journal ref: Nature Nanotech. 8 (2013) 723-728

arXiv:1209.0867 [pdf, ps, other]

doi 10.1088/0004-637X/759/1/15

Multi-wavelength spectroscopic observation of EUV jet in AR 10960

Authors: Y. Matsui, T. Yokoyama, N. Kitagawa, S. Imada

Abstract: We have studied the relationship between the velocity and temperature of a solar EUV jet. The highly accelerated jet occurred in the active region NOAA 10960 on 2007 June 5. Multi-wavelength spectral observations with EIS/Hinode allow us to investigate Doppler velocities over a wide temperature range. We analyzed the three-dimensional angle of the jet from the stereoscopic analysis with STEREO. Using this angle and Doppler velocity, we derived the true velocity of the jet. As a result, we found that the cool jet observed with He II 256 Å ($\log_{10}T_e[\rm{K}] = 4.9$) is accelerated to around $220 \rm{km/s}$, which is over the upper limit of the chromospheric evaporation. The velocities observed with the other lines are under the upper limit of the chromospheric evaporation, while most of the velocities of hot lines are higher than those of cool lines. We interpret this as the chromospheric evaporation and magnetic acceleration occurring simultaneously. A morphological interpretation of this event based on the reconnection model is given by utilizing the multi-instrumental observations.

Submitted 5 September, 2012; originally announced September 2012.

Comments: Accepted for publication in ApJ

arXiv:1202.5077 [pdf, ps, other]

On the sizes of the Jordan blocks of monodromies at infinity

Authors: Yutaka Matsui, Kiyoshi Takeuchi

Abstract: We obtain general upper bounds of the sizes and the numbers of Jordan blocks for the eigenvalues $λ\not= 1$ in the monodromies at infinity of polynomial maps.

Submitted 22 February, 2012; originally announced February 2012.

Comments: 9 pages. Section 6 of the previous paper arXiv:0912.5144v11 became an independent paper

MSC Class: 14F05; 32C38; 32S35; 32S40

arXiv:1202.5076 [pdf, ps, other]

Motivic Milnor fibers and Jordan normal forms of Milnor monodromies

Authors: Yutaka Matsui, Kiyoshi Takeuchi

Abstract: By calculating the equivariant mixed Hodge numbers of motivic Milnor fibers introduced by Denef-Loeser, we obtain explicit formulas for the Jordan normal forms of Milnor monodromies. The numbers of the Jordan blocks will be described by the Newton polyhedron of the polynomial.

Submitted 22 February, 2012; originally announced February 2012.

Comments: 17 pages. Section 7 of the previous paper arXiv:0912.5144v11 became an independent paper

MSC Class: 14E18; 14M25; 32C38; 32S35; 32S40

arXiv:1009.5496 [pdf, ps, other]

doi 10.1103/PhysRevB.82.184511

Electron correlation in FeSe superconductor studied by bulk-sensitive photoemission spectroscopy

Authors: A. Yamasaki, Y. Matsui, S. Imada, K. Takase, H. Azuma, T. Muro, Y. Kato, A. Higashiya, A. Sekiyama, S. Suga, M. Yabashi, K. Tamasaku, T. Ishikawa, K. Terashima, H. Kobori, A. Sugimura, N. Umeyama, H. Sato, Y. Hara, N. Miyakawa, S. I. Ikeda

Abstract: We have investigated the electronic structures of the recently discovered superconductor FeSe by soft-x-ray and hard-x-ray photoemission spectroscopy with high bulk sensitivity. The large Fe 3d spectral weight is located in the vicinity of the Fermi level (EF), which is demonstrated to be a coherent quasi-particle peak. Compared with the results of the band structure calculation with local-density approximation, Fe 3d band narrowing and the energy shift of the band toward EF are found, suggesting the importance of the electron correlation effect in FeSe. The self energy correction provides a larger mass enhancement value (Z^-1=3.6) than in Fe-As superconductors and enables us to separate an incoherent part from the spectrum. These features are quite consistent with the results of recent dynamical mean-field calculations, in which the incoherent part is attributed to the lower Hubbard band.

Submitted 28 September, 2010; originally announced September 2010.

Comments: 8 pages, 5 figures, 1 table

Journal ref: Phys. Rev. B 82, 184511 (2010)

arXiv:0912.5144 [pdf, ps, other]

Monodromy at infinity of polynomial maps and Newton polyhedra (with Appendix by C. Sabbah)

Authors: Yutaka Matsui, Kiyoshi Takeuchi

Abstract: By introducing motivic Milnor fibers at infinity of polynomial maps, we propose some methods for the study of nilpotent parts of monodromies at infinity. The numbers of Jordan blocks in the monodromy at infinity will be described by the Newton polyhedron at infinity of the polynomial.

Submitted 22 February, 2012; v1 submitted 28 December, 2009; originally announced December 2009.

Comments: 43 pages, to appear in IMRN. The referee suggested submitting Sections 6 and 7 of the previous version arXiv:0912.5144v11 to other journals

MSC Class: 14E18; 14M25; 32C38; 32S35; 32S40

Showing 1–50 of 75 results for author: Matsui, Y