Search Results (5)

Search Parameters:
Keywords = multimodal named entity recognition

18 pages, 7301 KiB  
Article
Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
by Li He, Qingxiang Wang, Jie Liu, Jianyong Duan and Hao Wang
Appl. Sci. 2024, 14(6), 2333; https://doi.org/10.3390/app14062333 - 10 Mar 2024
Viewed by 927
Abstract
The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we observe three key issues. Firstly, models are prone to misguidance when fusing unrelated text and images. Secondly, most existing visual features are not enhanced or filtered. Finally, due to the independent encoding strategies employed for text and images, a noticeable semantic gap exists between them. To address these challenges, we propose a visual clue guidance and consistency matching framework (GMF). To tackle the first issue, we introduce a visual clue guidance (VCG) module designed to hierarchically extract visual information at multiple scales; this information is injected as a visual clue guidance sequence that steers text representations toward error-insensitive prediction decisions. To address the second issue, we incorporate a cross-scale attention (CSA) module that mitigates interference across scales and enhances the visual features' ability to capture detail. To address the third issue, the semantic disparity between text and images, we employ a consistency matching (CM) module based on multimodal contrastive learning, facilitating the collaborative learning of multimodal data. To validate the proposed framework, we conducted comprehensive comparative experiments, ablation studies, and case studies on two widely used benchmark datasets; the results demonstrate its effectiveness.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
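As a rough illustration of the consistency matching idea described in the abstract above, the sketch below shows a symmetric InfoNCE-style contrastive loss between pooled text and image embeddings. It is a minimal stand-in assuming PyTorch, not the authors' GMF implementation; the function name and temperature value are illustrative.

```python
# Minimal sketch of multimodal contrastive consistency matching (PyTorch).
# Illustrative only; not the GMF authors' implementation.
import torch
import torch.nn.functional as F

def consistency_matching_loss(text_emb: torch.Tensor,
                              image_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pulling matched image-text pairs together.

    text_emb, image_emb: (batch, dim) pooled modality representations.
    """
    # L2-normalize so dot products become cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are matched pairs.
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: text-to-image and image-to-text.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```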

19 pages, 14885 KiB  
Article
Extracting Domain-Specific Chinese Named Entities for Aviation Safety Reports: A Case Study
by Xin Wang, Zurui Gan, Yaxi Xu, Bingnan Liu and Tao Zheng
Appl. Sci. 2023, 13(19), 11003; https://doi.org/10.3390/app131911003 - 6 Oct 2023
Viewed by 1118
Abstract
Aviation safety reports provide detailed records of past aviation safety accidents, analyze their problems and hidden dangers, and help airlines and other aviation enterprises prevent similar accidents from happening again. The development of intelligent civil aviation creates demand for the incorporation of big data and artificial intelligence, and we apply named entity recognition technology to quickly mine important information from these reports, helping safety personnel work more efficiently. Because of aviation-specific terms and the difficulty of identifying named entity boundaries, mining aviation safety report texts is a challenging domain. This paper proposes a novel method for aviation safety report entity extraction. First, ten kinds of entities (event, company, city, operation, date, aircraft type, personnel, flight number, aircraft registration and aircraft part) were annotated using the BIO format. Second, we present a semantic representation enhancement approach that fuses enhanced representation through knowledge integration (ERNIE) embedding, pinyin embedding and glyph embedding. Then, to improve the accuracy of specific entity extraction, we constructed and utilized an aviation domain dictionary of high-frequency technical aviation terms. After that, we adopted bilinear attention networks (BANs), a feature fusion approach originally used in multimodal analysis, to incorporate features extracted from both iterated dilated convolutional neural network (IDCNN) and bi-directional long short-term memory (BiLSTM) architectures. A case study of specific entity extraction on an aviation safety events dataset was conducted. The experimental results demonstrate that our proposed algorithm, with an F1 score of 97.93%, is superior to several baseline and advanced algorithms. The proposed approach thus offers a robust methodological foundation for relation extraction and knowledge graph construction from aviation safety reports.
(This article belongs to the Section Computing and Artificial Intelligence)
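To make the annotation scheme concrete, here is a hypothetical BIO-tagged sentence using the ten entity types named in the abstract. The actual corpus is Chinese, so this English stand-in and the tag names are illustrative only.

```python
# Hypothetical BIO annotation for an aviation-safety sentence.
# The real corpus is Chinese; tokens and tag names here are illustrative.
tokens = ["CA1234", "departed", "Beijing", "on", "2021-05-01",
          "with", "a", "Boeing", "737", "engine", "fault"]
tags   = ["B-FLIGHT_NUMBER", "O", "B-CITY", "O", "B-DATE",
          "O", "O", "B-AIRCRAFT_TYPE", "I-AIRCRAFT_TYPE",
          "B-AIRCRAFT_PART", "B-EVENT"]

# B- marks the beginning of an entity, I- its continuation, O a non-entity.
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```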

17 pages, 2456 KiB  
Article
Visual Description Augmented Integration Network for Multimodal Entity and Relation Extraction
by Min Zuo, Yingjun Wang, Wei Dong, Qingchuan Zhang, Yuanyuan Cai and Jianlei Kong
Appl. Sci. 2023, 13(10), 6178; https://doi.org/10.3390/app13106178 - 18 May 2023
Cited by 1 | Viewed by 1460
Abstract
Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) play an important role in processing multimodal data and understanding entity relationships across textual and visual domains. However, irrelevant image information may introduce noise that misleads recognition. Additionally, visual and semantic features originate from different modalities, and this modal disparity hinders semantic alignment. This paper therefore proposes the Visual Description Augmented Integration Network (VDAIN), which introduces an image description generation technique so that semantic features derived from image descriptions are expressed in the same modality as the semantic features of the textual information. This not only reduces the modal gap but also captures more accurately the high-level semantic information and underlying visual structure in the images. To filter out modal noise, VDAIN adaptively fuses the visual features, the semantic features of the image descriptions, and the textual information, eliminating irrelevant modal noise. The proposed model reaches F1 scores of 75.8% and 87.78% on the two MNER datasets and 82.54% on the MRE dataset, significantly outperforming the baseline models on all three public datasets. The experimental results demonstrate the effectiveness of the proposed method in addressing the modal noise and modal gap problems.
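The core VDAIN idea, routing image content through generated text descriptions so that both inputs share the text modality, can be sketched as a gated fusion layer. The module below is a simplified, hypothetical stand-in in PyTorch, not the paper's architecture; the class name and gating scheme are assumptions.

```python
# Simplified sketch of caption-based fusion (PyTorch); hypothetical, not VDAIN itself.
import torch
import torch.nn as nn

class GatedCaptionFusion(nn.Module):
    """Fuse sentence features with features of a generated image description.

    Because the image description is itself text, both inputs can share one
    text encoder, narrowing the modality gap; a learned gate suppresses the
    caption features when the image is irrelevant to the sentence.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text_feat: torch.Tensor,
                caption_feat: torch.Tensor) -> torch.Tensor:
        # text_feat, caption_feat: (batch, dim) pooled text-encoder outputs.
        g = self.gate(torch.cat([text_feat, caption_feat], dim=-1))
        return text_feat + g * caption_feat  # gated residual fusion
```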

27 pages, 2098 KiB  
Article
A Survey on Multimodal Knowledge Graphs: Construction, Completion and Applications
by Yong Chen, Xinkai Ge, Shengli Yang, Linmei Hu, Jie Li and Jinwen Zhang
Mathematics 2023, 11(8), 1815; https://doi.org/10.3390/math11081815 - 11 Apr 2023
Cited by 8 | Viewed by 6550
Abstract
As an essential part of artificial intelligence, a knowledge graph describes real-world entities, concepts and their various semantic relationships in a structured way and has gradually been adopted in a variety of practical scenarios. The majority of existing knowledge graphs concentrate on organizing and managing textual knowledge in a structured representation, while paying little attention to multimodal resources (e.g., pictures and videos), which can serve as the foundation for machine perception of real-world data. To this end, this survey comprehensively reviews recent advances in multimodal knowledge graphs, covering their construction, completion and typical applications. For construction, we outline methods for named entity recognition, relation extraction and event extraction. For completion, we discuss multimodal knowledge graph representation learning and entity linking. Finally, the mainstream applications of multimodal knowledge graphs in miscellaneous domains are summarized.
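As a toy illustration of what "multimodal" adds to a knowledge graph, the snippet below attaches image resources to entity nodes in a plain (head, relation, tail) triple. The class, field names, and sample data are hypothetical, chosen only to make the structure concrete.

```python
# Toy multimodal knowledge graph triple; all names and data are hypothetical.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MultimodalEntity:
    """An entity node carrying textual attributes plus visual resources."""
    name: str
    entity_type: str
    images: List[str] = field(default_factory=list)  # image URLs or paths

beijing = MultimodalEntity("Beijing", "City", images=["beijing_skyline.jpg"])
china = MultimodalEntity("China", "Country", images=["china_map.png"])

# A (head, relation, tail) triple whose entities are grounded in images.
triple: Tuple[MultimodalEntity, str, MultimodalEntity] = (
    beijing, "capital_of", china)
```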

12 pages, 1125 KiB  
Article
Robust Chinese Named Entity Recognition Based on Fusion Graph Embedding
by Xuhui Song, Hongtao Yu, Shaomei Li and Huansha Wang
Electronics 2023, 12(3), 569; https://doi.org/10.3390/electronics12030569 - 22 Jan 2023
Cited by 3 | Viewed by 1499
Abstract
Named entity recognition is an important basic task in the field of natural language processing. Current mainstream named entity recognition methods are mainly based on deep neural network models, and the inherent vulnerability of deep neural networks leads to a significant decline in recognition accuracy when adversarial text is present in the input. To improve the robustness of named entity recognition under adversarial conditions, this paper proposes a Chinese named entity recognition model based on fusion graph embedding. Firstly, the model encodes the phonetic and glyph information of the input text through graph learning and integrates this multimodal knowledge into the model, enhancing its robustness. Secondly, a Bi-LSTM is used to further capture the context of the text. Finally, a conditional random field is used to decode and label entities. Experimental results on the OntoNotes4.0, MSRA, Weibo, and Resume datasets show that, in the presence of adversarial text, the F1 values of this model increased by 3.76%, 3.93%, 4.16%, and 6.49%, respectively, which verifies its effectiveness.
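The BiLSTM-plus-CRF pipeline this abstract describes is a standard tagging stack; the sketch below shows its shape, assuming PyTorch and the third-party pytorch-crf package. The fused phonetic/glyph/text embeddings are taken as a given input, and the class is illustrative rather than the authors' model.

```python
# Standard BiLSTM-CRF tagging sketch (PyTorch + pytorch-crf); illustrative only.
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    """BiLSTM for context encoding, CRF for entity decoding."""
    def __init__(self, input_dim: int, hidden_dim: int, num_tags: int):
        super().__init__()
        # Inputs are assumed to be pre-fused phonetic/glyph/text embeddings.
        self.lstm = nn.LSTM(input_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, feats, tags, mask):
        # feats: (batch, seq, input_dim); tags: (batch, seq) gold labels.
        emissions = self.emit(self.lstm(feats)[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def decode(self, feats, mask):
        emissions = self.emit(self.lstm(feats)[0])
        return self.crf.decode(emissions, mask=mask)  # best tag sequences
```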
