The collection of high-resolution training data is crucial in building robust plant disease diagnosis systems, since such data have a significant impact on diagnostic performance. However, such data are very difficult to obtain and are not always available in practice. Deep learning-based techniques, and particularly generative adversarial networks (GANs), can be applied to generate high-quality super-resolution images, but these methods often produce unexpected artifacts that can lower the diagnostic performance. In this paper, we propose a novel artifact-suppression super-resolution method that is specifically designed for diagnosing leaf disease, called Leaf Artifact-Suppression Super Resolution (LASSR). Thanks to its artifact removal module, which detects and suppresses artifacts to a considerable extent, LASSR can generate much more pleasing, high-quality images than the state-of-the-art ESRGAN model. Experiments based on a five-class cucumber disease (including healthy) discrimination model show that training with data generated by LASSR significantly boosts the performance on an unseen test dataset by nearly 22% compared with the baseline, and that our approach is more than 2% better than a model trained with images generated by ESRGAN.
Automated plant diagnosis using images taken from a distance often suffers from insufficient resolution, which degrades diagnostic accuracy because the important external characteristics of symptoms are lost. In this paper, we first propose an effective preprocessing method for improving the performance of automated plant disease diagnosis systems using super-resolution techniques. We investigate the efficiency of two different super-resolution methods by comparing the disease diagnostic performance on practical original high-resolution, low-resolution, and super-resolved cucumber images. Our method generates super-resolved images that look very close to natural images at a 4× upscaling factor and is capable of recovering the lost detailed symptoms, largely boosting the diagnostic performance. Our model improves disease classification accuracy by 26.9% over the 65.6% of the bicubic interpolation baseline, leaving only a small gap (3% lower) to the 95.5% achieved with the original high-resolution images.
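As a quick sanity check, the accuracy figures reported above are mutually consistent; the following sketch simply reproduces the arithmetic from the abstract's numbers:

```python
# Reported figures from the abstract: bicubic baseline accuracy,
# the gain from super-resolution, and the original-image accuracy.
bicubic_acc = 65.6   # % accuracy with bicubic 4x upscaling
improvement = 26.9   # percentage-point gain from super-resolution
original_acc = 95.5  # % accuracy with original high-resolution images

sr_acc = bicubic_acc + improvement  # accuracy with super-resolved images
gap = original_acc - sr_acc         # remaining gap to the original images

print(round(sr_acc, 1))  # 92.5
print(round(gap, 1))     # 3.0
```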
Practical automated detection and diagnosis of plant disease from wide-angle images (i.e., in-field images containing multiple leaves using a fixed-position camera) is a very important application for large-scale farm management, in view of the need to ensure global food security. However, developing automated systems for disease diagnosis is often difficult, because labeling a reliable wide-angle disease dataset from actual field images is very laborious. In addition, the potential similarities between the training and test data lead to a serious problem of model overfitting. In this paper, we investigate changes in performance when applying disease diagnosis systems to different scenarios involving wide-angle cucumber test data captured on real farms, and propose an effective diagnostic strategy. We show that leading object recognition techniques such as SSD and Faster R-CNN achieve excellent end-to-end disease diagnostic performance only for a test dataset that is collected from the same population as the training dataset (with F1-scores of 81.5-84.1% for diagnosed cases of disease), but their performance markedly deteriorates for a completely different test dataset (with F1-scores of 4.4-6.2%). In contrast, our proposed two-stage systems using independent leaf detection and leaf diagnosis stages attain a promising disease diagnostic performance that is more than six times higher than that of end-to-end systems (with F1-scores of 33.4-38.9%) on an unseen target dataset. We also confirm the efficiency of our proposal based on visual assessment, concluding that a two-stage model is a suitable and reasonable choice for practical applications.
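The two-stage strategy described above can be sketched as a leaf detector proposing regions followed by an independent classifier diagnosing each crop. The function names and stubs below are illustrative, not the authors' API:

```python
def two_stage_diagnose(image, detect_leaves, diagnose_leaf):
    """Two-stage diagnosis: detect leaf regions first, then classify each crop.

    detect_leaves(image) -> list of (bounding_box, crop)
    diagnose_leaf(crop)  -> disease label
    """
    results = []
    for box, crop in detect_leaves(image):
        results.append((box, diagnose_leaf(crop)))
    return results

# Stub detector and classifier, to illustrate only the control flow.
def fake_detector(image):
    return [((0, 0, 10, 10), "crop_a"), ((10, 0, 20, 10), "crop_b")]

def fake_classifier(crop):
    return "healthy" if crop == "crop_a" else "downy_mildew"

print(two_stage_diagnose(None, fake_detector, fake_classifier))
```

Because the detector and diagnoser are trained independently, either stage can be retrained for a new farm without relabeling end-to-end data, which is one reason the two-stage design generalizes better to unseen fields.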
In recent years, malware aimed at the Android OS has been increasing due to the platform's rapid popularization. Several studies have been conducted on automated malware detection with machine learning approaches and have reported promising performance. However, these methods require a large amount of computation when running on the client, typically a mobile phone or similar device, so problems remain in terms of practicality. In this paper, we propose an accurate and lightweight Android malware detection method. Our method treats a very limited part of the raw APK (Android application package) file of the target as a short string and analyzes it with a one-dimensional convolutional neural network (1-D CNN). We used two different datasets, each consisting of 5,000 malware and 2,000 goodware samples. We confirmed that our method, using only the last 512–1K bytes of the APK file, achieved 95.40–97.04% accuracy in discriminating malignancy under a 10-fold cross-validation strategy.
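The preprocessing step can be sketched as follows: take the last N bytes of the APK and present them to the 1-D CNN as a fixed-length integer sequence. The zero-padding convention for short files is an assumption for illustration; the paper only specifies the 512–1K-byte tail:

```python
def apk_tail_sequence(data: bytes, n: int = 512) -> list:
    """Return the last n bytes of an APK as a fixed-length integer sequence.

    Files shorter than n are left-padded with zeros so the 1-D CNN always
    sees an input of the same length. (The padding convention here is an
    assumption; the abstract only specifies using the 512-1K-byte tail.)
    """
    tail = list(data[-n:])
    return [0] * (n - len(tail)) + tail

seq = apk_tail_sequence(b"\x50\x4b" * 1000, n=512)
print(len(seq))  # 512
```

Working on a fixed-size byte tail rather than the whole file is what keeps the client-side computation light.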
Many applications for the automated diagnosis of plant disease have been developed based on the success of deep learning techniques. However, these applications often suffer from overfitting, and the diagnostic performance drops drastically when they are used on test datasets from new environments. In this paper, we propose LeafGAN, a novel image-to-image translation system with its own attention mechanism. LeafGAN generates a wide variety of diseased images via transformation from healthy images, serving as a data augmentation tool for improving the performance of plant disease diagnosis. Thanks to its attention mechanism, our model can transform only the relevant areas of images with a variety of backgrounds, thus enriching the versatility of the training images. Experiments with five-class cucumber disease classification show that data augmentation with vanilla CycleGAN does not help to improve generalization, i.e., disease diagnostic performance increased by only 0.7% over the baseline. In contrast, LeafGAN boosted the diagnostic performance by 7.4%. We also visually confirmed that the images generated by LeafGAN were of much higher quality and more convincing than those generated by vanilla CycleGAN.
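The attention-guided translation can be sketched as a mask-weighted blend: only high-attention (leaf) pixels are replaced by the translated output, while the background passes through unchanged. This is a minimal per-pixel sketch assuming mask values in [0, 1], not the authors' exact formulation:

```python
def masked_translate(original, translated, mask):
    """Blend translated pixels into the original using an attention mask.

    out = mask * translated + (1 - mask) * original, applied per pixel,
    so background (mask ~ 0) stays untouched and leaf regions (mask ~ 1)
    receive the disease transformation.
    """
    return [m * t + (1.0 - m) * o
            for o, t, m in zip(original, translated, mask)]

# Background pixel (mask 0) is preserved; leaf pixel (mask 1) is replaced.
print(masked_translate([10.0, 20.0], [99.0, 77.0], [0.0, 1.0]))  # [10.0, 77.0]
```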
To build a robust and practical content-based image retrieval (CBIR) system that is applicable to a clinical brain MRI database, we propose a new framework, disease-oriented image embedding with pseudo-scanner standardization (DI-PSS), which consists of two core techniques: data harmonization and a dimension reduction algorithm. DI-PSS uses skull stripping and CycleGAN-based image transformations that map to a standard brain, followed by transformation into a brain image taken with a given reference scanner. Then, our 3D convolutional autoencoder (3D-CAE) with deep metric learning acquires a low-dimensional embedding that better reflects the characteristics of the disease. The effectiveness of the proposed framework was tested on T1-weighted MRIs selected from the Alzheimer's Disease Neuroimaging Initiative and the Parkinson's Progression Markers Initiative. We confirmed that our PSS greatly reduced the variability of low-dimensional embeddings caused by different scanners and datasets. Compared with the baseline condition, our PSS reduced the variability in the distance from Alzheimer's disease (AD) to clinically normal (CN) and Parkinson's disease (PD) cases by 15.8-22.6% and 18.0-29.9%, respectively. These properties allow DI-PSS to generate lower-dimensional representations that are more amenable to disease classification. In AD and CN classification experiments based on spectral clustering, PSS improved the average accuracy and macro-F1 by 6.2% and 10.7%, respectively. Given the potential of DI-PSS for harmonizing images scanned by MRI scanners that were not used to scan the training data, we expect that DI-PSS is suitable for application to a large number of legacy MRIs scanned in heterogeneous environments.
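The PSS harmonization chain (skull stripping, mapping to a standard brain, then to a reference scanner) can be sketched as a simple left-to-right function composition. The stage names and string stand-ins below are illustrative only:

```python
def compose(*stages):
    """Chain image-processing stages left to right into one pipeline."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

# Illustrative stand-ins for the three PSS stages (not real image ops).
skull_strip = lambda img: img + ">stripped"
to_standard_brain = lambda img: img + ">standardized"
to_reference_scanner = lambda img: img + ">ref_scanner"

pss = compose(skull_strip, to_standard_brain, to_reference_scanner)
print(pss("mri"))  # mri>stripped>standardized>ref_scanner
```

The point of the fixed ordering is that every image, whatever scanner produced it, reaches the embedding stage looking as if it came from the same reference scanner.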
Although attention mechanisms have become fundamental components of deep learning models, they are vulnerable to perturbations, which may degrade the prediction performance and model interpretability.
X-ray examinations are a common choice in mass screenings for gastric cancer. Compared to endoscopy and other common modalities, X-ray examinations have the significant advantage that they can be performed not only by radiologists but also by radiology technicians. However, the diagnosis of gastric X-ray images is very difficult, and it has been reported that the diagnostic accuracy for these images is only 85.5%. In this study, we propose a practical diagnosis support system for gastric X-ray images. An important component of our system is the proposed online data augmentation strategy named stochastic gastric image augmentation (sGAIA), which stochastically generates various enhanced images of gastric folds in X-ray images. The proposed sGAIA improves the detection performance for malignant regions by 6.9% in F1-score, and our system demonstrates promising screening performance for gastric cancer (recall of 92.3% with a precision of 32.4%) from X-ray images in a clinical setting, based on Faster R-CNN with ResNet101 networks.
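Beyond "stochastically generates enhanced images," sGAIA's internals are not given here. As a generic illustration of the online-augmentation shape it follows, a randomly parameterized enhancement applied with probability p can be sketched as below; the probability, strength range, and function names are all assumptions:

```python
import random

def stochastic_augment(image, enhance, p=0.5, rng=None):
    """Apply a randomly parameterized enhancement with probability p.

    Each time an image is drawn for training, it is either passed through
    unchanged or enhanced with a freshly sampled strength. (The probability
    and strength range are illustrative, not sGAIA's actual parameters.)
    """
    rng = rng or random.Random()
    if rng.random() < p:
        strength = rng.uniform(0.5, 1.5)
        return enhance(image, strength)
    return image

out = stochastic_augment([1.0, 2.0],
                         lambda img, s: [v * s for v in img],
                         p=1.0, rng=random.Random(0))
print(len(out))  # 2
```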
Malignant melanoma is the deadliest form of skin cancer and has, among cancer types, one of the most rapidly increasing incidence rates in the world. Early diagnosis is crucial, since if detected early, its cure is simple. In this paper, we present an effective approach to melanoma identification from dermoscopic images of skin lesions based on ensemble classification. First, we perform automatic border detection to segment the lesion from the background skin. Based on the extracted border, we extract a series of colour, texture, and shape features. The derived features are then employed in a pattern classification stage, for which we employ a novel, dedicated ensemble learning approach to address the class imbalance in the training data and to yield improved classification performance. Our classifier committee trains individual classifiers on balanced subspaces, removes redundant predictors based on a diversity measure, and combines the remaining classifiers using a neural network fuser. Experimental results on a large dataset of dermoscopic skin lesion images show our approach to work well, to provide both high sensitivity and specificity, and our presented classifier ensemble to lead to statistically better classification performance.
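The class-imbalance handling can be sketched as follows: partition the majority (benign) class into minority-sized chunks and pair each chunk with the full minority (melanoma) class, training one committee member per balanced set. This sketches the data split only; the diversity-based pruning and neural network fuser are omitted, and the sequential (non-random) chunking is an assumption:

```python
def balanced_training_sets(majority, minority):
    """Split the majority class into minority-sized chunks and pair each
    chunk with the full minority class, yielding one balanced training
    set per committee member."""
    k = len(minority)
    sets = []
    for i in range(0, len(majority) - k + 1, k):
        sets.append(majority[i:i + k] + minority)
    return sets

benign = list(range(9))        # 9 majority-class samples
melanoma = ["m1", "m2", "m3"]  # 3 minority-class samples
subsets = balanced_training_sets(benign, melanoma)
print(len(subsets))            # 3 balanced sets of 6 samples each
```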
Medical images are extremely valuable for supporting medical diagnoses. However, in practice, low-quality (LQ) medical images, such as images that are hazy/blurry, have uneven illumination, or are out of focus, among others, are often obtained during data acquisition. This leads to difficulties in the screening and diagnosis of medical diseases. Several generative adversarial network (GAN)-based image enhancement methods have been proposed and have shown promising results. However, there is a quality-originality trade-off among these methods, in the sense that they produce visually pleasing results but lose the ability to preserve originality, especially the structure of the inputs. Moreover, to our knowledge, there is no objective metric for evaluating the structure preservation of medical image enhancement methods in unsupervised settings, due to the unavailability of paired ground-truth data. In this study, we propose a framework for practical unsupervised medical image enhancement that includes (1) a non-reference objective evaluation of structure preservation for medical image enhancement tasks called the Laplacian structural similarity index measure (LaSSIM), which is based on SSIM and the Laplacian pyramid, and (2) a novel unsupervised GAN-based method called Laplacian medical image enhancement (LaMEGAN) to support the improvement of both originality and quality from LQ images. The LaSSIM metric does not require clean reference images and has been shown to be superior to SSIM in capturing image structural changes under image degradations, such as strong blurring, on different datasets. The experiments demonstrated that our LaMEGAN achieves a satisfactory balance between quality and originality, with robust structure preservation performance, while generating compelling visual results with very high image quality scores.
The code will be made available at https://github.com/AillisInc/USPMIE.
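LaSSIM combines SSIM with the Laplacian pyramid. The pyramid step (blur, subtract, downsample) can be illustrated with a simplified 1-D sketch; the actual metric applies SSIM per level on 2-D images, and a full implementation upsamples the low-pass band before subtracting, both of which are omitted here:

```python
def blur(signal):
    """Simple 3-tap moving-average blur with edge replication."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(signal))]

def laplacian_pyramid(signal, levels=3):
    """Simplified 1-D Laplacian pyramid: each level stores the detail lost
    by blurring; the final level stores the coarsest low-pass residual.
    (A full 2-D pyramid upsamples the downsampled band before subtracting.)"""
    pyramid = []
    current = list(signal)
    for _ in range(levels - 1):
        low = blur(current)
        pyramid.append([c - l for c, l in zip(current, low)])  # detail band
        current = low[::2]                                     # downsample by 2
    pyramid.append(current)                                    # residual band
    return pyramid

pyr = laplacian_pyramid([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
print(len(pyr))  # 3
```

Because structural changes such as blurring wipe out exactly these detail bands, comparing images band-by-band is more sensitive to structure loss than plain SSIM on the full-resolution image.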
There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements with modality-level granularity. Our concept exhibits performance that is comparable to or better than that of previous set-aware models. Furthermore, we demonstrate that visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results. Index terms: attention mechanism, deep neural networks, multimodal learning.
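The residual-attention idea behind IntraMRA/InterMRA can be sketched in its simplest form: softmax-normalized importance weights rescale per-element features, with a residual connection so that no element is fully suppressed. The (1 + w) residual form and the scalar-score shapes below are assumptions for illustration, not the paper's exact formulation:

```python
import math

def residual_attention(features, scores):
    """Re-weight per-element feature vectors by softmax attention, with a
    residual (1 + w) factor so every element keeps its original signal."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [[x * (1.0 + w) for x in feat]
            for feat, w in zip(features, weights)]

out = residual_attention([[1.0, 1.0], [1.0, 1.0]], [0.0, 0.0])
print(out)  # [[1.5, 1.5], [1.5, 1.5]] -- equal scores give weights of 0.5
```

Because the learned weights are explicit per-element scalars, visualizing them directly shows which elements of which modality drove a prediction.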
We propose a simple yet effective image captioning framework that can determine the quality of an image and notify the user of the reasons for any flaws in the image. Our framework first determines the quality of images and then generates captions using only those images that are determined to be of high quality. If image quality is low, the user is notified of the detected flaws and asked to retake the image, and this cycle is repeated until the input image is deemed to be of high quality. As a component of the framework, we trained and evaluated a low-quality image detection model that simultaneously learns the difficulty of recognizing images and individual flaws, and we demonstrated that our proposal can explain the reasons for flaws with a sufficient score. We also evaluated a dataset with low-quality images removed by our framework and found improved values for all four common metrics (BLEU-4, METEOR, ROUGE-L, and CIDEr), confirming an improvement in general-purpose image captioning capability. Our framework would assist the visually impaired, who have difficulty judging image quality.
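The retake cycle described above can be sketched as a gated loop: assess quality, caption only if the image passes, otherwise feed the detected flaws back to the user and retry. The function names and the attempt cap are illustrative, not the authors' API:

```python
def caption_with_quality_gate(image, assess, caption, retake, max_attempts=3):
    """Generate a caption only for images judged high quality; otherwise
    report the detected flaws and ask for a retake, up to max_attempts."""
    for _ in range(max_attempts):
        is_good, flaws = assess(image)
        if is_good:
            return caption(image)
        image = retake(flaws)  # user retakes, guided by the flaw feedback
    return None                # give up after repeated low-quality shots

# Stubs: the first shot is blurry, the retake is fine.
shots = iter(["blurry_shot", "sharp_shot"])
assess = lambda img: (img == "sharp_shot",
                      [] if img == "sharp_shot" else ["blur"])
caption = lambda img: "a dog on a beach"
retake = lambda flaws: next(shots)

first = next(shots)
print(caption_with_quality_gate(first, assess, caption, retake))  # a dog on a beach
```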
Cardiac events are important causes of perioperative mortality and morbidity in non-cardiac surgery. Thus, cardiac risk should be stratified in individual patients who are to undergo surgery, but it is difficult to assess the likelihood of perioperative cardiac events, mainly because of the complicated interrelationships between clinical risk factors and the type of surgery. To overcome this problem, the American College of Cardiology/American Heart Association (ACC/AHA) Task Force published guidelines for perioperative cardiovascular evaluation for non-cardiac surgery in 1996 and 2002.1,2 These guidelines divide clinical risk factors into major, intermediate, and minor categories, and surgical procedures into high-, intermediate-, and low-risk types. These are then used to determine further preoperative examinations, preoperative therapy, operative performance, and perioperative management. The guidelines recommend that non-invasive cardiac testing, including myocardial perfusion imaging, be used mainly in patients with poor functional capacity (<4 METs).
The collection of high-resolution training data is crucial in building robust plant disease diagn... more The collection of high-resolution training data is crucial in building robust plant disease diagnosis systems, since such data have a significant impact on diagnostic performance. However, they are very difficult to obtain and are not always available in practice. Deep learning-based techniques, and particularly generative adversarial networks (GANs), can be applied to generate high-quality super-resolution images, but these methods often produce unexpected artifacts that can lower the diagnostic performance. In this paper, we propose a novel artifactsuppression super-resolution method that is specifically designed for diagnosing leaf disease, called Leaf Artifact-Suppression Super Resolution (LASSR). Thanks to its own artifact removal module that detects and suppresses artifacts to a considerable extent, LASSR can generate much more pleasing, high-quality images compared to the state-of-the-art ESRGAN model. Experiments based on a five-class cucumber disease (including healthy) discrimination model show that training with data generated by LASSR significantly boosts the performance on an unseen test dataset by nearly 22% compared with the baseline, and that our approach is more than 2% better than a model trained with images generated by ESRGAN.
Automated plant diagnosis using images taken from a distance is often insufficient in resolution ... more Automated plant diagnosis using images taken from a distance is often insufficient in resolution and degrades diagnostic accuracy since the important external characteristics of symptoms are lost. In this paper, we first propose an effective preprocessing method for improving the performance of automated plant disease diagnosis systems using super-resolution techniques. We investigate the efficiency of two different super-resolution methods by comparing the disease diagnostic performance on the practical original high-resolution, low-resolution, and superresolved cucumber images. Our method generates super-resolved images that look very close to natural images with 4× upscaling factors and is capable of recovering the lost detailed symptoms, largely boosting the diagnostic performance. Our model improves the disease classification accuracy by 26.9% over the bicubic interpolation method of 65.6% and shows a small gap (3% lower) between the original result of 95.5%.
Practical automated detection and diagnosis of plant disease from wide-angle images (i.e. in-fiel... more Practical automated detection and diagnosis of plant disease from wide-angle images (i.e. in-field images containing multiple leaves using a fixed-position camera) is a very important application for large-scale farm management, in view of the need to ensure global food security. However, developing automated systems for disease diagnosis is often difficult, because labeling a reliable wide-angle disease dataset from actual field images is very laborious. In addition, the potential similarities between the training and test data lead to a serious problem of model overfitting. In this paper, we investigate changes in performance when applying disease diagnosis systems to different scenarios involving wide-angle cucumber test data captured on real farms, and propose an effective diagnostic strategy. We show that leading object recognition techniques such as SSD and Faster R-CNN achieve excellent end-to-end disease diagnostic performance only for a test dataset that is collected from the same population as the training dataset (with F1-score of 81.5%-84.1% for diagnosed cases of disease), but their performance markedly deteriorates for a completely different test dataset (with F1-score of 4.4-6.2%). In contrast, our proposed two-stage systems using independent leaf detection and leaf diagnosis stages attain a promising disease diagnostic performance that is more than six times higher than end-to-end systems (with F1-score of 33.4-38.9%) on an unseen target dataset. We also confirm the efficiency of our proposal based on visual assessment, concluding that a two-stage model is a suitable and reasonable choice for practical applications.
In recent years, malware aims at Android OS has been increasing due to its rapid popularization. ... more In recent years, malware aims at Android OS has been increasing due to its rapid popularization. Several studies have been conducted for automated malware detection with machine learning approach and reported promising performance. However, they require a large amount of computation when running on the client; typically mobile phone and/or similar devices. Thus, problems remain in terms of practicality. In this paper, we propose an accurate and light-weight Android malware detection method. Our method treats very limited part of raw APK (Android application package) file of the target as a short string and analyzes it with one-dimensional convolutional neural network (1-D CNN). We used two different datasets each consisting of 5,000 malwares and 2,000 goodwares. We confirmed our method using only the last 512–1K bytes of APK file achieved 95.40–97.04% in accuracy discriminating their malignancy under the 10-fold cross-validation strategy.
Many applications for the automated diagnosis of plant disease have been developed based on the s... more Many applications for the automated diagnosis of plant disease have been developed based on the success of deep learning techniques. However, these applications often suffer from overfitting, and the diagnostic performance is drastically decreased when used on test datasets from new environments. In this paper, we propose LeafGAN, a novel image-to-image translation system with own attention mechanism. LeafGAN generates a wide variety of diseased images via transformation from healthy images, as a data augmentation tool for improving the performance of plant disease diagnosis. Thanks to its own attention mechanism, our model can transform only relevant areas from images with a variety of backgrounds, thus enriching the versatility of the training images. Experiments with five-class cucumber disease classification show that data augmentation with vanilla CycleGAN cannot help to improve the generalization, i.e., disease diagnostic performance increased by only 0.7% from the baseline. In contrast, LeafGAN boosted the diagnostic performance by 7.4%. We also visually confirmed the generated images by our LeafGAN were much better quality and more convincing than those generated by vanilla CycleGAN.
To build a robust and practical content-based image retrieval (CBIR) system that is applicable to... more To build a robust and practical content-based image retrieval (CBIR) system that is applicable to a clinical brain MRI database, we propose a new framework-Disease-oriented image embedding with pseudo-scanner standardization (DI-PSS)-that consists of two core techniques, data harmonization and a dimension reduction algorithm. Our DI-PSS uses skull stripping and CycleGAN-based image transformations that map to a standard brain followed by transformation into a brain image taken with a given reference scanner. Then, our 3D convolutioinal autoencoders (3D-CAE) with deep metric learning acquires a low-dimensional embedding that better reflects the characteristics of the disease. The effectiveness of our proposed framework was tested on the T1-weighted MRIs selected from the Alzheimer's Disease Neuroimaging Initiative and the Parkinson's Progression Markers Initiative. We confirmed that our PSS greatly reduced the variability of low-dimensional embeddings caused by different scanner and datasets. Compared with the baseline condition, our PSS reduced the variability in the distance from Alzheimer's disease (AD) to clinically normal (CN) and Parkinson disease (PD) cases by 15.8-22.6% and 18.0-29.9%, respectively. These properties allow DI-PSS to generate lower dimensional representations that are more amenable to disease classification. In AD and CN classification experiments based on spectral clustering, PSS improved the average accuracy and macro-F1 by 6.2% and 10.7%, respectively. Given the potential of the DI-PSS for harmonizing images scanned by MRI scanners that were not used to scan the training data, we expect that the DI-PSS is suitable for application to a large number of legacy MRIs scanned in heterogeneous environments.
Although attention mechanisms have become fundamental components of deep learning models, they ar... more Although attention mechanisms have become fundamental components of deep learning models, they are vulnerable to perturbations, which may degrade the prediction performance and model interpretability. Adversar
X-ray examinations are a common choice in mass screenings for gastric cancer. Compared to endosco... more X-ray examinations are a common choice in mass screenings for gastric cancer. Compared to endoscopy and other common modalities, X-ray examinations have the significant advantage that they can be performed not only by radiologists but also by radiology technicians. However, the diagnosis of gastric X-ray images is very difficult and it has been reported that the diagnostic accuracy of these images is only 85.5%. In this study, we propose a practical diagnosis support system for gastric X-ray images. An important component of our system is the proposed on-line data augmentation strategy named stochastic gastric image augmentation (sGAIA), which stochastically generates various enhanced images of gastric folds in X-ray images. The proposed sGAIA improves the detection performance of the malignant region by 6.9% in F1-score and our system demonstrates promising screening performance for gastric cancer (recall of 92.3% with a precision of 32.4%) from X-ray images in a clinical setting based on Faster R-CNN with ResNetl01 networks.
Malignant melanoma is the deadliest form of skin cancer, and has, among cancer types, one of the ... more Malignant melanoma is the deadliest form of skin cancer, and has, among cancer types, one of the most rapidly increasing incidence rates in the world. Early diagnosis is crucial, since if detected early, its cure is simple. In this paper, we present an effective approach to melanoma identification from dermoscopic images of skin lesions based on ensemble classification. First, we perform automatic border detection to segment the lesion from the background skin. Based on the extracted border, we extract a series of colour, texture and shape features. The derived features are then employed in a pattern classification stage for which we employ a novel, dedicated ensemble learning approach to address the class imbalance in the training data and to yield improved classification performance. Our classifier committee trains individual classifiers on balanced subspaces, removes redundant predictors based on a diversity measure and combines the remaining classifiers using a neural network fuser. Experimental results on a large dataset of dermoscopic skin lesion images show our approach to work well, to provide both high sensitivity and specificity, and our presented classifier ensemble to lead to statistically better
Medical images are extremely valuable for supporting medical diagnoses. However, in practice, low... more Medical images are extremely valuable for supporting medical diagnoses. However, in practice, low-quality (LQ) medical images, such as images that are hazy/blurry, have uneven illumination, or are out of focus, among others, are often obtained during data acquisition. This leads to difficulties in the screening and diagnosis of medical diseases. Several generative adversarial networks (GAN)-based image enhancement methods have been proposed and have shown promising results. However, there is a quality-originality trade-off among these methods in the sense that they produce visually pleasing results but lose the ability to preserve originality, especially the structural inputs. Moreover, to our knowledge, there is no objective metric in evaluating the structure preservation of medical image enhancement methods in unsupervised settings due to the unavailability of paired groundtruth data. In this study, we propose a framework for practical unsupervised medical image enhancement that includes (1) a non-reference objective evaluation of structure preservation for medical image enhancement tasks called Laplacian structural similarity index measure (LaSSIM), which is based on SSIM and the Laplacian pyramid, and (2) a novel unsupervised GANbased method called Laplacian medical image enhancement (LaMEGAN) to support the improvement of both originality and quality from LQ images. The LaSSIM metric does not require clean reference images and has been shown to be superior to SSIM in capturing image structural changes under image degradations, such as strong blurring on different datasets. The experiments demonstrated that our LaMEGAN achieves a satisfactory balance between quality and originality, with robust structure preservation performance while generating compelling visual results with very high image quality scores. 
The code will be made available at https://github.com/AillisInc/USPMIE.
There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements with modality-level granularity. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results. Index terms: attention mechanism, deep neural networks, multimodal learning. A definitive version was published in IEEE Access.
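The two attention stages can be illustrated with a toy fusion function: IntraMRA re-weights elements within each modality, the modalities are pooled, and InterMRA re-weights the pooled modality vectors, each stage applied as a residual re-weighting. Scoring by vector norm stands in for the learned attention; every name and detail below is our illustrative assumption.

```python
# Hedged sketch of two-level residual attention over a set of
# modalities: element-level (IntraMRA-like) then modality-level
# (InterMRA-like) re-weighting, with norm-based stand-in scores.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def residual_attention(vectors, scores):
    # re-weight each vector by its attention weight and add it back
    weights = softmax(scores)
    return [[v + w * v for v in vec] for vec, w in zip(vectors, weights)]

def fuse_modalities(modalities):
    # modalities: {name: list of element embeddings (lists of floats)}
    # stage 1: attend over elements inside each modality
    intra = {name: residual_attention(elems,
                                      [sum(abs(v) for v in e) for e in elems])
             for name, elems in modalities.items()}
    # pool each modality to a single vector (mean over its elements)
    pooled = {name: [sum(col) / len(elems) for col in zip(*elems)]
              for name, elems in intra.items()}
    # stage 2: attend over the pooled modality vectors
    names = sorted(pooled)
    fused = residual_attention([pooled[n] for n in names],
                               [sum(abs(v) for v in pooled[n]) for n in names])
    # concatenate into one representation for a downstream head
    return [v for vec in fused for v in vec]
```

Because each stage only rescales and adds back the input (a residual connection), a modality with near-zero attention weight still passes through unattenuated, which mitigates the missing-modality problem mentioned above.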
We propose a simple yet effective image captioning framework that can determine the quality of an image and notify the user of the reasons for any flaws in the image. Our framework first determines the quality of images and then generates captions using only those images that are determined to be of high quality. If the image quality is low, the user is notified of the detected flaws and asked to retake the image, and this cycle is repeated until the input image is deemed to be of high quality. As a component of the framework, we trained and evaluated a low-quality image detection model that simultaneously learns the difficulty of recognizing images and individual flaws, and we demonstrated that our proposal can explain the reasons for flaws with a sufficient score. We also evaluated a dataset with low-quality images removed by our framework and found improved values for all four common metrics (BLEU-4, METEOR, ROUGE-L, CIDEr), confirming an improvement in general-purpose image captioning capability. Our framework would assist the visually impaired, who have difficulty judging image quality.
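The capture-check-caption cycle can be sketched as a simple gate loop: a flaw detector rejects low-quality images with named reasons so the user can retake, and only accepted images reach the captioner. The heuristic flaw checks and thresholds below are illustrative stand-ins for the trained detection model.

```python
# Hedged sketch of a quality-gated captioning loop. detect_flaws is a
# heuristic stand-in for the paper's trained low-quality image detector.

def detect_flaws(pixels):
    # pixels: flat list of grayscale values in [0, 1]
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    flaws = []
    if mean < 0.2:
        flaws.append("too_dark")      # underexposed capture
    if spread < 0.1:
        flaws.append("low_contrast")  # washed-out / flat capture
    return flaws

def caption_with_quality_gate(capture, captioner, max_retries=5):
    # capture() yields the next image; captioner(img) returns a caption
    for _ in range(max_retries):
        img = capture()
        flaws = detect_flaws(img)
        if not flaws:
            return captioner(img)
        # tell the user why the image was rejected, then loop (retake)
        print("please retake:", ", ".join(flaws))
    return None  # gave up after repeated low-quality captures

# toy run: the first capture is too dark, the second passes the gate
stream = iter([[0.05] * 16, [0.1, 0.9] * 8])
result = caption_with_quality_gate(lambda: next(stream),
                                   lambda img: "a bright test pattern")
```

Returning the flaw names, rather than a bare reject, is what lets the framework tell a visually impaired user *how* to retake the photo.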
Cardiac events are important causes of perioperative mortality and morbidity in non-cardiac surgery. Thus, cardiac risk should be stratified in individual patients who are to undergo surgery, but it is difficult to assess the likelihood of perioperative cardiac events, mainly because of complicated interrelationships between clinical risk factors and the type of surgery. To overcome this problem, the American College of Cardiology/American Heart Association (ACC/AHA) Task Force published guidelines for perioperative cardiovascular evaluation for non-cardiac surgery in 1996 and 2002 [1,2]. These guidelines divide clinical risk factors into major, intermediate and minor categories, and surgical procedures into high, intermediate and low-risk types. These are then used to determine further preoperative examinations, preoperative therapy, operative performance and perioperative management. The guidelines recommend non-invasive cardiac testing, including myocardial perfusion imaging, to be used mainly in patients with poor functional capacity (<4 METs). Although there have been enough data
Papers by Hitoshi Iyatomi