Search | arXiv e-print repository

arXiv:2407.20268 [pdf, other]

Utilizing Generative Adversarial Networks for Image Data Augmentation and Classification of Semiconductor Wafer Dicing Induced Defects

Authors: Zhining Hu, Tobias Schlosser, Michael Friedrich, André Luiz Vieira e Silva, Frederik Beuth, Danny Kowerko

Abstract: In semiconductor manufacturing, the wafer dicing process is central yet vulnerable to defects that significantly impair yield - the proportion of defect-free chips. Deep neural networks are the current state of the art in (semi-)automated visual inspection. However, they are notoriously known to require a particularly large amount of data for model training. To address these challenges, we explore… ▽ More In semiconductor manufacturing, the wafer dicing process is central yet vulnerable to defects that significantly impair yield - the proportion of defect-free chips. Deep neural networks are the current state of the art in (semi-)automated visual inspection. However, they are notoriously known to require a particularly large amount of data for model training. To address these challenges, we explore the application of generative adversarial networks (GAN) for image data augmentation and classification of semiconductor wafer dicing induced defects to enhance the variety and balance of training data for visual inspection systems. With this approach, synthetic yet realistic images are generated that mimic real-world dicing defects. We employ three different GAN variants for high-resolution image synthesis: Deep Convolutional GAN (DCGAN), CycleGAN, and StyleGAN3. Our work-in-progress results demonstrate that improved classification accuracies can be obtained, showing an average improvement of up to 23.1 % from 65.1 % (baseline experiment) to 88.2 % (DCGAN experiment) in balanced accuracy, which may enable yield optimization in production. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: Accepted for: 2024 IEEE 29th International Conference on Emerging Technologies and Factory Automation (ETFA)

arXiv:2407.10539 [pdf, other]

Intelligent Urban Traffic Management via Semantic Interoperability across Multiple Heterogeneous Mobility Data Sources

Authors: Mario Scrocca, Marco Grassi, Marco Comerio, Valentina Anita Carriero, Tiago Delgado Dias, Ana Vieira Da Silva, Irene Celino

Abstract: The integrated exploitation of data sources in the mobility domain is key to providing added-value services to passengers, transport companies and authorities. Indeed, multiple stakeholders operate and maintain different kinds of data but several interoperability issues limit their effective usage. In this paper, we present an architecture enabled by Semantic Web technologies to overcome such issu… ▽ More The integrated exploitation of data sources in the mobility domain is key to providing added-value services to passengers, transport companies and authorities. Indeed, multiple stakeholders operate and maintain different kinds of data but several interoperability issues limit their effective usage. In this paper, we present an architecture enabled by Semantic Web technologies to overcome such issues and facilitate the development of an integrated solution for mobility stakeholders. The proposed solution is composed of different components that address challenges for enabling data interoperability, from the findability of data sources to their integrated consumption adopting standardised data formats. We report on the implementation and validation in four European cities of the TANGENT solution enabling data-driven tools for the dynamic management of multimodal traffic. Finally, we discuss the feedback received by users testing the solution and the lessons learnt during its development. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: In Use paper accepted for publication at the 23rd International Semantic Web Conference (ISWC) 2024. This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution will be published in the conference proceedings

arXiv:2407.02876 [pdf, other]

Prävention und Beseitigung von Fehlerursachen im Kontext von unbemannten Fahrzeugen

Authors: Aron Schnakenbeck, Christoph Sieber, Luis Miguel Vieira da Silva, Felix Gehlhoff, Alexander Fay

Abstract: Mobile robots, becoming increasingly autonomous, are capable of operating in diverse and unknown environments. This flexibility allows them to fulfill goals independently and adapting their actions dynamically without rigidly predefined control codes. However, their autonomous behavior complicates guaranteeing safety and reliability due to the limited influence of a human operator to accurately su… ▽ More Mobile robots, becoming increasingly autonomous, are capable of operating in diverse and unknown environments. This flexibility allows them to fulfill goals independently and adapting their actions dynamically without rigidly predefined control codes. However, their autonomous behavior complicates guaranteeing safety and reliability due to the limited influence of a human operator to accurately supervise and verify each robot's actions. To ensure autonomous mobile robot's safety and reliability, which are aspects of dependability, methods are needed both in the planning and execution of missions for autonomous mobile robots. In this article, a twofold approach is presented that ensures fault removal in the context of mission planning and fault prevention during mission execution for autonomous mobile robots. First, the approach consists of a concept based on formal verification applied during the planning phase of missions. Second, the approach consists of a rule-based concept applied during mission execution. A use case applying the approach is presented, discussing how the two concepts complement each other and what contribution they make to certain aspects of dependability. Unbemannte Fahrzeuge sind durch zunehmende Autonomie in der Lage in unterschiedlichen unbekannten Umgebungen zu operieren. Diese Flexibilität ermöglicht es ihnen Ziele eigenständig zu erfüllen und ihre Handlungen dynamisch anzupassen ohne starr vorgegebenen Steuerungscode. Allerdings erschwert ihr autonomes Verhalten die Gewährleistung von Sicherheit und Zuverlässigkeit, bzw. der Verlässlichkeit, da der Einfluss eines menschlichen Bedieners zur genauen Überwachung und Verifizierung der Aktionen jedes Roboters begrenzt ist. Daher werden Methoden sowohl in der Planung als auch in der Ausführung von Missionen für unbemannte Fahrzeuge benötigt, um die Sicherheit und Zuverlässigkeit dieser Fahrzeuge zu gewährleisten. In diesem Artikel wird ein zweistufiger Ansatz vorgestellt, der eine Fehlerbeseitigung während der Missionsplanung und eine Fehlerprävention während der Missionsausführung für unbemannte Fahrzeuge sicherstellt. Die Fehlerbeseitigung basiert auf formaler Verifikation, die während der Planungsphase der Missionen angewendet wird. Die Fehlerprävention basiert auf einem regelbasierten Konzept, das während der Missionsausführung angewendet wird. Der Ansatz wird an einem Beispiel angewendet und es wird diskutiert, wie die beiden Konzepte sich ergänzen und welchen Beitrag sie zu verschiedenen Aspekten der Verlässlichkeit leisten. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Language: German. Dieser Beitrag wird eingereicht in: "dtec.bw-Beiträge der Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg: Forschungsaktivitäten im Zentrum für Digitalisierungs- und Technologieforschung der Bundeswehr dtec.bw"

arXiv:2406.13128 [pdf, other]

A New Approach for Evaluating and Improving the Performance of Segmentation Algorithms on Hard-to-Detect Blood Vessels

Authors: João Pedro Parella, Matheus Viana da Silva, Cesar Henrique Comin

Abstract: Many studies regarding the vasculature of biological tissues involve the segmentation of the blood vessels in a sample followed by the creation of a graph structure to model the vasculature. The graph is then used to extract relevant vascular properties. Small segmentation errors can lead to largely distinct connectivity patterns and a high degree of variability of the extracted properties. Nevert… ▽ More Many studies regarding the vasculature of biological tissues involve the segmentation of the blood vessels in a sample followed by the creation of a graph structure to model the vasculature. The graph is then used to extract relevant vascular properties. Small segmentation errors can lead to largely distinct connectivity patterns and a high degree of variability of the extracted properties. Nevertheless, global metrics such as Dice, precision, and recall are commonly applied for measuring the performance of blood vessel segmentation algorithms. These metrics might conceal important information about the accuracy at specific regions of a sample. To tackle this issue, we propose a local vessel salience (LVS) index to quantify the expected difficulty in segmenting specific blood vessel segments. The LVS index is calculated for each vessel pixel by comparing the local intensity of the vessel with the image background around the pixel. The index is then used for defining a new accuracy metric called low-salience recall (LSRecall), which quantifies the performance of segmentation algorithms on blood vessel segments having low salience. The perspective provided by the LVS index is used to define a data augmentation procedure that can be used to improve the segmentation performance of convolutional neural networks. We show that segmentation algorithms having high Dice and recall values can display very low LSRecall values, which reveals systematic errors of these algorithms for vessels having low salience. The proposed data augmentation procedure is able to improve the LSRecall of some samples by as much as 25%. The developed methodology opens up new possibilities for comparing the performance of segmentation algorithms regarding hard-to-detect blood vessels as well as their capabilities for vascular topology preservation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.07962 [pdf, other]

Toward a Method to Generate Capability Ontologies from Natural Language Descriptions

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff, Alexander Fay

Abstract: To achieve a flexible and adaptable system, capability ontologies are increasingly leveraged to describe functions in a machine-interpretable way. However, modeling such complex ontological descriptions is still a manual and error-prone task that requires a significant amount of effort and ontology expertise. This contribution presents an innovative method to automate capability ontology modeling… ▽ More To achieve a flexible and adaptable system, capability ontologies are increasingly leveraged to describe functions in a machine-interpretable way. However, modeling such complex ontological descriptions is still a manual and error-prone task that requires a significant amount of effort and ontology expertise. This contribution presents an innovative method to automate capability ontology modeling using Large Language Models (LLMs), which have proven to be well suited for such tasks. Our approach requires only a natural language description of a capability, which is then automatically inserted into a predefined prompt using a few-shot prompting technique. After prompting an LLM, the resulting capability ontology is automatically verified through various steps in a loop with the LLM to check the overall correctness of the capability ontology. First, a syntax check is performed, then a check for contradictions, and finally a check for hallucinations and missing ontology elements. Our method greatly reduces manual effort, as only the initial natural language description and a final human review and possible correction are necessary, thereby streamlining the capability ontology generation process. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.14548 [pdf, other]

Rapid modelling of reactive transport in porous media using machine learning: limitations and solutions

Authors: Vinicius L S Silva, Geraldine Regnier, Pablo Salinas, Claire E Heaney, Matthew D Jackson, Christopher C Pain

Abstract: Reactive transport in porous media plays a pivotal role in subsurface reservoir processes, influencing fluid properties and geochemical characteristics. However, coupling fluid flow and transport with geochemical reactions is computationally intensive, requiring geochemical calculations at each grid cell and each time step within a discretized simulation domain. Although recent advancements have i… ▽ More Reactive transport in porous media plays a pivotal role in subsurface reservoir processes, influencing fluid properties and geochemical characteristics. However, coupling fluid flow and transport with geochemical reactions is computationally intensive, requiring geochemical calculations at each grid cell and each time step within a discretized simulation domain. Although recent advancements have integrated machine learning techniques as surrogates for geochemical simulations, ensuring computational efficiency and accuracy remains a challenge. This chapter investigates machine learning models as replacements for a geochemical module in a reactive transport in porous media simulation. We test this approach on a well-documented cation exchange problem. While the surrogate models excel in isolated predictions, they fall short in rollout predictions over successive time steps. By introducing modifications, including physics-based constraints and tailored dataset generation strategies, we show that machine learning surrogates can achieve accurate rollout predictions. Our findings emphasize that, when judiciously designed, machine learning surrogates can substantially expedite the cation exchange problem without compromising accuracy, offering significant potential for a range of reactive transport applications. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.17524 [pdf, other]

On the Use of Large Language Models to Generate Capability Ontologies

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff, Alexander Fay

Abstract: Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engine… ▽ More Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engineers / ontology experts. Therefore, this paper investigates how LLMs can be used to create capability ontologies. We present a study with a series of experiments in which capabilities with varying complexities are generated using different prompting techniques and with different LLMs. Errors in the generated ontologies are recorded and compared. To analyze the quality of the generated ontologies, a semi-automated approach based on RDF syntax checking, OWL reasoning, and SHACL constraints is used. The results of this study are very promising because even for complex capabilities, the generated ontologies are almost free of errors. △ Less

Submitted 30 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.10170 [pdf, other]

High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers

Authors: Luiz Schirmer, Guilherme Schardong, Vinícius da Silva, Rogério Santos, Hélio Lopes

Abstract: Earth structural heterogeneities have a remarkable role in the petroleum economy for both exploration and production projects. Automatic detection of detailed structural heterogeneities is challenging when considering modern machine learning techniques like deep neural networks. Typically, these techniques can be an excellent tool for assisted interpretation of such heterogeneities, but it heavily… ▽ More Earth structural heterogeneities have a remarkable role in the petroleum economy for both exploration and production projects. Automatic detection of detailed structural heterogeneities is challenging when considering modern machine learning techniques like deep neural networks. Typically, these techniques can be an excellent tool for assisted interpretation of such heterogeneities, but it heavily depends on the amount of data to be trained. We propose an efficient and cost-effective architecture for detecting seismic structural heterogeneities using Convolutional Neural Networks (CNNs) combined with Attention layers. The attention mechanism reduces costs and enhances accuracy, even in cases with relatively noisy data. Our model has half the parameters compared to the state-of-the-art, and it outperforms previous methods in terms of Intersection over Union (IoU) by 0.6% and precision by 0.4%. By leveraging synthetic data, we apply transfer learning to train and fine-tune the model, addressing the challenge of limited annotated data availability. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.10304 [pdf, ps, other]

KIF: A Wikidata-Based Framework for Integrating Heterogeneous Knowledge Sources

Authors: Guilherme Lima, João M. B. Rodrigues, Marcelo Machado, Elton Soares, Sandro R. Fiorini, Raphael Thiago, Leonardo G. Azevedo, Viviane T. da Silva, Renato Cerqueira

Abstract: We present a Wikidata-based framework, called KIF, for virtually integrating heterogeneous knowledge sources. KIF is written in Python and is released as open-source. It leverages Wikidata's data model and vocabulary plus user-defined mappings to construct a unified view of the underlying sources while keeping track of the context and provenance of their statements. The underlying sources can be t… ▽ More We present a Wikidata-based framework, called KIF, for virtually integrating heterogeneous knowledge sources. KIF is written in Python and is released as open-source. It leverages Wikidata's data model and vocabulary plus user-defined mappings to construct a unified view of the underlying sources while keeping track of the context and provenance of their statements. The underlying sources can be triplestores, relational databases, CSV files, etc., which may or may not use the vocabulary and RDF encoding of Wikidata. The end result is a virtual knowledge base which behaves like an "extended Wikidata" and which can be queried using a simple but expressive pattern language, defined in terms of Wikidata's data model. In this paper, we present the design and implementation of KIF, discuss how we have used it to solve a real integration problem in the domain of chemistry (involving Wikidata, PubChem, and IBM CIRCA), and present experimental results on the performance and overhead of KIF △ Less

Submitted 24 July, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.06110 [pdf, other]

AI enhanced data assimilation and uncertainty quantification applied to Geological Carbon Storage

Authors: G. S. Seabra, N. T. Mücke, V. L. S. Silva, D. Voskov, F. Vossepoel

Abstract: This study investigates the integration of machine learning (ML) and data assimilation (DA) techniques, focusing on implementing surrogate models for Geological Carbon Storage (GCS) projects while maintaining high fidelity physical results in posterior states. Initially, we evaluate the surrogate modeling capability of two distinct machine learning models, Fourier Neural Operators (FNOs) and Trans… ▽ More This study investigates the integration of machine learning (ML) and data assimilation (DA) techniques, focusing on implementing surrogate models for Geological Carbon Storage (GCS) projects while maintaining high fidelity physical results in posterior states. Initially, we evaluate the surrogate modeling capability of two distinct machine learning models, Fourier Neural Operators (FNOs) and Transformer UNet (T-UNet), in the context of CO$_2$ injection simulations within channelized reservoirs. We introduce the Surrogate-based hybrid ESMDA (SH-ESMDA), an adaptation of the traditional Ensemble Smoother with Multiple Data Assimilation (ESMDA). This method uses FNOs and T-UNet as surrogate models and has the potential to make the standard ESMDA process at least 50% faster or more, depending on the number of assimilation steps. Additionally, we introduce Surrogate-based Hybrid RML (SH-RML), a variational data assimilation approach that relies on the randomized maximum likelihood (RML) where both the FNO and the T-UNet enable the computation of gradients for the optimization of the objective function, and a high-fidelity model is employed for the computation of the posterior states. Our comparative analyses show that SH-RML offers better uncertainty quantification compared to conventional ESMDA for the case study. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 29 pages, 20 figures, submited to the International Journal of Greenhouse Gas Control

ACM Class: J.2

arXiv:2401.16235 [pdf]

Player Pressure Map -- A Novel Representation of Pressure in Soccer for Evaluating Player Performance in Different Game Contexts

Authors: Chaoyi Gu, Jiaming Na, Yisheng Pei, Varuna De Silva

Abstract: In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. Appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverag… ▽ More In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. Appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverage both tracking and event data and game footage to capture the pressure experienced by the possession team in a soccer game scene. We propose a player pressure map to represent a given game scene, which lowers the dimension of raw data and still contains rich contextual information. Not only does it serve as an effective tool for visualizing and evaluating the pressure on the team and each individual, but it can also be utilized as a backbone for accessing players' performance. Overall, our model provides coaches and analysts with a deeper understanding of players' performance under pressure so that they make data-oriented tactical decisions. △ Less

Submitted 7 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15059 [pdf, other]

Fully Independent Communication in Multi-Agent Reinforcement Learning

Authors: Rafael Pina, Varuna De Silva, Corentin Artaud, Xiaolan Liu

Abstract: Multi-Agent Reinforcement Learning (MARL) comprises a broad area of research within the field of multi-agent systems. Several recent works have focused specifically on the study of communication approaches in MARL. While multiple communication methods have been proposed, these might still be too complex and not easily transferable to more practical contexts. One of the reasons for that is due to t… ▽ More Multi-Agent Reinforcement Learning (MARL) comprises a broad area of research within the field of multi-agent systems. Several recent works have focused specifically on the study of communication approaches in MARL. While multiple communication methods have been proposed, these might still be too complex and not easily transferable to more practical contexts. One of the reasons for that is due to the use of the famous parameter sharing trick. In this paper, we investigate how independent learners in MARL that do not share parameters can communicate. We demonstrate that this setting might incur into some problems, to which we propose a new learning scheme as a solution. Our results show that, despite the challenges, independent agents can still learn communication strategies following our method. Additionally, we use this method to investigate how communication in MARL is affected by different network capacities, both for sharing and not sharing parameters. We observe that communication may not always be needed and that the chosen agent network sizes need to be considered when used together with communication in order to achieve efficient learning. △ Less

Submitted 26 March, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Extended version of the paper appearing on AAMAS 2024 with the same title. 11 pages, 8 figures

arXiv:2401.08686 [pdf, other]

Attention Modules Improve Modern Image-Level Anomaly Detection: A DifferNet Case Study

Authors: André Luiz B. Vieira e Silva, Francisco Simões, Danny Kowerko, Tobias Schlosser, Felipe Battisti, Veronica Teichrieb

Abstract: Within (semi-)automated visual inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To not only alleviate this issue but to furthermore adva… ▽ More Within (semi-)automated visual inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To not only alleviate this issue but to furthermore advance the current state of the art in unsupervised visual inspection, this contribution proposes a DifferNet-based solution enhanced with attention modules utilizing SENet and CBAM as backbone - AttentDifferNet - to improve the detection and classification capabilities on three different visual inspection and anomaly detection datasets: MVTec AD, InsPLAD-fault, and Semiconductor Wafer. In comparison to the current state of the art, it is shown that AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our quantitative as well as qualitative evaluation, indicated by a general improvement in AUC of 94.34 vs. 92.46, 96.67 vs. 94.69, and 90.20 vs. 88.74%. As our variants to AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for anomaly detection. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to CVPRW 2023: VISION'23 - 1st workshop on Vision-based InduStrial InspectiON (Extended Abstract). arXiv admin note: substantial text overlap with arXiv:2311.02747

arXiv:2312.12598 [pdf, other]

A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges

Authors: Roberto Francisco de Lima Junior, Luiz Fernando Paes de Barros Presta, Lucca Santos Borborema, Vanderson Nogueira da Silva, Marcio Leal de Melo Dahia, Anderson Carlos Sousa e Santos

Abstract: This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a cas… ▽ More This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a case study methodology, we systematically explore the integration of LLMs in the test case construction process, aiming to shed light on their practical efficacy, challenges encountered, and implications for software quality assurance. The study encompasses the selection of a representative software application, the formulation of test case construction methodologies employing LLMs, and the subsequent evaluation of outcomes. Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency. Additionally, delves into challenges such as model interpretability and adaptation to diverse software contexts. The findings from this case study contributes with nuanced insights into the practical utility of LLMs in the domain of test case construction, elucidating their potential benefits and limitations. By addressing real-world scenarios and complexities, this research aims to inform software practitioners and researchers alike about the tangible implications of incorporating LLMs into the software testing landscape, fostering a more comprehensive understanding of their role in optimizing the software development process. △ Less

Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.08801 [pdf, other]

Automated Process Planning Based on a Semantic Capability Model and SMT

Authors: Aljosha Köcher, Luis Miguel Vieira da Silva, Alexander Fay

Abstract: In research of manufacturing systems and autonomous robots, the term capability is used for a machine-interpretable specification of a system function. Approaches in this research area develop information models that capture all information relevant to interpret the requirements, effects and behavior of functions. These approaches are intended to overcome the heterogeneity resulting from the vario… ▽ More In research of manufacturing systems and autonomous robots, the term capability is used for a machine-interpretable specification of a system function. Approaches in this research area develop information models that capture all information relevant to interpret the requirements, effects and behavior of functions. These approaches are intended to overcome the heterogeneity resulting from the various types of processes and from the large number of different vendors. However, these models and associated methods do not offer solutions for automated process planning, i.e. finding a sequence of individual capabilities required to manufacture a certain product or to accomplish a mission using autonomous robots. Instead, this is a typical task for AI planning approaches, which unfortunately require a high effort to create the respective planning problem descriptions. In this paper, we present an approach that combines these two topics: Starting from a semantic capability model, an AI planning problem is automatically generated. The planning problem is encoded using Satisfiability Modulo Theories and uses an existing solver to find valid capability sequences including required parameter values. The approach also offers possibilities to integrate existing human expertise and to provide explanations for human operators in order to help understand planning decisions. △ Less

Submitted 14 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Presented at CAIPI Workshop at AAAI 2024

arXiv:2311.11849 [pdf, other]

Multilayer Quantile Graph for Multivariate Time Series Analysis and Dimensionality Reduction

Authors: Vanessa Freitas Silva, Maria Eduarda Silva, Pedro Ribeiro, Fernando Silva

Abstract: In recent years, there has been a surge in the prevalence of high- and multi-dimensional temporal data across various scientific disciplines. These datasets are characterized by their vast size and challenging potential for analysis. Such data typically exhibit serial and cross-dependency and possess high dimensionality, thereby introducing additional complexities to conventional time series analy… ▽ More In recent years, there has been a surge in the prevalence of high- and multi-dimensional temporal data across various scientific disciplines. These datasets are characterized by their vast size and challenging potential for analysis. Such data typically exhibit serial and cross-dependency and possess high dimensionality, thereby introducing additional complexities to conventional time series analysis methods. To address these challenges, a recent and complementary approach has emerged, known as network-based analysis methods for multivariate time series. In univariate settings, Quantile Graphs have been employed to capture temporal transition properties and reduce data dimensionality by mapping observations to a smaller set of sample quantiles. To confront the increasingly prominent issue of high dimensionality, we propose an extension of Quantile Graphs into a multivariate variant, which we term "Multilayer Quantile Graphs". In this innovative mapping, each time series is transformed into a Quantile Graph, and inter-layer connections are established to link contemporaneous quantiles of pairwise series. This enables the analysis of dynamic transitions across multiple dimensions. In this study, we demonstrate the effectiveness of this new mapping using a synthetic multivariate time series dataset. We delve into the resulting network's topological structures, extract network features, and employ these features for original dataset analysis. Furthermore, we compare our results with a recent method from the literature. The resulting multilayer network offers a significant reduction in the dimensionality of the original data while capturing serial and cross-dimensional transitions. This approach facilitates the characterization and analysis of large multivariate time series datasets through network analysis techniques. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.02747 [pdf, other]

Attention Modules Improve Image-Level Anomaly Detection for Industrial Inspection: A DifferNet Case Study

Authors: André Luiz Buarque Vieira e Silva, Francisco Simões, Danny Kowerko, Tobias Schlosser, Felipe Battisti, Veronica Teichrieb

Abstract: Within (semi-)automated visual industrial inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To alleviate this issue and advance the curre… ▽ More Within (semi-)automated visual industrial inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To alleviate this issue and advance the current state of the art in unsupervised visual inspection, this work proposes a DifferNet-based solution enhanced with attention modules: AttentDifferNet. It improves image-level detection and classification capabilities on three visual anomaly detection datasets for industrial inspection: InsPLAD-fault, MVTec AD, and Semiconductor Wafer. In comparison to the state of the art, AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our quali-quantitative study. Our quantitative evaluation shows an average improvement - compared to DifferNet - of 1.77 +/- 0.25 percentage points in overall AUROC considering all three datasets, reaching SOTA results in InsPLAD-fault, an industrial inspection in-the-wild dataset. As our variants to AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for industrial anomaly detection both in the wild and in controlled environments. △ Less

Submitted 7 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: Accepted at WACV 2024

arXiv:2311.02746 [pdf, other]

Staged Reinforcement Learning for Complex Tasks through Decomposed Environments

Authors: Rafael Pina, Corentin Artaud, Xiaolan Liu, Varuna De Silva

Abstract: Reinforcement Learning (RL) is an area of growing interest in the field of artificial intelligence due to its many notable applications in diverse fields. Particularly within the context of intelligent vehicle control, RL has made impressive progress. However, currently it is still in simulated controlled environments where RL can achieve its full super-human potential. Although how to apply simul… ▽ More Reinforcement Learning (RL) is an area of growing interest in the field of artificial intelligence due to its many notable applications in diverse fields. Particularly within the context of intelligent vehicle control, RL has made impressive progress. However, currently it is still in simulated controlled environments where RL can achieve its full super-human potential. Although how to apply simulation experience in real scenarios has been studied, how to approximate simulated problems to the real dynamic problems is still a challenge. In this paper, we discuss two methods that approximate RL problems to real problems. In the context of traffic junction simulations, we demonstrate that, if we can decompose a complex task into multiple sub-tasks, solving these tasks first can be advantageous to help minimising possible occurrences of catastrophic events in the complex task. From a multi-agent perspective, we introduce a training structuring mechanism that exploits the use of experience learned under the popular paradigm called Centralised Training Decentralised Execution (CTDE). This experience can then be leveraged in fully decentralised settings that are conceptually closer to real settings, where agents often do not have access to a central oracle and must be treated as isolated independent units. The results show that the proposed approaches improve agents performance in complex tasks related to traffic junctions, minimising potential safety-critical problems that might happen in these scenarios. Although still in simulation, the investigated situations are conceptually closer to real scenarios and thus, with these results, we intend to motivate further research in the subject. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: Intelligent Systems and Pattern Recognition 2023 (ISPR 2023)

arXiv:2311.02741 [pdf, other]

Learning Independently from Causality in Multi-Agent Environments

Authors: Rafael Pina, Varuna De Silva, Corentin Artaud

Abstract: Multi-Agent Reinforcement Learning (MARL) comprises an area of growing interest in the field of machine learning. Despite notable advances, there are still problems that require investigation. The lazy agent pathology is a famous problem in MARL that denotes the event when some of the agents in a MARL team do not contribute to the common goal, letting the teammates do all the work. In this work, w… ▽ More Multi-Agent Reinforcement Learning (MARL) comprises an area of growing interest in the field of machine learning. Despite notable advances, there are still problems that require investigation. The lazy agent pathology is a famous problem in MARL that denotes the event when some of the agents in a MARL team do not contribute to the common goal, letting the teammates do all the work. In this work, we aim to investigate this problem from a causality-based perspective. We intend to create the bridge between the fields of MARL and causality and argue about the usefulness of this link. We study a fully decentralised MARL setup where agents need to learn cooperation strategies and show that there is a causal relation between individual observations and the team reward. The experiments carried show how this relation can be used to improve independent agents in MARL, resulting not only on better performances as a team but also on the rise of more intelligent behaviours on individual agents. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2023)

arXiv:2311.01619 [pdf, other]

doi 10.1080/01431161.2023.2283900

InsPLAD: A Dataset and Benchmark for Power Line Asset Inspection in UAV Images

Authors: André Luiz Buarque Vieira e Silva, Heitor de Castro Felix, Franscisco Paulo Magalhães Simões, Veronica Teichrieb, Michel Mozinho dos Santos, Hemir Santiago, Virginia Sgotti, Henrique Lott Neto

Abstract: Power line maintenance and inspection are essential to avoid power supply interruptions, reducing its high social and financial impacts yearly. Automating power line visual inspections remains a relevant open problem for the industry due to the lack of public real-world datasets of power line components and their various defects to foster new research. This paper introduces InsPLAD, a Power Line A… ▽ More Power line maintenance and inspection are essential to avoid power supply interruptions, reducing its high social and financial impacts yearly. Automating power line visual inspections remains a relevant open problem for the industry due to the lack of public real-world datasets of power line components and their various defects to foster new research. This paper introduces InsPLAD, a Power Line Asset Inspection Dataset and Benchmark containing 10,607 high-resolution Unmanned Aerial Vehicles colour images. The dataset contains seventeen unique power line assets captured from real-world operating power lines. Additionally, five of those assets present six defects: four of which are corrosion, one is a broken component, and one is a bird's nest presence. All assets were labelled according to their condition, whether normal or the defect name found on an image level. We thoroughly evaluate state-of-the-art and popular methods for three image-level computer vision tasks covered by InsPLAD: object detection, through the AP metric; defect classification, through Balanced Accuracy; and anomaly detection, through the AUROC metric. InsPLAD offers various vision challenges from uncontrolled environments, such as multi-scale objects, multi-size class instances, multiple objects per image, intra-class variation, cluttered background, distinct point-of-views, perspective distortion, occlusion, and varied lighting conditions. To the best of our knowledge, InsPLAD is the first large real-world dataset and benchmark for power line asset inspection with multiple components and defects for various computer vision tasks, with a potential impact to improve state-of-the-art methods in the field. It will be publicly available in its integrity on a repository with a thorough description. It can be found at https://github.com/andreluizbvs/InsPLAD. △ Less

Submitted 3 December, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: This is an original manuscript of an article published by Taylor & Francis in the International Journal of Remote Sensing on 29 Nov 2023, available online: https://doi.org/10.1080/01431161.2023.2283900

arXiv:2310.11344 [pdf, other]

The effect of stemming and lemmatization on Portuguese fake news text classification

Authors: Lucca de Freitas Santos, Murilo Varges da Silva

Abstract: With the popularization of the internet, smartphones and social media, information is being spread quickly and easily way, which implies bigger traffic of information in the world, but there is a problem that is harming society with the dissemination of fake news. With a bigger flow of information, some people are trying to disseminate deceptive information and fake news. The automatic detection o… ▽ More With the popularization of the internet, smartphones and social media, information is being spread quickly and easily way, which implies bigger traffic of information in the world, but there is a problem that is harming society with the dissemination of fake news. With a bigger flow of information, some people are trying to disseminate deceptive information and fake news. The automatic detection of fake news is a challenging task because to obtain a good result is necessary to deal with linguistics problems, especially when we are dealing with languages that not have been comprehensively studied yet, besides that, some techniques can help to reach a good result when we are dealing with text data, although, the motivation of detecting this deceptive information it is in the fact that the people need to know which information is true and trustful and which one is not. In this work, we present the effect the pre-processing methods such as lemmatization and stemming have on fake news classification, for that we designed some classifier models applying different pre-processing techniques. The results show that the pre-processing step is important to obtain betters results, the stemming and lemmatization techniques are interesting methods and need to be more studied to develop techniques focused on the Portuguese language so we can reach better results. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2309.06033 [pdf, other]

doi 10.1109/LCOMM.2023.3312793

Energy-Aware Federated Learning with Distributed User Sampling and Multichannel ALOHA

Authors: Rafael Valente da Silva, Onel L. Alcaraz López, Richard Demo Souza

Abstract: Distributed learning on edge devices has attracted increased attention with the advent of federated learning (FL). Notably, edge devices often have limited battery and heterogeneous energy availability, while multiple rounds are required in FL for convergence, intensifying the need for energy efficiency. Energy depletion may hinder the training process and the efficient utilization of the trained… ▽ More Distributed learning on edge devices has attracted increased attention with the advent of federated learning (FL). Notably, edge devices often have limited battery and heterogeneous energy availability, while multiple rounds are required in FL for convergence, intensifying the need for energy efficiency. Energy depletion may hinder the training process and the efficient utilization of the trained model. To solve these problems, this letter considers the integration of energy harvesting (EH) devices into a FL network with multi-channel ALOHA, while proposing a method to ensure both low energy outage probability and successful execution of future tasks. Numerical results demonstrate the effectiveness of this method, particularly in critical setups where the average energy income fails to cover the iteration cost. The method outperforms a norm based solution in terms of convergence time and battery level. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.01526 [pdf, other]

Passing Heatmap Prediction Based on Transformer Model and Tracking Data

Authors: Yisheng Pei, Varuna De Silva, Mike Caine

Abstract: Although the data-driven analysis of football players' performance has been developed for years, most research only focuses on the on-ball event including shots and passes, while the off-ball movement remains a little-explored area in this domain. Players' contributions to the whole match are evaluated unfairly, those who have more chances to score goals earn more credit than others, while the ind… ▽ More Although the data-driven analysis of football players' performance has been developed for years, most research only focuses on the on-ball event including shots and passes, while the off-ball movement remains a little-explored area in this domain. Players' contributions to the whole match are evaluated unfairly, those who have more chances to score goals earn more credit than others, while the indirect and unnoticeable impact that comes from continuous movement has been ignored. This research presents a novel deep-learning network architecture which is capable to predict the potential end location of passes and how players' movement before the pass affects the final outcome. Once analysed more than 28,000 pass events, a robust prediction can be achieved with more than 0.7 Top-1 accuracy. And based on the prediction, a better understanding of the pitch control and pass option could be reached to measure players' off-ball movement contribution to defensive performance. Moreover, this model could provide football analysts a better tool and metric to understand how players' movement over time contributes to the game strategy and final victory. △ Less

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.13888 [pdf, other]

Neural Implicit Morphing of Face Images

Authors: Guilherme Schardong, Tiago Novello, Hallison Paz, Iurii Medvedev, Vinícius da Silva, Luiz Velho, Nuno Gonçalves

Abstract: Face morphing is a problem in computer graphics with numerous artistic and forensic applications. It is challenging due to variations in pose, lighting, gender, and ethnicity. This task consists of a warping for feature alignment and a blending for a seamless transition between the warped images. We propose to leverage coord-based neural networks to represent such warpings and blendings of face im… ▽ More Face morphing is a problem in computer graphics with numerous artistic and forensic applications. It is challenging due to variations in pose, lighting, gender, and ethnicity. This task consists of a warping for feature alignment and a blending for a seamless transition between the warped images. We propose to leverage coord-based neural networks to represent such warpings and blendings of face images. During training, we exploit the smoothness and flexibility of such networks by combining energy functionals employed in classical approaches without discretizations. Additionally, our method is time-dependent, allowing a continuous warping/blending of the images. During morphing inference, we need both direct and inverse transformations of the time-dependent warping. The first (second) is responsible for warping the target (source) image into the source (target) image. Our neural warping stores those maps in a single network dismissing the need for inverting them. The results of our experiments indicate that our method is competitive with both classical and generative models under the lens of image quality and face-morphing detectors. Aesthetically, the resulting images present a seamless blending of diverse faces not yet usual in the literature. △ Less

Submitted 13 June, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

Comments: 14 pages, 20 figures, accepted for CVPR 2024

ACM Class: I.4.8; I.4.10

arXiv:2307.10296 [pdf, other]

Towards Automated Semantic Segmentation in Mammography Images

Authors: Cesar A. Sierra-Franco, Jan Hurtado, Victor de A. Thomaz, Leonardo C. da Cruz, Santiago V. Silva, Alberto B. Raposo

Abstract: Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting… ▽ More Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting these landmark structures. In this paper, we propose a deep learning-based framework for the segmentation of the nipple, the pectoral muscle, the fibroglandular tissue, and the fatty tissue on standard-view mammography images. We introduce a large private segmentation dataset and extensive experiments considering different deep-learning model architectures. Our experiments demonstrate accurate segmentation performance on variate and challenging cases, showing that this framework can be integrated into clinical practice. △ Less

Submitted 18 July, 2023; originally announced July 2023.

Comments: 6 pages

arXiv:2307.10018 [pdf, other]

RobôCIn Small Size League Extended Team Description Paper for RoboCup 2023

Authors: Aline Lima de Oliveira, Cauê Addae da Silva Gomes, Cecília Virginia Santos da Silva, Charles Matheus de Sousa Alves, Danilo Andrade Martins de Souza, Driele Pires Ferreira Araújo Xavier, Edgleyson Pereira da Silva, Felipe Bezerra Martins, Lucas Henrique Cavalcanti Santos, Lucas Dias Maciel, Matheus Paixão Gumercindo dos Santos, Matheus Lafayette Vasconcelos, Matheus Vinícius Teotonio do Nascimento Andrade, João Guilherme Oliveira Carvalho de Melo, João Pedro Souza Pereira de Moura, José Ronald da Silva, José Victor Silva Cruz, Pedro Henrique Santana de Morais, Pedro Paulo Salman de Oliveira, Riei Joaquim Matos Rodrigues, Roberto Costa Fernandes, Ryan Vinicius Santos Morais, Tamara Mayara Ramos Teobaldo, Washington Igor dos Santos Silva, Edna Natividade Silva Barros

Abstract: RobôCIn has participated in RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-times Latin-American champion. This paper presents our improvements to defend the Small Size League (SSL) division B title in RoboCup 2023 in Bordeaux, France. This paper aims to share some of the academic research that our team developed over the past year. Ou… ▽ More RobôCIn has participated in RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-times Latin-American champion. This paper presents our improvements to defend the Small Size League (SSL) division B title in RoboCup 2023 in Bordeaux, France. This paper aims to share some of the academic research that our team developed over the past year. Our team has successfully published 2 articles related to SSL at two high-impact conferences: the 25th RoboCup International Symposium and the 19th IEEE Latin American Robotics Symposium (LARS 2022). Over the last year, we have been continuously migrating from our past codebase to Unification. We will describe the new architecture implemented and some points of software and AI refactoring. In addition, we discuss the process of integrating machined components into the mechanical system, our development for participating in the vision blackout challenge last year and what we are preparing for this year. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.00827 [pdf, ps, other]

doi 10.1109/ETFA54631.2023.10275459

Toward a Mapping of Capability and Skill Models using Asset Administration Shells and Ontologies

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Milapji Singh Gill, Marco Weiss, Alexander Fay

Abstract: In order to react efficiently to changes in production, resources and their functions must be integrated into plants in accordance with the plug and produce principle. In this context, research on so-called capabilities and skills has shown promise. However, there are currently two incompatible approaches to modeling capabilities and skills. On the one hand, formal descriptions using ontologies ha… ▽ More In order to react efficiently to changes in production, resources and their functions must be integrated into plants in accordance with the plug and produce principle. In this context, research on so-called capabilities and skills has shown promise. However, there are currently two incompatible approaches to modeling capabilities and skills. On the one hand, formal descriptions using ontologies have been developed. On the other hand, there are efforts to standardize submodels of the Asset Administration Shell (AAS) for this purpose. In this paper, we present ongoing research to connect these two incompatible modeling approaches. Both models are analyzed to identify comparable as well as dissimilar model elements. Subsequently, we present a concept for a bidirectional mapping between AAS submodels and a capability and skill ontology. For this purpose, two unidirectional, declarative mappings are applied that implement transformations from one modeling approach to the other - and vice versa. △ Less

Submitted 28 April, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: \c{opyright} 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2306.11846 [pdf, other]

Discovering Causality for Efficient Cooperation in Multi-Agent Environments

Authors: Rafael Pina, Varuna De Silva, Corentin Artaud

Abstract: In cooperative Multi-Agent Reinforcement Learning (MARL) agents are required to learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up learning sub-optimal policies, not contributing to the objective of the team. Such agents are called lazy agents due to their non-cooperative behaviours that may arise from failing to understand whether they caus… ▽ More In cooperative Multi-Agent Reinforcement Learning (MARL) agents are required to learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up learning sub-optimal policies, not contributing to the objective of the team. Such agents are called lazy agents due to their non-cooperative behaviours that may arise from failing to understand whether they caused the rewards. As a consequence, we observe that the emergence of cooperative behaviours is not necessarily a byproduct of being able to solve a task as a team. In this paper, we investigate the applications of causality in MARL and how it can be applied in MARL to penalise these lazy agents. We observe that causality estimations can be used to improve the credit assignment to the agents and show how it can be leveraged to improve independent learning in MARL. Furthermore, we investigate how Amortized Causal Discovery can be used to automate causality detection within MARL environments. The results demonstrate that causality relations between individual observations and the team reward can be used to detect and punish lazy agents, making them develop more intelligent behaviours. This results in improvements not only in the overall performances of the team but also in their individual capabilities. In addition, results show that Amortized Causal Discovery can be used efficiently to find causal relations in MARL. △ Less

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2305.11994 [pdf, other]

ISP meets Deep Learning: A Survey on Deep Learning Methods for Image Signal Processing

Authors: Matheus Henrique Marques da Silva, Jhessica Victoria Santos da Silva, Rodrigo Reis Arrais, Wladimir Barroso Guedes de Araújo Neto, Leonardo Tadeu Lopes, Guilherme Augusto Bileki, Iago Oliveira Lima, Lucas Borges Rondon, Bruno Melo de Souza, Mayara Costa Regazio, Rodolfo Coelho Dalapicola, Claudio Filipi Gonçalves dos Santos

Abstract: The entire Image Signal Processor (ISP) of a camera relies on several processes to transform the data from the Color Filter Array (CFA) sensor, such as demosaicing, denoising, and enhancement. These processes can be executed either by some hardware or via software. In recent years, Deep Learning has emerged as one solution for some of them or even to replace the entire ISP using a single neural ne… ▽ More The entire Image Signal Processor (ISP) of a camera relies on several processes to transform the data from the Color Filter Array (CFA) sensor, such as demosaicing, denoising, and enhancement. These processes can be executed either by some hardware or via software. In recent years, Deep Learning has emerged as one solution for some of them or even to replace the entire ISP using a single neural network for the task. In this work, we investigated several recent pieces of research in this area and provide deeper analysis and comparison among them, including results and possible points of improvement for future researchers. △ Less

Submitted 23 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.11033 [pdf, other]

Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature

Authors: Ana Cláudia Akemi Matsuki de Faria, Felype de Castro Bastos, José Victor Nogueira Alves da Silva, Vitor Lopes Fabris, Valeska de Sousa Uchoa, Décio Gonçalves de Aguiar Neto, Claudio Filipi Goncalves dos Santos

Abstract: Visual Question Answering (VQA) is an emerging area of interest for researches, being a recent problem in natural language processing and image prediction. In this area, an algorithm needs to answer questions about certain images. As of the writing of this survey, 25 recent studies were analyzed. Besides, 6 datasets were analyzed and provided their link to download. In this work, several recent pi… ▽ More Visual Question Answering (VQA) is an emerging area of interest for researches, being a recent problem in natural language processing and image prediction. In this area, an algorithm needs to answer questions about certain images. As of the writing of this survey, 25 recent studies were analyzed. Besides, 6 datasets were analyzed and provided their link to download. In this work, several recent pieces of research in this area were investigated and a deeper analysis and comparison among them were provided, including results, the state-of-the-art, common errors, and possible points of improvement for future researchers. △ Less

Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: 30 pages. arXiv admin note: text overlap with arXiv:2104.00926, arXiv:2110.02526, arXiv:2108.02059, arXiv:1908.01801 by other authors

arXiv:2305.07511 [pdf, ps, other]

eXplainable Artificial Intelligence on Medical Images: A Survey

Authors: Matteus Vargas Simão da Silva, Rodrigo Reis Arrais, Jhessica Victoria Santos da Silva, Felipe Souza Tânios, Mateus Antonio Chinelatto, Natalia Backhaus Pereira, Renata De Paris, Lucas Cesar Ferreira Domingos, Rodrigo Dória Villaça, Vitor Lopes Fabris, Nayara Rossi Brito da Silva, Ana Claudia Akemi Matsuki de Faria, Jose Victor Nogueira Alves da Silva, Fabiana Cristina Queiroz de Oliveira Marucci, Francisco Alves de Souza Neto, Danilo Xavier Silva, Vitor Yukio Kondo, Claudio Filipi Gonçalves dos Santos

Abstract: Over the last few years, the number of works about deep learning applied to the medical field has increased enormously. The necessity of a rigorous assessment of these models is required to explain these results to all people involved in medical exams. A recent field in the machine learning area is explainable artificial intelligence, also known as XAI, which targets to explain the results of such… ▽ More Over the last few years, the number of works about deep learning applied to the medical field has increased enormously. The necessity of a rigorous assessment of these models is required to explain these results to all people involved in medical exams. A recent field in the machine learning area is explainable artificial intelligence, also known as XAI, which targets to explain the results of such black box models to permit the desired assessment. This survey analyses several recent studies in the XAI field applied to medical diagnosis research, allowing some explainability of the machine learning results in several different diseases, such as cancers and COVID-19. △ Less

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2305.06951 [pdf, other]

doi 10.1609/aaai.v37i5.25792

FastDiagP: An Algorithm for Parallelized Direct Diagnosis

Authors: Viet-Man Le, Cristian Vidal Silva, Alexander Felfernig, David Benavides, José Galindo, Thi Ngoc Trang Tran

Abstract: Constraint-based applications attempt to identify a solution that meets all defined user requirements. If the requirements are inconsistent with the underlying constraint set, algorithms that compute diagnoses for inconsistent constraints should be implemented to help users resolve the "no solution could be found" dilemma. FastDiag is a typical direct diagnosis algorithm that supports diagnosis ca… ▽ More Constraint-based applications attempt to identify a solution that meets all defined user requirements. If the requirements are inconsistent with the underlying constraint set, algorithms that compute diagnoses for inconsistent constraints should be implemented to help users resolve the "no solution could be found" dilemma. FastDiag is a typical direct diagnosis algorithm that supports diagnosis calculation without predetermining conflicts. However, this approach faces runtime performance issues, especially when analyzing complex and large-scale knowledge bases. In this paper, we propose a novel algorithm, so-called FastDiagP, which is based on the idea of speculative programming. This algorithm extends FastDiag by integrating a parallelization mechanism that anticipates and pre-calculates consistency checks requested by FastDiag. This mechanism helps to provide consistency checks with fast answers and boosts the algorithm's runtime performance. The performance improvements of our proposed algorithm have been shown through empirical results using the Linux-2.6.3.33 configuration knowledge base. △ Less

Submitted 11 May, 2023; originally announced May 2023.

Comments: presented at The 37th AAAI Conference on Artificial Intelligence, AAAI'23, Washington DC, USA

arXiv:2305.03361 [pdf, other]

CHAMELEON: OutSystems Live Bidirectional Transformations

Authors: Hugo Lourenço, João Costa Seco, Carla Ferreira, Tiago Simões, Vasco Silva, Filipe Assunção, André Menezes

Abstract: In model-driven engineering, the bidirectional transformation of models plays a crucial role in facilitating the use of editors that operate at different levels of abstraction. This is particularly important in the context of industrial-grade low-code platforms like OutSystems, which feature a comprehensive ecosystem of tools that complement the standard integrated development environment with dom… ▽ More In model-driven engineering, the bidirectional transformation of models plays a crucial role in facilitating the use of editors that operate at different levels of abstraction. This is particularly important in the context of industrial-grade low-code platforms like OutSystems, which feature a comprehensive ecosystem of tools that complement the standard integrated development environment with domain-specific builders and abstract model viewers. We introduce CHAMELEON, a tool that enables the dynamic definition of a live bidirectional model transformation in a declarative manner by leveraging simple and intuitive component patterns. Through this approach, we can gradually define the view and synthesis paths to an abstract model built on top of a low-code metamodel. We devise a standard parser-generating technique for tree-like models that builds upon extended grammar definitions with constraints and name binders. We allow for a greater overlap of model patterns that can still be disambiguated for a clear lens-like behaviour of the transformation. CHAMELEON is evaluated in the fragment of the OutSystems language targeting the definition of user interfaces. To assess performance we used a large set of real OutSystems applications, with approximately 200K UI widgets, and a database of curated widget patterns. We found a worst-case processing time of 92ms for complete models in our benchmark, which is still suitable for the operation of an interactive model editor. △ Less

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2303.15471 [pdf, other]

doi 10.1109/ICPRS58416.2023.10179030

Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football

Authors: Chaoyi Gu, Varuna De Silva, Corentin Artaud, Rafael Pina

Abstract: Artificial Intelligence has been used to help human complete difficult tasks in complicated environments by providing optimized strategies for decision-making or replacing the manual labour. In environments including multiple agents, such as football, the most common methods to train agents are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However, the agents trained by Imitati… ▽ More Artificial Intelligence has been used to help human complete difficult tasks in complicated environments by providing optimized strategies for decision-making or replacing the manual labour. In environments including multiple agents, such as football, the most common methods to train agents are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However, the agents trained by Imitation Learning cannot outperform the expert demonstrator, which makes humans hardly get new insights from the learnt policy. Besides, MARL is prone to the credit assignment problem. In environments with sparse reward signal, this method can be inefficient. The objective of our research is to create a novel reward shaping method by embedding contextual information in reward function to solve the aforementioned challenges. We demonstrate this in the Google Research Football (GRF) environment. We quantify the contextual information extracted from game state observation and use this quantification together with original sparse reward to create the shaped reward. The experiment results in the GRF environment prove that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signal. △ Less

Submitted 21 July, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

Journal ref: 2023 IEEE 13th International Conference on Pattern Recognition Systems (ICPRS), Guayaquil, Ecuador, 2023, pp. 1-8

arXiv:2303.14227 [pdf, other]

Causality Detection for Efficient Multi-Agent Reinforcement Learning

Authors: Rafael Pina, Varuna De Silva, Corentin Artaud

Abstract: When learning a task as a team, some agents in Multi-Agent Reinforcement Learning (MARL) may fail to understand their true impact in the performance of the team. Such agents end up learning sub-optimal policies, demonstrating undesired lazy behaviours. To investigate this problem, we start by formalising the use of temporal causality applied to MARL problems. We then show how causality can be used… ▽ More When learning a task as a team, some agents in Multi-Agent Reinforcement Learning (MARL) may fail to understand their true impact in the performance of the team. Such agents end up learning sub-optimal policies, demonstrating undesired lazy behaviours. To investigate this problem, we start by formalising the use of temporal causality applied to MARL problems. We then show how causality can be used to penalise such lazy agents and improve their behaviours. By understanding how their local observations are causally related to the team reward, each agent in the team can adjust their individual credit based on whether they helped to cause the reward or not. We show empirically that using causality estimations in MARL improves not only the holistic performance of the team, but also the individual capabilities of each agent. We observe that the improvements are consistent in a set of different environments. △ Less

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.13323 [pdf, other]

Deep Generative Multi-Agent Imitation Model as a Computational Benchmark for Evaluating Human Performance in Complex Interactive Tasks: A Case Study in Football

Authors: Chaoyi Gu, Varuna De Silva

Abstract: Evaluating the performance of human is a common need across many applications, such as in engineering and sports. When evaluating human performance in completing complex and interactive tasks, the most common way is to use a metric having been proved efficient for that context, or to use subjective measurement techniques. However, this can be an error prone and unreliable process since static metr… ▽ More Evaluating the performance of human is a common need across many applications, such as in engineering and sports. When evaluating human performance in completing complex and interactive tasks, the most common way is to use a metric having been proved efficient for that context, or to use subjective measurement techniques. However, this can be an error prone and unreliable process since static metrics cannot capture all the complex contexts associated with such tasks and biases exist in subjective measurement. The objective of our research is to create data-driven AI agents as computational benchmarks to evaluate human performance in solving difficult tasks involving multiple humans and contextual factors. We demonstrate this within the context of football performance analysis. We train a generative model based on Conditional Variational Recurrent Neural Network (VRNN) Model on a large player and ball tracking dataset. The trained model is used to imitate the interactions between two teams and predict the performance from each team. Then the trained Conditional VRNN Model is used as a benchmark to evaluate team performance. The experimental results on Premier League football dataset demonstrates the usefulness of our method to existing state-of-the-art static metric used in football analytics. △ Less

Submitted 23 March, 2023; originally announced March 2023.

Comments: 8 pages, 10 figures

arXiv:2302.08905 [pdf, other]

GraphLED: A graph-based approach to process and visualise linked engineering documents

Authors: Vanessa Telles da Silva, Lucas de Angelo Martins Ribeiro, Willian Borges de Lemos, Sílvia Silva da Costa Botelho, Nelson Lopes Duarte Filho, Marcelo Rita Pias

Abstract: The architecture, engineering and construction (AEC) sector extensively uses documents supporting product and process development. As part of this, organisations should handle big data of hundreds, or even thousands, of technical documents strongly linked together, including CAD design of industrial plants, equipment purchase orders, quality certificates, and part material analysis. However, analy… ▽ More The architecture, engineering and construction (AEC) sector extensively uses documents supporting product and process development. As part of this, organisations should handle big data of hundreds, or even thousands, of technical documents strongly linked together, including CAD design of industrial plants, equipment purchase orders, quality certificates, and part material analysis. However, analysing such records is daunting for users because it gets complicated to sift through hundreds of documents to establish valuable relationships. This paper addresses how knowledge extracted from linked engineering documents contributes to industrial digitalisation under IT/OT convergence. The proposed GraphLED is a system tasked with data processing, graph-based modelling, and colourful visualisation of related documents. The graph-based approach ensures an improved understanding of linked information because the graph structure offers a promising tool to model the underlying data properties of engineering documents. Preliminary system validation indicates quality improvements are possible in the OCR-based data (85.9% of ambiguous text data removed). This work has the potential to benefit the industry by improving the reliability and resilience of industrial production systems through automated summaries of large quantities of documents and their linkage. △ Less

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2301.04517 [pdf, other]

A new dataset for measuring the performance of blood vessel segmentation methods under distribution shifts

Authors: Matheus Viana da Silva, Natália de Carvalho Santos, Julie Ouellette, Baptiste Lacoste, Cesar Henrique Comin

Abstract: Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for medical image segmentation since one or more specialists are usually required for image annotation, and creating ground truth labels for just a single image can take up to several hours. In addition, it is paramount that the annotated samples represent well the different cond… ▽ More Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for medical image segmentation since one or more specialists are usually required for image annotation, and creating ground truth labels for just a single image can take up to several hours. In addition, it is paramount that the annotated samples represent well the different conditions that might affect the imaged tissues as well as possible changes in the image acquisition process. This can only be achieved by considering samples that are typical in the dataset as well as atypical, or even outlier, samples. We introduce VessMAP, a heterogeneous blood vessel segmentation dataset acquired by carefully sampling relevant images from a larger non-annotated dataset. A methodology was developed to select both prototypical and atypical samples from the base dataset, thus defining an assorted set of images that can be used for measuring the performance of segmentation algorithms on samples that are highly distinct from each other. To demonstrate the potential of the new dataset, we show that the validation performance of a neural network changes significantly depending on the splits used for training the network. △ Less

Submitted 18 April, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2301.02333 [pdf, other]

MHVG2MTS: Multilayer Horizontal Visibility Graphs for Multivariate Time Series Analysis

Authors: Vanessa Freitas Silva, Maria Eduarda Silva, Pedro Ribeiro, Fernando Silva

Abstract: Understanding the properties of time-indexed multivariate data has been a predominant topic mainly to address open issues in multivariate time series analysis. Usually, the methodologies used to analyze multivariate time series are based on adapting approaches for univariate settings or on assumptions and parameters for specific problems. A different strategy uses complex network to obtain an addi… ▽ More Understanding the properties of time-indexed multivariate data has been a predominant topic mainly to address open issues in multivariate time series analysis. Usually, the methodologies used to analyze multivariate time series are based on adapting approaches for univariate settings or on assumptions and parameters for specific problems. A different strategy uses complex network to obtain an additional and reduced representation of temporal and causal properties of the time series data. Recent strategies involve mapping multivariate time series into high-level network structures, specifically into multiplex networks representing interconnections between contemporary timestamps of different time series components. In this work, we propose a new mapping method that takes advantage of the entire structure of multilayer networks. We introduce the multilayer horizontal visibility graph that is based on the new concept of cross-horizontal visibility between lagged timestamps of different components, which allows describing the cross-dimension dependencies via inter-layer edges. We use a set of existing topological measures of multilayer networks as well as a novel measure to evaluate and validate our approach, which is parameter-free, does not require data pre-processing and is applicable to any kind of multivariate time series data. We provide an extensive experimental evaluation, where we explore the proposed topological measures, showing that the inter-layer edges based on cross-horizontal visibility preserve more information about the time series data after the mappings, information that would inevitably be lost using mapping methods that result in single-layer and multiplex structures. We also verify that the information mapped by the inter-layer edges is not enough on its own, but that it complements the data information captured by the commonly used intra-layer edges. △ Less

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.10707 [pdf, other]

doi 10.5220/0011664100003417

Extractive Text Summarization Using Generalized Additive Models with Interactions for Sentence Selection

Authors: Vinícius Camargo da Silva, João Paulo Papa, Kelton Augusto Pontara da Costa

Abstract: Automatic Text Summarization (ATS) is becoming relevant with the growth of textual data; however, with the popularization of public large-scale datasets, some recent machine learning approaches have focused on dense models and architectures that, despite producing notable results, usually turn out in models difficult to interpret. Given the challenge behind interpretable learning-based text summar… ▽ More Automatic Text Summarization (ATS) is becoming relevant with the growth of textual data; however, with the popularization of public large-scale datasets, some recent machine learning approaches have focused on dense models and architectures that, despite producing notable results, usually turn out in models difficult to interpret. Given the challenge behind interpretable learning-based text summarization and the importance it may have for evolving the current state of the ATS field, this work studies the application of two modern Generalized Additive Models with interactions, namely Explainable Boosting Machine and GAMI-Net, to the extractive summarization problem based on linguistic features and binary classification. △ Less

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.01426 [pdf, other]

doi 10.1145/3544548.3580990

Reciprocity, Homophily, and Social Network Effects in Pictorial Communication: A Case Study of Bitmoji Stickers

Authors: Julie Jiang, Ron Dotsch, Mireia Triguero Roura, Yozen Liu, Vítor Silva, Maarten W. Bos, Francesco Barbieri

Abstract: Pictorial emojis and stickers are commonly used in online social networking to facilitate and aid communications. We delve into the use of Bitmoji stickers, a highly expressive form of pictorial communication using avatars resembling actual users. We collect a large-scale dataset of the metadata of 3 billion Bitmoji stickers shared among 300 million Snapchat users. We find that individual Bitmoji… ▽ More Pictorial emojis and stickers are commonly used in online social networking to facilitate and aid communications. We delve into the use of Bitmoji stickers, a highly expressive form of pictorial communication using avatars resembling actual users. We collect a large-scale dataset of the metadata of 3 billion Bitmoji stickers shared among 300 million Snapchat users. We find that individual Bitmoji sticker usage patterns can be characterized jointly on dimensions of reciprocity and selectivity: Users are either both reciprocal and selective about whom they use Bitmoji stickers with or neither reciprocal nor selective. We additionally provide evidence of network homophily in that friends use Bitmoji stickers at similar rates. Finally, using a quasi-experimental approach, we show that receiving Bitmoji stickers from a friend encourages future Bitmoji sticker usage and overall Snapchat engagement. We discuss broader implications of our work towards a better understanding of pictorial communication behaviors in social networks. △ Less

Submitted 13 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: 14 pages, 10 figures, 4 tables

Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

arXiv:2210.03797 [pdf, other]

Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

Authors: Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados

Abstract: Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has been mainly tested in well-formatted documents such as news, Wikipedia, or scientific articles. In social media the landscape is different, in which it adds another layer of complexity due to its noisy and dynamic nature. In this paper, we focus on NER… ▽ More Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has been mainly tested in well-formatted documents such as news, Wikipedia, or scientific articles. In social media the landscape is different, in which it adds another layer of complexity due to its noisy and dynamic nature. In this paper, we focus on NER in Twitter, one of the largest social media platforms, and construct a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. The dataset was constructed by carefully distributing the tweets over time and taking representative trends as a basis. Along with the dataset, we provide a set of language model baselines and perform an analysis on the language model performance on the task, especially analyzing the impact of different time periods. In particular, we focus on three important temporal aspects in our analysis: short-term degradation of NER models over time, strategies to fine-tune a language model over different periods, and self-labeling as an alternative to lack of recently-labeled data. TweetNER7 is released publicly (https://huggingface.co/datasets/tner/tweetner7) along with the models fine-tuned on it. △ Less

Submitted 15 November, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: AACL 2022 main conference

arXiv:2210.03178 [pdf, other]

doi 10.1145/3511808.3557672

Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

Authors: Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva, Lin Lin, Vernon M. Chinchilli

Abstract: Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and… ▽ More Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep Black-Box framework controlling FDR (named as NeurT-FDR) which boosts statistical power and controls FDR for multiple-hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: Short Version of NeurT-FDR, accepted at CIKM 2022. arXiv admin note: substantial text overlap with arXiv:2101.09809

arXiv:2210.01108 [pdf, other]

SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

Authors: Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, Francesco Barbieri

Abstract: We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmarked a list of popular multilingual pre-trained language models. The dataset is released along with the SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/u… ▽ More We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmarked a list of popular multilingual pre-trained language models. The dataset is released along with the SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/umich.edu/semeval-2023-tweet-intimacy). △ Less

Submitted 3 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

arXiv:2209.10900 [pdf, other]

doi 10.1515/auto-2022-0122

A Capability and Skill Model for Heterogeneous Autonomous Robots

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Alexander Fay

Abstract: Teams of heterogeneous autonomous robots become increasingly important due to their facilitation of various complex tasks. For such heterogeneous robots, there is currently no consistent way of describing the functions that each robot provides. In the field of manufacturing, capability modeling is considered a promising approach to semantically model functions provided by different machines. This… ▽ More Teams of heterogeneous autonomous robots become increasingly important due to their facilitation of various complex tasks. For such heterogeneous robots, there is currently no consistent way of describing the functions that each robot provides. In the field of manufacturing, capability modeling is considered a promising approach to semantically model functions provided by different machines. This contribution investigates how to apply and extend capability models from manufacturing to the field of autonomous robots and presents an approach for such a capability model. △ Less

Submitted 9 February, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.09824 [pdf, other]

Twitter Topic Classification

Authors: Dimosthenis Antypas, Asahi Ushio, Jose Camacho-Collados, Leonardo Neves, Vítor Silva, Francesco Barbieri

Abstract: Social media platforms host discussions about a wide variety of topics that arise everyday. Making sense of all the content and organising it into categories is an arduous task. A common way to deal with this issue is relying on topic modeling, but topics discovered using this technique are difficult to interpret and can differ from corpus to corpus. In this paper, we present a new task based on t… ▽ More Social media platforms host discussions about a wide variety of topics that arise everyday. Making sense of all the content and organising it into categories is an arduous task. A common way to deal with this issue is relying on topic modeling, but topics discovered using this technique are difficult to interpret and can differ from corpus to corpus. In this paper, we present a new task based on tweet topic classification and release two associated datasets. Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data from recent time periods that can be used to evaluate tweet classification models. Moreover, we perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task, which provide more insights on the challenges and nature of the task. △ Less

Submitted 20 September, 2022; originally announced September 2022.

Comments: Accepted at COLING 2022

arXiv:2208.11813 [pdf, other]

Multiresolution Neural Networks for Imaging

Authors: Hallison Paz, Tiago Novello, Vinicius Silva, Luiz Schirmer, Guilherme Schardong, Fabio Chagas, Helio Lopes, Luiz Velho

Abstract: We present MR-Net, a general architecture for multiresolution neural networks, and a framework for imaging applications based on this architecture. Our coordinate-based networks are continuous both in space and in scale as they are composed of multiple stages that progressively add finer details. Besides that, they are a compact and efficient representation. We show examples of multiresolution ima… ▽ More We present MR-Net, a general architecture for multiresolution neural networks, and a framework for imaging applications based on this architecture. Our coordinate-based networks are continuous both in space and in scale as they are composed of multiple stages that progressively add finer details. Besides that, they are a compact and efficient representation. We show examples of multiresolution image representation and applications to texturemagnification, minification, and antialiasing. This document is the extended version of the paper [PNS+22]. It includes additional material that would not fit the page limitations of the conference track for publication. △ Less

Submitted 10 September, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

MSC Class: 68T07; 68U10 ACM Class: I.4.10; I.3.3

arXiv:2208.01712 [pdf, other]

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Authors: Marília Costa Rosendo Silva, Felipe Alves Siqueira, João Pedro Mantovani Tarrega, João Vitor Pataca Beinotti, Augusto Sousa Nunes, Miguel de Mattos Gardini, Vinícius Adolfo Pereira da Silva, Nádia Félix Felipe da Silva, André Carlos Ponce de Leon Ferreira de Carvalho

Abstract: Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variabi… ▽ More Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works. △ Less

Submitted 2 August, 2022; originally announced August 2022.

ACM Class: I.2; I.2.7; I.5.3

arXiv:2207.03332 [pdf]

doi 10.1007/978-3-031-10461-9_38

Text to Image Synthesis using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks

Authors: Haileleol Tibebu, Aadil Malik, Varuna De Silva

Abstract: Synthesizing a realistic image from textual description is a major challenge in computer vision. Current text to image synthesis approaches falls short of producing a highresolution image that represent a text descriptor. Most existing studies rely either on Generative Adversarial Networks (GANs) or Variational Auto Encoders (VAEs). GANs has the capability to produce sharper images but lacks the d… ▽ More Synthesizing a realistic image from textual description is a major challenge in computer vision. Current text to image synthesis approaches falls short of producing a highresolution image that represent a text descriptor. Most existing studies rely either on Generative Adversarial Networks (GANs) or Variational Auto Encoders (VAEs). GANs has the capability to produce sharper images but lacks the diversity of outputs, whereas VAEs are good at producing a diverse range of outputs, but the images generated are often blurred. Taking into account the relative advantages of both GANs and VAEs, we proposed a new stacked Conditional VAE (CVAE) and Conditional GAN (CGAN) network architecture for synthesizing images conditioned on a text description. This study uses Conditional VAEs as an initial generator to produce a high-level sketch of the text descriptor. This high-level sketch output from first stage and a text descriptor is used as an input to the conditional GAN network. The second stage GAN produces a 256x256 high resolution image. The proposed architecture benefits from a conditioning augmentation and a residual block on the Conditional GAN network to achieve the results. Multiple experiments were conducted using CUB and Oxford-102 dataset and the result of the proposed approach is compared against state-ofthe-art techniques such as StackGAN. The experiments illustrate that the proposed method generates a high-resolution image conditioned on text descriptions and yield competitive results based on Inception and Frechet Inception Score using both datasets △ Less

Submitted 15 August, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

arXiv:2205.15245 [pdf, other]

Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning

Authors: Rafael Pina, Varuna De Silva, Joosep Hook, Ahmet Kondoz

Abstract: Multi-Agent Reinforcement Learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multi-agent setting can be very difficult as the number of agents increases. Recent solutions such as Value Decomposition Networks (VDN), QMIX, QTRAN and QPLEX adhere to the centralized training and decent… ▽ More Multi-Agent Reinforcement Learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multi-agent setting can be very difficult as the number of agents increases. Recent solutions such as Value Decomposition Networks (VDN), QMIX, QTRAN and QPLEX adhere to the centralized training and decentralized execution scheme and perform factorization of the joint action-value functions. However, these methods still suffer from increased environmental complexity, and at times fail to converge in a stable manner. We propose a novel concept of Residual Q-Networks (RQNs) for MARL, which learns to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max criteria (IGM), but is more robust in factorizing action-value functions. The RQN acts as an auxiliary network that accelerates convergence and will become obsolete as the agents reach the training objectives. The performance of the proposed method is compared against several state-of-the-art techniques such as QPLEX, QMIX, QTRAN and VDN, in a range of multi-agent cooperative tasks. The results illustrate that the proposed method, in general, converges faster, with increased stability and shows robust performance in a wider family of environments. The improvements in results are more prominent in environments with severe punishments for non-cooperative behaviours and especially in the absence of complete state information during training time. △ Less

Submitted 30 May, 2022; originally announced May 2022.

Comments: Accepted for publication on IEEE Transactions on Neural Networks and Learning Systems

Showing 1–50 of 124 results for author: Silva, V