-
A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web
Authors:
Juliana Barbosa,
Sunandan Chakraborty,
Juliana Freire
Abstract:
Wildlife traffickers are increasingly carrying out their activities in cyberspace. As they advertise and sell wildlife products in online marketplaces, they leave digital traces of their activity. This creates a new opportunity: by analyzing these traces, we can obtain insights into how trafficking networks work as well as how they can be disrupted. However, collecting such information is difficult. Online marketplaces sell a very large number of products and identifying ads that actually involve wildlife is a complex task that is hard to automate. Furthermore, given that the volume of data is staggering, we need scalable mechanisms to acquire, filter, and store the ads, as well as to make them available for analysis. In this paper, we present a new approach to collect wildlife trafficking data at scale. We propose a data collection pipeline that combines scoped crawlers for data discovery and acquisition with foundational models and machine learning classifiers to identify relevant ads. We describe a dataset we created using this pipeline which is, to the best of our knowledge, the largest of its kind: it contains almost a million ads obtained from 41 marketplaces, covering 235 species and 20 languages. The source code is publicly available at https://github.com/VIDA-NYU/wildlife_pipeline.
Submitted 26 July, 2024;
originally announced July 2024.
-
Agile Minds, Innovative Solutions, and Industry-Academia Collaboration: Lean R&D Meets Problem-Based Learning in Software Engineering Education
Authors:
Lucas Romao,
Marcos Kalinowski,
Clarissa Barbosa,
Allysson Allex Araújo,
Simone D. J. Barbosa,
Helio Lopes
Abstract:
[Context] Software Engineering (SE) education constantly seeks to bridge the gap between academic knowledge and industry demands, with active learning methods like Problem-Based Learning (PBL) gaining prominence. Despite these efforts, recent graduates struggle to align skills with industry needs. Recognizing the relevance of Industry-Academia Collaboration (IAC), Lean R&D has emerged as a successful agile-based research and development approach, emphasizing business and software development synergy. [Goal] This paper aims to extend Lean R&D with PBL principles, evaluating its application in an educational program designed by ExACTa PUC-Rio for Americanas S.A., a large Brazilian retail company. [Method] The educational program engaged 40 part-time students receiving lectures and mentoring while working on real problems, coordinators and mentors, and company stakeholders in industry projects. Empirical evaluation, through a case study approach, utilized structured questionnaires based on the Technology Acceptance Model (TAM). [Results] Stakeholders were satisfied with Lean R&D PBL for problem-solving. Students reported increased knowledge proficiency and perceived working on real problems as contributing the most to their learning. [Conclusion] This research contributes to academia by sharing Lean R&D PBL as an educational IAC approach. For industry, we discuss the implementation of this proposal in an IAC program that promotes workforce skill development and innovative solutions.
Submitted 22 July, 2024;
originally announced July 2024.
-
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations
Authors:
José Luiz Nunes,
Guilherme F. C. F. Almeida,
Marcelo de Araujo,
Simone D. J. Barbosa
Abstract:
Large language models (LLMs) have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.
Submitted 17 May, 2024;
originally announced May 2024.
-
Regular Typed Unification
Authors:
João Barbosa,
Mário Florido,
Vítor Santos Costa
Abstract:
Here we define a new unification algorithm for terms interpreted in semantic domains denoted by a subclass of regular types here called deterministic regular types. This reflects our intention not to handle the semantic universe as a homogeneous collection of values, but instead, to partition it in a way that is similar to data types in programming languages. We first define the new unification algorithm which is based on constraint generation and constraint solving, and then prove its main properties: termination, soundness, and completeness with respect to the semantics. Finally, we discuss how to apply this algorithm to a dynamically typed version of Prolog.
Submitted 25 April, 2024;
originally announced April 2024.
-
Multi-Objective Optimization of Consumer Group Autoscaling in Message Broker Systems
Authors:
Diogo Landau,
Nishant Saurabh,
Xavier Andrade,
Jorge G Barbosa
Abstract:
Message brokers often mediate communication between data producers and consumers by adding variable-sized messages to ordered distributed queues. Our goal is to determine the number of consumers and consumer-partition assignments needed to ensure that the rate of data consumption keeps up with the rate of data production. We model the problem as a variable item size bin packing problem. As the rate of production varies, new consumer-partition assignments are computed, which may require rebalancing a partition from one consumer to another. While a queue is being rebalanced, the data being produced into it is not read, leading to additional latency costs. As such, we focus on the multi-objective optimization cost of minimizing both the number of consumers and queue migrations. We present a variety of algorithms and compare them to established bin packing heuristics for this application. Comparing our proposed consumer group assignment strategy with Kafka's, a commonly employed strategy, our strategy presents a 90th percentile latency of 4.52s compared to Kafka's 217s, with both using the same number of consumers. Kafka's assignment strategy only improved the consumer group's performance with regard to latency with configurations that used at least 60% more resources than our approach.
Submitted 8 February, 2024;
originally announced February 2024.
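The bin packing view described in the abstract above can be illustrated with a minimal sketch. It uses first-fit decreasing, one of the established bin packing heuristics, and is not the authors' algorithm: it ignores the migration (rebalancing) cost that the paper optimizes alongside the consumer count. The function name, the partition names, the production rates, and the per-consumer capacity are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def first_fit_decreasing(rates: Dict[str, float], capacity: float) -> List[List[str]]:
    """Assign partitions (items) to consumers (bins) with first-fit decreasing.

    rates    -- hypothetical production rate of each partition
    capacity -- hypothetical maximum rate a single consumer can sustain
    """
    consumers: List[List[str]] = []  # partitions assigned to each consumer
    loads: List[float] = []          # summed rate currently assigned to each consumer

    # Place the partitions with the highest production rate first.
    for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        if rate > capacity:
            raise ValueError(f"partition {name} exceeds one consumer's capacity")
        for i, load in enumerate(loads):
            if load + rate <= capacity:
                # First existing consumer with enough spare capacity.
                consumers[i].append(name)
                loads[i] += rate
                break
        else:
            # No existing consumer fits: start a new one.
            consumers.append([name])
            loads.append(rate)
    return consumers


# Illustrative rates only; these are not measurements from the paper.
rates = {"p0": 6.0, "p1": 5.0, "p2": 4.0, "p3": 3.0, "p4": 2.0}
assignment = first_fit_decreasing(rates, capacity=10.0)
print(assignment)
```

Recomputing the assignment from scratch whenever rates change, as this sketch would, can move many partitions between consumers; limiting those migrations is the second objective the abstract describes.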
-
Intelligent methods for business rule processing: State-of-the-art
Authors:
Cristiano André da Costa,
Uélison Jean Lopes dos Santos,
Eduardo Souza dos Reis,
Rodolfo Stoffel Antunes,
Henrique Chaves Pacheco,
Thaynã da Silva França,
Rodrigo da Rosa Righi,
Jorge Luis Victória Barbosa,
Franklin Jebadoss,
Jorge Montalvao,
Rogerio Kunkel
Abstract:
In this article, we provide an overview of the latest intelligent techniques used for processing business rules. We have conducted a comprehensive survey of the relevant literature on robot process automation, with a specific focus on machine learning and other intelligent approaches. Additionally, we have examined the top vendors in the market and their leading solutions to tackle this issue.
Submitted 20 November, 2023;
originally announced November 2023.
-
DER Pricing Power in the Presence of Multi-Location Consumers with Load Migration Capabilities
Authors:
Sara Mollaeivaneghi,
Julia Barbosa,
Florian Steinke
Abstract:
Renewable distributed energy resources (DERs) have the potential to provide multi-location electricity consumers (MLECs) with electricity at prices lower than those offered by the grid using behind-the-meter advantages. This study examines the pricing power of such DER owners in a local environment with few competitors and how it depends on the MLEC's ability to migrate a portion of the load between locations. We simulate a dynamic game between an MLEC and the local DER owners, where the MLEC is modeled as a cost-minimizer and the DER owners as strategic profit maximizers. We show that, when the MLEC is inflexible, the DER owners' optimal behavior is to offer their electricity close to maximal prices, that is, at the grid price level. However, when the MLEC can migrate a fraction of the load to the other locations, the prices offered by the DER owners quickly decrease to the minimum level, that is, the DERs' grid feed-in tariffs quickly decrease to a lower level, depending on the load migration capability.
Submitted 21 October, 2023;
originally announced October 2023.
-
A Systematic Mapping Study and Practitioner Insights on the Use of Software Engineering Practices to Develop MVPs
Authors:
Silvio Alonso,
Marcos Kalinowski,
Bruna Ferreira,
Simone D. J. Barbosa,
Helio Lopes
Abstract:
[Background] The MVP concept has influenced the way in which development teams apply Software Engineering practices. However, the overall understanding of this influence of MVPs on SE practices is still poor. [Objective] Our goal is to characterize the publication landscape on practices that have been used in the context of software MVPs and to gather practitioner insights on the identified practices. [Method] We conducted a systematic mapping study and discussed its results in two focus group sessions involving twelve industry practitioners who extensively use MVPs in their projects, to capture their perceptions of the findings of the mapping study. [Results] We identified 33 papers published between 2013 and 2020 and observed some trends related to MVP ideation and evaluation practices. For instance, regarding ideation, we found six different approaches and mainly informal end-user involvement practices. Regarding evaluation, there is an emphasis on end-user validations based on practices such as usability tests, A/B testing, and usage data analysis. However, there is still limited research related to MVP technical feasibility assessment and effort estimation. Practitioners in the focus group sessions reinforced the confidence in our results regarding ideation and evaluation practices, being aware of most of the identified practices. They also reported how they deal with technical feasibility assessment and effort estimation in practice. [Conclusion] Our analysis suggests that there are opportunities for solution proposals and evaluation studies to address literature gaps concerning technical feasibility assessment and effort estimation. Overall, more effort needs to be invested into empirically evaluating the existing MVP-related practices.
Submitted 14 May, 2023;
originally announced May 2023.
-
Human-AI Co-Creation Approach to Find Forever Chemicals Replacements
Authors:
Juliana Jansen Ferreira,
Vinícius Segura,
Joana G. R. Souza,
Gabriel D. J. Barbosa,
João Gallas,
Renato Cerqueira,
Dmitry Zubarev
Abstract:
Generative models are a powerful tool in AI for material discovery. We are designing a software framework that supports a human-AI co-creation process to accelerate finding replacements for the "forever chemicals": chemicals that enable our modern lives, but are harmful to the environment and human health. Our approach combines AI capabilities with the domain-specific tacit knowledge of subject matter experts to accelerate material discovery. Our co-creation process starts with the interaction between the subject matter experts and a generative model that can generate new molecule designs. In this position paper, we discuss our hypothesis that these subject matter experts can benefit from a more iterative interaction with the generative model, asking for smaller samples and "guiding" the exploration of the discovery space with their knowledge.
Submitted 11 April, 2023;
originally announced April 2023.
-
Lessons Learned to Improve the UX Practices in Agile Projects Involving Data Science and Process Automation
Authors:
Bruna Ferreira,
Silvio Marques,
Marcos Kalinowski,
Helio Lopes,
Simone D. J. Barbosa
Abstract:
Context: User-Centered Design and Agile methodologies focus on human issues. Nevertheless, agile methodologies focus on contact with contracting customers and generating value for them. Usually, the communication between end users and the agile team is mediated by customers. However, they do not know the problems end users face in their routines. Hence, UX issues are typically identified only after the implementation, during user testing and validation. Objective: Aiming to improve the understanding and definition of the problem in agile projects, this research investigates the practices and difficulties experienced by agile teams during the development of data science and process automation projects. Also, we analyze the benefits and the teams' perceptions regarding user participation in these projects. Method: We collected data from four agile teams in an academia-industry collaboration focusing on delivering data science and process automation solutions. Therefore, we applied a carefully designed questionnaire answered by developers, scrum masters, and UX designers. In total, 18 subjects answered the questionnaire. Results: From the results, we identify practices used by the teams to define and understand the problem and to represent the solution. The practices most often used are prototypes and meetings with stakeholders. Another practice that helped the team to understand the problem was using Lean Inceptions. Also, our results present some specific issues regarding data science projects. Conclusion: We observed that end-user participation can be critical to understanding and defining the problem. They help to define elements of the domain and barriers in the implementation. We identified a need for approaches that facilitate user-team communication in data science projects and the need for more detailed requirements representations to support data science solutions.
Submitted 24 November, 2022;
originally announced November 2022.
-
GPU-based Data-parallel Rendering of Large, Unstructured, and Non-convexly Partitioned Data
Authors:
Alper Sahistan,
Serkan Demirci,
Ingo Wald,
Stefan Zellmann,
João Barbosa,
Nathan Morrical,
Uğur Güdükbay
Abstract:
Computational fluid dynamic simulations often produce large clusters of finite elements with non-trivial, non-convex boundaries and uneven distributions among compute nodes, posing challenges to compositing during interactive volume rendering. Correct, in-place visualization of such clusters becomes difficult because viewing rays straddle domain boundaries across multiple compute nodes. We propose a GPU-based, scalable, memory-efficient direct volume visualization framework suitable for in situ and post hoc usage. Our approach reduces memory usage of the unstructured volume elements by leveraging an exclusive-or-based index reduction scheme and provides fast ray-marching-based traversal without requiring large external data structures built over the elements themselves. Moreover, we present a GPU-optimized deep compositing scheme that allows correct order compositing of intermediate color values accumulated across different ranks that works even for non-convex clusters. Our method scales well on large data-parallel systems and achieves interactive frame rates during visualization. We can interactively render both Fun3D Small Mars Lander (14 GB / 798.4 million finite elements) and Huge Mars Lander (111.57 GB / 6.4 billion finite elements) data sets at 14 and 10 frames per second using 72 and 80 GPUs, respectively, on TACC's Frontera supercomputer.
Submitted 28 September, 2022;
originally announced September 2022.
-
Typed SLD-Resolution: Dynamic Typing for Logic Programming
Authors:
João Barbosa,
Mário Florido,
Vítor Santos Costa
Abstract:
The semantic foundations for logic programming are usually separated into two different approaches. The operational semantics, which uses SLD-resolution, the proof method that computes answers in logic programming, and the declarative semantics, which sees logic programs as formulas and its semantics as models. Here, we define a new operational semantics called TSLD-resolution, which stands for Typed SLD-resolution, where we include a value "wrong", that corresponds to the detection of a type error at run-time. For this we define a new typed unification algorithm. Finally we prove the correctness of TSLD-resolution with respect to a typed declarative semantics.
Submitted 30 July, 2022;
originally announced August 2022.
-
Kafka Consumer Group Autoscaler
Authors:
Diogo Landau,
Xavier Andrade,
Jorge G. Barbosa
Abstract:
Message brokers enable asynchronous communication between data producers and consumers in distributed environments by assigning messages to ordered queues. Message broker systems often provide mechanisms to parallelize tasks between consumers to increase the rate at which data is consumed. The consumption rate must exceed the production rate or queues would grow indefinitely. Still, consumers are costly and their number should be minimized. We model the problem of determining the required number of consumers, and the partition-consumer assignments, as a variable item size bin packing variant. Data cannot be read when a queue is being migrated to another consumer. Hence, we propose the R-score metric to account for these rebalancing costs. Then, we introduce an assortment of R-score based algorithms, and compare their performance to established heuristics for the Bin Packing Problem for this application. We instantiate our method within an existing system, demonstrating its effectiveness. Our approach guarantees adequate consumption rates, something the previous system was unable to do, at lower operational costs.
Submitted 22 June, 2022;
originally announced June 2022.
-
A heuristic to determine the initial gravitational constant of the GSA
Authors:
Alfredo J. P. Barbosa,
Edmilson M. Moreira,
Carlos H. V. Moraes,
Otávio A. S. Carpinteiro
Abstract:
The Gravitational Search Algorithm (GSA) is an optimization algorithm based on Newton's laws of gravity and dynamics. Introduced in 2009, the GSA already has several versions and applications. However, its performance depends on the values of its parameters, which are determined empirically. Hence, its generality is compromised, because the parameters that are suitable for a particular application are not necessarily suitable for another. This paper proposes the Gravitational Search Algorithm with Normalized Gravitational Constant (GSA-NGC), which defines a new heuristic to determine the initial gravitational constant of the GSA. The new heuristic is grounded in the Brans-Dicke theory of gravitation and takes into consideration the multiple dimensions of the search space of the application. It aims to improve the final solution and reduce the number of iterations and premature convergences of the GSA. The GSA-NGC is validated experimentally, proving to be suitable for various applications and improving significantly the generality, performance, and efficiency of the GSA.
Submitted 21 April, 2022;
originally announced May 2022.
-
Towards Quantum Ray Tracing
Authors:
Luís Paulo Santos,
Thomas Bashford-Rogers,
João Barbosa,
Paul Navrátil
Abstract:
Rendering on conventional computers is capable of generating realistic imagery, but the computational complexity of these light transport algorithms is a limiting factor of image synthesis. Quantum computers have the potential to significantly improve rendering performance through reducing the underlying complexity of the algorithms behind light transport. This paper investigates hybrid quantum-classical algorithms for ray tracing, a core component of most rendering techniques. Through a practical implementation of quantum ray tracing in a 3D environment, we show quantum approaches provide a quadratic improvement in query complexity compared to the equivalent classical approach. Based on domain-specific knowledge, we then propose algorithms to significantly reduce the computation required for quantum ray tracing through exploiting image space coherence and a principled termination criterion for quantum searching. We show results for both Whitted-style ray tracing, and for accelerating ray tracing operations when performing classical Monte Carlo integration for area lights and indirect illumination.
Submitted 27 April, 2022;
originally announced April 2022.
-
Perspectives on risk prioritization of data center vulnerabilities using rank aggregation and multi-objective optimization
Authors:
Bruno Grisci,
Gabriela Kuhn,
Felipe Colombelli,
Vítor Matter,
Leomar Lima,
Karine Heinen,
Mauricio Pegoraro,
Marcio Borges,
Sandro Rigo,
Jorge Barbosa,
Rodrigo da Rosa Righi,
Cristiano André da Costa,
Gabriel de Oliveira Ramos
Abstract:
Nowadays, data has become an invaluable asset to entities and companies, and keeping it secure represents a major challenge. Data centers are responsible for storing data provided by software applications. Nevertheless, the number of vulnerabilities has been increasing every day. Managing such vulnerabilities is essential for building a reliable and secure network environment. Releasing patches to fix security flaws in software is a common practice to handle these vulnerabilities. However, prioritization becomes crucial for organizations with an increasing number of vulnerabilities since time and resources to fix them are usually limited. This review intends to present a survey of vulnerability ranking techniques and promote a discussion on how multi-objective optimization could benefit the management of vulnerabilities risk prioritization. The state-of-the-art approaches for risk prioritization were reviewed, intending to develop an effective model for ranking vulnerabilities in data centers. The main contribution of this work is to point out multi-objective optimization as a not commonly explored but promising strategy to prioritize vulnerabilities, enabling better time management and increasing security.
Submitted 12 February, 2022;
originally announced February 2022.
-
Fifty Years of Prolog and Beyond
Authors:
Philipp Körner,
Michael Leuschel,
João Barbosa,
Vítor Santos Costa,
Verónica Dahl,
Manuel V. Hermenegildo,
Jose F. Morales,
Jan Wielemaker,
Daniel Diaz,
Salvador Abreu,
Giovanni Ciatto
Abstract:
Both logic programming in general, and Prolog in particular, have a long and fascinating history, intermingled with that of many disciplines they inherited from or catalyzed. A large body of research has been gathered over the last 50 years, supported by many Prolog implementations. Many implementations are still actively developed, while new ones keep appearing. Often, the features added by different systems were motivated by the interdisciplinary needs of programmers and implementors, yielding systems that, while sharing the "classic" core language, and, in particular, the main aspects of the ISO-Prolog standard, also depart from each other in other aspects. This obviously poses challenges for code portability. The field has also inspired many related, but quite different languages that have created their own communities.
This article aims at integrating and applying the main lessons learned in the process of evolution of Prolog. It is structured into three major parts. Firstly, we overview the evolution of Prolog systems and the community approximately up to the ISO standard, considering both the main historic developments and the motivations behind several Prolog implementations, as well as other logic programming languages influenced by Prolog. Then, we discuss the Prolog implementations that are most active after the appearance of the standard: their visions, goals, commonalities, and incompatibilities. Finally, we perform a SWOT analysis in order to better identify the potential of Prolog, and propose future directions along which Prolog might continue to add useful features, interfaces, libraries, and tools, while at the same time improving compatibility between implementations.
Submitted 14 March, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Data Type Inference for Logic Programming
Authors:
João Barbosa,
Mário Florido,
Vítor Santos Costa
Abstract:
In this paper we present a new static data type inference algorithm for logic programming. Without the need to declare types for predicates, our algorithm is able to automatically assign types to predicates which, in most cases, correspond to the data types processed by their intended meaning. The algorithm is also able to infer types given data type definitions similar to data definitions in Haskell and, in this case, the inferred types are more informative in general. We present the type inference algorithm, prove some properties, and finally evaluate our approach on example programs that deal with different data structures.
Submitted 14 August, 2021;
originally announced August 2021.
-
Machine Learning Automatically Detects COVID-19 using Chest CTs in a Large Multicenter Cohort
Authors:
Eduardo Jose Mortani Barbosa Jr.,
Bogdan Georgescu,
Shikha Chaganti,
Gorka Bastarrika Aleman,
Jordi Broncano Cabrero,
Guillaume Chabin,
Thomas Flohr,
Philippe Grenier,
Sasa Grbic,
Nakul Gupta,
François Mellot,
Savvas Nicolaou,
Thomas Re,
Pina Sanelli,
Alexander W. Sauter,
Youngjin Yoo,
Valentin Ziebandt,
Dorin Comaniciu
Abstract:
Objectives: To investigate machine-learning classifiers and interpretable models using chest CT for detection of COVID-19 and differentiation from other pneumonias, ILD and normal CTs.
Methods: Our retrospective multi-institutional study obtained 2096 chest CTs from 16 institutions (including 1077 COVID-19 patients). Training/testing cohorts included 927/100 COVID-19, 388/33 ILD, 189/33 other pneumonias, and 559/34 normal (no pathologies) CTs. A metric-based approach for classification of COVID-19 used interpretable features, relying on logistic regression and random forests. A deep learning-based classifier differentiated COVID-19 via 3D features extracted directly from CT attenuation and probability distribution of airspace opacities.
Results: The most discriminative features of COVID-19 are the percentage of airspace opacity and peripheral and basal predominant opacities, concordant with the typical characterization of COVID-19 in the literature. Unsupervised hierarchical clustering compares feature distribution across COVID-19 and control cohorts. The metrics-based classifier achieved AUC=0.83, sensitivity=0.74, and specificity=0.79, versus 0.93, 0.90, and 0.83, respectively, for the DL-based classifier. Most of the ambiguity comes from non-COVID-19 pneumonia with manifestations that overlap with COVID-19, as well as mild COVID-19 cases. Non-COVID-19 classification performance is 91% for ILD, 64% for other pneumonias and 94% for no pathologies, which demonstrates the robustness of our method against different compositions of control groups.
Conclusions: Our new method accurately discriminates COVID-19 from other types of pneumonia, ILD, and no pathologies CTs, using quantitative imaging features derived from chest CT, while balancing interpretability of results and classification performance, and therefore may be useful to facilitate diagnosis of COVID-19.
Submitted 9 October, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Brazilian Lyrics-Based Music Genre Classification Using a BLSTM Network
Authors:
Raul de Araújo Lima,
Rômulo César Costa de Sousa,
Simone Diniz Junqueira Barbosa,
Hélio Cortês Vieira Lopes
Abstract:
Organizing songs, albums, and artists into groups with shared similarity can be done with the help of genre labels. In this paper, we present a novel approach for automatically classifying musical genre in Brazilian music using only the song lyrics. This kind of classification remains a challenge in the field of Natural Language Processing. We construct a dataset of 138,368 Brazilian song lyrics distributed across 14 genres. We apply SVM, Random Forest and a Bidirectional Long Short-Term Memory (BLSTM) network combined with different word embedding techniques to address this classification task. Our experiments show that the BLSTM method outperforms the other models with an average F1-score of 0.48. Some genres like "gospel", "funk-carioca" and "sertanejo", which obtained F1-scores of 0.89, 0.70 and 0.69, respectively, can be defined as the most distinct and easy to classify in the Brazilian musical genres context.
Submitted 6 March, 2020;
originally announced March 2020.
-
VisMaker: a Question-Oriented Visualization Recommender System for Data Exploration
Authors:
Raul de Araújo Lima,
Simone Diniz Junqueira Barbosa
Abstract:
The increasingly rapid growth of data production and the consequent need to explore data to obtain answers to the most varied questions have promoted the development of tools to facilitate the manipulation and construction of data visualizations. However, building useful data visualizations is not a trivial task: it may involve a large number of subtle decisions that require experience from their designer. In this paper, we present VisMaker, a visualization recommender tool that uses a set of rules to present visualization recommendations organized and described through questions, in order to facilitate the understanding of the recommendations and assist the visual exploration process. We carried out two studies comparing our tool with Voyager 2 and analyzed some aspects of the use of the tools. We collected feedback from participants to identify the advantages and disadvantages of our recommendation approach. As a result, we gathered comments to help improve the development of tools in this domain.
Submitted 14 February, 2020;
originally announced February 2020.
-
A Three-Valued Semantics for Typed Logic Programming
Authors:
João Barbosa,
Mário Florido,
Vítor Santos Costa
Abstract:
Types in logic programming have focused on conservative approximations of program semantics by regular types, on one hand, and on type systems based on a prescriptive semantics defined for typed programs, on the other. In this paper, we define a new semantics for logic programming, where programs evaluate to true, false, and to a new semantic value called wrong, corresponding to a run-time type error. We then have a type language with a separate semantics of types. Finally, we define a type system for logic programming and prove that it is semantically sound with respect to a semantic relation between programs and types where, if a program has a type, then its semantics is not wrong. Our work follows Milner's approach for typed functional languages where the semantics of programs is independent from the semantics of types, and the type system is proved to be sound with respect to a relation between both semantics.
Submitted 18 September, 2019;
originally announced September 2019.
-
BULNER: BUg Localization with word embeddings and NEtwork Regularization
Authors:
Jacson Rodrigues Barbosa,
Ricardo Marcondes Marcacini,
Ricardo Britto,
Frederico Soares,
Solange Rezende,
Auri M. R. Vincenzi,
Marcio E. Delamaro
Abstract:
Bug localization (BL) from the bug report is a strategic activity of the software maintenance process. Because BL is a costly and tedious activity, information retrieval-based and machine learning-based BL techniques could aid software engineers. We propose a method for BUg Localization with word embeddings and Network Regularization (BULNER). The preliminary results suggest that BULNER has better performance than two state-of-the-art methods.
Submitted 26 August, 2019;
originally announced August 2019.
-
Semantics-aware Virtual Machine Image Management in IaaS Clouds
Authors:
Nishant Saurabh,
Julian Remmers,
Dragi Kimovski,
Radu Prodan,
Jorge G. Barbosa
Abstract:
Infrastructure-as-a-service (IaaS) Clouds concurrently accommodate diverse sets of user requests, requiring an efficient strategy for storing and retrieving virtual machine images (VMIs) at a large scale. VMI storage management requires dealing with multiple VMIs, typically in the magnitude of gigabytes, which entails VMI sprawl issues hindering elastic resource management and provisioning. Nevertheless, existing techniques to facilitate VMI management overlook VMI semantics (i.e., at the level of base image and software packages) with either restricted possibility to identify and extract reusable functionalities or with higher VMI publish and retrieval overheads. In this paper, we design, implement and evaluate Expelliarmus, a novel VMI management system that helps to minimize storage, publish and retrieval overheads. To achieve this goal, Expelliarmus incorporates three complementary features. First, it makes use of VMIs modelled as semantic graphs to expedite the similarity computation between multiple VMIs. Second, Expelliarmus provides a semantic-aware VMI decomposition and base image selection to extract and store non-redundant base images and software packages. Third, Expelliarmus can also assemble VMIs based on the required software packages upon user request. We evaluate Expelliarmus through a representative set of synthetic Cloud VMIs on a real test-bed. Experimental results show that our semantic-centric approach is able to optimize repository size by 2.2-16 times compared to state-of-the-art systems (e.g. IBM's Mirage and Hemera) with significant VMI publish and slight retrieval performance improvement.
Submitted 29 July, 2019; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Simplified Graph-based Visualization for Scientific Publication
Authors:
Orlando Fonseca Guilarte,
Simone Diniz Junqueira Barbosa,
Sinesio Pesco
Abstract:
Understanding citations to scientific publications is a task of vital importance in the academic world. This task can be supported by appropriate data structures and visualization mechanisms. One challenge is the number of existing relationships and the difficulty of determining which of the references of a document are considered the most potentially relevant to it. In this paper, we propose a simplified visualization of the relationships between scientific publications, in the form of a directed acyclic graph. From a given document, it is possible to visualize a path of references in which each step corresponds to the main citation of the previous one. A methodology is proposed to build this graph based on the opinion of the authors of scientific articles and an editorial board.
Submitted 25 May, 2018;
originally announced May 2018.
-
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Authors:
Maria Luiza Mondelli,
Thiago Magalhães,
Guilherme Loss,
Michael Wilde,
Ian Foster,
Marta Mattoso,
Daniel S. Katz,
Helio J. C. Barbosa,
Ana Tereza R. Vasconcelos,
Kary Ocaña,
Luiz M. R. Gadelha Jr
Abstract:
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
Submitted 11 January, 2018;
originally announced January 2018.
-
A State-of-the-art Integrated Transportation Simulation Platform
Authors:
Tiago Azevedo,
Rosaldo J. F. Rossetti,
Jorge G. Barbosa
Abstract:
Nowadays, universities and companies have a huge need for simulation and modelling methodologies. In the particular case of traffic and transportation, making physical modifications to the real traffic networks could be highly expensive, dependent on political decisions and highly disruptive to the environment. However, while studying a specific domain or problem, analysing a problem through simulation may not be trivial and may need several simulation tools, hence raising interoperability issues. To overcome these problems, we propose an agent-directed transportation simulation platform, through the cloud, by means of services. We intend to use the IEEE standard HLA (High Level Architecture) for simulator interoperability and agents for control and coordination. Our motivations are to allow multiresolution analysis of complex domains, to allow experts to collaborate on the analysis of a common problem and to allow co-simulation and synergy of different application domains. This paper starts by presenting some preliminary background concepts to help better understand the scope of this work. After that, the results of a literature review are shown. Finally, the general architecture of a transportation simulation platform is proposed.
Submitted 29 January, 2016;
originally announced January 2016.
-
Densifying the sparse cloud SimSaaS: The need of a synergy among agent-directed simulation, SimSaaS and HLA
Authors:
Tiago Azevedo,
Rosaldo J. F. Rossetti,
Jorge G. Barbosa
Abstract:
Modelling & Simulation (M&S) is broadly used in real scenarios where making physical modifications could be highly expensive. With the so-called Simulation Software-as-a-Service (SimSaaS), researchers could take advantage of the huge amount of resources that cloud computing provides. Even so, studying and analysing a problem through simulation may need several simulation tools, hence raising interoperability issues. With this in mind, IEEE developed a standard for interoperability among simulators named High Level Architecture (HLA). Moreover, the multi-agent system approach has become recognised as a convenient approach for modelling and simulating complex systems. Despite all the recent work on and acceptance of these technologies, there is still a great lack of work regarding synergies among them. Through a literature review, this paper demonstrates this lack of work or, in other words, the sparse Cloud SimSaaS. The literature review and the resulting taxonomy are the main contributions of this paper, as they provide a research agenda illustrating future research opportunities and trends.
Submitted 29 January, 2016;
originally announced January 2016.