Chatbots, taking advantage of the success of messaging apps and recent advances in Artificial Intelligence, have become very popular, from helping businesses improve customer service to chatting with users for the sake of conversation and engagement (celebrity or personal bots). However, developing and improving a chatbot requires understanding the data generated by its users. Dialog data differs in nature from a simple question-answering interaction: context and temporal properties (turn order) change how such data should be understood. In this paper, we propose a novel metric to compute dialog similarity based not only on the text content but also on information related to the dialog structure. Our experimental results on the Switchboard dataset show that using evidence from both the textual content and the dialog structure leads to more accurate results than using either measure in isolation.
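To make the idea of combining textual and structural evidence concrete, here is a minimal Python sketch. It is not the metric proposed in the paper. It assumes a dialog is a list of (speaker, utterance) pairs, uses bag-of-words cosine similarity for the text, uses `difflib.SequenceMatcher` over the speaker sequence as a stand-in for dialog structure, and mixes the two with a hypothetical weight `alpha`.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Assumed representation: (speaker, utterance) pairs in turn order.
Dialog = list[tuple[str, str]]


def text_similarity(a: Dialog, b: Dialog) -> float:
    """Cosine similarity between bag-of-words vectors of all utterances."""
    va = Counter(w for _, u in a for w in u.lower().split())
    vb = Counter(w for _, u in b for w in u.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def structure_similarity(a: Dialog, b: Dialog) -> float:
    """Similarity of the turn-taking pattern (speaker order), ignoring words."""
    return SequenceMatcher(None, [s for s, _ in a], [s for s, _ in b]).ratio()


def dialog_similarity(a: Dialog, b: Dialog, alpha: float = 0.5) -> float:
    """Hypothetical weighted mix of textual and structural similarity."""
    return alpha * text_similarity(a, b) + (1 - alpha) * structure_similarity(a, b)
```

With `alpha=1.0` the score reduces to text similarity alone and with `alpha=0.0` to structure alone, which mirrors the comparison against each measure in isolation.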
Graphs appear in several settings, such as social networks, recommendation systems, and many more. A deep, recurring question is “What do real graphs look like?” That is, how can we separate real graphs from synthetic graphs or from real graphs with masked portions? The main contribution of this paper is ShatterPlots, a simple and powerful algorithm to tease out patterns of real graphs that help us spot fake or masked graphs. The idea is to shatter a graph by deleting edges, forcing it to reach a critical (“Shattering”) point, and to study its properties at that point. One of our most discriminative patterns, the “NodeShatteringRatio”, can almost perfectly separate the real from the synthetic graphs in our extensive collection. Additional contributions of this paper are (a) the careful, scalable design of the algorithm, which needs only O(E) time, (b) extensive experiments on a large collection of graphs (19 in total), with up to hundreds of thousands of nodes and millions of edges; and (c) a wealth ...
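The edge-deletion idea can be illustrated with a generic percolation-style sketch. This is not ShatterPlots, and neither the shattering criterion nor `threshold` below comes from the paper: the sketch assumes the "shattering point" is the moment the largest connected component drops below a chosen fraction of the nodes. Deleting edges in random order is the reverse of adding them, so the sketch replays the additions with union-find, which keeps the cost near-linear in the number of edges.

```python
import random


def shattering_fraction(num_nodes, edges, threshold=0.5, seed=0):
    """Hypothetical sketch: delete edges in a random order and return the
    fraction of edges still present just before the largest connected
    component drops below `threshold * num_nodes` nodes.

    Returns None if even the full graph never reaches that component size.
    """
    order = list(edges)
    random.Random(seed).shuffle(order)  # deletion order = reverse of this order
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    target = threshold * num_nodes
    largest = 1
    if largest >= target:
        return 0.0
    # Re-add edges one by one; the first time the largest component reaches
    # the target is the last moment before it shatters during deletion.
    for i, (u, v) in enumerate(order, start=1):
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru  # union by size
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        if largest >= target:
            return i / len(order)
    return None
```

A real discriminative measure would record properties of the graph at this point (for example, how many nodes end up isolated), which the sketch leaves out.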
In this paper we explore the use of meta-knowledge embedded in intent identifiers to improve intent recognition in conversational systems. As evidenced by the analysis of thousands of real-world chatbots and by interviews with professional chatbot curators, developers and domain experts tend to organize the set of chatbot intents by giving them identifiers based on proto-taxonomies, i.e., meta-knowledge connecting high-level, symbolic concepts shared across different intents. Using neuro-symbolic algorithms that incorporate such proto-taxonomies to expand intent representations, we show that the mined meta-knowledge can improve intent recognition accuracy. On a dataset with intents and example utterances from hundreds of professional chatbots, applying those algorithms improved the equal error rate (EER) by more than 10% in almost a third of the chatbots, compared with a baseline of the same algorithms without the meta-knowledge. The meta-knowledge proved to be e...
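As a rough illustration of mining proto-taxonomies from intent identifiers, the sketch below assumes identifiers follow a hypothetical naming convention such as `billing_payment_late` or `billingPaymentLate`. It splits each identifier into concept tokens and keeps the concepts shared by more than one intent. It covers only the mining step, not the neuro-symbolic algorithms used in the paper.

```python
import re
from collections import defaultdict


def concepts(intent_id: str) -> list[str]:
    """Split a (hypothetical) intent identifier into lower-case concept tokens,
    handling snake_case, kebab-case, dotted, and camelCase names."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", intent_id)
    return [t.lower() for t in re.split(r"[\s_\-.:/]+", spaced) if t]


def proto_taxonomy(intent_ids):
    """Map each concept to the sorted intents that share it,
    keeping only concepts used by more than one intent."""
    index = defaultdict(set)
    for intent in intent_ids:
        for concept in concepts(intent):
            index[concept].add(intent)
    return {c: sorted(ids) for c, ids in index.items() if len(ids) > 1}
```

For example, `proto_taxonomy(["billing_payment_late", "billing_refund", "card_block"])` groups the first two intents under the shared concept `billing`.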
Being able to predict when invoices will be paid is valuable in multiple industries and supports decision-making processes in most financial workflows. However, because invoice-related data is complex and the decision-making process is not recorded in the accounts receivable system, this prediction is challenging. In this paper, we present a prototype that supports collectors in predicting invoice payment. This prototype is part of a solution developed in partnership with a multinational bank; it reached up to 81% prediction accuracy, which improved the prioritization of customers and supported the daily work of collectors. Our simulations show that adopting our model to prioritize the work of collectors saves up to ~1.75 million dollars per month. The methodology and results presented in this paper will help researchers and practitioners deal with the problem of invoice payment prediction, providing insights...
Health insurance companies in Brazil organize their claims data solely from the perspective of service providers. As a result, they lose sight of physicians' activity and of how physicians share patients. A partnership between physicians can be fruitful when they team up to help a patient, but it can also be a problem when a referral to another physician occurs only because they work in the same clinic. This work took place during a short-term project involving a partnership between our lab and a large health insurance company in Brazil. The goal of the project was to provide insights (with business impact) into physicians' activity from the analysis of the claims database. This work presents one of the outcomes of the project, namely a way of modeling the underlying referrals in the social network of physicians derived from health insurance claims data. The approach considers the flow of patients through the physician–physician network, highl...
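One simple way to turn claims into a physician–physician network is sketched below. It is an assumption for illustration, not the paper's model: each claim is taken to be a `(patient_id, physician_id, visit_date)` tuple (a hypothetical schema), and consecutive visits by the same patient to two different physicians are counted as a directed patient flow, a crude proxy for a referral.

```python
from collections import Counter, defaultdict
from datetime import date


def referral_network(claims):
    """Build a weighted, directed physician-physician network.

    claims: iterable of (patient_id, physician_id, visit_date) tuples
    (hypothetical schema). Returns a Counter mapping (from, to) physician
    pairs to the number of consecutive-visit transitions between them.
    """
    visits = defaultdict(list)
    for patient, physician, day in claims:
        visits[patient].append((day, physician))
    edges = Counter()
    for history in visits.values():
        history.sort()  # chronological order of each patient's visits
        for (_, a), (_, b) in zip(history, history[1:]):
            if a != b:  # ignore repeat visits to the same physician
                edges[(a, b)] += 1
    return edges


claims = [
    ("p1", "drA", date(2020, 1, 1)),
    ("p1", "drB", date(2020, 1, 5)),
    ("p2", "drA", date(2020, 2, 1)),
    ("p2", "drB", date(2020, 2, 3)),
    ("p2", "drB", date(2020, 2, 9)),
]
print(referral_network(claims))  # Counter({('drA', 'drB'): 2})
```

Edge weights like these could then feed standard network analyses, such as finding physicians with unusually concentrated outgoing flows.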
Large network data are being produced by various applications at an ever-growing rate, from social networks such as Facebook and Twitter and scientific citation networks such as CiteSeerX to biological networks such as protein interaction networks. Network data analysis is crucial for exploiting the wealth of information encoded in such data. An effective analysis must take into account complex structure, including social, temporal, and spatial dimensions, while an efficient analysis requires scalable techniques. As a result, there has been increasing research on novel and scalable solutions for practical network analytics applications.
This book contains the didactic texts written by the authors of the short courses selected for the 2010 edition of four major scientific symposia promoted by the Brazilian Computer Society (SBC), namely the VII Brazilian Symposium on Collaborative Systems (SBSC), the XVI Brazilian Symposium on Multimedia and the Web (WebMedia), the IX Symposium of Human Factors in Computing Systems (IHC), and the XXV Brazilian Symposium on Databases (SBBD). Besides supporting those attending the short courses, the purpose of this book is to increase the impact of these courses. This material complements the information given to participants and to anyone interested in the respective themes, allowing them to develop their knowledge in the area.
Predicting invoice payment is valuable in multiple industries and supports decision-making processes in most financial workflows. However, the challenge in this realm involves dealing with complex data and with the lack of data about decision-making processes, which are not recorded in the accounts receivable system. This work presents a prototype, developed as part of a solution devised in partnership with a multinational bank, to support collectors in predicting invoice payment. The proposed prototype reached up to 77% accuracy, which improved the prioritization of customers and supported the daily work of collectors. With the presented results, we expect to offer researchers dealing with the problem of invoice payment prediction insights and examples of how to tackle issues present in real data.
In this chapter we discuss the concepts and challenges involved in designing Cognitive Systems. Cognitive Computing is the use of computational learning systems to augment cognitive capabilities in solving real-world problems. Cognitive systems are designed to draw inferences from data and pursue the objectives they were given. The era of big data is the basis for innovative cognitive solutions that cannot rely on traditional systems. While traditional computers must be programmed by humans to perform specific tasks, cognitive systems learn from their interactions with data and humans. Cognitive Computing is not only a fundamentally new computing paradigm for tackling real-world problems, exploiting enormous amounts of data using massively parallel machines, but it also engenders a new form of interaction between humans and computers. As machines start to enhance human cognition and help people make better decisions, new research issues arise. We address the following questions for Cognitive Systems: What are the needs? Where should they be applied? Which sources of information can they rely on?
Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
This paper describes how machine learning training data and symbolic knowledge from curators of conversational systems can be used together to improve the accuracy of those systems and to enable better curatorial tools. This is done in the context of a real-world practice in which curators of conversational systems often embed taxonomically structured meta-knowledge into their documentation. The paper provides evidence that the practice is quite common among curators, that it is used as part of their collaborative practices, and that the embedded knowledge can be mined by algorithms. Further, this meta-knowledge can be integrated into the machine learning-based conversational system using neuro-symbolic algorithms, improving its run-time accuracy and enabling tools that support curatorial tasks. These results point towards new ways of designing development tools that explore an integrated use of code and documentation by machines.
Graphs are a convenient representation for large sets of data, such as complex networks, social networks, publication networks, and so on. The growing volume of data modeled as complex networks, e.g., the World Wide Web and social networks like Twitter and Facebook, has given rise to a new area of research focused on complex network mining. In this multidisciplinary area, some important tasks stand out: extraction of statistical properties, community detection, and link prediction, among several others. This new approach has been driven largely by the growing availability of computers and communication networks, which allow us to gather and analyze data on a scale far larger than previously possible. In this chapter we give an overview of several graph mining approaches to mine and handle large complex networks.
2016 5th Brazilian Conference on Intelligent Systems (BRACIS), 2016
With the exponential growth of the Web and of data availability, the semantic web area has expanded, and each day more data is expressed as knowledge bases. Knowledge bases (KBs) used in most projects are represented in an ontology-based fashion, so the data can be better organized and easily accessed. It is common to map these KBs into a graph when trying to induce inference rules from them, which makes it possible to apply graph-mining techniques to extract implicit knowledge. One common graph-based task is link prediction, which can be used to predict edges (new facts for the KB) that will appear in the near future. In this paper, we present Graph Rule Learner (GRL), a method designed to extract inference rules from ontological knowledge bases mapped to graphs. GRL is based on graph-mining techniques and explores the combination of link prediction metrics. Empirical analysis revealed that GRL can be successfully applied to NELL (Never-Ending Language Learner), helping the system to infer new KB beliefs from existing beliefs (a crucial task for a never-ending learning system).
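The two ingredients mentioned above, mapping KB facts to a graph and combining link prediction metrics, can be sketched as follows. This is not GRL: the sketch assumes facts are `(subject, relation, object)` triples, ignores relation labels when building an undirected entity graph, and combines three standard metrics (common neighbors, Jaccard coefficient, Adamic-Adar) with hypothetical, unlearned `weights`.

```python
import math
from collections import defaultdict


def build_graph(triples):
    """Map (subject, relation, object) facts to an undirected entity graph."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # Shared neighbors with low degree count more; degree-1 nodes are skipped
    # because log(1) = 0 would divide by zero.
    return sum(1 / math.log(len(adj[z])) for z in adj[u] & adj[v] if len(adj[z]) > 1)


def combined_score(adj, u, v, weights=(1.0, 1.0, 1.0)):
    """Hypothetical linear combination of the three metrics for a candidate edge."""
    metrics = (common_neighbors(adj, u, v), jaccard(adj, u, v), adamic_adar(adj, u, v))
    return sum(w * m for w, m in zip(weights, metrics))
```

Candidate pairs with high combined scores would be the proposed new facts; in practice the weights would need to be learned or tuned rather than fixed.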