Rosa Meo
Università degli Studi di Torino, Informatica, Faculty Member
This technical report describes the experimental evaluation settings we adopted in the cInQ project for deploying query languages for data mining and inductive databases, together with their system prototypes, in a real-life application case study: Web usage mining through Web logs.
The article addresses the impact of big data on the knowledge assets of public administrations, starting from an experiment on the national database of public contracts that directly involved the University of Turin and the Italian National Anti-Corruption Authority (Autorità Nazionale Anticorruzione, ANAC). The article illustrates the phases that a computer scientist or data scientist follows to put data to use for knowledge purposes: the initial statistical approach, aimed at identifying the descriptive characteristics of the cases under study, is followed by the descriptive approach, aimed at identifying the regularities and correlations in the available database; to these two phases the researcher can add the predictive approach, through machine learning techniques. The article concludes by promoting a prescriptive approach as a means of identifying the decisions that should be taken on the basis of the available data, and that could suggest good practices for the future.
In this work we analyse data collected from sensors installed on some vehicles of the local public transportation system in a European city. We conduct the analysis by generating and applying Bayesian networks to describe the dependence relationships between variables and to predict the target variable, fuel consumption. We experimented with different algorithms that explore the search space of possible network structures guided by heuristics. We compare them with the results obtained with High Performance Computing, which allowed us to perform an exhaustive search and find the optimal solution with respect to the likelihood evaluation measure. We solve the model evaluation and selection problem by applying an alternative evaluation measure: Granger causality. In addition, we compare the ability of the obtained networks to predict the target. Finally, we conduct "what-if" analysis in the form of intervention and counterfactual analysis, and show which decisions policy makers and service owners should take to reduce costs and pollution.
Emotion analysis in social media is challenging. While most studies focus on positive and negative sentiments, differentiating between emotions is more difficult. We treat the problem as a collection of binary classification tasks based on the four opposing emotion pairs provided by Plutchik. We processed the content of messages with three alternative methods: structural and lexical features, latent factors, and natural language processing. The final prediction is produced by state-of-the-art machine learning classifiers. The results support the possibility of distinguishing the emotion pairs in social media. CCS Concepts: • Computing methodologies → Machine learning approaches; Natural language processing; • Human-centered computing → Collaborative and social computing;
Data mining evolved as a collection of application problems and efficient algorithms for solving rather specific problems, all focused on discovering relevant information hidden in very large databases. In particular, one of the most investigated topics is the discovery of association rules.
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered. In this article, we propose a framework to learn a context-based distance for categorical attributes. The key intuition of this work is that the distance between two values of a categorical attribute A_i can be determined by the way in which the values of the other attributes A_j are distributed in the dataset objects: if they are similarly distributed in the groups of objects corresponding to the distinct values of A_i, a low distance is obtained. We also propose a solution to the critical problem of choosing the attributes A_j. We validate our approach by embedding our distance learning framework in a hierarchical clustering algorithm. We applied it on various real world and synthetic datasets, both low and high-di...
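The intuition of a context-based distance can be illustrated with a minimal Python sketch. It is not the formula from the article: it assumes, for illustration only, that the distance between two values of a target attribute is the root mean squared difference between the conditional distributions of the context attributes observed with each value. The function names, the toy rows and the fixed list of context attributes are all hypothetical.

```python
from collections import Counter
from math import sqrt


def conditional_distribution(rows, target, context, value):
    """Estimate P(context = c | target = value) from the rows, as a dict."""
    group = [r[context] for r in rows if r[target] == value]
    total = len(group)
    if total == 0:
        return {}
    return {c: n / total for c, n in Counter(group).items()}


def context_distance(rows, target, context_attrs, x, y):
    """Illustrative distance between values x and y of the target attribute.

    Assumption: root mean squared difference between the distributions of
    each context attribute in the group of rows with x and the group with y.
    """
    squared, n_terms = 0.0, 0
    for ctx in context_attrs:
        px = conditional_distribution(rows, target, ctx, x)
        py = conditional_distribution(rows, target, ctx, y)
        for c in set(px) | set(py):
            squared += (px.get(c, 0.0) - py.get(c, 0.0)) ** 2
            n_terms += 1
    return sqrt(squared / n_terms) if n_terms else 0.0


# Hypothetical toy dataset: "city" is the target attribute A_i,
# "lang" and "currency" play the role of the context attributes A_j.
rows = [
    {"city": "Turin", "lang": "it", "currency": "EUR"},
    {"city": "Milan", "lang": "it", "currency": "EUR"},
    {"city": "Turin", "lang": "it", "currency": "EUR"},
    {"city": "London", "lang": "en", "currency": "GBP"},
]

near = context_distance(rows, "city", ["lang", "currency"], "Turin", "Milan")
far = context_distance(rows, "city", ["lang", "currency"], "Turin", "London")
```

In this toy data, "Turin" and "Milan" co-occur with identically distributed context values, so their distance is low, while "Turin" and "London" share no context values and end up far apart. How the context attributes are selected is left out of the sketch.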
In this paper we present the application of the inductive database approach to two practical analytical case studies: Web usage mining in Web logs and financial data. For the Web domain, we considered enriched XML Web logs, which we call conceptual logs, produced by specific Web applications. These applications were built using a conceptual model, namely WebML, and its accompanying CASE tool, WebRatio. The Web conceptual logs integrate the usual information about user requests with meta-data concerning the Web site structure. For the analysis of financial data, we considered the Dow Jones stock exchange index and studied its component stocks from 1997 to 2002 using so-called technical analysis. Technical analysis consists in identifying the relevant (graphical) patterns that occur in the plot of the evolution of a stock quote over time, often adopting different time granularities. On the plots the correlatio...
In recent years, researchers have begun to study inductive databases, a new generation of databases for leveraging decision support applications. In this context, the user interacts with the DBMS using advanced, constraint-based languages for data mining, where constraints have been specifically introduced to increase the relevance of the results and, at the same time, to reduce their volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a technique for query answering which consists in rewriting the query in terms of the union and intersection of the result sets of other queries, previously executed and materialized. Unfortunately, the exploitation of past queries is not always applicable. We then present sufficient conditions for the optimization to apply and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. We show some experi...
We present two approaches for digital twinning in the context of forecasting the power production of photovoltaic panels. We employ two complementary digital models: the first is a cyber-physical system that simulates the physical properties of a photovoltaic panel, built with the open-source, object-oriented modeling language Modelica. The second model is data-driven, obtained by applying Machine Learning techniques to the data collected in an installation of the equipment. Both approaches make use of daily weather forecast data. We compare the results of the two approaches. Finally, we integrate them in more sophisticated hybrid systems that combine the benefits of both.
In this paper we propose and test the use of hierarchical clustering for feature selection in databases. The clustering method is Ward’s, with a distance measure based on Goodman-Kruskal τ. We motivate the choice of this measure and compare it with other ones. Our hierarchical clustering is applied to over 40 datasets from the UCI archive. The proposed approach is interesting from many viewpoints. First, it produces the dendrogram of feature subsets, which serves as a valuable tool to study relevance relationships among features. Second, the dendrogram is used in a feature selection algorithm to select the best features by a wrapper method. Experiments were run with three different families of classifiers: Naive Bayes, decision trees and k nearest neighbours. With our method, all three classifiers generally outperform their counterparts without feature selection. We compare our feature selection with other state-of-the-art methods, obtaining on average a better classific...
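For readers unfamiliar with the association measure named above, the standard definition of Goodman-Kruskal τ can be sketched in Python. τ(Y|X) is the proportional reduction in the error of predicting a categorical variable Y once X is known. The function name and toy columns are hypothetical, and the sketch only computes τ itself: how the paper turns τ into a distance for Ward’s clustering is not reproduced here.

```python
from collections import Counter


def goodman_kruskal_tau(x, y):
    """Goodman-Kruskal tau(Y|X) for two equal-length categorical sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    # Error when predicting y from its marginal distribution alone.
    error_y = 1.0 - sum((c / n) ** 2 for c in count_y.values())
    if error_y == 0.0:
        return 0.0  # y is constant, so there is no error left to reduce
    # Expected error when predicting y from its distribution within each x group.
    error_y_given_x = 1.0 - sum(
        (c / n) ** 2 / (count_x[xv] / n) for (xv, _), c in count_xy.items()
    )
    return (error_y - error_y_given_x) / error_y


# Hypothetical toy columns.
feature = ["a", "a", "b", "b"]
determined = ["u", "u", "v", "v"]   # fully determined by feature
unrelated = ["u", "v", "u", "v"]    # independent of feature

tau_determined = goodman_kruskal_tau(feature, determined)
tau_unrelated = goodman_kruskal_tau(feature, unrelated)
```

Note that τ is asymmetric in general: τ(Y|X) and τ(X|Y) can differ, so any distance built on it has to decide how to treat the two directions.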
In this paper we address the problem of classifying chestnut plants according to their place of origin. We compare the results obtained by state-of-the-art classifiers, including MLP, RBF, SVM, C4.5 decision trees and random forests. We determine which features are meaningful for the classification, the classification accuracy that these classifier families can achieve with the available features, and how robust the classifiers are to noise. Among the obtained classifiers, neural networks show the greatest robustness to noise.
In this paper we describe the implementation of the MuMe dialogue system, a task-based dialogue system for a car sharing service, and its evaluation through the IDIAL protocol. Finally, we report some comments on this novel dialogue system evaluation method.
This three-volume set LNAI 8724, 8725 and 8726 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: ECML PKDD 2014, held in Nancy, France, in September 2014. The 115 revised research papers presented together with 13 demo track papers, 10 nectar track papers, 8 PhD track papers, and 9 invited talks were carefully reviewed and selected from 550 submissions. The papers cover the latest high-quality interdisciplinary research results in all areas related to machine learning and knowledge discovery in databases.
As the number of submissions to the IEEE Transactions on Knowledge and Data Engineering (TKDE) and the diversity in topics keep increasing, TKDE needs fresh blood and strong hands. I am pleased to officially welcome the 13 associate editors who just joined the editorial board: Drs. Leman Akoglu, Hongrae Lee, Justin Levandoski, Xuelong Li, Rosa Meo, Carlos Ordonez, Jeff Philips, Barbara Poblete, K. Selçuk Candan, Meng Wang, Jirong Wen, Li Xiong, and Wenjie Zhang. The newly appointed associate editors are established and active experts across the wide spectrum of knowledge and data engineering. Moreover, they are very committed and dedicated to serving the community and handling the review processes, as testified by their rich experience. Their biographies and photos are provided below. At the same time, I want to sincerely thank Drs. Shivnath Babu, Sanjay Chawla, Xiaofei He, Daxin Jiang, Ruoming Jin, and Evaggelia Pitoura, who just retired from the editorial boa...
We propose DepMiner, a software prototype implementing a simple but effective model for the evaluation of itemsets and, more generally, for the evaluation of the dependencies between variables on a domain of finite values. This method is based on ∆, the departure of the observed probability of a set of valued variables in a database from a referential probability, estimated under the condition of maximum entropy. This model is able to distinguish between dependencies intrinsic to the itemset and dependencies “inherited” from its subsets: thus it is suitable for directly comparing the utility of an itemset with that of its subsets and for reducing the volume of non-significant itemsets in the result of a frequent itemset mining request. The method is powerful because it can detect significant positive dependencies as well as negative ones, which occur when the association among the variables is rarer than expected. The system returns itemsets ranked by a normalized version of ∆ and t...
The influence of training, posture, nutrition or psychological attitudes on an athlete’s career is well described in the literature. An additional factor of success that is widely recognized as crucial is the network of matches that an athlete plays during a season. The hypothesis is that the quality of a player’s opponents affects her long-term ranking and performance. Even though the relevance of these factors is widely recognized, a quantitative characterization is missing. In this paper, we try to fill this gap by combining network analysis and machine learning to estimate the contribution of the network of matches to predicting an athlete’s success. We consider all the official games played by Italian table tennis players between 2011 and 2016. We observe that the match network shows scale-free behavior, typical of several real-world systems, and that different structural properties are positively correlated with the athletes’ performance (Spearman , p-value ). Using these findings, we implement three different tasks: talent identification, performance prediction and ranking prediction. Results consistently show that machine learning approaches are able to predict players’ success and that the topological features play an effective role in increasing their predictive power.
... CorGhiLanLeoMeoMonRov:2010-BIOBITS (Book part). Author(s), Francesca Cordero, Stefano Ghignone, Luisa Lanfranco, Giorgio Leonardi, Rosa Meo, Stefania Montani and Luca Roversi. Title, « BIOBITS: A Study on Candidatus ...
... 77 Onur Görgün and Olcay Taner Yildiz A Small Footprint Hybrid Statistical and Unit Selection Text-to-Speech Synthesis System for Turkish..... ... 143 H. Erkal, FM Ozcelik, MA Antepli, BT Bacinoglu and E. Uysal-Biyikoglu Page 8. ...
I. RATIONALE For the first time, ICDM hosts a Forum dedicated to PhD students. The aim of the ICDM PhD Forum is to provide an international environment in which PhD students can meet and exchange their ideas and experiences, both with peers and with senior researchers from the Data Mining community. Here, PhD students have a unique opportunity to present their ideas and to discuss the work in progress for their PhD dissertation and the major interests in the Data Mining field. The PhD Forum ...
We designed this track on "Data Mining" with an emphasis on declarative data mining, intelligent querying and associated issues such as optimization, indexing, query processing, languages and constraints, as in the previous two years at SAC. This year, attention is also given to data preprocessing problems, such as data cleaning, discretization and sampling. We also encouraged and received submissions of papers on new applications of data mining systems, such as in biology and science, in Web analysis and XML documents ...
Research Interests: SEBD and Feature Space
In this paper we present the application of the inductive database approach to a practical analytical case study: analysis of financial data. Inductive databases provide advanced support for Data Mining applications, through the integration of DBMS technology and powerful mining languages. In this case study, we have considered the trade stock exchange index Dow Jones 30 and studied its component stocks from 1997 to 2002 using the so-called technical analysis. Technical analysis consists in the identification of the relevant (graphical) patterns that occur in the plot of evolution of a stock quote as time proceeds, often adopting different time granularities. On the plots the correlations between distinctive variables of the stocks quote are pointed out, such as the quote trend, the percentage variation and the volume of the stocks exchanged. In particular we adopted candlesticks, a figurative pattern representing in a condensed diagram the evolution of the stock quotes in a daily sto...