ABSTRACT A parallel algorithm to solve linear equation systems is presented. This method, known as Neville elimination, is especially appropriate for totally positive matrices (all minors non-negative). We prove that this algorithm is cost-optimal for a given parallel implementation of Neville elimination, in which the coefficient matrix is partitioned rowwise in stripes among the processors. For Gaussian elimination, by contrast, a pipelined version is necessary to obtain optimal cost. Furthermore, experimental results obtained on an IBM SP2 multicomputer using MPI corroborate the theoretical estimates of the algorithm's efficiency.
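As a rough sequential illustration of the elimination scheme the abstract names (not the paper's parallel, stripe-partitioned implementation), the following sketch uses our own function name and a plain list-of-lists matrix. The defining difference from Gaussian elimination is that each row is updated using the row immediately above it, not the pivot row:

```python
def neville_elimination(A):
    """Reduce A to upper triangular form by Neville elimination: each
    column is zeroed bottom-up, subtracting from row i a multiple of the
    row immediately above it (row i-1), rather than of the pivot row as
    in Gaussian elimination. For nonsingular totally positive matrices
    this succeeds with no row exchanges."""
    n = len(A)
    U = [row[:] for row in A]            # work on a copy
    for k in range(n - 1):               # column being eliminated
        for i in range(n - 1, k, -1):    # bottom-up over the rows
            if U[i][k] != 0:
                m = U[i][k] / U[i - 1][k]
                U[i] = [U[i][j] - m * U[i - 1][j] for j in range(n)]
    return U
```

For the totally positive Pascal-type matrix [[1,1,1],[1,2,3],[1,3,6]], this produces the upper triangular factor [[1,1,1],[0,1,2],[0,0,1]] without any row exchange.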
International Journal of Computer Mathematics, 2014
ABSTRACT Privacy issues represent a longstanding problem. Measures such as k-anonymity, l-diversity and t-closeness are among the most widely used ways to protect released data. This work proposes to extend these three measures to the case where the data are protected using fuzzy sets instead of intervals or representative elements. The proposed approach is then tested on the Energy Information Authority data set with different fuzzy partition methods. Results show an improvement in data protection when the data are encoded using fuzzy sets.
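For reference, the crisp versions of the first two measures mentioned can be sketched in a few lines (function names and the toy records are ours; the paper's contribution is the fuzzy extension, not this baseline):

```python
from collections import Counter

def k_anonymity(rows, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns:
    a table is k-anonymous iff every combination of quasi-identifier
    values is shared by at least k rows."""
    groups = Counter(tuple(r[c] for c in quasi_ids) for r in rows)
    return min(groups.values())

def l_diversity(rows, quasi_ids, sensitive):
    """Smallest number of distinct sensitive values inside any
    equivalence class (distinct l-diversity)."""
    groups = {}
    for r in rows:
        key = tuple(r[c] for c in quasi_ids)
        groups.setdefault(key, set()).add(r[sensitive])
    return min(len(v) for v in groups.values())
```

A table whose smallest quasi-identifier group has two rows is 2-anonymous; if each group also contains at least two distinct sensitive values, it is 2-diverse.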
Communications in Computer and Information Science, 2012
We introduce the problem of pieces of information that become sensitive when combined, which we call shared secrets, and we propose a model based on the Choquet integral to assess the risk that an actor in a social network is able to combine all the information available. Some examples are presented and discussed, and future directions are outlined.
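The discrete Choquet integral underlying such a model aggregates values with respect to a capacity (a monotone set function), which lets interactions between pieces of information raise or lower the combined risk. A minimal sketch, with our own function name and a capacity passed as a callable:

```python
def choquet(values, mu):
    """Discrete Choquet integral of a vector with respect to a capacity.
    mu maps a frozenset of indices to [0, 1], with mu(empty set) = 0 and
    mu(all indices) = 1; it is evaluated on the 'upper sets' of the
    values sorted in increasing order."""
    idx = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for pos, i in enumerate(idx):
        upper = frozenset(idx[pos:])   # indices whose value is >= values[i]
        total += (values[i] - prev) * mu(upper)
        prev = values[i]
    return total
```

When mu is additive (just a sum of weights), the Choquet integral reduces to the ordinary weighted mean; a super-additive mu models information items that are riskier together than the sum of their individual risks.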
Communications in Computer and Information Science, 2010
Abstract. In this paper we address the problem of controlling the disclosure of sensitive information that can be inferred from the other attributes made public. This threat to privacy is commonly known as prediction or attribute disclosure. Our approach is based on identifying those rules ...
Communications in Computer and Information Science, 2014
ABSTRACT Assessing the similarity between node profiles in a social network is an important tool in its analysis. Several approaches exist to study profile similarity, including semantic approaches and natural language processing. However, to date there is no research combining these aspects into a unified measure of profile similarity. Traditionally, semantic similarity is assessed using keywords, that is, formatted text information, with no natural language processing component. This study proposes an alternative approach, whereby the similarity assessment based on keywords is applied to the output of natural language processing of profiles. A unified similarity measure results from this approach. The approach is illustrated on a real data set extracted from Facebook and compared with other similarity measures for the same data.
The efficiency and effectiveness of retrieving documents relevant to a certain topic or user query can be improved by clustering similar documents as well as by introducing parallel strategies. In this paper we explore the use of unsupervised learning, using clustering algorithms based on neural networks, together with NOW (network of workstations) architectures, a kind of low-cost parallel architecture, and study the impact on information retrieval.
The design of nearest neighbour classifiers depends strongly on some crucial parameters involved in learning, such as the number of prototypes to use, the initial location of these prototypes, and a smoothing parameter. These parameters have to be found by trial and error or by automatic methods. In this work, an evolutionary approach to the nearest neighbour classifier (ENNC) is described. The main property of this algorithm is that it does not require any of the above-mentioned parameters. The algorithm is based on the evolution of a set of prototypes that can execute several operators in order to increase their quality in a local sense, yielding a high classification accuracy for the whole classifier.
ABSTRACT Data mining techniques represent a useful tool to cope with privacy problems. In this work an association rule mining algorithm adapted to the privacy context is developed. The algorithm produces association rules with a certain structure: the premise is a subset of the public features of a released table, while the consequent is the feature to protect. These rules are then used to reveal and explain relationships in data affected by some kind of anonymization process and thus to detect threats.
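The rule structure described above can be illustrated with a naive brute-force miner (not the paper's algorithm; the function name, thresholds and toy records are ours). A high-confidence rule from public attributes to the protected one signals an attribute-disclosure threat:

```python
from itertools import combinations

def privacy_rules(rows, public, secret, min_conf=0.8):
    """Enumerate rules (public attribute values) -> secret value and keep
    those whose confidence is at least min_conf: such a rule means the
    protected attribute is largely predictable from released ones."""
    rules = []
    for size in range(1, len(public) + 1):
        for attrs in combinations(public, size):
            premises = {}                      # premise values -> secret values seen
            for row in rows:
                key = tuple(row[a] for a in attrs)
                premises.setdefault(key, []).append(row[secret])
            for key, secrets in premises.items():
                for s in set(secrets):
                    conf = secrets.count(s) / len(secrets)
                    if conf >= min_conf:
                        rules.append((dict(zip(attrs, key)), s, conf))
    return rules
```

In a release where every row with zip "331**" has disease "flu", the miner returns the rule {zip: "331**"} -> "flu" with confidence 1.0, flagging that group as exposed.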
Fuzzy sets in data protection: strategies and cardinalities. IRENE DÍAZ, Department of Computer Science, University of Oviedo, Spain. E-mail: sirene@uniovi.es. LUIS J. RODRÍGUEZ-MUÑIZ, Department of Statistics and OR, University of Oviedo, Spain. ...
Journal of Chemical Information and Modeling, 2013
The combined monitoring-based and modeling-based priority setting (COMMPS) provides a procedure for the identification of priority hazardous substances, outlined in the Working Document ENV/191000/01 of January 16, 2001. This procedure is based on scoring a set of criteria which individually make substances more or less hazardous; the way scores are weighted and combined was established by a panel of experts. Different authors have outlined how such a procedure might be affected by subjectivity of judgment, and alternative solutions based on partial order theory (POT) and random linear extensions (RLE) have been suggested. This method consists of generating a set of RLEs and averaging the rank given to each substance, so that a total order can be determined. Any POT/RLE approach must face the issue of covering as much as possible of the space of linear extensions, which, in the case of the 85 substances considered by COMMPS, is extremely large; an exhaustive generation of linear extensions is not feasible. Therefore, a faster algorithm would help to consider a larger number of linear extensions in a given time frame. In this paper, we discuss this problem and outline a possible solution.
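The sample-and-average scheme described above can be sketched as follows. Note the hedges: function names are ours, and picking a uniformly random minimal element at each step does not sample linear extensions uniformly (uniform sampling requires more machinery, e.g. a Markov chain); this is only a minimal illustration of averaging ranks over random linear extensions:

```python
import random

def random_linear_extension(elements, less_than):
    """One random linear extension of a poset: repeatedly pick, at random,
    one of the currently minimal elements (no remaining predecessor)."""
    remaining = set(elements)
    order = []
    while remaining:
        minimal = [x for x in remaining
                   if not any(less_than(y, x) for y in remaining if y != x)]
        pick = random.choice(minimal)
        order.append(pick)
        remaining.remove(pick)
    return order

def average_ranks(elements, less_than, n_samples=1000):
    """Average rank of each element over n_samples sampled linear
    extensions; sorting by this average yields a total order."""
    ranks = {x: 0 for x in elements}
    for _ in range(n_samples):
        for pos, x in enumerate(random_linear_extension(elements, less_than)):
            ranks[x] += pos
    return {x: r / n_samples for x, r in ranks.items()}
```

On a chain (a totally ordered poset) there is a single linear extension, so the average ranks are exact; the method only becomes interesting when the poset leaves many pairs incomparable, as the hazard criteria do.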
International Journal of Computer Mathematics, 2008
The scalability of a parallel system is a measure of its capacity to effectively use an increasing number of processors. Both the isoefficiency function and the scaled efficiency are metrics used to analyse the scalability of parallel algorithms and architectures. The first function relates the size of the problem being solved to the number of processors required to maintain efficiency.
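The quantities behind these metrics can be stated concretely; a minimal sketch under the standard definitions (function names are ours; T1 is the serial time, Tp the parallel time on p processors):

```python
def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = T1 / (p * Tp): the fraction of the p
    processors' aggregate time spent on useful work."""
    return t_serial / (p * t_parallel)

def overhead(t_serial, t_parallel, p):
    """Total overhead To = p * Tp - T1. The isoefficiency function asks
    how fast T1 must grow with p so that To / T1, and hence E, stays
    constant as processors are added."""
    return p * t_parallel - t_serial
```

For example, a job taking 100 s serially and 30 s on 4 processors has efficiency 100/120 (about 0.83) and 20 s of aggregate overhead.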
Papers by Irene Díaz