Ignazio Pillai

Papers by Ignazio Pillai

Improving image spam filtering using image text features

by Battista Biggio and Ignazio Pillai

Designing multi-label classifiers that maximize F measures: State of the art

Pattern Recognition, 2017

Randomized Prediction Games for Adversarial Machine Learning

by Battista Biggio and Ignazio Pillai

IEEE Transactions on Neural Networks and Learning Systems, Jan 4, 2016

In spam and malware detection, attackers exploit randomization to obfuscate malicious data and increase their chances of evading detection at test time, e.g., malware code is typically obfuscated using random strings or byte sequences to hide known exploits. Interestingly, randomization has also been proposed to improve security of learning algorithms against evasion attacks, as it results in hiding information about the classifier from the attacker. Recent work has proposed game-theoretical formulations to learn secure classifiers, by simulating different evasion attacks and modifying the classification function accordingly. However, both the classification function and the simulated data manipulations have been modeled in a deterministic manner, without accounting for any form of randomization. In this paper, we overcome this limitation by proposing a randomized prediction game, namely, a noncooperative game-theoretic formulation in which the classifier and the attacker make randomi...

F-measure optimisation in multi-label classifiers

Proceedings of the 21st International Conference on Pattern Recognition, Nov 1, 2012

When a multi-label classifier outputs a real-valued score for each class, a well-known design strategy consists of tuning the corresponding decision thresholds by optimising the performance measure of interest on validation data. In this paper we focus on the F-measure, which is widely used in multi-label problems. We derive two properties of the micro-averaged F measure, viewed as a function of the threshold values, which allow its global maximum to be found by an optimisation strategy with an upper bound on computational complexity of O(n^2 N^2), where N and n are respectively the number of classes and of validation samples. So far, only a suboptimal threshold selection rule and a greedy algorithm without any optimality guarantee were known for this task. We then devise a possible optimisation algorithm based on our strategy, and evaluate it on three benchmark multi-label data sets.
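To make the threshold-tuning setting concrete, here is a minimal Python sketch of the micro-averaged F measure as a function of per-class thresholds, together with a simple coordinate-wise search over them. This is not the paper's algorithm: the search is a plain greedy heuristic with no optimality guarantee, and the names `micro_f1` and `tune_thresholds`, the candidate-threshold choice, and the sweep limit are all assumptions made for illustration.

```python
def micro_f1(scores, labels, thresholds):
    """Micro-averaged F1 for given per-class thresholds.

    scores[i][k]: real-valued score of validation sample i for class k.
    labels[i][k]: 1 if sample i truly belongs to class k, else 0.
    """
    tp = fp = fn = 0
    for row_scores, row_labels in zip(scores, labels):
        for s, y, t in zip(row_scores, row_labels, thresholds):
            predicted = s >= t
            if predicted and y:
                tp += 1
            elif predicted:
                fp += 1
            elif y:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(scores, labels, max_sweeps=10):
    """Greedy coordinate-wise search (illustrative heuristic only)."""
    n_classes = len(scores[0])
    # Assumption: only the observed scores (plus "predict nothing")
    # need to be tried as thresholds for each class.
    candidates = []
    for k in range(n_classes):
        values = sorted({row[k] for row in scores})
        candidates.append(values + [float("inf")])
    thresholds = [c[0] for c in candidates]
    best = micro_f1(scores, labels, thresholds)
    for _ in range(max_sweeps):
        improved = False
        for k in range(n_classes):
            for t in candidates[k]:
                trial = thresholds[:k] + [t] + thresholds[k + 1:]
                value = micro_f1(scores, labels, trial)
                if value > best:
                    best, thresholds, improved = value, trial, True
        if not improved:
            break
    return thresholds, best
```

Because the micro-averaged measure pools true and false positives across classes, changing one class's threshold changes the value of every other threshold choice, which is why the sketch re-evaluates the full measure for each trial rather than tuning classes independently.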

Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

The Journal of Machine Learning Research, 2006

Exploiting depth information for indoor-outdoor scene classification

Proceedings of the 16th International Conference on Image Analysis and Processing, Volume Part II, 2011

A Two-Stage Classifier with Reject Option

In this paper, we investigate the usefulness of the reject option in text categorisation systems. The reject option is introduced by allowing a text classifier to withhold the decision of assigning or not a document to any subset of categories, for which the decision is considered not sufficiently reliable. To automatically handle rejections, a two-stage classifier architecture is used, in which documents rejected at the first stage are automatically classified at the second stage, so that no rejections eventually remain. The performance improvement achievable by using the reject option is assessed on a real text categorisation task, using the well-known Reuters data set.

Classification with reject option in text categorisation systems

Proceedings of the 12th International Conference on Image Analysis and Processing, 2003

Learning of Multilabel Classifiers

A Classification Approach with a Reject Option for Multi-label Problems

Lecture Notes in Computer Science, 2011

Exploiting Depth Information for Indoor-Outdoor Scene Classification

Lecture Notes in Computer Science, 2011

A Two-Stage Classifier with Reject Option for Text Categorisation

Lecture Notes in Computer Science, 2004

Image Spam Filtering Using Visual Information

by Battista Biggio and Ignazio Pillai

14th International Conference on Image Analysis and Processing (ICIAP 2007), 2007

Classifier Selection Approaches for Multi-label Problems

Lecture Notes in Computer Science, 2011

While it is known that multiple classifier systems can also be effective in multi-label problems, only the classifier fusion approach has been considered so far. In this paper we focus on the classifier selection approach instead. We propose an implementation of this approach specific to multi-label classifiers, based on selecting the outputs of a possibly different subset of multi-label classifiers for each class. We then derive static selection criteria for the macro- and micro-averaged F measure, which is widely used in multi-label problems. Preliminary experimental results show that the considered selection strategy can exploit the complementarity of an ensemble of multi-label classifiers more effectively than selection approaches analogous to the ones used in single-label problems, which select the outputs of the same classifier subset for all classes. Our results also show that the derived selection criteria can provide a better trade-off between the macro- and micro-averaged F measure, although it is known that an increase in either of them is usually attained at the expense of the other.

Sample Size Issues in the Choice between the Best Classifier and Fusion by Trainable Combiners

by Ignazio Pillai and Aistis Raudys

Lecture Notes in Computer Science, 2014

Is data clustering in adversarial settings secure?

by Battista Biggio and Ignazio Pillai

Proceedings of the 2013 ACM workshop on Artificial intelligence and security - AISec '13, 2013

Clustering algorithms have been increasingly adopted in security applications to spot dangerous or illicit activities. However, they were not originally devised to deal with deliberate attack attempts that may aim to subvert the clustering process itself. Whether clustering can be safely adopted in such settings thus remains questionable. In this work we propose a general framework that allows one to identify potential attacks against clustering algorithms, and to evaluate their impact, by making specific assumptions on the adversary's goal, knowledge of the attacked system, and capabilities of manipulating the input data. We show that an attacker may significantly poison the whole clustering process by adding a relatively small percentage of attack samples to the input data, and that some attack samples may be obfuscated to be hidden within some existing clusters. We present a case study on single-linkage hierarchical clustering, and report experiments on clustering of malware samples and handwritten digits.

Poisoning Complete-Linkage Hierarchical Clustering

by Battista Biggio, Ignazio Pillai, and Eyasu Mequanint

Learning of Multilabel Classifiers

2014 22nd International Conference on Pattern Recognition, 2014

A survey and experimental evaluation of image spam filtering techniques

by Battista Biggio and Ignazio Pillai

Pattern Recognition Letters, 2011

Multi-label classification with a reject option

Pattern Recognition, 2013

We consider multi-label classification problems in application scenarios where classifier accuracy is not satisfactory, but manual annotation is too costly. In single-label problems, a well-known solution consists of using a reject option, i.e., allowing a classifier to withhold unreliable decisions, leaving them (and only them) to human operators. We argue that this solution can be exploited also in multi-label problems. However, the current theoretical framework for classification with a reject option applies only to single-label problems. We thus develop a specific framework for multi-label ones. In particular, we extend multi-label accuracy measures to take into account rejections, and define manual annotation cost as a cost function. We then formalise the goal of attaining a desired trade-off between classifier accuracy on non-rejected decisions, and the cost of manually handling rejected decisions, as a constrained optimisation problem. We finally develop two possible implementations of our framework, tailored to the widely used F accuracy measure, and to the only cost models proposed so far for multi-label annotation tasks, and experimentally evaluate them on five application domains.
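The trade-off described in this abstract can be illustrated with a small Python sketch. It is not the framework from the paper: it assumes a single pair of thresholds (`low`, `high`) shared by all classes, treats any score strictly between them as a rejected decision, and uses the fraction of rejected decisions as a stand-in for the manual annotation cost. The function names and this cost proxy are hypothetical.

```python
def decide_with_reject(score, low, high):
    """Return 1 (assign label), 0 (do not assign), or None (reject)."""
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None  # unreliable decision, left to a human annotator


def evaluate_with_reject(scores, labels, low, high):
    """Micro-averaged F1 on non-rejected decisions, plus the reject rate.

    scores[i][k]: score of sample i for class k.
    labels[i][k]: 1 if sample i truly belongs to class k, else 0.
    The reject rate is only an assumed proxy for annotation cost.
    """
    tp = fp = fn = rejected = total = 0
    for row_scores, row_labels in zip(scores, labels):
        for s, y in zip(row_scores, row_labels):
            total += 1
            decision = decide_with_reject(s, low, high)
            if decision is None:
                rejected += 1
            elif decision == 1 and y:
                tp += 1
            elif decision == 1:
                fp += 1
            elif y:
                fn += 1
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    reject_rate = rejected / total if total else 0.0
    return f1, reject_rate
```

Widening the gap between `low` and `high` rejects more decisions, which in this sketch tends to raise the F measure on the remaining decisions while raising the assumed manual cost. Choosing the thresholds to reach a target accuracy at minimum cost is the kind of constrained problem the abstract refers to.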