IRSTEA | TETIS - Academia.edu

- by Dino Ienco
  Social Network, IEEE Conference

- by Dino Ienco
  Search Space, Web Pages, Database System

- by Dino Ienco
  Machine Learning, Data Mining, Feature Selection, Classification Accuracy

- by Dino Ienco
  Case Study, Formal language, Antimicrobial Peptide, Protein Function

The beginning of the post-genomic era is characterized by a rising number of publicly collected genomes. The evolutionary relationships among these genomes may be captured by comparative analysis of their sequences, in order to identify...

- by Dino Ienco

In many domains (e.g., data mining, data management, data warehousing), a hierarchical organization of attribute values can help the data analysis process. Nevertheless, such hierarchical knowledge is not always available or may even be...

- by Dino Ienco

Microblogging is a modern communication paradigm in which users post bits of information, or “memes” as we call them, that are brief text updates or micromedia such as photos, video or audio clips. Once a user posts a meme, it become...

- by Dino Ienco

- by Dino Ienco
  Gene ontology, Distance metric, Very high throughput, Gene Expression Data

- by Dino Ienco

Data represented with multiple features coming from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of a certain type, connected to other types of data, the...

Clustering data is challenging for two main reasons. First, the dimensionality of the data is often very high, which makes cluster interpretation hard; moreover, with high-dimensional data the classic metrics fail to identify the real similarities between objects. Second, the observed phenomena evolve, so the datasets accumulate over time. In this paper we propose solutions to both problems. To tackle high dimensionality, we apply a co-clustering approach to the dataset that stores the occurrences of features in the observed objects. Co-clustering computes a partition of objects and a partition of features simultaneously. The novelty of our co-clustering solution is that it arranges the clusters hierarchically, in two hierarchies: one on the objects and one on the features. The two hierarchies are coupled: the clusters at a given level of one hierarchy are paired with the clusters at the same level of the other hierarchy, and together they form the co-clusters. Each cluster of one hierarchy thus provides insight into the clusters of the other. Another novelty of the proposed solution is that the number of clusters is possibly unlimited. Nevertheless, the produced hierarchies are still compact, and therefore more readable, because our method allows multiple splits of a cluster at the lower level.

As regards the second challenge, the accumulating nature of the data makes the datasets intractably large over time. An incremental solution relieves the issue because it partitions the problem. In this paper we introduce an incremental version of our hierarchical co-clustering algorithm. It starts from an intermediate solution computed on the previous version of the data and updates the co-clustering results considering only the added block of data. This speeds up the computation with respect to the original approach, which would recompute the result on the whole dataset. In addition, the incremental algorithm guarantees approximately the same answer as the original version while saving much of the computational load. We validate the incremental approach on several high-dimensional datasets and compare it carefully with both the original version of our algorithm and state-of-the-art competitors. The obtained results open the way to a novel use of co-clustering algorithms in which it is advantageous to partition the data into several blocks and process them incrementally, thus “incorporating” data gradually into an ongoing co-clustering solution.
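
The incremental idea in the abstract, updating an existing co-clustering with only the newly added block of data, can be illustrated with a small sketch. This is not the authors' algorithm: it is a deliberately simplified, flat (non-hierarchical) illustration in which the feature partition is held fixed and each new object is folded into the closest existing object cluster. The function names (`cluster_profile`, `assign_block`) and the squared-Euclidean comparison of normalized profiles are assumptions made for this example only.

```python
def cluster_profile(row, col_labels, n_col_clusters):
    """Sum a row's feature occurrences inside each feature cluster."""
    profile = [0] * n_col_clusters
    for j, value in enumerate(row):
        profile[col_labels[j]] += value
    return profile


def normalize(vector):
    """Scale a vector of counts so that it sums to 1 (all zeros if empty)."""
    total = sum(vector)
    return [x / total for x in vector] if total else [0.0] * len(vector)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_block(new_rows, row_cluster_totals, col_labels):
    """Fold a new block of objects into an existing co-clustering.

    row_cluster_totals[k][c] holds the total occurrences of feature
    cluster c among the objects already in object cluster k. The table is
    updated in place, so earlier data never has to be revisited.
    """
    n_col_clusters = len(row_cluster_totals[0])
    labels = []
    for row in new_rows:
        profile = cluster_profile(row, col_labels, n_col_clusters)
        target = normalize(profile)
        # Pick the object cluster whose normalized profile is closest.
        best = min(
            range(len(row_cluster_totals)),
            key=lambda k: sq_dist(target, normalize(row_cluster_totals[k])),
        )
        labels.append(best)
        for c in range(n_col_clusters):
            row_cluster_totals[best][c] += profile[c]
    return labels


if __name__ == "__main__":
    # Hypothetical state left by a previous run: 4 features in 2 feature
    # clusters, and 2 object clusters summarized by their count totals.
    col_labels = [0, 0, 1, 1]
    totals = [[10, 1], [2, 12]]
    new_block = [[3, 2, 0, 0], [0, 1, 4, 3]]
    print(assign_block(new_block, totals, col_labels))
    print(totals)
```

The point of the sketch is only the cost structure: the work is proportional to the size of the new block, not to the whole accumulated dataset. A faithful implementation would also revise the feature partition and maintain the two coupled hierarchies described above.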

- by Dino Ienco
