No abstract available.
Analyzing massive data streams: past, present, and future
Continuous data streams arise naturally, for example, in the installations of large telecom and Internet service providers where detailed usage information (Call-Detail-Records, SNMP-/RMON packet-flow data, etc.) from different parts of the underlying ...
A symbolic representation of time series, with implications for streaming algorithms
The parallel explosions of interest in streaming data and data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox ...
Clustering binary data streams with K-means
Clustering data streams is an interesting Data Mining problem. This article presents three variants of the K-means algorithm to cluster binary data streams. The variants include On-line K-means, Scalable K-means, and Incremental K-means, a proposed ...
Processing frequent itemset discovery queries by division and set containment join operators
SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionalities to ...
Efficient OLAP operations for spatial data using peano trees
Online Analytical Processing (OLAP) is an important application of data warehouses. With more and more spatial data being collected, such as remotely sensed images, geographical information, digital sky survey data, efficient OLAP for spatial data is in ...
Clustering gene expression data in SQL using locally adaptive metrics
The clustering problem concerns the discovery of homogeneous groups of data according to a certain similarity measure. Clustering suffers from the curse of dimensionality. It is not meaningful to look for clusters in high dimensional spaces as the ...
Graph-based ranking algorithms for e-mail expertise analysis
In this paper we study graph-based ranking measures for the purpose of using them to rank email correspondents according to their degree of expertise on subjects of interest. While this complete expertise analysis consists of several steps, in this ...
Deriving link-context from HTML tag tree
HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link or the link-context is used for a variety of tasks associated with Web information retrieval. These tasks can benefit by ...
Clustering of streaming time series is meaningless
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as ...
A learning-based approach to estimate statistics of operators in continuous queries: a case study
Statistic estimation such as output size estimation of operators is a well-studied subject in the database research community, mainly for the purpose of query optimization. The assumption, however, is that queries are ad-hoc and therefore the emphasis ...
Using transposition for pattern discovery from microarray data
We analyze expression matrices to identify a priori interesting sets of genes, e.g., genes that are frequently co-regulated. Such matrices provide expression values for given biological situations (the lines) and given genes (columns). The frequent ...
Weave amino acid sequences for protein secondary structure prediction
Given a known protein sequence, predicting its secondary structure can help understand its three-dimensional (tertiary) structure, i.e., the folding. In this paper, we present an approach for predicting protein secondary structures. Different from the ...
Assuring privacy when big brother is watching
Homeland security measures are increasing the amount of data collected, processed and mined. At the same time, owners of the data raised legitimate concern about their privacy and potential abuses of the data. Privacy-preserving data mining techniques ...
Dynamic inference control
An inference problem exists in a multilevel database if knowledge of some objects in the database allows information with a higher security level to be inferred. Many such inferences may be prevented prior to any query processing by raising the security ...
Cited By
- Wei Y, Li Z, Zhu J, Shen Y, Zhang H, Shen L and Zhong G (2023). A driving style recognition method based on SAX and bitmap, 2023 3rd International Conference on Computer Vision and Pattern Analysis (ICCPA 2023), 10.1117/12.2684246, 9781510667563, (58)
- Dhayanithi J and Akilandeswari J A Framework for Mining Heterogeneous Dataset, SSRN Electronic Journal, 10.2139/ssrn.3134291
- Scargle J, Norris J, Jackson B and Chiang J (2013). STUDIES IN ASTRONOMICAL TIME SERIES ANALYSIS. VI. BAYESIAN BLOCK REPRESENTATIONS, The Astrophysical Journal, 10.1088/0004-637X/764/2/167, 764:2, (167)