Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

June 2003

2003 Proceeding

Conference Chairs:
Mohammed J. Zaki, Rensselaer Polytechnic Institute, Troy, New York
Charu C. Aggarwal, IBM T.J. Watson Research Center, Yorktown Heights, New York

Publisher:

Association for Computing Machinery, New York, NY, United States

Conference:

DMKD03: 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (held in conjunction with the SIGMOD/PODS 2003 conference; co-located with the FCRC 2003 conference), San Diego, California, 13 June 2003

ISBN:

978-1-4503-7422-4

Published:

13 June 2003

Sponsors:

SIGMOD




SESSION: Invited talk

Article

Analyzing massive data streams: past, present, and future

Minos Garofalakis

Page 1
https://doi.org/10.1145/882082.882084

Continuous data streams arise naturally, for example, in the installations of large telecom and Internet service providers where detailed usage information (Call-Detail-Records, SNMP-/RMON packet-flow data, etc.) from different parts of the underlying ...


SESSION: Data streams I

section

Session details: Data streams I

Mohammed Zaki

https://doi.org/10.1145/3244067


Article

A symbolic representation of time series, with implications for streaming algorithms

Jessica Lin,
Eamonn Keogh,
Stefano Lonardi,
Bill Chiu

Pages 2–11
https://doi.org/10.1145/882082.882086

The parallel explosions of interest in streaming data, and data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox ...


Article

Clustering binary data streams with K-means

Carlos Ordonez

Pages 12–19
https://doi.org/10.1145/882082.882087

Clustering data streams is an interesting Data Mining problem. This article presents three variants of the K-means algorithm to cluster binary data streams. The variants include On-line K-means, Scalable K-means, and Incremental K-means, a proposed ...


SESSION: DB integration

section

Session details: DB integration

Ankur Teredesai

https://doi.org/10.1145/3244068


Article

Processing frequent itemset discovery queries by division and set containment join operators

Ralf Rantzau

Pages 20–27
https://doi.org/10.1145/882082.882089

SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionalities to ...


Article

Efficient OLAP operations for spatial data using peano trees

Baoying Wang,
Fei Pan,
Dongmei Ren,
Yue Cui,
Qiang Ding,
William Perrizo

Pages 28–34
https://doi.org/10.1145/882082.882090

Online Analytical Processing (OLAP) is an important application of data warehouses. With more and more spatial data being collected, such as remotely sensed images, geographical information, digital sky survey data, efficient OLAP for spatial data is in ...


Article

Clustering gene expression data in SQL using locally adaptive metrics

Dimitris Papadopoulos,
Carlotta Domeniconi,
Dimitrios Gunopulos,
Sheng Ma

Pages 35–41
https://doi.org/10.1145/882082.882091

The clustering problem concerns the discovery of homogeneous groups of data according to a certain similarity measure. Clustering suffers from the curse of dimensionality. It is not meaningful to look for clusters in high dimensional spaces as the ...


SESSION: WWW mining

section

Session details: WWW mining

Jean-Francois Boulicaut

https://doi.org/10.1145/3244069


Article

Graph-based ranking algorithms for e-mail expertise analysis

Byron Dom,
Iris Eiron,
Alex Cozzi,
Yi Zhang

Pages 42–48
https://doi.org/10.1145/882082.882093

In this paper we study graph-based ranking measures for the purpose of using them to rank email correspondents according to their degree of expertise on subjects of interest. While this complete expertise analysis consists of several steps, in this ...


Article

Deriving link-context from HTML tag tree

Gautam Pant

Pages 49–55
https://doi.org/10.1145/882082.882094

HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link or the link-context is used for a variety of tasks associated with Web information retrieval. These tasks can benefit by ...


SESSION: Data streams II

section

Session details: Data streams II

Jean-Francois Boulicaut

https://doi.org/10.1145/3244070


Article

Clustering of streaming time series is meaningless

Jessica Lin,
Eamonn Keogh,
Wagner Truppel

Pages 56–65
https://doi.org/10.1145/882082.882096

Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as ...


Article

A learning-based approach to estimate statistics of operators in continuous queries: a case study

Like Gao,
Min Wang,
X. Sean Wang,
Sriram Padmanabhan

Pages 66–72
https://doi.org/10.1145/882082.882097

Statistic estimation such as output size estimation of operators is a well-studied subject in the database research community, mainly for the purpose of query optimization. The assumption, however, is that queries are ad-hoc and therefore the emphasis ...


SESSION: Bioinformatics

section

Session details: Bioinformatics

William Maniatty

https://doi.org/10.1145/3244071


Article

Using transposition for pattern discovery from microarray data

François Rioult,
Jean-François Boulicaut,
Bruno Crémilleux,
Jérémy Besson

Pages 73–79
https://doi.org/10.1145/882082.882099

We analyze expression matrices to identify a priori interesting sets of genes, e.g., genes that are frequently co-regulated. Such matrices provide expression values for given biological situations (the lines) and given genes (columns). The frequent ...


Article

Weave amino acid sequences for protein secondary structure prediction

Xiaochun Yang,
Bin Wang

Pages 80–87
https://doi.org/10.1145/882082.882100

Given a known protein sequence, predicting its secondary structure can help understand its three-dimensional (tertiary) structure, i.e., the folding. In this paper, we present an approach for predicting protein secondary structures. Different from the ...


SESSION: Privacy & security

section

Session details: Privacy & security

William Maniatty

https://doi.org/10.1145/3244072


Article

Assuring privacy when big brother is watching

Murat Kantarcıoğlu,
Chris Clifton

Pages 88–93
https://doi.org/10.1145/882082.882102

Homeland security measures are increasing the amount of data collected, processed and mined. At the same time, owners of the data raised legitimate concern about their privacy and potential abuses of the data. Privacy-preserving data mining techniques ...


Article

Dynamic inference control

Jessica Staddon

Pages 94–100
https://doi.org/10.1145/882082.882103

An inference problem exists in a multilevel database if knowledge of some objects in the database allows information with a higher security level to be inferred. Many such inferences may be prevented prior to any query processing by raising the security ...




