SIGKDD: Vol 6, No 2

Volume 6, Issue 2

December 2004

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 1931-0145

EISSN: 1931-0153

Tags:

classification
KDD Cup
data mining
clustering
MITCH

Bibliometrics (reflects downloads up to 28 Jan 2025)

Citation Count

668

Downloads (6 weeks)

Downloads (12 months)

225

Downloads (cumulative)

19,111

article

Editorial: special issue on web content mining

Bing Liu,
Kevin Chen-Chuan Chang

Pages 1–4, https://doi.org/10.1145/1046456.1046457

With the phenomenal growth of the Web, there is an ever-increasing volume of data and information published in numerous Web pages. The research in Web mining aims to develop new techniques to effectively extract and mine useful knowledge or information ...

Metrics
- Total Citations: 63
- Total Downloads: 1,689
- Last 12 Months: 18
- Last 6 weeks: 1

article

Extracting relational data from HTML repositories

Ruth Yuee Zhang,
Laks V. S. Lakshmanan,
Ruben H. Zamar

Pages 5–13, https://doi.org/10.1145/1046456.1046458

There is a vast amount of valuable information in HTML documents, widely distributed across the World Wide Web and across corporate intranets. Unfortunately, HTML is mainly presentation oriented and hard to query. In this paper, we develop a system to ...

Metrics
- Total Citations: 7
- Total Downloads: 571
- Last 12 Months: 0
- Last 6 weeks: 0

article

Learning important models for web page blocks based on layout and content analysis

Ruihua Song,
Haifeng Liu,
Ji-Rong Wen,
Wei-Ying Ma

Pages 14–23, https://doi.org/10.1145/1046456.1046459

Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. It has also been proven that differentiating noisy and unimportant blocks from pages can ...

Metrics
- Total Citations: 32
- Total Downloads: 1,356
- Last 12 Months: 9
- Last 6 weeks: 1

article

Learning by googling

Philipp Cimiano,
Steffen Staab

Pages 24–33, https://doi.org/10.1145/1046456.1046460

The goal of giving a well-defined meaning to information is currently shared by endeavors such as the Semantic Web as well as by current trends within Knowledge Management. They all depend on the large-scale formalization of knowledge and on the ...

Metrics
- Total Citations: 113
- Total Downloads: 1,414
- Last 12 Months: 5
- Last 6 weeks: 1

article

Correlating summarization of multi-source news with k-way graph bi-clustering

Ya Zhang,
Chao-Hsien Chu,
Xiang Ji,
Hongyuan Zha

Pages 34–42, https://doi.org/10.1145/1046456.1046461

With the emergence of an enormous amount of online news, it is desirable to construct text mining methods that can extract, compare and highlight similarities among news items. In this paper, we explore the research issue and methodology of correlated summarization ...

Metrics
- Total Citations: 16
- Total Downloads: 539
- Last 12 Months: 1
- Last 6 weeks: 0

article

Information diffusion through blogspace

D. Gruhl,
David Liben-Nowell,
R. Guha,
A. Tomkins

Pages 43–52, https://doi.org/10.1145/1046456.1046462

We study the dynamics of information propagation in environments of low-overhead personal publishing, using a large collection of WebLogs over time as our example domain. We characterize and model this collection at two levels. First, we present a ...

Metrics
- Total Citations: 123
- Total Downloads: 1,262
- Last 12 Months: 9
- Last 6 weeks: 0

article

Mining structures for semantics

Xin Dong,
Jayant Madhavan,
Alon Halevy

Pages 53–60, https://doi.org/10.1145/1046456.1046463

Online data is available in two flavors: unstructured data that resides as free text in HTML pages, and structured data that resides in databases and knowledge bases. Unstructured data is easily accessed as human-readable text on a browser, while ...

Metrics
- Total Citations: 19
- Total Downloads: 651
- Last 12 Months: 4
- Last 6 weeks: 0

article

Learning to extract information from large domain-specific websites using sequential models

Sunita Sarawagi,
V. G. Vinod Vydiswaran

Pages 61–66, https://doi.org/10.1145/1046456.1046464

In this article we describe a novel information extraction task on the web and show how it can be solved effectively using the emerging conditional exponential models. The task involves learning to find specific goal pages on large domain-specific ...

Metrics
- Total Citations: 2
- Total Downloads: 396
- Last 12 Months: 1
- Last 6 weeks: 0

article

Mining semantics for large scale integration on the web: evidences, insights, and challenges

Kevin Chen-Chuan Chang,
Bin He,
Zhen Zhang

Pages 67–76, https://doi.org/10.1145/1046456.1046465

The Web has been rapidly "deepened" -- with myriad searchable databases online, where data are hidden behind query interfaces. Toward large scale integration over this "deep Web," we are facing a new challenge: With its dynamic and ad-hoc nature, such ...

Metrics
- Total Citations: 11
- Total Downloads: 707
- Last 12 Months: 0
- Last 6 weeks: 0

COLUMN: Contributed articles

COLUMN: Reports
