
Knowledge Discovery from Multi-Sourced Data

  • Book
  • © 2022

Overview

  • Provides various techniques to discover useful knowledge based on different data models of multi-sourced data
  • Covers both truth discovery and fact discovery based on different data quality properties in detail
  • Presents optimization methods for developers solving knowledge discovery problems

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)


About this book

This book addresses several knowledge discovery problems on multi-sourced data, drawing together theories, techniques, and methods from data cleaning, data mining, and natural language processing. It focuses on three data models: multi-sourced isomorphic data, multi-sourced heterogeneous data, and text data. On the basis of these three models, the book studies knowledge discovery problems, including truth discovery and fact discovery on multi-sourced data, from the perspective of four important properties: relevance, inconsistency, sparseness, and heterogeneity. It is useful for specialists as well as graduate students.
 
Data describing the same object or event can come from a variety of sources, such as crowd workers and social media users, and noisy pieces of data or information are unavoidable. Given the daunting scale of the data, it is unrealistic to expect humans to label the data or to tell which data source is more reliable. Hence, it is crucial to identify trustworthy information from multiple noisy information sources, a task referred to as knowledge discovery.
 
At present, knowledge discovery research for multi-sourced data faces two main challenges. On the structural level, it is essential to account for the different characteristics of data composition and application scenarios and to define the knowledge discovery problem for each setting. On the algorithmic level, the knowledge discovery task must handle different degrees of information conflict and requires efficient algorithms that exploit multiple clues to mine more valuable information. Existing knowledge discovery methods fall short on both levels, so the knowledge discovery problem remains far from fully solved.
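To make the truth discovery task concrete, below is a minimal, illustrative sketch of a common iterative weighted-voting scheme: estimated truths and source reliability weights are updated alternately until they stabilize. This is a generic illustration under simplified assumptions, not the algorithms presented in the book; the source names and toy claims are hypothetical.

# Illustrative truth discovery sketch: iterative weighted voting over
# conflicting claims from multiple sources. A generic scheme for
# illustration only, not the book's specific methods.
from collections import defaultdict
import math

# Toy input (hypothetical): claims[source][object] = claimed value
claims = {
    "s1": {"city_of_obj1": "Paris", "city_of_obj2": "Rome"},
    "s2": {"city_of_obj1": "Paris", "city_of_obj2": "Milan"},
    "s3": {"city_of_obj1": "Lyon",  "city_of_obj2": "Rome"},
}

weights = {s: 1.0 for s in claims}           # start with equal trust in every source
truths = {}

for _ in range(20):                          # alternate the two steps until roughly stable
    # 1) Estimate truths: for each object, pick the value with the highest total source weight.
    votes = defaultdict(lambda: defaultdict(float))
    for s, s_claims in claims.items():
        for obj, val in s_claims.items():
            votes[obj][val] += weights[s]
    truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}

    # 2) Re-estimate source weights from agreement with the current truth estimates.
    for s, s_claims in claims.items():
        correct = sum(truths[obj] == val for obj, val in s_claims.items())
        accuracy = (correct + 1) / (len(s_claims) + 2)   # smoothed accuracy in (0, 1)
        weights[s] = -math.log(1 - accuracy)             # more accurate sources get heavier weight

print(truths)    # e.g. {'city_of_obj1': 'Paris', 'city_of_obj2': 'Rome'}
print(weights)   # sources agreeing with the estimated truths end up with larger weights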



Table of contents (5 chapters)

Authors and Affiliations

  • Computer and Software Department, Hangzhou Dianzi University, Hangzhou, China

    Chen Ye

  • Computer Science and Technology, Harbin Institute of Technology, Harbin, China

    Hongzhi Wang

  • Computer and Software Department, Hangzhou Dianzi University, Hangzhou, China

    Guojun Dai

About the authors

Chen Ye is currently an Associate Researcher at the School of Computer Science and Technology, Hangzhou Dianzi University, China. She received her Ph.D. in Computer Software and Theory from Harbin Institute of Technology, China. Her current research interests include data repairing, truth discovery, and crowdsourcing. She won the ACM SIGMOD China Doctoral Dissertation Award in 2020.




Hongzhi Wang is a Professor and Doctoral Supervisor at the School of Computer Science and Technology, Harbin Institute of Technology, China. His research interests include big data management and analysis, data quality, graph data management, and web data management. He has published more than 150 papers and is the principal investigator of more than 10 projects, including three NSFC projects, as well as co-PI of 973, 863, and NSFC key projects. He has received a Microsoft fellowship and an IBM Ph.D. fellowship and was named a China Excellent Database Engineer.




Guojun Dai works at the School of Computer Science and Technology of Hangzhou Dianzi University, where he heads the National Brain-Computer Collaborative Intelligent Technology International Joint Research Center and directs the Institute of Computer Application Technology. His research interests include the Internet of Things, industrial big data, network collaborative manufacturing, edge computing, brain-computer interfaces, cognitive computing, and artificial intelligence. He has published over 50 research papers in top-quality international conferences and journals, including INFOCOM, IEEE Transactions on Industrial Informatics, and IEEE Transactions on Mobile Computing.

