Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Simulated Annealing

Liu, Haishan; Dou, Dejing

doi:10.1007/978-3-642-25106-1_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7045)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

In this paper, we present a data mining approach to challenges in the matching and integration of heterogeneous datasets. In particular, we propose solutions to two problems that arise in combining information from different results of scientific research. The first problem, attribute matching, involves discovery of correspondences among distinct numeric-typed summary features (“attributes”) that are used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves discovery of matchings between patterns across datasets. We treat both of these problems together as a multi-objective optimization problem. A multi-objective simulated annealing algorithm is described to find the optimal solution. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets that are designed to simulate heterogeneous data from different sources.
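
The abstract does not give the objective functions or the annealing schedule, so the following is only a minimal sketch of how attribute matching and cluster matching might be searched jointly with a multi-objective simulated annealing loop. Everything in it is an assumption made for illustration: the two objectives (disagreement of per-cluster attribute means and of per-cluster attribute spreads), the swap-based neighborhood, the acceptance rule, the geometric cooling, and all function and parameter names are hypothetical and are not taken from the paper.

```python
import math
import random


def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def costs(state, means_a, means_b, spreads_a, spreads_b):
    """Two hypothetical objectives for a joint (attribute, cluster) matching.

    state = (attr_map, clus_map): attribute j of dataset A is matched to
    attribute attr_map[j] of dataset B, and cluster i to cluster clus_map[i].
    """
    attr_map, clus_map = state
    f_mean = f_spread = 0.0
    for i, ci in enumerate(clus_map):
        for j, aj in enumerate(attr_map):
            f_mean += abs(means_a[i][j] - means_b[ci][aj])
            f_spread += abs(spreads_a[i][j] - spreads_b[ci][aj])
    return (f_mean, f_spread)


def neighbor(state, rng):
    """Swap two entries in either the attribute map or the cluster map."""
    attr_map, clus_map = list(state[0]), list(state[1])
    target = attr_map if rng.random() < 0.5 else clus_map
    if len(target) >= 2:
        i, j = rng.sample(range(len(target)), 2)
        target[i], target[j] = target[j], target[i]
    return (tuple(attr_map), tuple(clus_map))


def mosa(means_a, means_b, spreads_a, spreads_b,
         steps=5000, t0=1.0, alpha=0.999, seed=0):
    """Return an archive {state: cost vector} of mutually non-dominated matchings."""
    rng = random.Random(seed)
    n_clusters, n_attrs = len(means_a), len(means_a[0])
    current = (tuple(range(n_attrs)), tuple(range(n_clusters)))
    cur_cost = costs(current, means_a, means_b, spreads_a, spreads_b)
    archive = {current: cur_cost}
    temp = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_cost = costs(cand, means_a, means_b, spreads_a, spreads_b)
        # Assumed acceptance rule: always accept a move that worsens no
        # objective; otherwise accept with a temperature-dependent probability.
        worsening = sum(max(0.0, c - k) for c, k in zip(cand_cost, cur_cost))
        if worsening == 0.0 or rng.random() < math.exp(-worsening / temp):
            current, cur_cost = cand, cand_cost
            if not any(dominates(c, cand_cost) for c in archive.values()):
                # Drop archived solutions the candidate dominates, then add it.
                archive = {s: c for s, c in archive.items()
                           if not dominates(cand_cost, c)}
                archive[cand] = cand_cost
        temp *= alpha  # geometric cooling (assumed)
    return archive


if __name__ == "__main__":
    # Toy data: dataset B is dataset A with attributes and clusters shuffled.
    means_a = [[1.0, 5.0, 9.0], [2.0, 6.0, 3.0]]
    spreads_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    means_b = [[3.0, 2.0, 6.0], [9.0, 1.0, 5.0]]
    spreads_b = [[0.6, 0.4, 0.5], [0.3, 0.1, 0.2]]
    for state, cost in mosa(means_a, means_b, spreads_a, spreads_b).items():
        print(state, cost)
```

Returning an archive of non-dominated matchings, rather than a single best one, reflects the multi-objective framing: when the objectives conflict there is no single optimum, and the choice among trade-offs is left to the user.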

Author information

Authors and Affiliations

Computer and Information Science Department, University of Oregon, Eugene, USA
Haishan Liu & Dejing Dou

Editor information

Editors and Affiliations

STAR Lab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussel, Belgium
Robert Meersman
DEBII, Curtin University of Technology, Technology Park, De Laeter Way, 6102, Bentley, WA, Australia
Tharam Dillon
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660, Boadilla del Monte, Madrid, Spain
Pilar Herrero
Smeal College of Business, Pennsylvania State University, University Park, 16802, P.O. Box, PA, U.S.A.
Akhil Kumar
Institute of Databases and Information Systems, Ulm University, Germany
Manfred Reichert
City University of Hong Kong, Hong Kong
Li Qing
National University of Singapore (NUS), Singapore
Beng-Chin Ooi
Dipartemento Tecnologie dell’Informazione, Universitá degli Studi di Milano, Via Bramante 65, 26013, Crema, Italy
Ernesto Damiani
Vanderbilt University, VU, Station B #1829, 2015 Terrace Place, 37203, Nashville, TN, USA
Douglas C. Schmidt
Virginia Tech, 24060, Blacksburg, VA
Jules White
Digital Enterprise Research Institute (DERI), National University of Ireland, IDA Business Park, Lower Dangan, Galway, Ireland
Manfred Hauswirth
Kno.e.sis Center, Wright State University, Dayton, Ohio
Pascal Hitzler
IBM India Research Lab, 4, Block C, Institutional Area, 110 070, Vasant Kunj, New Delhi, India
Mukesh Mohania

About this paper

Cite this paper

Liu, H., Dou, D. (2011). Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Simulated Annealing. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2011. OTM 2011. Lecture Notes in Computer Science, vol 7045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25106-1_21

DOI: https://doi.org/10.1007/978-3-642-25106-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25105-4
Online ISBN: 978-3-642-25106-1
eBook Packages: Computer Science, Computer Science (R0)
