Abstract
With increasing interest in querying and analyzing graph data from multiple sources, algorithms and tools for integrating different graphs have become increasingly important. Graph integration can take place at both the schema level and the instance level. While links among graph nodes pose additional challenges to graph information integration, they can also serve as useful features for matching nodes that represent the same real-world entities. This chapter introduces a general framework for graph information integration and then gives an overview of state-of-the-art research and tools in the area.
Notes
- 4. MIPT is the acronym of the Memorial Institute for the Prevention of Terrorism.
Acknowledgements
We would like to acknowledge the support by A*STAR Public Sector R&D, Singapore, Project Number 062 101 0031 in the SSNet Project. We also thank Maureen and Nelman Lubis Ibrahim for implementing the SSnetViz system.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Lim, EP., Sun, A., Datta, A., Chang, K. (2010). Information Integration for Graph Databases. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_10
DOI: https://doi.org/10.1007/978-1-4419-6515-8_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6514-1
Online ISBN: 978-1-4419-6515-8
eBook Packages: Biomedical and Life Sciences (R0)