Abstract
Data fusion is the process of merging records that come from multiple sources and represent the same real-world object into a single representation. This literature review concerns data fusion in the context of data integration, i.e., the integration of structured and semi-structured data from the same domain, and provides an overview of this field of research. We present why data fusion is becoming increasingly necessary and what it is used for (What for?), what methods and solutions for data fusion have been proposed in the literature (In what form?), and what research challenges remain open in the data fusion area and which directions future research could usefully take (What is next?).
Notes
Neo4j - https://neo4j.com
Acknowledgements
This research was partially funded by INES 2.0, FACEPE grants APQ-0399-1.03/17 and APQ-0399-1.03/17, CAPES grant 88887.136410/2017-00, and CNPq grant 465614/2014-0.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Canalle, G.K., Salgado, A.C. & Loscio, B.F. A survey on data fusion: what for? in what form? what is next?. J Intell Inf Syst 57, 25–50 (2021). https://doi.org/10.1007/s10844-020-00627-4
DOI: https://doi.org/10.1007/s10844-020-00627-4