MERA: A Musical Entities Reconciliation Architecture Based on Semantic Technologies

Published: 01 October 2017

Abstract

In this paper, the authors describe the Musical Entities Reconciliation Architecture (MERA), an architecture designed to link music-related databases by adapting its reconciliation techniques to each particular case. MERA includes mechanisms for managing third-party sources to improve results, and it uses semantic technologies, storing and organizing information in RDF graphs. The authors implemented a prototype of their approach and used it to link sources with different levels of data quality. Under the conditions of their experiments, the prototype was effective in more than 94% of cases. The authors also compared their prototype with a well-known music-specialized search engine, and the prototype outperformed the engine's search results in both experiments they performed.
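
To make the idea of reconciling entities across sources concrete, the following is a minimal sketch and not MERA's actual method, which the abstract does not specify. It assumes a simple normalized string-similarity comparison from Python's standard library, a hypothetical acceptance threshold of 0.85, and hypothetical identifiers such as `ex:artist/1`. An accepted match is expressed as a plain subject-predicate-object tuple using `owl:sameAs`, standing in for a statement in an RDF graph.

```python
from difflib import SequenceMatcher
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace (hypothetical cleaning step)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(query: str, candidates: dict[str, str], threshold: float = 0.85):
    """Return (candidate_id, score) for the best match at or above the threshold, else None.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    best_id, best_score = None, 0.0
    for cand_id, label in candidates.items():
        score = similarity(query, label)
        if score > best_score:
            best_id, best_score = cand_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


def link_triple(source_id: str, target_id: str) -> tuple[str, str, str]:
    """Express an accepted match as a subject-predicate-object triple."""
    return (source_id, "owl:sameAs", target_id)
```

For example, calling `reconcile("bjork", {"ex:artist/1": "Björk", "ex:artist/2": "Beyoncé"})` selects `ex:artist/1`, because both names normalize to the same string, while a name with no close candidate returns `None` and no link is created. A real system would likely combine several fields (artist, title, release) and consult third-party sources, as the abstract indicates.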

Published In

International Journal on Semantic Web & Information Systems, Volume 13, Issue 4
October 2017
220 pages
ISSN: 1552-6283
EISSN: 1552-6291

Publisher

IGI Global

United States

Author Tags

  1. Adaptable Architecture
  2. Collective Matching
  3. Entity Reconciliation
  4. Music Metadata
  5. RDF Graph
  6. Record Linkage
  7. Semantic Search

Qualifiers

  • Article
