DOI: 10.1145/3477314.3507351
poster

Towards open data discovery: a comparative study

Published: 06 May 2022 Publication History

Abstract

Open Data discovery enables the retrieval of the data sources most likely to contain the needed information, facilitating data access and transparency. This work presents a comparative study of three methods: a hybrid algorithm based on Linear Discriminant Analysis and Word2Vec, a cosine similarity measure, and a Semantic Test proposed for Open Data search. Each method was evaluated on its ability to identify, among eight open datasets and using only their metadata and descriptions, the one most likely to answer an input question. Three evaluation rounds were performed with different sets of questions and databases, and every method reached a classification accuracy above 81%.
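To make the cosine-similarity variant of this task concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain bag-of-words representation with raw term counts, and the dataset names and descriptions in `DESCRIPTIONS` are hypothetical placeholders rather than the eight datasets used in the study. It scores an input question against each description and returns the datasets ranked by similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_datasets(question, descriptions):
    """Return (dataset name, score) pairs sorted from most to least similar."""
    q_vec = Counter(tokenize(question))
    scores = {
        name: cosine_similarity(q_vec, Counter(tokenize(text)))
        for name, text in descriptions.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical dataset descriptions, for illustration only.
DESCRIPTIONS = {
    "air_quality": "Hourly air quality measurements of pollutants by monitoring station",
    "school_census": "Annual census of schools with enrollment and teacher counts",
    "public_budget": "Municipal budget expenses and revenue by department and year",
}

ranking = rank_datasets("How many teachers work in public schools?", DESCRIPTIONS)
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

Raw counts without stemming or stop-word removal keep the sketch short; note that "teachers" and "teacher" do not match here, which is the kind of gap that embedding-based methods such as Word2Vec are meant to close.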

Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN:9781450387132
DOI:10.1145/3477314
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. open data
  2. similarity methods
  3. source discovery

Qualifiers

  • Poster

Funding Sources

  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES)

Conference

SAC '22

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 64
  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 04 Oct 2024
