Abstract
Secondary analysis, or the reuse of existing survey data, is a common practice among social scientists. Searching for relevant datasets in Digital Libraries, however, is a somewhat unfamiliar behaviour for this community. Dataset retrieval, especially in the social sciences, involves additional material such as codebooks, questionnaires, raw data files and more. Our assumption is that, owing to this diverse nature of datasets, document retrieval models often do not work as efficiently for retrieving datasets. One way of enhancing these types of searches is to incorporate the users’ interaction context in order to personalise dataset retrieval sessions. As a first step towards this long-term goal, we study characteristics of dataset retrieval sessions from a real-life Digital Library for the social sciences that incorporates both research data and publications. Previous studies reported that queries for document search and queries for dataset search can be told apart by query length. In this paper, we challenge this claim and report that the queries are indistinguishable, whether they aim for a dataset or a document. Among other results, we report findings on dataset retrieval sessions with respect to query characteristics, interaction sequences and topical drift within 65,000 unique sessions.
Notes
- 1.
- 2. The terms "document" and "publication", and likewise "dataset" and "research data", are used interchangeably in the rest of the paper.
- 3. Accessible via: https://search.gesis.org. See details in [6].
- 4. Character count is used to account for the linguistic characteristics of German; the queries submitted to the ISS are mixed, some in German and others in English.
- 5.
- 6. A high-resolution figure is available at: https://arxiv.org/abs/2006.02770.
References
1. Angel, A., Koudas, N.: Efficient diversity-aware search. In: Proceedings of ACM SIGMOD, pp. 781–792 (2011)
2. Brickley, D., Burgess, M., Noy, N.: Google dataset search: building a search engine for datasets in an open web ecosystem. In: WWW 2019, pp. 1365–1375 (2019)
3. Cafarella, M.J., Halevy, A., Madhavan, J.: Structured data on the web. Commun. ACM 54(2), 72–79 (2011)
4. Chapman, A., et al.: Dataset search: a survey. VLDB J. 29(1), 251–272 (2019). https://doi.org/10.1007/s00778-019-00564-x
5. Chen, J., Wang, X., Cheng, G., Kharlamov, E., Qu, Y.: Towards more usable dataset search: from query characterization to snippet generation. In: Proceedings of the 28th CIKM 2019, pp. 2445–2448 (2019)
6. Hienert, D., Kern, D., Boland, K., Zapilko, B., Mutschke, P.: A digital library for research data and related information in the social sciences. In: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 148–157 (2019)
7. Hienert, D., Mutschke, P.: A usefulness-based approach for measuring the local and global effect of IIR services. In: Proceedings of 2016 ACM CHIIR, pp. 153–162 (2016)
8. Jansen, B.J., Spink, A.: How are we searching the world wide web? A comparison of nine search engine transaction logs. Inf. Process. Manag. 42(1), 248–263 (2006)
9. Kacprzak, E., Koesten, L., Tennison, J., Simperl, E.: Characterising dataset search queries. In: Companion of WWW 2018, pp. 1485–1488. ACM Press (2018)
10. Kern, D., Mathiak, B.: Are there any differences in data set retrieval compared to well-known literature retrieval? In: Research and Advanced Technology for Digital Libraries, pp. 197–208 (2015)
11. Koesten, L., Mayr, P., Groth, P., Simperl, E., de Rijke, M.: Report on the DATA:SEARCH’18 workshop - searching data on the web. In: SIGIR Forum, vol. 52, no. 2, pp. 117–124 (2018)
Acknowledgement
This work was funded by DFG under grant MA 3964/10-1, the “Establishing Contextual Dataset Retrieval - transferring concepts from document to dataset retrieval” (ConDATA) project, http://bit.ly/Condata.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Carevic, Z., Roy, D., Mayr, P. (2020). Characteristics of Dataset Retrieval Sessions: Experiences from a Real-Life Digital Library. In: Hall, M., Merčun, T., Risse, T., Duchateau, F. (eds) Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham. https://doi.org/10.1007/978-3-030-54956-5_14
DOI: https://doi.org/10.1007/978-3-030-54956-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54955-8
Online ISBN: 978-3-030-54956-5
eBook Packages: Computer Science, Computer Science (R0)