
Characteristics of Dataset Retrieval Sessions: Experiences from a Real-Life Digital Library

  • Conference paper
  • Digital Libraries for Open Knowledge (TPDL 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12246)


Abstract

Secondary analysis, or the reuse of existing survey data, is a common practice among social scientists. Searching for relevant datasets in Digital Libraries, however, is a somewhat unfamiliar behaviour for this community. Dataset retrieval, especially in the social sciences, involves additional material such as codebooks, questionnaires, raw data files and more. We assume that, owing to the diverse nature of datasets, document retrieval models often do not work as efficiently for retrieving datasets. One way of enhancing such searches is to incorporate the users' interaction context in order to personalise dataset retrieval sessions. As a first step towards this long-term goal, we study characteristics of dataset retrieval sessions from a real-life Digital Library for the social sciences that incorporates both research data and publications. Previous studies reported that queries for document search and for dataset search can be distinguished by query length. In this paper, we dispute this claim and report our finding that queries are indistinguishable, whether they aim for a dataset or a document. Among other things, we report characteristics of dataset retrieval sessions with respect to query characteristics, interaction sequences and topical drift within 65,000 unique sessions.
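The abstract names three session-level measures: query characteristics (here, query length), interaction sequences and topical drift. The following is a minimal Python sketch of how such measures might be computed from a search log. It assumes a hypothetical log format in which each session is a list of `Event(action, value)` records with made-up action labels `query`, `view_dataset` and `view_publication`. Labelling a query by the first record type viewed after it, counting consecutive action pairs, and measuring drift as the Jaccard distance between the term sets of consecutive queries are illustrative choices, not the paper's method.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean

# Hypothetical mapping from log actions to the type of record viewed.
VIEW_TYPES = {"view_dataset": "dataset", "view_publication": "publication"}


@dataclass
class Event:
    action: str      # hypothetical labels: "query", "view_dataset", "view_publication"
    value: str = ""  # the query string for "query" events, empty otherwise


def query_length(query: str) -> int:
    """Query length in characters, ignoring surrounding whitespace."""
    return len(query.strip())


def label_queries(events):
    """Pair each query with the type of the first record viewed after it, if any."""
    labelled, pending = [], None
    for e in events:
        if e.action == "query":
            pending = e.value  # a newer query replaces one that led to no view
        elif pending is not None and e.action in VIEW_TYPES:
            labelled.append((pending, VIEW_TYPES[e.action]))
            pending = None
    return labelled


def mean_length_by_type(sessions):
    """Mean character length of queries, grouped by the record type they led to."""
    lengths = {"dataset": [], "publication": []}
    for events in sessions:
        for query, kind in label_queries(events):
            lengths[kind].append(query_length(query))
    # Skip empty groups so mean() is never called on an empty list.
    return {kind: mean(vals) for kind, vals in lengths.items() if vals}


def transition_counts(events):
    """Count consecutive action pairs, a simple view of interaction sequences."""
    actions = [e.action for e in events]
    return Counter(zip(actions, actions[1:]))


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over lower-cased whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)


def topical_drift(events):
    """Jaccard distance between each pair of consecutive queries in a session."""
    queries = [e.value for e in events if e.action == "query"]
    return [jaccard_distance(x, y) for x, y in zip(queries, queries[1:])]


# Toy sessions, invented for illustration only.
sessions = [
    [Event("query", "allbus 2018"), Event("view_dataset"),
     Event("query", "allbus 2018 codebook"), Event("view_dataset")],
    [Event("query", "migration attitudes"), Event("view_publication")],
]
print(mean_length_by_type(sessions))
print(transition_counts(sessions[0]))
print(topical_drift(sessions[0]))
```

For these toy sessions, the two dataset queries average 15.5 characters and the single publication query has 19. Labelling by the first viewed record is only one simple heuristic; sessions that mix dataset and publication views would need a more careful rule.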


Notes

  1. http://bit.ly/Condata.

  2. The terms document and publication, and likewise dataset and research data, are used interchangeably in the rest of the paper.

  3. Accessible via: https://search.gesis.org. See details in [6].

  4. Query length is measured in characters to account for the linguistic properties of German; the queries submitted to the ISS are mixed, some in German and others in English.

  5. https://bit.ly/MLT-elastic.

  6. A high-resolution figure is available at: https://arxiv.org/abs/2006.02770.
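Note 4 measures query length in characters rather than words. A small Python illustration of why this matters for mixed German and English queries, assuming the usual reason that German forms long compound nouns: a word count treats a German compound and its multi-word English paraphrase very differently, while their character counts are close. The helper names and the example queries are hypothetical and not taken from the paper's log.

```python
def char_length(query: str) -> int:
    """Query length in characters, ignoring leading/trailing whitespace."""
    return len(query.strip())


def word_length(query: str) -> int:
    """Query length in whitespace-separated tokens."""
    return len(query.split())


# Hypothetical pair: a German compound noun and an English paraphrase of it.
for q in ["Arbeitsmarktpolitik", "labour market policy"]:
    print(f"{q!r}: {word_length(q)} word(s), {char_length(q)} characters")
```

This prints 1 word and 19 characters for the German query, and 3 words and 20 characters for the English one.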

References

  1. Angel, A., Koudas, N.: Efficient diversity-aware search. In: Proceedings of ACM SIGMOD, pp. 781–792 (2011)

  2. Brickley, D., Burgess, M., Noy, N.: Google dataset search: building a search engine for datasets in an open web ecosystem. In: Proceedings of WWW 2019, pp. 1365–1375 (2019)

  3. Cafarella, M.J., Halevy, A., Madhavan, J.: Structured data on the web. Commun. ACM 54(2), 72–79 (2011)

  4. Chapman, A., et al.: Dataset search: a survey. VLDB J. 29(1), 251–272 (2019). https://doi.org/10.1007/s00778-019-00564-x

  5. Chen, J., Wang, X., Cheng, G., Kharlamov, E., Qu, Y.: Towards more usable dataset search: from query characterization to snippet generation. In: Proceedings of the 28th CIKM, pp. 2445–2448 (2019)

  6. Hienert, D., Kern, D., Boland, K., Zapilko, B., Mutschke, P.: A digital library for research data and related information in the social sciences. In: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 148–157 (2019)

  7. Hienert, D., Mutschke, P.: A usefulness-based approach for measuring the local and global effect of IIR services. In: Proceedings of ACM CHIIR 2016, pp. 153–162 (2016)

  8. Jansen, B.J., Spink, A.: How are we searching the world wide web? A comparison of nine search engine transaction logs. Inf. Process. Manag. 42(1), 248–263 (2006)

  9. Kacprzak, E., Koesten, L., Tennison, J., Simperl, E.: Characterising dataset search queries. In: Companion of WWW 2018, pp. 1485–1488. ACM Press (2018)

  10. Kern, D., Mathiak, B.: Are there any differences in data set retrieval compared to well-known literature retrieval? In: Research and Advanced Technology for Digital Libraries, pp. 197–208 (2015)

  11. Koesten, L., Mayr, P., Groth, P., Simperl, E., de Rijke, M.: Report on the DATA:SEARCH’18 workshop - searching data on the web. In: SIGIR Forum, vol. 52, no. 2, pp. 117–124 (2018)


Acknowledgement

This work was funded by the DFG under grant MA 3964/10-1 as part of the project “Establishing Contextual Dataset Retrieval - transferring concepts from document to dataset retrieval” (ConDATA), http://bit.ly/Condata.

Author information

Corresponding author

Correspondence to Zeljko Carevic.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Carevic, Z., Roy, D., Mayr, P. (2020). Characteristics of Dataset Retrieval Sessions: Experiences from a Real-Life Digital Library. In: Hall, M., Merčun, T., Risse, T., Duchateau, F. (eds) Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham. https://doi.org/10.1007/978-3-030-54956-5_14


  • DOI: https://doi.org/10.1007/978-3-030-54956-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54955-8

  • Online ISBN: 978-3-030-54956-5

  • eBook Packages: Computer Science, Computer Science (R0)
