Characterizing Search Activities on Stack Overflow

Published: 18 August 2021


To solve programming issues, developers commonly search Stack Overflow for potential solutions. However, there is a gap between the knowledge developers are interested in and the knowledge they are able to retrieve using search engines. To help developers retrieve relevant knowledge from Stack Overflow efficiently, prior studies have proposed several techniques to reformulate queries and generate summarized answers. However, few studies have performed a large-scale analysis of real-world search logs. In this paper, we use such logs to characterize how developers search on Stack Overflow. By doing so, we identify the challenges developers face when searching on Stack Overflow and look for opportunities for the platform and for researchers to help developers retrieve knowledge efficiently. To characterize search activities on Stack Overflow, we use search log data derived from requests to Stack Overflow's web servers. We find that the most common search activity is reformulating the immediately preceding query. Related work studied query reformulations in generic search engines and identified 13 types of query reformulation strategies. Compared with those results, we observe that 71.78% of the reformulations fit into those strategies. In terms of query structure, 17.41% of the search sessions search only for fragments of source code artifacts (e.g., class and method names) without specifying the names of programming languages, libraries, or frameworks. Based on our findings, we provide actionable suggestions for Stack Overflow moderators and outline directions for future research. For example, we encourage Stack Overflow to set up a database that captures the relations between all programming terms shared on Stack Overflow, e.g., method names, data structure names, design patterns, and IDE names. By doing so, Stack Overflow could improve the performance of search engines by considering related programming terms at different levels of granularity.
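To make the idea of "reformulating the immediately preceding query" concrete, the sketch below labels each consecutive pair of queries in a search session. It is illustrative only and is not the authors' method. It assumes the logs have already been split into sessions, uses naive lowercase/whitespace tokenization, and recognizes only a hypothetical, much-simplified subset of reformulation types. The names `classify_reformulation`, `label_session`, `fraction_matched`, and `KNOWN_STRATEGIES` are our own, and the labels are not the 13-type taxonomy from related work.

```python
from collections.abc import Sequence

# Hypothetical, simplified subset of reformulation strategies; this is NOT
# the 13-type taxonomy referenced in the paper or its related work.
KNOWN_STRATEGIES = {"case_or_whitespace", "word_reorder", "add_words", "remove_words"}


def tokens(query: str) -> list[str]:
    """Lowercase a query and split it on runs of whitespace."""
    return query.lower().split()


def classify_reformulation(prev: str, curr: str) -> str:
    """Label how `curr` rewrites the immediately preceding query `prev`."""
    if prev == curr:
        return "repeat"  # identical resubmission, not a reformulation
    a, b = tokens(prev), tokens(curr)
    if a == b:
        return "case_or_whitespace"  # only casing or spacing changed
    if sorted(a) == sorted(b):
        return "word_reorder"  # same words, different order
    sa, sb = set(a), set(b)
    if sa < sb:
        return "add_words"  # every old term kept, new terms added
    if sb < sa:
        return "remove_words"  # only some of the old terms kept
    if sa & sb:
        return "mixed_edit"  # some terms shared, others swapped or added
    return "new_query"  # no shared terms: likely a topic switch


def label_session(queries: Sequence[str]) -> list[str]:
    """Classify each consecutive (previous, current) query pair in one session."""
    return [classify_reformulation(p, c) for p, c in zip(queries, queries[1:])]


def fraction_matched(labels: Sequence[str]) -> float:
    """Share of labeled pairs that fall into one of KNOWN_STRATEGIES."""
    if not labels:
        return 0.0
    return sum(label in KNOWN_STRATEGIES for label in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical session, invented for illustration (not from the paper's logs).
    session = [
        "python list sort",
        "python sort list",
        "python sort list descending",
        "sort list descending",
        "java arraylist sort",
    ]
    labels = label_session(session)
    print(labels)
    print(f"{fraction_matched(labels):.2%}")
```

On the example session, the four pairs are labeled as a reorder, an addition, a removal, and a mixed edit, so three of the four pairs fall into the simplified strategy set. A real analysis would need much richer rules (e.g., spelling correction, acronym expansion, stemming) before any such share could be compared with the 71.78% reported in the abstract.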


    Author Tags

    1. Data Mining
    2. Query Logs
    3. Query Reformulation
    4. Stack Overflow


    Funding Sources

    • Key Research and Development Program of Zhejiang Province
    • National Science Foundation of China
    • National Research Foundation, Singapore, under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative

