short-paper

Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries

Published: 18 June 2024

Abstract

Industrial applications rely heavily on open-source software (OSS) libraries, which provide various benefits. However, these libraries can also present a substantial risk if a vulnerability or attack arises and an inactive community fails to address the issue promptly and release a fix. Monitoring the activities of such communities requires access to a comprehensive list of repositories for the libraries of an ecosystem. Based on these repositories, the libraries integrated into an application can be monitored to observe whether they are adequately maintained. In this descriptive study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries. For all available libraries, we extract the assigned repository URLs and direct dependencies, and use the PageRank algorithm to comprehensively analyze the ecosystems from both a library and a dependency-chain perspective. For invalid repository URLs, we derive potential reasons. In both ecosystems, the accessibility of GitHub repository URLs varies with the PageRank score of the analyzed libraries. For individual libraries, up to 73.8% of PyPI and up to 69.4% of NPM libraries have repository URLs. Within dependency chains, up to 80.1% of PyPI libraries and up to 81.1% of NPM libraries have URLs. This means that most libraries, especially as their importance increases, can be monitored on GitHub. One of the most common reasons for an invalid repository URL is that no URL is assigned at all, which amounts to up to 17.9% for PyPI and up to 39.6% for NPM. Package maintainers should address this issue and update the repository information to enable monitoring of their libraries.
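The analysis steps named in the abstract (extracting repository URLs, collecting direct dependencies, and ranking libraries with PageRank) can be illustrated with a minimal sketch. This is not the authors' implementation. The package metadata is invented toy data, `github_repo` and `pagerank` are hypothetical helpers, the URL pattern is a simplifying assumption, and the damping factor of 0.85 is the conventional default rather than a value taken from the paper.

```python
import re

# Toy metadata, invented for illustration:
# name -> (assigned repository URL or None, direct dependencies)
PACKAGES = {
    "app": ("https://github.com/example/app", ["web", "utils"]),
    "web": ("git+https://github.com/example/web.git", ["utils"]),
    "utils": (None, []),                                   # no URL assigned at all
    "legacy": ("https://example.org/legacy", ["utils"]),   # URL assigned, not on GitHub
}

# Assumed pattern: github.com, then owner and repository, optional ".git" suffix.
GITHUB_RE = re.compile(r"github\.com[/:]([\w.-]+)/([\w.-]+?)(?:\.git)?/?$")


def github_repo(url):
    """Return 'owner/repo' if the URL looks like a GitHub repository, else None."""
    if not url:
        return None
    match = GITHUB_RE.search(url.strip())
    return f"{match.group(1)}/{match.group(2)}" if match else None


def pagerank(deps, damping=0.85, iterations=50):
    """Power-iteration PageRank; an edge points from a package to each dependency.

    Every dependency must also appear as a key of `deps`.
    """
    nodes = list(deps)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by packages without dependencies is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not deps[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for dep in deps[node]:
                new_rank[dep] += damping * rank[node] / len(deps[node])
        rank = new_rank
    return rank


deps = {name: targets for name, (_, targets) in PACKAGES.items()}
scores = pagerank(deps)
repos = {name: github_repo(url) for name, (url, _) in PACKAGES.items()}

# Share of packages with a usable GitHub repository URL.
coverage = sum(repo is not None for repo in repos.values()) / len(repos)
most_central = max(scores, key=scores.get)

print(f"GitHub URL coverage: {coverage:.1%}")
print(f"Highest-ranked package: {most_central}, repository: {repos[most_central]}")
```

In this toy graph the most depended-upon package is also the one without an assigned URL, which is the situation the abstract asks maintainers to fix. Pointing the edges from dependent to dependency is a design choice of the sketch, so that heavily reused libraries receive the highest scores.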

    Published In

    EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
    June 2024
    728 pages
    ISBN:9798400717017
    DOI:10.1145/3661167

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Maintenance Activities
    2. OSS Libraries
    3. Repository Mining

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    EASE 2024

    Acceptance Rates

    Overall Acceptance Rate 71 of 232 submissions, 31%
