DOI: 10.1145/3576914.3587523

Software Introspection for Signaling Social-Cyber Operations

Published: 09 May 2023

Abstract

Open-source software (OSS) is a critical element in the design and operation of complex cyber-physical systems. Contributions to OSS projects are typically the result of voluntary work and time allocation by researchers, software developers, hackers, and even opportunistic programmers. These communities often operate on a “trust” basis, and although they strive to evaluate the technical correctness and merits of contributed code, the processes they use are usually loosely supervised. Social rules, trust, reputation, and even arcane processes often govern these communities. While these components have undoubtedly contributed to the growth and expansion of OSS, they could also lead to opportunities for subversion [3], hindering the reliability of an OSS project. This, in turn, could not only compromise the integrity of cyber-physical systems depending on OSS but also affect their performance.
The risks of new and emerging socio-technical attack vectors on cyber-physical systems that rely on OSS are real, broad, and growing [8]. Therefore, it is essential for the cyber-defense community to develop a comprehensive and deep understanding of the socio-technical behavior and behavioral dynamics involved in these attacks. Additionally, mechanisms must be in place to extract latent information hidden in these operations. Much of the previous research on understanding socio-technical behavior in OSS projects has focused on a static view of the problem, paying close attention to individual and publicly available traces of information involving source code, commits, logs, or external packages (e.g., [7]). However, social-cyber operations are not static [6]. Instead, they can change over time to help potential contributors build a reputation and eventually become project committers (as seen in the case study of a “successful socialization” scenario in Ducheneaut [2]). Furthermore, some of these dynamics, particularly those related to vulnerability fixes, may occur behind closed doors and be black boxes of complexity [4]. Introspecting multiple streams of information resulting from both social and technical interactions across and between development channels (such as mailing lists, version control systems, and source code) can help us open this black box and build a high-fidelity model of socio-technical behavior. Such a model can be operationalized as an early warning mechanism to highlight emergent social-cyber operations that aim to undermine the integrity of OSS projects and their dependent cyber-physical systems. This paper summarizes SIGNAL, a single and coherent software introspection capability for signaling social-cyber operations against cyber-physical systems that depend on OSS projects.
As shown in Figure 1, SIGNAL views an OSS project as a changing artifact that grows and evolves over time through socially vetted modifications submitted by programmers. SIGNAL is grounded in three key, interconnected components: (1) Explainable persuasive behavior extraction (Yellow Patch), (2) Graph-based revision history analysis (Sensor), and (3) Self-supervised mechanisms for dynamic trace analysis (Antenna). In the first component, SIGNAL combines white-box transfer learning for Random Forest and exploratory factor analysis to compute an accurate model of persuasive developer action flows emerging within a project’s social and technical channels. This effectively links key traces of developer social and technical interactions to their associated traces of code modifications. The computed model achieves an accuracy (~68%) comparable to the state of the art [9], with 16x faster training. In the second component, SIGNAL introduces a novel graph-based pattern mining approach for detecting API misuses that originated from persuasive developer activities. This component examines chains of code changes in OSS projects to evaluate structural and semantic patterns, and it uses this information to identify API misuses. In the third component, SIGNAL combines the output of the first two components and performs self-supervision on their temporal ordering to learn dynamic developer activity embeddings. These embeddings can be used to track the semantic evolution of developer contribution ploys. An advantage of using an embedding approach to track the semantic evolution of socio-technical behavior is that it produces a natural “backtrace” of contributors’ modus operandi. This backtrace details how their actions exploit seams within a project to influence technical change.
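To make the idea of learning activity embeddings from temporal ordering more concrete, the following is a minimal sketch and not SIGNAL's implementation. It swaps in a much simpler technique, windowed co-occurrence counting, for the self-supervised model described above: each activity label is represented by the counts of the labels that appear near it in time-ordered sequences, and two activities are compared by cosine similarity. The activity labels, the sequences, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_embeddings(sequences, window=1):
    """Represent each activity label by counts of the labels seen within
    `window` positions of it in time-ordered sequences (an assumption made
    for illustration; SIGNAL's actual embedding model is not shown here)."""
    vectors = defaultdict(Counter)
    for seq in sequences:
        for i, activity in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[activity][seq[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical, time-ordered activity sequences for two contributors.
sequences = [
    ["post_patch", "reply_review", "revise_patch", "commit"],
    ["post_patch", "reply_review", "commit"],
]
emb = cooccurrence_embeddings(sequences, window=1)

# Activities that occur in similar temporal contexts score closer to 1.
similarity = cosine(emb["post_patch"], emb["commit"])
```

Because the vectors are built only from what precedes and follows each activity, the temporal ordering itself supplies the training signal; no labels are required. A learned embedding would replace the raw counts, but the comparison step would look much the same.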
Case Study: The Evolution of Hypocrite Commits in the Linux Kernel. In recent work [5], we assessed the effectiveness of SIGNAL in introspecting a well-documented social engineering attack against the Linux Kernel, specifically the “hypocrite commits” [11]. “Hypocrite commits” refer to scenarios where an attacker exploits the social landscape of OSS projects, such as the Linux Kernel in this case, to earn the trust of maintainers before introducing malicious code or malware that can lead to critical vulnerabilities in the OSS project or its subsystems. Our SIGNAL analysis of the 2020 social engineering attack against the Linux Kernel revealed new and distinct social-cyber operation traces, as depicted in Figure 1 of our recent study. In [5], we sought to capture the dynamics of influence-seeking and trust-building operations carried out by adversaries seeking to acquire write permissions to an OSS project. Additionally, we drew parallels between the OSS development life cycle and online social networks [10] and introduced the concept of trust ascendancy. This concept describes any influence-seeking and trust-building operations seeking to change a project’s technical direction.
In our SIGNAL analysis of the “hypocrite commits” attack, we collected mailing-list, patch, and commit data from August to November 2020, the period when the attack took place [1]. Our approach was hybrid: it framed the analysis as an unsupervised learning task with a self-supervised component. Through our experiments, we successfully captured the modus operandi trajectories followed by the aliases involved in the attack and identified a series of potentially influenced maintainers and core contributors. In the process, we also identified a series of trust ascendancy classes, such as opportunistic or awry trust ascendancy.
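As a rough illustration of what a per-alias trajectory over the collection period might look like, the sketch below groups timestamped events by alias and month and measures how much an alias's activity mix shifts from one month to the next. This is an assumption-laden stand-in, not the analysis reported in [5]: the events, the alias name, the activity labels, and the choice of total variation distance as the drift measure are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_profiles(events):
    """Group (day, alias, activity) events into per-alias, per-month counts."""
    profiles = defaultdict(lambda: defaultdict(Counter))
    for day, alias, activity in events:
        profiles[alias][(day.year, day.month)][activity] += 1
    return profiles


def drift(p, q):
    """Total variation distance between two normalized activity profiles
    (0.0 means identical mix, 1.0 means no activity types in common)."""
    total_p, total_q = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / total_p - q[k] / total_q) for k in keys)


def trajectory(profiles, alias):
    """Return (month, drift from the previous month) pairs in time order."""
    months = sorted(profiles[alias])
    return [
        (cur, drift(profiles[alias][prev], profiles[alias][cur]))
        for prev, cur in zip(months, months[1:])
    ]


# Hypothetical events: an alias moving from discussion toward patch submission.
events = [
    (date(2020, 8, 4), "alias_a", "mail_reply"),
    (date(2020, 8, 19), "alias_a", "mail_reply"),
    (date(2020, 9, 2), "alias_a", "mail_reply"),
    (date(2020, 9, 15), "alias_a", "patch_submitted"),
    (date(2020, 10, 7), "alias_a", "patch_submitted"),
    (date(2020, 10, 21), "alias_a", "patch_submitted"),
]
profiles = monthly_profiles(events)
steps = trajectory(profiles, "alias_a")
```

A sustained sequence of non-zero drift values in one direction is the kind of pattern a trajectory view can surface; deciding whether it reflects trust ascendancy would still require the richer signals described in the paper.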
Remarks. SIGNAL makes the following technical contributions:
Moving forward, we aim to scale SIGNAL to new case studies and to larger volumes of diverse socio-technical activity data. Our goal is to chart the strategic landscape of influence-seeking and trust-building operations in OSS development while avoiding information overload and unnecessary CPU-intensive data operations. We anticipate these efforts will facilitate new research in secure and continuous software development, benefiting the advancement of complex cyber-physical system design and development.

References

[1] Kees Cook. 2021. Report on University of Minnesota Breach-of-Trust Incident. https://lkml.org/lkml/2021/5/5/1244 Accessed: 2023-02-20.
[2] Nicolas Ducheneaut. 2005. Socialization in an open source software community: A socio-technical analysis. Computer Supported Cooperative Work (CSCW) 14, 4 (2005), 323–368.
[3] Luiz Giovanini, Daniela Oliveira, Huascar Sanchez, and Deborah Shands. 2021. Leveraging Team Dynamics to Predict Open-source Software Projects’ Susceptibility to Social Engineering Attacks. arXiv preprint arXiv:2106.16067 (2021).
[4] Ralf Ramsauer, Lukas Bulwahn, Daniel Lohmann, and Wolfgang Mauerer. 2020. The sound of silence: Mining security vulnerabilities from secret integration channels in open-source projects. In Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop. 147–157.
[5] Huascar Sanchez and Briland Hitaj. 2022. Trust in Motion: Capturing Trust Ascendancy in Open-Source Projects using Hybrid AI. arXiv preprint arXiv:2210.02656 (2022).
[6] Yun Shen and Gianluca Stringhini. 2019. ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks. In 28th USENIX Security Symposium (USENIX Security 19). USENIX Association, Santa Clara, CA, 905–921.
[7] Nikolai Sviridov, Mikhail Evtikhiev, and Vladimir Kovalenko. 2021. TNM: A Tool for Mining of Socio-Technical Data from Git Repositories. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 295–299.
[8] Synopsys. 2023. Open Source Security and Risk Analysis Report. https://tinyurl.com/4j8zp82y Accessed: 2023-02-20.
[9] Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. 2019. Persuasion for good: Towards a personalized persuasive dialogue system for social good. arXiv preprint arXiv:1906.06725 (2019).
[10] Yi Wang and David Redmiles. 2016. The diffusion of trust and cooperation in teams with individuals’ variations on baseline trust. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 303–318.
[11] Qiushi Wu and Kangjie Lu. 2021. On the feasibility of stealthily introducing vulnerabilities in open-source software via hypocrite commits. http://www.coding-guidelines.com/code-data/OpenSourceInsecurity.pdf. (2021).

Published In

CPS-IoT Week '23: Proceedings of Cyber-Physical Systems and Internet of Things Week 2023
May 2023
419 pages
ISBN:9798400700491
DOI:10.1145/3576914
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. machine learning
  2. open-source software introspection
  3. socio-technical analysis
  4. trust ascendancy

Qualifiers

  • Research-article
  • Research
  • Refereed limited
