DOI: 10.1145/3641399.3641419
Short Paper

How much SPACE do metrics have in GenAI assisted software development?

Published: 22 February 2024

Abstract

    Large Language Models (LLMs) are changing how developers create software, with natural language prompts increasingly replacing hand-written code as the primary driver. While many initial assessments of such LLMs suggest that they improve developer productivity, other studies have pointed out areas of the Software Development Life Cycle (SDLC) and the developer experience where these tools perform poorly. Many studies are dedicated to evaluating LLM-based AI-assisted software tools, but the lack of standardization across studies and metrics hinders both the adoption of metrics and the reproducibility of results. The primary objective of this survey is to assess recent user studies and surveys that evaluate different aspects of the developer's experience of using code-based LLMs, and to highlight the gaps among them. We leverage the SPACE framework to enumerate and categorise metrics from studies that conducted some form of controlled user experiment. In a Generative AI assisted SDLC, the developer's experience should encompass the ability to perform the task at hand efficiently and effectively, with minimal friction from these LLM tools. Our exploration yields several critical insights: a complete absence of user studies on the collaboration aspects of teams, a bias towards certain LLM models and metrics, and a lack of diversity in metrics within the productivity dimensions. We also propose recommendations to the research community that would bring more conformity to the evaluation of such LLMs.
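
    As a rough illustration of the categorisation step the abstract describes, a minimal Python sketch follows. It assumes each surveyed study is reduced to a list of (dimension, metric) pairs; the five dimension names are the SPACE dimensions from Forsgren et al. (2021), while the example study and its metrics are hypothetical placeholders, not data from this paper.

        # A minimal sketch, assuming each study's metrics are catalogued
        # as (SPACE dimension, metric) pairs. The dimension names come from
        # Forsgren et al. (2021); "Study A" and its metrics are hypothetical
        # illustrations, not findings of this survey.
        from collections import defaultdict

        SPACE_DIMENSIONS = [
            "Satisfaction and well-being",
            "Performance",
            "Activity",
            "Communication and collaboration",
            "Efficiency and flow",
        ]

        def categorise(metrics_by_study):
            """Group each study's metrics into buckets, one per SPACE dimension."""
            buckets = defaultdict(list)
            for study, metrics in metrics_by_study.items():
                for dimension, metric in metrics:
                    if dimension not in SPACE_DIMENSIONS:
                        raise ValueError(f"unknown SPACE dimension: {dimension}")
                    buckets[dimension].append((study, metric))
            return buckets

        # Hypothetical example input: one study measuring suggestion
        # acceptance rate (Efficiency) and perceived usefulness (Satisfaction).
        example = {
            "Study A": [
                ("Efficiency and flow", "suggestion acceptance rate"),
                ("Satisfaction and well-being", "perceived usefulness (survey)"),
            ],
        }
        gaps = [d for d in SPACE_DIMENSIONS if d not in categorise(example)]
        print(gaps)  # dimensions with no metrics expose coverage gaps

    Dimensions whose buckets stay empty point to exactly the kind of coverage gap the abstract reports, such as the absence of user studies in the Communication and collaboration dimension.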


Cited By

    • (2024) The Role of Generative AI in Software Development Productivity: A Pilot Case Study. In Proceedings of the 1st ACM International Conference on AI-Powered Software, 131-138. https://doi.org/10.1145/3664646.3664773. Online publication date: 10-Jul-2024.


      Published In

      ISEC '24: Proceedings of the 17th Innovations in Software Engineering Conference
      February 2024
      144 pages
      ISBN: 9798400717673
      DOI: 10.1145/3641399
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Developer Productivity
      2. Generative AI
      3. Metrics
      4. SDLC
      5. Software

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      ISEC 2024

      Acceptance Rates

      Overall Acceptance Rate: 76 of 315 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 132
      • Downloads (last 6 weeks): 27
