
Testing research software: a survey

Empirical Software Engineering

Abstract

Background

Research software plays an important role in solving real-life problems, empowering scientific innovations, and handling emergency situations. Therefore, the correctness and trustworthiness of research software are of absolute importance. Software testing is an important activity for identifying problematic code and helping to produce high-quality software. However, testing of research software is difficult due to the complexity of the underlying science, relatively unknown results from scientific algorithms, and the culture of the research software community.

Aims

The goal of this paper is to better understand current testing practices, identify challenges, and provide recommendations on how to improve the testing process for research software development.

Method

We surveyed members of the research software developer community to collect information regarding their knowledge about and use of software testing in their projects.

Results

We analysed 120 responses and found that even though research software developers report an average level of knowledge about software testing, they still find testing difficult due to the numerous challenges involved. However, a number of measures, such as proper training, can improve the testing process for research software.

Conclusions

Testing can be challenging for any type of software. This difficulty is especially present in the development of research software, where software engineering activities are typically given less attention. To produce trustworthy results from research software, there is a need for a culture change so that testing is valued and teams devote appropriate effort to writing and executing tests.




Notes

  1. https://se4science.org/workshops/

  2. The survey data is available in a public repository but set to private until publication of this paper (Carver and Eisty 2021)

  3. https://www.istqb.org/


Acknowledgements

We thank the study participants and acknowledge support from NSF grant 1445344.

Author information


Corresponding author

Correspondence to Nasir U. Eisty.

Ethics declarations

Conflict of Interest

None

Additional information

Communicated by: Dietmar Pfahl

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Definitions Provided

We refer to Figs. 1 and 2 for the survey question. In this section, we list the definitions we provided in the actual survey.

  • Acceptance testing—Assess software with respect to requirements or users’ needs.

  • Architect—A software development expert who makes high-level design choices and dictates technical standards, including software coding standards, tools, and platforms.

  • Assertion checking—Testing some necessary property of the program under test by verifying a boolean expression or constraint.

  • Backward compatibility testing—Testing whether the newly updated software works with an older version of the environment.

  • Branch coverage—Testing code coverage by making sure all branches in the program source code are tested at least once.

  • Boundary value analysis—Testing the output by checking if defects exist at boundary values.

  • Condition coverage—Testing code coverage by making sure all conditions in the program source code are tested at least once.

  • Decision table based testing—Testing the output by dealing with different combinations of inputs which produce different results.

  • Developer—An individual who writes, debugs, and executes the source code of a software application.

  • Dual coding—Testing the models created using two different algorithms while using the same or most common set of features.

  • Equivalence partitioning—Testing a group of inputs by picking a few representative values, on the understanding that all values from that group generate the same output.

  • Error Guessing—Testing the output where the test analyst uses their experience to guess the problematic areas of the application.

  • Executive—An individual who establishes and directs the strategic long term goals, policies, and procedures for an organization’s software development program.

  • Fuzzing test—Testing the software for failures or error messages that are presented due to unexpected or random inputs.

  • Graph coverage—Testing code coverage by mapping executable statements and branches to a control flow graph and covering the graph in some way.

  • Input space partitioning—Testing the output by dividing the input space according to logical partitioning and choosing elements from the input space of the software being tested.

  • Integration testing—Assess software with respect to subsystem design.

  • Logic coverage—Testing both semantic and syntactic meaning of how a logical expression is formulated.

  • Maintainer—An individual who builds source code into a binary package for distribution, commits patches, or organizes code in a source repository.

  • Manager—An individual who is responsible for overseeing and coordinating the people, resources, and processes required to deliver new software or upgrade existing products.

  • Metamorphic testing—Testing how a particular change to the program’s input changes the output (illustrated in the sketch after this list).

  • Module testing—Assess software with respect to detailed design.

  • Monte Carlo test—Testing numerical results using repeated random sampling.

  • Performance testing—Testing some of the non-functional quality attributes of software, such as stability, reliability, and availability.

  • Quality Assurance Engineer—An individual who tracks the development process, oversees production, and tests each part to ensure it meets standards before moving to the next phase.

  • State transition—Testing the outputs produced by changes to the input conditions or to the ’state’ of the system.

  • Statement coverage—Testing code coverage by making sure all statements in the program source code are tested at least once.

  • Syntax-based testing—Testing the output using syntax to generate artifacts that are valid or invalid.

  • System testing—Assess software with respect to architectural design and overall behavior.

  • Test driven development—Testing the output by writing an (initially failing) automated test case that defines a desired improvement or new function, then producing the minimum amount of code to pass that test.

  • Unit testing—Assess software with respect to implementation.

  • Using machine learning—Testing the output values using different machine learning techniques.

  • Using statistical tests—Testing the output values using different statistical tests.
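
To make a few of these definitions concrete, the following sketch shows how unit testing, assertion checking, boundary value analysis, and a metamorphic relation might look for a small numerical routine. It is a minimal, hypothetical illustration in Python: the trapezoid function, tolerances, and test names are ours and do not come from the survey, and the tests are plain assert-based functions that a runner such as pytest would discover.

import math

# Hypothetical routine standing in for a piece of research software:
# numerically integrate f over [a, b] with the composite trapezoid rule.
def trapezoid(f, a, b, n=1000):
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Unit testing with assertion checking: a necessary property, expressed as a
# boolean condition, compared against a known analytical result.
def test_integral_of_sine():
    # The integral of sin(x) over [0, pi] is exactly 2.
    assert abs(trapezoid(math.sin, 0.0, math.pi) - 2.0) < 1e-4

# Boundary value analysis: exercise the degenerate interval at the edge of the
# input domain, where a == b and the integral must be 0.
def test_empty_interval():
    assert trapezoid(math.sin, 1.0, 1.0) == 0.0

# Metamorphic testing: without knowing the exact value of the integral,
# check that splitting the interval leaves the result (nearly) unchanged.
def test_interval_splitting_relation():
    f = lambda x: math.exp(-x * x)
    whole = trapezoid(f, 0.0, 2.0)
    split = trapezoid(f, 0.0, 1.0) + trapezoid(f, 1.0, 2.0)
    assert abs(whole - split) < 1e-6

The metamorphic relation is the kind of check that helps when research software lacks an easy test oracle: the expected output is unknown, but a known relationship between two runs can still be verified.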

Appendix B: List of Testing Techniques

This appendix provides the list of testing techniques respondents mentioned they were familiar with in response to survey question Q9. The numbers in parentheses represent how many respondents mentioned that testing technique.

B.1 Testing Methods

Acceptance testing (9), Integration testing (43), System testing (14), Unit testing (87)

B.2 Testing Techniques

A/B testing (1), Accuracy testing (1), Alpha testing (1), Approval testing (2), Answer testing (1), Assertions testing (3), Behavioral testing (1), Beta testing (1), Bit-for-bit (1), Black-box testing (4), Built environment testing (1), Builtd testing (1), Checklist testing (1), Checksum (1), Compatibility Testing (1), Concolic testing (1), Correctness tests (1), Dependencies testing (1), Deployment testing (1), Dynamic testing (3), End-to-end testing (2), Equivalence class (1), Engineering tests (1), Exploratory tests (1), Functional testing (6), Fuzz testing (12), Golden master testing (1), Install testing (1), Jenkins automated testing (1), Load testing (1), Manual testing (2), Memory testing (6), Mock testing (6), Mutation testing (5), Penetration testing (1), Performance testing (6), Periodic testing (1), Physics testing (1), Property-based testing (2), Random input testing (2), Reference runs on test datasets (1), Regression testing (39), Reliability testing (1), Resolution testing (1), Scientific testing (2), Security testing (1), Smoke test (2), Statistical testing (1), Stress test (1), Usability testing (1), Use case test (1), User testing (2), Validation testing (9), White-box testing (2)

B.3 Testing Tools

CTest (3), gtest (1), jUnit (1)

B.4 Other types of QA

Code coverage (16), Code reviews (2), Documentation checking (3), Static analysis (6)

B.5 Others

Agile (1), Asan (1), Automatic test-case generation (1), Behavior-Driven Development (1), Bamboo (1), Benchmarking (1), Caliper (1), Code style checking (1), Coding standards (1), Comparison with analytical solutions (1), Continuous integration (33), Contracts (1), DBC (1), Design by contract (1), Doctests (1), Formal Methods (2), GitLab (1), License compliance (1), Linting (1), Method of exact solutions (1), Method of manufacture solution (2), Monitoring production apps (1), Msan (1), N-version (1), Nightly (1), Pre-commit (1), Profiling (1), Release (1), Run-time instrumentation and logging (1), Squish (1), Test-driven development (18), Test suites (1), Tsan (1), Visual Studio (2)


Cite this article

Eisty, N.U., Carver, J.C. Testing research software: a survey. Empir Software Eng 27, 138 (2022). https://doi.org/10.1007/s10664-022-10184-9
