4.2.1 Process-Related Challenges.
Table 5 reports the process-related challenges identified in our interviews, together with the traceability of which challenge has been encountered by which organization. It is important to remark that process-related challenges may not be specific to CI/CD but are, more generally, challenges in the development process that, according to the interview participants, have an impact on setting up and maintaining a CI/CD pipeline.
The challenges have been grouped into six different categories: general, culture, environment, testing, deployment, and simulators. For each category, in the following, we provide a brief description of the challenges belonging to it, together with some examples.
General. This category accounts for two challenges, each mentioned by only 1 out of 10 organizations. One of the main benefits of adopting a CI/CD pipeline is the reduction of the overall cycle time (PRC\(_1\)). However, even though O\(_2\) has already invested effort and money in reducing the release time, it still sees room for further reduction: “The biggest problem . . . is cycle time. Three years ago, the cycle time was six weeks, while now we could do it every day. It is still not enough from a developer perspective because the feedback is not fast enough.” Although this challenge also applies to conventional software, in the CPS context it is exacerbated mainly by the need to interact with both HiL and simulators. In this regard, O\(_2\) mentioned that the cycle time cannot be easily reduced due to (i) the high costs of the infrastructure and (ii) the fact that translating test strategies to hardware devices is very demanding.
O\(_7\) is facing problems when trying to onboard new developers (PRC\(_2\)), mainly due to the complexity of the railways’ domain, as also found by Törngren and Sellgren [74]. The interviewee stressed that in the railways’ domain, it is crucial to follow specific standards that need to be known and properly understood by developers and testers.
Culture. This category groups two challenges related to the presence of a limited CI/CD culture in the development teams. This may limit the possibility of properly leveraging CI/CD facilities throughout the development process. O\(_1\) reports the adoption of a pipeline that only includes tasks that are easy to automate, mainly due to “lack of knowledge” (PRC\(_3\)), as also found by Zampetti et al. [85]. Instead, although O\(_9\) already has in place a pipeline for developing and deploying mobile apps to the app store (i.e., “The setting of a CI/CD pipeline in the mobile context has been very easy”), it does not have a pipeline for CPS development due to “a lack of a deeper knowledge in the CI/CD context for CPS,” in particular for what concerns the interaction between software and hardware components (PRC\(_4\)). Specifically, there is a need for knowledge on how to properly account for the inclusion and setting of both HiL and simulators in the CI/CD pipeline configuration, as well as on how to include a feedback mechanism to gather information directly from the field.
Environment. This category features three different challenges dealing with the characteristics of the physical environment in which the developed code has to be deployed.
Among them, only PRC\(_5\) (i.e., environment complexity) is mentioned by multiple organizations (5 out of 10), whereas the remaining two come only from O\(_7\). The complexity of the environment affects the choice of execution environment (i.e., simulators or HiL). The unavailability of third-party simulators (and the resulting need to develop them in-house) limits the ability to simulate certain behaviors, or even leads to deviations between HiL and simulated environments. The consequence is that builds executed on simulators may have a different outcome when run on HiL. For instance, O\(_4\) mentioned: “Walking is not so easy to simulate so we need a real walking robot for spotting bugs,” whereas O\(_8\) stated: “It could be difficult, demanding and expensive to have a one-to-one relationship between simulators and real systems.” Our findings reinforce what is already known from previous literature on relying on simulated environments—that is, testing on simulators may fail to expose problems that only manifest when running the system on the real hardware [52].
O\(_7\) faces a problem related to the high environment variability (PRC\(_6\)) [74], due to trains having different characteristics: “We can rarely copy-paste software that has to run on different train architectures.” At the same time, O\(_7\) also faces a challenge due to the structure of its development process, which is not cloud-based and has no redundancy (PRC\(_7\)), implying that “in the presence of network issues or server issues we are totally black and this is affecting everyone.”
Testing. This category groups seven challenges. O\(_5\) and O\(_8\) mention as a challenge the substantial manual effort required for the test case specification process (PRC\(_8\)). O\(_3\) and O\(_9\), instead, felt the manual execution of testing activities to be challenging—that is, PRC\(_9\) (e.g., “Another big barrier is related to the test case execution that, at the moment, we are doing manually since both the environment setting and the oracle definition require manual intervention” for O\(_9\)). Our findings confirm what has already been pointed out by Mårtensson et al. [52] in terms of complex user scenarios implying the need for manual testing.
O\(_7\) found it difficult to automate the test case specification mainly because the standards might be interpreted differently by different developers, and both interpretations might be correct (PRC\(_{10}\)): “how do you read the standard? The standard is interpreted so the same requirement can be differently interpreted by different people (a challenge for automation).” A different challenge, experienced by O\(_3\) and O\(_4\), is related to the need for a controlled test environment (PRC\(_{11}\)), impacting the execution environment to be used in the pipeline. For instance, O\(_3\) mentioned: “Since the output of the system is sound and the test should check the sound quality it is better to have it in a controlled environment that makes use of simulation.”
Another test automation challenge is related to oracle specification (PRC\(_{12}\)), as mentioned by 5 out of 10 organizations. The impossibility of specifying an automated oracle limits the kinds of tests one can run in the pipeline. This may happen, for instance, when one needs to evaluate a signal received from a sensor—that is, “The main challenges for automatizing the test execution: a good way to model the test itself and have an oracle that can compare with the actual behavior,” as stated by O\(_3\). This aspect has already been mentioned by Mårtensson et al. [52]; however, while they only discussed usability testing, we stress the impediment in automatically determining and checking test oracles also for functional testing, mainly because the outcome comes from real hardware devices working in a real environment with many external factors to control for (e.g., checking the quality of the acoustic signal coming from sensors) (O\(_3\)).
The remaining two challenges are related to difficulties encountered when specifying/deriving integration (PRC\(_{13}\)) and safety (PRC\(_{14}\)) tests. With regard to the former, O\(_{10}\) develops prototypes requiring the interconnection of many different sub-components. This makes it difficult to determine the expected system behavior: “It is quite hard to derive integration test cases due to the complex combination of all different parts.” With regard to the specification of safety tests, in agreement with what is indicated by Gautham et al. [27], O\(_4\), O\(_5\), and O\(_8\) pointed out the complexity of identifying situations “that could never happen” or “that you do not expect to happen.” Checking for safety requirements is highly important, especially in domains, such as aerospace and railways, where the safety integrity level of the system must be equal to or higher than 3.
Deployment. This category features two challenges occurring when deploying software on the customers’ side. Having deployment too late in the development process (PRC\(_{15}\)) may result in installation issues (PC\(_9\), shown later in Table 7), as experienced by O\(_2\): “we will not be able to run the software on the system because the installation even does not work on the system, because the update/upgrade does not work, or because the system behavior is not being considered in the early stages of development.” Then, there are cases where the deployment is expensive (PRC\(_{16}\)) in terms of the time and effort needed to complete it. This impacts both the type of execution environment adopted within the pipeline and the build triggering strategy. As experienced by O\(_8\) in the railways’ domain, the deployment on a test track requires “one day with people involved in the testing and on a train a couple of days where many people need to be involved.”
The late and expensive deployment is strictly related to the nature of CPSs. Indeed, as already highlighted in Section 4.1, the organizations deploy on real hardware devices only during the last stages of the overall CI/CD process, mainly due to the high costs of the hardware in specific domains such as railways and aerospace.
Simulators. The last category, among the process-related challenges, deals with the usage of simulators. O\(_3\) pointed out the presence of scenarios where it is hard to trust the outcome provided by the simulators, since there might be many external factors impacting the behavior of the system in a real environment (PRC\(_{17}\)). Finally, as reported by O\(_8\), some scenarios cannot rely on simulators. Specifically, if it is hard for a human to specify the expected behavior of a scenario, it is also not possible to rely on a simulator to emulate that behavior (PRC\(_{18}\)).
4.2.2 Barriers for CI/CD Pipeline Setting and Maintaining and Related Mitigation.
Table 6 summarizes the five barriers encountered by the 10 organizations when applying CI/CD to CPSs. These barriers have been grouped into two categories, described in the following.
Resources. This category groups the barriers dealing with the limited availability of human (B\(_1\)) and software and/or hardware resources (B\(_2\)), both influencing the type of execution environment adopted within the pipeline. Although we are aware that those barriers can also apply to conventional software systems, they worsen for CPS development, where it is mandatory (i) to rely on simulators, mostly self-developed, which requires high expertise in the domain, and (ii) to use HiL, which is very expensive in particular CPS domains such as railways and aerospace. For instance, O\(_8\) mostly relies on HiL due to the limited availability of human resources having the skills needed to develop/configure simulators: “given the needs and the budget of our company, it’s much better for more complex scenarios to rely on the hardware in the loop and only use simulations when whatever needs to be simulated is very simple.”
All the interviewed organizations reported the limited availability of software and hardware resources. Specifically, O\(_6\) mentioned: “Based on the fact that in the avionics domain the cost of the hardware is very expensive, we do most of the work in simulated environments,” whereas O\(_7\) stated that “Resources for the hardware devices (hardware test tracks and testbeds as real trains) represent an issue for us. We have a limited number of test tracks.”
As reported later in Table 8, the analysis of the interviews’ transcripts elicited two mitigation strategies: (i) prioritizing and selecting the test cases to be included within the pipeline (i.e., “Some strategies rely on genetic algorithms to optimize the resources available for the testing execution environment” from O\(_1\)), and (ii) adopting incremental builds mainly relying on impact analysis, as reported by O\(_2\): “for what concerns rolling builds we try to limit the amount of testing being executed in them to be as fast as possible.” The member-checking survey confirms these findings and, as shown later in Table 8, 6 out of 10 organizations (O\(_1\), O\(_2\), O\(_5\), O\(_6\), O\(_7\), O\(_{10}\)) report relying on test prioritization, whereas O\(_3\), O\(_4\), O\(_8\), and O\(_9\) consider it useful despite never having used it. With regard to the adoption of incremental builds, O\(_1\), O\(_2\), O\(_4\), O\(_7\), and O\(_9\) mention its adoption, whereas O\(_8\) considers it a useful approach to deal with limited hardware/software resources.
Alternative solutions reported in the member-checking survey to cope with limited availability of resources are “architectural changes with improved testing concepts” (O\(_7\)) and, unsurprisingly, “platform virtualization” (O\(_5\)).
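To illustrate the test selection and prioritization strategy discussed above, the following sketch selects the tests affected by the changed files (a simple form of impact analysis) and orders them by historical failure rate, so that the scarce execution environment is spent on the most relevant tests first. The change-to-test mapping, test names, and failure statistics are hypothetical placeholders; organizations such as O\(_1\) mention genetic algorithms, which would replace the naive ordering used here.

```python
# Minimal sketch of change-based test selection plus failure-history prioritization.
# The mapping from source files to tests and the failure rates are hypothetical;
# in practice they would come from coverage data and past build results.

changed_files = {"src/brake_controller.c", "src/can_bus.c"}

tests_by_file = {                      # which tests exercise which files
    "src/brake_controller.c": {"test_brake_ramp", "test_emergency_stop"},
    "src/can_bus.c": {"test_can_timeouts"},
    "src/hmi.c": {"test_hmi_rendering"},
}
failure_rate = {                       # fraction of past runs in which the test failed
    "test_brake_ramp": 0.02,
    "test_emergency_stop": 0.10,
    "test_can_timeouts": 0.25,
    "test_hmi_rendering": 0.01,
}

# 1) Select only the tests impacted by the change.
selected = set()
for changed in changed_files:
    selected.update(tests_by_file.get(changed, set()))

# 2) Prioritize the selected tests, most failure-prone first.
ordered = sorted(selected, key=lambda t: failure_rate.get(t, 0.0), reverse=True)

print(ordered)  # ['test_can_timeouts', 'test_emergency_stop', 'test_brake_ramp']
```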
Domain. This category includes three different barriers, two of them highlighted by only one organization. Specifically, B\(_3\) and B\(_4\) are related to difficulties arising when automating certain phases in the CI/CD pipeline. For instance, O\(_6\) had to cope with the use of a real-time operating system that made task automation difficult: “the complexity of integrating within the pipeline the execution of nonfunctional testing and system testing,” whereas O\(_2\) could not implement automated deployment due to security policies for the healthcare domain: “We cannot deploy at the moment because a change in the security configuration of the software prevented our standard [deployment] process.”
B\(_5\) is related to coping with a complex execution environment. Specifically, O\(_{10}\) mentions that they could not integrate HiL in the CI/CD pipeline for safety reasons and adopt simulation/mocking of the hardware devices to overcome this. As shown later in Table 8, all the organizations facing this barrier used the same mitigation strategy to deal with it. Furthermore, O\(_2\) mentions the possibility of relying on “digital twin hardware that avoids the safety issues (no moving parts, no radiation) but simulates the hardware to some much better.”
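A minimal sketch of the simulation/mocking strategy mentioned above: the pipeline programs against a small hardware interface and, in builds where the HiL cannot be used for safety or availability reasons, substitutes a mock implementation. The interface, its methods, and the environment-variable switch are hypothetical, not taken from the interviewed organizations.

```python
# Minimal sketch of mocking a hardware device behind a common interface,
# so that pipeline builds can run without the real HiL. Names are hypothetical.
import os
from abc import ABC, abstractmethod

class Actuator(ABC):
    @abstractmethod
    def move_to(self, position_mm: float) -> float:
        """Move the actuator and return the position actually reached."""

class RealActuator(Actuator):
    def move_to(self, position_mm: float) -> float:
        raise RuntimeError("real hardware driver not available in this sketch")

class MockActuator(Actuator):
    """Deterministic stand-in used when HiL cannot be integrated in the pipeline."""
    def move_to(self, position_mm: float) -> float:
        return position_mm  # ideal behavior: reaches exactly the requested position

def make_actuator() -> Actuator:
    # The pipeline selects the implementation, e.g., via an environment variable.
    return RealActuator() if os.getenv("USE_HIL") == "1" else MockActuator()

actuator = make_actuator()
assert abs(actuator.move_to(12.5) - 12.5) < 0.1
```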
4.2.3 Pipeline-Related Challenges and Related Mitigation.
Table 7 summarizes the pipeline-related challenges faced by the 10 organizations. The challenges have been grouped into five categories, each related to a specific aspect of the CI/CD pipeline setting and evolution: pipeline properties, thoroughness, simulators, HiL, and flaky behavior. In the following, we discuss each identified challenge, together with some examples from the study participants’ experiences and the related mitigation strategies.
Pipeline Properties. This category accounts for six different challenges, two of which deal with the build execution time (PC\(_1\) and PC\(_2\)), whereas the remaining four are related to the overall pipeline configuration. Four out of 10 organizations faced long build execution times, influencing the type of tasks automated within the pipeline. For example, O\(_6\) mentioned: “Slow builds hinder the inclusion of running non-functional testing in the pipeline.” Although this is also considered a relevant challenge for conventional applications [14, 77, 85], for CPSs the problem can be further exacerbated when deploying and executing software on simulators or HiL. The latter confirms what was already found by Mårtensson et al. [52], who highlight how, when working with a highly integrated (tightly coupled) system, a small delivery to the main track may cause the building and linking of a large part of the system, resulting in long build times. This has also been mentioned by O\(_2\), where the components developed by their 70 teams are integrated into a single integration branch at a single join point: “each component has a test service so running unit tests is very fast but we have a huge amount of high-level testing that is easy to write but kills us in terms of execution time.”
Looking at the results of the survey (Table 8), the interviewed organizations mentioned a wide set of actions to deal with the preceding challenge. One possibility is to prioritize and select only a subset of the test cases in the test suite to be executed (used also by O\(_1\), O\(_2\), and O\(_7\), and considered a useful action by O\(_4\) and O\(_8\)). A different approach, highlighted by O\(_2\), deals with the introduction of parallelization within the overall build process: “We have 20 test machines in parallel for managing the overall test size, especially for nightly builds.” The latter is also used by O\(_4\), O\(_5\), O\(_6\), and O\(_7\), whereas O\(_8\) only considers it useful. It is also possible to run the whole build process only within nightly builds, even if this may be controversial since it defeats the CI/CD purpose [13]. However, this is considered acceptable for O\(_1\), as its pipeline is limited in scope (i.e., used only for V&V purposes). In addition, O\(_2\), O\(_5\), O\(_6\), and O\(_7\) rely on nightly builds to execute time-intensive tasks while adopting incremental builds during working hours (O\(_2\), O\(_5\), and O\(_{10}\)). The latter is also used by O\(_7\) and O\(_8\), whereas O\(_4\) and O\(_6\) consider the mitigation useful even if they have never adopted it.
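The sketch below illustrates two of these mitigations in combination: the test suite is partitioned into shards executed in parallel (in the spirit of O\(_2\)'s 20 parallel test machines), while the amount of testing depends on the build type (a reduced suite for incremental builds, the full suite for nightly builds). The shard count, test names, and build-type switch are hypothetical, and the test bodies are placeholders.

```python
# Minimal sketch: run a reduced suite in incremental builds and the full suite in
# nightly builds, splitting the chosen suite into shards executed in parallel.
from concurrent.futures import ThreadPoolExecutor

FULL_SUITE = [f"test_{i:03d}" for i in range(40)]
SMOKE_SUITE = FULL_SUITE[:8]          # reduced suite for incremental builds

def run_test(name: str) -> bool:
    return True                        # placeholder: invoke the real test runner here

def run_shard(shard) -> bool:
    return all(run_test(t) for t in shard)

def run_build(build_type: str, n_shards: int = 4) -> bool:
    suite = FULL_SUITE if build_type == "nightly" else SMOKE_SUITE
    shards = [suite[i::n_shards] for i in range(n_shards)]   # round-robin partition
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        return all(pool.map(run_shard, shards))

print("incremental build green:", run_build("incremental"))
print("nightly build green:", run_build("nightly"))
```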
A different challenge, experienced by O\(_9\), is related to build time variability (PC\(_2\)). Although it can also affect conventional systems, it is more critical for CPSs and is due to the adopted infrastructure: “since our platform works in the cloud we need to know how much time it is required to acquire and elaborate a huge amount of data points.”
Moving to the overall pipeline configuration, in the absence of clear coding standards or guidelines, the adoption of code style checking tools becomes problematic, if not infeasible (PC\(_3\)). In this scenario, approaches for coding style inference may be desirable [61, 83]. Similar considerations apply to bug-finding tools, which are sometimes inapplicable to CPSs for automating code review, as experienced by O\(_7\): “we need expertise on the developers’ side for determining whether or not a train is behaving in the expected way.” The latter is strictly related to PRC\(_2\): in the presence of safety-critical systems, like the ones in the aerospace and railways domains, it is very difficult to find skilled domain experts from both the hardware and software viewpoints.
The lack of access to production code (as experienced by O\(_1\)) limits the ability to properly set up static analysis or testing tools (PC\(_4\)): “One big challenge is that we need to guarantee the protection of the source code: How to test a component without having its production code?” The latter is a specialization of the “restricted access to information due to security aspects” impediment found by Mårtensson et al. [52]. Along the same lines, there is a challenge (PC\(_5\)) related to the extent to which technology restrictions, or restrictions coming from the application domain, may impact the pipeline setting. For instance, O\(_2\) mentioned that “the Windows situation does not help us with dockerization,” and at the same time, they are having trouble properly configuring the CI/CD pipeline for CPSs since “[they] need to follow medical application frameworks providing a base set of rules in terms of how to build applications and how to integrate them.” The latter leads to the last challenge, related to the impossibility of reusing previously built artifacts (PC\(_6\)) in the integration branch (i.e., O\(_2\) mentioned: “It’s a huge pain that we do not reuse artifacts”), mainly due to constraints imposed by the domain.
Thoroughness. This category groups six challenges related to (i) ensuring the overall accuracy and completeness of the CI/CD pipeline (PC\(_7\), PC\(_8\), PC\(_9\)) scattered across eight organizations, and (ii) closing the DevOps loop by gathering data from the hardware (i.e., PC\(_{10}\), PC\(_{11}\), and PC\(_{12}\) experienced by 3 out of 10 organizations).
O\(_1\) faces a challenge related to having a development environment detached from the execution environment (PC\(_7\)). Another challenge (PC\(_8\), experienced by O\(_2\) and O\(_6\)) occurs in the presence of incremental deployment, which makes it difficult to detect and isolate deployment errors. Furthermore, O\(_6\) reported how this may even make it necessary to reconfigure the entire pipeline: “you deploy blocks, if there is an error in one of the blocks detecting it and reconfigure and reset the pipeline is a problem.” Finally, continuous installation (PC\(_9\)) cannot be achieved due to the late deployment strategy (PRC\(_{15}\)). This is because changes to the environment impact the pipeline configuration, which needs to be adapted every time. For what concerns continuous installation problems, O\(_2\), O\(_3\), O\(_4\), O\(_5\), O\(_7\), and O\(_8\) have encountered them, with O\(_4\) pointing out that using containerization makes it possible to facilitate the switching between the software versions to deploy, meaning that it becomes possible to handle the variability of the environment in terms of dependencies. As shown in Table 8, containerization is also used by O\(_5\), whereas O\(_3\), O\(_7\), and O\(_8\) consider it a viable solution.
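As a minimal illustration of the containerization mitigation mentioned by O\(_4\), the sketch below selects a container image tag per software version and runs an installation check inside it, so that switching between versions (and their dependencies) amounts to switching image tags. The image name, version tags, and the command executed inside the container are hypothetical, and the sketch assumes Docker is available on the build machine.

```python
# Minimal sketch: use containers to switch between software versions to deploy.
# Image name, tags, and the in-container command are hypothetical.
import subprocess

IMAGE = "registry.example.com/cps-controller"   # hypothetical image

def deploy_in_container(version: str) -> int:
    """Run the installation check inside the image built for `version`."""
    cmd = ["docker", "run", "--rm", f"{IMAGE}:{version}", "/opt/app/install_check.sh"]
    return subprocess.run(cmd).returncode

for version in ("1.4.2", "1.5.0-rc1"):          # candidate versions (hypothetical)
    if deploy_in_container(version) != 0:
        print(f"installation check failed for {version}")
```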
Moving on to the need for closing the DevOps loop, the interviews indicated three different challenges hindering the acquisition of data from the physical environment (or hardware device). Working in a CPS context implies a tight interaction with multiple hardware devices (i.e., sensors and actuators), and gathering data from them can be problematic due to the presence of many external environmental factors that must be taken into account, as well as the need for invasive measurement instruments directly in the field. Specifically, O\(_5\) stressed the introduction of performance degradation (PC\(_{10}\)) due to invasive measurement instruments: “The challenge is that monitoring becomes invasive with respect to the system performance,” as well as the presence of noise in the collected data (PC\(_{12}\)): “There are architectural ways to deal with that so that if some sensor does not update on time, you still can make a relatively informed decision. But even then, you have to make sure that the drift is not over a certain size because then you cannot make reasonable decisions anymore.” O\(_4\) and O\(_9\) highlighted the presence of uncontrollable factors in a CPS execution environment (PC\(_{11}\)), making it challenging to close the DevOps loop. For instance, O\(_4\) reported: “Differently from other software applications, there is data that we cannot control such as the presence of something on the floor that the robot is not able to perceive so it will fail. You have to analyze the video data and this is very hard.”
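The drift concern raised by O\(_5\) can be made concrete with a small plausibility check applied to data collected from the field: a reading is used for decisions only if it is recent enough and within a physically plausible range, and is flagged otherwise. This is our own sketch, not a practice reported by the participants; the thresholds and field names are hypothetical.

```python
# Minimal sketch of a staleness/plausibility check on monitored sensor data,
# along the lines of O5's remark on bounding drift. Thresholds are hypothetical.
import time

MAX_AGE_S = 2.0                      # reject readings older than this
PLAUSIBLE_RANGE = (-40.0, 120.0)     # e.g., plausible temperature range in Celsius

def usable(reading, now=None):
    now = time.time() if now is None else now
    fresh = (now - reading["timestamp"]) <= MAX_AGE_S
    lo, hi = PLAUSIBLE_RANGE
    plausible = lo <= reading["value"] <= hi
    return fresh and plausible

reading = {"sensor": "temp_1", "value": 36.5, "timestamp": time.time()}
print("use reading:", usable(reading))
```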
Simulators. This category groups five challenges related to simulators’ issues and limitations, which are stressed more in the CPS domain due to the high environment complexity [74]; this very often results in scenarios that cannot be emulated, for example in the presence of many external environmental factors to be controlled. Specifically, the need to develop simulators in-house, or the lack of specific skills, may lead to simulators that are limited in their functionality (PC\(_{13}\)). For instance, O\(_8\) stated, “we prefer to spend time in testing on real hardware instead of spending time in developing complex simulators,” whereas O\(_4\) reported, “Walking is not so easy to simulate, so we need a real walking robot for spotting bugs.” As shown in Table 8, it is common practice to overcome the preceding challenge by adopting a pipeline that relies on both simulators and HiL in different build stages. A clear example of this happens in O\(_7\), where the build process is made up of three different build stages, each adopting a specific execution environment (see Section 4.2).
A lack of knowledge about the device/system to simulate can lead to wrong assumptions, affecting the simulator’s correctness (PC\(_{14}\)), as experienced by O\(_{10}\): “This happens more at the beginning of a project when you are not too familiar with the device and you make assumptions on how it works.” These problems might have an impact on the whole CI/CD pipeline setting and trustworthiness, because it is possible to have deviations in the monitored system behavior between the real hardware and the simulators.
As experienced by O\(_5\), the limited capability to simulate real-time properties (PC\(_{15}\)) hinders the applicability of simulators or at least raises the need for further tests on HiL. The latter is also confirmed by O\(_9\): “for what concerns the simulation for the RFID we think that the simulation will not give us any benefits due to their unpredictable behavior.”
As with PC\(_{13}\), the high level of interaction between different components (PC\(_{16}\)) forces organizations to directly test feature interaction by using real devices instead of simulating them. Indeed, when using simulators for CPSs, it is important to remark that they have to interact with a highly complex environment that must be simulated as well. As an example, O\(_6\) mentions problems faced when simulating a car’s behavior: “for the CAN data, what do you want to wish to happen here? If you are driving around something you need to know how fast the wheels are turning, as well as what the engine revolutions are together with other sensitive data you might pick up over the canvas. There are a lot of details that are very application dependent.” Also in this case, as shown in Table 8, organizations rely on pipeline configurations including different execution environments—that is, five out of the eight organizations facing the challenge declare that this is a useful mitigation strategy (O\(_2\), O\(_3\), O\(_4\), O\(_8\), O\(_9\)).
If an organization has to test third-party software, as in the case of O\(_1\), there may be the need to run the simulated environment on a remote machine, which may be problematic when attempting to properly integrate it into a local pipeline (PC\(_{17}\)), due to network security restrictions. Such a scenario typically occurs in the development of safety-critical systems (which very often are CPSs), because the software needs to be tested by somebody different from the development organization. To deal with this problem, O\(_1\) mentions the usage of “timeout” within the pipeline. As shown in Table 8, O\(_1\) and O\(_5\) handle external simulator unavailability through timeouts, whereas O\(_7\) and O\(_{10}\) consider this useful yet do not use it. O\(_1\) also mentions that they often “request some customization at the customer side of their simulators. Sometimes it is accepted, most of the times not.”
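A minimal sketch of the timeout mitigation reported by O\(_1\) and O\(_5\): the pipeline step that talks to the externally hosted simulator is bounded by a timeout, and on expiry the step is reported as “environment unavailable” rather than hanging or being counted as a test failure. The remote host and the command are hypothetical.

```python
# Minimal sketch: bound the interaction with a remote simulator by a timeout so
# that its unavailability does not block the pipeline. Host/command are hypothetical.
import subprocess

def run_remote_simulation(timeout_s: int = 600) -> str:
    cmd = ["ssh", "simhost.example.com", "run_simulation --scenario braking"]
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        return "environment-unavailable"     # reported instead of a test failure
    return "passed" if result.returncode == 0 else "failed"

print(run_remote_simulation(timeout_s=5))
```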
HiL. This category groups four challenges related to issues and limitations of using HiL in the CI/CD pipeline. As shown in Table 7, three out of the four challenges in this category are experienced by multiple organizations, whereas PC\(_{18}\) is organization dependent. Specifically, O\(_{10}\) faces problems with checking hardware availability before running tests (PC\(_{18}\)): “One of the biggest problems, when any particular hardware is involved, is that the hardware may either not be available, or it may be switched off.” From a different perspective, as experienced by O\(_7\), O\(_8\), and O\(_9\), deployment on HiL may be challenging (PC\(_{19}\)). Specifically, in O\(_8\), “remote installation cannot be used with real systems,” whereas in O\(_9\), “The other challenge is related to having a fully automated deployment over the customers’ server in which it is possible to have full control on what is going on and try to identify, as soon as possible, failures/errors occurring during the deployment.”
Testing on HiL (PC\(_{20}\)) is considered very demanding. O\(_2\) reports, “If you translate test strategies to the hardware it is very demanding,” and this is mostly a consequence of the limited human resources available. However, there are cases where testing on HiL is constrained by the high cost and lack of scalability (PC\(_{21}\)) of the hardware devices/systems: “This costs and does not scale” for O\(_2\), or “it is very costly to test on trains” for O\(_7\).
As shown in Table 8, the study participants identified two possible strategies to deal with these cost and scalability problems: (i) relying on a mixed pipeline where continuous builds run on simulators and some periodic builds run on HiL (used by O\(_2\), O\(_4\), and O\(_9\), and considered useful by O\(_1\), O\(_3\), O\(_5\), and O\(_8\)), or (ii) adopting the green build rule when transitioning between simulators and HiL [15], as highlighted by O\(_7\): “Only when the tests in the virtual train are green can we move to the next step,” and also used by O\(_2\), O\(_3\), and O\(_4\). The alternative would be, as pointed out by the O\(_5\) survey respondent, “working with virtual devices instead of real hardware devices.”
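The two strategies can be combined into a simple gating policy, sketched below: every build runs the simulator stage, while the HiL stage runs only in periodic (e.g., nightly) builds and only if the simulator stage is green, following the green build rule. The stage bodies and trigger names are placeholders, not taken from any specific organization.

```python
# Minimal sketch of a mixed simulator/HiL pipeline applying the green build rule:
# HiL tests run only periodically and only after the simulator stage is green.
# Stage bodies are placeholders for the real test jobs.

def run_simulator_stage() -> bool:
    return True        # placeholder: run the test suite against simulators

def run_hil_stage() -> bool:
    return True        # placeholder: run the test suite on hardware-in-the-loop

def run_pipeline(trigger: str) -> str:
    if not run_simulator_stage():
        return "red (simulator stage failed; HiL stage skipped)"
    if trigger != "nightly":
        return "green (simulator stage only)"
    return "green (simulator + HiL)" if run_hil_stage() else "red (HiL stage failed)"

print(run_pipeline("commit"))    # continuous build: simulators only
print(run_pipeline("nightly"))   # periodic build: simulators, then HiL if green
```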
Flaky Behavior. This category accounts for seven different root causes that may lead to non-determinism in the build execution for CPS development. Flakiness related to non-determinism during test execution [89] has been largely studied [16, 46, 48, 58, 90], and approaches to detect and cope with it have been proposed [47, 49, 59, 65, 87]. Although, similar to conventional software, dependency installation within the pipeline (PC\(_{22}\)) may result in pipelines having a flaky behavior (e.g., for O\(_4\), “ROS uses GitHub repositories for dependency resolution so when GitHub or the repositories are down our build jobs will fail due to the impossibility of resolving dependencies”), as may limited control over external resources (PC\(_{26}\)) (e.g., “the most important root cause we experienced is related to the load on the server-side”), the root causes behind flaky behavior in CPSs may differ from those of conventional software. Specifically, a CI/CD pipeline for CPSs can suffer from flakiness due to the following:
• The complex interacting environment (PC\(_{23}\)) (i.e., CPSs are systems of systems with tight interactions among different components). For example, for O\(_2\), “the complexity of [the] subsystems whose features interact across many indirections may lead to non-deterministic behaviors.”
• HiL unavailability (PC\(_{24}\)): without a proper check of the availability of the hardware, the build outcome might fail intermittently because the pipeline is not able to properly communicate with the device. O\(_{10}\) reported: “We experienced flakiness in terms of non-deterministic behavior mainly due to hardware not being available.” In this specific scenario, it is important to properly discriminate intermittent failures caused by communication issues with the HiL from failures due to wrongly implemented functionality (see the sketch after this list).
• Presence of noise in the measurements (PC\(_{25}\)) when using HiL (i.e., difficulty in removing the effect of external environmental factors from the data read from the sensors, as experienced by O\(_5\) and O\(_{10}\)). Specifically, for O\(_{10}\), “Other times the charge level that you read out would go a little bit higher or there is noise in the measurements,” whereas for O\(_5\), “you need to understand what your sensors are sensing and what the acceptable range of inputs are.”
• Network issues (PC\(_{27}\)), where, for instance, glitches in the network lead to connections being lost, as reported by O\(_{10}\). This is exacerbated in the CPS domain, where one needs to manage the communication occurring across a huge number of different hardware devices operating in a complex environment.
• Simulators not coping with timing issues (PC\(_{28}\)). For example, O\(_{10}\) stated: “the last problem is related to multi-threaded programming.”
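The sketch referenced in the PC\(_{24}\) item above shows one way to keep hardware unavailability from masquerading as a test failure: before running HiL tests, the pipeline probes the device and, if it is unreachable, reports an infrastructure failure instead of a functional one. The host, port, and outcome labels are hypothetical.

```python
# Minimal sketch: probe the HiL device before testing and separate infrastructure
# failures from functional ones. Host, port, and labels are hypothetical.
import socket

def hardware_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

def run_hil_tests() -> bool:
    return True            # placeholder for the actual HiL test execution

def hil_stage(host="hil-rig-01.local", port=5555) -> str:
    if not hardware_reachable(host, port):
        return "infrastructure-failure (HiL unreachable; retry or skip, do not fail tests)"
    return "passed" if run_hil_tests() else "functional-failure"

print(hil_stage())
```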
For what concerns flakiness mitigation, as highlighted in Table 8, when the problem is related to the lack of control over resources (PC\(_{26}\)), the solutions adopted are (i) changing and fixing the pipeline configuration (i.e., O\(_7\) stated, “The misbehavior is reported back to the integration team responsible for the Jenkins configuration to find a solution”), as well as (ii) fixing the root cause of the flaky behavior within the code: “to not experience it anymore in the system” from O\(_2\). When the root cause of the flaky behavior is in the networking (PC\(_{27}\)), the organizations leverage the “usual” retries (O\(_2\), O\(_4\), O\(_5\), O\(_7\), O\(_{10}\))—for example, “of course we have some retry for network issues” for O\(_4\), or “For what concerns flaky connections, you have to be concerned about missed messages and retries” for O\(_5\). O\(_6\) instead only considers it a viable solution. Furthermore, the respondent belonging to O\(_2\) mentioned as an alternative solution the “introduction of quarantine builds together with an appropriate process of how to deal with these tests.”
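The “usual” retries mentioned for network-related flakiness (PC\(_{27}\)) typically look like the sketch below: a bounded number of attempts with backoff around the network-dependent step, so that a transient glitch does not turn the build red. The retried operation, attempt count, and delays are hypothetical; quarantine builds, as mentioned by O\(_2\), would instead move persistently flaky tests into a separate, non-gating job.

```python
# Minimal sketch of bounded retries with backoff around a network-dependent
# pipeline step (e.g., fetching dependencies or contacting a device over the network).
# The operation, attempt count, and delays are hypothetical.
import time

def with_retries(operation, attempts=3, base_delay_s=2.0):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise                                # give up: likely a real outage, not a glitch
            time.sleep(base_delay_s * attempt)       # linear backoff between attempts

def fetch_dependencies():
    return "dependencies resolved"                   # placeholder for the real network call

print(with_retries(fetch_dependencies))
```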