Software Rejuvenation
3 Followers
Recent papers in Software Rejuvenation
In this paper we present a comparative experimental study of the main software rejuve-nation techniques developed so far to mitigate the software aging effects. We consider six different rejuvenation techniques with different levels of... more
In this paper we present a comparative experimental study of the main software rejuve-nation techniques developed so far to mitigate the software aging effects. We consider six different rejuvenation techniques with different levels of granularity: (i) physical node re-boot, (ii) virtual machine reboot, (iii) OS reboot, (iv) fast OS reboot, (v) standalone application restart, and (vi) application rejuvenation by a hot standby server. We conduct a set of experiments injecting memory leaks at the application level. We evaluate the performance overhead introduced by software rejuvenation in terms of throughput loss, failed requests, slow requests, and memory fragmentation overhead. We also analyze the selected rejuve-nation techniques' efficiency in mitigating the aging effects. Due to the growing adoption of virtualization technology, we also analyze the overhead of the rejuvenation techniques in virtualized environments. The results show that the performance overheads introduced by the rejuvenation techniques are related to the granularity level. We also capture different levels of memory fragmentation overhead induced by the virtualization demonstrating some drawbacks of using virtualization in comparison with non-virtualized rejuvenation approaches. Finally, based on these research findings we present comprehensive guidelines to support decision making during the design of rejuvenation scheduling algorithms, as well as in selecting the appropriate rejuvenation mechanism.
- by Kishor S Trivedi and +1
- •
- Software aging, Software Rejuvenation
—Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failure, is... more
—Recently, the phenomenon of software aging, one in which the state of the software system degrades with time, has been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failure, is the result of exhaustion of operating system resources, data corruption, and numerical error accumulation. To counteract software aging, a technique called software rejuvenation has been proposed, which essentially involves occasionally terminating an application or a system, cleaning its internal state and/or its environment, and restarting it. Since rejuvenation incurs an overhead, an important research issue is to determine optimal times to initiate this action. In this paper, we first describe how to include faults attributed to software aging in the framework of Gray's software fault classification (deterministic and transient), and study the treatment and recovery strategies for each of the fault classes. We then construct a semi-Markov reward model based on workload and resource usage data collected from the UNIX operating system. We identify different workload states using statistical cluster analysis, estimate transition probabilities, and sojourn time distributions from the data. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource depletion in each state. The model is then solved to obtain estimated times to exhaustion for each resource. The results of the semi-Markov reward model are then fed into a higher-level availability model that accounts for failure followed by reactive recovery, as well as proactive recovery. This comprehensive model is then used to derive optimal rejuvenation schedules that maximize availability or minimize downtime cost.
Long and continuous running of software can cause software aging-induced errors and failures. Cloud data centers suffer from these kinds of failures when Virtual Machine Monitors (VMMs), which control the execution of Virtual Machines... more
Long and continuous running of software can cause software aging-induced errors and failures. Cloud data centers suffer from these kinds of failures when Virtual Machine Monitors (VMMs), which control the execution of Virtual Machines (VMs), age. Software rejuvenation is a proactive fault management technique that can prevent the occurrence of future failures by terminating VMMs, cleaning up their internal states, and restarting them. However, the appropriate time and type of VMM rejuvenation can affect performance, availability, and power consumption of a system. In this paper, an analytical model is proposed based on Stochastic Activity Networks for performance evaluation of Infrastructure-as-a-Service cloud systems. Using the proposed model,
a two-threshold power-aware software rejuvenation scheme is presented. Many details of real cloud systems, such as VM multiplexing, migration of VMs between VMMs, VM heterogeneity, failure of VMMs, failure of VM migration, and different probabilities for arrival of different VM request types are investigated using the proposed model. The performance of the proposed rejuvenation scheme is compared with two baselines based on diverse performance, availability, and power consumption measures defined on the system.
a two-threshold power-aware software rejuvenation scheme is presented. Many details of real cloud systems, such as VM multiplexing, migration of VMs between VMMs, VM heterogeneity, failure of VMMs, failure of VM migration, and different probabilities for arrival of different VM request types are investigated using the proposed model. The performance of the proposed rejuvenation scheme is compared with two baselines based on diverse performance, availability, and power consumption measures defined on the system.
—Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software " aging. " However, it incurs some overhead. The necessity to do... more
—Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software " aging. " However, it incurs some overhead. The necessity to do preventive maintenance, not only in general purpose software systems of mass use, but also in safety-critical and highly available systems, clearly indicates the need to follow an analysis based approach to determine the optimal times to perform preventive maintenance. In this paper, we present an analytical model of a software system which serves transactions. Due to aging, not only the service rate of the software decreases with time, but also the software itself experiences crash/hang failures which result in its unavailability. Two policies for preventive maintenance are modeled and expressions for resulting steady state availability, probability that an arriving transaction is lost and an upper bound on the expected response time of a transition are derived. Numerical examples are presented to illustrate the applicability of the models.