PREFiguRE: An Analytic Framework for HDD Management
FENG YAN, College of William and Mary
XENIA MOUNTROUIDOU, Jacksonville University
ALMA RISKA, NetApp Corporation
EVGENIA SMIRNI, College of William and Mary
Low disk drive utilization suggests that placing the drive into a power saving mode during idle times may
decrease power consumption. We present PREFiguRE, a robust framework that aims at harvesting future
idle intervals for power savings while meeting strict quality constraints: first, it contains potential delays
in serving IO requests that occur during power savings since the time to bring up the disk is not negligible,
and second, it ensures that the power saving mechanism is triggered only a limited number of times, such that the disk
wear-out due to powering up and down does not compromise the disk’s lifetime. PREFiguRE is based on an
analytic methodology that uses the histogram of idle times to determine schedules for power saving modes
as a function of the preceding constraints. PREFiguRE facilitates analysis for the evaluation of the trade-offs
between power savings and quality targets for the current workload. Extensive experimentation on a set of
enterprise storage traces illustrates PREFiguRE's effectiveness in consistently achieving high power savings
without undermining disk reliability and performance.
Categories and Subject Descriptors: C.4 [Computer Systems Organization]: Performance of Systems
General Terms: Design, Algorithms, Performance
Additional Key Words and Phrases: Performance modeling, scheduling, disk drives, power savings, reliability,
histograms
ACM Reference Format:
Feng Yan, Xenia Mountrouidou, Alma Riska, and Evgenia Smirni. 2016. PREFiguRE: An analytic framework
for HDD management. ACM Trans. Model. Perform. Eval. Comput. Syst. 1, 3, Article 10 (May 2016), 27 pages.
DOI: http://dx.doi.org/10.1145/2872331
1. INTRODUCTION
Storage systems in data centers host thousands of disk drives. Despite emerging new
storage technologies, such as solid state drives (SSDs), it is the hard disk drives (HDDs)
that continue to store the overwhelming majority of corporate data [Vasudeva 2011;
Grupp et al. 2012; Narayanan et al. 2009]. Specifically, HDDs are expected to store
aging data (from a few weeks to several years old) whose volume grows over the years. Given the
characteristics of the data stored in HDDs, not all data in a vast data center is accessed
simultaneously. Consequently, a
This work was supported by the National Science Foundation under grants CCF-0937925 and CCF-1218758.
Authors’ addresses: F. Yan, Room 140, McGlothlin-Street Hall, Department of Computer Science, College of
William and Mary, Williamsburg, VA, 23187; email: fyan@cs.wm.edu; X. Mountrouidou, Wofford College,
Department of Computer Science, 445 Melbourne Ln., Spartanburg, SC, 29301; email: xenia.mountrouidou@
gmail.com; A. Riska, 57 Pine Plain Rd., Wellesley, MA, 02481; email: alma.dimnaku@gmail.com; E. Smirni,
P.O. Box 8795, Department of Computer Science, College of William and Mary, Williamsburg, VA, 23187;
email: esmirni@cs.wm.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for
components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this
work in other works requires prior specific permission and/or a fee. Permissions may be requested from
Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212)
869-0481, or permissions@acm.org.
© 2016 ACM 2376-3639/2016/05-ART10 $15.00
DOI: http://dx.doi.org/10.1145/2872331
compelling approach for reducing power consumption in data centers is to spin down
idle HDDs. This approach is routinely deployed in storage systems that serve as
archival or backup systems [Colarelli and Grunwald 2002] and is being exploited even
in high-end computing environments [Narayanan et al. 2008].
Spinning down disk drives to save energy in a high-end environment, transparently to the end user
and without compromising the disk drive's lifetime, is a challenging open problem
for a host of reasons. First, in enterprise environments, requests that arrive while the
drive is in a power saving mode are inevitably delayed during the time it takes for the disk drive
to reactivate (i.e., to become physically ready to serve requests again). Second,
idle times can be highly fragmented while the overall drive utilization is very low, and
therefore idle periods that are long enough to be used effectively for power savings
may be very few [Riska and Smirni 2010]. Third, every power up/down wears out the
disk drive, which implies strict limitations on the number of times a disk drive can be
placed into a power saving mode without affecting its reliability.
Common practice methods try to address these challenges by idle waiting for a fixed amount of time
or by using past utilization to guide future scheduling decisions. However, these common practice
methods can neither provide performance guarantees nor take disk reliability into consideration.
To overcome these shortcomings, we present PREFiguRE, a framework that uses as input user- or
system-level constraints (e.g., the number of allowable power ups/downs of a disk within a time
period (strict constraint) and the user-acceptable potential performance degradation of future IOs
(soft constraint)), estimates the projected power savings, and provides a strategy for how these power
savings should occur. PREFiguRE uses as a basic tool the histogram of past idle times
and projects future power savings based on statistical information that is monitored
or extracted from this histogram. Probabilistic interpretation of all of the preceding
information leads PREFiguRE to define robust schedules for power saving modes. As
the workload changes in the system, the histogram of idle times and information about
the sequence of idle times are updated. Such updates enable the adjustment of the
schedules of power saving activation to the workload dynamics.
The core of PREFiguRE is a robust, accurate, and computationally efficient analytic
model that enables the identification of effective, user-transparent schedules of power
saving modes in disk drives. Most importantly, the analytic model that is encapsulated
in PREFiguRE encompasses a strong reliability component to comply with the restric-
tions on the number of times a hard disk can go into a specific power saving mode
during its lifetime [Kim and Suk 2007]. In addition, thanks to the excellent prediction
accuracy of the model, it is possible to answer a wide range of questions regarding the
power saving capabilities of the current disk workload. For instance, if the power gains
are projected to be marginal, then it may not be worth engaging the system in any power
savings mode or it may signal that part of the workload should be offloaded (to a buffer
or to another disk) such that idle times, and consequently, power savings, are increased.
Although the main contribution of our framework lies in its theoretical aspect, we
also conduct trace-driven simulations to verify its practical benefit. We drive the eval-
uation of PREFiguRE via a set of enterprise disk drive traces with a wide range of
idleness characteristics. The excellent agreement between the results from PREFig-
uRE’s analytic estimations and the trace-driven simulations suggests that our analytic
methodology achieves good accuracy and robustness even under real-world workloads.
This article is organized as follows. Section 2 summarizes the power saving oppor-
tunities in disk drives and storage systems. In Section 3, we present the methodology
that we propose to identify and estimate the power saving opportunities in a system
under a given workload. We validate the effectiveness of the approach and illustrate
its robustness in Section 4 using trace-driven analysis and simulations. Section 5
Table I. Characteristics of Power Saving Modes

Mode      Description          Power Savings   Operation Penalty (seconds)
Level 1   Serving IOs             0%                 0.0
Level 2   Active (but) idle      40%                 0.0
Level 3   Unloaded heads         48%                 0.5
Level 4   Slowed platters        60%                 1
Level 5   Stopped platters       70%                 8
Level 6   Shut down              95%                25
positions our contributions relative to related work. Conclusions and future work
are given in Section 6.
2. POWER SAVING MODES IN DISK DRIVES
Disk drives represent the overwhelming majority of the storage devices deployed in
large data centers where power conservation is a priority. Individual disk drives consume a
moderate amount of power compared to other components in a computer system.
However, disk drives tend to be more idle than other system components. This is partic-
ularly true in large data centers that deploy thousands of disk drives and host terabytes
and petabytes of data, which are not all accessed simultaneously.
Disk drives are complex hardware devices that consist of both mechanical and elec-
tronic components. The mechanical components, such as the platters that rotate at
high speeds, or the positioning arm that is kept at a specific distance away from the
platters, continue to consume power even when not accessing data. Similarly, the elec-
tronics in a disk drive consume power even during periods of idleness. Overall, disk
drives consume less power when they are idle than when they serve IOs.
Beyond the moderate power savings when an active disk is idle (i.e., the “active idle”
state), additional power can be saved by slowing down components in a disk drive, such
as platter rotation, or by unloading and parking the heads (and the positioning arm) on
the side instead of flying them at constant height over the platters. Finally, completely
shutting down the disk drive eliminates almost the entire power consumption from the
disk drive. Slowing or shutting down the disk comes nonetheless with a performance
cost to user IOs, because bringing the disk back to its active state takes time, which
ranges from hundreds of milliseconds to tens of seconds. This required time period to
reactivate a disk drive can be viewed as an unavoidable performance penalty paid by
those IOs that by arrival find the disk drive that stores their data in an inactive (i.e.,
power saving) mode.
There are several levels of power consumption depending on the state of the disk’s
mechanical and electronic components. Each power consumption level or mode is char-
acterized by the amount of power it consumes and the amount of time it takes to get
out of the power saving mode and become ready to serve IOs. The exact amount of
power saved in a given power saving mode, or the amount of time it takes to become
ready again, differs between disk drive families and manufacturers. Table I presents a
coarse description of the possible power saving modes focusing on the components that
are slowed down or shut off and the penalties associated with each power saving mode.
The reported penalty values are within representative ranges published by disk drive
manufacturers [Seagate Technology 2012, 2014; Hitachi Global Storage Technologies
2007]. For example, the penalty (in seconds) for Level 6 is between 23 (typical) and
30 (max) [Seagate Technology 2014].
Note that during the process of bringing a disk drive out of a power saving mode,
the consumed power surges before settling to a normal consumption level. As with the
power savings in Table I, this power surge during reactivation depends on the drive
family and manufacturer.
The time it takes a disk to become active following a power saving mode makes
obvious the need to account for the performance penalty before deciding on a disk
operation mode for power savings. One could argue that putting the disk into an idle
mode immediately after any idleness is detected could maximize power savings. Given
the stochastic nature of the length of idle times and the penalty to bring the disk up to
active mode, it is important to use idle intervals that are sufficiently large (i.e., longer
than the reactivation time) for power savings. In storage systems, it is very common
not to put the system into a power saving mode automatically when an idle interval
is observed. Instead, the system waits for a time period in anticipation of future IO
arrivals.
In addition to the performance penalty associated with reactivating a disk drive that
is put in a power saving mode, there is a reliability penalty as well. The latter is not
straightforward to quantify, because it is associated with the wear-out of the disk drives
during power ups (i.e., in the spin-up phase) or reactivation of individual components.
In disk drives, the spin-up/down (Levels 5 and 6 in Table I) certainly involves more components
than loading/unloading the heads (i.e., Level 3 in Table I) or spinning the platters slower while
the heads are parked on the side (i.e., Level 4 in Table I). While spin-up and spin-down have been
analyzed for years as part of the disk drive wear-out process [Li et al. 1994], head
loading/unloading in disk drives is more recent and is introduced solely for the purpose of power
savings [Kim and Suk 2007]. As is discussed in the following sections, in the enterprise
environment, loads/unloads (Level 3) are expected to occur more often because the penalty to bring
the HDD into the active state is smaller than for the other power saving levels. During its
lifetime, a disk drive is expected to survive well beyond 300,000 loads/unloads [Kim and Suk 2007],
which is used as a threshold in the methodology of this article.
In the following section, we present a framework that determines when and for how
long a disk drive should be put into a power saving mode without violating a predefined
quality of service target. The framework takes into consideration both the performance
and reliability penalties associated with disk drive power saving modes.
3. ALGORITHMIC FRAMEWORK
Here, we develop an algorithmic framework that determines the schedule of the periods
when a disk drive is placed in power saving modes such that predefined targets of
system quality metrics are met. There are three system quality metrics used in the
framework. They include the performance degradation D, the portion of time the disk
is placed in power saving modes S, and the reliability constraint X. A definition of these
metrics and other notations used in the framework are given in Table II. Note that it
is not necessary to have all three system quality metrics set. For example, if only the
performance target D and the reliability target X are set, then the framework can meet
those targets while the third one (i.e., power saving S) is maximized. It is also possible
to set all three metrics, but whether all targets can be met depends on the viability of
the workloads. Note also that the application performance can be impacted by many
factors (e.g., CPU, memory, networking), and thus for an unbiased analysis, we focus
only on the disk performance itself, which is measured by the average response time
of IO requests.
In addition to the system quality targets, our framework bases its calculations on a
set of monitored (or predefined) input metrics. In particular, it uses the time penalty P
that is necessary to bring a disk drive out of a specific power saving mode. Recall that
different power saving modes have different penalties P. However, because P depends
on the disk drive model, the correct Ps for a given disk drive can be either received
Table II. Notation Used in Section 3

Input Parameters
  D           Quality metric—performance: relative average response time increase due to power
              savings (in percentage).
  S           Quality metric—power savings: portion of time in power savings (in percentage).
  X           Quality metric—reliability: number of reactivations per time unit a disk can have
              without impacting its lifetime.
  P           Penalty due to power savings (i.e., time to reactivate a disk from a specific power
              saving mode).

Monitored Metrics
  p(j)        Probability of an idle interval of length j.
  CDH(j)      Cumulative probability of an idle interval of length at most j.
  E[idle]     Average idle interval length.
  RT          Average IO request response time.

Intermediate Metrics
  W           Average additional wait time that IO requests experience due to the disk being in a
              power saving mode.
  w_i         Additional waiting time affecting IOs in the i-th busy period following a power
              saving mode.
  Prob_i(w)   Probability of waiting time w for the IOs in the i-th busy period following a power
              saving mode.
  i_j         Length of the j-th idle interval following a power saving mode.
  Prob(LL_l)  Probability that two idle intervals of at least length L are l lags apart.

Output Parameters and Estimated Metrics
  I           Amount of time that should elapse in an idle disk before it is put into a power
              saving mode.
  T           Maximum amount of time that a disk is kept in a power saving mode.
  D(I,T)      Achieved average degradation of response time due to power savings.
  S(I,T)      Achieved time in power savings.

Note: All time units are in milliseconds.
from the manufacturer or measured in offline testing. Note that P is the extra delay
due to power saving. This delay is in addition to any queuing delays that requests
may experience due to bursty or heavy arrivals. Throughout this article, the focus
is on estimating and reducing the delay due to power saving. The set of monitored metrics used in
our framework includes the cumulative data histogram (CDH) of idle times observed in the system
and the average response time RT of IOs (excluding any slowdown effect that previous power saving
modes may have had on average IO response time). The CDH is a list of tuples (at most a few
thousand of them); each tuple contains a range of idle interval lengths and its corresponding
empirical cumulative probability. We stress that this representation is very efficient both
memory-wise and computation-wise: as we show later in this section, the estimation of scheduling
parameters to meet the required targets only requires a few scans of the CDH, which can be
executed almost instantaneously. Note that the CDH of idle times is used to capture the
characteristics of the overall workload in our framework. As a result, the granularity of the CDH
bins determines the accuracy of the estimations and calculations: the coarser the CDH, the less
accurate our solution.
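As a concrete illustration of this representation, the following minimal sketch (illustrative
names, not part of PREFiguRE) builds the per-bin probabilities p(j) and the cumulative values
CDH(j) from a list of observed idle interval lengths, assuming 1ms bins:

# Minimal sketch (illustrative names) of the monitored CDH of idle times with 1ms bins.
class IdleTimeCDH:
    def __init__(self, idle_lengths_ms, max_bin):
        counts = [0] * (max_bin + 1)
        for length in idle_lengths_ms:
            counts[min(int(length), max_bin)] += 1      # clamp very long intervals into the last bin
        n = max(1, len(idle_lengths_ms))
        self.p = [c / n for c in counts]                # p(j): probability of an idle length of j ms
        self.cdh = []                                   # CDH(j): probability of a length of at most j ms
        running = 0.0
        for prob in self.p:
            running += prob
            self.cdh.append(running)
        self.mean_idle = sum(idle_lengths_ms) / n       # E[idle] in ms

Both lists have one entry per bin, so the memory footprint stays at a few thousand values and a
scan of the histogram is a simple loop over these lists.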
The monitored metrics can be easily obtained from the arrival and departure times of
IO requests in the system, which are generally monitored or can be monitored without
complex instrumentation. The framework adapts its decisions to changes in workload
(captured via the histogram of idle times, system utilization, and IO response time)
and other inputs. As a result, the output of the framework (i.e., the schedule of the
Fig. 1. Examples of the relationship between idle period length, request arrivals, and the parameters I, T, and P.
Orange represents busy times, blue represents idle times, and green represents the power saving mode.
power-saving modes) changes if the workload that arrives in the disk drive changes or
if the system quality targets change. For example, in an enterprise storage system, the
performance quality target D can be adjusted to be more stringent during the day (i.e.,
business hours) and less stringent during the night (i.e., nonbusiness hours). Another
example is that the framework can estimate for a given performance target D and
reliability target X the time in power saving if Level 3 is used or if Level 4 is used.
Comparing the resulting time in power saving S allows the system to decide which
power saving mode to use (if any) for the current workload.
In our framework, power saving modes always take advantage of only the idle periods
in a disk drive and are not purposely scheduled if user requests are waiting for service
in the system. This condition must be satisfied even if the target power saving S is set
and not met. Here, we assume that the user workload always has higher priority, although our
framework can be adapted to a situation where power savings have the same or higher priority than
the user workload.
Given this consideration, we model the power saving modes as low-priority tasks
that need P units of time to be preempted. The IO requests arriving in the system
are modeled as high-priority tasks. Because the penalty P to preempt the low-priority
work (i.e., the time to reactivate the disk) is orders of magnitude higher than the
expected service and response time of user IOs, the performance impact that power
saving modes could have on user IOs may be significant. Our framework schedules
power saving modes in disk drives proactively (i.e., average IO slowdown is limited to
the performance target D). The framework achieves its targets by scheduling power
saving modes according to parameters I and T , where
—I represents the amount of time the system remains idle before a power saving mode
starts, and
—T represents the maximum amount of time the disk remains in a power saving mode
(i.e., if an IO arrives before T elapses, the power saving mode is interrupted). T
includes the penalty P, which implies that T > P.
The scheduling pair (I, T ) is recalculated every time the monitored metrics are
updated or the system quality target changes, adapting the scheduling of power saving
modes to the dynamics in the storage system.
Figure 1 demonstrates three examples of the relationship between idle period length,
arriving requests, and parameters I, T , and P. Figure 1(a) shows an idle period smaller
than I, Figure 1(b) shows an idle period larger than I but smaller than I + T , and
Figure 1(c) shows an idle period that is larger than I + T .
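To make the three cases of Figure 1 concrete, the following sketch (illustrative, not from the
article) classifies a single observed idle interval under a scheduling pair (I, T) with
reactivation penalty P, assuming the disk is active again at most T time units after the mode
starts (since T includes P):

# Illustrative sketch: outcome of one idle interval of length idle_ms under a pair (I, T).
def classify_idle_interval(idle_ms, I, T, P):
    assert T > P, "T includes the reactivation penalty P, so T must exceed P"
    if idle_ms <= I:
        # Figure 1(a): the interval ends before the idle wait I expires; no mode is entered.
        return {"power_saving_ms": 0, "delay_ms": 0}
    if idle_ms < I + T:
        # Figure 1(b): an IO arrives before the budget T elapses; the mode is interrupted
        # and the arriving IOs wait (at most P ms) for the disk to reactivate.
        return {"power_saving_ms": min(idle_ms - I, T - P),
                "delay_ms": min(P, I + T - idle_ms)}
    # Figure 1(c): the interval outlasts I + T; the full budget T - P is harvested, no delay.
    return {"power_saving_ms": T - P, "delay_ms": 0}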
3.1. Modeling Waiting Times Due to Power Saving Modes
In our framework, the scheduling pair (I, T ) is calculated such that it guarantees the
quality targets (reliability, performance, and/or amount of power savings). To meet
Fig. 2. (a) No delay propagation. (b) Delay propagates two busy periods.
the performance or power saving target, it is critical to estimate correctly the waiting
time (or delay) caused to IOs arriving during or after a power saving mode. Without
loss of generality, we measure the idle interval length as well as the wait at a 1ms granularity.
The coarser the granularity, the lower the accuracy, but also the lower the expected monitoring
overhead.
Assume that W is the average IO waiting time due to power savings (i.e., W = RT_{w/ power saving} −
RT_{w/o power saving}). Because the disk is reactivated upon an IO arrival, W can be at most P
(i.e., the time it takes the disk to become active). Denoting a possible delay by w and its
respective probability by Prob(w), we have

    W = \sum_{w=1}^{P} w · Prob(w).    (1)
We define a busy period as the time period when there are one or several IO requests
being served without idle time between requests. The power saving mode preemption
time P may be longer than the average idle interval. As a result, the delay due to
a power saving mode may not be absorbed by the immediately following idle period
and may propagate to impact multiple user busy periods. Figure 2 shows an example
of no delay propagation and one where delay propagates two busy periods. As shown
in Figure 2(a), the idle period following the second busy period is longer than the
delay caused by power savings; therefore, the delay is absorbed and does not propagate
further. Figure 2(b) is an example of when delay propagates two busy periods. The idle
period after the second busy period is very short and the delay caused by power savings
propagating into the third busy period, and therefore IO requests in both the second
and third busy periods are delayed. Although all IOs in one busy period get delayed by
the same amount, the delay propagates to multiple busy periods and different delays
may be caused to IOs in future busy periods because of the activation of a single power
saving mode.
To estimate Prob(w) of a delay w, we identify the events that happen during disk
reactivation that result in a delay w and estimate their corresponding probabilities.
These events are the basis for the estimation of the average waiting W due to power
savings. Without loss of generality, we assume that a disk reactivation affects at most K
consecutive user busy periods. The larger K is, the more accurate our framework is. In
general, the larger P is, the larger the value of K should be for better estimation accuracy.
In our estimations, K is set to be equal to P, which represents the largest practical
value that K should take. During disk reactivation, the delay propagates as follows:
—First delay: User IOs arrive during a power saving mode or disk reactivation and
find an empty queue and a disk that is not ready for service. These IOs would have
made up the first user busy period if the disk would have been ready. Their waiting
due to power saving is w1 ms (where the index i = 1 indicates the first busy period
and 1 ≤ w1 ≤ P).
—Second delay: User IOs in the “would-be” second busy period in the absence of the
power saving mode could also be delayed if the preceding wait w1 is longer than the
idle interval i2 that would have followed the preceding first busy period. The waiting
time experienced by the IOs of the second busy period following a power saving mode
is w2 = (w1 − i2 ).
—Further propagation: In general, the delay propagates through multiple consecutive
user busy periods until all intermediate idle periods absorb the initial delay w1 .
Specifically, the delay propagates for K consecutive user busy periods if (i2 + i3 + · · · +
iK ) < w1 < (i2 + i3 + · · · + iK + iK+1 ). The waiting times experienced by the IOs due to
this power saving mode are w j for 1 ≤ j ≤ K.
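The propagation just described can be illustrated with a small sketch (assumed inputs, not part of
PREFiguRE) that computes the waits w_j directly from the first delay w_1 and the lengths of the
subsequent idle intervals:

# Illustrative sketch: propagate the initial wait w1 (ms) across the following busy periods,
# given the lengths (ms) of the intermediate idle intervals i2, i3, ...
def propagate_delay(w1, subsequent_idle_lengths):
    waits = [w1]                       # w1: wait of the first delayed busy period
    remaining = w1
    for idle in subsequent_idle_lengths:
        remaining -= idle              # each intermediate idle interval absorbs part of the delay
        if remaining <= 0:
            break                      # the delay has been fully absorbed
        waits.append(remaining)        # w_j = w1 - (i2 + ... + i_j)
    return waits

# Example: w1 = 400ms and the next idle intervals are 150ms and 500ms.
# The second busy period is delayed by 250ms; the 500ms interval absorbs the rest.
print(propagate_delay(400, [150, 500]))   # -> [400, 250]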
Denoting with Prob_k(w) the probability that wait w occurs to the IOs of the k-th delayed busy
period, we estimate the probability of delay w as Prob(w) = \sum_{k=1}^{K} Prob_k(w). The delay P
may occur only to IOs of the first delayed busy period, because for the IOs of the second (or
higher) delayed busy period, the intermediate idle interval would absorb some of the delay and
would therefore reduce it. The same argument can be used to claim that the delay of P − 1 can
occur only to IOs of the first and second delayed busy periods. In general, it is true that the
delay w = P − k may occur only to IOs of the first k + 1 delayed busy periods (0 ≤ k ≤ K).
The preceding fact is used as the base for our recursion that computes Prob(w) for 1 ≤ w ≤ P. The
base is w = P, and Prob(w = P) = Prob_1(P) because the delay P is caused only to the IOs of the
first delayed busy period. For a scheduling pair (I, T), the delay to the first busy period
following a power saving mode is P for all idle intervals whose length falls between I and
I + T − P. The probability of this event is given as CDH(I + T − P) − CDH(I), where CDH(·)
indicates the cumulative probability value of an idle interval in the monitored histogram.
The delay w caused to the IOs in the first busy period following a power saving mode may be any
value between 1 and P. This delay cannot exceed P, as P is the time that the disk requires to
revert from the power saving mode to the serving mode. Using the CDH of idle times, the
probability of any delay w caused to the IOs of the first busy period is given by the following
equation:

    Prob_1(w) = CDH(I + T − w + 1) − CDH(I + T − w),   for 1 ≤ w < P,
    Prob_1(w) = CDH(I + T − P) − CDH(I),               for w = P.    (2)
If the length i_2 of the idle interval following the first delayed busy period is less than w,
then the IOs of the second busy period may be delayed as well, by w − i_2. The IOs of the second
busy period are delayed by w − i_2 if (1) the idle interval following the first delayed busy
period is i_2, which happens with probability p(i_2), and (2) the first delay was w + i_2, which
happens with probability Prob_1(w + i_2). Since there is independence between the arrival and
service processes, the delay propagation is also independent of the process of idle lengths.
Therefore, the probability Prob_2(w) is given by the following equation:

    Prob_2(w) = \sum_{j=1}^{P−w} Prob_1(w + j) · p(j),    (3)

where Prob_1(w + j) for 1 ≤ j ≤ P − w − 1 is defined in Equation (2) and p(j) is the probability
of an idle interval of length j.
The delay P − 1 can occur only to the IOs of the first busy period with probability
Prob1 (P − 1) and to the second busy period with probability Prob2 (P − 1). Using
Equations (2) and (3), we get
    Prob(P − 1) = Prob_1(P − 1) + Prob(P) · p(1).    (4)
This implies that Prob(P − 1) depends only on Prob1 (.) and Prob(P), which are both
defined in Equation (2), and represents how the base Prob(P) of our recursion is used
to compute the next probability, Prob(P − 1).
Similarly, we determine the probabilities of delays propagated to the IOs of the busy
periods following the power saving mode and establish recursion for all 1 ≤ w ≤ P. For
clarity, we show how we develop the next step recursion and then generalize. Specifi-
cally, delay w is caused to the IOs of the third delayed busy period, and w takes values
from 1 to at most P − 2 (recall that the granularity of the idle interval length is 1ms).
    Prob_3(w) = \sum_{j=1}^{P−w} Prob_1(w + j) · \sum_{j_2=1}^{j−1} Prob_2(j − j_2) · p(j_2).    (5)
The delay of P − 2 does not propagate beyond the third delayed busy period, and its
probability is given as the sum of probabilities of its occurrence to IOs of the first
delayed busy period, Prob1 (P − 2), second delayed busy period, Prob2 (P − 2), and
third delayed busy period, Prob3 (P − 2). Using Equations (2), (3), and (5), we obtain
    Prob(P − 2) = Prob_1(P − 2) + Prob_1(P − 1) · p(1) + Prob_1(P) · p(2) + Prob_1(P) · p(1) · p(1).    (6)

Substituting Prob_1(P − 1) + Prob(P) · p(1) with Prob(P − 1) from Equation (4), we get

    Prob(P − 2) = Prob_1(P − 2) + Prob(P − 1) · p(1) + Prob(P) · p(2).    (7)
In general, for the k-th delayed busy period, delay w occurs with probability Prob_k(w), given by
the following equation:

    Prob_k(w) = \sum_{j=1}^{P−w} Prob_1(w + j) · \sum_{o_2=1}^{j−1} Prob_2(j − o_2)
                · \sum_{o_3=1}^{o_2−1} Prob_3(o_2 − o_3) · · · · · \sum_{o_{k−1}=1}^{o_{k−2}−1} Prob_{k−1}(o_{k−2} − o_{k−1}) · p(o_{k−1}).    (8)
The recursion in Equation (7) is generalized using the probabilities defined in Equation (8) as
follows:

    Prob(w) = Prob_1(w) + \sum_{j=w+1}^{P} Prob(j) · p(j − w).    (9)
To estimate the average delay W, first all Prob1 (w) for 1 ≤ w ≤ P can be estimated
using Equation (2). Then starting from w = P, all probabilities Prob(w) for 1 ≤ w ≤ P
Fig. 3. Estimation of probabilities for propagation delay.
are computed using the recursion in Equation (9). Note that the granularity of the CDH
bins determines the granularity of the recursion step. In the preceding presentation,
we assumed, without loss of generality, that each bin is 1ms.
We stress that only Prob1 (w) for 1 ≤ w ≤ P in Equation (2) depends on the scheduling
pair (I, T ). The rest depends on the probabilities of the monitored CDH of idle times
(as depicted in Figure 3). This is important to the computational complexity of the
framework because the majority of components in the recursion of Equation (9) are
computed only once.
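A compact sketch of this computation (a hypothetical helper with illustrative names; cdh(j) and
p(j) are lookups into the monitored histogram, and 1ms bins are assumed) could look as follows:

# Illustrative sketch: average extra wait W for a candidate pair (I, T) with penalty P,
# using Equation (2) for Prob_1, the recursion of Equation (9), and Equation (1) for W.
def average_wait(cdh, p, I, T, P):
    def prob1(w):
        # Equation (2): delay w caused to the first busy period following a power saving mode.
        if w == P:
            return cdh(I + T - P) - cdh(I)
        return cdh(I + T - w + 1) - cdh(I + T - w)

    prob = {}
    for w in range(P, 0, -1):                 # Equation (9), evaluated from the base w = P downward
        prob[w] = prob1(w) + sum(prob[j] * p(j - w) for j in range(w + 1, P + 1))

    return sum(w * prob[w] for w in range(1, P + 1))   # Equation (1)

Only prob1 depends on the candidate pair (I, T); the probabilities p(j) are read from the
histogram once, which is what keeps the scan over candidate pairs cheap.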
3.2. Meeting Performance Target D
Here, we develop the method to determine the pair (I, T ) for scheduling the power
saving modes such that performance does not degrade more than the target percent-
age D on the average. Because we want to control performance degradation, T, the time that the
disk stays in a power saving mode, includes the penalty P (i.e., T > P) for reactivating the disk;
this is a proactive measure to control performance degradation.
To find the best scheduling pair (I, T), we scan the CDH of idle times for (I_l, T_j) pairs that
would not violate the target D. Note that I_l and I_l + T_j correspond to successive histogram
bins. A pair (I_l, T_j) guarantees the performance target D if

    D ≥ W_{(I_l, T_j)} / RT_{w/o power saving},    (10)

where RT_{w/o power saving} is monitored and W_{(I_l, T_j)} is computed using Equations (1) and (9).
If (I_l, T_j) satisfies the performance target D, then the corresponding “time in power savings”
S_{l,j} can also be computed. Because T_j includes P, for all idle intervals longer than
(I_l + T_j − P), the time in power saving is (T_j − P). For all idle intervals of length i between
I_l and I_l + T_j − P, the time in power saving is i − I_l. Hence, S_{l,j} becomes

    S_{l,j} = [ \sum_{o=I_l}^{I_l+T_j−P} p(o) · (o − I_l) ] / E[idle] + [ \sum_{o=I_l+T_j−P}^{max} p(o) · (T_j − P) ] / E[idle],    (11)

where max is the value of the last bin in the CDH and E[idle] is the average idle interval length.
We choose the scheduling pair (I, T) to be the pair (I_l, T_j) that results in the highest time in
power saving S_{l,j}. Recall that the estimation of S_{l,j} is done only for those pairs
(I_l, T_j) that meet the performance degradation target D of Equation (10).
The computational complexity of the procedure to choose the scheduling pair (I, T) is O(n^2),
where n is the number of CDH bins. Note that the recursion for estimating W has a time complexity
of O(n).
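The scan itself can be sketched as follows (illustrative; it reuses the hypothetical average_wait
helper from Section 3.1, candidate values for I_l and I_l + T_j are taken from the CDH bin
boundaries, and D is expressed as a fraction, e.g., 0.05 for 5%):

# Illustrative sketch: Equation (11) followed by the O(n^2) scan over candidate pairs.
def time_in_power_saving(p, mean_idle, max_bin, I, T, P):
    # Expected power-saving time per idle interval, normalized by E[idle] (Equation (11)).
    partial = sum(p(o) * (o - I) for o in range(I, I + T - P + 1))
    full = sum(p(o) * (T - P) for o in range(I + T - P + 1, max_bin + 1))
    return (partial + full) / mean_idle

def choose_pair_for_target_D(bins, cdh, p, mean_idle, max_bin, rt_no_saving, D, P):
    best = None
    for I in bins:
        for end in bins:
            T = end - I
            if T <= P:                                   # T must include the penalty P
                continue
            W = average_wait(cdh, p, I, T, P)            # Equations (1), (2), (9)
            if W / rt_no_saving > D:                     # Equation (10) violated
                continue
            S = time_in_power_saving(p, mean_idle, max_bin, I, T, P)
            if best is None or S > best[2]:
                best = (I, T, S)
    return best                                          # (I, T, S_{l,j}) or None if no pair fits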
3.3. Meeting Power Target S
In this scenario, the system quality target is the time in power savings S, which means
that the scheduling pair (I, T ) should achieve a time in power saving of at least S%.
The scheduling pair (I, T) should satisfy the targeted time in power saving S while degrading
performance as little as possible (i.e., in this scenario, there is no D defined). Note that if S
is larger than the idleness in the system, then our procedure does not estimate an (I, T) pair,
because power savings should not be scheduled when there are user requests outstanding.
Here, we need to find the scheduling pair (I, T) that meets the target S and causes the smallest
performance degradation D. If every idle interval were used for power saving, then the target S
could be expressed as a time in power savings per idle interval, denoted S̄, which relates to the
average idle interval length E[idle] and the utilization U according to the following equation:

    S̄ = S · E[idle] / (1 − U).    (12)
However, for an (I_l, T_j) pair, only a fraction (1 − CDH(I_l)) of the idle intervals can be used
for power savings. It follows that the target S can be met only if the time in power saving
T_j − P for the idle intervals used for power saving, normalized over all idle intervals, is at
least S̄, as shown by the following equation:

    S̄ = S · E[idle] / (1 − U) ≤ (T_j − P)(1 − CDH(I_l)).    (13)
All possible pairs (I_l, T_j), as defined by the bin values of the CDH of idle times, are
evaluated against Equation (13) (a scan that requires O(n^2) steps). Those pairs (I_l, T_j) that
satisfy Equation (13) meet the power saving target S. Among these pairs, we select the one with
the smallest performance degradation D_{I_l,T_j}, which is estimated according to Equation (10).
The actual anticipated time in power savings for a pair (I_l, T_j) is S_{I_l,T_j} and is estimated
using Equation (11).
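The selection for a power target S can be sketched in the same style (illustrative names;
S_target and util are fractions, and average_wait is the hypothetical helper from Section 3.1):

# Illustrative sketch: among (I_l, T_j) pairs satisfying Equation (13), pick the one with the
# smallest estimated performance degradation (Equation (10)).
def choose_pair_for_target_S(bins, cdh, p, mean_idle, util, rt_no_saving, S_target, P):
    s_bar = S_target * mean_idle / (1.0 - util)           # Equation (12): per-idle-interval target
    best = None
    for I in bins:
        for end in bins:
            T = end - I
            if T <= P:
                continue
            if (T - P) * (1.0 - cdh(I)) < s_bar:           # Equation (13) not met
                continue
            D_est = average_wait(cdh, p, I, T, P) / rt_no_saving   # Equation (10)
            if best is None or D_est < best[2]:
                best = (I, T, D_est)
    return best                                            # (I, T, D) or None if S is unreachable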
3.4. Meeting Reliability Target X
The reliability target X is another quality target in our framework and is measured
as the rate of power saving modes (measured usually at coarse granularity, e.g., 1 day)
that the disk can have without impacting its lifetime. This rate is equal to the rate of
spin-ups that a disk can tolerate without premature wear-out.
Let us denote utilization as U = E[busy]/(E[idle] + E[busy]), where E[busy] is the average busy
interval and E[idle] is the average idle interval. Let us denote by X̂ the rate of opportunities
for power savings, with X̂ = 1/(E[idle] + E[busy]) = U/E[busy] = (1 − U)/E[idle]. If X is smaller
than X̂, then an idle interval should be used for power savings with probability X/X̂. Otherwise,
all idle intervals are to be utilized for power savings. Denote X̄ as

    X̄ = X / X̂,   for X < X̂,
    X̄ = 1,        otherwise.    (14)
Because a scheduling pair (I, T) uses only a fraction (1 − CDH(I)) of the idle intervals for power
savings, the reliability target X is violated only if (1 − CDH(I)) is larger than X̄. In this
case, fewer idle intervals than (1 − CDH(I)) should be used for power savings. As a result, the
delay W should reflect the potentially fewer power saving modes and the resulting lower delay. For
this, we redefine Equation (2) to reflect that the delay caused to the IOs of the first busy
period following a power saving mode happens with probability X̄/(1 − CDH(I)). Note that if
X̄ > 1 − CDH(I), then no correction needs to take place, as X is not violated.
Reflecting the reliability target X in Equation (2) results in the following:

    Prob_1(w) = C · (CDH(I + T − w + 1) − CDH(I + T − w)),   for 1 ≤ w < P,
    Prob_1(w) = C · (CDH(I + T − P) − CDH(I)),               for w = P,    (15)

where C is defined as

    C = X̄ / (1 − CDH(I)),   for X̄ < 1 − CDH(I),
    C = 1,                   otherwise.    (16)
Using Equation (15) to estimate the first delay ensures that the average delay W
is estimated accurately based on Equation (1) and the recursion of Equation (9). As a
result, the framework meets both reliability and performance targets. The reliability
target is reflected similarly in the estimation of power savings achieved by a scheduling
pair (I, T ). Equation (11) is updated to account for the reliability target as follows:
    S_{l,j} = C · [ \sum_{o=I_l}^{I_l+T_j−P} p(o) · (o − I_l) ] / E[idle] + C · [ \sum_{o=I_l+T_j−P}^{max} p(o) · (T_j − P) ] / E[idle],    (17)
where C is defined in Equation (16). By using these improved formulas, we can achieve
the reliability target.
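A sketch of the reliability correction under the notation above (illustrative; x_rate is the
target X expressed as reactivations per millisecond, and mean_idle, mean_busy are E[idle] and
E[busy] in milliseconds):

# Illustrative sketch of the reliability correction (Equations (14)-(16)).
def reliability_correction(x_rate, mean_idle, mean_busy, cdh, I):
    x_hat = 1.0 / (mean_idle + mean_busy)        # rate of power saving opportunities
    x_bar = min(1.0, x_rate / x_hat)             # Equation (14): usable fraction of idle intervals
    usable = 1.0 - cdh(I)                        # fraction of idle intervals the pair (I, T) uses
    return min(1.0, x_bar / usable) if usable > 0 else 1.0   # Equation (16)

# The returned factor C scales Prob_1(w) in Equation (15) and both terms of Equation (17), so the
# estimates of W and S_{l,j} already reflect the reduced number of power saving modes.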
3.5. Correlation-Based Enhancement
So far, the scheduling pair (I, T ) is computed by heavily using the CDH of idle times. As
a result, the decisions are made on the probability of an idle interval length assuming
that the sequence of idle intervals is a renewal process. However, the utilization of
idle time would improve further if the length of idle intervals were predicted more
accurately than by using only the marginal distribution (i.e., CDH). Here, we show
how to exploit any existing short-term correlation in idle interval lengths.
For this, we define the category of long idle intervals as all idle intervals longer than
L, where L is defined such that idle intervals of at least length L are observed at a rate
close to the reliability target X. We compute online, similar to the CDH of idle times,
the probabilities that two consecutive idle intervals, up to G lags apart, are both long.
We denote these probabilities as Prob(LLl ) (i.e., two idle intervals of at least length L
that are l lags apart).
The lag l with the highest Prob(LL_l) is selected for prediction. Although any Prob(LL_l) value
can be used in the framework, only Prob(LL_l) above 0.5 is recommended for a good power savings
effect: when Prob(LL_l) is above 0.5, the correlation structure is considered strong and yields
good prediction accuracy. Therefore, once a long idle interval is observed, the upcoming idle
interval l lags in the future is also to be used for power savings. This correlation-based
prediction is used to enhance the performance of our framework in addition to the regularly
estimated scheduling pair (I, T).
We argue that if a long idle interval is predicted, then the probability of causing a
delay is less than when the regular probabilities in the CDH are used. As a result, we
propose using a shorter I and a longer T without violating the performance target D.
Specifically, we denote the scheduling pair that results from such prediction as (I_L, T_L), where
I_L is defined such that CDH(I_L) = 0.5 and T_L is defined such that it corresponds to the length
of the long idle interval L (i.e., T_L = L − I_L). Although we define L such that the occurrence
of idle intervals of at least length L is at most X, it is expected that for most enterprise
workloads, the number of idle intervals of length at least equal to L should be less than X within
a specified time period. For this reason, we generate two scheduling pairs, (I, T) and (I_L, T_L),
where the first one is estimated as a regular scheduling pair using the CDH of idle times and is
used to “fill up” the quota X left unused by the second pair.
The most important characteristic in our framework is the ability to accurately
estimate performance of a scheduling pair (I, T ). In the case when two scheduling
pairs are used, we combine the estimations of delay W and power savings S for both
scheduling pairs. We define
    W = (1 − Y) · W_L + Y · W_R,    S = (1 − Y) · S_L + Y · S_R,    (18)

where W_R and S_R are the delay and power savings yielded by the regular scheduling pair (I, T),
and W_L and S_L are the delay and power savings yielded by the predictive scheduling pair
(I_L, T_L). The coefficient Y captures the portion of X that is contributed by (I, T). This
coefficient is zero if the probability of having long idle intervals is larger than the allowance
A(X). We define Y as

    Y = A(X) − (1 − CDH(L)),   for A(X) > 1 − CDH(L),
    Y = 0,                     otherwise.    (19)
Although W_R and S_R are defined in the previous sections, we need to define W_L and S_L. From the
conditional probability Prob(LL_l), we expect a fraction Prob(LL_l) of true positives in the
prediction of idle intervals longer than L and a fraction 1 − Prob(LL_l) of false positives (i.e.,
the predicted long idle interval is in fact shorter than L). Because this prediction occurs only
if a long idle interval is observed, which happens with probability 1 − CDH(L), the (I_L, T_L)
scheduling pair causes a power saving mode with probability (1 − CDH(L))(1 − CDH(I_L)). This means
that a delay P is caused with probability (1 − CDH(L))(1 − CDH(I_L))(1 − Prob(LL_l)), whereas the
savings of T_L − P units of time occur with probability (1 − CDH(L))(1 − CDH(I_L))Prob(LL_l). We
have

    Prob(P)_L = (1 − CDH(L)) · (1 − CDH(I_L)) · (1 − Prob(LL_l)),    (20)
    S_L = (1 − CDH(L)) · (1 − CDH(I_L)) · Prob(LL_l) · (L − I_L − P),

where Prob(P)_L is used as the basis for the recursion to compute W_L as given by Equations (9),
(10), and (15).
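A sketch of how the two scheduling pairs are combined (illustrative names; prob_LL stands for the
monitored Prob(LL_l) and A_X for the allowance A(X)):

# Illustrative sketch of the correlation-based enhancement (Equations (18)-(20)).
def predictive_pair_estimates(cdh, L, I_L, P, prob_LL):
    # Equation (20): the predictive pair (I_L, T_L), with T_L = L - I_L, causes a full delay P on
    # mispredicted long intervals and saves T_L - P on correctly predicted ones.
    hit = (1.0 - cdh(L)) * (1.0 - cdh(I_L))
    prob_P_L = hit * (1.0 - prob_LL)    # seeds the recursion of Equation (9) to obtain W_L
    S_L = hit * prob_LL * (L - I_L - P)
    return prob_P_L, S_L

def combine_estimates(W_R, S_R, W_L, S_L, A_X, frac_long):
    # Equation (19): Y is the portion of the reliability quota left to the regular pair (I, T),
    # where frac_long = 1 - CDH(L) is the probability of observing a long idle interval.
    Y = A_X - frac_long if A_X > frac_long else 0.0
    # Equation (18): combined estimates of delay and power savings.
    return (1.0 - Y) * W_L + Y * W_R, (1.0 - Y) * S_L + Y * S_R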
4. EXPERIMENTAL EVALUATION
In this section, we evaluate PREFiguRE with regard to accuracy, robustness, flexibility,
and adaptivity in estimating schedules for power saving modes while meeting system
quality targets, including the performance slowdown target D, the reliability target
X, and the power savings target S. One of the most important aspects of PREFiguRE is that it makes
decisions based only on metrics that are monitored in real time and does not depend on static
models or knowledge of the underlying disk drive characteristics. As a result, for the evaluation
of PREFiguRE, we use trace-driven simulations, as they allow for the calculation of the PREFiguRE
input parameters, such as the histogram of idle times. Recall that PREFiguRE does not interfere
with disk request
service or scheduling; as a result, we do not need a full-disk simulator. PREFiguRE is
computationally lightweight, as it only scans the CDH of idle times, which is at most a
Table III. General Trace Characteristics

                            Idle Length                   Potential Time in
                     Util      Mean                       Power Savings (%)
Trace     Entries     (%)   (in milliseconds)     CV      Lev. 3    Lev. 4
Code 1    379,490     5.6        192.6            8.4       55        48
Code 2     56,631     0.5      1,681.6            2.3       92        87
Code 3    286,612     4.8        233.95          22.5       66        55
Code 4     18,865     0.1      8,293.67           7.8       97        94
File 1    135,629     1.7        767.5            2.3       70        53
File 2     44,607     0.7      2,000.2            3.8       94        90
File 3     44,607     0.1      2,046.51           9.1       87        79
File 4     14,160     0.1      2,615.74          11.3       95        92

Note: All traces have a duration of 12 hours.
few thousand entries, at a frequency of every few hours. PREFiguRE computes a nearly
optimal (as our experiments show) scheduling pair almost instantaneously. In this
section, we show the proximity of the scheduling pair (I, T ) given by PREFiguRE to the
optimal pair that is found by exhaustive search (i.e., by simulating and evaluating all
possible pairs for scheduling power saving modes). In addition, we show how one could
use workload patterns in the time series of idle intervals to further improve on power
savings without deviating from the preset reliability and performance constraints.
4.1. Performance of PREFiguRE
Our evaluation is driven by a set of disk-level enterprise traces collected at mid-size en-
terprise storage systems hosting dedicated server applications, such as a development
server (“Code”) and a file server (“File”) [Riska and Riedel 2006]. Each trace corresponds
to a single drive in a RAID. For an unbiased treatment, we focus on the performance
requirement of each disk. We monitor the workload of each disk drive and determine
whether to put it to sleep or not. Storage systems that deploy advanced redundancy
schemes may schedule a request such that it avoids the disks that are in power saving
modes. However, our method is orthogonal to such solutions, as we monitor the disk
workload after those policies have been applied. In addition, our framework operates at a lower
priority than the upper-level policies. Therefore, our framework can be
applied at individual storage nodes (e.g., single disk drive) without interfering with
upper-level power saving policies.
The traces are collected at the disk level and measured using a SCSI or IDE an-
alyzer that intercepts the IO bus electrical signals and stores them. The final traces
are produced by decoding the electrical signals. This trace collection method does not
require modifying the software stack of the targeted system and does not affect system
performance. We stress that our framework only requires knowledge of idleness and
is completely independent of the complexity of the arrival and service processes, as
well as complex scheduling behavior in the various levels of the storage stack (e.g.,
the RAID setup). More importantly, they record the arrival and departure time of each
disk-level request, allowing for exact calculation of the histogram of idle times.
The traces that we use to evaluate PREFiguRE have varying characteristics
(Table III provides an overview). From this table, we notice that these traces are
characterized by very low utilization, yet their idleness is highly fragmented. Notice
the differences in the mean idle intervals and their coefficients of variation (CVs). The
columns labeled “Time in Power Savings” include the percentage of time relative to
the duration of the entire trace that is used for power savings if all idle intervals that
can be used for Level 3 or Level 4 savings are indeed used, and if perfect knowledge of
future workload is available. This is of course not practical, but this value represents
an absolute upper bound on power savings. The table shows that the eight traces are
quite diverse and thus constitute an excellent set to evaluate PREFiguRE’s ability to
estimate the best scheduling pair (I, T ) for any workload. We stress that our traces are
measured in enterprise systems with idle intervals that yield power savings only for
Levels 3 and 4, whose penalty P is up to 1s, but not Levels 5 and 6, whose penalty P
is several seconds. Consequently, we do not show results from Levels 5 and 6 of power
saving modes and do not discuss wear-out because of spin-ups/downs. The reliability
aspect of power savings is evaluated in association with load/unload cycles that occur
when Levels 3 and 4 of power saving modes apply on a disk drive.
We use the first half of each trace as the “training period” during which we construct
the CDH of idle times and determine other monitored metrics. PREFiguRE computes the scheduling
pair (I, T) from the metrics collected during the training period, using the analytic methodology
presented in Section 3. The second half of each trace is used
as the “testing period,” during which we run a simulation that uses the computed (I, T )
pair to schedule power saving modes. The testing period validates the accuracy of the
PREFiguRE scheduling decision. Specifically in the trace-driven simulation, the power
saving modes are activated only after I idle time units elapse. The disk remains in
a power saving mode for at most T time units. A new IO arrival always preempts a
power savings mode and reactivates the disk drive, which takes P units of time.
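For concreteness, the testing-period simulation can be sketched as follows (illustrative; it
assumes the disk is active again at most T ms after the mode starts and, for brevity, omits the
propagation of residual delay into subsequent busy periods):

# Illustrative sketch of the testing-period simulation: apply a fixed pair (I, T) with penalty P
# to the observed idle intervals and accumulate power-saving time, delay, and reactivations.
def simulate_testing_period(idle_intervals_ms, I, T, P):
    total_saving, total_delay, reactivations = 0, 0, 0
    for idle in idle_intervals_ms:
        if idle <= I:
            continue                                  # the power saving mode is never entered
        reactivations += 1
        if idle >= I + T:
            total_saving += T - P                     # full budget harvested, no delay
        else:
            total_saving += min(idle - I, T - P)      # mode interrupted by an IO arrival
            total_delay += min(P, I + T - idle)       # the arriving IOs wait for reactivation
    return total_saving, total_delay, reactivations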
Table IV gives an overview of the effectiveness of PREFiguRE. All columns labeled
“Estim.” represent values estimated by PREFiguRE, and the ones labeled “Actual” are
obtained via trace-driven simulation. The “Target D” column is the performance target
input to PREFiguRE. Performance target D is not violated if columns labeled “Perfor-
mance Degradation” are less than or equal to “Target D.” Finally, Smax corresponds to
the optimal value found by exhaustive search of all possible (I, T ) pairs to identify the
one that offers best savings with performance degradation equal to or under the target
D. The penalty to reactivate the drive is set to P = 500ms (Level 3) [Seagate Technology
2012; Hitachi Global Storage Technologies 2007]. The reliability target X is set to 200
for Level 3 or Level 4 power saving modes per day [Kim and Suk 2007], assuming a
lifetime of 4 years.
The main observations from this table are the following:
—The performance target D is never violated by the scheduling pair computed by PREFiguRE,
as validated by multiple simulation experiments.
—PREFiguRE consistently estimates excellent scheduling parameters for maximum
power saving while limiting the number of load/unloads per day.
—The time in power savings S(I,T ) estimated analytically by PREFiguRE is accurate
most of the time; see its proximity to the actual values given by simulation. The
errors come from two sources: first, the estimation method relies on past information
to predict the future. Consequently, its accuracy depends on how much the workload characteristics
used by the framework change between the past and the future. Second, the estima-
tion method is a statistical approach that relies on the granularity and accuracy of
characterization measurements (e.g., finer granularity of CDH of idle periods yields
better prediction accuracy than coarse granularity).
—High accuracy of PREFiguRE and its ability to estimate the scheduling outcome in
the form of D(I,T ) and S(I,T ) is critical because it suggests that PREFiguRE can be
used to drive analysis in the system.
—Monitoring of metrics in the short past (“training period” of several hours) yields
good and robust predictions for the near future (“testing period” of several hours).
—For D > 5%, the accuracy of estimations is consistently high. For D = 1%, the
accuracy reduces, as it becomes difficult for PREFiguRE to capture the very small
Table IV. Power Savings and Performance Degradation Estimated Using PREFiguRE (Columns “Estim.”) and
Simulation (Columns “Actual”)

                          “Code 1”                                        “Code 3”
           Performance    Time in Power    Max Time in     Performance    Time in Power    Max Time in
           Degradation    Saving           Power Saving    Degradation    Saving           Power Saving
Target D   Estim. Actual  Estim.  Actual      Smax         Estim. Actual  Estim.  Actual      Smax
      1      1     0.0     1.68    1.37       2.06           1     0.0    12.54   10.87      17.24
      5      3     0.0     2.22    1.94       2.06           2     0.0    15.93   11.69      17.99
     10      3     0.0     2.22    1.94       2.06           2     0.0    15.93   11.69      17.24
     20      3     0.0     2.22    1.94       2.06           2     0.0    15.93   11.69      17.99
    100      3     0.0     2.22    1.94       2.06           2     0.0    15.93   11.69      17.99

                          “Code 2”                                        “Code 4”
           Performance    Time in Power    Max Time in     Performance    Time in Power    Max Time in
           Degradation    Saving           Power Saving    Degradation    Saving           Power Saving
Target D   Estim. Actual  Estim.  Actual      Smax         Estim. Actual  Estim.  Actual      Smax
      1      1     0.0     0.09    0.09       0.33          1.0    1.0     8.18    4.99      12.57
      5      5     0.0     0.28    0.32       0.33          4.0    1.0    13.68    8.03      13.07
     10     10     2.0     0.29    0.33       0.33          9.0    3.0    21.47   18.89      18.89
     20     20    20.0     0.31    0.35       0.35         20.0   10.0    35.73   35.35      35.35
    100     22    21.0     0.31    0.35       0.37         31.0   25.0    37.79   37.51      37.57

                          “File 1”                                        “File 3”
           Performance    Time in Power    Max Time in     Performance    Time in Power    Max Time in
           Degradation    Saving           Power Saving    Degradation    Saving           Power Saving
Target D   Estim. Actual  Estim.  Actual      Smax         Estim. Actual  Estim.  Actual      Smax
      1     1.00   0.00    0.50    0.39       0.39          1.00   0.00    2.69    1.77       5.76
      5     5.00   3.00    0.73    0.69       0.70          4.00   2.00    6.32    4.42       5.76
     10     7.00   4.00    0.75    0.71       0.71         10.00   4.00    8.47    6.98       6.98
     20     7.00   4.00    0.73    0.71       0.71         20.00   6.00   12.02   10.79      10.80
    100     7.00   4.00    0.73    0.71       0.71         28.00  21.00   13.45   11.17      11.17

                          “File 2”                                        “File 4”
           Performance    Time in Power    Max Time in     Performance    Time in Power    Max Time in
           Degradation    Saving           Power Saving    Degradation    Saving           Power Saving
Target D   Estim. Actual  Estim.  Actual      Smax         Estim. Actual  Estim.  Actual      Smax
      1     1.00   0.00    0.31    0.30       0.87          1.00   1.00    0.44    0.36       2.60
      5     5.00   5.00    1.59    1.37       1.55          4.00   3.00   11.78    8.75       8.75
     10     9.00   6.00    1.90    1.69       1.87          8.00   4.00   14.67   12.70      12.70
     20    19.00  10.00    1.92    1.72       1.75         19.00  19.00   17.38   15.86      15.86
    100    18.00  12.00    1.92    1.72       1.75         44.00  44.00   27.08   26.33      26.34
Note: Level 3 savings are used. All values are percentages (%); for the time columns, the value is
the percentage of time relative to the entire trace duration.
variations in performance. Recall that estimation of delays is the most critical aspect
of the framework, and its accuracy depends on the CDH bin granularity. As a result,
discrepancies become noticeable for very small performance targets, such as D = 1%.
A phenomenon worth discussing is that PREFiguRE's estimates for various target D's
are the same for “Code 1” and “Code 3.” This happens because PREFiguRE calculates
the same (I, T ) pair for D ≥ 5%. The CDHs of “Code 1” and “Code 3” reveal that these
two workloads have many small idle intervals but only a few long ones. Indeed, 95%
of “Code 1” idle intervals are smaller than the Level 3 penalty (500ms), and thus they
are excluded from PREFiguRE as a scheduling choice. As a consequence, a large idle
waiting time I is used to prevent small idle intervals from being used for power savings.
Therefore, W in Equation (10) is small and results in the same D that is always less
Table V. Various What-If Scenarios That Can Be Answered Using the Estimation Engine in PREFiguRE to Assist
with Making Power Saving Decisions in a Storage System

What-If Question                                                “Code 1”   “Code 2”   “File 1”   “File 2”
How much should I slow down the user traffic
  to get power savings of 10%?                                    33.0%      59.0%     195.0%      27.0%
How much should I slow down the user traffic
  to get power savings of 20%?                                    61.0%     104.0%     458.0%     140.0%
How much power savings do I get
  if I slow the user traffic by 10%?                               1.94%      0.33%      0.71%      1.69%
How much power savings do I get
  if I slow the user traffic by 20%?                               1.94%      0.35%      0.71%      1.72%
Which power saving level should I use, Level 3 or Level 4,
  if I slow the user traffic by 10%?                             Level 3    Level 3    Level 4    Level 3
Which power saving level should I use, Level 3 or Level 4,
  to get power savings of 10%?                                   Level 3    Level 3    Level 3    Level 3
If I relax the X condition for the next 12 hours and slow
  the user traffic by 10%, how much additional savings will       6.59%      3.36%      0.73%      8.15%
  I get and by how much is X violated?                            (50)       (19)       (23)      (285)
than the target Ds we set in Table IV. This results in selecting the same (I, T ) pairs
for D ≥ 5%.
Overall, the table shows that PREFiguRE is robust across all workloads and ranges
of performance targets, with excellent accuracy for both power and average delay
estimation, without compromising on the reliability constraint X. This makes the case
that PREFiguRE can be also used very effectively in analysis to select among power
saving options, as shown in the following section.
4.2. “What-If” Analysis
In system design and online resource management, it is critical to know the outcome
of a feature in advance and to enable it only when beneficial. Specifically, because power
savings in disk drives impact both performance and reliability, the disk should be put
into power saving modes only if the savings are significant for the system. Because
of its analytic core, PREFiguRE has the ability to compute schedules and estimate
their outcome. As such, it facilitates the automation of online decisions on disk power
savings by giving answers to a wide range of “what-if ” questions.
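As an illustration of the query pattern behind such what-if questions, the following Python
sketch sweeps performance degradation targets D through an estimation function. The function
estimate_savings() is a hypothetical stand-in (a simple monotone toy curve), not PREFiguRE's
analytic estimator; only the way the engine is queried is of interest here.

    # Hypothetical stand-in for an estimation engine: maps a performance degradation
    # target D (%) to expected time in power saving (%). The curve below is a toy
    # monotone function used only to make the sketch runnable.
    def estimate_savings(d_pct: float) -> float:
        return 2.0 * (1.0 - 1.0 / (1.0 + d_pct / 10.0))

    def savings_at_slowdown(d_pct: float) -> float:
        """What-if: how much power savings do I get if I slow the user traffic by d_pct%?"""
        return estimate_savings(d_pct)

    def slowdown_for_savings(target_pct: float, d_max: float = 1000.0) -> float:
        """What-if: how much must I slow down the user traffic to spend target_pct%
        of the time in power saving? Sweeps D and returns the smallest D that works."""
        d = 1.0
        while d <= d_max:
            if estimate_savings(d) >= target_pct:
                return d
            d += 1.0
        return float("inf")  # the target is unreachable for this workload

    if __name__ == "__main__":
        print(f"Savings at D = 10%: {savings_at_slowdown(10.0):.2f}%")
        print(f"D needed for 1.5% savings: {slowdown_for_savings(1.5):.0f}%")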
Table V lists a set of what-if questions that could be answered using the PREFiguRE
framework. The table shows how PREFiguRE predicts, for a given workload, whether a specific
power saving target can be met. For example, in a cluster with the four disks (and
workloads), a target of 10% time in power saving can be achieved by Code 1 with a
performance degradation of 33.0%.
Similarly, the system can also estimate beforehand if it is worth increasing the
performance target D for higher power savings. In this table, we can clearly see that
it is not beneficial to increase the performance degradation to 20%, as it does not offer
additional savings for any of the workloads on the four disks in the storage cluster. The
table also shows that, for most workloads, the power savings are better when the penalty
due to power savings is low (i.e., Level 3). Finally, we can estimate beforehand
whether it is worth relaxing the reliability condition to achieve better power savings.
The last what-if scenario presented in this table indicates the power savings when
we relax the reliability target X. Given that X captures the wear-out effect that power
savings have on disk drives over their lifetime, X can be set higher at times and lower at
other times. Specifically, for “Code 2,” the savings are considerable and the compromise
in reliability is small compared to the original reliability constraint. The system may
decide to relax X for that disk for a while and account for it at a later time when the
workload has changed and savings are limited.
4.3. PREFiguRE’s Adaptivity and Estimation Capabilities
PREFiguRE is a framework that monitors current workload and updates its schedul-
ing decisions (i.e., the (I, T ) pair) accordingly. So far in this section, the learning (or
training) has occurred for 6 hours and the computed (I, T ) pair is used for the following
6 hours. However, there are various ways to learn the workload characteristics (i.e., the
histogram of idle times) and update the corresponding scheduling parameters. Here,
we evaluate the robustness of PREFiguRE against the length of the learning window
and the granularity of updates in the computed (I, T ) pair.
We experiment with two additional learning window sizes (i.e., 3 and 5 hours), and with
the scheduling parameters updated either every half hour or only at the end of a learning window.
Specifically, we evaluate the following variations in learning a CDH:
—Learning1: Learning windows are nonoverlapping, and (I, T ) is computed only at
the end of a learning window.
—Learning2: CDH of idle times is accumulated from the beginning, and (I, T ) is com-
puted every half hour.
—Learning3: The learning window slides with half an hour of granularity, and (I, T )
is computed every half hour.
—Baseline: This is similar to Learning1, but the CDH is built with the knowledge of
idleness in the current learning period, not the previous one. It is included only for
comparative purposes as a best case.
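To make the bookkeeping behind these variants concrete, the sketch below maintains the idle-time
histogram (CDH) under the cumulative and sliding-window schemes. The half-hour bucketing and the
100ms bin width are assumptions made for illustration; the computation of (I, T) from the histogram
is not shown here.

    from collections import Counter, deque

    BIN_MS = 100  # assumed CDH bin width (not specified here)

    def bin_of(idle_ms: float) -> int:
        return int(idle_ms // BIN_MS)

    class SlidingCDH:
        """Learning3: keep per-half-hour buckets of idle-time counts and slide a
        fixed-length window over them; the aggregate CDH (and hence (I, T)) can be
        recomputed every half hour."""
        def __init__(self, window_halfhours: int):
            self.buckets = deque(maxlen=window_halfhours)  # one Counter per half hour
            self.current = Counter()

        def record_idle(self, idle_ms: float) -> None:
            self.current[bin_of(idle_ms)] += 1

        def close_half_hour(self) -> Counter:
            """Call at each half-hour boundary: slide the window and return the CDH."""
            self.buckets.append(self.current)
            self.current = Counter()
            cdh = Counter()
            for bucket in self.buckets:
                cdh.update(bucket)
            return cdh  # feed this histogram to the (I, T) computation

    class CumulativeCDH:
        """Learning2: accumulate the histogram from the beginning of the trace;
        (I, T) is recomputed from the full histogram every half hour."""
        def __init__(self):
            self.cdh = Counter()

        def record_idle(self, idle_ms: float) -> None:
            self.cdh[bin_of(idle_ms)] += 1

    # Learning1 corresponds to a SlidingCDH whose window equals the learning window
    # and whose aggregate CDH is consulted only when the window is full.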
We present the results from our trace-driven simulation in Figure 4, where the left
column of plots shows the performance degradation in the system, validating the accuracy
of the framework with respect to a performance slowdown target of D%. The right column
of plots in Figure 4 captures the power savings resulting from the scheduling framework.
It is clear that different learning methods and update granularities achieve different accuracy.
We observe that it is important to learn over longer rather than shorter periods of time
(compare the first row of results corresponding to 5-hour learning to the second row of
results corresponding to 3-hour learning in Figure 4). Another important observation
is that updating the (I, T) pair every half hour is more robust to changes in the learning
window size than updating it less frequently, because frequent updates reduce the impact
of a changing workload. Recall that the cost of computing the (I, T) pair is minimal, so
the half-hour update frequency that we suggest here is expected to have an equally
minimal impact on overall system performance.
4.4. Comparison with Common Practice Methods
The efficiency of PREFiguRE is shown by comparing its performance to common prac-
tices used for power savings in storage systems. The most widespread approach is to
idle wait for a fixed amount of time before putting a disk into a power saving mode.
Usually the fixed amount of time is set to be a multiple of the penalty P to bring back
the disk into operational state. Here, we show results obtained when the idle wait I
is set to 2P [Eggert and Touch 2005]. A second approach is to guide power savings by
the current utilization levels in the storage node (i.e., disk drive). Here, we apply the
first approach of fixed idle wait only if the utilization in the last 10 minutes is below a
predefined threshold (set to the average utilization in the trace).
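The two baselines can be sketched as follows. This is a minimal illustration assuming a trace
represented as (busy, idle) interval pairs in milliseconds; the simplified delay accounting
(no propagation effects) and the helper names are assumptions, not the estimation used by
PREFiguRE.

    P_MS = 500.0               # Level 3 penalty from the text
    TEN_MIN_MS = 10 * 60 * 1000.0

    def fixed_idle_wait(trace, idle_wait_ms=2 * P_MS):
        """Sleep whenever an idle interval outlasts the fixed idle wait I = 2P.
        Returns (time asleep, number of spin-downs, total wake-up delay added)."""
        asleep = delay = 0.0
        spin_downs = 0
        for _busy_ms, idle_ms in trace:
            if idle_ms > idle_wait_ms:
                asleep += idle_ms - idle_wait_ms
                spin_downs += 1
                delay += P_MS  # the request ending the idle period waits for spin-up
        return asleep, spin_downs, delay

    def utilization_guided(trace, util_threshold):
        """Apply the fixed idle wait only when utilization over roughly the last
        10 minutes is below util_threshold (the trace average in the text)."""
        asleep = delay = 0.0
        spin_downs = 0
        window = []            # recent (busy_ms, idle_ms) pairs
        for busy_ms, idle_ms in trace:
            window.append((busy_ms, idle_ms))
            while len(window) > 1 and sum(b + i for b, i in window) > TEN_MIN_MS:
                window.pop(0)
            util = sum(b for b, _ in window) / max(sum(b + i for b, i in window), 1.0)
            if util < util_threshold and idle_ms > 2 * P_MS:
                asleep += idle_ms - 2 * P_MS
                spin_downs += 1
                delay += P_MS
        return asleep, spin_downs, delay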
In Figure 5, we plot the performance degradation and power saving results of PRE-
FiguRE and the preceding two common practice methods. For PREFiguRE, three per-
formance targets (i.e., 10%, 50%, and 100%) are evaluated. For the two common practice
methods, the performance target cannot be set beforehand, and the slowdown may be
unbounded. Often in practice, to limit the performance slowdown, the fixed idle wait
and/or the utilization threshold are set such that the system goes into power savings
only occasionally.

Fig. 4. Performance degradation and time in power savings over time for Code 2, three different learning
methods, and two different lengths of learning (the first row of plots corresponds to 5 hours of learning and
the second row to 3 hours of learning). The performance degradation target is 50%. P1 is the evaluation
period that starts at the fourth hour, P2 starts at the fifth hour, P3 starts at the sixth hour, and P4 starts
at the seventh hour. For a fair comparison, the evaluation lasts for 5 hours in each evaluation period for
both learning lengths.

Fig. 5. Performance degradation and time in power savings for Code 2 under PREFiguRE and other common
practices (i.e., fixed idle wait and utilization guided). Because the y-axis is in log scale, the y-axis values are
shown for each bar.

Fig. 6. Conditional probability values of a long idle interval being followed by another long idle interval
k lags apart. The long interval length L is defined in the legend.
In Figure 5, the y-axis is in log scale, and the absolute values are shown above
each bar. The fixed idle wait method for I = 2P results in a slowdown of 5,662% (i.e.,
several orders of magnitude more than PREFiguRE for less than 10 times the power
savings). The utilization-guided method reduces performance degradation of the fixed
idle wait method, but its power savings are 10 times lower than PREFiguRE for similar
performance slowdowns.
The results in Figure 5 clearly illustrate that PREFiguRE outperforms common
practice methods. By taking into consideration the idleness, which captures in a compact
measure the complex interaction of the arrival and service processes, PREFiguRE meets
performance targets while achieving high power savings.
4.5. Correlation-Based Enhancement: PREFiguRE-LL
To further extend power savings without violating the performance degradation target,
we enhance PREFiguRE with the predictive capabilities of the conditional probabilities
of successive idle intervals (Figure 6). We construct conditional probabilities of two idle
intervals up to G = 10 lags apart being at least of length L, where L represents long idle
intervals observed in the system such that the number of such intervals is close to X
(i.e., the reliability target). The length L of long idle intervals depends on the workload
characteristics (i.e., the average, maximum, and variability in the distribution of idle
intervals as captured by the CDH), which means that for more idle workloads, this
value is higher than for the busier ones.
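These conditional probabilities can be computed directly from the observed sequence of idle
interval lengths; the short sketch below is one straightforward way to do so, using an empirical
frequency estimate. The exact estimator behind Figure 6 is not spelled out here, so this is an
assumption made for illustration.

    def lag_conditional_probs(idle_lengths, L, G=10):
        """P(idle interval k lags ahead is >= L | current idle interval >= L),
        estimated empirically for lags k = 1..G over the observed idle lengths."""
        probs = {}
        long_idx = [i for i, x in enumerate(idle_lengths) if x >= L]
        for k in range(1, G + 1):
            eligible = [i for i in long_idx if i + k < len(idle_lengths)]
            if not eligible:
                probs[k] = 0.0
                continue
            hits = sum(1 for i in eligible if idle_lengths[i + k] >= L)
            probs[k] = hits / len(eligible)
        return probs

    def best_lag(probs):
        """Pick the lag with the highest conditional probability, as PREFiguRE-LL
        does when anticipating the next long idle interval."""
        return max(probs, key=probs.get)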
In our evaluation of this enhancement, which we call PREFiguRE-LL, we focus on
workloads “Code 1,” “Code 2,” “Code 3,” and “Code 4.” Figure 6 shows the probability
that successive idle intervals at most 10 lags apart are at least of length L. We
observe that for the “Code 2” and “Code 3” workloads, these conditional probabilities
are higher than 0.5 for at least one lag. This suggests that the enhanced PREFiguRE-
LL could benefit from the prediction capabilities embedded in these probabilities and
harvest these long idle intervals to extend power savings according to the discussion in
Section 3.5. For “Code 1” and “Code 4,” the conditional probabilities have small values,
and therefore PREFiguRE-LL is not expected to increase power savings. However, we
stress that in these cases, PREFiguRE-LL reduces seamlessly to PREFiguRE and still
meets both reliability and performance targets. For PREFiguRE-LL, we pick among all
10 evaluated lags the one with the highest conditional probability to predict the future
long idle interval and define the scheduling parameters as explained in Section 3.5.

Fig. 7. Power savings (Level 3) for performance degradation targets 1%, 5%, 10%, 20%, and 100% for “Code
1,” “Code 2,” “Code 3,” and “Code 4.”
Power savings with PREFiguRE and PREFiguRE-LL are shown in Figure 7, whereas
the corresponding performance degradations are given in Table VI. Consistently
with the expectations set from the probability values in Figure 6, we observe that
PREFiguRE-LL extends power savings for “Code 2” and “Code 3” workloads. The high
correlation between successive long idle intervals enables PREFiguRE-LL to start early
and stay longer in a power saving mode and almost double the overall power savings
for several of the performance targets D. For “Code 1” and “Code 4,” however, such
information does not exist, and as expected, PREFiguRE-LL performs the same as
PREFiguRE. As the results in Figure 7 show, gains of up to a factor of two in power
savings come at no cost in performance degradation. PREFiguRE-LL does not violate
the performance target D across the entire spectrum of evaluated slowdowns.
Note that there are cases when higher performance degradation targets are set, but
the actual performance degradation and power savings stay the same. This is because
of the reliability targets in the framework. In addition, the policy remains robust, as
stochastic information on the sequence of idle times is incorporated in the framework.
Table VI. Actual Performance Degradation under PREFiguRE and PREFiguRE-LL
for Level 3 Savings

                     “Code 1”                           “Code 2”
             Performance Degradation            Performance Degradation
Target D    PREFiguRE    PREFiguRE-LL          PREFiguRE    PREFiguRE-LL
     1          0.0           0.0                  0.0           0.0
     5          0.0           3.0                  0.0           0.0
    10          0.0           3.0                  2.0           2.0
    20          0.0           3.0                 20.0          17.0
   100          0.0           3.0                 21.0          17.0

                     “Code 3”                           “Code 4”
             Performance Degradation            Performance Degradation
Target D    PREFiguRE    PREFiguRE-LL          PREFiguRE    PREFiguRE-LL
     1          0.0           0.0                  1.0           0.0
     5          0.0           2.0                  1.0           1.0
    10          0.0           2.0                  3.0           3.0
    20          0.0           2.0                 10.0          12.0
   100          0.0           2.0                 25.0          25.0

Note: Values are in percentages.
4.6. Caveats and Limitations
The interplay between the device driver decisions and upper-level policies is an im-
portant factor to consider when implementing PREFiguRE. When the framework is
implemented at lower levels (e.g., at the device driver level), the interplay between de-
vice driver decisions and higher-level scheduling is less likely to happen, as the lower
levels are transparent to upper levels. For example, during periods when the disk is
in power saving mode, upper-level policies see the disk as available and idle. We propose
that PREFiguRE be implemented at the lower levels (i.e., at the storage controller or the
HDDs) rather than at other levels of the IO hierarchy, to avoid potential interference with
upper-level non-FCFS schedulers. If PREFiguRE needs to be deployed at the same level
as other non-FCFS schedulers, interference is likely to happen, but such interference is
usually harmless for performance: a non-FCFS disk scheduler performs better the more
requests it has to choose from. Therefore, the actual extra delays caused by the waking-up
process become even smaller than the values we estimate, as our estimation accounts for
the delay propagated across up to k consecutive busy periods (see Equation (8)). In
addition, many activities along the IO path add more variability to measurements than
what we add by controlling the sleep times. Sources of variability in higher-level
scheduling include the interrupts of the communication protocols through which the
components of the path interact: application, OS, client, driver, RAID controller, SAS/PCI
cable, and HDD. Sources of HDD-level variability include missed rotations, seek failures,
and rotation failures. These events occur regularly but affect only a small portion of
requests, increase latency only at the HDD level, and are easily discarded in other
scheduling decisions. We leave the rigorous estimation (via measurement data and/or
simulation) of these exact indirect effects as future work.
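To make the propagation argument concrete, the sketch below illustrates, under a simplified
absorption assumption, how a single wake-up delay can carry over across several consecutive
busy periods before it is fully absorbed. This is only an illustration of the effect that
Equation (8) accounts for, not the paper's exact formulation.

    def propagated_delays(initial_delay_ms, idle_gaps_ms, k_max=10):
        """Return the residual delay seen by each of the next busy periods until the
        initial wake-up delay is absorbed or k_max busy periods have elapsed."""
        residuals = []
        residual = initial_delay_ms
        for gap in idle_gaps_ms[:k_max]:
            residuals.append(residual)
            residual = max(0.0, residual - gap)  # the idle gap absorbs part of the delay
            if residual == 0.0:
                break
        return residuals

    # Example: a 500ms spin-up delay followed by short idle gaps keeps delaying the
    # next few busy periods before it is fully absorbed.
    print(propagated_delays(500.0, [100.0, 50.0, 200.0, 400.0]))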
5. RELATED WORK
For the past two decades, there has been a host of work on power efficiency in computer
systems and more specifically in disk drives. In general, there are two types of power
saving techniques: proactive and reactive. Proactive techniques usually reduce the
amount of storage used in order to save power (e.g., through compression
[Kothiyal et al. 2009], deduplication [Costa et al. 2011], and thin provisioning [Edwards
et al. 2008]). Such techniques usually can be applied only to specific applications or
content and sometimes are difficult to deploy due to the lack of communication between
the system and applications. There are also several works focusing on modeling power
saving modes as low-priority tasks [Gandhi et al. 2009; Jaiswal 1968; Gaver 1959,
1962; Keilson 1962], but they do not provide guarantees on power, performance, and
reliability. A comprehensive comparison of power saving algorithms for disk drives on
personal computers is presented in Douglis et al. [1994]. The algorithms are evaluated
based on trace-driven simulation for two known disk drive models. The baseline used for
power savings assumes a priori knowledge of the idle interval duration. The compared
algorithms vary based on when and for how long a disk is placed in power savings. A
fundamental difference with the work presented here is that these algorithms apply to
personal, not enterprise, systems, and therefore no power, performance, or reliability
guarantees are provided.
In Garg et al. [2009] a Markov model of a cluster of disks is used to predict disk idle-
ness and schedule the spin-down of disks for power savings. This model is based on two
states—ON and OFF—and a prediction mechanism that relies on a probability matrix.
Simulations using DiskSim with synthetic (exponential and Pareto) and real work-
loads show that the Markov model has 87.5% prediction accuracy, reduces energy by
35.5%, performs better than other multispeed models, and has a performance penalty
that is negligible (less than 1%). Another analytical model is introduced in Greenawalt
[1994] and is applied to predict the idle interval duration to spin down a disk for power
savings. The Poisson assumption used in this work is questionable, especially given the
bursty nature and correlation between interarrivals in real traces [Riska and Riedel
2006]. For this analytical model, a “critical rate” is defined as the number of accesses
per unit time at which it is more power efficient to leave the disk active than to spin
it down. The preceding models are useful for offline disk spin-down policies but not
for anticipating workload changes on the fly that are necessary for the development of
online algorithms.
An adaptive algorithm based on the idea of “sessions” is presented in Lu and De Micheli
[1999]. A session is similar to a busy period, and different sessions are separated by
intervals of inactivity of duration τ. The inactivity threshold is adapted by monitoring
(i.e., τ is increased or decreased based on the characteristics of the inactivity periods).
The algorithm does not minimize energy consumption compared to
other adaptive algorithms, but it reduces power while preserving performance and re-
liability. However, no specific guarantees are given for the performance and reliability
of this power saving algorithm.
A dynamic power management (DPM) algorithm is introduced in Irani et al. [2003]
that extends the power savings states from idle and busy to multiple power saving
states based on a stochastic optimization. This algorithm has the best power savings
(i.e., 25% less) and best performance (i.e., 40% less) compared to other DPM algorithms.
It is based on online observations and learning of the probabilistic length of an interval.
The effects of power management on disk request latency for personal computers
are studied in Ramanathan et al. [2000]. The authors find the upper bound of IO
request latencies to demonstrate the worst-case scenario and how to handle it with
efficient system design. A simple adaptive power management algorithm is presented
that predicts the duration of the next idle period based on the previous one. Immediate
shutdown of disks is studied, and the authors conclude that even though it increases
power savings, it also has a negative impact by increasing latency.
A large amount of literature in conserving power in disk drives proposes techniques
that alter the way the storage system workload is served such that the work done at
a subset of disks is reduced and power is conserved. The work on massive array of idle
disks (MAIDs) [Colarelli and Grunwald 2002] uses caches to redirect the workload away
from disks so that they can be spun down. The applicability of MAIDs is limited to backup or archival
storage systems. RIMAC [Yao and Wang 2006] uses a two-level IO cache in addition
to the redundancy of RAID5 to reduce disk spin-ups of standby disks and improve on
power savings and overall IO response time. Energy-efficient RAID (EERAID) [Li and
Wang 2004] achieves power savings while keeping performance degradation under
10%. However, EERAID does not address the reliability concern associated with the
additional disk spin-ups. Similar to RIMAC, EERAID requires additional hardware
to be efficient. The redundancy in RAID is exploited also in Pinheiro et al. [2006],
where power saving is achieved by powering down lightly loaded redundant disks in
a RAID setting. Hibernator [Zhu et al. 2005] is another framework that addresses
power savings in a storage system setting. In Hibernator, the workload is redirected to
active disks that are dynamically deployed with different rotation speeds. Redundancy
in storage systems is exploited further in the FS2 framework [Huang et al. 2005],
which saves power and enhances performance by serving IOs from the closest replica
of data. This scheme reorganizes data by exploiting free blocks based on access
patterns.
WRITE offloading [Narayanan et al. 2008] is proposed as a workload shaping tech-
nique. It extends idleness in a disk drive by offloading the WRITE traffic elsewhere in
the storage system. This method is effective in extending power savings for WRITE-
intensive workloads. Similarly, SRCMap [Verma et al. 2010] is a workload shaping
technique that uses an energy versus workload intensity proportionality model to de-
termine which disks in the system can be used for power savings and which to serve IO.
SRCMap develops an intelligent replication scheme that aims at serving a workload
with the optimal number of active disks in the system. Their model is based on the
observation that the power drain for a workload increases linearly as the load intensity
increases. SRCMap addresses reliability by offloading READs as well as WRITEs to
increase the idle period duration. As a consequence, fewer spin-downs are needed for
power saving, as few but large idle intervals are created. Both Narayanan et al. [2008]
and Verma et al. [2010] use fixed idle waiting periods (in the order of minutes) to limit
performance degradation, albeit no guarantees are given on performance degradation
because of power savings.
While the preceding works combine caches and redundancy for power savings in
storage systems, Zhu et al. [2004] propose a new power-aware
cache retention policy that achieves as much as 16% power savings compared to LRU
by sending fewer requests down to the disk. However, any performance impact of the
new caching policy is not addressed. DRPM [Gurumurthi et al. 2003] represents a
technique that saves power in a disk drive by dynamically changing the rotation speed
of disk drives. Although DRPM addresses performance and reliability concerns, its
main drawback is the substantial hardware changes required in disk drives to support
multiple active rotation speeds.
Different from all of the preceding works, Mountrouidou et al. [2011] introduce disk
drive reliability as an additional target of the power saving algorithm. Yan et al. [2012]
focus on estimating the performance impact of power savings by taking into consideration
propagation delay effects.
The theoretical contribution of PREFiguRE is that it provides both performance
and reliability guarantees by unifying the work presented in Mountrouidou et al.
[2011] and Yan et al. [2012], and by improving the online analytic estimations in a way
that promotes efficiency. Furthermore, PREFiguRE significantly improves accuracy
and robustness via a new probabilistic model that strengthens its underlying
analytic framework. The correlation-based enhancement of PREFiguRE introduced in
this article leads to more power savings under reliability and performance guarantees.
Finally, the problem is reversed to achieve specific power savings while keeping
performance degradation at acceptable levels. PREFiguRE estimates power savings
nearly instantaneously yet accurately, while always meeting performance targets and
strictly following reliability targets in the form of a fixed number of spin-ups/downs
within a time period. PREFiguRE consistently achieves high power savings by
judiciously selecting near-optimal scheduling parameters across very different and
challenging workloads.
To the best of our knowledge, PREFiguRE is unique in simultaneously maximizing
a specific primary target while meeting predefined secondary measures within strict,
preset reliability constraints. A direct comparison with the power saving policies
proposed in the literature is not possible because of this characteristic: no other policy
offers such performance or reliability guarantees, which would make any comparison
uneven or unfair.
6. CONCLUSIONS
We have presented a compact analytic model and its integration into an algorithmic
framework that meets given performance and reliability targets and answers the
difficult questions of “when” and for “how long” idle periods in disk drives can be used
to put the system into a specific power saving mode such that the targets are met. A
detailed analytic model is also developed that precisely determines the amount of power
savings that can be achieved. The effectiveness of the proposed heuristics of PREFiguRE
is demonstrated using a set of traces from enterprise storage systems. PREFiguRE can
also be used for workloads with a dynamic idle period distribution if combined with
workload forecasting techniques, which we leave as future work.
REFERENCES
D. Colarelli and D. Grunwald. 2002. Massive arrays of idle disks for storage archives. In Proceedings of the
ACM/IEEE Conference on Supercomputing. 1–11.
L. B. Costa, S. Al-Kiswany, R. V. Lopes, and M. Ripeanu. 2011. Assessing data deduplication trade-
offs from an energy and performance perspective. In Proceedings of the 2011 International
Green Computing Conference and Workshops (IGCC’11). 1–6. DOI:http://dx.doi.org/10.1109/IGCC.2011.
6008567
Fred Douglis, P. Krishnan, and Brian Marsh. 1994. Thwarting the power-hungry disk. In Proceedings of the
1994 Winter USENIX Conference. 293–306.
John K. Edwards, Daniel Ellard, Craig Everhart, Robert Fair, Eric Hamilton, Andy Kahn, Arkady Kanevsky,
James Lentini, Ashish Prakash, Keith A. Smith, and Edward Zayas. 2008. FlexVol: Flexible, efficient
file volume virtualization in WAFL. In Proceedings of the USENIX 2008 Annual Technical Conference
on Annual Technical Conference (ATC’08). 129–142.
L. Eggert and J. D. Touch. 2005. Idletime scheduling with preemption intervals. In Proceedings of the 20th
ACM Symposium on Operating Systems Principles (SOSP’05). 249–262.
Anshul Gandhi, Mor Harchol-Balter, Rajarshi Das, and Charles Lefurgy. 2009. Optimal power allocation in
server farms. ACM SIGMETRICS Performance Evaluation Review 37, 1, 157–168.
Rajat Garg, Seung Woo Son, Mahmut T. Kandemir, Padma Raghavan, and Ramya Prabhakar. 2009. Markov
model based disk power management for data intensive workloads. In Proceedings of the 9th IEEE/ACM
International Symposium on Cluster Computing and the Grid (CCGRID’09). 76–83.
D. P. Gaver Jr. 1959. Imbedded Markov chain analysis of a waiting-line process in continuous time. Annals
of Mathematical Statistics 30, 3, 698–720.
D. P. Gaver Jr. 1962. A waiting line with interrupted service, including priorities. Journal of the Royal
Statistical Society. Series B (Methodological) 24, 73–90.
Paul M. Greenawalt. 1994. Modeling power management for hard disks. In Proceedings of the 2nd Interna-
tional Workshop on Modeling, Analysis, and Simulation on Computer and Telecommunication Systems
(MASCOTS’94). 62–66.
Laura Grupp, John Davis, and Steven Swanson. 2012. The bleak future of NAND flash memory. In Proceed-
ings of the USENIX Conference on File and Storage Technologies.
S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and H. Franke. 2003. DRPM: Dynamic speed control
for power management in server class disks. In Proceedings of the Annual International Symposium on
Computer Architecture (ISCA’03). 169–180.
Hitachi Global Storage Technologies 2007. Power and Acoustics Management.
H. Huang, W. Hung, and K. G. Shin. 2005. FS2: Dynamic data replication in free disk space for improving
disk performance and energy consumption. In Proceedings of the 20th ACM Symposium on Operating
Systems Principles (SOSP’05), Vol. 39. 263–276.
Sandy Irani, Sandeep Shukla, and Rajesh Gupta. 2003. Online strategies for dynamic power management
in systems with multiple power-saving states. ACM Transactions in Embedded Computing Systems 2,
325–346.
Narendra Kumar Jaiswal. 1968. Priority Queues. Elsevier.
Julian Keilson. 1962. Queues subject to service interruption. Annals of Mathematical Statistics 33, 4, 1314–
1322.
Patricia Kim and Mike Suk. 2007. Ramp Load/Unload Technology in Hard Disk Drives. Available at https://
www.hgst.com/sites/default/files/resources/LoadUnload_white_paper_FINAL.pdf.
Rachita Kothiyal, Vasily Tarasov, Priya Sehgal, and Erez Zadok. 2009. Energy and performance evaluation of
lossless file data compression on server systems. In Proceedings of the Israeli Experimental Systems Con-
ference (SYSTOR’09). ACM, New York, NY, Article No. 4. DOI:http://dx.doi.org/10.1145/1534530.1534536
D. Li and J. Wang. 2004. EERAID: Energy efficient redundant and inexpensive disk array. In Proceedings of
the 11th ACM SIGOPS European Workshop.
Kester Li, Roger Kumpf, Paul Horton, and Thomas Anderson. 1994. A quantitative analysis of disk drive
power management in portable computers. In Proceedings of the USENIX Winter 1994 Technical Con-
ference. 22–36.
Yung-Hsiang Lu and Giovanni De Micheli. 1999. Adaptive hard disk power management on personal com-
puters. In Proceedings of the IEEE Great Lakes Symposium. 50–53.
X. Mountrouidou, A. Riska, and E. Smirni. 2011. Saving power without compromising disk drive reliability.
In Proceedings of the Workshop on Energy Consumption and Reliability of Storage Systems.
D. Narayanan, A. Donnelly, and A. I. T. Rowstron. 2008. Write off-loading: Practical power management
for enterprise storage. In Proceedings of the USENIX Conference on File and Storage Technologies
(FAST’08). 253–267.
Dushyanth Narayanan, Eno Thereska, Austin Donnelly, Sameh Elnikety, and Antony Rowstron. 2009. Mi-
grating server storage to SSDs: Analysis of tradeoffs. In Proceedings of ACM EuroSys Conference.
145–158.
E. Pinheiro, R. Bianchini, and C. Dubnicki. 2006. Exploiting redundancy to conserve energy in storage
systems. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer
Systems (SIGMETRICS’06/Performance’06). 15–26.
Dinesh Ramanathan, Sandy Irani, and Rajesh Gupta. 2000. Latency effects of system level power manage-
ment algorithms. In Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided
Design (ICCAD’00). 350–356.
A. Riska and E. Riedel. 2006. Disk drive level workload characterization. In Proceedings of the USENIX
Annual Technical Conference. 97–103.
A. Riska and E. Smirni. 2010. Autonomic exploration of trade-offs between power and performance in disk
drives. In Proceedings of the 7th IEEE/ACM International Conference on Autonomic Computing and
Communications (ICAC’10). 131–140.
Seagate Technology 2012. Constellation ES Product Overview: High Capacity Storage Designed for Seamless
Enterprise Integration. Available at http://www.seagate.com.
Seagate Technology 2014. Seagate Enterprise Capacity 3.5 HDD v4 Serial ATA Product Manual. Available
at http://www.seagate.com.
Anil Vasudeva 2011. Are SSDs Ready for Enterprise Storage Systems? Available at http://www.snia.org/.
A. Verma, R. Koller, L. Useche, and R. Rangaswami. 2010. SRCMap: Energy proportional storage using
dynamic consolidation. In Proceedings of the 8th USENIX Conference on File and Storage Technologies
(FAST’10). 154–168.
Feng Yan, Xenia Mountrouidou, Alma Riska, and Evgenia Smirni. 2012. Quantitative estimation of the
performance delay with propagation effects in disk power savings. In Proceedings of the 2012 USENIX
HotPower Workshop.
X. Yao and J. Wang. 2006. RIMAC: A redundancy-based, hierarchical I/O architecture for energy-efficient
storage systems. In Proceedings of the 1st ACM EuroSys Conference. 249–262.
Q. Zhu, Z. Chen, L. Tan, Y. Zhou, K. Keeton, and J. Wilkes. 2005. Hibernator: Helping disk arrays sleep
through the winter. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’05).
177–190.
Q. Zhu, F. M. David, C. F. Devaraj, Z. Li, and Y. Zhou. 2004. Reducing energy consumption of disk storage using
power-aware cache management. In Proceedings of the International Symposium on High-Performance
Computer Architecture (HPCA’04). 118–129.
Received December 2014; revised November 2015; accepted December 2015