
Lecture 4 ITI


ITI

Instructor:
Dr. Shireen Tahira
NON-FUNCTIONAL ATTRIBUTES
• IT infrastructures provide services to applications.
• Some of these infrastructure services can be well defined as a
function, like providing disk space or routing network messages.
• Non-functional attributes, on the other hand, describe the
qualitative behaviour of a system, rather than specific
functionalities.
• Some examples of non-functional attributes are:
• Availability
• Scalability
• Reliability
• Stability
• Testability
• Recoverability
Cont..
• The most important non-functional attributes
for most IT infrastructures are security,
performance, and availability.
• For example, when credit card transactions are not stored
in a secure way in the infrastructure and, as a result,
leak to hackers, the organization that stored the credit
card data will have a lot of explaining to do to its
customers.
Cont..
• Non-functional attributes are very functional
indeed, but they are not directly related to the
primary functionalities of a system.
• Instead of using the term non-functional
attribute, it would be much better to use the
term quality attributes.
• Architects, and certainly infrastructure specialists, are typically very
aware of the importance of the non-functional attributes of their
infrastructure,
– but many other stakeholders may not share that awareness.
• Users normally think of functionalities,
while non-functional attributes are
considered a hygiene factor and taken for
granted.
Example
• A car has to bring you from A to B, but many quality
attributes are taken for granted. For instance, the car
has to be safe to drive in (leading to the
implementation of anti-lock brakes, air bags, and
safety belts) and reliable (the car should not break
down every day), and the car must adhere to certain
industry standards (the gas pedal must be the right-
most pedal). All of these extras cost money and might
complicate the design, construction, and
maintenance of the car. While all clients have these
non-functional requirements, they are almost never
expressed as such when people are ordering a new
car.
Non-functional Requirements
• It is the IT architect or requirements engineer’s
job to find implicit requirements on non-
functional attributes (the non-functional
requirements - NFRs).
• This can be very hard, since what is obvious or
taken for granted by the customers or end users
of a system is not always obvious to the designers
and builders of the system.
• The acceptance of a system largely depends on the
implemented non-functional requirements.
Cont..
• A website can be very beautiful and
functional, but if loading the site
(performance, a non-functional requirement)
takes 30 seconds, most customers are gone!
• A large part of the budget for building an
infrastructure is usually spent in fulfilling non-
functional requirements that are not always
clearly defined.
Cont..
• Most stakeholders have no idea how hard it
can be to realize a certain non-functional
requirement. It sometimes helps to quantify
these requirements to make them explicit:
“How bad would it be if the website were not
available for 5 minutes per day?” or “What if
it takes $500,000 to implement this
requirement? Is it still important then?”
AVAILABILITY CONCEPTS
• Everyone expects their infrastructure to be
available all the time.
• In this age of global, always-on, always connected
systems, disturbances in availability are noticed
immediately.
• A 100% guaranteed availability of an
infrastructure, however, is impossible.
• No matter how much effort is spent on creating
highly available infrastructures, there is always a
chance of downtime. It's just a fact of life.
Cont..
• According to a survey from the 2014 Uptime
Symposium [4] , 46% of companies using their
own datacenter had at least one “business-
impacting” datacenter outage over 12
months.
Calculating availability
• In general, availability can neither be calculated, nor
guaranteed upfront.
• It can only be reported on afterwards, when a system has run
for some years.
• Over the years, much knowledge and experience has been gained on
how to design highly available systems, using design patterns
like
– failover,
– redundancy,
– structured programming,
– avoiding Single Points of Failure (SPOFs),
– and implementing sound systems management.
Availability percentages and intervals
• The availability of a system is usually expressed
as a percentage of uptime in a given time
period (usually one year or one month).
• The following table shows the maximum
downtime for a particular percentage of
availability.
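The downtime figures in such a table follow directly from the availability percentage. A minimal Python sketch of the conversion (the helper name is my own, not from the lecture):

HOURS_PER_YEAR = 365 * 24             # 8,760 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12

def max_downtime_hours(availability_pct, period_hours):
    # Maximum downtime allowed within the given period, in hours.
    return (1 - availability_pct / 100) * period_hours

for pct in (99.8, 99.9, 99.99, 99.999):
    per_year = max_downtime_hours(pct, HOURS_PER_YEAR)
    per_month = max_downtime_hours(pct, HOURS_PER_MONTH)
    print(f"{pct}%: {per_year:.2f} h/year, {per_month * 60:.1f} min/month")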
Availability percentages and intervals
• Typical requirements used in service level
agreements today are 99.8% or 99.9% availability
per month for a full IT system.
• To meet this requirement, the availability of the
underlying infrastructure must be much higher,
typically in the range of 99.99% or higher.
• 99.999% uptime is also known as carrier grade
availability; this level of availability originates
from telecommunication system components (not
full systems!) that need an extremely high
availability.
• Higher availability levels for a complete system
are very uncommon, as they are almost
impossible to reach.
Unavailability Frequency
• 99.9% uptime means 525 minutes of downtime per year. This
downtime should not occur as one single event, nor should one-minute
downtimes occur 525 times a year.
• It is therefore good practice to also agree on the maximum frequency
of unavailability. An example is shown in the table.
Cont..
• In this example, it means that the system can
be unavailable for 25 minutes no more than
twice a year.
• It is also allowed, however, to be unavailable
for 3 minutes three times each month.
• For each availability requirement, a frequency
table should be provided in addition to the
availability percentage.
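To make such a frequency agreement concrete, it can be represented as data and recorded outages can be checked against it. A minimal Python sketch using the numbers from the example above (the structure and helper are my own, not part of the lecture):

# One frequency rule: at most max_events outages of up to max_minutes each,
# within one period (a year or a month).
per_year  = {"max_minutes": 25, "max_events": 2}
per_month = {"max_minutes": 3,  "max_events": 3}

def within_rule(outage_minutes, rule):
    # True if every outage fits the duration limit and the count fits the limit.
    return (all(o <= rule["max_minutes"] for o in outage_minutes)
            and len(outage_minutes) <= rule["max_events"])

print(within_rule([20, 24], per_year))      # True: two outages of at most 25 minutes
print(within_rule([25, 25, 24], per_year))  # False: three such outages in one year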
MTBF and MTTR
• The factors involved in calculating availability
are:
• Mean Time Between Failures (MTBF), which is
the average time that passes between failures.
• Mean Time To Repair (MTTR), which is the time
it takes to recover from a failure.
MTBF (Mean Time Between Failures)
• The MTBF is expressed in hours (how many hours will the
component or service work without failure). Some typical MTBF
figures are shown in the table.
• It is important to understand how these numbers are calculated.
• No manufacturer can test if a hard disk will continue to work
without failing for 750,000 hours (= 85 years).
MTBF
• Instead, manufacturers run tests on large batches of components.
For instance, in the case of hard disks, 1000 disks could have been
tested for 3 months. If in that period of time five disks fail, the
MTBF is calculated as follows:
• The test time is 3 months. One year has four of those periods. So,
if the test had lasted one year, 4 × 5 = 20 disks would have failed.
• In one year, the disks would have run: 1000 disks × 365 × 24 =
8,760,000 running hours.
• MTBF = Total operational time / Total number of failures
• This means that MTBF = 8,760,000 / 20 = 438,000 hours.
• Another example: an asset may have been operational for 1,000 hours in a year.
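The disk batch calculation above can be written out as a short Python sketch (the function name is my own; a constant failure rate is assumed, as in the lecture's extrapolation):

def mtbf_from_batch_test(units, test_months, failures_in_test):
    # Scale the observed failures to a full year, then divide the batch's
    # yearly running hours by that number of failures.
    failures_per_year = failures_in_test * (12 / test_months)   # 5 * 4 = 20
    running_hours_per_year = units * 365 * 24                   # 1000 * 8,760 = 8,760,000
    return running_hours_per_year / failures_per_year

print(mtbf_from_batch_test(units=1000, test_months=3, failures_in_test=5))
# 438000.0 hours, matching the calculation above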
MTTR (Mean Time To Repair)
• When a component breaks, it needs to be repaired.
• Usually the repair time (expressed as Mean Time To
Repair – MTTR) is kept low by having a service
contract with the supplier of the component.
• Sometimes spare parts are kept onsite to lower the
MTTR (making MTTR more like Mean Time To
Replace).
• MTTR = Total maintenance time / Total number of repairs
• A simple example of MTTR might look like this: if you
have a pump that fails four times in one workday and
you spend a total of one hour repairing those failures,
your MTTR would be 15 minutes (60 minutes / 4 repairs = 15 minutes).
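The pump example can be expressed the same way; a minimal Python sketch (the helper name is my own):

def mttr_minutes(total_maintenance_minutes, number_of_repairs):
    # MTTR = total maintenance time / total number of repairs
    return total_maintenance_minutes / number_of_repairs

print(mttr_minutes(60, 4))  # 15.0 minutes, as in the pump example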
MTTR
Some examples of what might be needed to complete repairs
are:
• Notification of the fault (time before seeing an alarm message)
• Processing the alarm
• Finding the root cause of the error
• Looking up repair information
• Getting spare components from storage
• Having a technician come to the data centre with the spare
component
• Physically repairing the fault
• Restarting and testing the component
Instead of relying on these manual actions, the best way to keep the MTTR
low is to introduce automated redundancy and failover, which
will be discussed later.
Calculation
• Decreasing MTTR and increasing MTBF both increase availability.
• Dividing MTBF by the sum of MTBF and MTTR results in the
availability expressed as a percentage:
• Availability = (MTBF / (MTBF + MTTR)) × 100%
• For example:
• A power supply's MTBF is 150,000 hours.
• This means that on average this power supply fails once every
150,000 hours (= once per 17 years).
• If the time to repair the power supply is 8 hours, the availability
can be calculated as follows:
• Availability = (150,000 / (150,000 + 8)) × 100% = 99.9947%
• This means that because of the repair time alone this component
can never reach an average availability of 99.999%!
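The power supply example can be checked with a short Python sketch (the function name is my own):

def availability_pct(mtbf_hours, mttr_hours):
    # Availability = MTBF / (MTBF + MTTR), expressed as a percentage.
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100

print(round(availability_pct(150_000, 8), 4))  # 99.9947, below 99.999%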
Serial Availability
• As system complexity increases, availability usually decreases.
• When a failure of any one part of a system causes the system as a
whole to fail, the combined availability is called serial availability.
• For example, consider a server consisting of a number of components,
where the MTTR of any part of the server is 8 hours.
Serial Availability
• The availability of the total server is the product of the
availabilities of all its components:
A(total) = A1 × A2 × … × An
• This is lower than the availability of any single component in the
system.
• To increase the availability, systems (composed of various
components) can be deployed in parallel.
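A minimal Python sketch of serial availability; the component availabilities below are hypothetical, chosen only to illustrate the effect:

from math import prod

def serial_availability(component_availabilities):
    # Serial availability is the product of all component availabilities.
    return prod(component_availabilities)

# Hypothetical component availabilities (as fractions), for illustration only.
components = [0.9999, 0.9999, 0.9998, 0.99995]
print(f"{serial_availability(components) * 100:.4f}%")  # lower than any single component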
Systems in Parallel
• The chance of both systems being unavailable
at the same time is very small; the combined
availability can be calculated as follows:
• A = 1 − (1 − A1)^n
• where
• A = Availability
• n = Number of systems in parallel
• A1 = The availability of one system
• When A1 (the availability of one system) is
estimated to be 99% (which is very
pessimistic, as explained above), the combined
availability of two systems in parallel is
1 − (1 − 0.99)^2 = 99.99%.
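The same formula as a Python sketch, using the 99% single-system estimate from this slide:

def parallel_availability(single_availability, n_systems):
    # A = 1 - (1 - A1) ** n : availability of n identical systems in parallel.
    return 1 - (1 - single_availability) ** n_systems

print(round(parallel_availability(0.99, 2), 6))  # 0.9999   -> 99.99% for two systems
print(round(parallel_availability(0.99, 3), 6))  # 0.999999 -> 99.9999% for three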