HIAA Manual
MK-96HIAA004-00
May 2016
© 2016 Hitachi, Ltd. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including copying and recording, or stored in a database or retrieval system for
commercial purposes without the express written permission of Hitachi, Ltd., or Hitachi Data Systems
Corporation (collectively “Hitachi”). Licensee may make copies of the Materials provided that any such
copy is: (i) created as an essential step in utilization of the Software as licensed and is used in no
other manner; or (ii) used for archival purposes. Licensee may not make any other copies of the
Materials. “Materials” mean text, data, photographs, graphics, audio, video and documents.
Hitachi reserves the right to make changes to this Material at any time without notice and assumes
no responsibility for its use. The Materials contain the most current information available at the time
of publication.
Some of the features described in the Materials might not be currently available. Refer to the most
recent product announcement for information about feature and product availability, or contact
Hitachi Data Systems Corporation at https://support.hds.com/en_us/contact-us.html.
Notice: Hitachi products and services can be ordered only under the terms and conditions of the
applicable Hitachi agreements. The use of Hitachi products is governed by the terms of your
agreements with Hitachi Data Systems Corporation.
By using this software, you agree that you are responsible for:
1. Acquiring the relevant consents as may be required under local privacy laws or otherwise from
authorized employees and other individuals to access relevant data; and
2. Verifying that data continues to be held, retrieved, deleted, or otherwise processed in
accordance with relevant laws.
Notice on Export Controls. The technical data and technology inherent in this Document may be
subject to U.S. export control laws, including the U.S. Export Administration Act and its associated
regulations, and may be subject to export or import regulations in other countries. Reader agrees to
comply strictly with all such regulations and acknowledges that Reader has the responsibility to obtain
licenses to export, re-export, or import the Document and any Compliant Products.
Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries.
AIX, AS/400e, DB2, Domino, DS6000, DS8000, Enterprise Storage Server, eServer, FICON,
FlashCopy, IBM, Lotus, MVS, OS/390, PowerPC, RS/6000, S/390, System z9, System z10, Tivoli,
z/OS, z9, z10, z13, z/VM, and z/VSE are registered trademarks or trademarks of International
Business Machines Corporation.
Active Directory, ActiveX, Bing, Excel, Hyper-V, Internet Explorer, the Internet Explorer logo,
Microsoft, the Microsoft Corporate Logo, MS-DOS, Outlook, PowerPoint, SharePoint, Silverlight,
SmartScreen, SQL Server, Visual Basic, Visual C++, Visual Studio, Windows, the Windows logo,
Windows Azure, Windows PowerShell, Windows Server, the Windows start button, and Windows Vista
are registered trademarks or trademarks of Microsoft Corporation. Microsoft product screen shots are
reprinted with permission from Microsoft Corporation.
All other trademarks, service marks, and company names in this document or website are properties
of their respective owners.
HIAA Data Analytics and Performance Monitoring Overview
Contents
Preface
Intended audience
Product version
Related documents
Document conventions
Conventions for storage capacity values
Accessing product documentation
Getting help
Comments
1 Introduction
Product overview
Key features
Unified infrastructure monitoring dashboard
Advanced reporting
SLO management
System and resource events
End-to-end monitoring
Problem identification and root cause analysis
Logging on to Infrastructure Analytics Advisor
Accessing Data Center Analytics
3 End-to-end performance troubleshooting
Identifying performance problems
Infrastructure components and key performance metrics
Troubleshooting high response times
Troubleshooting workflow
Detecting performance problems
Analyzing performance bottleneck
Analyzing in E2E view
Analyzing in Verify Bottleneck window
Analyzing in Sparkline view
Analyzing in Detail view
Analyzing the root cause of the bottleneck
Identify affected resources
Analyze shared resources
Analyze related changes
Solving performance problems
Preface
This preface includes the following information:
□ Intended audience
□ Product version
□ Related documents
□ Document conventions
□ Getting help
□ Comments
Intended audience
This document provides an overview of the Hitachi Infrastructure Analytics
Advisor software. This document is intended for storage administrators and
infrastructure administrators.
Product version
This document revision applies to Infrastructure Analytics Advisor 2.0 or later.
Related documents
The following documents are referenced or contain more information about
the features described in this manual.
Document conventions
This document uses the following typographic conventions:
Convention Description
Bold • Indicates text in a window, including window titles, menus, menu options,
buttons, fields, and labels. Example:
Click OK.
• Indicates emphasized words in list items.
Italic • Indicates a document title or emphasized words in text.
• Indicates a variable, which is a placeholder for actual text provided by the
user or for output by the system. Example:
pairdisplay -g group
(For exceptions to this convention for variables, see the entry for angle
brackets.)
Monospace Indicates text that is displayed on screen or entered by the user. Example:
pairdisplay -g oradb
• Variables in headings.
[ ] square brackets Indicates optional values. Example: [ a | b ] indicates that you can choose a,
b, or nothing.
{ } braces Indicates required or expected values. Example: { a | b } indicates that you
must choose either a or b.
| vertical bar Indicates that you have a choice between two or more options or arguments.
Examples:
WARNING Warns the user of a hazardous situation which, if not avoided, could
result in death or serious injury.
Conventions for storage capacity values
Logical storage capacity values (for example, logical device capacity) are
calculated based on the following values:
Logical capacity unit Value
Open-systems:
• OPEN-V: 960 KB
• Others: 720 KB
1 KB 1,024 (2^10) bytes
1 MB 1,024 KB or 1,024^2 bytes
1 GB 1,024 MB or 1,024^3 bytes
1 TB 1,024 GB or 1,024^4 bytes
1 PB 1,024 TB or 1,024^5 bytes
1 EB 1,024 PB or 1,024^6 bytes
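The 1,024-based units in the table can be expressed as a small conversion helper. The following is a minimal sketch in Python; the function name logical_capacity_bytes and the UNITS list are illustrative assumptions and are not part of the product.

```python
# Units from the table above, in ascending order. Each unit is 1,024 times
# the previous one, so KB = 1024^1 bytes, MB = 1024^2 bytes, and so on.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]


def logical_capacity_bytes(value, unit):
    """Convert a logical capacity value to bytes using 1,024-based units."""
    exponent = UNITS.index(unit.upper()) + 1  # KB -> 1, MB -> 2, ...
    return int(value * 1024 ** exponent)


print(logical_capacity_bytes(1, "KB"))  # 1024
print(logical_capacity_bytes(2, "GB"))  # 2147483648
```

An unknown unit raises ValueError from the list lookup, which keeps a typo from silently producing a wrong capacity.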
Getting help
Hitachi Data Systems Support Connect is the destination for technical support
of products and solutions sold by Hitachi Data Systems. To contact technical
support, log on to Hitachi Data Systems Support Connect for contact
information: https://support.hds.com/en_us/contact-us.html.
Comments
Please send us your comments on this document to doc.comments@hds.com.
Include the document title and number, including the revision level (for
example, -07), and refer to specific sections and paragraphs whenever
possible. All comments become the property of Hitachi Data Systems
Corporation.
Thank you!
1
Introduction
This module introduces Infrastructure Analytics Advisor.
□ Product overview
□ Key features
Product overview
With Infrastructure Analytics Advisor, you can define and monitor storage
service level objectives (SLOs) for resource performance. You can identify
and analyze historical performance trends to optimize storage system
performance and plan for capacity growth.
Key features
The key features of Infrastructure Analytics Advisor are described in this
section.
Unified infrastructure monitoring dashboard
The consolidated dashboard view allows for the unified management of the
server, storage, and network infrastructure resources. You can ensure the
health of your data center by proactively monitoring the consumer groups,
storage components, volumes, VMs, servers, and network devices. The
advanced visual analytics aids in visualizing the performance data in easy-to-
use graphs and charts. The visual cues allow for intuitive performance
management.
Advanced reporting
Infrastructure Analytics Advisor reporting capabilities enable you to monitor
the infrastructure resources and assess their current performance, capacity
and utilization. Reporting data provides you the information you need to
make informed business decisions and plan for future growth.
Standard reports:
• Default reports. The first time you log on to Infrastructure Analytics
Advisor, the Dashboard shows the following reports by default: System
Status Summary, Event Trends, System Resource Status, and Resource
Events. You can customize which reports are displayed by default.
• Critical reports. Critical reports show resources in your storage
infrastructure that exceeded their thresholds. Critical reports are available
for consumers, VMs, volumes, hosts, and system resources.
• Summary reports. Summary reports give you a high-level view of storage
infrastructure resources. These reports are available for consumers, VMs,
volumes, and system resources. Each summary report shows the number
of resources with critical and warning alerts.
• Other reports. Infrastructure Analytics Advisor provides additional reports
about hypervisors, switches, and system and resource events.
Custom reports:
By integrating with Data Center Analytics, you can create custom reports by
running queries on performance data that is collected from monitored
resources. You can also create real-time and historical reports that are
specific to your business needs.
SLO management
SLOs are measurable parameters that are defined for monitoring the
performance of user resources. With Infrastructure Analytics Advisor, you can
evaluate, define, and customize the SLOs for monitored resources such as
volumes and VMs. By monitoring the SLOs, you can determine whether your
objectives comply with your business requirements.
System and resource events
The Events tab allows you to display details about significant events in your
monitored environment.
The All Events tab displays both Resource events and System events, and
you are only able to display the end-to-end network topology view when you
select a Performance event.
Each event indicates the level of the alert, the date and time of the alert
message, category, device name, and component name. Click a message in
the Message column to display the Event Detail window.
Use the Event Detail window to display more event details, such as the
device type and component type. You can click Up and Down to scroll through
more events. If the event is a Resource event, you can click Show E2E View
to view the network topology.
End-to-end monitoring
The E2E topology view provides detailed configuration of the infrastructure
resources and lets you view the relationship between the infrastructure
components. You can manually analyze the dependencies between the
components in your environment and identify the resource causing
performance problems. By using the topology maps, you can easily monitor
and manage your resources. You can use this view to monitor resources in
your data center, from applications, virtual machines, servers, and networks
through to storage.
In the E2E view, each node represents a resource and the connecting links
represent the relationship between the infrastructure components. You can
analyze a resource which is the target of analysis and all the associated
resources. You can also view the alerts associated with all the related
resources and trace the problem at the root level. The node based E2E view
helps you analyze the problem on the affected node and its impact on the
rest of the infrastructure resources.
Logging on to Infrastructure Analytics Advisor
Procedure
where:
• ip-address is the IP address of the Infrastructure Analytics Advisor
management server.
• port-number is the port number of the Infrastructure Analytics Advisor
management server. The default port number is 22015.
To access Infrastructure Analytics Advisor in secure mode, enter:
https://ip-address:port-number/Analytics/login.htm
The default port number for secure mode is 22016.
3. Type a user ID and password to log on.
4. Click Log In.
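The secure-mode logon URL can be assembled from the two variables described above. The following is a minimal Python sketch; the function name secure_login_url is an assumption, and 192.0.2.10 is a placeholder documentation address, not a real management server.

```python
# Default port for secure mode, as stated in the procedure above.
DEFAULT_SECURE_PORT = 22016


def secure_login_url(ip_address, port_number=DEFAULT_SECURE_PORT):
    """Build the secure-mode logon URL from ip-address and port-number."""
    return f"https://{ip_address}:{port_number}/Analytics/login.htm"


# 192.0.2.10 is a placeholder address used only for illustration.
print(secure_login_url("192.0.2.10"))
# https://192.0.2.10:22016/Analytics/login.htm
```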
Accessing Data Center Analytics
Use Data Center Analytics to conduct historical trend analysis across a wide
set of infrastructure statistics, create advanced monitoring custom reports,
and interactively do additional troubleshooting and diagnostics.
Access Data Center Analytics from the Tools menu. Type a user ID and
password to log on.
Use the Data Center Analytics online help to view details about reporting
tasks and features.
2
Performance monitoring using
advanced threshold settings
Infrastructure Analytics Advisor ensures the health of your data center by
measuring, monitoring, and optimizing the performance of your infrastructure
resources.
□ Threshold profiles
□ Dynamic thresholds
□ Static thresholds
The profile details page shows the profile name and description, and
indicates whether the profile uses the preset parameters defined in the
monitoring template.
Dynamic thresholds
Dynamic thresholds are calculated automatically by analyzing the load
pattern from the historical data. These values are adaptive and change
over time depending on the performance of your resources, workload
changes, and so on. With dynamic thresholds, you can monitor only user
resources, such as volumes, VMs, and hosts.
The scenarios when you would use dynamic threshold values for monitoring
your environment are as follows:
• When SLOs and other performance parameters are not established with
the customer
• When you want to monitor your environment for stable performance and
detect irregular behavior
Manually altering the thresholds each time there is a change in the system
dynamics is a futile effort. By automating the threshold setting you gain
better visibility into your environment and performance trend patterns.
Dynamic thresholds adapt to your environment and proactively send alerts
before a performance bottleneck occurs.
The application workloads might vary at different times of the day or week.
For example, the workload pattern of an OLTP application might be different
on weekdays and weekends. You can manage varying workloads that occur at
different time periods for an application by creating monitoring plans. The
system analyzes the performance data accumulated in the scheduled baseline
period for computing the dynamic threshold values.
• Detects and removes the occasional outliers: In the following example, the
data points that deviate from the norm represent the outliers. The system
ignores outliers that appear at irregular intervals when it calculates an
appropriate threshold value.
• Calculates the maximum value: The upper limit of the values in the normal
range is used to calculate the maximum value. After determining the
maximum value, the system adds the margin of error to the computed
value.
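The two steps above (ignore outliers, then take the upper limit of the normal range and add a margin) can be sketched in Python as follows. This manual does not state how the product detects outliers or how large the margin of error is. The interquartile-range fence and the 10% margin used here are assumptions for illustration only, and the function name dynamic_threshold is hypothetical.

```python
import statistics


def dynamic_threshold(samples, margin=0.10):
    """Sketch: drop high outliers, take the max of the rest, add a margin.

    The outlier rule (values above Q3 + 1.5 * IQR) and the default margin
    are assumptions, not the product's documented algorithm.
    """
    q1, _, q3 = statistics.quantiles(samples, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    # Keep only the values in the assumed normal range.
    normal = [s for s in samples if s <= upper_fence]
    return max(normal) * (1 + margin)


# Made-up baseline samples with one occasional spike (95).
baseline = [10, 12, 11, 13, 12, 11, 95, 12, 14, 13]
print(dynamic_threshold(baseline))  # about 15.4, the spike is ignored
```

Because the spike is excluded before the maximum is taken, a single irregular peak in the baseline period does not inflate the threshold.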
Using the user resource threshold profile, you can apply dynamic thresholds
across user resources within your environment. For example, using a user
resource threshold profile, you can apply a dynamic threshold setting for all
volumes in an application.
You can create monitoring plans for an OLTP application, whose workloads
vary during weekdays and weekends. You can also create a separate plan for
monitoring batch jobs that run at night. The procedure for enabling dynamic
thresholds is as follows:
Procedure
Static thresholds
You can use predefined static threshold values in the following scenarios:
• When you have a well-defined service level objective which clearly
establishes the performance goals.
For example, if you have a service level agreement with the customer to
support online transactions at a response time of less than 1 second for a
business critical application, then you can create a User resource threshold
profile to establish the response time and other performance requirements
for the application and then assign the target resources for monitoring. If
there is an SLO violation, the system sends a critical alert or a warning and
notifies the user before the problem becomes serious. You can also
generate a report that compares the actual response time of the business
critical application to the SLO and see if your objectives are in compliance
and take necessary measures to fix the problem.
• When you can assess the workload patterns in your environment and know
what values to assign
For example, define the threshold for a system resource based on the
architecture of the storage system. If the storage system is VSP G1000,
then the recommended MPB (MP Blade) usage is under 60%.
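A static threshold check amounts to comparing a measured value against fixed warning and critical levels. The following Python sketch illustrates this with the two examples above. The function name evaluate and the specific warning and critical levels (other than the 60% MPB recommendation and the 1-second response time goal) are assumptions.

```python
def evaluate(value, warning, critical):
    """Classify a metric sample against static thresholds (higher is worse)."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "normal"


# MPB usage on a VSP G1000: the text recommends staying under 60%.
# The 80% critical level is an assumed value for illustration.
print(evaluate(72.0, warning=60.0, critical=80.0))  # warning

# Response time in seconds against a 1-second objective.
# The 0.8-second warning level is an assumed value for illustration.
print(evaluate(1.2, warning=0.8, critical=1.0))  # critical
```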
Create threshold profiles for user or system resources based on the resource
type, and then assign the resources you want to monitor.
Procedure
Procedure
□ Troubleshooting workflow
Memory VM ESX VM
contention • Balloon • Usage % • Usage %
• Balloon* • Active memory
VM • Balloon
• Balloon*
Pool
• Utilization
Parity Group
• Utilization %
• Read Hit %
Troubleshooting workflow
The basic workflow for analyzing and troubleshooting the performance
problems using Infrastructure Analytics Advisor is as follows:
Dashboard
The dashboard is displayed when you log on to Infrastructure Analytics
Advisor. You can create a custom dashboard and choose the reports of
monitored resources that you want to view.
In the following figure, the warnings display on the monitored VMs and
volumes. From the report widgets, you can click links to access the E2E view
to analyze the cause of the threshold violations.
Events tab
The Events tab displays a list of resource and system events. You can view
the severity of each event, date and time of the occurrence, category, device
Email notifications
Infrastructure Analytics Advisor allows you to configure email notifications.
When the threshold values are exceeded, the system sends an email to notify
you of the potential performance problem.
Search
The search feature in the Analytics tab lets you search for a resource in the
Consumers, Servers, Storage Systems and Volumes categories. From the
returned search results, you can select the resources you would like to
analyze, and launch the E2E view or Sparkline view for further analysis.
You can identify and analyze the component causing the bottleneck in any of
the following views:
• E2E view
• Analyze bottleneck > Verify Bottleneck tab
• Sparkline view
• Detail view
You can change the base point of analysis to narrow down the topology
associated with the affected volumes. Select the affected volume, right-click,
and then select Change Base Point.
In the Verify Bottleneck window, you can analyze the performance trends of
the potential bottleneck candidate with the base point resources. If the
performance charts display similar trend patterns in the same time period,
you can assume that the selected resource is the bottleneck candidate. If
not, you can repeat the analysis for other resources with alerts in the Verify
Bottleneck window.
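In the Verify Bottleneck window you compare the charts visually. As a rough numerical analogue of "similar trend patterns in the same time period", the following Python sketch computes the Pearson correlation between two metric series sampled at the same times. This is an illustrative technique, not a description of how the product compares charts. The function name pearson and the sample values are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up samples: volume response time (ms) and candidate utilization (%).
volume_response = [5, 6, 9, 14, 13, 7]
candidate_usage = [40, 45, 60, 85, 80, 50]
print(round(pearson(volume_response, candidate_usage), 2))
```

A value close to 1 suggests the two series rise and fall together, which supports treating the selected resource as a bottleneck candidate. A value near 0 suggests repeating the analysis for another resource.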
The Sparkline view displays performance charts for multiple nodes in the
same pane to enable quick comparison between different nodes. You can
display detailed performance metrics for each node and find the correlation
with other nodes.
The volumes (00:00:03, 00:00:05, and 00:00:06) belong to the same parity
group. If the volumes (logical resources) share the same parity group
(physical resource) and if one of the logical volumes utilizes the parity group
more than the others in the shared infrastructure, the total efficiency of the
physical resource is degraded and the parity group utilization rate increases.
A high parity group utilization rate delays reads from and writes to the
disks in the parity group, which increases the response time of the application.
You can consider allocating the affected volumes to a different parity group
for load balancing. You can also check the I/O performance of the parity
group and see if any other servers access the same parity group to
troubleshoot the bottleneck.
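To see which volume uses a shared parity group the most, you can total the load per volume and compute each volume's share. The following Python sketch assumes you have exported per-volume IOPS samples. The function name, the parity group name PG-1, and the IOPS values are made up for illustration. Only the volume IDs come from the example above.

```python
from collections import defaultdict


def busiest_volume_per_group(samples):
    """Return {parity_group: (volume_id, share_of_group_iops)}.

    samples is an iterable of (parity_group, volume_id, iops) tuples.
    """
    groups = defaultdict(dict)
    for group, volume, iops in samples:
        groups[group][volume] = groups[group].get(volume, 0) + iops
    result = {}
    for group, volumes in groups.items():
        total = sum(volumes.values())
        top = max(volumes, key=volumes.get)
        result[group] = (top, volumes[top] / total if total else 0.0)
    return result


# Hypothetical samples for the three volumes that share one parity group.
samples = [
    ("PG-1", "00:00:03", 200),
    ("PG-1", "00:00:05", 1500),
    ("PG-1", "00:00:06", 300),
]
print(busiest_volume_per_group(samples))
# {'PG-1': ('00:00:05', 0.75)}
```

A volume with a dominant share is a natural candidate to move to a different parity group for load balancing.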
Following are the high-level steps used to analyze the root cause in the
Analyze Shared Resource window:
You can resolve the bottleneck caused by the shared resources by adopting
efficient load balancing methodologies, which enable optimal utilization of
the resources in the shared infrastructure.
Following are the high-level steps used to analyze the root cause in the
Analyze Related Changes window:
1. In the Analyze Bottleneck window, click the Analyze Related Changes tab.
2. In the Analyze Related Changes window, a combination chart that
combines the features of the line chart and the bar chart is displayed. In
the combination chart you can compare the performance data of the
bottleneck candidate with the system configuration changes for a
specified time period.
The details of the configuration change events that occurred in the
specified time period are displayed in the lower pane. You can analyze the
change events to see if any of these changes caused performance
variations in the bottleneck candidate. You can also zoom in on the
performance trend chart to select a shorter time period, and view the
change events that occurred in the selected time range.
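Narrowing the change events to a selected time range is a simple filter on the event timestamps. The following Python sketch shows the idea on exported data. The function name events_in_range and the event descriptions are hypothetical and do not come from the product.

```python
from datetime import datetime


def events_in_range(events, start, end):
    """Return the (timestamp, description) events with start <= timestamp <= end."""
    return [(ts, desc) for ts, desc in events if start <= ts <= end]


# Hypothetical configuration change events.
changes = [
    (datetime(2016, 5, 2, 9, 15), "hypothetical: volume added to pool"),
    (datetime(2016, 5, 2, 13, 40), "hypothetical: VM migrated to another host"),
    (datetime(2016, 5, 3, 8, 5), "hypothetical: port configuration changed"),
]

# Zoom in on the afternoon of May 2, when the performance chart degrades.
selected = events_in_range(
    changes, datetime(2016, 5, 2, 12, 0), datetime(2016, 5, 2, 18, 0)
)
for ts, desc in selected:
    print(ts.isoformat(), desc)
```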
The following table lists the commonly observed storage related problems
and possible solutions:
The following table lists the commonly observed server related problems and
possible solutions:
All reports are included in the Reports dock, and are available when you
select any storage system object in the storage systems hierarchy. Predefined
reports differ based on your selection of the storage system object. An
interactive chart and filtering resources enable you to view every detail in any
report. You can also filter reports to display the most relevant data, and can
print, create a PDF, and export a report to a CSV file.
You can also compare how one metric affects the other metrics. For example,
you can create an ad-hoc report that compares IOPS with Response Time.
This most commonly used report shows whether an increasing load on the
system (IOPS) affects the performance (response time).
Custom reports
If the predefined charts and ad-hoc reports are not sufficient, you can create
custom reports by building your own query. The Custom Reports feature is based on
the Data Center Analytics query language. This regex-based expressive query
language retrieves and filters the data in the Data Center Analytics database.
The Data Center Analytics query language allows complex analysis on the
data in real time with constant run-time. The syntax makes it possible to
traverse relations, identify the patterns in the data, and establish a
comparison between metrics of a single component or multiple nodes.
The Data Center Analytics UI helps you build your custom query in the
following three ways:
• Write the query directly using Data Center Analytics query language.
The problem could be in any storage component such as the front-end ports,
controllers, or disk drives. Infrastructure Analytics Advisor automatically
sends a notification to you when a monitored metric of a storage component
exceeds the defined threshold. The notification contains details of the
component that exceeded the threshold to enable you to quickly identify the
problem and troubleshoot it.
In the example, you navigate from the tree view of Data Center Analytics,
which shows a hierarchical representation of the various storage system
objects, to the highlighted storage system, and then select an object to
analyze. In this example, controller 0 exceeds the defined threshold of a
monitored metric.
You choose to review similar Configuration and Performance reports for other
components, DP Pools, RAID Groups and other storage array components to
analyze the effect on performance at the application level.
Because of the short time window in which the report is created, the change
in capacity is minimal, but for a longer period of time, it will be more visible.
These reports are useful for you to do additional capacity planning closer to
the time of actual requirement.
Regional Contact
Information
Americas
+1 408 970 1000
info@hds.com
Asia Pacific
+852 3189 7900
hds.marketing.apac@hds.com