
Dell EMC XtremIO Storage Array: FRU Replacement Procedures


DELL EMC CONFIDENTIAL

Dell EMC
XtremIO Storage Array
X1 Cluster Type
XMS Version 6.4.0
XIOS Versions 4.0.15, 4.0.25, 4.0.26, 4.0.27 and 4.0.31

FRU Replacement Procedures


P/N XD0-100-010
REV 01

Copyright © 2022 Dell Inc. or its subsidiaries. All rights reserved. Published in the USA.

Published January, 2022

Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.

The information in this publication is provided as is. Dell makes no representations or warranties of any kind with respect to the
information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use,
copying, and distribution of any Dell software described in this publication requires an applicable software license.

Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective
owners.

For the most up-to-date regulatory document for your product line, go to Dell EMC Online Support (https://support.emc.com).


CONTENTS

Preface

Chapter 1 General Information


Required Tools and Part Numbers ............................................................... 10
Missing, Wrong or Damaged Components................................................... 11
Cable Management Brackets....................................................................... 11
Obtaining Item References for XtremIO FRU Procedures............................... 12
Accessing the XMS via a Cluster Storage Controller ..................................... 13
Option 1 - Accessing the XMS Using a Tunnel From the Storage Controller .. 14
Option 2 - Accessing the XMS Using PuTTY Port Forwarding (or Similar on
Another SSH client) from the Storage Controller .................................... 16
Checking the XtremIO Cluster Health........................................................... 17

Chapter 2 Replacing the Servers and Components


Replacing a Storage Controller .................................................................... 20
Tolerance .............................................................................................. 20
Accessing the XMS via a Cluster Storage Controller ............................... 21
Identifying the Defective Storage Controller........................................... 21
Confirming the Open Network Ports for Storage Controller Replacement 23
Checking the XtremIO Cluster Health ..................................................... 24
Replacing the Defective Storage Controller Using the Technician Advisor
Utility .................................................................................................... 24
Replacing Storage Controller DIMMs ........................................................... 27
Tolerance .............................................................................................. 27
Accessing the XMS via a Cluster Storage Controller ............................... 27
Determining if DIMM Replacement is Required ...................................... 27
Checking the XtremIO Cluster Health ..................................................... 29
Replacing the Defective DIMM Using the Technician Advisor Utility........ 30
Replacing Storage Controller Power Supply Units ........................................ 31
Tolerance .............................................................................................. 31
Accessing the XMS via a Cluster Storage Controller ............................... 31
Identifying the Defective Storage Controller Power Supply Unit.............. 31
Checking the XtremIO Cluster Health ..................................................... 32
Disabling All Notifiers............................................................................ 32
Replacing the Defective Storage Controller Power Supply Unit ............... 33
Configuring the Replaced Storage Controller Power Supply Unit ............ 34
Performing the Post Replacement Procedures ....................................... 35
Replacing an SFP+ ...................................................................................... 36
Tolerance .............................................................................................. 36
Accessing the XMS via a Cluster Storage Controller ............................... 36
Procedure Prerequisite .......................................................................... 36
Identifying the Defective SFP+ ............................................................... 37
Checking the Defective SFP+ Using an SFP+ Loopback Tool ................... 39
Checking the XtremIO Cluster Health ..................................................... 39
Disabling All Notifiers............................................................................ 39
Replacing a Defective SFP+.................................................................... 40
Performing the Post Replacement Procedures ....................................... 44
Replacing the XMS ...................................................................................... 45
Tolerance .............................................................................................. 45
Accessing the XMS via a Cluster Storage Controller ............................... 45
Identifying the Defective XMS................................................................ 45
Checking the XtremIO Cluster Health ..................................................... 46
Replacing the Physical XMS................................................................... 46
Replacing a Virtual XMS ........................................................................ 51
Recovering the XMS .............................................................................. 55
Performing the Post Replacement Procedures ....................................... 62

Chapter 3 Replacing the DAE Components


Replacing the SSDs..................................................................................... 64
Tolerance .............................................................................................. 64
Accessing the XMS via a Cluster Storage Controller ............................... 64
Identifying the Defective SSD ................................................................ 65
Checking the XtremIO Cluster Health ..................................................... 65
Handling Defective SSDs, Detected by 5D SMART Error.......................... 66
Checking the XtremIO Cluster Health ..................................................... 66
Physically Locating the Defective SSD (Using LEDs) ............................... 66
Replacing a Defective SSD, Using the Technician Advisor Utility ............ 67
Replacing a DAE Chassis............................................................................. 68
Tolerance .............................................................................................. 68
Accessing the XMS via a Cluster Storage Controller ............................... 68
Identifying the Defective DAE Chassis.................................................... 68
Physically Locating the Defective DAE Chassis (Using LEDs) .................. 69
Checking the XtremIO Cluster Health ..................................................... 69
Replacing the Defective DAE Chassis..................................................... 69
Configuring the Replaced DAE Chassis .................................................. 72
Performing the Post Replacement Procedures ....................................... 74
Replacing the DAE Controllers (LCCs) .......................................................... 75
Tolerance .............................................................................................. 75
Accessing the XMS via a Cluster Storage Controller ............................... 75
Identifying the Defective DAE Controller ................................................ 75
Physically Locating the Defective DAE Controller (Using LEDs) ............... 76
Checking the XtremIO Cluster Health ..................................................... 77
Replacing the Defective DAE Controller.................................................. 77
Configuring the Replaced DAE Controller ............................................... 79
Performing the Post Replacement Procedures ....................................... 80
Replacing the DAE Power Supply Units........................................................ 81
Tolerance .............................................................................................. 81
Accessing the XMS via a Cluster Storage Controller ............................... 81
Identifying the Defective DAE Power Supply Unit ................................... 81
Checking the XtremIO Cluster Health ..................................................... 81
Replacing the Defective DAE Power Supply Unit..................................... 82
Configuring the Replaced DAE Power Supply Unit .................................. 83
Performing the Post Replacement Procedures ....................................... 84

Chapter 4 Replacing the InfiniBand Switch Components


Replacing the InfiniBand Switch.................................................................. 86
Tolerance .............................................................................................. 86
Accessing the XMS via a Cluster Storage Controller ............................... 86
Identifying the Defective InfiniBand Switch ........................................... 87
Checking the XtremIO Cluster Health ..................................................... 88
Replacing the Defective InfiniBand Switch............................................. 88
Configuring the Replaced InfiniBand Switch .......................................... 90
Performing the Post Replacement Procedures ....................................... 93
Replacing the InfiniBand Switch Power Supply Units................................... 94
Tolerance .............................................................................................. 94
Accessing the XMS via a Cluster Storage Controller ............................... 94
Identifying the Defective InfiniBand Switch Power Supply Unit .............. 94
Checking the XtremIO Cluster Health ..................................................... 94
Replacing the Defective InfiniBand Switch Power Supply Unit................ 95
Performing the Post Replacement Procedures ....................................... 96
Replacing the InfiniBand Switch Fan Units .................................................. 97
Tolerance .............................................................................................. 97
Accessing the XMS via a Cluster Storage Controller ............................... 97
Identifying the Defective InfiniBand Switch Fan Unit .............................. 97
Checking the XtremIO Cluster Health ..................................................... 98
Replacing the Defective InfiniBand Switch Fan Unit ............................... 98
Performing the Post Replacement Procedures ....................................... 99

Chapter 5 Replacing the Battery Backup Units


Replacing a Battery Backup Unit (BBU) Using the Technician Advisor Utility 102
Battery Backup Unit Types................................................................... 102
5P Battery Backup Unit Kit Sub-Parts................................................... 102
Tolerance ............................................................................................ 103
Accessing the XMS via a Cluster Storage Controller ............................. 103
Identifying the Defective BBU .............................................................. 103
Replacing a BBU.................................................................................. 104
Replacing a Serial Communication Cable for a 5P 1550i BBU .................... 105
Tolerance ............................................................................................ 105
Accessing the XMS via a Cluster Storage Controller ............................. 105
Verifying Failed Serial Communication Cables ..................................... 106
Disabling All Notifiers.......................................................................... 106
Replacing the Defective Cable ............................................................. 106
Verifying Replaced Serial Communication Cables ................................ 107
Performing the Post Replacement Procedures ..................................... 107

Appendix A Software Re-Installation


Writing the XtremIO Rescue Image to a USB Drive...................................... 110
Re-Installing a Storage Controller .............................................................. 112
Re-Installing a Physical XMS ..................................................................... 114

Appendix B Using LEDs to Identify Hardware Components


Hardware Components’ LEDs .................................................................... 118
Using the GUI to Activate Identification LEDs............................................. 119
Using the CLI to Activate the Identification LEDs ....................................... 120
control-led .......................................................................................... 120
show-leds ........................................................................................... 121

Appendix C Priority Failure Analysis

Appendix D Manually Replacing the Storage Controllers


Replacing a Storage Controller Manually ................................................... 126
Accessing the XMS via a Cluster Storage Controller ............................. 126
Identifying the Defective Storage Controller......................................... 126
Confirming the Open Network Ports for Storage Controller Replacement ..... 126
Checking the XtremIO Cluster Health ................................................... 127
Replacing the Defective Storage Controller .......................................... 127
Configuring the Replaced Storage Controller ....................................... 134
Fastening the Storage Controller Cables to the Cable Management Bracket 134
Installing the Bezel ............................................................................. 136
Performing the Post Replacement Procedures ..................................... 136
Removing the Old Storage Controller Disks.......................................... 137

Appendix E Manually Replacing the SSDs


Replacing an SSD Manually....................................................................... 140
Accessing the XMS via a Cluster Storage Controller ............................. 140
Identifying the Defective SSD .............................................................. 140
Checking the XtremIO Cluster Health ................................................... 140
Replacing a Defective SSD................................................................... 140
Troubleshooting.................................................................................. 145
Performing the Post Replacement Procedures ..................................... 145

Appendix F Post Replacement Procedures


Clearing Repeating Alert Counters ............................................................. 148
Generating and Collecting the Bundle ....................................................... 148
Uploading the Bundle Collection............................................................... 149
Disabling Path Redundancy Monitoring for VPLEX-Connected XtremIO Clusters 149
Checking the XtremIO Cluster Health (Post Replacement).......................... 150
Restoring All Notifiers ............................................................................... 150
Closing the Tunnel Between a Storage Controller and the XMS .................. 151
Sending Defective Component to Priority Failure Analysis ......................... 151

Appendix G Essential Pre-Customer-Visit Preparations for Technician Advisor Utility Use
Checking the Network Ports with the Customer ......................................... 154
Preparing a Replacement Battery Backup Unit........................................... 154


PREFACE

As part of an effort to improve its product lines, Dell EMC periodically releases revisions of
its software and hardware. Therefore, some functions described in this document might
not be supported by all versions of the software or hardware currently in use. The product
release notes provide the most up-to-date information on product features.
Contact your Dell EMC technical support professional if a product does not function
properly or does not function as described in this document.

Note: This document was accurate at publication time. Go to Dell EMC Online Support
(https://support.emc.com) to ensure that you are using the latest version of this
document.

Purpose
This document provides the required information for replacing EMC XtremIO Storage Array
Field Replaceable Units (FRUs) that have been identified as unserviceable.

Audience
This document is intended for Dell EMC field support personnel.

Related Documentation
The following Dell EMC publications provide additional information:
◆ XtremIO Storage Array Technician Advisor Utility (Ver.2.X) User Guide
◆ XtremIO Storage Array Hardware Installation and Upgrade Guide
◆ XtremIO Storage Array Software Installation and Upgrade Guide
◆ XtremIO Storage Array User Guide
◆ XtremIO Storage Array Release Notes XIOS
◆ XtremIO Storage Array Release Notes XMS


Conventions Used in this Document


Dell EMC uses the following conventions for special notices:

Note: A note presents information that is important, but not hazard-related.

Typographical conventions
Dell EMC uses the following type style conventions in this document:
Bold              Use for names of interface elements, such as names of windows, dialog
                  boxes, buttons, fields, tab names, key names, and menu paths (what the
                  user specifically selects or clicks)
Italic            Use for full titles of publications referenced in text
Monospace         Use for:
                  • System output, such as an error message or script
                  • System code
                  • Pathnames, filenames, prompts, and syntax
                  • Commands and options
Monospace italic  Use for variables
Monospace bold    Use for user input
[]                Square brackets enclose optional values
|                 Vertical bar indicates alternate selections; the bar means “or”
{}                Braces enclose content that the user must specify, such as x or y or z
...               Ellipses indicate nonessential information omitted from the example

Where to Get Help


Dell EMC support, product, and licensing information can be obtained as follows:
Product information — For documentation, release notes, software updates, or
information about Dell EMC products, go to Dell EMC Online Support at:
https://support.emc.com
Technical support — Go to Dell EMC Online Support and click Service Center. You will see
several options for contacting Dell EMC Technical Support. Note that to open a service
request, you must have a valid support agreement. Contact your Dell EMC sales
representative for details about obtaining a valid support agreement or with questions
about your account.

Your Comments
Your suggestions will help us continue to improve the accuracy, organization, and overall
quality of the user publications. Send your opinions of this document to:
techpubcomments@emc.com


CHAPTER 1
General Information

This chapter includes the following topics:

◆ Required Tools and Part Numbers ........................................................................... 10
◆ Missing, Wrong or Damaged Components............................................................... 11
◆ Cable Management Brackets................................................................................... 11
◆ Obtaining Item References for XtremIO FRU Procedures........................................... 12
◆ Accessing the XMS via a Cluster Storage Controller ................................................. 13
◆ Checking the XtremIO Cluster Health....................................................................... 17


Required Tools and Part Numbers


◆ It is recommended to wear an ESD bracelet or grounding heels when handling
hardware components.
◆ A #2 Phillips screwdriver is required for removing and tightening the screws of all
XtremIO hardware components.
◆ Install the XtremIO Technician Advisor Utility on your local machine. For details on
installing the Technician Advisor, refer to the XtremIO Technician Advisor Utility User
Guide that is posted in the XtremIO SolVe Generator, under XtremIO > XtremIO
X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts and Utilities >
XtremIO Technician Advisor > Install XtremIO Technician
Advisor.

Note: Only Technician Advisor version 2.x supports the X1 cluster type.

◆ A KVM, or a keyboard and monitor, is required on-site in case a physical XMS
and/or Storage Controllers need to be re-installed.
◆ To view the part numbers of the XtremIO cluster components, hover the mouse
pointer over the desired component in the GUI; a ToolTip appears, showing the
component’s details, including its part number.
◆ An ssh command-line client or a graphical SSH client (e.g. PuTTY).
◆ An sftp command-line client or a graphical SFTP client (e.g. WinSCP).
Example: Using the sftp command line to transfer an XtremIO Health Check Script
to the XMS with the xmsupload credentials:
# sftp xmsupload@<IP of XMS>
...
Connected to <IP of XMS>.
sftp> pwd
Remote working directory: /images
sftp> cd scripts/
sftp> put system_health-v203.5.4-s4.0.0.py.gpg
Uploading system_health-v203.5.4-s4.0.0.py.gpg to /images/scripts/system_health-v203.5.4-s4.0.0.py.gpg
system_health-v203.5.4-s4.0.0.py.gpg 100% 10MB 45.6MB/s 00:00
sftp> exit

◆ The following tools are required for the Storage Controller and XMS replacement
procedures:
Prepare a USB flash drive to restore the Storage Controller or XMS to its original state.
• For details on preparing a USB flash drive before a Storage Controller replacement,
refer to “Re-Installing a Storage Controller” on page 112.
• For details on preparing a USB flash drive before an XMS replacement, refer to
“Re-Installing a Physical XMS” on page 114.


◆ For SFP+ replacement, the following tools are required:
• SFP+ extraction tool for raising the SFP+ bail. If an SFP+ extraction tool is not
available, use a flat-headed screwdriver instead.
• SFP+ loopback tool to further check the state of the SFP+ considered for
replacement.
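
The sftp upload shown earlier in this list can also be scripted. The following is a minimal Python sketch, not part of the Dell EMC procedure, that only builds the text of an sftp batch file and the matching command line. The function names, the batch file name and the 192.0.2.10 address are hypothetical placeholders. Note that OpenSSH sftp batch mode (-b) does not prompt for a password, so it is usable only where non-interactive authentication is available; otherwise, run the interactive session as shown in the example.

```python
from pathlib import PurePosixPath


def build_sftp_batch(local_file, remote_dir="scripts"):
    """Return batch-file text that mirrors the interactive sftp session."""
    name = PurePosixPath(local_file).name
    lines = [
        "pwd",                # the documented session starts in /images
        f"cd {remote_dir}",   # same 'cd scripts/' step as the example
        f"put {local_file}",  # upload the health check script
        f"ls -l {name}",      # list the uploaded file as a quick check
        "exit",
    ]
    return "\n".join(lines) + "\n"


def build_sftp_command(xms_ip, batch_path, user="xmsupload"):
    """Return the argument list for 'sftp -b <batch> <user>@<XMS IP>'."""
    return ["sftp", "-b", batch_path, f"{user}@{xms_ip}"]


if __name__ == "__main__":
    # Placeholder values: replace with the real script name and XMS IP.
    print(build_sftp_batch("system_health-v203.5.4-s4.0.0.py.gpg"), end="")
    print(" ".join(build_sftp_command("192.0.2.10", "upload.batch")))
```

The sketch prints the commands rather than running them, so the batch content can be reviewed before anything is sent to the XMS.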

Note: In Storage Controllers with XIOS version 4.0.25-22 (or later) and in XMS
version 4.2.2-18 (or later), the sshd configuration was hardened to disable weak
ciphers for SSH connectivity. This was done to enhance security and to resolve
vulnerabilities associated with those ciphers. Due to this change, some older
versions of the PuTTY SSH client and the WinSCP SFTP client may fail when
accessing the XMS or when transferring files to or from it. For further details, refer
to Dell EMC KB# 504645 (https://support.emc.com/kb/504645).
To avoid such errors, make sure that WinSCP and PuTTY (or any other SSH and SFTP
clients used) are updated to the most recent version.
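
To decide in advance whether the cipher restriction applies to a given cluster, the installed version can be compared with the thresholds stated in the note. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and only the two threshold versions come from this document.

```python
import re

# Thresholds taken from the note above; everything else is illustrative.
SC_XIOS_THRESHOLD = (4, 0, 25, 22)   # XIOS 4.0.25-22
XMS_THRESHOLD = (4, 2, 2, 18)        # XMS 4.2.2-18


def parse_version(text):
    """Split a version such as '4.0.25-22' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[.-]", text.strip()))


def weak_ciphers_disabled(version, threshold):
    """Return True when the version is at or above the given threshold."""
    return parse_version(version) >= threshold


if __name__ == "__main__":
    for xios in ("4.0.15", "4.0.25-22", "4.0.31"):
        print(xios, weak_ciphers_disabled(xios, SC_XIOS_THRESHOLD))
```

A version given without a build number (for example 4.0.25) compares as lower than 4.0.25-22 in this sketch, so supply the full version string when it is available.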

Missing, Wrong or Damaged Components


For detailed information on how to handle missing, wrong or damaged items, access the
Missing, Wrong, or Damaged (MWD) Customer Complaints Capture System via the
following URL:
https://emcmwd.emc.com/default.asp

Cable Management Brackets


Each Storage Controller in clusters installed after Version 4.0 is fitted with a cable
management bracket on the server’s rear side.
Older clusters do not include cable management brackets. For such clusters, ignore
any instructions in this guide regarding the cable management bracket.


Obtaining Item References for XtremIO FRU Procedures


This section lists the items that are referenced in the XtremIO FRU procedures.
Make sure that you obtain all of the following items before executing any
XtremIO FRU procedure described in this document:
◆ XtremIO SolVe Generator
◆ Technician Advisor installation packages - XtremIO SolVe Generator at
XtremIO > XtremIO X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts
and Utilities > XtremIO Technician Advisor > Install XtremIO
Technician Advisor
◆ Technician Advisor User Guide - XtremIO SolVe Generator at XtremIO >
XtremIO X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts and
Utilities > XtremIO Technician Advisor > Install XtremIO
Technician Advisor
◆ Dell EMC KB# 464336 - Resolve XtremIO Health Check Script Error - Master Article -
https://support.emc.com/kb/464336
◆ Dell EMC KB# 482666 - XtremIO: Replaced (FRU) Storage Controller (SC) May Not Start
on an Encryption Enabled Cluster Following a Cluster Power-Cycle or Shutdown -
https://support.emc.com/kb/482666
◆ Dell EMC KB# 479972 - XtremIO: RecoverPoint Replication During an XtremIO NDU May
Cause Data Unavailability - https://support.emc.com/kb/479972
◆ XtremIO Release Notes document for the XIOS/XMS version running on the affected
cluster, available at support.emc.com
◆ XtremIO Storage Array Software Installation and Upgrade Guide for the version running
on the affected cluster, available at support.emc.com
◆ For Storage Controller replacement:
• A USB drive for creating a bootable USB drive.
• A KVM or a connected keyboard and monitor.
• XtremIO Storage Controller Rescue Image, available at support.emc.com - Refer to
the XtremIO Release Notes document for details on which image to use for the
version running on the affected cluster.
• How-To video for Storage Controller Replacement, using Technician Advisor -
XtremIO SolVe Generator at XtremIO > XtremIO X1 (XIOS 2.x, 3.x,
4.x) > Service Scripts and Utilities > XtremIO Technician
Advisor > Install XtremIO Technician Advisor.
• XtremIO Storage Array Site Preparation Guide for the version running on the
affected cluster, available at support.emc.com


◆ For physical XMS replacement:
• A USB drive for creating a bootable USB drive.
• A KVM or a connected keyboard and monitor.
• XtremIO XMS Rescue Image available at support.emc.com - Refer to the XtremIO
Release Notes document for details on which image to use for the version running
on the affected cluster.
◆ For virtual XMS replacement:
• XtremIO Virtual XMS OVA package available at support.emc.com - Refer to the
XtremIO Release Notes document for details on which OVA package to use for the
version running on the affected cluster.
◆ For SSD replacement:
• Dell EMC KB# 205558 - XtremIO Dial Home (DH) Diagnostics detected a problem in
the SSD - https://support.emc.com/kb/205558

Accessing the XMS via a Cluster Storage Controller


Select one of the following options to access the XMS when physically attaching a
computer to the active Storage Controller on the cluster’s X-Brick 1:
Option 1 - Open a tunnel to access the XMS via the Storage Controller, and close it once
the case is completed. This option requires the technician to coordinate with the
customer, who first accesses XMCLI on the XMS and then opens a tunnel between
the XMS and the Storage Controller.
Option 2 - Use PuTTY port forwarding (or a similar feature on another SSH client) to launch
an SSH session (and WebUI session) to the XMS, via the Storage Controller. This option
can be used only when the technician knows the XMS IP address.


Option 1 - Accessing the XMS Using a Tunnel From the Storage Controller

Note: If you are using the Technician Advisor utility, instead of this procedure, you can
launch an XMCLI session on the XMS. For details, refer to the Technician Advisor Utility
User Guide matching the software version currently installed.

Note: This procedure is performed by the customer, who is able to access XMCLI on the XMS.

Note: Make sure to access the XMS via a Storage Controller that is healthy.

Note: An alert is raised to indicate that the technician port tunnel is open. This alert is
cleared once the tunnel is closed.

To open a tunnel between the Storage Controller and the XMS:


1. Check which Storage Controller on X-Brick 1 is healthy and which Storage Controller on
the X-Brick is affected.
2. Run the following CLI command to make sure that the SSH firewall is unlocked
(disabled):
modify-ssh-firewall ssh-firewall-mode=unlocked
3. Run the show-clusters-info and show-storage-controllers XMCLI
commands to get the Cluster ID and Storage Controller ID of the affected cluster and
Storage Controller.
4. Run the following modify-technician-port-tunnel CLI command to verify or open a
tunnel from the Storage Controller to the XMS:
modify-technician-port-tunnel cluster-id=<Cluster ID> sc-id=<Storage Controller ID> open

Note: Since the Storage Controller may not be able to open an SSH tunnel due to
security issues, the tunnel is opened from the XMS’s side.

5. Connect the computer to the TECH Ethernet (RJ45) port (marked "2") on the rear of
the healthy Storage Controller on X-Brick 1 (X1-SC1 or X1-SC2), as selected in
step 1 of this procedure.
6. Upon completion of the procedure (when access to XMS is no longer required), make
sure to close the tunnel.
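
Because the tunnel must be closed with the same cluster and Storage Controller identifiers that were used to open it, it can help to generate both command lines together. The following is a minimal Python sketch that only formats the modify-technician-port-tunnel syntax shown in step 4; the helper name and the identifier values are hypothetical placeholders, and the real values come from the output of show-clusters-info and show-storage-controllers.

```python
def tunnel_command(cluster_id, sc_id, action):
    """Format the XMCLI command that opens or closes the technician tunnel."""
    if action not in ("open", "close"):
        raise ValueError("action must be 'open' or 'close'")
    return (
        f"modify-technician-port-tunnel cluster-id={cluster_id} "
        f"sc-id={sc_id} {action}"
    )


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    cluster_id, sc_id = 1, 1
    print(tunnel_command(cluster_id, sc_id, "open"))
    print(tunnel_command(cluster_id, sc_id, "close"))
```

Keeping both strings at hand when the tunnel is opened reduces the chance of leaving it open against the wrong Storage Controller at the end of the visit.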


To access the XMS via the Storage Controller on a tunnel between the XMS and Storage
Controller:

Note: This procedure should be performed after a tunnel is opened between the XMS and
Storage Controller.

1. Connect your laptop to the TECH Ethernet (RJ45) port (marked "2") on the rear of
the Storage Controller.
2. Configure your laptop's network interface with the following:
• IP: 169.254.254.2
• Subnet mask: 255.255.240.0 (/20 prefix)
3. Connect to the XMS using the following ssh command:
ssh xmsadmin@169.254.254.1 -p 10022
The actual connection is made by SSH to the TECH Ethernet port’s IP address (static
169.254.254.1) on port 10022, with the Username xmsadmin (the password for
xmsadmin is supplied by Dell EMC).
The Storage Controllers can now forward traffic between the Ethernet Tech port and XMS.
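
If the SSH connection to 169.254.254.1 fails, a first check is that the laptop address and mask place it in the same network as the TECH port. The following short Python sketch uses the standard ipaddress module with the addresses from this procedure; the variable names are illustrative only.

```python
import ipaddress

# Laptop settings from step 2 and the static TECH port address from step 3.
laptop = ipaddress.ip_interface("169.254.254.2/255.255.240.0")
tech_port = ipaddress.ip_address("169.254.254.1")

print(laptop.network)                # 169.254.240.0/20
print(tech_port in laptop.network)   # True
print(laptop.ip.is_link_local)       # True (169.254.0.0/16 range)
```

The mask 255.255.240.0 corresponds to a /20 prefix, so both addresses fall within 169.254.240.0/20 and no gateway is needed between the laptop and the TECH port.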

To close the tunnel that was opened between the Storage Controller and the XMS:

Note: Perform this procedure after the replacement is complete, once XMCLI access is
no longer needed. Any subsequent commands referenced in this procedure (following
closure of the tunnel to the XMS) should be run by the customer, who is able to
access XMCLI on the XMS.

◆ Run the following CLI command:
modify-technician-port-tunnel cluster-id=<Cluster ID> sc-id=<Storage Controller ID> close

Note: You can verify whether a tunnel is open by using the following command:
show-technician-port-tunnels

xmcli (tech)> show-technician-port-tunnels
Cluster-Name  Index  Storage-Controller-Name  Index  Tech-Tunnel-State
xbrick742     1      X1-SC1                   1      closed
xbrick742     1      X1-SC2                   2      closed
xbrick742     1      X2-SC1                   3      closed
xbrick742     1      X2-SC2                   4      closed
xmcli (tech)>
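
When the command output is captured to a text file, the tunnel state of every Storage Controller can be checked in one pass. The Python sketch below is illustrative only and assumes the five-column layout shown in the example above. The source shows only the closed state, so the sketch treats any other value as "not closed"; the function name open_tunnels is hypothetical.

```python
# Sample text copied from the example output above.
SAMPLE_OUTPUT = """\
xmcli (tech)> show-technician-port-tunnels
Cluster-Name Index Storage-Controller-Name Index Tech-Tunnel-State
xbrick742 1 X1-SC1 1 closed
xbrick742 1 X1-SC2 2 closed
xbrick742 1 X2-SC1 3 closed
xbrick742 1 X2-SC2 4 closed
xmcli (tech)>
"""


def open_tunnels(output: str) -> list[str]:
    """Return Storage Controller names whose Tech-Tunnel-State is not 'closed'."""
    names = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, prompt lines, and the header row.
        if len(fields) != 5 or fields[0] == "Cluster-Name":
            continue
        _cluster, _cluster_index, sc_name, _sc_index, state = fields
        if state != "closed":
            names.append(sc_name)
    return names


if __name__ == "__main__":
    print(open_tunnels(SAMPLE_OUTPUT))
```

An empty result indicates that no tunnel is left open on the listed Storage Controllers.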


Option 2 - Accessing the XMS Using PuTTY Port Forwarding (or Similar on Another
SSH client) from the Storage Controller

Note: This procedure specifically refers to the PuTTY SSH client. To execute this procedure
using another SSH client, consult your SSH client documentation to determine if a similar
port forwarding feature is provided, and how to use the feature on your SSH client.

Note: Make sure to access the XMS via a Storage Controller that is healthy.

Note: If you are using the Technician Advisor utility, instead of this procedure, you can
launch an XMCLI session on the XMS. For details, refer to the Technician Advisor Utility
User Guide matching the software version currently installed.

To access the XMS using PuTTY port forwarding to the XMS:


1. Work with the customer to obtain the IP address of the XMS. The show-xms
command can be used, if needed, to find the IP address of the XMS.
2. Connect your laptop to the Storage Controller’s TECH Ethernet (RJ45) port (marked
"2"). The TECH Ethernet port has a predefined IP address of "169.254.254.1/20".
3. Configure your laptop's network interface with the following values:
• IP: 169.254.254.2
• Subnet mask: 255.255.240.0 (/20 prefix)
4. Open PuTTY, connect to the Storage Controller using IP 169.254.254.1, and log in
with the xinstall account. (The password for xinstall is supplied by Dell EMC.)
5. Once connected to the Storage Controller with PuTTY, click the upper left corner, or
right-click the title bar, and select Change Settings.
6. Select Connection > SSH > Tunnels.
7. Set the Source port to 80 and the Destination to <XMS_IP_ADDR>:80, select
"Local" and "Auto", and click Add.
8. Set the Source port to 443 and the Destination to <XMS_IP_ADDR>:443, select
"Local" and "Auto", and click Add.
9. Set the Source port to 22 and the Destination to <XMS_IP_ADDR>:22, select
"Local" and "Auto", and click Add.


10. Select Apply.

11. While the PuTTY SSH session to the Storage Controller remains active, perform one of
the following actions:
• To access the XMS using XMCLI - Use PuTTY to open another SSH session to IP
address 127.0.0.1, with the username "xmsadmin" (password for xmsadmin
supplied by Dell EMC), followed by appropriate XMCLI user credentials (e.g.
"tech").
• To access the XMS using WebUI - Using a web browser, open a session at URL:
https://127.0.0.1/webui, with appropriate XMCLI user credentials (e.g. "tech").
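
For SSH clients other than PuTTY, the three tunnels defined in steps 7 to 9 correspond to local port forwards. The Python sketch below builds an equivalent argument list for the OpenSSH client using its -L option. It is an illustration only: whether the Storage Controller accepts forwarding from a given client is an assumption, binding local ports below 1024 usually requires elevated privileges, local port 22 may already be in use on the laptop, and the XMS address 192.0.2.10 is a placeholder. The function name build_ssh_forward_command is hypothetical.

```python
def build_ssh_forward_command(
    xms_ip: str,
    sc_ip: str = "169.254.254.1",
    user: str = "xinstall",
    ports: tuple[int, ...] = (80, 443, 22),
) -> list[str]:
    """Build an OpenSSH argument list forwarding each local port to the same XMS port."""
    argv = ["ssh"]
    for port in ports:
        # OpenSSH syntax: -L local_port:destination_host:destination_port
        argv += ["-L", f"{port}:{xms_ip}:{port}"]
    argv.append(f"{user}@{sc_ip}")
    return argv


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder, not a real XMS address.
    print(" ".join(build_ssh_forward_command("192.0.2.10")))
```

As with PuTTY, the forwarding session must remain open while the XMCLI or WebUI session to 127.0.0.1 is in use.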

Checking the XtremIO Cluster Health


Before and after replacing a defective component, check the cluster’s health by using the
XtremIO Health-Check Script (HCS).
Download the latest HCS, available on the Dell EMC XtremIO SolVe generator.

Note: You can access the Dell EMC SolVe Desktop at:
https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the command for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s6.0.0.py" arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to Dell EMC KB# 206076 (https://support.emc.com/kb/206076). If an unexpected error
is reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.


CHAPTER 2
Replacing the Servers and Components

This chapter includes the following topics:


▪ Replacing a Storage Controller ................................................................................ 20
▪ Replacing Storage Controller DIMMs ....................................................................... 27
▪ Replacing Storage Controller Power Supply Units .................................................... 31
▪ Replacing an SFP+ .................................................................................................. 36
▪ Replacing the XMS .................................................................................................. 45


Replacing a Storage Controller



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The Storage Controller replacement procedure should be performed, using the XtremIO
Technician Advisor utility, following a Service Request (SR) determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance with pausing Consistency Groups in RecoverPoint,
contact RecoverPoint Global Technical Support.
If the customer is unable to perform this operation, do not perform this FRU procedure
and contact XtremIO Global Technical Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Note: Before arriving on the site, make sure that you have the updated Storage Controller
rescue image for the cluster’s version. In addition, ensure that the latest version of
Technician Advisor utility is installed on your laptop.

Note: If the customer has a Disk Retention Agreement with Dell EMC, remove the hard
disks and SSDs from the replaced Storage Controller and give them to the customer. For
instructions, refer to “Removing the Old Storage Controller Disks” on page 137.

Tolerance
▪ Failure of a single Storage Controller may result in a performance degradation.
▪ Failure of both Storage Controllers in the same X-Brick results in:
• Loss of service in a multiple X-Brick cluster
• Data loss in a single X-Brick cluster
▪ Failure of both InfiniBand links and/or both SAS ports in the same Storage Controller
results in a Storage Controller failure.


Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective Storage Controller


Access the XMS via the Storage Controller Tech port to identify a defective Storage
Controller, as described in “Accessing the XMS via a Cluster Storage Controller”.

Note: Make sure to access the XMS via a Storage Controller that is healthy.

To identify the defective Storage Controller, using the CLI:


1. Log in to the XMCLI as tech.
2. List the clusters, using the following command:
show-clusters

xmcli (tech)> show-clusters
Cluster-Name  Index  State   Conn-State  Num-of-Vols  Num-of-Internal-Volumes  Vol-Size  ...
xbrick335     1      active  connected   18           12                       9.712T    ...

Note: It is recommended to keep the CLI window in a maximized mode. Minimizing the
window may cause the activation progress bar to be displayed on new lines instead of
the same line.

3. Verify that the Conn-State column shows connected.


4. List the Storage Controllers status, using the following command:
show-storage-controllers cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

xmcli (tech)> show-storage-controllers cluster-id="xbrick717"


Storage-Controller-Name Index Mgr-Addr IB-Addr-1 IB-Addr-2 Brick-Name Index Cluster-Name Index State Health-State Enabled-State Stop-Reason Conn-State Journal-State
X1-SC1 1 10.82.80.46 169.254.0.1 169.254.0.2 X1 1 xbrick306-309 1 healthy healthy enabled none connected healthy
X1-SC2 2 10.82.80.24 169.254.0.17 169.254.0.18 X1 1 xbrick306-309 1 disconnected healthy system_disabled lost_connectivity_with_node unknown failed

5. Identify which Storage Controller is defective.
For example: State = disconnected or Journal-State = failed


6. Use Table 1 to record the configuration data of the defective Storage Controller, and
refer to it when you configure the new Storage Controller.

Table 1 Storage Controller Configuration Data

Parameter                         Value Retrieval   Value
Management Interface IP Address   (a)
X-Brick Name                      (a)
Storage Controller Name           (a)
Storage Controller Index          (a)
Cluster Name                      (a)
cluster-id                        (a)
Network Subnet Mask               (b)

(a) Run the following command:
show-storage-controllers cluster-id="<cluster name>"
(b) The Network Subnet Mask may be retrieved from the peer Storage Controller of the
same X-Brick as the defective Storage Controller. The information may also be
retrieved from the customer.

Note: Make sure to close the tunnel between the Storage Controller and XMS when access
to XMS is no longer required, as described in “Accessing the XMS via a Cluster Storage
Controller” on page 13.
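
The State and Journal-State check in step 5 of the CLI procedure above can be sketched in Python for output that has been saved to a file. This is an illustration rather than a supported tool. It assumes the 15-column layout of the example output and flags only the two conditions named in step 5; the function name suspect_controllers is hypothetical.

```python
# Sample text copied from the show-storage-controllers example above.
SAMPLE_OUTPUT = """\
Storage-Controller-Name Index Mgr-Addr IB-Addr-1 IB-Addr-2 Brick-Name Index Cluster-Name Index State Health-State Enabled-State Stop-Reason Conn-State Journal-State
X1-SC1 1 10.82.80.46 169.254.0.1 169.254.0.2 X1 1 xbrick306-309 1 healthy healthy enabled none connected healthy
X1-SC2 2 10.82.80.24 169.254.0.17 169.254.0.18 X1 1 xbrick306-309 1 disconnected healthy system_disabled lost_connectivity_with_node unknown failed
"""

# Zero-based column positions, counted from the header row.
NAME_COL, STATE_COL, JOURNAL_COL, NUM_COLS = 0, 9, 14, 15


def suspect_controllers(output: str) -> list[str]:
    """Return names of Storage Controllers that are disconnected or have a failed journal."""
    suspects = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, prompt lines, and the header row.
        if len(fields) != NUM_COLS or fields[NAME_COL] == "Storage-Controller-Name":
            continue
        state = fields[STATE_COL].lower()
        journal_state = fields[JOURNAL_COL].lower()
        if state == "disconnected" or journal_state == "failed":
            suspects.append(fields[NAME_COL])
    return suspects


if __name__ == "__main__":
    print(suspect_controllers(SAMPLE_OUTPUT))
```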

To identify the defective Storage Controller, using the GUI:


▪ From the GUI, view the Inventory; the defective Storage Controller appears in orange.


Confirming the Open Network Ports for Storage Controller Replacement


Storage Controller replacement requires specific network ports to be open between the
XMS and Storage Controllers in the XtremIO cluster. For the list of network ports required
for Storage Controller replacement, refer to the XtremIO Site Preparation Guide.

Network ports are assigned for each Storage Controller, two at a time. For a (partial)
example, refer to Table 2 to determine the range of ports assigned for each Storage
Controller.

Table 2 Required Network Ports for Storage Controller Replacement (Partial Example)

Storage Controller   Network Ports
X1-SC1               11000-11001, 22000-22001, 23000-23001
X1-SC2               11002-11003, 22002-22003, 23002-23003
X2-SC1               11004-11005, 22004-22005, 23004-23005
X2-SC2               11006-11007, 22006-22007, 23006-23007
....                 ....

Note: The network port 11112 is only required if Storage Controllers are using an IPv6
management IP address.
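
The pattern in Table 2 (two consecutive ports per Storage Controller in each of the three ranges) can be expressed as a short calculation. The Python sketch below reproduces the four rows of the table from the Storage Controller Index. Extending the same pattern beyond X2-SC2 is an assumption; the XtremIO Site Preparation Guide remains the reference for the full list. The function name ports_for_controller is hypothetical.

```python
# Starting port of each range, as shown for X1-SC1 in Table 2.
BASE_PORTS = (11000, 22000, 23000)
PORTS_PER_CONTROLLER = 2


def ports_for_controller(sc_index: int) -> list[int]:
    """Return the ports for a Storage Controller Index (X1-SC1 = 1, X1-SC2 = 2, ...)."""
    if sc_index < 1:
        raise ValueError("Storage Controller Index starts at 1")
    offset = PORTS_PER_CONTROLLER * (sc_index - 1)
    ports = []
    for base in BASE_PORTS:
        first = base + offset
        ports.extend(range(first, first + PORTS_PER_CONTROLLER))
    return ports


if __name__ == "__main__":
    for index, name in enumerate(["X1-SC1", "X1-SC2", "X2-SC1", "X2-SC2"], start=1):
        print(name, ports_for_controller(index))
```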

It is necessary to confirm that each required network port from the XMS is open to its
respective Storage Controller.

To confirm the required Storage Controller network ports:


1. Access the XMS as tech.
2. Issue the following command for each Storage Controller and each port in the Storage
Controller's corresponding set of network ports:
test-xms-tcp-connectivity port=<port number> server="<IP of Storage Controller>"
For example: To confirm that network port 11000 is open from the XMS to X1-SC1, run
the following command:
test-xms-tcp-connectivity port=11000 server="<IP of X1-SC1>"
Connectivity checked successfully


If a required network port is detected as blocked, discontinue the Storage Controller
replacement procedure. Troubleshoot with the customer to determine why the required
network port is blocked.

Note: For checking the required ports to a defective Storage Controller, use the existing
Storage Controller to verify whether the port is open. However, if the defective Storage
Controller is not responsive, work with the customer to check the required ports for the
peer Storage Controller instead.
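
Because the test-xms-tcp-connectivity command is issued once per port, it can be convenient to prepare the full list of commands for a Storage Controller in advance. The Python sketch below only formats the command strings, following the syntax shown in step 2 of the procedure above; it does not run them. The function name connectivity_commands is hypothetical, and the IP address used in the usage example is the X1-SC1 management address from the earlier sample output.

```python
def connectivity_commands(sc_ip: str, ports: list[int]) -> list[str]:
    """Format one test-xms-tcp-connectivity command per port for a Storage Controller."""
    return [
        f'test-xms-tcp-connectivity port={port} server="{sc_ip}"'
        for port in ports
    ]


if __name__ == "__main__":
    # Ports for X1-SC1, taken from Table 2.
    x1_sc1_ports = [11000, 11001, 22000, 22001, 23000, 23001]
    for command in connectivity_commands("10.82.80.46", x1_sc1_ports):
        print(command)
```

Each command is expected to return "Connectivity checked successfully" when the port is open, as shown in the example above.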

Checking the XtremIO Cluster Health


Before replacing the defective Storage Controller, check the cluster’s health by using the
XtremIO Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster
Health” on page 17.

Note: For Storage Controller replacement procedures, the run-script command
should be in the following format (where "--exclude Xn-SCn" refers to the Storage
Controller to be replaced):
xmcli (tech)> run-script script="system-health-v[x.x]-s[x.x.x].py" arguments="--exclude Xn-SCn --check-type fru_sc --cluster-id 1"
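
The run-script format in the Note above can be assembled from its parts to reduce the chance of a typing error in the quoted arguments. The Python sketch below is illustrative only. The script file name is passed in unchanged because the source gives it only as a placeholder, and the check that the Storage Controller name has the form Xn-SC1 or Xn-SC2 is an assumption based on the names used in this document. The function name build_hcs_command is hypothetical.

```python
import re

# Storage Controller names in this document follow the form X<brick>-SC1 or X<brick>-SC2.
SC_NAME_PATTERN = re.compile(r"X\d+-SC[12]")


def build_hcs_command(script_name: str, excluded_sc: str, cluster_id: int = 1) -> str:
    """Format the run-script command that excludes the Storage Controller being replaced."""
    if not SC_NAME_PATTERN.fullmatch(excluded_sc):
        raise ValueError(f"unexpected Storage Controller name: {excluded_sc!r}")
    arguments = f"--exclude {excluded_sc} --check-type fru_sc --cluster-id {cluster_id}"
    return f'run-script script="{script_name}" arguments="{arguments}"'


if __name__ == "__main__":
    # The script name below is the placeholder used in this document.
    print(build_hcs_command("system-health-v[x.x]-s[x.x.x].py", "X1-SC2"))
```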

Closing the Tunnel Between the Storage Controller and the XMS (if Previously Opened)
After checking the cluster’s health using the XtremIO Health-Check Script (HCS), make
sure to close the tunnel between the Storage Controller and XMS (if one was previously
opened), as described in “Accessing the XMS via a Cluster Storage Controller” on page 13.

Replacing the Defective Storage Controller Using the Technician Advisor Utility

The Storage Controller replacement procedure should be performed using the XtremIO
Technician Advisor utility following a Service Request (SR), determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support.


Installing a Compatible Storage Controller


Specific XtremIO clusters are compatible with specific Storage Controllers. Following is the
complete list of compatible Storage Controllers:
▪ On clusters with the following PSNT P/N:
• PSNT P/N 900-586-002 (10TB X-Brick type)
You can only install a Storage Controller with the following P/Ns:
• 100-586-007-xx
• 100-586-017-xx
▪ On clusters with one of the following PSNT P/Ns:
• PSNT P/N 900-586-003 (20TB X-Brick type - Encryption Capable)
• PSNT P/N 900-586-004 (10TB X-Brick type - Encryption Capable)
• PSNT P/N 900-586-005 (5TB X-Brick type - Encryption Capable)
You can only install a Storage Controller with the following P/Ns:
• 100-586-017-xx
• 100-586-018-xx
• 100-586-077-xx
▪ On clusters with the following PSNT P/N:
• PSNT P/N 900-586-006 (40TB X-Brick type - Encryption Capable)
You can only install a Storage Controller with the following P/Ns:
• 100-586-025-xx
• 100-586-078-xx


Prior to commencing the replacement procedure, check the XtremIO XIOS version installed
on the cluster, and the replacement Storage Controller’s P/N. If the cluster XIOS version is
4.0.25-22 or earlier, and the replacement Storage Controller is P/N 100-586-077 or P/N
100-586-078 (equipped with a Cobra-F type HDD), do not commence the replacement
procedure! Instead, contact XtremIO Global Technical Support to obtain a compatible
replacement Storage Controller.
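
The compatibility list and the XIOS version restriction above can be combined into a single pre-check. The Python sketch below encodes the part numbers exactly as listed in this section. It is an illustration only and assumes that the XIOS version is supplied in the "4.0.25-22" form (version followed by a build number); the function names are hypothetical.

```python
import re

# PSNT P/N -> compatible Storage Controller P/Ns (without the -xx suffix), from the list above.
_ENCRYPTION_CAPABLE = {"100-586-017", "100-586-018", "100-586-077"}
COMPATIBLE_SC_PNS = {
    "900-586-002": {"100-586-007", "100-586-017"},
    "900-586-003": _ENCRYPTION_CAPABLE,
    "900-586-004": _ENCRYPTION_CAPABLE,
    "900-586-005": _ENCRYPTION_CAPABLE,
    "900-586-006": {"100-586-025", "100-586-078"},
}
# Storage Controllers equipped with a Cobra-F type HDD.
COBRA_F_PNS = {"100-586-077", "100-586-078"}
# XIOS 4.0.25-22 or earlier must not receive a Cobra-F Storage Controller.
LAST_BLOCKED_XIOS = (4, 0, 25, 22)


def parse_xios(version: str) -> tuple[int, ...]:
    """Convert a version such as '4.0.25-22' into a comparable tuple."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def can_install(psnt_pn: str, sc_pn: str, xios_version: str) -> bool:
    """Return True when the Storage Controller P/N is allowed for the cluster."""
    base_pn = "-".join(sc_pn.split("-")[:3])  # drop the -xx revision suffix
    if base_pn not in COMPATIBLE_SC_PNS.get(psnt_pn, set()):
        return False
    if base_pn in COBRA_F_PNS and parse_xios(xios_version) <= LAST_BLOCKED_XIOS:
        return False
    return True
```

A False result corresponds to the cases in which this section instructs you not to commence the replacement and to contact XtremIO Global Technical Support.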

Note: For information on essential preparations required for using the XtremIO Technician
Advisor utility at a customer's site prior to your arrival, refer to Appendix G.

Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under XtremIO
> XtremIO X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts and Utilities > XtremIO Technician Advisor
> Install XtremIO Technician Advisor.

Note: If XtremIO Global Technical Support instructs you to follow the manual configuration
procedures, refer to Appendix D.


Replacing Storage Controller DIMMs



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The Storage Controller DIMM replacement procedure should be performed, using the
XtremIO Technician Advisor utility version 2.6.0 (or later 2.x versions), following a Service
Request (SR) determined by XtremIO Global Technical Support. If you have any questions
or encounter problems, contact XtremIO Global Technical Support.
Approval from XtremIO Global Tech Support is required prior to any DIMM Replacement.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance with pausing Consistency Groups in RecoverPoint,
contact RecoverPoint Global Technical Support.
If the customer is unable to perform this operation, do not perform this FRU procedure
and contact XtremIO Global Technical Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Tolerance
▪ Significant DIMM issues may cause a Storage Controller to fail.
▪ Failure of a single Storage Controller may result in a performance degradation.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Determining if DIMM Replacement is Required

Note: A DIMM replacement procedure is only supported when a single Storage Controller
channel requires the replacement, and can only be performed once, per Storage
Controller.

To determine if DIMM replacement is required, using the CLI:


1. Log in to the XMS CLI as tech.


2. List the currently outstanding alerts for the cluster, using the following command:
show-alerts
3. Perform a DIMM replacement procedure when the following alert is raised on the
affected Storage Controller:
• node_dimm_level_5_major - (Alert Code 0403305) 2500 DIMM Correctable Errors
When the node_dimm_level_5_major alert is raised, the affected Storage Controller is
disabled, as well as its journaling (alerts node_system_disabled and journal_failed
are also raised on the affected Storage Controller).
Example: node_dimm_level_5_major alert raised

xmcli (tech)> show-alerts
Index Description Severity Raise-Time Entity Name Index Cluster-Name Index Alert-Type State Alert-Code
20 Memory card (DIMM) health fault. major Mon May 1 00:32:58 2017 StorageController X4-SC2 8 xio8761 1 node_dimm_level_5_major clear_unacknowledged 0403305
39 The cluster has detected a journal fault in this Storage Controller. major Mon May 1 00:24:29 2017 StorageController X4-SC2 8 xio8761 1 journal_failed outstanding 0404702
32 The Storage Controller has been deactivated by the cluster. Reason for deactivation is too_many_dimm_correctable_error major Mon May 1 00:24:18 2017 StorageController X4-SC2 8 xio8761 1 node_system_disabled outstanding 0402603
xmcli (tech)>

4. If the above alert (0403305) does not appear in the show-alerts output, DIMM
replacement may still be appropriate, as determined by XtremIO Global Tech Support.
For further details, refer to Dell EMC KB# 459351
(https://support.emc.com/kb/459351).

Note: DIMM replacement can also be considered if the customer or field personnel
suspect that a DIMM replacement procedure is required on a Storage Controller. In
this case, the replacement should be performed following a Service Request (SR)
determined by XtremIO Global Technical Support.

5. Confirm that a DIMM FRU replacement has not previously been performed on this Storage
Controller. For instructions on running a script that displays whether the DIMM
replacement for this Storage Controller has already occurred, and the historical
correctable error (CE) count, refer to Dell EMC KB# 459351 (https://support.emc.com/kb/459351).
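
The alert check in step 3 of the procedure above can be sketched in Python for show-alerts output that has been saved to a file. The sketch is illustrative only. It searches the whole text for the alert type or alert code, so it does not tie an alert to a particular Storage Controller, and it does not replace the decision of XtremIO Global Technical Support. The function name dimm_alert_summary is hypothetical.

```python
DIMM_ALERT_TYPE = "node_dimm_level_5_major"
DIMM_ALERT_CODE = "0403305"
# Alerts that the document says are raised together with the DIMM alert.
COMPANION_ALERT_TYPES = ("node_system_disabled", "journal_failed")

# Shortened records based on the show-alerts example above.
SAMPLE_OUTPUT = """\
20 Memory card (DIMM) health fault. major StorageController X4-SC2 8 xio8761
1 node_dimm_level_5_major clear_unacknowledged 0403305
39 The cluster has detected a journal fault in this Storage Controller. major StorageController X4-SC2 8 xio8761
1 journal_failed outstanding 0404702
32 The Storage Controller has been deactivated by the cluster. major StorageController X4-SC2 8 xio8761
1 node_system_disabled outstanding 0402603
"""


def dimm_alert_summary(show_alerts_output: str) -> dict:
    """Report whether the DIMM alert and its companion alerts appear in the text."""
    # Splitting on whitespace makes the check independent of line wrapping.
    tokens = set(show_alerts_output.split())
    return {
        "dimm_alert_raised": DIMM_ALERT_TYPE in tokens or DIMM_ALERT_CODE in tokens,
        "companion_alerts": [a for a in COMPANION_ALERT_TYPES if a in tokens],
    }


if __name__ == "__main__":
    print(dimm_alert_summary(SAMPLE_OUTPUT))
```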
In exceptional cases, a Storage Controller replacement is required instead of replacing the
DIMM. The Technician Advisor utility should be used to determine this.

To determine if a Storage Controller replacement is applicable using Technician Advisor:


1. Log in to the cluster remotely via an ESRS connection, using Technician Advisor.

Note: Refer to the Technician Advisor User Guide for directions on how to log into the
cluster remotely, via Technician Advisor.


2. Commence the Technician Advisor DIMM replacement wizard’s validation steps.

Note: If the Technician Advisor utility fails to validate performing a DIMM replacement,
it is necessary to replace the Storage Controller to resolve the issue, as described in
“Replacing a Storage Controller” on page 20.

3. Upon the DIMM replacement wizard’s completion of the “Cluster Health” phase (prior
to the “Deactivate” phase), cancel the wizard by clicking Exit, located in the wizard
window’s lower-left corner.
4. A DIMM Replacement should be performed only if the Technician Advisor DIMM FRU
wizard validations passed successfully.

Note: The identity of the channel with DIMM errors appears in the “Details” pop-up of the
“Querying IPMI log”, and the “Check Storage Controller status” of the “Query DIMMs”
phase, even when the green check-mark is displayed (adjacent to the Details button).

Checking the XtremIO Cluster Health


Before replacing the defective DIMM, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Note: For DIMM replacement procedures, the run-script command should be in the
following format (where "--exclude Xn-SCn" refers to the Storage Controller containing
the DIMM to be replaced):
xmcli (tech)> run-script script="system-health-v[x.x]-s[x.x.x].py" arguments="--exclude Xn-SCn --check-type fru_sc --cluster-id 1"

Closing the Tunnel Between the Storage Controller and the XMS (if Previously Opened)
After checking the cluster’s health using the XtremIO Health-Check Script (HCS), make
sure to close the tunnel between the Storage Controller and XMS (if one was previously
opened), as described in “Accessing the XMS via a Cluster Storage Controller” on page 13.


Replacing the Defective DIMM Using the Technician Advisor Utility



The DIMM replacement procedure should be performed using the XtremIO Technician
Advisor utility version 2.6.0 (or later 2.x versions), following a Service Request (SR),
determined by XtremIO Global Technical Support. If you have any questions or encounter
problems, contact XtremIO Global Technical Support.

Note: For information on essential preparations required for using the XtremIO Technician
Advisor utility at a customer's site prior to your arrival, refer to Appendix G.

Note: For details on the XtremIO Technician Advisor utility version 2.6.0 (or later 2.x
versions), refer to the XtremIO Technician Advisor Utility User Guide, which is posted in the
XtremIO SolVe Generator, under XtremIO > XtremIO X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts
and Utilities > XtremIO Technician Advisor > Install XtremIO Technician Advisor.

Note: If XtremIO Technician Advisor utility cannot be used to replace the defective DIMM,
replace the defective DIMM’s Storage Controller. For instructions, refer to “Replacing a
Storage Controller” on page 20.


Replacing Storage Controller Power Supply Units



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).

Tolerance
▪ Failure of a single Storage Controller power supply unit does not affect the Storage
Controller operation.
▪ Failure of both Storage Controller power supply units in the same Storage Controller
results in Storage Controller failure.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective Storage Controller Power Supply Unit

To identify the defective Storage Controller power supply unit, using the CLI:
1. Log in to the XMCLI as tech.
2. List the Storage Controller power supply unit’s status, using the following command:
show-storage-controllers-psus cluster-id="<cluster name>"

xmcli (tech)> show-storage-controllers-psus cluster-id="xbrickdrm919"


Name Index Serial-Number Location-Index Power-Feed Lifecycle-State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index PSU-HW-Label
X1-SC1-PSU-L 1 E98791D1503016261 1 PWR-B failed on left 05 None X1-SC1 1 X1 1 xbrickdrm919 1 PSU1
X1-SC1-PSU-R 2 E98791D1503016259 2 PWR-A healthy on right 05 None X1-SC1 1 X1 1 xbrickdrm919 1 PSU2
X1-SC2-PSU-L 3 E98791D1513057133 1 PWR-B healthy on left 05 None X1-SC2 2 X1 1 xbrickdrm919 1 PSU1
X1-SC2-PSU-R 4 E98791D1513057136 2 PWR-A healthy on right 05 None X1-SC2 2 X1 1 xbrickdrm919 1 PSU2

3. Note the Index of Storage Controller power supply units with a non-healthy
Lifecycle-State.


Table 3 describes the possible non-healthy Storage Controller power supply unit states:

Table 3 Non-Healthy Storage Controller Power Supply Unit States

State          Description
failed         The system diagnoses a failed Storage Controller power supply unit.
disconnected   The system diagnoses that no Storage Controller power supply unit is
               physically present.
               Note: The State column shows disconnected until the “replace” command
               is invoked following the connection of a new Storage Controller power
               supply unit with a different serial number (than that of the previous
               component), in the same location.

4. Note the Index and Serial Number of Storage Controller power supply units
showing a non-healthy state.
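
Steps 3 and 4 above (noting the Index and Serial Number of every power supply unit with a non-healthy Lifecycle-State) can be sketched in Python for output that has been saved to a file. The sketch is an illustration only and assumes the column order of the example output, in which Lifecycle-State is the sixth column; the function name non_healthy_psus is hypothetical.

```python
# Sample rows copied from the show-storage-controllers-psus example above.
SAMPLE_OUTPUT = """\
Name Index Serial-Number Location-Index Power-Feed Lifecycle-State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index PSU-HW-Label
X1-SC1-PSU-L 1 E98791D1503016261 1 PWR-B failed on left 05 None X1-SC1 1 X1 1 xbrickdrm919 1 PSU1
X1-SC1-PSU-R 2 E98791D1503016259 2 PWR-A healthy on right 05 None X1-SC1 1 X1 1 xbrickdrm919 1 PSU2
"""

# Zero-based column positions, counted from the header row.
NAME_COL, INDEX_COL, SERIAL_COL, LIFECYCLE_COL = 0, 1, 2, 5
MIN_COLS = 16  # the PSU-HW-Label column is absent in some example outputs


def non_healthy_psus(output: str) -> list[tuple[int, str, str]]:
    """Return (Index, Name, Serial-Number) for each PSU whose Lifecycle-State is not healthy."""
    results = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, prompt lines, and the header row.
        if len(fields) < MIN_COLS or fields[NAME_COL] == "Name":
            continue
        if fields[LIFECYCLE_COL] != "healthy":
            results.append((int(fields[INDEX_COL]), fields[NAME_COL], fields[SERIAL_COL]))
    return results


if __name__ == "__main__":
    print(non_healthy_psus(SAMPLE_OUTPUT))
```

The Index returned for a defective unit is the value later supplied as sc-psu-id to the replace-storage-controller-psu command.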

To identify the defective Storage Controller power supply unit, using the GUI:
▪ From the GUI, view the Inventory; the defective Storage Controller power supply unit
appears in orange.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Disabling All Notifiers

To disable all Notifiers:


1. Log in to the XMCLI as tech.
2. Disable all Notifiers, using the following command:
disable-notifiers cluster-id="<cluster name>"

xmcli (tech)> disable-notifiers cluster-id="xbrick711-714"


Event notifiers were disabled

Closing the Tunnel Between the Storage Controller and the XMS (if Previously Opened)
After checking the cluster’s health using the XtremIO Health-Check Script (HCS), make
sure to close the tunnel between the Storage Controller and XMS (if one was previously
opened), as described in “Accessing the XMS via a Cluster Storage Controller” on page 13.


Replacing the Defective Storage Controller Power Supply Unit


A defective Storage Controller power supply unit is indicated with an amber LED.

To remove the defective Storage Controller power supply unit:


1. Tilt the cable tray of the cable management bracket downwards, by simultaneously
pulling the latches on the left and right, and then pushing the tray downwards.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

2. Disconnect the power cable from the defective Storage Controller power supply unit.
To revoke cable retention, release the power cord latch. The cables should remain
fastened by the cable strap in the cable management bracket.
3. To remove the Storage Controller power supply unit, push the green lever and then pull
on the handle.

Note: If the defective Storage Controller power supply unit should be sent to Dell EMC for
Failure Analysis (FA), refer to Appendix C for the procedure details.


To install the new Storage Controller power supply unit:


1. Insert the new Storage Controller power supply unit.

2. Connect the power cable to the new Storage Controller power supply unit. To resume
cable retention, fasten the power cord latch.
3. Lift the cable tray of the cable management bracket, while pulling the latches (on the
left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

Configuring the Replaced Storage Controller Power Supply Unit


Upon replacing the Storage Controller power supply unit, the following alerts are issued:

xmcli (tech)> show-alerts


Index Description Severity Raise-Time Entity Name Index Cluster-Name Index Alert-Type State Alert-Code
25 Storage Controller's PSU is disconnected from the cluster. major Thu Nov 17 17:03:40 2016 StorageControllerPSU X3-SC2-PSU-R 12 xbrick711-716 1 nodepsu_fru_disconnected outstanding 0600304
30 The cluster has detected that a new Storage Controller PSU has been added. minor Thu Nov 17 17:06:13 2016 StorageController X3-SC2 6 xbrick711-716 1 alert_def_node_discover_scpsu_true outstanding 0405402

To configure the replaced Storage Controller power supply unit:


1. Log in to the XMS CLI as tech.
2. Confirm that the new Storage Controller power supply unit is available, using the
following command:
show-storage-controllers-psus cluster-id="<cluster name>"


3. Run the following command:
replace-storage-controller-psu sc-psu-id=<ID> cluster-id="<cluster name>"
where <ID> is the Index of the defective Storage Controller power supply unit.

xmcli (tech)> replace-storage-controller-psu sc-psu-id=2 cluster-id="xbrick717"


psu X1-SC1-psu2 [2] replacement initiated

To verify that the new Storage Controller power supply unit is healthy, using the CLI:
1. Log in to the XMS CLI as tech.
2. Wait several seconds, then run the following command:
show-storage-controllers-psus cluster-id="<cluster name>"

xmcli (tech)> show-storage-controllers-psus cluster-id=xbrick779-780


Name Index Serial-Number Location-Index Power-Feed Lifecycle-State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index
X1-SC1-PSU-L 1 E98791D1609026024 1 PWR-B healthy on left 05 None X1-SC1 1 X1 1 xbrick779-780 1
X1-SC1-PSU-R 2 E98791D1609025813 2 PWR-A healthy on right 05 None X1-SC1 1 X1 1 xbrick779-780 1
X1-SC2-PSU-L 3 E98791D1609025906 1 PWR-B healthy on left 05 None X1-SC2 2 X1 1 xbrick779-780 1
X1-SC2-PSU-R 4 E98791D1609025958 2 PWR-A healthy on right 05 None X1-SC2 2 X1 1 xbrick779-780 1
X2-SC1-PSU-L 5 E98791D1519079002 1 PWR-B healthy on left 05 None X2-SC1 3 X2 2 xbrick779-780 1
X2-SC1-PSU-R 6 E98791D1609025924 2 PWR-A healthy on right 05 None X2-SC1 3 X2 2 xbrick779-780 1
X2-SC2-PSU-L 7 E98791D1609026041 1 PWR-B healthy on left 05 None X2-SC2 4 X2 2 xbrick779-780 1
X2-SC2-PSU-R 8 E98791D1609025937 2 PWR-A healthy on right 05 None X2-SC2 4 X2 2 xbrick779-780 1

3. If the Lifecycle-State is not healthy, inspect the Storage Controller power supply unit.
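
The CLI check above can also be scripted. The following is a minimal Python sketch, assuming that the output of show-storage-controllers-psus has been captured as text and that every data row carries the 16 whitespace-separated fields shown in the example. The function name unhealthy_psus is hypothetical and is not part of XMCLI.

```python
def unhealthy_psus(output: str) -> list[tuple[str, str]]:
    """Return (Name, Lifecycle-State) for every PSU row that is not 'healthy'."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the prompt line, and the header row.
        # Assumption: a data row has exactly 16 fields, as in the example output.
        if len(fields) != 16 or fields[0] == "Name":
            continue
        name, state = fields[0], fields[5]  # field 5 is the Lifecycle-State column
        if state != "healthy":
            flagged.append((name, state))
    return flagged
```

An empty list means that every listed power supply unit reports a healthy Lifecycle-State.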

To verify that the new Storage Controller power supply is healthy, using the GUI:
1. Hover the mouse pointer over the new Storage Controller power supply unit; a ToolTip
appears, showing the power supply status.
2. Verify that the State is Healthy.

Performing the Post Replacement Procedures


After configuring the replaced component, perform the following post replacement
procedures:
• Checking for and clearing any active repeating alerts
• Generating and uploading a log bundle
• Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
• Restoring all Notifiers
• Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.


Replacing an SFP+

Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The SFP+ replacement procedure should be performed following a Service Request (SR)
determined by XtremIO Global Technical Support.

Tolerance
• Failure of an SFP+ may result in performance degradation.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Procedure Prerequisite
Make sure to perform the following instruction prior to replacing an SFP+.

Note: XtremIO Global Tech Support should confirm this procedure prerequisite with the
respective Dell EMC network connectivity teams and with the customer.

For suspected SFP+ errors and/or iSCSI/Fibre Channel “connection to XtremIO cluster”
errors, arrange for the Connectivity team to confer with the customer in order to confirm
that iSCSI and Fibre Channel environment(s) to the XtremIO Storage Controller iSCSI or
Fibre Channel ports are validated. This includes confirming the network or Fibre Channel
switches, switch ports, network patch panels, cables and cable reseating (at both ends).


An SFP+ replacement procedure must only be performed after all other network
components and configurations have been verified. If not, replacing an SFP+ may not
resolve the issue.


Identifying the Defective SFP+

To identify the defective SFP+, using the CLI:


1. Log in to the XMS CLI as tech.
2. Run the following command:
show-targets cluster-id="<cluster name>"

xmcli (tech)> show-targets


Name Index Cluster-Name Index Port-Type Port-Address Mac-Addr Port-Speed Port-State .. Storage-Controller-Name Index .. .. Relative-Id Target-Port-HW-Label
X1-SC1-target1 1 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a001 00:90:fa:c4:a1:cb unknown down .. X1-SC1 1 .. .. 5 Port1
X1-SC1-target2 2 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a000 00:90:fa:c4:a1:c9 unknown down .. X1-SC1 1 .. .. 6 Port2
X1-SC1-target3 3 xbrick736 1 fc 51:4f:0c:50:df:07:a0:01 unknown down .. X1-SC1 1 .. .. 1 Port3
X1-SC1-target4 4 xbrick736 1 fc 51:4f:0c:50:df:07:a0:00 8GFC up .. X1-SC1 1 .. .. 2 Port4
X1-SC2-target1 5 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a005 00:90:fa:c4:9b:69 1Gb down .. X1-SC2 2 .. .. 15 Port1
X1-SC2-target2 6 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a004 00:90:fa:c4:9b:67 10Gb down .. X1-SC2 2 .. .. 16 Port2
X1-SC2-target3 7 xbrick736 1 fc 51:4f:0c:50:df:07:a0:05 4GFC up .. X1-SC2 2 .. .. 11 Port3
X1-SC2-target4 8 xbrick736 1 fc 51:4f:0c:50:df:07:a0:04 16GFC up .. X1-SC2 2 .. .. 12 Port4

3. Note the Name, Index, Cluster-Name, Port-Type, Target-Port-HW-Label, and
Storage-Controller-Name of each defective FC or iSCSI target, that is, each target
that matches one of the following scenarios:
• Port-Speed as unknown and Port-State as down
• For FC target - Port-Speed lower than 8GFC* and Port-State as up
• For iSCSI target - Port-Speed lower than 10Gb** and Port-State as up

Note: Identify the Storage-Controller-Name for each target by either the Name
value, or by running the following command:
show-targets prop-list=["Storage-Controller-Name"]

Note: In the example provided, following this step, a subset of SFP+s on the cluster is
detected as potentially defective. However, other SFP+s on the cluster can also be
defective. Complete the remaining steps in this procedure to determine thoroughly
which of the cluster’s SFP+s are defective.

* Assuming the FC network supports 8GFC and was tested as noted in the prerequisites, prior to
starting this procedure.
** Assuming the iSCSI network supports 10Gb and was tested as noted in the prerequisites, prior to
starting this procedure.


4. Run the following command to identify the defective FC SFP+s:
show-targets-fc-error-counters cluster-id="<cluster name>"

xmcli (tech)> show-targets-fc-error-counters

Name Index Cluster-Name Index Dumped-Frames Sync-Loss Signal-Loss Invalid-Crc Link-Failure Prim-Seq-Err
X1-SC1-fc1 1 xbrick736 2 0 0 0 0 1 0
X1-SC1-fc2 2 xbrick736 2 0 468 34 78 16 0
X1-SC2-fc1 5 xbrick736 2 0 1 0 0 0 0
X1-SC2-fc2 6 xbrick736 2 0 0 0 0 2 0

Note: If necessary, rerun the show-targets-fc-error-counters command (once or twice,
as needed) to confirm that the defective FC targets’ error counters actually increase.

5. From the show-targets-fc-error-counters command output, note the Index values of
FC targets for which the Sync-Loss, Signal-Loss, Invalid-Crc, and Link-Failure
counters are far greater, or have increased, compared to the other (non-defective)
FC targets.
The SFP+s connected to FC targets noted in either step 3 or step 5 are potentially
defective, and should be replaced.
6. Run the following command to identify defective iSCSI SFP+s:
show-targets-iscsi-counters cluster-id="<cluster name>"

xmcli (tech)> show-targets-iscsi-counters


Name Index Cluster-Name Index Port-Address Num-PKTS-Rx Total-KB-Rx Num-PKTS-Tx Total-KB-Tx Num-Crc-Err Num-NO-Buff-Err Num-Tx-Err
X1-SC1-iscsi1 3 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e000 4361442 256388 2 0 516 121 56
X1-SC1-iscsi2 4 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e001 4410131 262370 2 0 501 112 28
X1-SC2-iscsi1 7 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e004 12111048 710477 52 4 0 1 0
X1-SC2-iscsi2 8 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e005 12224042 722502 6107 252 1 0 1

Note: If necessary, rerun the show-targets-iscsi-counters command (once or twice,
as needed) to confirm that the defective iSCSI targets’ error counters actually increase.

7. From the show-targets-iscsi-counters command output, note the Index values of
iSCSI targets for which the Num-Crc-Err, Num-NO-Buff-Err, and Num-Tx-Err counters
are far greater, or have increased, compared to the other (non-defective) iSCSI targets.
The SFP+s connected to iSCSI targets noted in either step 3 or step 7 are potentially
defective and should be replaced.
8. For each defective SFP+ to be replaced, note the following details:
• Name
• Index
• Cluster-Name
• Port-Type (FC or iSCSI)
• Target-Port-HW-Label
• Storage-Controller-Name
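
Steps 4 through 7 rely on comparing error counters between two runs of the same command. The following Python sketch illustrates that comparison, assuming that the command output has been captured as text twice and that each data row has the same number of whitespace-separated fields as the header row, as in the examples above. The function names parse_counters and increased are hypothetical helpers and are not part of XMCLI.

```python
FC_WATCHED = ("Sync-Loss", "Signal-Loss", "Invalid-Crc", "Link-Failure")
ISCSI_WATCHED = ("Num-Crc-Err", "Num-NO-Buff-Err", "Num-Tx-Err")


def parse_counters(output: str) -> dict[str, dict[str, int]]:
    """Map each target Name to its numeric counter columns, keyed by header name."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header = next((row for row in rows if row[0] == "Name"), None)
    if header is None:
        return {}
    counters = {}
    for row in rows:
        # Skip the header itself and any line (such as the prompt) that is not a data row.
        if row is header or len(row) != len(header):
            continue
        # Fields 0-3 are Name, Index, Cluster-Name, Index; non-numeric fields
        # (for example the iSCSI Port-Address) are ignored.
        counters[row[0]] = {
            col: int(val) for col, val in zip(header[4:], row[4:]) if val.isdigit()
        }
    return counters


def increased(before, after, watched=FC_WATCHED):
    """Return the names of targets for which any watched counter grew between samples."""
    names = []
    for name, now in after.items():
        prev = before.get(name)
        if prev is None:
            continue
        if any(now.get(col, 0) > prev.get(col, 0) for col in watched):
            names.append(name)
    return names
```

For iSCSI targets, pass watched=ISCSI_WATCHED. A target returned by increased is only a candidate; the decision to replace its SFP+ still follows the steps above.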


Checking the Defective SFP+ Using an SFP+ Loopback Tool


Use an SFP+ loopback tool to further check the state of each SFP+ identified as
defective.

Note: If an SFP+ loopback tool is not available, skip this section and proceed with the rest
of the SFP+ replacement procedure.

To check a defective SFP+ using an SFP+ loopback tool:


1. Disconnect the original SFP+ cable and connect the SFP+ loopback tool to the SFP+.
2. Perform the steps listed in “Identifying the Defective SFP+” on page 37 to check
whether the SFP+ still shows as defective when disconnected from the customer network.
3. If the SFP+ still shows as defective, reconnect the original SFP+ cable, and proceed to
“Replacing a Defective SFP+” on page 40 to replace the defective SFP+.
4. If the SFP+ shows as healthy while connected to the SFP+ loopback tool, reconnect
the original SFP+ cable, and escalate the case to XtremIO Global Tech Support, as the
issue appears to be related to network connectivity. For details on the next steps in
this case, refer to “Procedure Prerequisite” on page 36.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Disabling All Notifiers

To disable all Notifiers:


1. Log in to the XMCLI as tech.
2. Disable all Notifiers, using the following command:
disable-notifiers cluster-id="<cluster name>"

xmcli (tech)> disable-notifiers cluster-id="xbrick711-714"


Event notifiers were disabled


Replacing a Defective SFP+

To remove a defective SFP+:


1. Log in to the XMS CLI as tech.
2. To physically locate the Storage Controller with the defective SFP+ (using LEDs), enter
the control-led CLI command using the defective SFP+’s noted Cluster-Name and
Storage-Controller-Name.
For example, you can use the following command:
control-led cluster-id="Cluster_One"
entity="StorageController" led-mode="on"
object-id-list=["X1-SC1"]

Note: For further details on using LEDs to identify components, refer to Appendix B.

3. Using the noted details of the defective SFP+ (Name, Index, Port-Type, and
Target-Port-HW-Label), physically locate the SFP+ on the Storage Controller located
following step 2 of “Identifying the Defective SFP+”. For details, refer to the
Connecting the Cluster to Host section of the XtremIO Hardware Installation and
Upgrade Guide.
4. From the rear of the Storage Controller, unplug the (iSCSI or Fibre Channel) cable
connected to a defective SFP+.


5. Raise the SFP+ bail.

Note: Use an orderable SFP+ extraction tool to raise the SFP+ bail. If an SFP+ extraction
tool is not available, carefully use a flat-headed screwdriver to lift the SFP+ bail.

6. Grasp the bail and slide the SFP+ out from the Storage Controller.

Note: The defective SFP+ should be sent to Dell EMC for Failure Analysis (FA) if possible.
Refer to Appendix C.
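
When several SFP+s are replaced in one visit, the control-led command from step 2 must be entered once per Storage Controller with the noted Cluster-Name and Storage-Controller-Name. The following Python sketch only formats that command line from the noted values, following the example in step 2. The function name control_led_command is hypothetical, and the only led-mode value taken from this document is "on".

```python
def control_led_command(cluster_name: str, sc_name: str, led_mode: str = "on") -> str:
    """Format the control-led command for a single Storage Controller.

    The parameter layout copies the example in step 2; values other than
    led_mode="on" are an assumption and are not confirmed by this document.
    """
    return (
        f'control-led cluster-id="{cluster_name}" '
        f'entity="StorageController" led-mode="{led_mode}" '
        f'object-id-list=["{sc_name}"]'
    )
```

The returned string is intended to be pasted at the xmcli (tech)> prompt; the sketch does not connect to the XMS.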


To install the new SFP+:


1. Verify the S/N of the new SFP+.

Note: For details on the required replacement SFP+ with XtremIO, refer to the XtremIO
Part Number List on XtremIO SolVe (Solve Desktop > XtremIO Generator > XtremIO X1
(XIOS 2.x, 3.x, 4.x) > FRU Replacement Procedures > XtremIO FRU Part Number List).

2. Make sure that the mating connector of the new SFP+ is free of dirt and/or obstacles.
3. Align the new SFP+ with the guides in the slot, and insert the SFP+ by sliding it into the
slot until slight resistance is felt.


4. Reconnect the (iSCSI or Fibre Channel) cable that was disconnected.


Wait for 15 minutes before verifying that the replacement was successful.

5. Run the following command to verify the SFP+ replacement was successful:
show-targets cluster-id="<cluster name>"
6. In the show-targets output, locate the information for the replaced SFP+(s), using
the Name and Index of the replaced defective SFP+.
7. Verify a successful FC SFP+ replacement, as follows:
a. In the show-targets output, verify that the Port-Speed is 8GFC and that the
Port-State is up, and then run the following command:
show-targets-fc-error-counters cluster-id="<cluster name>"
b. In the show-targets-fc-error-counters output, locate the corresponding
FC target, using the Index of the replaced FC SFP+.
c. Verify that for this FC target, the Sync-Loss and Link-Failure column values
no longer increase.

Note: If necessary, run the show-targets-fc-error-counters command again to confirm
that the two error counters (Sync-Loss and Link-Failure) no longer increase.

Note: If either step a or step c of this procedure was unsuccessful, discontinue the
SFP+ replacement procedure, and proceed by replacing the affected Storage Controller.


8. Verify a successful iSCSI SFP+ replacement, as follows:


a. In the show-targets output, verify that the Port-Speed is 10Gb and that the
Port-State is up, and then run the following command:
show-targets-iscsi-counters cluster-id="<cluster name>"
b. In the show-targets-iscsi-counters output, locate the corresponding
iSCSI target, using the Index of the replaced iSCSI SFP+.
c. Verify that for this iSCSI target, the Num-Crc-Err, Num-NO-Buff-Err, and Num-Tx-Err
column values no longer increase.

Note: If necessary, run the show-targets-iscsi-counters command again to confirm
that the three error counters (Num-Crc-Err, Num-NO-Buff-Err, and Num-Tx-Err) no
longer increase.

Note: If either step a or step c of this procedure was unsuccessful, discontinue the
SFP+ replacement procedure, and proceed by replacing the affected Storage Controller.
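
The verification criteria in steps 7 and 8 can be summarized as a single check per replaced SFP+. The following Python sketch assumes that the Port-Speed and Port-State values and two samples of the error counters have already been read from the CLI output. The function name replacement_verified and the dictionary layout are hypothetical; the expected speeds (8GFC and 10Gb) and the watched counters are the ones named in steps 7 and 8, with Link-Failure spelled as in the command output header.

```python
EXPECTED_SPEED = {"fc": "8GFC", "iscsi": "10Gb"}
WATCHED = {
    "fc": ("Sync-Loss", "Link-Failure"),
    "iscsi": ("Num-Crc-Err", "Num-NO-Buff-Err", "Num-Tx-Err"),
}


def replacement_verified(port_type, port_speed, port_state, before, after):
    """Return True when the port is up at the expected speed and no watched counter grew.

    before and after map counter names to integer values taken from two
    consecutive runs of the relevant counters command.
    """
    if port_state != "up" or port_speed != EXPECTED_SPEED[port_type]:
        return False  # corresponds to step a failing
    # Corresponds to step c: the watched counters must no longer increase.
    return all(after[name] <= before[name] for name in WATCHED[port_type])
```

A False result corresponds to the Notes above: discontinue the SFP+ replacement procedure and proceed by replacing the affected Storage Controller.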

Performing the Post Replacement Procedures


After configuring the replaced component, perform the following post replacement
procedures:
• Checking for and clearing any active repeating alerts
• Generating and uploading a log bundle
• Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
• Restoring all Notifiers
• Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.


Replacing the XMS



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).

Tolerance
• This procedure erases all historical performance and event data previously stored in
the XMS.
• Failure of an XMS prevents cluster management.

Note: Failure of an XMS does not have an impact on cluster I/O operations.

Accessing the XMS via a Cluster Storage Controller

Note: If the affected XMS is hard down, the procedure described in this section cannot be
performed and should be skipped.

Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective XMS

Note: If the affected XMS is hard down, the information required for the affected XMS
cannot be retrieved using the Install menu and XMCLI. In such a case, consult the
customer, or a current or recent cluster log bundle, in order to collect the required
information.

Inability to connect to cluster management (after ruling out network problems) indicates
that the XMS is not healthy.


Use Table 4 to record the configuration data of the defective XMS, and refer to it when you
configure the new XMS.

Table 4 XMS Configuration Data

Value Retrieval (all parameters): Log in as xinstall to the XMS and select
Display configuration from the xinstall menu.

Parameter                          Value
Management Interface IP Address
XMS Host Name
Primary DNS Server Name
Secondary DNS Server Name
Network Prefix Mask
Default Gateway
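
The values recorded in Table 4 can be sanity-checked before they are entered on the new XMS. The following Python sketch holds the six parameters in a record and checks that the default gateway lies inside the management subnet. The class name XmsConfig, its field names, and the checks themselves are hypothetical illustrations; they are not part of the XtremIO software, and the Network Prefix Mask is assumed to be recorded as a prefix length (for example, 24).

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class XmsConfig:
    management_ip: str
    host_name: str
    primary_dns: str
    secondary_dns: str
    prefix_length: int
    default_gateway: str

    def problems(self) -> list[str]:
        """Return a list of detected problems; an empty list means none were found."""
        try:
            iface = ipaddress.ip_interface(f"{self.management_ip}/{self.prefix_length}")
        except ValueError:
            return ["management IP address or prefix is not valid"]
        try:
            gateway = ipaddress.ip_address(self.default_gateway)
        except ValueError:
            return ["default gateway is not a valid IP address"]
        found = []
        if gateway not in iface.network:
            found.append("default gateway is outside the management subnet")
        if not self.host_name:
            found.append("XMS host name is empty")
        return found
```

The check only catches transcription mistakes; it does not confirm that the recorded values match the configuration of the defective XMS.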

Before replacing the affected XMS, check with the customer to determine if the affected
XMS was part of a Native Replication environment. In addition, you can use the
show-remote-protection-peer-xms XMCLI command to assist in determining
this. For further details, refer to the XtremIO Storage Array User Guide of the version
running on the affected XMS.

Table 5 XMS Native Replication Configuration

Value Retrieval (all parameters): Issue the show-remote-protection-peer-xms XMCLI
command.

Parameter                  Value
remote-ip-addr
remote-xms-alias-name
remote-xms-user
remote-user-password

Checking the XtremIO Cluster Health

Note: In some cases, it may not be possible to execute the XtremIO Health Check Script, as
the XMS may be down or not functional.

Before replacing the defective XMS, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Physical XMS


This section describes the steps to replace a defective physical XMS.

Note: For replacing a defective virtual XMS, refer to “Replacing a Virtual XMS” on page 51.


Replacing the Defective Physical XMS

To remove the defective physical XMS:


1. Disconnect all cables from the back of the XMS.

Note: Make sure that all cables are clearly labeled before disconnecting them from the
XMS.

2. Remove the bezel that covers the front of the server by simultaneously pressing the
tabs on both sides of the bezel to release it from its latches, then pull the bezel off the
component.

Note: The picture is for illustration purposes only.

3. Remove the stabilizing screw behind the latch bracket on each side.


4. Pull the server forward until it locks in place, then slide the blue disconnect tabs
forward to release the inner rails from the slide rails.

5. Remove each inner rail as follows:


a. On the middle of the inner rail, push in and hold the metal latch.
b. Push the rail forward to release the connection studs from the small end of the rail
notches.
c. When the connection studs are in the large end of the rail notches, release the
metal latch.
d. Pull the inner rails away from the server.

Note: If the defective XMS should be sent to Dell EMC for Failure Analysis (FA), refer to
Appendix C for the procedure details.

Note: For more detailed instructions on installing the physical XMS, refer to the XtremIO
Storage Array Hardware Installation and Upgrade Guide.

To install the new physical XMS:


1. Attach an inner rail to each side of the server, as follows:
a. Align the large end of the rail notches on the inner rail with the connection studs on
the side of the server.
b. Push the flat side of the inner rail onto the connection studs.
c. Slide the inner rail backwards along the server, until the studs fit securely into the
small end of the rail notches.
An audible click indicates that the rail is secure.
2. From the front of the rack, align the inner rails that are attached to the server with the
channels on the inside of the slide rails.
3. Slide the server into the slide rails and push the server into the rack.
An audible click indicates that the slide rails are engaged and locked.


4. On the outside of each rail assembly, slide the blue disconnect tab forward to unlock
the server, and push the server completely into the rack.

5. Insert and tighten a small stabilizer screw directly behind each bezel latch.
6. Connect the two power cables to the XMS.
7. Connect the network cable to the MGMT1 Ethernet port (marked "1") on the physical
XMS.
8. Press the Power button to power on the XMS.
9. Reinstall the XMS bezel.

Configuring a Replaced Physical XMS


The flow for configuring a replaced physical XMS includes:
• Deploying the XMS server and reimaging it using XtremIO XMS Rescue Image
• Configuring the XMS (Network connectivity, XMS Server DNS Name, Network interface
information)
• Updating the XMS server to the required software version
• Recovering the XMS via the xinstall Install menu or via the XMCLI

Configuring the Replaced Physical XMS

To configure the replaced physical XMS:


1. Connect to the XMS via the TECH Ethernet port (marked "2") on the console.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

Note: If the TECH Ethernet port connection fails, or the OS fails to load, reinstall the
physical XMS with the appropriate XtremIO XMS Rescue Image. Refer to “Re-Installing
a Physical XMS” on page 114 for details.


2. Log in as xinstall, to display the Install menu.


3. From the Install menu, select Configure XMS.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

Provide the following parameters:


• XMS Server DNS Name
• Network interface information (IP Address, Prefix, and GW)
4. Access the support page for XtremIO to acquire the XtremIO software package that
matches the highest XtremApp version of all clusters managed by this XMS.

Note: If the package is not on the Support page for XtremIO, contact the XtremIO
Global Technical Support.

Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.

5. Upload the software image to /images. Use the sftp command via command line or
an SFTP client (e.g. FileZilla, WinSCP) to log in as the xmsupload user and transfer the
package downloaded on your computer to the XMS.
6. When the file transfer is complete, close the SFTP client or exit the SFTP session (if one
was created by executing the sftp command from the command line).
7. Re-open SSH connectivity to the XMS by running the ssh command via command line
or the SSH client.
8. From the Install menu, select Install XMS only.

XtremIO install interface

xbrickTMP

Install Menu
-------------------------------------
1. Configure XMS
2. Check XMS Configuration
3. Display XMS Information
4. Install XMS only
5. Install Storage Controllers
6. ESRS Menu
7. Recover XMS
8. Power Menu
9. Collect Log Bundle
10. IPMI Access Menu
11. Disable Remote Shell
12. Restricted Shell
13. Installation Package Pre-loaded on Storage Controller Menu
99. Exit Install Menu
>
>4


9. From the Install XMS sub-menu, select Installation using image filename.

xbrickTMP

Install XMS sub-menu


-------------------------------------
1. Installation using image filename
2. Installation using package pre-loaded on a Storage Controller
99. Exit sub-menu
>
>1

10. Enter the image file name from the available packages listed.

XtremIO install interface


Available packages:
upgrade-to-6.0.0-21.1_for_shay_4a6f4092_rnd_X2.tar
upgrade-to-6.0.0-22_X2.tar
upgrade-to-6.0.0-28_X2.tar
upgrade-to-6.0.0-30_X2.tar
Please enter installation image filename
upgrade-to-6.0.0-30_X2.tar
Warning: You are about to reinstall an installed XMS.
Reinstalling will delete all performance data and events history.
Are you sure you want to continue with reinstallation? (yes/no)

11. Verify that it is the correct package to install, and type yes at the prompt.
12. Proceed to “Recovering the XMS” on page 55.
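
Step 4 requires the package that matches the highest XtremApp version of all clusters managed by the XMS. Version strings must be compared numerically, field by field, because a plain text comparison would rank 4.0.4 above 4.0.31. The following Python sketch illustrates this; the function names version_key and highest_version are hypothetical, and the version strings are assumed to consist of numeric fields separated by dots or a dash.

```python
import re


def version_key(version: str) -> tuple[int, ...]:
    """Turn a version string such as '4.0.31' into (4, 0, 31) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def highest_version(cluster_versions: list[str]) -> str:
    """Return the highest version among the versions of the managed clusters."""
    return max(cluster_versions, key=version_key)
```

The list of cluster versions itself must still be collected from the customer environment or from a log bundle; the sketch only orders values that are already known.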

Replacing a Virtual XMS


This section describes the steps to replace a defective virtual XMS.

Note: For replacing a defective physical XMS, refer to “Replacing the Physical XMS” on
page 46.

Flow for Configuring a Replaced Virtual XMS


The flow for configuring a replaced Virtual XMS includes:
• Deleting the defective Virtual XMS VM

Note: Request the customer to delete the defective Virtual XMS VM from their vSphere
virtual infrastructure.

• Deploying a new Virtual XMS VM, using the XtremIO XMS OVA package
• Configuring the XMS (Network connectivity, XMS Server DNS Name, Network interface
information)
• Updating the XMS server to the required software version
• Recovering the XMS via the xinstall Install menu or via the XMCLI


Deploying and Configuring a Replaced Virtual XMS

To deploy a replaced virtual XMS:


1. Deploy a new XMS VM using the XtremIO XMS OVA package.

Note: For detailed instructions, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

2. Power on the XMS VM.


3. Connect to the XMS via the VMware console.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

4. Log in as xinstall, to display the Install menu.


5. Configure the XMS.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

Provide the following parameters:


• XMS Server DNS Name
• Network interface information (IP, Mask, GW)
6. Access the Dell EMC Support page for XtremIO to acquire the XtremIO software
package that matches the version currently installed on the server.

Note: If the package is not on the Support page for XtremIO, contact XtremIO Global
Technical Support.

Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.

7. Upload the software image to /images. Use the sftp command via command line or
an SFTP client (e.g. FileZilla, WinSCP) to log in as the xmsupload user and transfer the
package downloaded on your computer to the XMS.
8. When the file transfer is complete, close the SFTP client or exit the SFTP session (if one
was created by executing the sftp command from the command line).
9. Re-open SSH connectivity to the XMS by running the ssh command via command line
or the SSH client.


10. Make sure that the software image is of the same version as that used by the running
cluster.

XtremIO install interface


Checking XMS health
XMS health check passed
Install Menu
-------------------------------------
1. Configure XMS
2. Check XMS Configuration
3. Display XMS Information
4. Install XMS only
5. Install Storage Controllers
6. ESRS Menu
7. Recover XMS
8. Power Menu
9. Collect Log Bundle
10. IPMI Access Menu
11. Disable Remote Shell
12. Restricted Shell
13. Installation Package Pre-loaded on Storage Controller Menu
99. Exit Install Menu
>
> 1


11. From the Install menu, select Install XMS only. Enter the image file name that
was used in the previous step as input.

XtremIO install interface


Checking XMS health
XMS health check passed

Install Menu
-------------------------------------
1. Configure XMS
2. Check XMS Configuration
3. Display XMS Information
4. Install XMS only
5. Install Storage Controllers
6. ESRS Menu
7. Recover XMS
8. Power Menu
9. Collect Log Bundle
10. IPMI Access Menu
11. Disable Remote Shell
12. Restricted Shell
13. Installation Package Pre-loaded on Storage Controller Menu
99. Exit Install Menu
>
>4

xms-xbrick277

Install XMS sub-menu


-------------------------------------
1. Installation using image filename
2. Installation using package pre-loaded on a Storage Controller
99. Exit sub-menu
>
>1
XtremIO install interface
Available packages:
upgrade-to-6.0.0-41_X2.tar
Please enter installation image filename
upgrade-to-6.0.0-41_X2.tar
Skip delegate False
Delegating to first_install from the package file

Skip delegate True


Installing XMS
Reformatting XMS
Checking XMS installation...
Installation ended successfully

Logging off

12. Proceed to “Recovering the XMS” on page 55.
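
The checksum verification described in the Note under step 6 can be performed on the workstation before the package is uploaded. The following Python sketch computes the MD5 or SHA-256 digest of a downloaded file with the standard hashlib module and compares it with the value copied from the support page. The function names file_digest and checksum_matches are hypothetical, and the package path and published checksum are supplied by the user.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, published: str, algorithm: str = "sha256") -> bool:
    """Compare the file digest with the published checksum, ignoring case and spaces."""
    return file_digest(path, algorithm) == published.strip().lower()
```

Use algorithm="md5" when the support page lists an MD5 checksum. If the comparison fails, download the package again before uploading it to the XMS.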


Recovering the XMS


This section describes the following two options available for recovering the XMS:
• Recovering the XMS Using the Install Menu
• Recovering the XMS Using the XMCLI
In general, recovering the XMS should be done using the Install menu.
Considerations when recovering the XMS using the Install menu include:
• This method makes it possible to keep the existing GUID of the affected XMS rather
than generating a new one. Therefore, this method must be used when the XMS is part
of a Native Replication or RecoverPoint replication configuration.
• This method should also be used when the "dry-run" functionality is needed. A dry-run
checks the technical feasibility of an XMS recovery in certain cases, without impacting
an existing environment.
• This method must be used when xinstall is the only available cluster access and
there is no access to XMCLI.
Alternatively, if required, recovering the XMS can also be done using XMCLI. Consider the
following before choosing this alternative:
• CLI tech access is required (the XMS daemon should be active and responsive).
• Multi-cluster recovery is required when applying multiple Storage Controller manager
hosts in the sc-mgr-hosts parameter.
• The "dry-run" functionality is not required.
• This method always generates a new GUID for the affected XMS.
Therefore, this method should not be used when the affected XMS is part of a Native
Replication or RecoverPoint replication configuration.
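
The considerations above reduce to a simple rule for when the XMCLI alternative may be considered at all. The following Python sketch encodes that rule as stated in this section: the XMCLI method is excluded when the XMS is part of a Native Replication or RecoverPoint replication configuration, when a dry-run is needed, or when XMCLI tech access is not available. The function name xmcli_recovery_permitted and its parameters are hypothetical.

```python
def xmcli_recovery_permitted(
    in_replication_config: bool,
    dry_run_needed: bool,
    xmcli_tech_access: bool,
) -> bool:
    """Return True only when recovering via XMCLI is not ruled out by this section.

    A True result does not make XMCLI the preferred method; in general,
    recovery should still be done using the Install menu.
    """
    if in_replication_config:
        return False  # XMCLI recovery always generates a new GUID
    if dry_run_needed:
        return False  # dry-run is used through the Install menu method
    return xmcli_tech_access  # the XMS daemon must be active and responsive
```

For example, xmcli_recovery_permitted(True, False, True) evaluates to False, because an XMS in a replication configuration must keep its GUID.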


Recovering the XMS Using the Install Menu


When recovering an XMS, the XMS’s database is dropped and restored, using the image
from the managing Storage Controller.
If the recovery process fails, the XMS becomes unresponsive. Performing a dry-run before
the actual recovery process enables you to verify that the environment is recoverable,
and so avoid a failed recovery caused by a problem with the XMS and/or the cluster.

To perform a dry-run before recovering the XMS:


1. Log in to the XMS, using the xinstall user credentials.
2. In the Install menu, type the number for the Recover XMS option.
Last login: Sun May 29 18:09:58 2016 from 10.76.51.35
XtremIO install interface
Checking XMS health
XMS health check passed

Install menu
1. Configure XMS
2. Check XMS configuration
3. Display XMS Information
4. Install XMS only
5. Install Storage Controllers
6. ESRS Menu
7. Recover XMS
8. Power Menu
9. Collect Log Bundle
10. IPMI Access Menu
11. Disable Remote Shell
12. Restricted Shell
13. Installation Package Pre-loaded on Storage Controller Menu
99. Exit Install Menu

> 7

3. Provide the IP address (or Host Name) of the System Manager Storage Controller and
append the --keep-guid --dry-run flags.

Note: The --keep-guid parameter must be applied when the affected XMS is part of
a Native Replication or RecoverPoint replication configuration.

Note: In a multi-cluster environment, type a list of IP addresses (or Host Names) of the
managing Storage Controllers in all clusters that are connected to the affected XMS.

Note: Make sure to type a single space character between each pair of IP addresses (or
Host Names).

Please enter IP Address or Host Name of the System Manager Storage Controller (ususally the first):
> 10.55.120.49 10.55.120.37 --keep-guid --dry-run


4. Select whether you want to restore the original database in case of failure.

Please enter selection for restoring the original DB in case of failure (yes/no):
> no

Note: Respond `no` to this question if replacing a defective/unhealthy XMS.
Otherwise, respond `yes` if recovering a working/healthy XMS.

5. Wait for the dry-run to complete and the XMS to restart.

Running: /xtremapp/bin/xms-recovery 10.55.120.49 10.55.120.37 --keep-guid --dry-run --no-restore
Starting XMS DB synchronization...
Creating the DB
Backing up existing db file to /var/lib/xms/xms.sql.bak.0.
Starting Load Objects from SYM...
.............................Shutting down system lo [ OK ]
Starting system logger: [ OK ]
.......Load Objects from SYM finished successfully
Starting target discovery...
...finished fixing volumes
Recovering debug info...
Successfully recovered debug info
Changing password for user root.
passwd: all authentication tokens updated successfully
XMS DB Sync dry run finished successfully

Restarting XMS...
Restart XMS using services
Starting XMS system using XROOT /xtremapp
connectemc stop/waiting
connectemc start/running, process 9050
xtremapp-xms stop/waiting
xtremapp-xms start/running, process 9136
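
The input typed at the Recover XMS prompt in step 3 (addresses separated by a single space, followed by the flags) can be sketched as follows. This is a minimal illustration, not an XtremIO utility: the function name and its parameters are hypothetical, and only the --keep-guid and --dry-run flags shown in this procedure are used.

```python
def build_recover_xms_input(hosts, keep_guid=False, dry_run=False):
    """Compose the line typed at the Recover XMS prompt (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one Storage Controller address is required")
    # Strip stray whitespace so that entries end up separated by exactly one space.
    parts = [host.strip() for host in hosts]
    if any(not part or " " in part for part in parts):
        raise ValueError("each entry must be a single IP address or host name")
    if keep_guid:
        parts.append("--keep-guid")
    if dry_run:
        parts.append("--dry-run")
    return " ".join(parts)


print(build_recover_xms_input(["10.55.120.49", "10.55.120.37"],
                              keep_guid=True, dry_run=True))
```

Omitting dry_run gives the input used for the actual recovery.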


To perform XMS recovery:


1. In the Install menu, type the number for the Recover XMS option.

Last login: Sun May 29 18:09:58 2016 from 10.76.51.35
XtremIO install interface
Checking XMS health
XMS health check passed

Install menu
1. Configure XMS
2. Check XMS configuration
3. Display XMS Information
4. Install XMS only
5. Install Storage Controllers
6. ESRS Menu
7. Recover XMS
8. Power Menu
9. Collect Log Bundle
10. IPMI Access Menu
11. Disable Remote Shell
12. Restricted Shell
13. Installation Package Pre-loaded on Storage Controller Menu
99. Exit Install Menu

> 7

2. Provide the IP address of the managing Storage Controller and append the
--keep-guid flag.

Note: The --keep-guid parameter must be applied when the affected XMS is part of
a Native Replication or RecoverPoint replication configuration.

Note: In a multi-cluster environment, type a list of IP addresses (or Host Names) of the
managing Storage Controllers in all clusters that are connected to the affected XMS.

Note: Make sure to type a single space character between each pair of IP addresses (or
Host Names).

Please enter IP Address or Host Name of the System Manager Storage Controller (ususally the first):
> 10.55.120.49 10.55.120.37 --keep-guid

3. Select whether you want to restore the original database in case of failure.

Please enter selection for restoring the original DB in case of failure (yes/no):
> yes

Note: Respond `no` to this question if replacing a defective/unhealthy XMS.
Otherwise, respond `yes` if recovering a working/healthy XMS.


4. Wait for the recovery process to complete and for the XMS to restart.

Running: /xtremapp/bin/xms-recovery 10.55.120.49 10.55.120.37 --keep-guid
stop/waiting
initctl: unknown instance:
Starting XMS DB synchronization...
Creating the DB
Backing up existing db file to /var/lib/xms/xms.sql.bak.1.
Backing up existing passwd file to /etc/passwd.bak.1.
Warning: no sshusers directory to backup
Warning: xms’s ssh user group [xmsusers] does not exist, nothing to clear
Starting Load Objects from SYM...
.............................Shutting down system lo [ OK ]
Starting system logger: [ OK ]
.......Load Objects from SYM finished successfully
Starting target discovery...
..Recovering ssh users...
successfully recovered ssh users
finished fixing volumes
Recovering debug info...
successfully recovered debug info
Changing password for user root.
passwd: all authentication tokens updated successfully.
XMS DB Sync finished successfully
Please restart the XMS (xms-restart)

Restarting XMS...
Restart XMS using services
Starting XMS system using XROOT /xtremapp
connectemc stop/waiting
connectemc start/running, process 9050
initctl: Unknown instance
xtremapp-xms start/running, process 9136
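
When console output is captured to a file, the completion messages shown in the two procedures above can be used to confirm the outcome. The following Python sketch assumes only the two success lines that appear in the sample output; the function name and the returned labels are hypothetical.

```python
# Success lines copied from the sample console output above.
DRY_RUN_OK = "XMS DB Sync dry run finished successfully"
RECOVERY_OK = "XMS DB Sync finished successfully"


def recovery_result(console_output: str) -> str:
    """Classify captured install-menu output (labels are illustrative)."""
    if DRY_RUN_OK in console_output:
        return "dry-run-ok"
    if RECOVERY_OK in console_output:
        return "recovery-ok"
    # Neither success line was found; the captured output should be reviewed.
    return "not-confirmed"
```

A "not-confirmed" result does not by itself mean that the recovery failed, only that neither success line was found in the captured text.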


Recovering the XMS Using the XMCLI

Note: If the affected XMS is part of a Native Replication or RecoverPoint replication
configuration, use the Install Menu for XMS recovery instead. For details, refer to
“Recovering the XMS Using the Install Menu” on page 56.

Perform one of the following:


 “Recovering a Physical XMS Using the XMCLI”
 “Recovering a Virtual XMS Using the XMCLI” on page 61

Recovering a Physical XMS Using the XMCLI

To recover a physical XMS using the XMCLI:


1. Log in to the XMS CLI as tech.
2. Run the recover-xms command, and enter the IP address of a Storage Controller for
each of the clusters that should be managed by the XMS, followed by the force flag
(to override earlier cluster-XMS associations).

xmcli (tech) > recover-xms sc-mgr-hosts=["10.102.36.220"] force

Note: For multi-cluster environments, list the cluster Storage Controller IP
addresses/host names for all clusters to be managed by the XMS.
For example:
recover-xms sc-mgr-hosts=["10.102.36.220", "10.102.36.221",...] force

3. Select to recover the XMS, by typing "yes".

Old XMS and all of its data will be lost. Are you sure you want to recover the XMS? (Yes/No): yes
XMS recovery has been started

Note: Respond `no` to this question if replacing a defective/unhealthy XMS.
Otherwise, respond `yes` if recovering a working/healthy XMS.

4. Wait for the recovery process to complete.

Done!
XMS recovery finished successfully

5. Optional: Following the XMS recovery process, if you want to refresh the SSH key, run
the following command:
refresh-xms-ssh-key
6. After the recovery has successfully completed, log out of XMS CLI.
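
The recover-xms command line used in step 2 can be assembled from a list of Storage Controller addresses. The following Python sketch only formats the command text in the form shown above; the function name is hypothetical, and the command must still be entered in the XMCLI by a tech user.

```python
import json


def build_recover_xms_command(sc_mgr_hosts, force=True):
    """Format a recover-xms command line (hypothetical helper, formatting only)."""
    if not sc_mgr_hosts:
        raise ValueError("at least one Storage Controller address is required")
    # json.dumps adds the double quotes around each address, as in the example above.
    host_list = ", ".join(json.dumps(host) for host in sc_mgr_hosts)
    command = f"recover-xms sc-mgr-hosts=[{host_list}]"
    if force:
        command += " force"
    return command


print(build_recover_xms_command(["10.102.36.220"]))
```

For a multi-cluster environment, pass one address per cluster in the same list.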


Recovering a Virtual XMS Using the XMCLI

To recover a virtual XMS using the XMCLI:


1. Log in to the XMS CLI as tech.
2. Run the recover-xms command and enter the IP addresses of all the clusters that
should be managed by the XMS.

Note: Even when working with a single cluster, make sure to specify that cluster's IP address.

xmcli (tech) > recover-xms sc-mgr-hosts=["10.102.36.220", "10.103.224.119"] force

3. Select to recover the XMS, by typing "yes".

Old XMS and all of its data will be lost. Are you sure you want to recover the XMS? (Yes/No): yes
XMS recovery has been started

Note: Respond `no` to this question if replacing a defective/unhealthy XMS.
Otherwise, respond `yes` if recovering a working/healthy XMS.

4. Wait for the recovery process to complete.

Done!
XMS recovery finished successfully

5. After the recovery has successfully completed, log out of XMS CLI.


Performing the Post Replacement Procedures

Note: Following an XMS replacement procedure, run the following XMCLI command to
check the XMS Remote Support configuration:
show-syr-notifier
It may be necessary to manually reconfigure XMS Remote Support.
This is especially important when using the SRS-VE XMS Remote Support configuration on
an XMS running version 6.0.1 (or later).
For further details, refer to Dell EMC KB# 524863 (https://support.emc.com/kb/524863).

After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
 With XMS version 6.1 (or later versions), if the XMS you just replaced was previously
part of a Native Replication environment, to restore the Native Replication
configuration on the replaced XMS, run the following command:
add-remote-protection-peer-xms
remote-ip-addr="Remote_XMS_IP"
remote-xms-alias-name="Remote XMS alias used in Remote
Protection domain"
remote-xms-user="admin" remote-user-password="XXX" force

Note: Make sure to use the force flag while running the
add-remote-protection-peer-xms XMCLI command.

Note: For the remote-xms-user and remote-user-password parameters, use an admin-level
user account on the peer XMS (typically admin).

Note: Check with the customer to determine if the affected XMS was part of a Native
Replication environment. In addition, you can use the
show-remote-protection-peer-xms XMCLI command to help determine this. For
further details, refer to the XtremIO Storage Array User Guide of the version running on
the affected XMS.

 Checking for and clearing any active repeating alerts


 Generating and uploading a log bundle
 Disabling path redundancy monitoring for an XtremIO cluster connected to VPLEX.
For more details and instructions, see “Disabling Path Redundancy Monitoring for
VPLEX-Connected XtremIO Clusters” on page 149.
 Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
 Restoring all Notifiers
 Closing the tunnel between a Storage Controller and the XMS
For instructions on performing post configuration procedures, refer to Appendix F.
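
For the Native Replication item in the list above, the add-remote-protection-peer-xms command line can be formatted as in the following Python sketch. The function name is hypothetical, the address 10.0.0.5 and alias peer-xms are placeholders invented for illustration, and the sketch only builds the text in the form shown above (including the force flag).

```python
def build_add_peer_command(remote_ip, alias, user, password):
    """Format an add-remote-protection-peer-xms command line (formatting only)."""
    fields = [
        ("remote-ip-addr", remote_ip),
        ("remote-xms-alias-name", alias),
        ("remote-xms-user", user),
        ("remote-user-password", password),
    ]
    for name, value in fields:
        # A double quote inside a value would break the quoting used below.
        if '"' in value:
            raise ValueError(f"{name} must not contain a double quote")
    args = " ".join(f'{name}="{value}"' for name, value in fields)
    # The force flag is always appended, as required by the note above.
    return f"add-remote-protection-peer-xms {args} force"


# Placeholder values; replace them with the details of the peer XMS.
print(build_add_peer_command("10.0.0.5", "peer-xms", "admin", "XXX"))
```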


CHAPTER 3
Replacing the DAE Components

This chapter includes the following topics:


 Replacing the SSDs................................................................................................. 64
 Replacing a DAE Chassis ......................................................................................... 68
 Replacing the DAE Controllers (LCCs) ...................................................................... 75
 Replacing the DAE Power Supply Units.................................................................... 81


Replacing the SSDs



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The SSD replacement procedure should be performed, using the XtremIO Technician
Advisor utility, following a Service Request (SR) determined by XtremIO Global Technical
Support. If you have any questions or encounter problems, contact XtremIO Global
Technical Support.


SSD Integration on X1 cluster types is possible only after DPG rebuild has completed.

Tolerance
 Failure of up to two SSDs in a single X-Brick results in performance degradation during
rebuild.
 Concurrent failure of three SSDs in the same X-Brick results in a loss of service.
 Failure of six SSDs in the same XDP group results in a degraded state which is called
“degraded (single failure)”, where the data has only a single parity protection. For a
10TB Starter X-Brick (5TB) it is five SSDs.
 Failure of seven SSDs in the same XDP group results in dual-degraded state, which is
called “degraded (dual failure)”, where the data has no parity protection. For a 10TB
Starter X-Brick (5TB) it is six SSDs.
 Failure of eight SSDs in the same XDP group results in loss of service. For a 10TB
Starter X-Brick (5TB) it is seven SSDs.
 Insufficient SSD space may prevent the cluster from rebuilding the XDP group,
resulting in a degraded state where the data does not have double-parity protection.
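
The XDP group thresholds listed above can be expressed as a simple lookup. The following Python sketch is illustrative only: the function name and the wording returned below the first threshold are hypothetical. It covers only the per-XDP-group counts from the list, not the concurrent-failure cases described for a single X-Brick.

```python
def xdp_group_state(failed_ssds: int, starter_xbrick: bool = False) -> str:
    """Map the number of failed SSDs in one XDP group to the state listed above."""
    if failed_ssds < 0:
        raise ValueError("failed_ssds must not be negative")
    # A 10TB Starter X-Brick (5TB) reaches each state one failure earlier.
    single_failure_threshold = 5 if starter_xbrick else 6
    if failed_ssds >= single_failure_threshold + 2:
        return "loss of service"
    if failed_ssds == single_failure_threshold + 1:
        return "degraded (dual failure)"
    if failed_ssds == single_failure_threshold:
        return "degraded (single failure)"
    return "below the degraded thresholds"


print(xdp_group_state(7))
print(xdp_group_state(7, starter_xbrick=True))
```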

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.


Identifying the Defective SSD

Note: This procedure does not apply to defective SSDs detected by 5D SMART Error. For
more information, see “Handling Defective SSDs, Detected by 5D SMART Error” on
page 66.

To identify the defective SSD, using the CLI:


1. Log in to the XMCLI as tech.
2. List the SSDs status, using the following command:
show-ssds cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

xmcli (tech)> show-ssds cluster-id="xbrick141"
Name Index Cluster-Name Index Brick-Name Index Slot Model-Name FW-Version FW-State Part-Number SSD-Size DPG-Name Index SSD-DPG-State Lifecycle-State Endurance-Remaining-% Certainty Encryption-Status SSD-Temperature
wwn-0x5000cca05006873c 1 xbrick141 1 X1 1 0 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T X1-DPG 1 in_rg healthy 99 ok enc_supported_locked_cluster_pin None
wwn............
wwn-0x5000cca050067774 9 xbrick141 1 X1 1 8 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T X1-DPG 1 in_rg healthy 99 ok enc_supported_locked_cluster_pin None
wwn-0x5000cca050066ca8 10 xbrick141 1 X1 1 9 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T not_in_rg disconnected 99 ok enc_supported_locked_cluster_pin None
wwn-0x5000cca050066cb4 11 xbrick141 1 X1 1 10 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T X1-DPG 1 in_rg healthy 99 ok enc_supported_locked_cluster_pin None
wwn-..........

3. Note the Index of the SSD with a non-healthy state.


Defective SSDs can be identified in the SSD-DPG-State column with a status of
either failed_in_rg or eject_pending.
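
When the show-ssds output has been saved as text, the rows described in step 3 can be picked out with a short script. The following Python sketch assumes that each data row starts with the SSD name (wwn-0x...) followed by its Index, as in the sample output above, and looks only for the two SSD-DPG-State values named in step 3. The function name is hypothetical, and the second sample row is modified for illustration.

```python
DEFECTIVE_DPG_STATES = {"failed_in_rg", "eject_pending"}


def defective_ssd_indexes(show_ssds_output: str):
    """Return the Index of every SSD row carrying a defective SSD-DPG-State."""
    indexes = []
    for line in show_ssds_output.splitlines():
        tokens = line.split()
        # Skip headers, prompts and elided rows; data rows are "wwn-0x... <Index> ...".
        if len(tokens) < 2 or not tokens[0].startswith("wwn-0x") or not tokens[1].isdigit():
            continue
        if DEFECTIVE_DPG_STATES.intersection(tokens):
            indexes.append(int(tokens[1]))
    return indexes


# Illustrative rows; the state of the second row is altered for this example.
sample = (
    "wwn-0x5000cca05006873c 1 xbrick141 1 X1 1 0 HITACHI HUSMM111CLAR1600 C250 "
    "no_error 5051059 1.455T X1-DPG 1 in_rg healthy 99 ok\n"
    "wwn-0x5000cca050066ca8 10 xbrick141 1 X1 1 9 HITACHI HUSMM111CLAR1600 C250 "
    "no_error 5051059 1.455T X1-DPG 1 failed_in_rg failed 99 ok\n"
)
print(defective_ssd_indexes(sample))
```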

To identify the defective SSD, using the GUI:


 From the GUI, view the Inventory; the defective SSD appears in red.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.


Handling Defective SSDs, Detected by 5D SMART Error


A 5D SMART error is a diagnostic-level device error that is detected on the SSD. If a
defective SSD is detected with a 5D SMART error, the system raises the following error:
alert_def_ssd_diag_level_4_minor
Defective SSDs are also detectable by running the show-slots CLI command. Any SSDs
with a 5D SMART error are identified by the text "smart_failed" in the
output’s Error-Reason column.
Replace the defective SSD as soon as possible (refer to “Replacing a Defective SSD, Using
the Technician Advisor Utility” on page 67).


If the alert is raised on more than one SSD in the XtremIO cluster, replace the defective
SSDs one at a time. After replacing each SSD, wait for the rebuild and integration of the
new SSD to complete entirely BEFORE proceeding to replace the next SSD. Refer to Dell EMC
KB 205558 for further details and up-to-date information on this scenario.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the
XtremIO Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster
Health” on page 17.

Closing the Tunnel Between the Storage Controller and the XMS (if Previously Opened)
After checking the cluster’s health using the XtremIO Health-Check Script (HCS), make
sure to close the tunnel between the Storage Controller and XMS (if one was previously
opened), as described in “Accessing the XMS via a Cluster Storage Controller” on page 13.

Physically Locating the Defective SSD (Using LEDs)


To activate the SSD identification LED, using the CLI:
1. Log in to the XMCLI as tech.
2. Enter the control-led CLI command to locate the defective SSD.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="SSD"
led-mode="blinking" object-id-list=[3]
This causes the LED to blink on SSD number 3 on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix B.
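
The control-led command line from the example above can be formatted for other components in the same way. The following Python sketch only builds the command text; the function name is hypothetical, and separating several IDs with commas inside object-id-list is an assumption, since this guide shows only single-ID lists.

```python
def build_control_led_command(cluster_id, entity, object_ids, led_mode="blinking"):
    """Format a control-led command line (hypothetical helper, formatting only)."""
    if not object_ids:
        raise ValueError("at least one object ID is required")
    # Assumption: multiple IDs are comma-separated; the guide shows single-ID lists only.
    id_list = ", ".join(str(int(object_id)) for object_id in object_ids)
    return (f'control-led cluster-id="{cluster_id}" entity="{entity}" '
            f'led-mode="{led_mode}" object-id-list=[{id_list}]')


print(build_control_led_command("Cluster_One", "SSD", [3]))
```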


To activate the SSD identification LED, using the GUI:


 For instructions on using LEDs to identify components, refer to Appendix B.

Note: Make sure to close the tunnel between the Storage Controller and XMS (if one was
opened) when access to XMS is no longer required, as described in “Accessing the XMS
via a Cluster Storage Controller” on page 13.

Replacing a Defective SSD, Using the Technician Advisor Utility



The SSD replacement procedure should be performed, using the XtremIO Technician
Advisor utility, following a Service Request (SR) determined by XtremIO Global Technical
Support. If you have any questions or encounter problems, contact XtremIO Global
Technical Support.

Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under XtremIO
> XtremIO X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts and Utilities > XtremIO Technician Advisor
> Install XtremIO Technician Advisor.

Note: If XtremIO Global Tech Support instructs you to follow the manual configuration
procedures, refer to Appendix E.


Replacing a DAE Chassis



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Tolerance
 Failure of a DAE chassis results in loss of service.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective DAE Chassis

To identify the defective DAE chassis, using the CLI:


1. Log in to the XMCLI as tech.
2. List the DAE Chassis status, using the following command:
show-daes cluster-id="<cluster name>"
3. Note the Index of the DAE chassis with a non-healthy state.

To identify the defective DAE chassis, using the GUI:


 From the GUI, view the Inventory; the defective DAE chassis appears in orange.


Physically Locating the Defective DAE Chassis (Using LEDs)

To activate the DAE chassis identification LED, using the CLI:


1. Log in to the XMCLI as tech.
2. Enter the control-led CLI command to locate the defective DAE chassis.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="DAE"
led-mode="blinking" object-id-list=[1]
This causes the LED to blink on DAE chassis number 1 on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix B.

To activate the DAE chassis identification LED, using the GUI:


 For instructions on using LEDs to identify components, refer to Appendix B.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Defective DAE Chassis

To remove the defective DAE chassis:


1. Log in to the XMCLI as tech.
2. If the cluster is still running, stop the cluster using the following command:
stop-cluster cluster-id="<cluster name>"


Verify that you specify the correct cluster name.

xmcli (tech)> stop-cluster cluster-id="xbrick717" cluster-psnt="XIO00172309176"


Warning: You are about to stop the cluster service. All connected initiators will be denied access to cluster data.
Are you sure you want to stop Cluster xbrick717 [1]? (Yes/No): Yes
The stop process may take several minutes. Please wait for successful completion prior to powering off the cluster.
[###################################################] 100% (elapsed time 00:01:33)
Stopped Cluster xbrick717 [1]. Cluster state: stopped


3. In the CLI, run the following command:


replace-dae-prepare dae-id=<ID> cluster-id="<cluster name>"

xmcli (tech)> replace-dae-prepare dae-id=1 cluster-id="xbrick717"


DAE X1-DAE [1] preparation finished successfully

Note: If the DAE is healthy and connected, the replace-dae-prepare command
succeeds only when run with the force flag. For guidance and directions,
contact XtremIO Global Technical Support.

4. If the cluster is in a factory-assembled rack, remove the shipping bracket.


5. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

6. If cables are not marked, label them so that you can reconnect them as required to the
new DAE chassis.
7. Disconnect the power cables from the DAE’s PSUs.
8. Disconnect the SAS cables from the DAE Controllers.
9. Remove the DAE Controller (LCC) units from the defective DAE and immediately insert
them into the new DAE Chassis (for details, refer to “Replacing the DAE Controllers
(LCCs)”).
10. Remove the DAE power supply units from the defective DAE and immediately insert
them into the new DAE Chassis (for details, refer to “Replacing the DAE Power Supply
Units”).
11. Remove the DAE bezel.
12. Remove each SSD (one at a time) from the defective DAE chassis and immediately
insert it into the same slot in the new DAE Chassis.


13. If you are replacing the DAE of a 10TB Starter X-Brick (5TB):
a. Remove the 12 plastic air seals from slots 13 through 24 of the defective DAE
chassis.
b. Insert the removed air seals into slots 13 through 24 of the new DAE chassis.

If you are replacing the DAE of a regular X-Brick, ignore this step.
14. Remove the four screws (two per side) that secure the front of the enclosure to the
front vertical channels of the cabinet, and save the screws.
15. With help from another person, slide the enclosure out of the cabinet.

Note: If the defective DAE chassis should be sent to Dell EMC for Failure Analysis (FA), refer
to Appendix C for the procedure details.


To install the new DAE chassis:


1. Slide the DAE chassis into the DAE chassis rails in the cabinet. Ensure that the
enclosure is fully inside the cabinet: the rail stops at the back must seat into the back
of the enclosure at the correct depth, and the front of the enclosure must be aligned
with the cabinet face.
2. When the DAE chassis is in place, insert and tighten all of the screws. It may be easier
to install the screws working in a diagonal pattern, such as bottom left and top right,
bottom right and top left.

3. Reinstall the DAE bezel.


4. Connect the SAS cables.
5. Connect the power cables.
6. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

7. If you removed any shipping brackets, re-install them.

Configuring the Replaced DAE Chassis

To configure the replaced DAE chassis:


1. Log in to the XMCLI as tech.
2. Display the DAE, using the following command:
show-daes


3. Replace the DAE by running the following command:


replace-dae dae-id=<ID> cluster-id="<cluster name>"

xmcli (tech)> replace-dae dae-id=1 cluster-id="xbrick717"


DAE X1-DAE [1] replacement initiated

4. Wait for several seconds and make sure that the new DAE is in a healthy state, using
the following command:
show-daes cluster-id="<cluster name>"

xmcli (tech)> show-daes cluster-id="xbrick717"


Name Index Model-Name Serial-Number HW-State Part-Number Brick-Name Index Cluster-Name Index HW-Revision FW-Version Fan-Pair1-Status Fan-Pair2-Status Fan-Pair3-Status Fan-Pair4-Status
X1-DAE 1 Quanta QTF0016100003 healthy 105-000-682-03 X1 1 xbrick717 1 G3E healthy healthy healthy healthy

Note: If the state of the DAE is other than healthy, contact XtremIO Global Technical
Support.

5. Run the following set of commands to verify that all of the other DAE components are
healthy and installed to their respective (correct) locations:
show-daes-controllers cluster-id="<cluster name>"

xmcli (tech)> show-daes-controllers cluster-id="xbrick717"


Name Index Model-Name Serial-Number Lifecycle-State Enabled-State HW-Revision Location-Index Location DAE-Controller-Temp FW-Version Part-Number DAE-Name DAE-Index Brick-Name Index Cluster-Name Index DAE-Controller-HW-Label
X1-DAE-Controller-L 1 QUANTA_SIM MQX71800166 healthy enabled 1 left 36 1.22 105-000-470-XX X1-DAE 1 X1 1 xbrick717 1 SIM0
X1-DAE-Controller-R 2 QUANTA_SIM MQX71800168 healthy enabled 2 right 38 1.22 105-000-470-XX X1-DAE 1 X1 1 xbrick717 1 SIM1

Note: Verify the component’s Name against the Location.

Power up the cluster using the following command:


start-cluster cluster-id="<cluster name>"

xmcli (tech)> start-cluster cluster-id="xbrick717"


The process may take a few minutes. Please do not interrupt.
[###################################################] 100% (elapsed time 00:00:45)

6. Wait until the following message appears:


Cluster started
7. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id="<cluster name>"

Note: If the cluster and modules are other than active and healthy, contact
XtremIO Global Technical Support.
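
The XMCLI commands used across the removal and configuration procedures above run in a fixed order. The following Python sketch lists them for a given cluster name and DAE ID. It is a checklist aid only: the function name is hypothetical, the physical replacement steps happen between the second and third commands, and the force flag and cluster-psnt parameter mentioned in this section are deliberately left out.

```python
def dae_replacement_commands(cluster_name: str, dae_id: int):
    """Return the XMCLI commands from this section, in the order they are run."""
    cluster = f'cluster-id="{cluster_name}"'
    return [
        f"stop-cluster {cluster}",
        f"replace-dae-prepare dae-id={dae_id} {cluster}",
        # The physical removal and installation of the chassis take place here.
        f"replace-dae dae-id={dae_id} {cluster}",
        f"show-daes {cluster}",
        f"show-daes-controllers {cluster}",
        f"start-cluster {cluster}",
        "show-clusters",
        f"show-modules {cluster}",
    ]


for command in dae_replacement_commands("xbrick717", 1):
    print(command)
```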


Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
 Checking for and clearing any active repeating alerts
 Generating and uploading a log bundle
 Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
 Restoring all Notifiers
 Closing the tunnel between a Storage Controller and the XMS
For instructions on performing post configuration procedures, refer to Appendix F.


Replacing the DAE Controllers (LCCs)



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Technical Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Technical Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Note: Starting from version 4.0.10, if a SAS port’s error level exceeds that of the
(predefined) error threshold, the system disables the port. In such cases, it may be
necessary to replace a DAE Controller per guidance and directions from XtremIO Global
Technical Support.

Tolerance
 Failure of both DAE Controllers (or all SAS cables) in the same X-Brick results in loss of
service.
 Failure of one or more DAE Controller SAS ports results in degraded service.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective DAE Controller

To identify the defective DAE Controller, using the CLI:


1. Log in to the XMCLI as tech.


List the DAE Controllers’ status, using the following command:


show-daes-controllers cluster-id="<cluster name>"

xmcli (tech)> show-daes-controllers


Name Index Model Serial-Number State Enabled-State HW-Revision Index-In-DAE Location FW-Version
X1-DAE-LCC-A 2 Derringer LCC US1D0123300597 disconnected enabled 2912 1 bottom 1.54
X1-DAE-LCC-B 1 Derringer LCC JWXEL151001077 healthy enabled 2912 2 top 1.54
xmcli (tech)>

2. Note the Index of the DAE Controller with a disconnected state; this is the defective
DAE Controller. Proceed to “Replacing the Defective DAE Controller” on page 77.
3. If all DAE Controllers in the cluster are healthy, no DAE Controller is reported as
defective. In that case, identify the system-disabled DAE Controller SAS port, using
the following command:
show-daes-controllers-sas-ports

xmcli (tech)> show-daes-controllers-sas-ports


Name Index Cluster-Name Index Port-Index Port-State Port-Health-State Health-Level Port-Enabled-State
X1-DAE-LCC-B 1 xbrick276 1 1 down failed level_6_critical System_disabled 0
X1-DAE-LCC-B 1 xbrick276 1 2 up healthy level_1_clear enabled 2
X1-DAE-LCC-A 2 xbrick276 1 1 up healthy level_1_clear enabled 0
X1-DAE-LCC-A 2 xbrick276 1 2 up healthy level_1_clear enabled 0
xmcli (tech)>

If the Port-Health-State column indicates a defective SAS port, make a note of the
following Indexes for future reference:
• DAE Controller port’s Index (Port-Enabled-State marked “System-disabled”, or
Port-Health-State marked “failed”)
• DAE Controller’s Index
4. Proceed to “Replacing the Defective DAE Controller” on page 77.
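
The identification steps above can be sketched as a small script that scans captured XMCLI output. This is only an illustrative sketch: the function names are hypothetical, the column positions are assumed from the sample outputs shown in this section (including the extra trailing value in the SAS port rows), and the text is assumed to have been copied from the XMCLI session by hand.

```python
# Sample text copied from the example outputs in this section.
SAMPLE_CONTROLLERS = """\
Name Index Model Serial-Number State Enabled-State HW-Revision Index-In-DAE Location FW-Version
X1-DAE-LCC-A 2 Derringer LCC US1D0123300597 disconnected enabled 2912 1 bottom 1.54
X1-DAE-LCC-B 1 Derringer LCC JWXEL151001077 healthy enabled 2912 2 top 1.54
"""

SAMPLE_SAS_PORTS = """\
X1-DAE-LCC-B 1 xbrick276 1 1 down failed level_6_critical System_disabled 0
X1-DAE-LCC-B 1 xbrick276 1 2 up healthy level_1_clear enabled 2
X1-DAE-LCC-A 2 xbrick276 1 1 up healthy level_1_clear enabled 0
"""


def find_disconnected_controllers(table_text):
    """Return (name, index) for each DAE Controller row whose State is not healthy."""
    defective = []
    for line in table_text.splitlines():
        tokens = line.split()
        # Data rows in the sample start with a name such as X1-DAE-LCC-A.
        if len(tokens) < 9 or "-DAE-LCC-" not in tokens[0]:
            continue
        # The Model value contains a space, so State is counted from the right:
        # State, Enabled-State, HW-Revision, Index-In-DAE, Location, FW-Version.
        state = tokens[-6]
        if state != "healthy":
            defective.append((tokens[0], int(tokens[1])))
    return defective


def find_disabled_sas_ports(table_text):
    """Return (controller_index, port_index) for failed or system-disabled ports."""
    flagged = []
    for line in table_text.splitlines():
        tokens = line.split()
        if len(tokens) < 9 or "-DAE-LCC-" not in tokens[0]:
            continue
        controller_index = int(tokens[1])
        port_index = int(tokens[4])
        health_state = tokens[6]
        # Normalize "System_disabled" / "System-disabled" spellings (assumption).
        enabled_state = tokens[8].lower().replace("_", "-")
        if health_state == "failed" or enabled_state == "system-disabled":
            flagged.append((controller_index, port_index))
    return flagged


print(find_disconnected_controllers(SAMPLE_CONTROLLERS))
print(find_disabled_sas_ports(SAMPLE_SAS_PORTS))
```

The parser counts columns from the right for the controller table because the Model value spans two words; a different XIOS version may print a different layout, in which case the positions need adjusting.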

To identify the defective DAE Controller, using the GUI:

• From the GUI, view the Inventory; the defective DAE Controller appears in orange.

Physically Locating the Defective DAE Controller (Using LEDs)

To activate the DAE Controller identification LED, using the CLI:


1. Log in to the XMCLI as tech.
2. Enter the control-led CLI command to locate the defective DAE Controller.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="DAELCC"
led-mode="blinking" object-id-list=[1]
This causes the LED to blink on DAE Controller number 1 on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix B.
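
As an illustration of how the control-led parameters fit together, the following sketch assembles the command string from its parts. The helper name is hypothetical, and the comma-separated form used for more than one object ID is an assumption; only the single-ID form shown in the example above comes from this procedure.

```python
def control_led_command(cluster_name, entity, object_ids, led_mode="blinking"):
    """Build a control-led command line as text; nothing is sent to the XMS."""
    # Joining several IDs with commas is an assumption, not a documented format.
    id_list = ",".join(str(object_id) for object_id in object_ids)
    return (
        f'control-led cluster-id="{cluster_name}" entity="{entity}" '
        f'led-mode="{led_mode}" object-id-list=[{id_list}]'
    )


# Reproduces the example command for DAE Controller number 1 on Cluster_One.
print(control_led_command("Cluster_One", "DAELCC", [1]))
```

The generated text is meant to be reviewed and then pasted into the XMCLI session manually.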


To activate the DAE Controller identification LED, using the GUI:

• For instructions on using LEDs to identify components, refer to Appendix B.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Defective DAE Controller

To remove the defective DAE Controller:


1. Log in to the XMCLI as tech.
2. Run the following command to ensure that all Storage Controllers are healthy:
show-storage-controllers cluster-id="<cluster name>"

Note: If one of the Storage Controllers is not operating correctly, contact XtremIO
Global Technical Support before taking any further action.

3. If the cluster is in a factory-assembled rack, remove the shipping bracket from behind
the DAE to be serviced.
4. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

5. Make sure that the SAS cables are labeled. If not, label them as necessary, so that you
can reconnect them as required to the DAE Controller.
6. Disconnect the SAS cables from the defective DAE Controller.

Note: When disconnecting the cables it is important to note the ports the cables were
disconnected from, so that you can reconnect them to the same ports after installing
the new DAE Controller.
For cabling guidelines refer to XtremIO Storage Array Hardware Installation and
Upgrade Guide.


7. Remove the defective DAE Controller unit from the DAE as follows:
a. Locate the orange handle buttons on the DAE Controller handles.
b. Press the orange handle buttons to release the DAE Controller, pull the latches
outward, and remove the DAE Controller from its slot.

Note: If the defective DAE Controller should be sent to Dell EMC for Failure Analysis (FA),
refer to Appendix C for the procedure details.

To install the new DAE Controller:


1. Connect the SAS cables to the new DAE Controller.
2. Pull out the latches on the DAE Controller and make sure that they stay in the open
position.
3. Align the DAE Controller with the chassis opening and gently push it straight into the
chassis. Make sure that the DAE Controller is completely seated in the chassis.
4. Press the latches to secure the DAE Controller.
5. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

6. If you removed any shipping brackets, re-install them.


Configuring the Replaced DAE Controller


1. Log in to the XMCLI as tech.
2. Confirm that the new DAE Controller is available, using the following command:
show-daes-controllers cluster-id="<cluster name>"
3. Wait for 10 minutes and then run the following command:
show-storage-controllers cluster-id="<cluster name>"
Check the State column to ensure that all Storage Controllers are healthy.

Note: If one of the Storage Controllers is not operating correctly, contact XtremIO
Global Technical Support before taking any further action.

4. Replace the DAE Controller, using the following command:
replace-dae-controller dae-controller-id=<id> cluster-id="<cluster name>"
where <id> is the Index of the defective DAE Controller.
5. Wait for several seconds and then run the following command:
show-daes-controllers cluster-id="<cluster name>"
Make sure that for the new DAE Controller, the State column displays healthy.
6. Wait for several seconds and then run the following command:
show-daes-controllers-sas-ports
Make a note of the following Indexes:
• DAE Controller port’s Index (marked “System-disabled” or Port-Health-State
marked “degraded”)
• DAE Controller’s Index
7. Send a test pattern over the DAE Controller link, using the DAE Controller Index and
port Index from the previous step, by running the following command:
activate-sas-port dae-controller-id=<dae-controller-id> port-id=<port-id> cluster-id="<cluster name>"

Note: This command’s response can take up to one minute.

8. Repeat steps 6 and 7, until no ports are system-disabled.


9. Wait for 10 minutes and then run the following commands:
show-clusters
show-storage-controllers cluster-id="<cluster name>"
show-modules cluster-id="<cluster name>"
show-storage-controllers-sas-ports cluster-id="<cluster name>"
show-daes-controllers-sas-ports cluster-id="<cluster name>"
Make sure that the cluster and modules are active.
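
The repeat-until-clear logic of steps 6 to 8 can be sketched as a loop. This is a minimal sketch under stated assumptions: the two callables are hypothetical stand-ins for whatever the technician uses to list system-disabled ports and to run activate-sas-port, and the retry limit is an assumption, since the procedure itself does not state one.

```python
def activate_sas_port_command(controller_index, port_index, cluster_name):
    """Build the activate-sas-port command line shown in step 7 as text."""
    return (
        f"activate-sas-port dae-controller-id={controller_index} "
        f'port-id={port_index} cluster-id="{cluster_name}"'
    )


def reactivate_disabled_ports(list_disabled_ports, activate_port, max_rounds=5):
    """Repeat steps 6 and 7 until no port is system-disabled.

    list_disabled_ports() returns a list of (controller_index, port_index).
    activate_port(controller_index, port_index) activates one port.
    Returns True when no disabled ports remain, False if the limit is reached.
    """
    for _ in range(max_rounds):
        disabled = list_disabled_ports()
        if not disabled:
            return True
        for controller_index, port_index in disabled:
            activate_port(controller_index, port_index)
    # Final check after the last round of activations.
    return not list_disabled_ports()


# Usage with in-memory stand-ins instead of a real XMCLI session.
pending = {(1, 1)}
reactivate_disabled_ports(
    lambda: sorted(pending),
    lambda c, p: (print(activate_sas_port_command(c, p, "xbrick276")), pending.discard((c, p))),
)
```

If the loop returns False, the ports did not clear within the assumed number of rounds, which is a point at which to involve XtremIO Global Technical Support rather than keep retrying.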


Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
• Checking for and clearing any active repeating alerts
• Generating and uploading a log bundle
• Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
• Restoring all Notifiers
• Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.


Replacing the DAE Power Supply Units



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).

Tolerance
• Failure of a single DAE power supply unit bears no consequence.
• Failure of both DAE power supply units in the same DAE results in loss of service.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective DAE Power Supply Unit

To identify the defective DAE power supply unit, using the CLI:
1. Log in to the XMCLI as tech.
2. List the DAE power supply unit status, using the following command:
show-daes-psus cluster-id="<cluster name>"

xmcli (tech)> show-daes-psus cluster-id="xbrick711-713"


Name Index Serial-Number Location-Index Power-Feed Lifecycle-State Input Location HW-Revision Part-Number DAE-Name DAE-Index Brick-Name Index Cluster-Name Index PSU-HW-Label
X1-DAE-PSU-L 1 6E5QX01X5F410UV 1 PWR-B healthy on left 105-000-466-XX X1-DAE 1 X1 1 xbrick711-713 1 PSU0
X1-DAE-PSU-R 2 6E5QX01X5F4100A 2 PWR-A failed on right 105-000-466-XX X1-DAE 1 X1 1 xbrick711-713 1 PSU1
X2-DAE-PSU-L 3 6E5QX01X5F410DX 1 PWR-B healthy on left 105-000-466-XX X2-DAE 2 X2 2 xbrick711-713 1 PSU0
X2-DAE-PSU-R 4 6E5QX01X5F410HM 2 PWR-A healthy on right 105-000-466-XX X2-DAE 2 X2 2 xbrick711-713 1 PSU1
X3-DAE-PSU-L 5 6E5QX01X5F4101W 1 PWR-B healthy on left 105-000-466-XX X3-DAE 3 X3 3 xbrick711-713 1 PSU0
X3-DAE-PSU-R 6 6E5QX01X5F4102D 2 PWR-A healthy on right 105-000-466-XX X3-DAE 3 X3 3 xbrick711-713 1 PSU1

3. Note the Index and the PSU-HW-Label of the DAE power supply unit whose Lifecycle-State
is not healthy. The Index is required later by the replace-dae-psu command.
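
The lookup in step 3 can be sketched as a scan over captured show-daes-psus output. The function name is hypothetical, and the column positions are assumed from the sample output above: the state is taken as the sixth value in each row and PSU-HW-Label as the last. Output that lacks the PSU-HW-Label column would need a different position.

```python
# Two rows copied from the sample show-daes-psus output in this section.
SAMPLE_PSUS = """\
X1-DAE-PSU-L 1 6E5QX01X5F410UV 1 PWR-B healthy on left 105-000-466-XX X1-DAE 1 X1 1 xbrick711-713 1 PSU0
X1-DAE-PSU-R 2 6E5QX01X5F4100A 2 PWR-A failed on right 105-000-466-XX X1-DAE 1 X1 1 xbrick711-713 1 PSU1
"""


def find_unhealthy_psus(table_text):
    """Return a record for each DAE PSU row whose state is not healthy."""
    results = []
    for line in table_text.splitlines():
        tokens = line.split()
        # Skip the prompt, the header and blank lines.
        if len(tokens) < 7 or "-DAE-PSU" not in tokens[0]:
            continue
        state = tokens[5]
        if state != "healthy":
            results.append(
                {
                    "name": tokens[0],
                    "index": int(tokens[1]),
                    "state": state,
                    "hw_label": tokens[-1],  # assumes PSU-HW-Label is the last column
                }
            )
    return results


print(find_unhealthy_psus(SAMPLE_PSUS))
```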

To identify the defective DAE power supply unit, using the GUI:
• From the GUI, view the Inventory; the defective DAE power supply unit appears in
orange.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.


Replacing the Defective DAE Power Supply Unit

Note: The system continues operating on a single DAE PSU, but access to the disks in the
DAE times out unless the removed PSU is replaced within two minutes. When replacing a
DAE PSU, ensure that the green light on the PSU remains steadily on for at least five
seconds before removing power from the second PSU.
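
The two timing limits in this note can be expressed as a short sketch. The constant and function names are hypothetical; the two-minute and five-second values are the ones stated in the note, and the timestamps are assumed to be recorded by the technician.

```python
from datetime import datetime, timedelta

DISK_ACCESS_TIMEOUT = timedelta(minutes=2)  # window for replacing a removed PSU
GREEN_LED_HOLD = timedelta(seconds=5)       # steady green time before touching the second PSU


def replacement_deadline(psu_removed_at):
    """Latest time by which the replacement PSU should be in place."""
    return psu_removed_at + DISK_ACCESS_TIMEOUT


def safe_to_remove_second_psu(green_led_steady_since, now):
    """True once the green light has been steadily on for at least five seconds."""
    return now - green_led_steady_since >= GREEN_LED_HOLD


removed_at = datetime(2022, 1, 10, 9, 30, 0)
print(replacement_deadline(removed_at))
```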

To remove the defective DAE power supply unit:


1. If the cluster is in a factory-assembled rack, remove the shipping bracket.
2. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

3. Disconnect the power cable from the defective DAE power supply unit.

Note: Ensure that the new DAE PSU is prepared for insertion.

4. Remove the defective DAE power supply unit.

Note: If the defective DAE power supply unit should be sent to Dell EMC for Failure
Analysis (FA), refer to Appendix C for the procedure details.


To install the new DAE power supply unit:


1. Insert the new DAE power supply unit.

2. Connect the DAE power supply unit power cable. A green light indicates that the DAE
power supply unit is successfully connected.
3. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

4. If you removed any shipping brackets, re-install them.

Configuring the Replaced DAE Power Supply Unit


Upon replacing the DAE power supply unit, the following alerts are issued:

xmcli (tech)> show-alerts


Index Description Severity Raise-Time Entity Name Index Cluster-Name Index Alert-Type State Alert-Code
26 The left DAE PSU is disconnected. major Thu Nov 17 17:55:47 2016 DAEPSU X6-DAE-PSU1 11 xbrick711-716 1 jbodpsu_fru_disconnected outstanding 1700304
28 The cluster has detected that a new DAE PSU has been added. minor Thu Nov 17 17:56:06 2016 StorageController X6-SC2 12 xbrick711-716 1 node_discover_daepsu_true outstanding 0405102
27 The cluster has detected that a new DAE PSU has been added. minor Thu Nov 17 17:56:06 2016 StorageController X6-SC1 11 xbrick711-716 1 node_discover_daepsu_true outstanding 0405102

To configure a replaced DAE power supply unit:


1. Log in to the XMCLI as tech.
2. Replace the DAE power supply unit, using the following command:
replace-dae-psu dae-psu-id=<id> cluster-id="<cluster name>"
where <id> is the Index of the defective DAE power supply unit.

xmcli (tech)> replace-dae-psu dae-psu-id=1 cluster-id="xbrick717"


DAE PSU X1-DAE-PSU-L [1] replacement initiated


3. Wait for several seconds and make sure the S/N of the replaced power supply unit has
changed.
4. Make sure that the new DAE power supply unit is in healthy state, using the following
command:
show-daes-psus cluster-id="<cluster name>"

xmcli (tech)> show-daes-psus cluster-id="xbrick727"


Name         Index  Serial-Number    Location-Index  Power-Feed  State    Input  Location  HW-Revision  Part-Number    DAE-Name  DAE-Index  Brick-Name  Index  Cluster-Name  Index
X1-DAE-PSU1  1      6E5QX01X4F2804F  1               PWR-A       healthy  on     left      0            PS-2142-5Q-LF  X1-DAE    1          X1          1      xbrick727     1
X1-DAE-PSU2  2      6E5QX0101G0206W  2               PWR-B       healthy  on     right     0            PS-2142-5Q-LF  X1-DAE    1          X1          1      xbrick727     1

5. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id="<cluster name>"
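
Steps 3 and 4 amount to comparing the PSU row before and after the replacement. The following sketch shows one way to do that on captured show-daes-psus text; the function names are hypothetical, the column positions (Index second, Serial-Number third, State sixth) are assumed from the sample outputs in this section, and the row text in the usage lines is illustrative only.

```python
def psu_serials(table_text):
    """Map PSU Index to (serial_number, state) for every DAE PSU row."""
    serials = {}
    for line in table_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 6 and "-DAE-PSU" in tokens[0]:
            serials[int(tokens[1])] = (tokens[2], tokens[5])
    return serials


def replacement_confirmed(before_text, after_text, psu_index):
    """True if the S/N changed and the new PSU reports a healthy state."""
    old_serial, _ = psu_serials(before_text)[psu_index]
    new_serial, new_state = psu_serials(after_text)[psu_index]
    return new_serial != old_serial and new_state == "healthy"


# Illustrative rows only; the "before" serial number is made up.
before = "X1-DAE-PSU1 1 EXAMPLEOLD00001 1 PWR-A failed on left"
after = "X1-DAE-PSU1 1 6E5QX01X4F2804F 1 PWR-A healthy on left"
print(replacement_confirmed(before, after, 1))
```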

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
• Checking for and clearing any active repeating alerts
• Generating and uploading a log bundle
• Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
• Restoring all Notifiers
• Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.


CHAPTER 4
Replacing the InfiniBand Switch Components

This chapter includes the following topics:

• Replacing the InfiniBand Switch (page 86)
• Replacing the InfiniBand Switch Power Supply Units (page 94)
• Replacing the InfiniBand Switch Fan Units (page 97)


Replacing the InfiniBand Switch



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance pausing the Consistency Groups in RecoverPoint,
contact RecoverPoint Global Technical Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Technical Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Note: As best practice, you can compare the switch’s actual S/N to the one presented in
the GUI. It is always advisable to check the cable connection and LED activities on the
Storage Controllers IB NIC to make sure that you are operating on the correct switch.

Note: Starting from version 4.0.10, if an InfiniBand Switch port’s error level exceeds the
predefined error threshold, the system disables the port. In such cases, it may be
necessary to replace an InfiniBand Switch per guidance and directions from XtremIO
Global Technical Support.

Tolerance
• Failure of a single InfiniBand Switch renders the cluster vulnerable to risk of failure of
the second InfiniBand Switch and, therefore, compromises redundancy.
• Failure of both InfiniBand Switches in the same cluster results in loss of service.
• Failure of one or more InfiniBand Switch ports results in degraded service.
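
The tolerance rules above can be read as a small decision table. The sketch below encodes them under stated assumptions: a cluster with two InfiniBand Switches, a hypothetical function name, a "no impact" label for the all-healthy case that is not taken from this document, and an assumed precedence in which switch failures outrank port failures.

```python
def ib_switch_failure_impact(failed_switches, failed_ports):
    """Classify the service impact of InfiniBand Switch and port failures."""
    if failed_switches < 0 or failed_ports < 0:
        raise ValueError("failure counts must be non-negative")
    if failed_switches >= 2:
        return "loss of service"
    if failed_switches == 1:
        # The remaining switch carries all traffic, so redundancy is lost.
        return "redundancy compromised"
    if failed_ports >= 1:
        return "degraded service"
    return "no impact"


print(ib_switch_failure_impact(1, 0))
```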

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.


Identifying the Defective InfiniBand Switch

Note: System Status LEDs are located at the front and rear of the InfiniBand Switch. A solid
red LED indicates that a major error has occurred.

To identify the defective InfiniBand Switch, using the CLI:


1. Log in to the XMCLI as tech.
2. List the InfiniBand Switch status, using the following command:
show-infiniband-switches cluster-id="<cluster name>"

xmcli (tech)> show-infiniband-switches cluster-id="xbrick717"


Name Index Cluster-Name Index Location-Index Serial-Number Part-Number Lifecycle-State FW-Version FW-Version-Error ... FAN1-State FAN2-State FAN3-State FAN4-State ... Fan-Module1-HW-Label Fan-Module2-HW-Label
Fan-Module3-HW-Label Fan-Module4-HW-Label
IB-SW1 1 xbrick711-714 1 1 MT1533X04439 100-586-636-00 disconnected 09.03.8000 no_error ... healthy healthy healthy healthy ... Fan1 Fan2 Fan3
Fan4
IB-SW2 2 xbrick711-714 1 2 MT1648X00256 100-586-636-02 healthy 09.03.8000 no_error ... healthy healthy healthy healthy ... Fan1 Fan2 Fan3
Fan4

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

3. Make a note of the Index of the defective InfiniBand Switch and proceed to “Checking
the XtremIO Cluster Health” on page 88.
If both InfiniBand Switches are healthy, it is necessary to identify the system-disabled
InfiniBand Switch port, using the following command:
show-infiniband-switches-ports cluster-id="<cluster name>"

xmcli (tech)> show-infiniband-switches-ports cluster-id=xbrick711-714


Port-Index Peer-Type Port-In-Peer-Index Link-Rate-In-Gbps Port-State Enabled-State Health-State Link-Health-Level Name Index Cluster-Name Index IBSwitch-Port-HW-Label
1 Switch 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port1
2 Switch 2 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port2
3 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port3
4 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port4
5 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port5
6 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port6
7 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port7
8 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port8
9 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port9
10 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port10

Note: If a port is down, the response table cannot show the connecting peer, and lists
it as “None”.
If a port is disabled by the system manager, the port is rendered disabled on a
multiple X-Brick cluster.

Make a note of the following Indexes for future reference:


• InfiniBand Switch port’s Index (marked “System-disabled”)
• InfiniBand Switch’s Index
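
Finding the two Indexes listed above can be sketched as a scan of captured show-infiniband-switches-ports output. The function name is hypothetical. Columns are counted from the right (Enabled-State, then Health-State, Link-Health-Level, Name, Index and so on, as in the sample output), because the layout of the peer columns for a port that is down is not shown in this section. The disabled row in the usage lines is illustrative and its state spelling is an assumption.

```python
def find_disabled_ib_ports(table_text):
    """Return (switch_index, port_index) for each port whose Enabled-State is not enabled."""
    flagged = []
    for line in table_text.splitlines():
        tokens = line.split()
        # Data rows start with the numeric Port-Index; the header row does not.
        if len(tokens) < 10 or not tokens[0].isdigit():
            continue
        port_index = int(tokens[0])
        enabled_state = tokens[-8]     # Enabled-State, counted from the right
        switch_index = int(tokens[-4]) # Index that follows the switch Name
        if enabled_state != "enabled":
            flagged.append((switch_index, port_index))
    return flagged


rows = """\
Port-Index Peer-Type Port-In-Peer-Index Link-Rate-In-Gbps Port-State Enabled-State Health-State Link-Health-Level Name Index Cluster-Name Index IBSwitch-Port-HW-Label
1 Switch 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port1
3 Storagecontroller 1 56 down system_disabled failed level_6_critical IB-SW1 1 xbrick711-714 1 Port3
"""
print(find_disabled_ib_ports(rows))
```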

To identify the defective InfiniBand Switch, using the GUI:

• From the GUI, view the Hardware; the defective InfiniBand Switch appears in orange.


Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Defective InfiniBand Switch

Note: Make sure that all cables are clearly labeled to enable proper connection to the new
InfiniBand Switch.

To remove the defective InfiniBand Switch:


1. Remove the InfiniBand Switch Bezel.
2. If necessary, from the rear side of the Storage Controller located adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right side of the
cable management bracket, and then push the tray either up or down.
3. From the front of the InfiniBand Switch, disconnect the InfiniBand Switch power
cables.
4. Disconnect any other InfiniBand Switch cables.
5. If a shipping bracket is installed directly above or below the InfiniBand Switch, remove
it to prevent damage to the foam padding.
6. Carefully remove the InfiniBand Switch from the rack, taking care not to disconnect
any other cables.
7. Note the position of the inner rails on the defective InfiniBand Switch, so as to mount
them at the exact same position, on the new InfiniBand Switch.


8. Remove the inner rails from the InfiniBand Switch.

Note: It is recommended to remove and install one rail (for reference) before removing
the second rail.


Note: If the defective InfiniBand Switch should be sent to Dell EMC for Failure Analysis
(FA), refer to Appendix C for the procedure details.

To install the new InfiniBand Switch:


1. Align the screw holes of each inner rail with those on the side of the InfiniBand Switch,
as previously noted in step 7 on page 88.

Note: Verify that the correct holes are aligned to ensure that the depth of the
InfiniBand Switch within the rack is adjusted correctly.

2. Secure each inner rail to the InfiniBand Switch, using three screws.
3. Lift the InfiniBand Switch and slide it onto the rails.
4. Align the screw hole of each bezel clip with those on the front side of the inner rails
(one on each side).


5. Through each bezel clip, tighten a screw (one on each side) to secure the unit to the rack.
6. Connect the InfiniBand Switch power cables.


7. Connect the InfiniBand Switch interlink cables (labeled IBSW1-P17 and IBSW1-P18).
8. If you removed a shipping bracket directly above or below the InfiniBand Switch,
re-install it.
9. Wait for the interlinks to synchronize, as shown by the green LEDs on the InfiniBand
Switch associated ports.
10. Connect the remaining InfiniBand cables from the Storage Controllers.
11. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by pulling the latches (on the left and right side of the bracket) until
the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

12. Install the bezel.

Configuring the Replaced InfiniBand Switch

To configure the InfiniBand Switch:


1. Log in to the XMCLI as tech.
2. Run the following command:
show-infiniband-switches cluster-id="<cluster name>"

xmcli (tech)> show-infiniband-switches cluster-id="xbrick717"


Name Index Cluster-Name Index Location-Index Serial-Number Part-Number Lifecycle-State FW-Version FW-Version-Error FAN-Drawer-State FAN1-State FAN2-State FAN3-State FAN4-State IB-Switch-HW-Label
Fan-Module1-HW-Label Fan-Module2-HW-Label Fan-Module3-HW-Label Fan-Module4-HW-Label
IB-SW1 1 xbrick711-714 1 1 MT1533X04439 100-586-636-00 disconnected 09.03.8000 no_error not_available healthy healthy healthy healthy IBSW1
Fan1 Fan2 Fan3 Fan4
IB-SW2 2 xbrick711-714 1 2 MT1648X00256 100-586-636-02 healthy 09.03.8000 no_error not_available healthy healthy healthy healthy IBSW2
Fan1 Fan2 Fan3 Fan4


3. To display the steps that must be performed, run the following command:
replace-infiniband-switch ibswitch-id=<id> cluster-id="<cluster name>"
where <id> is the Index of the defective InfiniBand Switch.

xmcli (tech)> replace-infiniband-switch ibswitch-id=1 cluster-id="xbrick717"


11:34:22 - Step 1: Connect IB-Switch1-Port1 to IB-Swtich2-Port1 and IB-Switch1-Port2 to IB-Switch2-Port2.
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:35:56 - Performing validations, please wait for 190 seconds.
11:36:06 - Step 2: Connect IB-Switch1-Port3 to SC1-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:38:41 - Performing validations, please wait for 190 seconds.
11:38:52 - Step 3: Connect IB-Switch1-Port4 to SC2-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:39:55 - Performing validations, please wait for 190 seconds.
11:40:20 - Step 4: Connect IB-Switch1-Port5 to SC3-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:41:14 - Performing validations, please wait for 190 seconds.
11:41:24 - Step 5: Connect IB-Switch1-Port6 to SC4-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:42:31 - Performing validations, please wait for 190 seconds.
11:42:47 - Step 6: Connect IB-Switch1-Port7 to SC5-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:44:02 - Performing validations, please wait for 190 seconds.
11:44:12 - Step 7: Connect IB-Switch1-Port8 to SC6-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:45:08 - Performing validations, please wait for 190 seconds.
11:45:18 - Step 8: Connect IB-Switch1-Port9 to SC7-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:46:23 - Performing validations, please wait for 190 seconds.
11:46:33 - Step 9: Connect IB-Switch1-Port10 to SC8-Port1
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort): Done
11:47:27 - Performing validations, please wait for 190 seconds.
[###################################################] 100% Updating Object Model (elapsed time 00:14:06)

4. Perform the steps as per the XMCLI output.


5. Wait for several seconds and then run the following command:
show-infiniband-switches cluster-id="<cluster name>"
Make sure that for the new InfiniBand Switch and fans, the State columns all display
healthy.

xmcli (tech)> show-infiniband-switches cluster-id="xbrick717"


Name Index Cluster-Name Index Location-Index Serial-Number ... Lifecycle-State ... FW-Version-Error FAN-Drawer-State FAN1-State FAN2-State FAN3-State FAN4-State IB-Switch-HW-Label
Fan-Module1-HW-Label Fan-Module2-HW-Label Fan-Module3-HW-Label Fan-Module4-HW-Label
IB-SW1 1 xbrick711-714 1 1 MT1537X13379 ... healthy ... no_error not_available healthy healthy healthy healthy IBSW1 Fan1 Fan2
Fan3 Fan4
IB-SW2 2 xbrick711-714 1 2 MT1648X00256 ... healthy ... no_error not_available healthy healthy healthy healthy IBSW2 Fan1 Fan2
Fan3 Fan4

6. Verify that the new InfiniBand Switch’s Serial Number is shown in the output.
7. Verify that the PSUs are healthy, by running the following command:
show-infiniband-switches-psus cluster-id="<cluster name>"

xmcli (tech)> show-infiniband-switches-psus cluster-id="xbrick717"


Name Index Cluster-Name Index Location-Index Location Input-Power Lifecycle-State Power-Feed PSU-HW-Label
IB-SW1-PSU-L 1 xbrick711-714 1 1 left on healthy PWR-A PSU1
IB-SW1-PSU-R 2 xbrick711-714 1 2 right on healthy PWR-B PSU2
IB-SW2-PSU-L 3 xbrick711-714 1 1 left on healthy PWR-A PSU1
IB-SW2-PSU-R 4 xbrick711-714 1 2 right on healthy PWR-B PSU2


8. Run the following command:


show-infiniband-switches-ports cluster-id="<cluster name>"

xmcli (tech)> show-infiniband-switches-ports cluster-id="xbrick717"


Port-Index Peer-Type Port-In-Peer-Index Link-Rate-In-Gbps Port-State Enabled-State Health-State Link-Health-Level Name Index Cluster-Name Index IBSwitch-Port-HW-Label
1 Switch 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port1
2 Switch 2 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port2
3 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port3
4 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port4
5 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port5
6 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port6
7 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port7
8 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port8
9 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port9
10 Storagecontroller 1 56 up enabled healthy level_1_clear IB-SW1 1 xbrick711-714 1 Port10
1 Switch 1 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port1
2 Switch 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port2
3 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port3
4 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port4
5 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port5
6 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port6
7 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port7
8 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port8
9 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port9
10 Storagecontroller 2 56 up enabled healthy level_1_clear IB-SW2 2 xbrick711-714 1 Port10

9. On the replaced InfiniBand Switch ports, verify that no port is in a system-disabled
state.
10. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-storage-controllers cluster-id="<cluster name>"
show-modules cluster-id="<cluster name>"
The output for show-clusters when the cluster is online:

xmcli (tech)> show-clusters cluster-id="xbrick711-714"


Cluster-Name Index State Gates-Open Conn-State Num-Of-Vols Num-Of-Internal-Volumes Vol-Size UD-SSD-Space Logical-Space-In-Use UD-SSD-Space-In-Use Total-Writes Total-Reads Stop-Reason Size-and-Capacity
xbrick711-714 1 active True connected 20 0 93.438T 101.577T 24.285T 11.862T 10.762T 31.685T none 4 Bricks & 125TB

The output for show-modules when all modules are active:

xmcli (tech)> show-modules


Module-Name Index Cluster-Name Index XEnv-Name Index Storage-Controller-Name Index Used-DAE-Row-Controllers Assigned-To-DAE-Controller Module-Type Module-State
X1-SC1-R1 1 xbrick711-714 1 X1-SC1-E1 1 X1-SC1 1 ROUTER active
X1-SC1-C1 2 xbrick711-714 1 X1-SC1-E1 1 X1-SC1 1 CONTROL active
X1-SC1-D1 3 xbrick711-714 1 X1-SC1-E1 1 X1-SC1 1 X1-DAE-Row-Controller-L1, X1-DAE-Row-Controller-R2, X1-DAE-Row-Controller-L3, X1-DAE-Row-Controller-R4 DATA active
X1-SC1-R2 4 xbrick711-714 1 X1-SC1-E2 2 X1-SC1 1 ROUTER active
X1-SC1-C2 5 xbrick711-714 1 X1-SC1-E2 2 X1-SC1 1 CONTROL active
X1-SC1-D2 6 xbrick711-714 1 X1-SC1-E2 2 X1-SC1 1 X1-DAE-Row-Controller-L1, X1-DAE-Row-Controller-R2, X1-DAE-Row-Controller-L3, X1-DAE-Row-Controller-R4 DATA active
X1-SC2-R1 7 xbrick711-714 1 X1-SC2-E1 3 X1-SC2 2 ROUTER active
X1-SC2-C1 8 xbrick711-714 1 X1-SC2-E1 3 X1-SC2 2 CONTROL active
X1-SC2-D1 9 xbrick711-714 1 X1-SC2-E1 3 X1-SC2 2 X1-DAE-Row-Controller-R1, X1-DAE-Row-Controller-L2, X1-DAE-Row-Controller-R3, X1-DAE-Row-Controller-L4 DATA active
X1-SC2-R2 10 xbrick711-714 1 X1-SC2-E2 4 X1-SC2 2 ROUTER active
X1-SC2-C2 11 xbrick711-714 1 X1-SC2-E2 4 X1-SC2 2 CONTROL active
X1-SC2-D2 12 xbrick711-714 1 X1-SC2-E2 4 X1-SC2 2 X1-DAE-Row-Controller-R1, X1-DAE-Row-Controller-L2, X1-DAE-Row-Controller-R3, X1-DAE-Row-Controller-L4 DATA active
X2-SC1-R1 13 xbrick711-714 1 X2-SC1-E1 5 X2-SC1 3 ROUTER active
X2-SC1-C1 14 xbrick711-714 1 X2-SC1-E1 5 X2-SC1 3 CONTROL active
X2-SC1-D1 15 xbrick711-714 1 X2-SC1-E1 5 X2-SC1 3 X2-DAE-Row-Controller-L1, X2-DAE-Row-Controller-R2, X2-DAE-Row-Controller-L3, X2-DAE-Row-Controller-R4 DATA active
X2-SC1-R2 16 xbrick711-714 1 X2-SC1-E2 6 X2-SC1 3 ROUTER active
X2-SC1-C2 17 xbrick711-714 1 X2-SC1-E2 6 X2-SC1 3 CONTROL active
X2-SC1-D2 18 xbrick711-714 1 X2-SC1-E2 6 X2-SC1 3 X2-DAE-Row-Controller-L1, X2-DAE-Row-Controller-R2, X2-DAE-Row-Controller-L3, X2-DAE-Row-Controller-R4 DATA active
X2-SC2-R1 19 xbrick711-714 1 X2-SC2-E1 7 X2-SC2 4 ROUTER active
X2-SC2-C1 20 xbrick711-714 1 X2-SC2-E1 7 X2-SC2 4 CONTROL active
X2-SC2-D1 21 xbrick711-714 1 X2-SC2-E1 7 X2-SC2 4 X2-DAE-Row-Controller-R1, X2-DAE-Row-Controller-L2, X2-DAE-Row-Controller-R3, X2-DAE-Row-Controller-L4 DATA active
X2-SC2-R2 22 xbrick711-714 1 X2-SC2-E2 8 X2-SC2 4 ROUTER active
X2-SC2-C2 23 xbrick711-714 1 X2-SC2-E2 8 X2-SC2 4 CONTROL active
X2-SC2-D2 24 xbrick711-714 1 X2-SC2-E2 8 X2-SC2 4 X2-DAE-Row-Controller-R1, X2-DAE-Row-Controller-L2, X2-DAE-Row-Controller-R3, X2-DAE-Row-Controller-L4 DATA active
X3-SC1-R1 25 xbrick711-714 1 X3-SC1-E1 9 X3-SC1 5 ROUTER active
X3-SC1-C1 26 xbrick711-714 1 X3-SC1-E1 9 X3-SC1 5 CONTROL active
X3-SC1-D1 27 xbrick711-714 1 X3-SC1-E1 9 X3-SC1 5 X3-DAE-Row-Controller-L1, X3-DAE-Row-Controller-R2, X3-DAE-Row-Controller-L3, X3-DAE-Row-Controller-R4 DATA active
X3-SC1-R2 28 xbrick711-714 1 X3-SC1-E2 10 X3-SC1 5 ROUTER active
X3-SC1-C2 29 xbrick711-714 1 X3-SC1-E2 10 X3-SC1 5 CONTROL active
X3-SC1-D2 30 xbrick711-714 1 X3-SC1-E2 10 X3-SC1 5 X3-DAE-Row-Controller-L1, X3-DAE-Row-Controller-R2, X3-DAE-Row-Controller-L3, X3-DAE-Row-Controller-R4 DATA active
X3-SC2-R1 31 xbrick711-714 1 X3-SC2-E1 11 X3-SC2 6 ROUTER active
X3-SC2-C1 32 xbrick711-714 1 X3-SC2-E1 11 X3-SC2 6 CONTROL active
X3-SC2-D1 33 xbrick711-714 1 X3-SC2-E1 11 X3-SC2 6 X3-DAE-Row-Controller-R1, X3-DAE-Row-Controller-L2, X3-DAE-Row-Controller-R3, X3-DAE-Row-Controller-L4 DATA active
X3-SC2-R2 34 xbrick711-714 1 X3-SC2-E2 12 X3-SC2 6 ROUTER active
X3-SC2-C2 35 xbrick711-714 1 X3-SC2-E2 12 X3-SC2 6 CONTROL active
X3-SC2-D2 36 xbrick711-714 1 X3-SC2-E2 12 X3-SC2 6 X3-DAE-Row-Controller-R1, X3-DAE-Row-Controller-L2, X3-DAE-Row-Controller-R3, X3-DAE-Row-Controller-L4 DATA active
X4-SC1-R1 37 xbrick711-714 1 X4-SC1-E1 13 X4-SC1 7 ROUTER active
X4-SC1-C1 38 xbrick711-714 1 X4-SC1-E1 13 X4-SC1 7 CONTROL active
X4-SC1-D1 39 xbrick711-714 1 X4-SC1-E1 13 X4-SC1 7 X4-DAE-Row-Controller-L1, X4-DAE-Row-Controller-R2, X4-DAE-Row-Controller-L3, X4-DAE-Row-Controller-R4 DATA active
X4-SC1-R2 40 xbrick711-714 1 X4-SC1-E2 14 X4-SC1 7 ROUTER active
X4-SC1-C2 41 xbrick711-714 1 X4-SC1-E2 14 X4-SC1 7 CONTROL active
X4-SC1-D2 42 xbrick711-714 1 X4-SC1-E2 14 X4-SC1 7 X4-DAE-Row-Controller-L1, X4-DAE-Row-Controller-R2, X4-DAE-Row-Controller-L3, X4-DAE-Row-Controller-R4 DATA active
X4-SC2-R1 43 xbrick711-714 1 X4-SC2-E1 15 X4-SC2 8 ROUTER active
X4-SC2-C1 44 xbrick711-714 1 X4-SC2-E1 15 X4-SC2 8 CONTROL active
X4-SC2-D1 45 xbrick711-714 1 X4-SC2-E1 15 X4-SC2 8 X4-DAE-Row-Controller-R1, X4-DAE-Row-Controller-L2, X4-DAE-Row-Controller-R3, X4-DAE-Row-Controller-L4 DATA active
X4-SC2-R2 46 xbrick711-714 1 X4-SC2-E2 16 X4-SC2 8 ROUTER active
X4-SC2-C2 47 xbrick711-714 1 X4-SC2-E2 16 X4-SC2 8 CONTROL active
X4-SC2-D2 48 xbrick711-714 1 X4-SC2-E2 16 X4-SC2 8 X4-DAE-Row-Controller-R1, X4-DAE-Row-Controller-L2, X4-DAE-Row-Controller-R3, X4-DAE-Row-Controller-L4 DATA active
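
With 48 modules listed for this four X-Brick cluster, checking the Module-State column by eye is error-prone. The following Python sketch shows one possible way to scan a saved copy of the show-modules output. It is not part of XMCLI or of any Dell EMC tool; the function name inactive_modules is hypothetical, and the parsing assumes (based only on the sample output above) that every data row begins with a module name of the form X<n>-SC<n>-... and ends with its Module-State.

```python
from typing import List


def inactive_modules(show_modules_output: str) -> List[str]:
    """Return names of modules whose last column (Module-State) is not 'active'."""
    inactive = []
    for line in show_modules_output.splitlines():
        tokens = line.split()
        # Assumption: data rows start with a module name such as X1-SC1-R1.
        # Prompts, header lines, and blank lines do not match and are skipped.
        if len(tokens) < 2 or not tokens[0].startswith("X") or "-SC" not in tokens[0]:
            continue
        if tokens[-1] != "active":
            inactive.append(tokens[0])
    return inactive


# Rows copied from the sample show-modules output in this procedure.
SAMPLE = """\
xmcli (tech)> show-modules
X1-SC1-R1 1 xbrick711-714 1 X1-SC1-E1 1 X1-SC1 1 ROUTER active
X1-SC1-C1 2 xbrick711-714 1 X1-SC1-E1 1 X1-SC1 1 CONTROL active
X1-SC1-D1 3 xbrick711-714 1 X1-SC1-E1 1 X1-SC1 1 X1-DAE-Row-Controller-L1, X1-DAE-Row-Controller-R2, X1-DAE-Row-Controller-L3, X1-DAE-Row-Controller-R4 DATA active
"""

print(inactive_modules(SAMPLE))  # prints [] because every sampled module is active
```

An empty list corresponds to the expected result of step 10; any name returned should be checked again in XMCLI before proceeding.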

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
 Checking for and clearing any active repeating alerts
 Generating and uploading a log bundle
 Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
 Restoring all Notifiers
 Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.

Replacing the InfiniBand Switch Power Supply Units



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).

InfiniBand Switches are equipped with two replaceable power supply units that work in a
redundant configuration. Either unit may be extracted without bringing down the system.

Note: Make sure that the power supply unit that you are NOT replacing is showing all
green, for both the power supply unit and System Status LEDs.

Tolerance
 Failure of a single InfiniBand Switch power supply unit does not affect the InfiniBand
Switch operation.
 Failure of both InfiniBand Switch power supply units will lead to an InfiniBand Switch
failure.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective InfiniBand Switch Power Supply Unit

To identify the defective InfiniBand Switch power supply unit, using the CLI:
1. Log in to the XMCLI as tech.
2. List the status of the InfiniBand Switch power supply units, using the following command:
show-infiniband-switches-psus cluster-id="<cluster name>"
Name Index Cluster-Name Index Location-Index Location Input-Power Lifecycle-State Power-Feed PSU-HW-Label
IB-SW1-PSU-L 1 xbrick711-713 1 1 left on healthy PWR-A PSU1
IB-SW1-PSU-R 2 xbrick711-713 1 2 right on failed PWR-B PSU2
IB-SW2-PSU-L 3 xbrick711-713 1 1 left on healthy PWR-A PSU1
IB-SW2-PSU-R 4 xbrick711-713 1 2 right on healthy PWR-B PSU2

3. Note the Name and PSU-HW-Label of the InfiniBand Switch power supply unit with a
non-healthy Lifecycle-State.
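
Step 3 can also be checked against a saved copy of the command output. The following Python sketch is an illustration only, not an XtremIO utility: the function name unhealthy_psus is hypothetical, the second Index column is renamed Cluster-Index here purely to keep the column names unique, and the parser assumes that no column value contains a space (which holds for the sample output above).

```python
from typing import Dict, List

# Column names taken from the sample header; "Cluster-Index" is a renaming
# of the second "Index" column, introduced here to avoid duplicate keys.
PSU_COLUMNS = [
    "Name", "Index", "Cluster-Name", "Cluster-Index", "Location-Index",
    "Location", "Input-Power", "Lifecycle-State", "Power-Feed", "PSU-HW-Label",
]


def unhealthy_psus(output: str) -> List[Dict[str, str]]:
    """Return one dict per PSU row whose Lifecycle-State is not 'healthy'."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip the header row and anything that does not have exactly 10 columns.
        if len(tokens) != len(PSU_COLUMNS) or tokens[0] == "Name":
            continue
        row = dict(zip(PSU_COLUMNS, tokens))
        if row["Lifecycle-State"] != "healthy":
            rows.append(row)
    return rows


# Sample output copied from step 2.
SAMPLE = """\
Name Index Cluster-Name Index Location-Index Location Input-Power Lifecycle-State Power-Feed PSU-HW-Label
IB-SW1-PSU-L 1 xbrick711-713 1 1 left on healthy PWR-A PSU1
IB-SW1-PSU-R 2 xbrick711-713 1 2 right on failed PWR-B PSU2
IB-SW2-PSU-L 3 xbrick711-713 1 1 left on healthy PWR-A PSU1
IB-SW2-PSU-R 4 xbrick711-713 1 2 right on healthy PWR-B PSU2
"""

for psu in unhealthy_psus(SAMPLE):
    print(psu["Name"], psu["PSU-HW-Label"])  # prints: IB-SW1-PSU-R PSU2
```

The printed Name and PSU-HW-Label are the two values that step 3 asks you to note.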

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Defective InfiniBand Switch Power Supply Unit

To remove the defective InfiniBand Switch power supply unit:


1. Unlatch the power cord retainer, and remove the power cord from the power supply
unit.
2. Grasping the handle with your right hand, push the latch release with your thumb
while pulling the handle outward. As the power supply unit unseats, the power supply
unit status LEDs turn off.

Figure: power supply unit, with the "Push Here" and "Release Latch" callouts

3. Remove the power supply unit.

To insert the new InfiniBand Switch power supply unit:


1. Make sure that the mating connector of the new power supply unit is free of dirt
and/or obstacles.

Note: Do not attempt to insert a power supply unit with a power cord connected to it.

2. Insert the power supply unit by sliding it into the opening until a slight resistance is
felt.
3. Continue pressing the power supply unit until the latch snaps into place, confirming
proper installation.
4. Insert the power cord into the power supply unit connector, until the power cord
retainer is latched.

Note: The green power supply unit indicator should illuminate. If not, repeat the whole
procedure to extract the power supply unit, and re-insert it.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
 Checking for and clearing any active repeating alerts
 Generating and uploading a log bundle
 Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
 Restoring all Notifiers
 Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.

Replacing the InfiniBand Switch Fan Units



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).

Tolerance
 Failure of one or more fan units does not affect the InfiniBand Switch operation, as
long as the ambient temperature is below 45° Celsius.
 If one or more fan units fail and the ambient temperature exceeds 45° Celsius, the
InfiniBand Switch fails.

Note: Operation without a fan unit should not exceed two minutes.
During a fan hot-swap procedure, if the LED indicator is OFF, the fan unit is
disconnected.

Note: Make sure that the fans have the air flow that matches the model number. An air
flow opposite to the system design will cause the system to operate at a higher (less
than optimal) temperature.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective InfiniBand Switch Fan Unit

To identify the defective InfiniBand Switch fan unit, using the CLI:
1. Log in to the XMCLI as tech.
2. List the InfiniBand Switch status (including the fan unit states), using the following command:
show-infiniband-switches cluster-id="<cluster name>"

xmcli (tech)> show-infiniband-switches cluster-id="xbrick711-713"


Name Index Cluster-Name Index Location-Index ... ... Lifecycle-State ... ... FAN-Drawer-State FAN1-State FAN2-State FAN3-State FAN4-State IB-Switch-HW-Label Fan-Module1-HW-Label Fan-Module2-HW-Label Fan-Module3-HW-Label Fan-Module4-HW-Label
IB-SW1 1 xbrick711-713 1 1 ... ... failed ... ... not_available failed healthy healthy healthy IBSW1 Fan1 Fan2 Fan3 Fan4
IB-SW2 2 xbrick711-713 1 2 ... ... healthy ... ... not_available healthy healthy healthy healthy IBSW2 Fan1 Fan2 Fan3 Fan4

3. Note the IB-Switch-HW-Label of the InfiniBand Switch with the defective fan unit, and
the Fan-Module HW label of each fan unit whose FAN-State is not healthy.
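
The fan states and the hardware labels sit in different columns of the same row, so they have to be paired by fan number. The following Python sketch illustrates that pairing on the sample output above. It is a hypothetical helper (failed_fans is not an XMCLI command), and it assumes that the header and the data row split into the same number of whitespace-separated columns; the "..." entries simply stand in for the columns that are elided in the sample.

```python
from typing import List, Tuple


def failed_fans(header: str, row: str) -> List[Tuple[str, str, str]]:
    """Return (switch HW label, fan module HW label, state) for each non-healthy fan."""
    cols = header.split()
    vals = row.split()
    if len(cols) != len(vals):
        raise ValueError("header and row have different column counts")
    # Duplicate column names (Index, ...) overwrite each other here, which is
    # harmless because only uniquely named columns are looked up below.
    lookup = dict(zip(cols, vals))
    switch_label = lookup["IB-Switch-HW-Label"]
    result = []
    for n in range(1, 5):
        state = lookup[f"FAN{n}-State"]
        if state != "healthy":
            result.append((switch_label, lookup[f"Fan-Module{n}-HW-Label"], state))
    return result


# Header and rows copied from the sample output in step 2.
HEADER = (
    "Name Index Cluster-Name Index Location-Index ... ... Lifecycle-State ... ... "
    "FAN-Drawer-State FAN1-State FAN2-State FAN3-State FAN4-State IB-Switch-HW-Label "
    "Fan-Module1-HW-Label Fan-Module2-HW-Label Fan-Module3-HW-Label Fan-Module4-HW-Label"
)
ROWS = [
    "IB-SW1 1 xbrick711-713 1 1 ... ... failed ... ... not_available failed healthy healthy healthy IBSW1 Fan1 Fan2 Fan3 Fan4",
    "IB-SW2 2 xbrick711-713 1 2 ... ... healthy ... ... not_available healthy healthy healthy healthy IBSW2 Fan1 Fan2 Fan3 Fan4",
]

for row in ROWS:
    print(failed_fans(HEADER, row))
# prints [('IBSW1', 'Fan1', 'failed')] and then []
```

On full, non-elided output the column count check guards against misaligned rows, but a value containing spaces would still break this simple split-based approach.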

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Defective InfiniBand Switch Fan Unit

To remove the defective InfiniBand Switch fan unit:


1. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right side of the
cable management bracket, and then push the tray either up or down.
2. Unseat the fan unit by grasping the handle with your right hand and pushing the latch
release with your thumb while pulling the handle outward. As the fan unit unseats, the
fan unit status LEDs turn off.
3. Remove the fan unit.

Figure: fan unit, with the two "Push Here" callouts

To install the new InfiniBand Switch fan unit:


1. Make sure that the mating connector of the new unit is free of dirt and/or obstacles.
2. Insert the fan unit by sliding it into the opening until slight resistance is felt. Continue
pressing the fan unit until it seats completely.


The green Fan Status LED should illuminate. If not, extract the fan unit and reinsert it.
After two unsuccessful attempts to install the fan unit, contact XtremIO Global
Technical Support for guidance and directions. No further action should be taken
without explicit direction from XtremIO Global Technical Support.

3. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by pulling the latches (on the left and right side of the bracket) until
the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
 Checking for and clearing any active repeating alerts
 Generating and uploading a log bundle
 Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
 Restoring all Notifiers
 Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.

CHAPTER 5
Replacing the Battery Backup Units

The Battery Backup Unit replacement procedure should be performed, using the XtremIO
Technician Advisor utility, following a Service Request (SR) determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support. Technician Advisor is initially used to identify defective Battery
Backup Units on the cluster, and is then used to replace each Battery Backup Unit that is
identified as defective.

This chapter includes the following topics:


 Replacing a Battery Backup Unit (BBU) Using the Technician Advisor Utility .......... 102
 Replacing a Serial Communication Cable for a 5P 1550i BBU ................................ 105

Replacing a Battery Backup Unit (BBU) Using the Technician Advisor Utility

Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The Battery Backup Unit is heavy and should be removed from and installed into the rack
by two people. To avoid personal injury and/or damage to the equipment, do not attempt
to lift or install the BBU without a mechanical lift and/or help from another person.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Technical Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Technical Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Battery Backup Unit Types


Battery Backup Units of an XtremIO cluster can be of one of the following types:
 5P 1550i R
 1550 Evolution
Both of these BBU types are replaced with a 5P BBU FRU Kit. One 5P BBU FRU Kit must be
used for each BBU replacement that is required.

Note: To order a 5P BBU FRU Kit, use 100-586-122-00 (Eaton 5P 078-000-122-xx FRU).

5P Battery Backup Unit Kit Sub-Parts


Each 5P BBU Kit includes the following sub-parts:
 1 X 5P BBU P/N 078-000-122-01
 1 X Wire Bracket and Bail Latch Kit P/N 042-027-015 (kit includes 1 X Wire Bail Latch
Bracket, 2 X Bail Latches, 2 X Bail Latch Bracket screws)
 4 X Dummy Plugs: P/N 040-001-255 (inserted in the intended receptacles prior to
shipping)
 Front Label (attached to the BBU’s front side prior to shipping)

Tolerance
 Failure of more than half of the BBUs in the same cluster results in loss of service.
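
The "more than half" rule is a strict inequality, which matters when exactly half of the BBUs have failed. The following Python sketch works the arithmetic through. The function name and the BBU counts in the example are illustrative assumptions and do not describe any specific cluster configuration.

```python
def bbu_failures_cause_loss_of_service(total_bbus: int, failed_bbus: int) -> bool:
    """Apply the tolerance rule: loss of service when MORE than half of the BBUs fail."""
    if total_bbus <= 0 or not 0 <= failed_bbus <= total_bbus:
        raise ValueError("expected 0 <= failed_bbus <= total_bbus and total_bbus > 0")
    # Integer form of failed_bbus > total_bbus / 2, avoiding floating point.
    return 2 * failed_bbus > total_bbus


# Illustrative counts only: with 4 BBUs, 2 failures is exactly half (tolerated),
# while 3 failures is more than half (loss of service).
print(bbu_failures_cause_loss_of_service(4, 2))  # prints False
print(bbu_failures_cause_loss_of_service(4, 3))  # prints True
```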

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective BBU

To identify the defective BBU, using the CLI:


1. Log in to the XMCLI as tech.
2. List the BBUs status, using the following command:
show-bbus cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

3. Note the Index of the BBUs with a non-healthy state.

To identify the defective BBU, using the GUI:


 From the GUI, view the Inventory; the defective BBU appears in orange.

Note: Make sure to close the tunnel between the Storage Controller and XMS when access
to XMS is no longer required, as described in “Closing the Tunnel Between a Storage
Controller and the XMS” on page 151.

Replacing a BBU

Before performing a Battery Backup Unit replacement procedure, ensure that the BBU
power cables (box and PDU) are plugged in tightly at both ends, as power cables can
sometimes work loose.

The Battery Backup Unit replacement procedure should be performed, using the XtremIO
Technician Advisor utility, following a Service Request (SR) determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support.

Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under XtremIO
> XtremIO X1 (XIOS 2.x, 3.x, 4.x) > Service Scripts and Utilities > XtremIO Technician Advisor
> Install XtremIO Technician Advisor.

Note: For information on essential preparations required for using the XtremIO Technician
Advisor utility at a customer's site prior to your arrival, refer to Appendix G.

Note: If the XtremIO Technician Advisor Utility User Guide instructs that the Technician
Advisor utility cannot be used to replace Battery Backup Units on your cluster, contact
XtremIO Global Technical Support for directions on how to manually replace the Battery
Backup Units.


Replacing a Battery Backup Unit manually may lead to data-loss if not performed
correctly! Therefore, every effort must be made to use Technician Advisor to automatically
replace a cluster’s Battery Backup Unit.

Replacing a Serial Communication Cable for a 5P 1550i BBU



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).

Note: These instructions are not applicable for 1550 Evolution BBU serial communication
cables.


Incorrect replacement of 5P 1550i BBU serial communication cables may result in
damage to connectors and/or component ports.

5P 1550i Battery Backup Units are supplied with DB9-RJ45 serial data cables
accompanied by DB9-RJ50 adapters, or with RJ45-RJ50 serial communication cables with
labeling clearly indicating which devices and ports to plug into, depending on the XtremIO
hardware version in use.
A defective cable and/or cable adapter of this type must be replaced with a new RJ45-RJ50
serial communication cable.

Note: Replacement RJ45-RJ50 serial communication cables may not be labeled to indicate
which devices and ports to plug into.

This section describes recabling a 5P 1550i BBU to a Storage Controller using a
replacement RJ45-RJ50 serial communication cable.

Tolerance
 In single X-Brick clusters, a failure of both communication cables (one for each BBU)
results in loss of service.
 In multiple X-Brick clusters, a failure of more than half of the overall communication
cables in the cluster results in loss of service.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Verifying Failed Serial Communication Cables

To verify whether failed serial communication cables exist within the cluster:
1. Log in to the XMCLI as tech.
2. Run the following command:
show-bbus cluster-id="<cluster name>"

Name Index Model Serial-Number Power-Feed State Connectivity-State Enabled-State Input Battery-Charge BBU-Load Voltage FW-Version Part-Number Brick-Name Index Cluster-Name Index ...
X1-BBU 1 Evolution 1550 DV0P2308A PWR-A healthy connected enabled on 100 24 210 9901DC 078-000-114 X1 1 xtremio-svt-003 1 ...
X2-BBU 2 Evolution 1550 DV0P23078 PWR-B healthy sc_2_disconnected enabled on 100 22 211 9901DC 078-000-114 X2 2 xtremio-svt-003 2 ...
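
In the sample output above, X2-BBU reports a Connectivity-State of sc_2_disconnected rather than connected, which is the kind of row to look for. The following Python sketch shows one way to pick such rows out of saved show-bbus output. It is a hypothetical helper, not an XtremIO tool. Because the Model value contains a space, the columns cannot be matched by position alone; the sketch therefore assumes that the Power-Feed value always starts with "PWR-" and that State and Connectivity-State are the next two columns, as in the sample.

```python
from typing import List, Tuple


def disconnected_bbus(output: str) -> List[Tuple[str, str]]:
    """Return (BBU name, Connectivity-State) for rows that are not 'connected'."""
    found = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "Name":
            continue  # skip blank lines and the header row
        # Assumption: the Power-Feed column is the first token starting with "PWR-".
        feed_idx = next((i for i, t in enumerate(tokens) if t.startswith("PWR-")), None)
        if feed_idx is None or feed_idx + 2 >= len(tokens):
            continue
        connectivity = tokens[feed_idx + 2]  # Power-Feed, State, Connectivity-State
        if connectivity != "connected":
            found.append((tokens[0], connectivity))
    return found


# Rows copied from the sample output above (trailing columns are elided there too).
SAMPLE = """\
Name Index Model Serial-Number Power-Feed State Connectivity-State Enabled-State Input Battery-Charge BBU-Load Voltage FW-Version Part-Number Brick-Name Index Cluster-Name Index ...
X1-BBU 1 Evolution 1550 DV0P2308A PWR-A healthy connected enabled on 100 24 210 9901DC 078-000-114 X1 1 xtremio-svt-003 1 ...
X2-BBU 2 Evolution 1550 DV0P23078 PWR-B healthy sc_2_disconnected enabled on 100 22 211 9901DC 078-000-114 X2 2 xtremio-svt-003 2 ...
"""

print(disconnected_bbus(SAMPLE))  # prints [('X2-BBU', 'sc_2_disconnected')]
```

Running the same check after the cable has been replaced should return an empty list.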

Disabling All Notifiers


To disable all Notifiers:
1. Log in to the XMCLI as tech.
2. Disable all Notifiers, using the following command:
disable-notifiers

xmcli (tech)> disable-notifiers


Event notifiers were disabled

Replacing the Defective Cable


To replace the defective cable:
1. If necessary, from the rear side of the connecting Storage Controller, tilt the cable
management bracket's tray (up/down) to gain better access, by simultaneously
pulling the latches on the left and right, and then pushing the tray downwards, as
described on page 33.
2. Disconnect the RJ45 end of the defective communication cable from the 10101 port of
the Storage Controller.

Figure: the 10101 port on the Storage Controller

3. Disconnect the RJ50 end of the defective communication cable (or cable adapter) from
the COM (R) port of the BBU.

Figure: the COM (R) port on the BBU

Note: Discard the defective communication cable (or the defective cable and adapter).

4. Connect the RJ45 end of the replacement communication cable (as indicated in the
figure below) to the 10101 port of the Storage Controller.

Figure: replacement communication cable, showing the RJ45 Connector and the RJ50 Connector (with 'Plug Boot')

5. Connect the RJ50 end of the replacement communication cable (as indicated in the
figure above) to the COM (R) port of the BBU.

Note: Verify that the RJ50 end of the cable is connected to the BBU COM (R) port, and
that the RJ45 end of the cable is connected to the Storage Controller 10101 port.

6. If you initially tilted the cable management bracket's tray (up/down) of the connecting
Storage Controller, return it to its original position by pulling the latches (on the left
and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

Verifying Replaced Serial Communication Cables

To verify replaced serial communication cables:


1. Log in to the XMCLI as tech.
2. Run the following command:
show-bbus cluster-id="<cluster name>"

Name Index Model Serial-Number Power-Feed State Connectivity-State Enabled-State Input Battery-Charge BBU-Load Voltage FW-Version Part-Number Brick-Name Index Cluster-Name Index ...
X1-BBU 1 Evolution 1550 DV0P2308A PWR-A healthy connected enabled on 100 24 210 9901DC 078-000-114 X1 1 xtremio-svt-003 1 ...
X2-BBU 2 Evolution 1550 DV0P23078 PWR-B healthy connected enabled on 100 22 211 9901DC 078-000-114 X2 2 xtremio-svt-003 2 ...

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
 Checking for and clearing any active repeating alerts
 Generating and uploading a log bundle
 Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
 Restoring all Notifiers
 Closing the tunnel between a Storage Controller and the XMS
For instructions on performing the post replacement procedures, refer to Appendix F.

APPENDIX A
Software Re-Installation

This section provides instructions for downloading and re-installing a software image on
the Storage Controller and XMS.
This section includes the following topics:
 Writing the XtremIO Rescue Image to a USB Drive.................................................. 110
 Re-Installing a Storage Controller .......................................................................... 112
 Re-Installing a Physical XMS ................................................................................. 114

Writing the XtremIO Rescue Image to a USB Drive


Before writing the XtremIO rescue image to a USB drive, perform the following steps:

Note: Verify that you have a USB drive that is at least 2GB in capacity.

1. Locate the XtremIO Rescue Image on the XtremIO Global Technical Support page at
support.emc.com.
For details on the XtremIO Storage Controller Rescue Image or XtremIO virtual XMS
Rescue Image to download from the support page, refer to the latest Release Notes for
the XtremIO installed version.

Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.

2. Download the image to the local machine where the USB drive will be created.

Note: Before you proceed, verify that the USB drive is available.
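
The checksum comparison described in the note under step 1 can be done with any standard hashing tool. As one possible approach, the following Python sketch uses the standard hashlib module to compute the MD5 or SHA-256 digest of the downloaded file and compare it with the value copied from the support page. The function names and the file name in the usage comment are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)  # accepts "md5" or "sha256", among others
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, published: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest with the checksum published on the support page."""
    return file_digest(path, algorithm) == published.strip().lower()


# Usage (hypothetical file name; paste the checksum shown on the support page):
#   checksum_matches("rescue_image.img", "<published SHA-256 value>")
#   checksum_matches("rescue_image.img", "<published MD5 value>", algorithm="md5")
```

If the comparison returns False, download the image again rather than writing it to the USB drive.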

To write the XtremIO image to a USB drive (on Windows 7):


1. Download and unpack the Win32 Disk Imager utility
(http://sourceforge.net/projects/win32diskimager/).
2. Launch the Win32 Disk Imager utility on the local machine.
3. Insert the USB drive into the USB port on the Windows machine.
4. Under Image File in the Win32 Disk Imager dialog box, click the folder icon and select
the XtremIO Rescue Image file you downloaded earlier.
5. Under Device, click the drop-down menu and select the device drive letter for the USB
drive.

Note: Use Windows Explorer to make sure that the correct drive letter is selected.

6. Click Write to write the image file to the USB Drive; a warning appears to indicate that
existing data on the selected drive will be overwritten.

7. Verify that the correct drive letter is selected and click Yes to confirm.
8. Follow the write operation progress. When the operation is completed, a message
appears, indicating that the write was successful.

9. From the Windows Notification Area, click Safely Remove Hardware and Eject Media.

10. From the menu, select Eject USB drive.

Note: The menu option includes the USB drive’s brand name (e.g. "Eject Cruzer Blade"
appears when a SanDisk Cruzer Blade USB drive is used).

Wait for the "Safe to remove hardware" message to appear in the Notification Area and
remove the USB drive.

Re-Installing a Storage Controller


Note: Unless instructed otherwise in this document, always consult with XtremIO before
reinstalling the Storage Controller.

An X-Brick Storage Controller image is available for USB flash drives to restore a Storage
Controller to its original state.
Extract the image to a USB flash drive (refer to “Writing the XtremIO Rescue Image to a USB
Drive” on page 110).

Note: Before starting the procedure, verify that you have a KVM or keyboard and monitor
connected.


Before using a USB flash drive with a Storage Controller Rescue Image, validate that the
USB flash drive was successfully written with the Storage Controller Rescue Image that
matches the version running on the cluster. For further details, refer to “Writing the
XtremIO Rescue Image to a USB Drive” on page 110.

To re-install a Storage Controller:


1. Connect the USB flash drive to the Storage Controller USB port.
2. Disconnect the InfiniBand and SAS cables from the Storage Controller.

Note: It is important to keep the affected Storage Controller isolated from the rest of
the XtremIO cluster, throughout the re-installation procedure.

3. Power-cycle the Storage Controller by unplugging and re-connecting its two power
cables.
4. When the Storage Controller boots up, select Install XtremApp from the GRUB
menu.

5. Wait for the installation to complete and for the Storage Controller to reboot.
6. Remove the USB drive.
7. Reconnect the InfiniBand and SAS cables to the Storage Controller.
For cabling guidelines refer to XtremIO Storage Array Hardware Installation and
Upgrade Guide.

Repeating Alert Counters


You should check for active repeating alerts. If repeated alerts exist, it is necessary to clear
the alerts in order to verify whether the replacement procedure remedied the component
failure.

To check for repeating alerts:


• Run the following command: show-alerts

xmcli (tech)> show-alerts


Index Description Severity Raise-Time ...
34 Repeating: Storage Controller InfiniBand port 2 is down. major Mon Apr 18 11:22:03 2016.....
33 Repeating: InfiniBand port 2: link status is not healthy. The port state is down. major Mon Apr 18 11:22:03 2016.....
xmcli (tech)>

If the response shows alerts whose description is prefixed with “Repeating:”, it is
necessary to clear the alert counters.

Note: Clearing alert counters clears all of the system’s alerts. In case of multiple alerts,
make a note of the components with repeated active alerts, prior to clearing alert
counters.

To clear alert counters:


1. Log in to the XMCLI as tech.
2. Clear all alert counters, using the following command:
clear-alert-table-counters
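
Because clearing the counters clears all alerts, it can help to record the repeating alerts first. The following is a minimal Python sketch, not part of the XtremIO tooling. It assumes that the show-alerts output has been saved as text and that each alert row starts with a numeric index followed by a description beginning with "Repeating:". The function name find_repeating_alerts is hypothetical.

```python
import re

# A row is treated as a repeating alert when it starts with a numeric index
# followed by the "Repeating:" prefix (assumption based on the sample output).
ROW = re.compile(r"^\s*(\d+)\s+Repeating:\s+(.*)$")


def find_repeating_alerts(output):
    """Return (index, rest-of-row) for every repeating alert row."""
    alerts = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            alerts.append((int(match.group(1)), match.group(2).strip()))
    return alerts


# Sample rows copied from the show-alerts example above.
sample = """\
Index Description Severity Raise-Time
34 Repeating: Storage Controller InfiniBand port 2 is down. major Mon Apr 18 11:22:03 2016
33 Repeating: InfiniBand port 2: link status is not healthy. The port state is down. major Mon Apr 18 11:22:03 2016
"""

for index, text in find_repeating_alerts(sample):
    print(index, text)
```

The recorded list can be kept with the service notes, so that the affected components are still known after clear-alert-table-counters has been run.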


Re-Installing a Physical XMS


Note: Always consult with XtremIO Technical Support before re-installing the physical
XMS.

An XMS image is available for USB flash drives to install a physical XMS node.
Extract the image to a USB flash drive (refer to “Writing the XtremIO Rescue Image to a USB
Drive” on page 110) and connect the USB flash drive to the XMS USB port.

Note: Before starting the procedure, verify that you have a KVM or keyboard and monitor
connected.


Before using a USB flash drive with an XMS Rescue Image, validate that the USB flash
drive was successfully written with the XMS Rescue Image that matches the version
running on the cluster. For further details, refer to “Writing the XtremIO Rescue Image
to a USB Drive” on page 110.

To re-install the physical XMS:


1. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access.
2. Power-cycle the XMS by unplugging and reconnecting its two power cables.
3. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by moving the cable management tray (up/down) until the latches
are aligned and an audible click is heard.
4. As the XMS powers up, press F6 to enter the Boot Device menu.
5. When prompted, type the BIOS password to display the Boot Device menu.

Note: If the Boot Device menu is not displayed, F6 was pressed too late. Go back to
step 1 and repeat the procedure.

6. In the Boot Device menu, select USB device.

Note: The menu option includes the USB drive’s brand name (e.g. an entry containing
"Cruzer Blade" appears when a SanDisk Cruzer Blade USB drive is used).


7. When the server has booted, select Install XMS from the GRUB menu.

8. Wait for the installation to complete and for the XMS to reboot.
9. Remove the USB drive.


APPENDIX B
Using LEDs to Identify Hardware Components

This section provides instructions for activating the components’ identification LEDs,
using the GUI or CLI commands.
This section includes the following topics:
• Hardware Components’ LEDs ................................................................................ 118
• Using the GUI to Activate Identification LEDs......................................................... 119
• Using the CLI to Activate the Identification LEDs ................................................... 120


Hardware Components’ LEDs


Many of the XtremIO Storage Array hardware components are equipped with two LED types
that enable you to monitor the components’ health:
• Identification LED - Used to identify a component in the cluster.
• Status LED - Used to indicate the status of the component.
In addition to the actual LEDs on the physical hardware components, identical graphical
representations of the LEDs appear in the GUI’s Inventory, Graphical View image.
The possible states of the LEDs are:
• Off
• On (beacon)
Table 6 provides details of the hardware components’ LEDs.

Table 6 Hardware Components’ LEDs

Component                      Identification LED   Status LED

Storage Controller             Yes                  Yes
Storage Controller SSD         Yes                  Yes
Storage Controller HDD         Yes                  Yes
Storage Controller PSU & Fan   No                   Yes
DAE                            Yes                  Yes
DAE SSD                        Yes                  Yes (called "Data LED")
DAE Controller                 Yes                  Yes
DAE PSU & Fan                  No                   Yes
Battery Backup Unit            No                   Yes
InfiniBand Switch              No                   Yes
InfiniBand Switch PSU          No                   Yes
InfiniBand Switch Fan          No                   Yes
Physical XMS                   No                   Yes
Physical XMS PSU & Fan         No                   Yes


Using the GUI to Activate Identification LEDs


You can identify a component in the cluster by turning on its identification LED, or, if the
component has failed and does not respond, by turning on the LEDs of all other
components (all but the selected component).

To turn on a component’s identification LED:


1. In the Inventory, hover the mouse pointer over the relevant hardware component and
right-click to open the drop-down menu.
2. Select Turn On Identification LED for <component’s name>; a message appears, stating
that the component’s LED will be turned On/Off.
3. Click OK.

Note: If the component’s identification LED is already turned on, a check sign appears
next to the Turn On Identification LED option and the message box that follows states
that the LED will be turned off.

To turn all other identification LEDs on or off:


1. In the Inventory, hover the mouse pointer over the relevant hardware component and
right-click to open the drop-down menu.
2. Select Change all other <component type> Identification LEDs.

3. In the Change All Other Identification LEDs dialog box, select the desired state of the
LEDs (On or Off) and click OK; LEDs of all components, except for the LED of the
component you want to identify, change their state.


Using the CLI to Activate the Identification LEDs


You can identify a component in the cluster by turning on its identification LED, or, if the
component has failed and does not respond, by turning on the LEDs of all other
components (all but the selected component).
The control-led and show-leds commands are used for activating the LEDs.

control-led
The control-led command beacons the identification LED.

Input Parameter   Description               Value                                   Mandatory

cluster-id        Cluster ID                id: name or index                       No
entity            FRU                       'SSD', 'DAEController', 'LocalDisk',    Yes
                                            'StorageController', 'DAE'
inverse-mode      Apply on all except for   N/A                                     No
                  the specified one.
led-mode          The desired LED mode      'On' (1), 'Blinking' (2) or 'Off'       Yes
object-id-list    Object ID list            List of IDs: name or index              Yes
                                            if class=node, format is
                                            ["X1-N1", "X1-N2"]

(1) On applies to 'LocalDisk' and 'StorageController' only.
(2) Blinking applies to 'SSD', 'DAEController' and 'DAE' only.

LEDs Format Names


The format names for the objects that are returned by the control-led and show-leds
commands are shown in the table below.

Entity              Name Format Example

DAE                 X1-DAE
DAEController       X1-DAE-LCC-A
LocalDisk           X1-SC1-LocalDisk1
StorageController   X1-SC1
SSD                 wwn-0x5000cca02b0555dc

Note: ’1’ in X1 represents the X-Brick number.

Note: It is possible to have SC1, SC2 and/or LCC-A, LCC-B, etc. (per X-Brick).
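
The led-mode restrictions in the footnotes are easy to overlook when typing the command by hand. The following is a minimal Python sketch that checks the entity and led-mode combination before composing a control-led command line. The parameter names come from the table above; the quoting style of the composed line, the treatment of inverse-mode as a bare flag, and the function name build_control_led are assumptions and should be checked against the XMCLI help before use.

```python
# Beacon mode per entity, taken from the control-led footnotes:
# 'On' applies to LocalDisk and StorageController only;
# 'Blinking' applies to SSD, DAEController and DAE only.
BEACON_MODE = {
    "SSD": "Blinking",
    "DAEController": "Blinking",
    "DAE": "Blinking",
    "LocalDisk": "On",
    "StorageController": "On",
}


def build_control_led(entity, object_ids, led_mode, inverse=False, cluster_id=None):
    """Compose a control-led command line (quoting style is an assumption)."""
    if entity not in BEACON_MODE:
        raise ValueError(f"unknown entity: {entity}")
    # 'Off' is assumed to be valid for every entity.
    if led_mode not in ("Off", BEACON_MODE[entity]):
        raise ValueError(f"led-mode {led_mode!r} does not apply to {entity}")
    if not object_ids:
        raise ValueError("object-id-list is mandatory")
    id_list = ", ".join(f'"{oid}"' for oid in object_ids)
    parts = ["control-led"]
    if cluster_id is not None:
        parts.append(f'cluster-id="{cluster_id}"')
    parts.append(f'entity="{entity}"')
    parts.append(f'led-mode="{led_mode}"')
    parts.append(f"object-id-list=[{id_list}]")
    if inverse:
        # inverse-mode has no value in the table, so it is emitted as a flag.
        parts.append("inverse-mode")
    return " ".join(parts)


print(build_control_led("StorageController", ["X1-SC1"], "On"))
```

Passing inverse=True corresponds to the case described at the start of this section, where the failed component does not respond and the LEDs of all other components are activated instead.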


show-leds
The show-leds command displays the values for the identification and status LEDs.

Output Parameter   Description

Entity             The type of the entity represented by the LED
Name               The name of the entity represented by the LED
Index              The index of the entity represented by the LED
Identify-Beacon    The identification LED status ('On', 'Blinking' or 'Off')
Status-Beacon      The status LED status ('On', 'Blinking' or 'Off')


APPENDIX C
Priority Failure Analysis

Priority Failure Analysis (Priority FA) is required only for XtremIO FRUs involved in an
outage (DU/DL).

This section provides instructions for shipping failed hardware parts to Dell EMC for
Failure Analysis (FA).
When Failure Analysis is required, ship the failed parts to Dell
EMC via FedEx.

To send a failed hardware component to Dell EMC for analysis:


1. Use one of the following shipping addresses:
• For returns inside the USA, Canada and Mexico:
Dell EMC Corporation
111 Constitution Blvd
Franklin, MA 02038, USA
Attn: Bob Pontes, Tel: 508-435-1000
• For all other returns:
Dell EMC Information Systems International
C/O WiseTek Solutions Ltd.
IDA Business and Technology Park
Carrigtwohill, Co. Cork, Ireland
Attn: Daniel O’ Leary, Tel: +353-21 4945888
2. Set the “Priority FA” flag in the CSI debrief and add in the debrief “Please route to
SBMT 3rd party, 50 Franklin, SLOC ST22”.
This step is critical to the FA process, and if not done correctly it will hinder the
product group’s ability to root cause failures.
3. Update the Priority FA ticket with the AWB# for tracking purposes and reply all with all
available tracking numbers (FedEx and Priority tag).


APPENDIX D
Manually Replacing the Storage Controllers

This section provides procedures for manually replacing defective Storage Controllers
(without the use of the Technician Advisor utility).


This manual installation should only be performed in situations where the Technician
Advisor utility cannot be used.

This section includes the following topic:


• Replacing a Storage Controller Manually ............................................................... 126


Replacing a Storage Controller Manually



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The manual Storage Controller replacement procedure should be performed following a
Service Request (SR) determined by XtremIO Global Technical Support.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Technical Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Technical Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective Storage Controller


For instructions on identifying the defective Storage Controller, refer to “Identifying the
Defective Storage Controller” on page 21.


Before proceeding to replace the defective Storage Controller, contact XtremIO Global
Technical Support for guidance and directions. No further action should be taken without
explicit direction from XtremIO Global Technical Support.

Confirming the Open Network Ports for Storage Controller Replacement


For instructions to confirm the open network ports for Storage Controller replacement,
refer to “Confirming the Open Network Ports for Storage Controller Replacement” on
page 23.


Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing the Defective Storage Controller



Specific clusters are compatible with specific Storage Controllers and must therefore be
matched accordingly, as described in “Installing a Compatible Storage Controller” on
page 25.


Do not remove the defective Storage Controller until the new Storage Controller is
configured by XtremIO Global Technical Support and is ready to take over.

For further details on replacement Storage Controller P/Ns according to XtremIO cluster
model numbers and installed XtremIO versions, refer to the XtremIO Hardware
Compatibility Matrix on XtremIO SolVe (Solve Desktop > XtremIO Generator > XtremIO X1
(XIOS 2.x, 3.x, 4.x) > FRU Replacement Procedures > XtremIO Hardware Compatibility
Matrix).

To remove the defective Storage Controller:


1. Log in to the XMCLI as tech.
2. XtremIO Global Technical Support will deactivate the defective Storage Controller.
3. Verify that the deactivation process is complete, by running the following command:
show-storage-controllers cluster-id="<cluster name>"
Verify that the value of the Enabled-State output parameter is user_disabled.
4. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.


5. Disconnect all cables from the back of the Storage Controller.

Note: Make sure that all cables are clearly labeled before disconnecting them from the
Storage Controllers. Do not proceed with the replacement procedure until all cables
that are connected to the Storage Controller are labeled.

Note: The disconnected cables can remain fastened to the cable management bracket
during the Storage Controller replacement procedure.

6. If required, release the cables from the cable tray of the cable management bracket
(mounted on the rear side of the Storage Controller) by releasing its cable straps.

7. Pull the tabs on both sides of the cable management bracket to release the bracket
from the Storage Controller’s inner rail.


8. Pull the cable management bracket out and remove it from the Storage Controller.

9. Remove the bezel that covers the front of the server as follows:
a. If the bezel is locked, unlock the bezel with the provided key.
b. Simultaneously press the tabs on both sides of the bezel to release it from its
latches, then pull the bezel off the component.


10. Remove the stabilizing screw behind the latch bracket on each side.

Note: A JIS screwdriver may be required if the rails are from an older version.

11. If a shipping bracket is installed directly above or below the server, remove it to
prevent damage to the foam padding.
12. Pull the server forward until it locks in place, then slide the blue disconnect tabs
forward to release the inner rails from the slide rails.


13. Remove each inner rail as follows:


a. On the middle of the inner rail, push in and hold the metal latch.
b. Push the rail forward to release the connection studs from the small end of the rail
notches.
c. When the connection studs are in the large end of the rail notches, release the
metal latch.
d. Pull the inner rails away from the server.


Execute the following procedure to install the new Storage Controller only when requested
by XtremIO Global Technical Support.

To install the new Storage Controller:


1. Attach an inner rail to each side of the server as follows:
a. Align the large end of the rail notches on the inner rail with the connection studs on
the side of the server.
b. Push the flat side of the inner rail onto the connection studs.
c. Slide the inner rail backwards along the server until the studs fit securely into the
small end of the rail notches.
An audible click indicates that the rail is secure.
2. From the front of the cabinet, align the inner rails that are attached to the server with
the channels on the inside of the slide rails.
3. Slide the server into the slide rails and push the server into the cabinet.
An audible click indicates that the slide rails are engaged and locked.


4. On the outside of each rail assembly, slide the blue disconnect tab forward to unlock
the server, and push the server completely into the cabinet.

5. If you removed a shipping bracket directly above or below the server, reinstall it.
6. To further secure the rail assembly and server in the cabinet, insert and tighten a small
stabilizer screw directly behind each bezel latch.
7. From the rear side of the Storage Controller, align the rails of the cable management
bracket with the server's inner rails.


8. Insert the rails of the cable management bracket onto the inner rails of the Storage
Controller.

9. Push to slide in the cable management bracket until an audible click is heard. This
indicates that the cable management bracket and the Storage Controller rails are
engaged and locked.
10. Tilt the cable tray down by simultaneously pulling both latches, on the left and right
sides of the cable management bracket, and then pushing the tray downwards.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.


11. Connect the MGMT network cable to the Storage Controller’s "1" port (leftmost
port), and connect the InfiniBand, SAS, LAN and COM cables.

Note: Leave the FC/iSCSI cables disconnected until you are instructed to connect
them.

Note: Make sure that the InfiniBand, SAS, LAN and COM cables are properly
connected, before connecting the two power cables to the Storage Controller, and
powering on the Storage Controller.

12. Connect the two power cables to the Storage Controller.


13. Upon receiving the instruction from XtremIO Global Technical Support, press the
Power button to power up the Storage Controller.

Configuring the Replaced Storage Controller



To configure the new Storage Controller, contact XtremIO Global Technical Support.

Fastening the Storage Controller Cables to the Cable Management Bracket

Note: If the cables are already properly fastened to the cable management bracket, skip
steps 1 and 2, and proceed to step 3.

To fasten the Storage Controller cables to the cable management bracket:


1. Place the Storage Controller cables in the tray of the cable management bracket and
route them to the left and right of the tray according to their direction towards the
sides of the rack.


2. Fasten the cable straps.

3. Lift the cable tray, while pulling the latches (on the left and right sides of the bracket)
until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

The figure below shows an example of the installed cable management bracket, with
the cables strapped to the tray.


Installing the Bezel

To install the front bezel:


1. Pushing on the ends (not the middle) of the bezel, press the bezel onto the latch
brackets until it snaps into place.
2. Lock the bezel with the provided key and store the key in a secure place.

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
• Checking for and clearing any active repeating alerts
• Generating and uploading a log bundle
• Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
• Restoring all Notifiers
• Closing the tunnel between a Storage Controller and the XMS
• Only if Priority Failure Analysis is required, sending the defective component to Dell
EMC for Priority Failure Analysis (Priority FA)
For instructions on performing the post replacement procedures, refer to Appendix F.


Removing the Old Storage Controller Disks


If the customer has a Disk Retention Agreement with Dell EMC, you should remove the
following disks (as shown below) from the replaced Storage Controller and give them to
the customer:
• 2 x HDDs
• 2 x SSDs

[Figure: locations of the HDDs and SSDs in the Storage Controller]

To remove each of the Storage Controller’s disk assemblies:


1. Press the green button (A) on the left side of the disk drive assembly to unlock the
module’s lever.

2. Pull the lever open and slide the disk drive assembly (B) from the server.

Note: Once all four disks have been removed, the Storage Controller can be shipped back
to Dell EMC.

Note: It is not always possible to perform Failure Analysis on Storage Controllers that
have been returned to Dell EMC without the Storage Controller’s disks.


APPENDIX E
Manually Replacing the SSDs

This section provides procedures for manually replacing defective SSDs (without the use
of the Technician Advisor utility).


This manual installation should only be performed in situations where the Technician
Advisor utility cannot be used.

This section includes the following topic:


• Replacing an SSD Manually................................................................................... 140


Replacing an SSD Manually



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).


The manual SSD replacement should only be performed with direction from XtremIO
Global Technical Support.

Accessing the XMS via a Cluster Storage Controller


Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). Once this is done, when handling a replacement
case on site, connect to the TECH port of a cluster’s Storage Controller, and access the
XMS. For instructions, refer to “Accessing the XMS via a Cluster Storage Controller” on
page 13.

Identifying the Defective SSD


For instructions on identifying the defective SSD, refer to “Identifying the Defective SSD”
on page 65.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to “Checking the XtremIO Cluster Health”
on page 17.

Replacing a Defective SSD

Note: Make sure to follow each step of the DAE SSD replacement procedure. Specifically,
do not forget to remove the defective SSD from the DAE, and do not reinsert it.

To check the state of the cluster before replacing a defective SSD:


1. Log in to the XMCLI as tech.
2. List the cluster status, using the following command:
show-clusters
3. Note whether the state of the cluster is active and the conn-state is connected.
4. List the SSDs in the cluster, by using the following command:
show-ssds cluster-id="<cluster name>"
5. Note the name, slot and dpg-name of the defective SSD.
6. List the status of the XDP Groups, using the following command:
show-data-protection-groups cluster-id="<cluster name>"


7. Check the state of the XDP Group to which the defective SSD belongs
(normal, degraded or double_degraded).
8. If the XDP Group state is normal, proceed to “No Rebuild in Progress”. If the state is
degraded or double_degraded, proceed to “Rebuild in Progress” on page 144.

No Rebuild in Progress

To remove the defective SSD:


1. Remove the bezel that covers the front of the DAE by simultaneously pressing the tabs
on both sides of the bezel to release it from its latches, then pull the bezel off the
component.

Note: The picture is for illustration purposes only.

2. Eject the defective SSD from the DAE, as follows:


a. Press on the latch button on the disk to release the latch.
b. Pull the latch slowly and pull the disk from its slot.

Note: The defective SSD should be sent to Dell EMC for Failure Analysis (FA) if possible.
Refer to Appendix C for the procedure details.


To remove the defective SSD entry from the cluster database, using the CLI:
1. In the XMCLI, run the following command:
remove-ssd ssd-id="<ssd-id-name>"
cluster-id="<cluster-name>"
2. Once the system has received the command, you are prompted to confirm removing
this specific SSD.

xmcli (tech)> remove-ssd ssd-id="wwn-0x5000cca050066ca8" cluster-id="xbrick141"


Are you sure you want to remove SSD wwn-0x5000cca050066ca8 [10]? (Yes/No): yes
SSD wwn-0x5000cca050066ca8 [10] removed

3. Run the following command to verify that the defective SSD has been removed:
show-ssds cluster-id="<cluster name>"

To install the new SSD:


1. Insert the new SSD into the DAE, as follows:
a. Align the disk or module with the guides in the slot.
b. With the disk carrier latch fully open, gently push the disk into the slot. The latch
begins to rotate downward when its tabs meet the enclosure.
c. Push the handle down to engage the latch.

2. Reinstall the DAE bezel.

To add a new SSD to the relevant cluster and X-Brick using the CLI:
1. Log in to the XMS CLI as tech.
2. List the slot status, using the following command:
show-slots cluster-id="<cluster-name>"
3. Locate the slot with an uninitialized_ssd state and note the slot number
(Slot-Num), the SSD-UID and index of the X-Brick for this slot.

xmcli (tech)> show-slots


Cluster-Name Index Brick-Name Index Slot-Num State Slot-Power-State Error-Reason SSD-UID Product-Model SSD-Size
.......
xbrick141 1 X1 1 8 resident_ssd None none wwn-0x5000cca050067774 HITACHI HUSMM111CLAR1600 1562813784
xbrick141 1 X1 1 9 uninitialized_ssd None none wwn-0x5000cca050066ca8 HITACHI HUSMM111CLAR1600 1562813784
xbrick141 1 X1 1 10 resident_ssd None none wwn-0x5000cca050066cb4 HITACHI HUSMM111CLAR1600 1562813784
.......


4. Add the new SSD to the relevant cluster and X-Brick, using the following command:
add-ssd cluster-id="<cluster-name>" brick-id=<brick index>
ssd-uid="<SSD-UID>"
For example:
add-ssd cluster-id="xbrick141" brick-id=1
ssd-uid="wwn-0x5000cca050066ca8"

xmcli (tech)> add-ssd cluster-id="xbrick141" brick-id=1 ssd-uid="wwn-0x5000cca050066ca8"


SSD wwn-0x5000cca050066ca8 [10] added to Brick X1 [1]

Note: If the SSD added is not a new SSD (out of the box) and was previously used in
this cluster, use the is-foreign-xtremapp-ssd flag.
For example:
add-ssd cluster-id="xbrick141" brick-id=1
ssd-uid="wwn-0x5000cca050066ca8" is-foreign-xtremapp-ssd
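
Locating the uninitialized slot and composing the add-ssd command can be sketched as follows. This is a minimal Python sketch that works on show-slots output saved as text. It assumes whitespace-separated columns in the order shown in the example above, and that cluster and X-Brick names contain no spaces. The function names are hypothetical.

```python
def find_uninitialized_slots(output):
    """Return slot details for rows whose State is uninitialized_ssd."""
    found = []
    for line in output.splitlines():
        fields = line.split()
        # Column order: Cluster-Name, Index, Brick-Name, Index, Slot-Num,
        # State, Slot-Power-State, Error-Reason, SSD-UID, ...
        if len(fields) >= 9 and fields[5] == "uninitialized_ssd":
            found.append({
                "cluster": fields[0],
                "brick_index": int(fields[3]),
                "slot": int(fields[4]),
                "ssd_uid": fields[8],
            })
    return found


def build_add_ssd(slot):
    """Compose the add-ssd command in the form shown in step 4."""
    return (f'add-ssd cluster-id="{slot["cluster"]}" '
            f'brick-id={slot["brick_index"]} ssd-uid="{slot["ssd_uid"]}"')


# Rows copied from the show-slots example above.
sample = """\
Cluster-Name Index Brick-Name Index Slot-Num State Slot-Power-State Error-Reason SSD-UID Product-Model SSD-Size
xbrick141 1 X1 1 8 resident_ssd None none wwn-0x5000cca050067774 HITACHI HUSMM111CLAR1600 1562813784
xbrick141 1 X1 1 9 uninitialized_ssd None none wwn-0x5000cca050066ca8 HITACHI HUSMM111CLAR1600 1562813784
xbrick141 1 X1 1 10 resident_ssd None none wwn-0x5000cca050066cb4 HITACHI HUSMM111CLAR1600 1562813784
"""

for slot in find_uninitialized_slots(sample):
    print(slot["slot"], build_add_ssd(slot))
```

The sketch does not add the is-foreign-xtremapp-ssd flag; whether it is needed depends on the history of the SSD, as described in the note above.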

To assign the new SSD to the XDP Group, using the CLI:
1. Log in to the XMCLI as tech.
2. Assign the SSD to the XDP Group, using the following command:
assign-ssd dpg-id="<DPG-Name>" ssd-id="<SSD Name>"
cluster-id="<cluster name>"
For example:
assign-ssd dpg-id="X1-DPG" ssd-id="wwn-0x5000cca050066ca8"
cluster-id="xbrick141"

Note: The SSD name used in step 2 is shown under the "Name" field on
show-ssds. The newly added SSD is seen with SSD-DPG-State "not_in_rg".

xmcli (tech)> assign-ssd dpg-id="X1-DPG" ssd-id="wwn-0x5000cca050066ca8" cluster-id="xbrick141"


SSD wwn-0x5000cca050066ca8 [10] assigned to DPG X1-DPG [1]

3. Use the following command to check if the integration process has completed:
show-ssds cluster-id="<cluster name>"
The SSD-DPG-State field shown on show-ssds changes for this new SSD, from
"not_in_rg" to "assigning_to_rg" during the integration process. Once the
process finishes, it changes to "in_rg".

xmcli (tech)> show-ssds


Name Index Cluster-Name Index Brick-Name Index Slot Model-Name FW-Version FW-State Part-Number SSD-Size DPG-Name Index SSD-DPG-State Lifecycle-State Endurance-Remaining-% Certainty Encryption-Status
SSD-Temperature
...
Before assigning:
wwn-0x5000cca050066ca8 10 xbrick141 1 X1 1 9 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T not_in_rg healthy 99 ok
enc_supported_unlocked None
During integration:
wwn-0x5000cca050066ca8 10 xbrick141 1 X1 1 9 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T X1-DPG 1 assigning_to_rg healthy 99 ok
enc_supported_locked_cluster_pin None
After integration :
wwn-0x5000cca050066ca8 10 xbrick141 1 X1 1 9 HITACHI HUSMM111CLAR1600 C250 no_error 5051059 1.455T X1-DPG 1 in_rg healthy 99 ok
enc_supported_locked_cluster_pin None
...


4. You can also use the following command to check if the integration process has
completed:
show-data-protection-groups
• The "Preparation-Progress" field returns to 0 when integration has
completed.
• The "Useful-SSD-Space" field increases when integration has completed.

xmcli (tech)> show-data-protection-groups


Name Index Cluster-Name Index State Num-Of-SSDs Useful-SSD-Space User-Space User-Space-In-Use Rebuild-Progress Preparation-Progress Proactive-Metadata-Loading Rebuild-Prevention Brick-Name Index
After assign-ssd is run. Shortly after, integration starts automatically:
X1-DPG 1 xbrick141 1 normal 25 34.932T 30.587T 5.081T 0 0 False none X1 1
During integration:
X1-DPG 1 xbrick141 1 normal 25 34.932T 30.587T 5.081T 0 41 False none X1 1
X1-DPG 1 xbrick141 1 normal 25 34.932T 30.587T 5.081T 0 56 False none X1 1
After integration:
X1-DPG 1 xbrick141 1 normal 25 36.387T 30.555T 5.081T 0 0 False none X1 1

5. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id="<cluster name>"
6. Generate and upload a log bundle (refer to “Post Replacement Procedures” on
page 147).
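
The SSD-DPG-State check in step 3 can be sketched in Python as follows. This is only an illustration that works on show-ssds output already captured as text; it does not talk to the XMS. The function names are hypothetical, and the sketch assumes that SSD names start with "wwn-" and that the state appears on the same line as the name, as in the listing above.

```python
import re

# The three SSD-DPG-State values named in the procedure.
_STATE_RE = re.compile(r"\b(not_in_rg|assigning_to_rg|in_rg)\b")


def ssd_dpg_states(show_ssds_output):
    """Map each SSD name (wwn-...) to the SSD-DPG-State found on its row.

    Assumption: the name is the first token of the row and the state is
    on the same line, as in the example listing.
    """
    states = {}
    for line in show_ssds_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("wwn-"):
            continue
        match = _STATE_RE.search(line)
        if match:
            states[fields[0]] = match.group(1)
    return states


def integration_complete(show_ssds_output, ssd_name):
    """True only when the named SSD reports in_rg."""
    return ssd_dpg_states(show_ssds_output).get(ssd_name) == "in_rg"
```

An SSD that is missing from the captured text is treated as not integrated, so an incomplete capture is not mistaken for a finished integration.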

Rebuild in Progress

To verify that the rebuild has completed, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the XDP Groups status, using the following command:
show-data-protection-groups cluster-id=<cluster-name>
Using the "Rebuild-Progress" field, follow the progress of the rebuild of the XDP
Group to which the defective SSD(s) belong.
When the value reaches 100%, it is expected to return to 0, and the State of the XDP Group is
expected to change to normal.

xmcli (tech)> show-data-protection-groups


Name Index Cluster-Name Index State Num-Of-SSDs Useful-SSD-Space User-Space User-Space-In-Use Rebuild-Progress Preparation-Progress
Proactive-Metadata-Loading Rebuild-Prevention Brick-Name Index
X1-DPG 1 xbrick141 1 degraded 24 36.387T 30.587T 5.081T 34 0 False
none X1 1

3. Verify that the states of all XDPs are normal.

Note: If any XDP is not in the normal state, wait until all of them are.

4. For each SSD showing SSD-DPG-State "not_in_rg", perform the steps in “No
Rebuild in Progress” on page 141.
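
The completion test described in steps 2 and 3 can be sketched in Python as follows. The sketch works on captured show-data-protection-groups text and assumes that each XDP Group row is on a single line of 15 whitespace-separated fields, as in the integration listing earlier in this chapter; a row that wraps onto a second line must be joined first. The function names are hypothetical, and "Cluster-Index" and "Brick-Index" are labels introduced here for the two repeated "Index" columns.

```python
DPG_COLUMNS = [
    "Name", "Index", "Cluster-Name", "Cluster-Index", "State", "Num-Of-SSDs",
    "Useful-SSD-Space", "User-Space", "User-Space-In-Use", "Rebuild-Progress",
    "Preparation-Progress", "Proactive-Metadata-Loading", "Rebuild-Prevention",
    "Brick-Name", "Brick-Index",
]


def parse_dpg_rows(output):
    """Return one dict per XDP Group row found in the captured text."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != len(DPG_COLUMNS):
            continue  # prompt, caption, blank or wrapped line
        if not fields[9].isdigit():
            continue  # header row: Rebuild-Progress is not a number
        rows.append(dict(zip(DPG_COLUMNS, fields)))
    return rows


def rebuild_complete(output):
    """True when every XDP Group is normal and Rebuild-Progress is 0."""
    rows = parse_dpg_rows(output)
    if not rows:
        return False  # nothing parsed: do not report success
    return all(
        row["State"].lower() == "normal" and row["Rebuild-Progress"] == "0"
        for row in rows
    )
```

Returning False when no row is parsed keeps an empty or unexpected capture from being read as a finished rebuild.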


Troubleshooting

To identify unexpected issues during a rebuild process, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the XDPs status, using the following command:
show-data-protection-groups cluster-id="<cluster name>"
3. If the state of Rebuild-Prevention is other than none or the state of
Proactive-Metadata-Loading is not False, it is necessary to escalate the case
to XtremIO Technical Support.
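
The escalation rule in step 3 can be expressed as a small Python check. This is a sketch under the assumption that an XDP Group row has been captured as a single line of 15 whitespace-separated fields, with Proactive-Metadata-Loading in the twelfth position and Rebuild-Prevention in the thirteenth, as in the show-data-protection-groups listings in this chapter. The function names are hypothetical.

```python
def needs_escalation(rebuild_prevention, proactive_metadata_loading):
    """True when either field differs from its expected value.

    Expected values, per the procedure: Rebuild-Prevention is "none"
    and Proactive-Metadata-Loading is "False".
    """
    return (
        rebuild_prevention.strip().lower() != "none"
        or proactive_metadata_loading.strip().lower() != "false"
    )


def row_needs_escalation(dpg_row):
    """Apply the rule to one captured row (assumed to be 15 fields)."""
    fields = dpg_row.split()
    if len(fields) != 15:
        raise ValueError("expected 15 fields, got %d" % len(fields))
    # Index 11 is Proactive-Metadata-Loading, index 12 is Rebuild-Prevention.
    return needs_escalation(fields[12], fields[11])
```

Raising an error on an unexpected field count is a deliberate choice: a row that cannot be parsed should be inspected by hand instead of being silently treated as healthy.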

Performing the Post Replacement Procedures


After configuring the replaced component, it is necessary to perform the following post
replacement procedures:
• Checking for and clearing any active repeating alerts
• Generating and uploading a log bundle
• Checking the cluster’s health by running the XtremIO Health Check Script (HCS)
• Restoring all Notifiers
• Closing the tunnel between a Storage Controller and the XMS
• Sending the defective component to Dell EMC for Priority Failure Analysis (Priority FA),
only if Priority FA is required
For instructions on performing the post replacement procedures, refer to Appendix F.


APPENDIX F
Post Replacement Procedures

This section provides instructions for the post replacement procedures to be performed
after configuring a replaced component, including clearing repeating Alert
counters, generating an XtremIO log bundle and uploading it to FTP, and checking the XtremIO
cluster health.
This section includes the following topics:
• Clearing Repeating Alert Counters
• Generating and Collecting the Bundle
• Uploading the Bundle Collection
• Disabling Path Redundancy Monitoring for VPLEX-Connected XtremIO Clusters
• Checking the XtremIO Cluster Health (Post Replacement)
• Restoring All Notifiers
• Closing the Tunnel Between a Storage Controller and the XMS
• Sending Defective Component to Priority Failure Analysis


Clearing Repeating Alert Counters


Check for active repeating alerts. If repeated alerts exist, it is necessary to clear the alerts
in order to verify whether the replacement procedure remedied the component failure.

To check for repeating alerts:

• Run the following command: show-alerts

xmcli (tech)> show-alerts


Index Description Severity Raise-Time ...
34 Repeating: Storage Controller InfiniBand port 2 is down. major Mon Apr 18 11:22:03 2016.....
33 Repeating: InfiniBand port 2: link status is not healthy. The port state is down. major Mon Apr 18 11:22:03 2016.....
xmcli (tech)>

If the response shows alerts whose description begins with "Repeating:", it is necessary to
clear the alert counters.

Note: Clearing alert counters clears all of the system’s alerts. In case of multiple alerts,
make a note of the components with repeated active alerts, prior to clearing alert
counters.
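
One way to keep the record that this note asks for is to save the show-alerts output as text and extract the repeating alerts from it before clearing the counters. The following Python sketch assumes that each alert row starts with its numeric index followed by the "Repeating:" prefix, as in the example above. The function name is hypothetical, and the remainder of each row (description, severity and raise time) is kept unsplit because the full column layout is not shown here.

```python
import re

# Row shape assumed from the example: "<index> Repeating: <rest of row>"
_REPEATING_RE = re.compile(r"^\s*(\d+)\s+Repeating:\s*(.+?)\s*$")


def repeating_alerts(show_alerts_output):
    """Return (index, rest_of_row) for every repeating alert row."""
    found = []
    for line in show_alerts_output.splitlines():
        match = _REPEATING_RE.match(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```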

To clear alert counters:


1. Log in to the XMS CLI as tech.
2. Clear all alert counters, using the following command:
clear-alert-table-counters

Generating and Collecting the Bundle


To generate and collect the bundle:
1. Log in to the XMS CLI as admin.
2. Issue a dossier package collection, using the following command:
xmcli (admin)> create-debug-info cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

The following message appears:

The process may take a while. Please do not interrupt.


Debug info collected and could be accessed via http://...

3. Copy the link into a web browser and download the package.
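
If the command output is saved as text, the download link for step 3 can be pulled out of it with a short Python helper. This is a sketch only: the function name is hypothetical, and it assumes that the link is the first http or https URL in the captured output, as in the message shown above.

```python
import re

_LINK_RE = re.compile(r"https?://\S+")


def debug_info_link(command_output):
    """Return the first http(s) link in the captured output, or None."""
    match = _LINK_RE.search(command_output)
    return match.group(0) if match else None
```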


Uploading the Bundle Collection


To upload the package:
1. Connect to the XtremIO FTP, using either of the following methods:
• With an FTP client - connect to ftp://ftp.xtremio.com/ as the anonymous user, with
your email address as the password.
• With a browser - go to https://ftp.emc.com/. In the list box, select XtremIO and
enter the anonymous user, with your email address as the password.
2. Create a directory whose name contains the customer name and the SR number (case
number).
For example:
Customer-12345678

Disabling Path Redundancy Monitoring for VPLEX-Connected XtremIO Clusters

It is highly recommended to work with the customer to disable path redundancy
monitoring for any VPLEX initiators on the cluster, as they show as non-redundant, even
when set to VPLEX best practice configuration.
For further details on this recommendation, refer to Dell EMC KB # 519349
(https://support.emc.com/kb/519349).

To disable path redundancy monitoring:


1. Access the XMS as tech or admin.
2. Issue the following modify-initiator XMCLI command on all initiators in the
cluster that are connected to VPLEX:

modify-initiator initiator-id=<Initiator ID> path-redundancy-monitor-mode=disabled

3. In addition, for a cluster connected to VPLEX, it is highly recommended to disable the
sending of the non-redundant Initiators alert (symptom-code XTR2400203). To
disable the sending of this alert, execute the following modify-alert-definition XMCLI
command:

modify-alert-definition alert-type="initiator_redundancy_state_non_redundant"
send-to-call-home="no"
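
When many initiators are connected to VPLEX, the command lines for step 2 can be prepared in advance. The following Python sketch only builds the command text from the template shown in step 2; it does not run anything on the XMS. The function name is hypothetical, the list of VPLEX-connected initiator IDs must be supplied by the operator, and the IDs are inserted unquoted exactly as in the template, which is an assumption about how they are written.

```python
def disable_monitoring_commands(initiator_ids):
    """Build one modify-initiator command line per initiator ID."""
    template = (
        "modify-initiator initiator-id={initiator_id} "
        "path-redundancy-monitor-mode=disabled"
    )
    return [template.format(initiator_id=i) for i in initiator_ids]
```

The resulting lines can be reviewed with the customer before they are entered in the XMCLI.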


Checking the XtremIO Cluster Health (Post Replacement)


After completing the component replacement procedure, it is necessary to check the
cluster’s health again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the Dell EMC XtremIO SolVe generator.
The following example shows the command for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s6.0.0.py"
arguments="--cluster-id 1"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error
is reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Restoring All Notifiers


To restore all Notifiers:
1. Log in to the XMCLI as tech.
2. Restore all Notifiers, using the following command:
restore-notifiers cluster-id="<cluster name>"

xmcli (tech)> restore-notifiers cluster-id="xbrick711-714"


Event notifiers were restored


Closing the Tunnel Between a Storage Controller and the XMS


Make sure to close the tunnel between the Storage Controller and the XMS upon
completion of the procedure.

To close the tunnel that was opened between the Storage Controller and the XMS:
• Run the following CLI command:
modify-technician-port-tunnel cluster-id=<Cluster ID>
sc-id=<Storage Controller ID> close

Sending Defective Component to Priority Failure Analysis



Priority Failure Analysis (Priority FA) is required only for XtremIO FRU replacements
involved in an outage (DU/DL).
After the component is successfully replaced, and only if Priority FA is required, send the
defective component to Dell EMC for analysis. Refer to “Priority Failure Analysis” on
page 123.


APPENDIX G
Essential Pre-Customer-Visit Preparations for
Technician Advisor Utility Use

This section describes the preparations to make, prior to arriving at the customer's site, for
using the XtremIO Technician Advisor utility.
This section includes the following topics:
• Checking the Network Ports with the Customer
• Preparing a Replacement Battery Backup Unit


Checking the Network Ports with the Customer


Check the following with the customer prior to arriving at the customer's site:
• Verify that the following network ports are open between all of the cluster's Storage
Controllers and the XMS:

Table 7 Network Port Access Between Storage Controllers and XMS

All XtremIO versions TCP 22 and 443
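
Where the customer wants to test the ports listed in Table 7, a TCP connection attempt is one simple approach. The following Python sketch is an illustration only. The function names are hypothetical, and it tests reachability from the machine on which it runs, so it reflects the path between the Storage Controllers and the XMS only if it is run from a host on that same network path.

```python
import socket

# Ports listed in Table 7 for all XtremIO versions.
REQUIRED_TCP_PORTS = (22, 443)


def tcp_port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(hosts, ports=REQUIRED_TCP_PORTS, timeout=3.0):
    """Return {(host, port): bool} for every host and port combination."""
    return {
        (host, port): tcp_port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

A successful connection shows only that the port accepts TCP connections; it does not confirm which service is listening on it.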

Preparing a Replacement Battery Backup Unit


When replacing a BBU, the battery of the new (replacement) BBU must be charged to 90% or
greater before the BBU replacement procedure is performed.

To prepare a replacement BBU:


1. Check the replacement BBU's front LCD to verify that its current battery charge is
90% or greater.

2. If necessary, charge the replacement BBU until its charge is 90% or greater.
