NetappTS ExerciseGuide
Exercise Guide
Content Version 2
NETAPP UNIVERSITY
ONTAP Troubleshooting
Exercise Guide
Course ID: CP-ILT-CATSP
Catalog Number: CP-ILT-CATSP-EG
COPYRIGHT
© 2017 NetApp, Inc. All rights reserved. Printed in the U.S.A. Specifications subject to change without notice.
No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or
mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written
permission of NetApp, Inc.
TRADEMARK INFORMATION
NETAPP, the NETAPP logo, and the marks listed at http://www.netapp.com/TM are trademarks of NetApp, Inc. Other
company and product names may be trademarks of their respective owners.
© 2017 NetApp, Inc. This material is intended only for training. Reproduction is not authorized.
Exercise Equipment
The student lab environment consists of one vApp for each student.
The vApp is labeled OTS_X0Y, where X is the set number and Y is the student vApp number.
Objectives
This exercise focuses on enabling you to do the following:
Conduct a full and comprehensive health check of an ONTAP cluster
Access the cluster using OnCommand System Manager (System Manager)
Access cluster and node log files through HTTPS
1-2. Open the Remote Desktop Connection (RDC) application and connect to your access host.
1-5. Run the following commands to do a complete health check of the cluster.
cluster1::*> network interface show
cluster1::*> cluster show
cluster1::*> cluster ring show
cluster1::*> storage failover show
cluster1::*> event log show -severity WARNING
cluster1::*> event log show -severity EMERGENCY
NOTE: In ONTAP 8.3 software, you do not need to enable the web services and
HTTP.
6-3. Connect to the cluster1 management interface using the administrator account from the access
host. You might want to enable logging and save all your session output.
6-5. Access the URLs to view the log directory on each node. You must log in using the cluster
administration credentials.
https://<cluster-mgmt-ip>/spi/<node_name>/etc/log/
6-6. Access the URLs to view the directory in which the core files are saved on each node.
https://<cluster-mgmt-ip>/spi/<node_name>/etc/crash/
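The URL patterns in Step 6-5 and Step 6-6 can be expanded for every node with a short script. The following is a minimal Python sketch, not part of the course material. The function name spi_urls, the address 192.0.2.10, and the node names node1 and node2 are illustrative assumptions; substitute the values from your own vApp.

```python
def spi_urls(cluster_mgmt_ip, node_names):
    """Build the log and core-file directory URLs for each node.

    The path layout follows the patterns shown in Step 6-5 and Step 6-6.
    """
    urls = {}
    for node in node_names:
        base = f"https://{cluster_mgmt_ip}/spi/{node}/etc"
        urls[node] = {
            "log": f"{base}/log/",
            "crash": f"{base}/crash/",
        }
    return urls


if __name__ == "__main__":
    # 192.0.2.10, node1, and node2 are placeholders, not lab values.
    for node, paths in spi_urls("192.0.2.10", ["node1", "node2"]).items():
        print(node, paths["log"], paths["crash"])
```

Each URL still prompts for the cluster administration credentials when it is opened in a browser.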
End of Exercise
Objectives
This exercise focuses on enabling you to do the following:
Recover a replicated database (RDB) configuration
Resolve RDB replication problems
Perform cluster and node backups
Resolve an issue with /mroot
1-2. List two frequent reasons that could cause a cluster configuration backup to fail.
1-4. Identify a knowledge base article to resolve the error message in Step 1-3.
1-6. List the command that verifies that the scheduled backups were created and distributed within
the cluster.
1-7. List the command that you can use to recover a node’s configuration.
2-3. On node1, start a job to create a system configuration backup of the entire cluster and note the
job ID number.
2-4. Before the job finishes, review the job that you have created:
cluster1::*> job show
cluster1::*> job show -id <ID#>
(You use the job ID from the backup create command.)
cluster1::*> job show -id <ID#> -fields uuid
cluster1::*> job show -uuid UUID_from_the_previous_command
4-5. Log in to the node management interface for node2 again, and answer these questions:
Are you able to log in?
Why or why not?
4-6. Log back in to the systemshell of node2.
4-7. From node2’s systemshell, unmount mroot by using the following commands:
% cd /etc
% sudo ./netapp_mroot_unmount
% exit
4-8. Log in to the cluster management session, and check the cluster health.
4-11. Check the cluster health again, and answer these questions:
Do you see a difference?
If so, why?
What is nonoperational?
4-12. Fix this problem, and answer this question:
How did you verify that /mroot is mounted?
End of Exercise
Objectives
This exercise focuses on enabling you to do the following:
Identify the network component and the data component interaction
Outline the networking implications of upgrading to ONTAP 8.3 software
Use network triage tools
Describe the implications of vifmgr going Out of Quorum (OOQ)
Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on
RPC connect: clnttcp_create: RPC: Remote system error - Connection refused
cluster-01 vifmgr  -  -  -       -          -
cluster-01 bcomd   4  4  14      cluster-02 secondary
cluster-01 crs     1  1  79      cluster-02 secondary
cluster-02 mgmt    4  4  136154  cluster-02 master
cluster-02 vldb    4  4  76      cluster-02 master
cluster-02 vifmgr  4  4  13220   cluster-02 master
cluster-02 bcomd   4  4  14      cluster-02 master
cluster-02 crs     1  1  79      cluster-02 master
cluster-03 mgmt    4  4  136154  cluster-02 secondary
cluster-03 vldb    4  4  76      cluster-02 secondary
cluster-03 vifmgr  4  4  13220   cluster-02 secondary
cluster-03 bcomd   4  4  14      cluster-02 secondary
cluster-03 crs     1  1  79      cluster-02 secondary
cluster-04 mgmt    4  4  136154  cluster-02 secondary
cluster-04 vldb    4  4  76      cluster-02 secondary
cluster-04 vifmgr  4  4  13220   cluster-02 secondary
cluster-04 bcomd   4  4  14      cluster-02 secondary
cluster-04 crs     1  1  79      cluster-02 secondary
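When the ring output is long, a short script can help you spot the rows that deserve attention. The following Python sketch is an illustration only. It assumes the seven-column layout of cluster ring show (Node, UnitName, Epoch, DB Epoch, DB Trnxs, Master, Online), uses a few rows copied from the output above, and treats disagreement between nodes as a hint to investigate, not as a definitive fault. The function names are hypothetical.

```python
from collections import defaultdict

# RDB unit names that appear in the output above.
UNITS = {"mgmt", "vldb", "vifmgr", "bcomd", "crs"}
COLUMNS = ("node", "unit", "epoch", "db_epoch", "db_trnxs", "master", "online")

# A subset of the rows shown above.
SAMPLE = """\
cluster-01 vifmgr - - - - -
cluster-01 bcomd 4 4 14 cluster-02 secondary
cluster-02 vifmgr 4 4 13220 cluster-02 master
cluster-03 vifmgr 4 4 13220 cluster-02 secondary
cluster-04 vifmgr 4 4 13220 cluster-02 secondary
"""


def parse_ring_rows(text):
    """Keep only lines that look like ring rows; skip headers and error text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[1] in UNITS:
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def ring_problems(rows):
    """List rows with no ring state and units whose nodes report different values."""
    problems = []
    by_unit = defaultdict(list)
    for row in rows:
        if row["online"] == "-":
            problems.append(f"{row['node']}: {row['unit']} returned no ring state")
        else:
            by_unit[row["unit"]].append(row)
    for unit, members in sorted(by_unit.items()):
        for key in ("epoch", "db_epoch", "db_trnxs", "master"):
            values = {member[key] for member in members}
            if len(values) > 1:
                problems.append(f"{unit}: nodes disagree on {key}: {sorted(values)}")
    return problems


if __name__ == "__main__":
    for problem in ring_problems(parse_ring_rows(SAMPLE)):
        print(problem)
```

Rows that report a dash in every column are listed separately so that a unit that fails to answer is not mistaken for a unit that disagrees with its peers.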
The customer says that these LIFs are normally home on the node cluster-01. Explain
which vifmgr behavior might explain why these LIFs are now on node cluster-03.
1-3. A customer calls in and provides the following output. Examine the output.
cluster::*> cluster ring show
Node UnitName Epoch DB Epoch DB Trnxs Master Online
--------- -------- -------- -------- -------- ---------
Error: rdb_ring_info: RDB ring state query of 127.0.0.1 for vifmgr failed on RPC
connect:
clnttcp_create: RPC: Remote system error - Connection refused
primary
nfs_lif1 up/- 10.61.83.200/24 cluster-01 e0a true
1-4. The customer says that the entire cluster is not serving data. The customer wants an
explanation as to why the LIFs are home but not serving data. Identify the vifmgr behavior
that explains this situation.
2-2. View the current networking interface configuration for node4 by entering the following
command:
cluster1::> net int show -curr-node node4
2-4. From the systemshell prompt of node4, view the status of the network ports on the
node by running the following command:
node4% ifconfig -a
2-5. Correlate the output from Step 2-2 and Step 2-4 to determine whether the interface
configuration, as reported by the management component, agrees with the interface
configuration of the FreeBSD networking layer.
2-8. Repeat Step 2-2 through Step 2-5 to observe that the action taken in Step 2-7 was correctly
passed on to the FreeBSD networking layer of node4.
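The correlation in Step 2-5 amounts to comparing two sets of IP addresses. The following Python sketch shows one way to do that comparison on saved output. It is an illustration under stated assumptions: both sample texts are hypothetical (192.0.2.x placeholder addresses in the general layout of net int show rows and FreeBSD ifconfig -a output), the function names are invented for this example, and real systemshell output may carry additional fields.

```python
import re

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# Hypothetical rows in the layout of "net int show" output.
NET_INT = """\
nfs_lif1 up/up 192.0.2.21/24 node4 e0c true
nfs_lif2 up/up 192.0.2.22/24 node4 e0d true
"""

# Hypothetical "ifconfig -a" output in the usual FreeBSD layout.
IFCONFIG = """\
e0c: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    inet 192.0.2.21 netmask 0xffffff00 broadcast 192.0.2.255
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    inet 127.0.0.1 netmask 0xff000000
"""


def lif_addresses(net_int_show_text):
    """Addresses written as address/mask-length in the management view."""
    return set(re.findall(rf"\b({IPV4})/\d{{1,2}}\b", net_int_show_text))


def ifconfig_addresses(ifconfig_text):
    """Addresses on 'inet' lines of the FreeBSD view, ignoring loopback."""
    found = re.findall(rf"^\s*inet\s+({IPV4})\b", ifconfig_text, flags=re.MULTILINE)
    return {ip for ip in found if not ip.startswith("127.")}


def compare(net_int_show_text, ifconfig_text):
    """Report the addresses that appear in only one of the two views."""
    mgmt = lif_addresses(net_int_show_text)
    bsd = ifconfig_addresses(ifconfig_text)
    return {
        "only_in_net_int_show": sorted(mgmt - bsd),
        "only_in_ifconfig": sorted(bsd - mgmt),
    }


if __name__ == "__main__":
    print(compare(NET_INT, IFCONFIG))
```

An address that appears in only one of the two lists is the kind of disagreement that Step 2-5 asks you to look for.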
Scenario: A customer has called to report that the command to create LIFs fails.
Step Action
3-1. Log in to the cluster management interface, try to create a data LIF, and then answer the
questions:
cluster1::*> net int create -vserver nassvm1 -lif task5 -role data -data-protocol nfs,cifs,fcache -home-node node2 -home-port e0d -address 192.168.81.150 -netmask 255.255.255.0
What error message do you see?
Is the error message valid?
What command would you use to check?
3-2. Check the cluster connectivity from node2 to all the nodes in the entire cluster, and then answer
the following questions:
What command do you use?
What do you see?
3-3. Check the interfaces and the ports on the problem node, node2, and list the command that you
use.
3-4. Attempt the same command from another node, and then answer the following questions:
What do you see?
Is there any warning or error?
What might be wrong?
3-5. Verify your hypothesis on the systemshell using rdb_dump and using ps to check the running
processes, and check the logs from the clustershell.
You might need to include vifmgr and mgwd by using the following command:
cluster1::*> debug log files modify -incl-files vifmgr,mgwd,messages
3-6.
The logs might be verbose, so you might need to use debug log show and parse a
timestamp.
3-7.
3-8. Correct the problem using information that you learned in this module.
3-9. Log in to the cluster management interface, and again try to create the data LIF.
End of Exercise
Objectives
This exercise focuses on enabling you to troubleshoot using the diag secd commands.
2-3. Check for specific protocol connections by running the following commands:
cluster1::> network connections active show -service nfs
2-4.
NOTE: The -service iscsi argument always returns empty results because the
iSCSI service is not tracked here.
2-10. Display the properties of the selected CID by running the following command:
cluster1::*> network connections active show -cid <CID #> -instance
3-2. Type the following command, and then answer the questions:
cluster1::> diag secd
Why does it fail?
What do you need to do to use the diag secd command for troubleshooting?
3-3. Identify the UNIX user that the Windows user student1 maps to, and use diag secd to
find this mapping.
3-4. Explain how you query for a Windows security identifier (SID) of student1 using diag
secd.
3-5. Explain how you can test a CIFS login for the student1 user in diag secd. (Password for
student1: P@ssw0rd)
3-8. Explain how you show and set the current logging level in secd.
3-9. Explain how you enable tracing in secd to capture the logging level that is specified.
3-10. Explain how you check the CIFS server information in secd and compare it with what is in the
RDB.
3-11. Explain how you can view and clear active CIFS connections in secd.
End of Exercise
Objectives
This exercise focuses on enabling you to do the following:
Resolve frequently seen mount issues
Resolve access issues
1-2. Issue the following commands, and then answer the question.
[root@cats-cent ~]# mkdir /nassvm1
[root@catsp-cent ~]# mount -o nfsvers=3 192.168.6.115:/nassvm1_nfs /nassvm1
Does the command succeed?
1-4. From that node, capture a packet trace while repeating the previous mount command, and then
answer the following questions:
Are you able to troubleshoot the issue using the packet trace?
What is the issue?
2-2. Explain why the customer is denied access, and then fix the problem.
2-3. If you can mount now, cd into the mount point, and then answer the following questions:
Can you cd into the mount point?
If not, how do you resolve the issue?
If you unmount and remount, does it still work?
2-4. Try to write a file into the /nassvm1 directory, and then answer this question:
Are you able to write the file?
2-5. After the write succeeds, view the permissions using ls -la, and then answer the following
questions:
What are the file permissions on the file that you wrote?
Why are the permissions and owner set the way that they are?
2-6. Change the export policy rule for the volume to make superuser and anon something other than
what they are, write another file and check permissions, and then answer this question:
What do these actions do?
2-7. Open a new Secure Shell (SSH) session to your Linux computer, log in as the user “cmodeuser”
with the password “passwd,” and then answer these questions:
Can you cd to the mount directory?
If successful, can you write files to the mount?
If you notice an issue, what is the reason?
How do you resolve this issue?
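For Step 2-5 and Step 2-6, the permission string and the numeric owner that ls reports can also be read from a script, which makes it easier to compare files written before and after an export policy change. The following Python sketch is illustrative only. It runs against a scratch file in a temporary directory so that it works anywhere; the file name testfile and the function names are hypothetical, and in the lab you would pass a path under /nassvm1 to describe() instead.

```python
import os
import stat
import tempfile


def describe(path):
    """Return the permission string and the numeric owner and group IDs of path.

    The values correspond to what "ls -lan" prints for the same file.
    """
    info = os.stat(path)
    return {
        "mode": stat.filemode(info.st_mode),
        "uid": info.st_uid,
        "gid": info.st_gid,
    }


def demo():
    """Create a scratch file, set its mode to 0644, and describe it."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "testfile")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write("test\n")
        os.chmod(path, 0o644)
        return describe(path)


if __name__ == "__main__":
    print(demo())
```

Recording the uid and gid as numbers avoids any dependence on how the Linux host resolves IDs to names.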
End of Exercise
Objectives
This exercise focuses on enabling you to do the following:
Identify LIFs that are involved in CIFS access
Troubleshoot using the diag secd commands
Troubleshoot domain controller login issues
Troubleshoot SMB user-authentication issues
Troubleshoot the export policy issues
1-6. Explain whether the most efficient network path to the volume is being used.
2-3. Instead of using the host name, use the LIF IP to access the CIFS share, and then answer these
questions:
Can you access the share?
Why or why not?
2-4. Analyze the issues, and use related commands to troubleshoot and to fix the issues.
Hint: Run the cifs session show -instance command once when you map using the vserver
name and once when you map using the IP address, and compare the protocol that is used for
authentication in each case.
3-2. Click Start > Run > \\nassvm1, and then describe the error message that you see.
3-3. Use the diag secd commands to check whether the user name is valid.
3-4. Run the following command to view the logs, and then answer the question.
cluster1::> event log show
What do the logs show?
3-5. Use a diag secd command to verify the issue, and then answer this question:
Which other commands can you run to view configurations that verify the issue?
3-6. Go to the systemshell of the node that reported the error, look at the appropriate log file,
and then answer these questions:
Do you see the issue logged there?
Can you identify the root cause?
How do you fix it?
Are you able to access the share through SMB now?
3-7. If you still cannot access the share through SMB, check whether the user mapping is still a
problem.
3-9. Given that the user mapping succeeds, but you are still unable to access the share, explain
what the issue could be.
3-10. List the commands that are available to review security settings, such as permissions and
security style on volumes, shares, and so on.
3-11. From the clustershell, use vserver security file-directory show to view
permissions on the volumes that you are trying to access, and then answer this question:
Should the user have access to these volumes?
3-13. Change the security style of the volume to NTFS, and see whether you can access the volume
now.
4-2. Try to access \\nassvm1\vol1, and describe the error that you see.
End of Exercise
Objectives
This exercise focuses on enabling you to do the following:
Use standard Linux commands to evaluate a Linux host in a NetApp scalable SAN environment
Use standard Linux commands to identify SAN disks in a NetApp scalable SAN environment
Use standard Linux commands to verify connectivity in a NetApp scalable SAN environment
Use standard Linux log files to evaluate the iSCSI subsystem in a NetApp scalable SAN environment
Troubleshoot a Linux host in a NetApp scalable SAN environment
Troubleshoot a Windows host in a NetApp scalable SAN environment
Restore LUN connectivity
1-4. Use the output of the service iscsi status command that is displayed in Step 1-3 to
answer the following questions:
List the Iface initiatorname: ____________________________________
List the iSCSI connection state: ________________________________
List the disks that are attached to SCSI12 Channel 00: ______________________
List the state of each disk: ____________________________________
List the current portal: ________________________________________
1-6. The state of the active internet connection between the host (local address) and target
(foreign address) is ESTABLISHED.
The Linux host records events about the iSCSI subsystem in the system messages
file, /var/log/messages.
2-4. Type the following command to verify connectivity between the host and target, and then
answer the following questions:
[root@cats-cent ~]# netstat -pant | grep iscsi
Do you see four connections in ESTABLISHED state?
If not, what could be the issue?
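The count that Step 2-4 asks for can be taken from saved netstat output with a few lines of code. The following Python sketch is a hedged illustration. The sample text is hypothetical (placeholder addresses and process IDs in the usual column layout of Linux netstat -pant), the function name is invented for this example, and the program-name match stands in for the grep iscsi filter.

```python
# Hypothetical sample in the column layout of Linux "netstat -pant";
# the addresses and process IDs are placeholders, not lab values.
SAMPLE = """\
tcp        0      0 192.0.2.50:41001    192.0.2.131:3260    ESTABLISHED 1432/iscsid
tcp        0      0 192.0.2.50:41002    192.0.2.132:3260    ESTABLISHED 1432/iscsid
tcp        0      1 192.0.2.50:41003    192.0.2.133:3260    SYN_SENT    1432/iscsid
tcp        0      0 0.0.0.0:22          0.0.0.0:*           LISTEN      988/sshd
"""

EXPECTED = 4  # Step 2-4 asks whether four connections are ESTABLISHED.


def established_iscsi(netstat_text, program="iscsi"):
    """Return (local, foreign) pairs for ESTABLISHED rows owned by the program."""
    connections = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "ESTABLISHED" and program in fields[6]:
            connections.append((fields[3], fields[4]))
    return connections


if __name__ == "__main__":
    found = established_iscsi(SAMPLE)
    print(f"{len(found)} of {EXPECTED} expected connections are ESTABLISHED")
    for local, foreign in found:
        print(local, "->", foreign)
```

Listing the foreign addresses alongside the count shows which target portals are missing when fewer than four connections are established.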
End of Exercise