
Choosing and Architecting Storage for Your Environment

Lucas Nguyen, Technical Alliance Manager
Mike DiPetrillo, Specialist Systems Engineer

Agenda

VMware Storage Options
  Fibre Channel
  NAS
  iSCSI
  DAS

Architecture Best Practices
Sizing
Case Study: Impact of Architecture on Performance

Storage Mechanisms

Technology      Market         Transfers                     Interface   Performance
Fibre Channel   Data Center    Block access of data/LUN      FC HBA      High (due to dedicated network)
NAS             SMB            File (no direct LUN access)   NIC         Medium (depends on integrity of LAN)
iSCSI           SMB            Block access of data/LUN      iSCSI HBA   Medium (depends on integrity of LAN)
DAS             Branch Office  Block access                  SCSI HBA    High (due to dedicated bus)

Storage Mechanisms (Topology Comparison): DAS vs. NAS vs. SAN

DAS - Direct Attached Storage (Branch Office)
NAS - Network Attached Storage (SMB Market)
SAN - Storage Area Network (Data Center)

Storage Disaster Recovery Options

DAS
  Tape / RAID
  S/W Cluster

NAS
  Tape / RAID
  NIC failover
  S/W Cluster
  Filer Cluster
  LAN backup
  Data Replication

SAN
  Tape / RAID
  HBA / SP failover
  Fabric / ISL redundancy
  Data Replication technologies
  S/W Cluster within Virtual Machine
  LAN backup within Virtual Machine
  VMware HA
  VMware Consolidated Backup

Choosing Disks

Traditional performance factors
  Capacity / Price
  Disk types (SCSI, ATA, FC, SATA)
  Access Time; IOPS; Sustained Transfer Rate
  Reliability (MTBF)

VM performance is ultimately gated by IOPS density and storage space
  IOPS Density -> number of read IOPS per GB (higher = better)

Disk Drive Statistics

Source: Comparison of Disk Drives For Enterprise Computing, Kurt Chan

Typical IOPS Density

Tier 1 -> 144 GB, 15k RPM -> 180 IOPS / 144 GB = 1.25 IOPS/GB
Tier 2 -> 300 GB, 10k RPM -> 150 IOPS / 300 GB = 0.5 IOPS/GB
Tier 3 -> 500 GB, 7k RPM -> 90 IOPS / 500 GB = 0.18 IOPS/GB

Relative performance
  Tier 1 -> 1.0
  Tier 2 -> 0.4 (40%)
  Tier 3 -> 0.14 (14%)

Potential choices -> FC, LC-FC, SATA II
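The figures above follow directly from dividing each drive's read IOPS by its capacity and normalizing against Tier 1. A minimal Python sketch of that arithmetic, using only the drive numbers quoted on this slide:

```python
# IOPS density and relative performance for the three example tiers above.
tiers = {
    "Tier 1": {"capacity_gb": 144, "read_iops": 180},  # 15k RPM
    "Tier 2": {"capacity_gb": 300, "read_iops": 150},  # 10k RPM
    "Tier 3": {"capacity_gb": 500, "read_iops": 90},   # 7k RPM
}

# IOPS density = read IOPS per GB (higher is better)
density = {name: t["read_iops"] / t["capacity_gb"] for name, t in tiers.items()}

# Relative performance, normalized to the densest tier
baseline = max(density.values())
for name, d in density.items():
    print(f"{name}: {d:.2f} IOPS/GB, relative {d / baseline:.2f}")
# Tier 1: 1.25 IOPS/GB, relative 1.00
# Tier 2: 0.50 IOPS/GB, relative 0.40
# Tier 3: 0.18 IOPS/GB, relative 0.14
```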

Volume Aggregation
Stripe the virtual LUN across volumes from multiple RAID 5 groups (some storage platforms can only concatenate, but striping is preferred)
Aggregate across volumes in the same ZBR zone
Do not mix volumes of different disk sizes, rotational speeds, or volume sizes
It is OK, and preferred, to stripe within the same volume groups
End result: one LUN presented to VMware spanning many physical disks
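As a rough illustration of what such an aggregated LUN provides, here is a back-of-envelope Python sketch. The group count, disks per group, and the standard RAID 5 parity and write-penalty assumptions are illustrative only and are not taken from the slides:

```python
# Back-of-envelope sizing for a LUN striped across several identical RAID 5 groups.
# The group count, disks per group, and per-disk figures below are illustrative only.
raid5_groups = 4          # number of RAID 5 groups striped together
disks_per_group = 5       # e.g. a 4+1 RAID 5 group
disk_size_gb = 144        # Tier 1 drive from the earlier slide
disk_read_iops = 180

# RAID 5 gives up one disk's worth of capacity per group to parity.
usable_gb = raid5_groups * (disks_per_group - 1) * disk_size_gb

# Read IOPS scale with the total spindle count; random writes pay the usual
# RAID 5 penalty of roughly 4 back-end I/Os per front-end write.
total_spindles = raid5_groups * disks_per_group
read_iops = total_spindles * disk_read_iops
write_iops = read_iops / 4

print(f"Usable capacity: {usable_gb} GB")                      # 2304 GB
print(f"Aggregate read IOPS: {read_iops}")                     # 3600
print(f"Aggregate random-write IOPS (worst case): {write_iops:.0f}")  # 900
```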

Understanding SCSI Queuing and Throttling

Service time: the time for the disk to complete a request
Response time (svc_t) = wait time in queue + service time
I/Os active in the device = actv
Average wait-queue response time = wsvc_t
Average run-queue response time = asvc_t
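These quantities are tied together by Little's law: the average number of I/Os sitting in a queue equals the I/O rate times the average time spent in that queue. A minimal Python sketch with illustrative numbers (not taken from any particular array):

```python
# Response time decomposition and Little's law, with illustrative numbers.
iops = 216.0          # I/O arrival rate (requests per second)
wsvc_t_ms = 0.0       # average time spent in the host-side wait queue (ms)
asvc_t_ms = 90.0      # average time spent in the active/run queue (ms)

# Total response time as seen by the application
response_time_ms = wsvc_t_ms + asvc_t_ms

# Little's law: average queue occupancy = rate x time-in-queue
wait = iops * wsvc_t_ms / 1000.0   # I/Os queued in the HBA/sd layer
actv = iops * asvc_t_ms / 1000.0   # I/Os outstanding in the array

print(f"response time: {response_time_ms:.1f} ms, wait: {wait:.1f}, actv: {actv:.1f}")
# response time: 90.0 ms, wait: 0.0, actv: 19.4
```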

Understanding the Network Storage Stack: SCSI Queuing and Throttling

SCSI is a connect/disconnect protocol, so the array can make certain optimizations
  Wait queue - I/Os buffering in the HBA/sd queue (bad)
  Active queue - I/Os buffered in the storage array
  Service queue - I/Os being serviced on disk (read miss) or in cache (read hit, or fast write)

SCSI and Storage Optimizations: Keep That Disk Busy

Array writes: written to hardware cache and destaged to disk, with SCSI write buffering disabled
Array reads: the array can reorder reads to minimize storage contention
  SCSI tag queuing can optimize reads on active disks

Why is this important?

A moderately busy disk services requests faster, on the whole, than an inactive disk

Busy, but not backed into the HBA wait queue: average I/O is 80-100 ms, which is very slow (>50 ms)

r/s     w/s   kr/s     kw/s   wait  actv  wsvc_t  asvc_t  %w  %b  device    Utilization  Throughput (IOPS)  Av Read Sz (K)  Svc Time
215.6   2.0   5799.1   29.5   0.0   20.0  0.0     91.8    0   88  c7t1d0    0.88         217.60             26.90           4.04
215.8   2.4   5814.6   38.5   0.0   15.3  0.0     69.9    0   84  c7t2d0    0.84         218.20             26.94           3.85
216.0   1.9   5814.9   30.1   0.0   15.4  0.0     70.6    0   84  c7t3d0    0.84         217.90             26.92           3.85
217.6   2.1   5820.9   32.0   0.0   25.0  0.0     113.9   0   92  c8t9d0    0.92         219.70             26.75           4.19
216.3   2.0   5803.8   31.0   0.0   18.6  0.0     85.1    0   89  c8t10d0   0.89         218.30             26.83           4.08
216.4   2.0   5801.3   29.8   0.0   18.1  0.0     83.1    0   88  c8t11d0   0.88         218.40             26.81           4.03

Flooded, I/O serialized in the wait queue: average I/O is 200+ ms


r/s     w/s   kr/s     kw/s   wait  actv  wsvc_t  asvc_t  %w  %b  device    Utilization  Throughput (IOPS)  Av Read Sz (K)  Svc Time
121.3   0.7   5677.3   10.9   41.3  13.4  338.0   109.7   79  98  c6t0d0    0.98         122.00             46.80           8.03
121.2   0.6   5648.6   9.1    43.0  13.2  353.5   108.6   79  97  c6t1d0    0.97         121.80             46.61           7.96
120.6   0.4   5654.6   5.7    34.6  12.9  285.9   106.9   75  96  c6t2d0    0.96         121.00             46.89           7.93
121.8   0.0   5781.2   0.1    29.0  11.9  238.4   97.3    67  92  c6t3d0    0.92         121.80             47.46           7.55
123.0   0.0   5796.8   0.3    23.3  11.2  189.0   91.2    62  90  c6t4d0    0.90         123.00             47.13           7.32
123.8   0.0   5834.6   0.1    25.1  11.4  202.8   92.0    64  90  c6t9d0    0.90         123.80             47.13           7.27
94.9    1.1   2915.4   17.2   15.3  7.9   159.0   82.6    41  67  c6t16d0   0.67         96.00              30.72           6.98
94.6    0.8   2905.1   12.1   14.0  7.8   146.5   82.1    41  67  c6t17d0   0.67         95.40              30.71           7.02
95.4    0.9   2937.1   13.6   14.6  8.0   151.2   82.9    42  67  c6t18d0   0.67         96.30              30.79           6.96
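The four right-hand summary columns in both tables can be derived from the raw iostat fields. A short Python sketch of that arithmetic, assuming the service-time column follows the standard utilization law (service time = utilization / throughput), which matches the numbers shown:

```python
# Derive the summary columns from raw iostat -xn fields for one device.
# Values below are the c7t1d0 row from the "busy but not flooded" table.
r_s, w_s = 215.6, 2.0      # reads and writes per second
kr_s = 5799.1              # KB read per second
pct_busy = 88              # %b: percent of time the device was busy

throughput_iops = r_s + w_s                      # 217.6
utilization = pct_busy / 100.0                   # 0.88
avg_read_kb = kr_s / r_s                         # ~26.9 KB per read
# Utilization law: U = X * S, so per-I/O service time S = U / X
svc_time_ms = utilization / throughput_iops * 1000.0   # ~4.0 ms

print(f"IOPS={throughput_iops:.1f}  util={utilization:.2f}  "
      f"avg read={avg_read_kb:.1f} KB  svc time={svc_time_ms:.2f} ms")
```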

LUN Queuing for VMware: Queuing Techniques Are Different

In symmetric storage, path software can spread I/Os across different adapter ports (LUN queues live in the adapter ports)
A typical open-system host has several LUNs

VMware
  LUN/VMFS is active on only one path (active/passive arrays)
  A VMFS volume is much larger than a typical OS LUN

Why is this important?


Default HBA queue depth usually too small

Keeping VMs from flooding your storage

The easiest method is setting the maximum number of outstanding disk requests (see the sketch after this list)

  This setting can slow a read-I/O-intensive VM, but it will protect the farm; problems usually surface during backup/restore
  Advanced Settings -> Disk.SchedNumReqOutstanding (number of outstanding commands to a target with competing worlds) [1-256, default = 16]
  Do not set this to the HBA queue depth, as this setting is intended to throttle multiple VMs
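To see why you would not set this value to the full queue depth, here is a rough Python sketch of how the per-VM cap interacts with the device queue when several VMs share a LUN. The queue depth and VM counts below are made-up examples, not recommendations:

```python
# Rough model of per-VM outstanding-request throttling on a shared LUN.
# The queue depth and VM counts below are illustrative, not recommendations.
lun_queue_depth = 32            # outstanding commands the HBA/LUN queue will hold
sched_num_req_outstanding = 16  # per-VM cap when multiple worlds compete (default)

def outstanding_on_lun(active_vms: int, per_vm_cap: int, queue_depth: int) -> int:
    """Worst-case outstanding I/Os queued at the LUN if every VM fills its cap."""
    return min(active_vms * per_vm_cap, queue_depth)

for vms in (1, 2, 4, 8):
    q = outstanding_on_lun(vms, sched_num_req_outstanding, lun_queue_depth)
    state = "saturated" if q == lun_queue_depth else "headroom left"
    print(f"{vms} busy VM(s): up to {q} I/Os queued at the LUN ({state})")

# If the per-VM cap equals the queue depth, a single VM can consume the entire
# device queue and starve its neighbors; the lower default leaves room for others.
```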

LUN Presentation: SAN Zoning

Use WWPN zoning and zone the initiator (HBA) to the FA (storage port) in a 1:1 relationship
This minimizes RSCN disruptions, device LI/LO, and host-based failover confusion

CASE STUDY

Impact of Architecture on Performance

Background

Architecture can have huge performance implications
Every environment will be different
Use tests in your environment to find bottlenecks

Our Current Architecture

Tests Run

IOMeter
  70% random, 70% read, 64 KB blocks
  5 minute run
  10 GB disk
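IOMeter itself was the tool used in the lab. Purely as an illustration of what that access pattern means, here is a minimal Python sketch of a comparable workload (70% random placement, 70% reads, 64 KB transfers) against an ordinary test file; the file name, file size, and run time are placeholders:

```python
import os, random, time

# Rough approximation of the lab's IOMeter profile: 70% random, 70% read,
# 64 KB transfers. The file path, file size, and run time are placeholders.
PATH, FILE_SIZE, BLOCK, RUN_SECONDS = "testfile.bin", 1 << 30, 64 * 1024, 30

# Create the test file once (1 GB here, not the lab's 10 GB disk).
if not os.path.exists(PATH) or os.path.getsize(PATH) < FILE_SIZE:
    with open(PATH, "wb") as f:
        f.truncate(FILE_SIZE)

blocks = FILE_SIZE // BLOCK
payload = os.urandom(BLOCK)
ios, next_seq = 0, 0
deadline = time.time() + RUN_SECONDS

with open(PATH, "r+b", buffering=0) as f:
    while time.time() < deadline:
        if random.random() < 0.70:                  # 70% random placement
            offset = random.randrange(blocks) * BLOCK
        else:                                       # otherwise sequential
            offset = (next_seq % blocks) * BLOCK
            next_seq += 1
        f.seek(offset)
        if random.random() < 0.70:                  # 70% reads, 30% writes
            f.read(BLOCK)
        else:
            f.write(payload)
        ios += 1

print(f"{ios / RUN_SECONDS:.0f} IOPS, {ios * BLOCK / RUN_SECONDS / 2**20:.1f} MB/s")
```

Note that this sketch goes through the OS page cache, so it is only a shape of the workload, not a substitute for IOMeter's direct-I/O measurements.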
Pre-Run Results

                                   Fibre Channel        iSCSI                NAS
                                   VMFS      RDM        VMFS      RDM        VMDK
Total I/Os per Second (IOPS)       3294      3353       1813      1865       1691
Total MB per Second (Throughput)   206       209        113       116        105
Average I/O Response Time (ms)     1.21      1.19       2.20      2.14       2.36
% CPU Utilization (total)          33.87%    27.26%     24.00%    19.40%     23.00%

Scale Out Architecture

Results

Students got worse performance


Where's the bottleneck?
Fibre Channel Student Results vs. Pre-Run Results

                                   FC Student           FC Pre-Run           iSCSI Pre-Run        NAS Pre-Run
                                   VMFS      RDM        VMFS      RDM        VMFS      RDM        VMDK
Total I/Os per Second (IOPS)       1894      1868       3294      3353       1813      1865       1691
Total MB per Second (Throughput)   110       113        206       209        113       116        105
Average I/O Response Time (ms)     1.19      1.24       1.21      1.19       2.20      2.14       2.36
% CPU Utilization (total)          22.73%    21.72%     33.87%    27.26%     24.00%    19.40%     23.00%

Analysis

iSCSI and NAS give good performance
Tier your storage
RDMs do not always give better performance than VMFS
  (1894 vs. 3294 IOPS for VMFS; 1868 vs. 3353 IOPS for RDM)

Analysis
Located a potential bottleneck: the SP path

How could you improve performance?

Discover a Downstream Bottleneck


Test to see if our path is the bottleneck
Use more downstream destinations

1 ESX Server, 1 Array, 2 Datastores

Discover a Downstream Bottleneck


Split datastores give better performance because of more work queues
Path was not our bottleneck
           VM1       VM2       Total     Previous
IOPS       1961      1983      3944      3294
MB/s       123       123       246       206
Latency    2.04      2.01
%CPU       22.27%    22.37%

Lab Session 4 Storage Performance Step 5


Test to see if the HBA is the bottleneck
2 ESX Servers (2 HBAs), 1 Array, 2 Datastores

Lab Session 4 Storage Performance Step 5


Still bound at path to SP
           VM-Host1  VM-Host2  Total     Previous
IOPS       1980      1989      3969      3944
MB/s       124       124       248       246
Latency    2.02      2.01
%CPU       20.30%    20.70%

Lab Session 4 Storage Performance Step 5


Test to see whether the SP path is the bottleneck
1 ESX Server, 2 Arrays (2 SPs), 2 Datastores

Lab Session 4 Storage Performance Step 5


Adding more SPs increased performance
Now bound at the HBA
Manually load balance LUNs
           VM-Array1  VM-Array2  Total     Previous
IOPS       2048       2153       4201      3969
MB/s       131        134        265       248
Latency    1.90       1.86
%CPU       20.88%     20.08%

Lab Session 4 Storage Performance Step 5


Test spanning across volumes
1 ESX Server, 1 Array, 1 Spanned Volume

Lab Session 4 Storage Performance Step 5


Spanned Volumes DO NOT increase performance
           Student#-Storage  Original
IOPS       3328              3294
MB/s       208               206
Latency    1.20
%CPU       32.74%
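Putting the steps together, the incremental gain at each change can be read straight off the Total/Previous columns above. A small Python sketch of that arithmetic, using only the IOPS figures from the tables:

```python
# Percentage change in total IOPS at each step of the case study,
# using the Total/Previous figures from the tables above.
steps = [
    ("Baseline (1 host, 1 array, 1 datastore)",   3294, None),
    ("Add a second datastore (more work queues)", 3944, 3294),
    ("Add a second host / HBA",                   3969, 3944),
    ("Add a second array / SP",                   4201, 3969),
    ("Spanned volume instead (vs. baseline)",     3328, 3294),
]

for name, total, previous in steps:
    if previous is None:
        print(f"{name}: {total} IOPS")
    else:
        gain = (total - previous) / previous * 100
        print(f"{name}: {total} IOPS ({gain:+.1f}%)")

# The large gains came from extra work queues (datastores) and extra SPs;
# the extra HBA and the spanned volume barely moved the needle.
```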

Lab Session 4 Storage Performance Step 5


NOTE: Every environment is different. If you decide to run this test in your environment, your numbers may differ for a variety of reasons. Many things will change the results of your tests, such as SAN fabric architecture, disk speed, HBA speed, number of HBAs, etc. The numbers introduced in this lab are by no means meant to be an official benchmark of the lab equipment; the tests were simply used to create a desired performance issue so that a point could be made. Please consult your storage vendor contacts for official benchmark numbers on their arrays in a variety of environments.

Questions?

Presentation Download

Please remember to complete your session evaluation form and return it to the room monitors as you exit the session.

The presentation for this session can be downloaded at
http://www.vmware.com/vmtn/vmworld/sessions/

Enter the following to download (case-sensitive):
Username: cbv_rep
Password: cbvfor9v9r

Some or all of the features in this document may be representative of feature areas under development. Feature commitments must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery.
