Apache con 2013-hadoop

Taking the guesswork out of
your Hadoop Infrastructure

Steve Watt

@wattsteve

Agenda – Clearing up common misconceptions

Web Scale Hadoop Origins
Single/Dual Socket 1+ GHz
4-8 GB RAM
2-4 Cores
1x 1GbE NIC
(2-4) x 1 TB SATA Drives

Commodity in 2013

Dual Socket 2+ GHz
24-48 GB RAM
4-6 Cores
(2-4) x 1GbE NICs
(4-14) x 2 TB SATA Drives

The enterprise perspective
is also different

You can quantify what is right for you
Balancing Performance and Storage Capacity with Price

Diagram labels: Price ($), Storage Capacity, Performance

“We’ve profiled our Hadoop applications
so we know what type of infrastructure
we need”

Said no-one. Ever.

Profiling your Hadoop Cluster
High Level Design goals

It’s pretty simple:
1) Instrument the Cluster
2) Run your workloads
3) Analyze the Numbers

Don’t do paper exercises. Hadoop has a way of blowing all
your hypotheses out of the water.

Let’s walk through a 10 TB TeraSort on a full 42u rack:
- 2 x HP DL360p (JobTracker and NameNode)
- 18 x HP DL380p Hadoop Slaves (18 Maps, 12 Reducers)

64 GB RAM, Dual 6 core Intel 2.9 GHz, 2 x HP p420 Smart
Array Controllers, 16 x 1TB SFF Disks, 4 x 1GbE Bonded NICs

Instrumenting the Cluster
The key is to capture the data. Use whatever framework you’re comfortable with

Analysis using the Linux SAR Tool

- An outer script starts the SAR gathering scripts on each node, then starts
the Hadoop job.

- SAR Scripts on each node gather I/O, CPU, Memory and Network Metrics for
that node for the duration of the job.

- Upon completion the SAR data is converted to CSV and loaded into MySQL so
we can do ad-hoc analysis of the data.

- Aggregations/Summations done via SQL.

- Excel is used to generate charts.
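
As a rough illustration of the load-and-aggregate step described above, here is a minimal Python sketch. It uses the standard library's sqlite3 as a stand-in for MySQL, and it assumes the semicolon-separated layout that `sadf -d` emits for the `-b` report. The table name, column names and sample rows are hypothetical and are not measurements from the talk.

```python
import csv
import io
import sqlite3

# Hypothetical sample in the style of `sadf -d sar_test.dat -- -b` output.
# The numbers are invented for illustration only.
SAMPLE_CSV = """\
# hostname;interval;timestamp;tps;rtps;wtps;bread/s;bwrtn/s
wkr01;30;2013-02-26 10:00:30 UTC;900.0;300.0;600.0;100000.0;200000.0
wkr01;30;2013-02-26 10:01:00 UTC;950.0;350.0;600.0;120000.0;220000.0
wkr02;30;2013-02-26 10:00:30 UTC;800.0;300.0;500.0;90000.0;160000.0
"""

BLOCK_BYTES = 512  # sar reports bread/s and bwrtn/s in 512-byte blocks


def load_io_rows(conn, text):
    """Create an io_rate table (hypothetical schema) and load the CSV rows."""
    conn.execute(
        "CREATE TABLE io_rate (hostname TEXT, ts TEXT, bread_s REAL, bwrtn_s REAL)"
    )
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row or row[0].startswith("#"):
            continue  # skip the header line and any blank lines
        host, _interval, ts, _tps, _rtps, _wtps, bread, bwrtn = row
        conn.execute(
            "INSERT INTO io_rate VALUES (?, ?, ?, ?)",
            (host, ts, float(bread), float(bwrtn)),
        )


def average_mb_per_s(conn):
    """Aggregate with SQL: mean read+write throughput per node in MB/s."""
    cur = conn.execute(
        "SELECT hostname, AVG((bread_s + bwrtn_s) * ? / 1000000.0) "
        "FROM io_rate GROUP BY hostname ORDER BY hostname",
        (BLOCK_BYTES,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_io_rows(conn, SAMPLE_CSV)
for host, mb_s in average_mb_per_s(conn):
    print(f"{host}: {mb_s:.1f} MB/s")
```

The per-node averages can then be exported for charting, in the same way the deck uses Excel.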

Examples
Performed for each node – results copied to repository node

ssh wkr01 sadf -d sar_test.dat -- -u > wkr01_cpu_util.csv

ssh wkr01 sadf -d sar_test.dat -- -b > wkr01_io_rate.csv

ssh wkr01 sadf -d sar_test.dat -- -n DEV > wkr01_net_dev.csv

ssh wkr02 sadf -d sar_test.dat -- -u > wkr02_cpu_util.csv

ssh wkr02 sadf -d sar_test.dat -- -b > wkr02_io_rate.csv

ssh wkr02 sadf -d sar_test.dat -- -n DEV > wkr02_net_dev.csv

… Each file name is prefixed with the node name (i.e., “wkrnn”)
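
The same pattern can be generated for every node rather than typed by hand. The Python sketch below only builds the command strings shown above and does not run them. The flags and file suffixes come from the examples; the full list of eighteen `wkrNN` node names is an assumption based on the naming pattern.

```python
# Metric flags and file suffixes taken from the examples above.
METRICS = [("-u", "cpu_util"), ("-b", "io_rate"), ("-n DEV", "net_dev")]


def sadf_commands(nodes, sar_file="sar_test.dat"):
    """Build one sadf export command per node and metric."""
    commands = []
    for node in nodes:
        for flag, suffix in METRICS:
            commands.append(
                f"ssh {node} sadf -d {sar_file} -- {flag} > {node}_{suffix}.csv"
            )
    return commands


# Hypothetical node names following the wkrNN pattern (wkr01 .. wkr18).
nodes = [f"wkr{i:02d}" for i in range(1, 19)]
for command in sadf_commands(nodes)[:3]:
    print(command)
```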

I/O Subsystem Test Chart
I/O Subsystem has only around 10% of Total Throughput Utilization

Run the dd tests first to understand
the read and write throughput
capabilities of your I/O subsystem

- X axis is time
- Y axis is MB per second

TeraSort on this server design is not
I/O bound. 1.6 GB/s is the upper bound,
and less than 10% of it is utilized.

Network Subsystem Test Chart
Network throughput utilization per server is less than ¼ of capacity

Each 1GbE NIC can drive up to 1 Gb/s for
reads and 1 Gb/s for writes, which is
roughly 4 Gb/s or 400 MB/s total across all
four bonded NICs

- Y axis is throughput
- X axis is elapsed time

Rx = Received MB/sec
Tx = Transmitted MB/sec
Tot = Total MB/sec
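
To make the capacity arithmetic concrete, here is a small Python sketch. Four NICs at 1 Gb/s give 4 Gb/s, which is 500 MB/s at 8 bits per byte before any protocol overhead; the slide's 400 MB/s is a rounder, more conservative figure (reading it as headroom for overhead is an assumption). The observed throughput value is hypothetical and is not a measurement from the talk.

```python
NIC_COUNT = 4         # bonded 1GbE NICs per server, from the test rack
NIC_GBIT_PER_S = 1.0  # line rate of one 1GbE NIC


def raw_capacity_mb_s(nic_count, gbit_per_s):
    """Line rate in MB/s, using 8 bits per byte and no protocol overhead."""
    return nic_count * gbit_per_s * 1000 / 8


def utilization_pct(observed_mb_s, capacity_mb_s):
    """Observed throughput as a percentage of the given capacity."""
    return 100.0 * observed_mb_s / capacity_mb_s


raw = raw_capacity_mb_s(NIC_COUNT, NIC_GBIT_PER_S)
slide_capacity = 400.0  # the figure used on the slide
observed = 90.0         # hypothetical Tot MB/sec sample, not a measured value

print(f"raw line rate: {raw:.0f} MB/s")
print(f"utilization vs slide figure: {utilization_pct(observed, slide_capacity):.1f}%")
```

Anything under 100 MB/s total stays below a quarter of the 400 MB/s figure.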

CPU Subsystem Test Chart
CPU Utilization is High, I/O Wait is low – Where you want to be

Each data point is taken every 30 seconds
Y axis is percent utilization of CPUs
X axis is timeline of the run

CPU Utilization is captured by analyzing how
busy each core is and creating an average

IO Wait (1 of 10 other SAR CPU Metrics)
measures percentage of time CPU is waiting
on I/O
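
A minimal Python sketch of the averaging described above: take how busy each core is in one sampling interval and average across cores. Treating "busy" as %user plus %system is an assumption about how the chart was built, and the per-core sample values are invented for illustration.

```python
from statistics import mean

# Hypothetical per-core samples for one 30-second interval:
# (core id, %user, %system, %iowait). Values are illustrative only.
SAMPLE = [
    (0, 85.0, 10.0, 1.0),
    (1, 80.0, 12.0, 2.0),
    (2, 90.0, 6.0, 0.0),
    (3, 75.0, 14.0, 1.0),
]


def summarize(samples):
    """Average busy time (user + system) and I/O wait across cores."""
    busy = mean(user + system for _core, user, system, _iowait in samples)
    iowait = mean(iowait for _core, _user, _system, iowait in samples)
    return busy, iowait


busy_pct, iowait_pct = summarize(SAMPLE)
print(f"busy={busy_pct:.1f}% iowait={iowait_pct:.1f}%")
```

High busy time with low I/O wait is the pattern the slide calls "where you want to be".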

Memory Subsystem Test Chart
High Percentage Cache Ratio to Total Memory shows CPU is not waiting on memory

- X axis is elapsed time
- Y axis is Memory Utilization

- Percent Cached is a SAR memory metric
(kb_cached). It tells you how much
memory is held in the Linux page cache,
i.e. the amount of memory the kernel uses
to cache data.

- Means we’re doing a good job of caching
data so the JVM is not having to do I/Os and
incurring I/O wait time (reflected in the
previous slide)
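
The cache ratio can be sketched in Python as cached memory divided by total memory. The field names follow the `sar -r` columns kbmemfree, kbmemused and kbcached, and taking total memory as free plus used is an assumption about how the ratio was derived. The sample values describe a hypothetical 64 GB node and are not measured results.

```python
def cache_ratio_pct(kbmemfree, kbmemused, kbcached):
    """Page cache as a percentage of total memory (free + used), in kB."""
    total_kb = kbmemfree + kbmemused
    return 100.0 * kbcached / total_kb


# Hypothetical sample for a 64 GB node: 2 GB free, 62 GB used, 48 GB cached.
KB_PER_GB = 1024 * 1024
ratio = cache_ratio_pct(
    kbmemfree=2 * KB_PER_GB,
    kbmemused=62 * KB_PER_GB,
    kbcached=48 * KB_PER_GB,
)
print(f"cached: {ratio:.1f}% of total memory")
```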

Tuning from your Server Configuration baseline

- We’ve now established that the server configuration we used works pretty well. But what if you want to
tune it, i.e. sacrifice some performance for cost, or remove an I/O, network or memory limitation?

I/O Subsystem
- Types of Disks / Controllers
- Number of Disks / Controllers

Compute
- Type (Socket R 130 W vs. Socket B 95 W)
- Amount of Cores
- Amount of Memory Channels
- 4GB vs. 8GB DIMMS

Data Center
- Floor Space (Rack Density)
- Power and Cooling Constraints

Network
- Type of Network (1 or 10GbE)
- Amount of Server NICs
- Switch Port Availability
- Deep Buffering

Annual Power and Cooling Costs of a Cluster are 1/3 the cost
of acquiring the cluster

Introducing: Hadoop Ops whack-a-mole

What you really want is a shared
service...

The Server Building Blocks

NameNode and JobTracker Configurations

1u, 64GB of RAM, 2 x 2.9 GHz 6 Core Intel Socket R Processors, 4
Small Form Factor Disks, RAID Configuration, 1 Disk Controller

Hadoop Slave Configurations

- 2u 48 GB of RAM, 2 x 2.4 GHz 6 Core Intel Socket B Processors, 1
High Performing Disk Controller with twelve 2TB Large Form Factor
Data Disks in JBOD Configuration
- 24 TB of Storage Capacity, Low Power Consumption CPU
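
The 24 TB figure is simply twelve 2 TB disks per slave. The Python sketch below extends that arithmetic to a rack of workers. The replication factor of 3 is the HDFS default and is an assumption here, since the slides do not state it. The estimate ignores space needed for intermediate data and the operating system.

```python
DISKS_PER_WORKER = 12  # from the slave configuration
TB_PER_DISK = 2        # 2TB Large Form Factor data disks
HDFS_REPLICATION = 3   # HDFS default; an assumption, not from the slides


def rack_capacity_tb(workers):
    """Return (raw TB, TB after replication) for a number of worker nodes."""
    raw = workers * DISKS_PER_WORKER * TB_PER_DISK
    usable = raw / HDFS_REPLICATION
    return raw, usable


# 3 and 18 are the smallest and largest worker counts for a single rack.
for workers in (1, 3, 18):
    raw, usable = rack_capacity_tb(workers)
    print(f"{workers:2d} workers: {raw} TB raw, about {usable:.0f} TB after replication")
```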

Single Rack Config
Single rack configuration, 42u rack enclosure

This rack is the initial building block that configures all the
key management services for a production scale out cluster
to 4000 nodes.

- 1u 1GbE Switches: 2 x 1GbE TOR switches
- 1u, 2 Sockets, 4 Disks: 1 x Management node, 1 x JobTracker node, 1 x Name node
- 2u, 2 Sockets, 12 Disks: (3-18) x Worker nodes
- Open for KVM switch
- Open 1u

Multi-Rack Config
Scale out configuration, 1 or more 42u racks

The extension building block. One adds 1 or more racks of
this block to the single rack configuration to build out a
cluster that scales to 4000 nodes.

- 1u Switches: 2 x 10 GbE Aggregation switch
- Single rack RA
- 1u 1GbE Switches: 2 x 1GbE TOR switches
- 2u, 2 Sockets, 12 Disks: 19 x Worker nodes
- Open for KVM switch
- Open 2u

System on a Chip and Hadoop
Intel ATOM and ARM Processor Server Cartridges

Hadoop has come full circle

4-8 GB RAM
4 Cores 1.8 GHz
2-4 TB
1 GbE NICs

- Amazing density advances: 270+
servers in a rack (compared to 21!).

- Trevor Robinson from Calxeda is a
member of our community working on
this. He’d love some help:
trevor.robinson@calxeda.com

Thanks for attending

Thanks to our partner HP for letting me present their findings. If you’re
interested in learning more about these findings please check out
hp.com/go/hadoop and see detailed reference architectures and varying server
configurations for Hortonworks, MapR and Cloudera.

Steve Watt swatt@redhat.com @wattsteve

HBase Testing – Results still TBD
YCSB Workload A and C unable to Drive the CPU

Charts (not reproduced): CPU Utilization, Network Usage (MB/sec), Disk Usage
(MB/sec) and Memory Usage, each plotted against elapsed minutes.

2 Workloads Tested:

- YCSB workload “C”: 100% read-only requests, random access to DB, using a
unique primary key value

- YCSB workload “A”: 50% read and 50% updates, random access to DB, using a
unique primary key value

- When data is cached in HBase cache, performance is really good and we can
occupy the CPU. Ops throughput (and hence CPU Utilization) dwindles when data
exceeds HBase cache but is available in Linux cache. Resolution TBD –
Related JIRA HDFS-2246
