
Big Data Defined: Andrew J. Brust


http://www.bluebadgeinsights.com
andrew.brust@bluebadgeinsights.com

Big Data Defined

• 100s of TB – x PB
• Uses Hadoop
• Three Vs (volume, velocity, variety)
• Too big for OLTP (online transaction processing) systems
• Uses distributed/parallel processing
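The last bullet is the core idea. Instead of one server scanning all the data, the data is split into partitions, the partitions are processed side by side, and the partial results are combined. The following is a minimal sketch under stated assumptions: an in-memory list stands in for a distributed dataset, a thread pool stands in for cluster nodes, and the function names `split_into_partitions` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n_partitions):
    """Deal records round-robin into n_partitions roughly equal lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def partial_sum(partition):
    """Work done independently on one partition (one hypothetical 'node')."""
    return sum(partition)


def parallel_sum(records, n_workers=4):
    """Split the data, process partitions concurrently, then combine."""
    partitions = split_into_partitions(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Combining step: merge the per-partition results into one answer.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 1001))))  # 500500
```

Threads keep the example self-contained. For CPU-bound work in CPython a process pool would be the usual choice, and at real scale the partitions would live on separate machines.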


MapReduce

• Map step: split the data and pre-process it
• Reduce step: aggregate the results
• Most typical of Hadoop, but employed by other systems to varying extents
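A word count is a common way to illustrate the two steps. The sketch below runs in a single Python process and does not use Hadoop's API. The names `map_step`, `shuffle`, and `reduce_step` are hypothetical. The map step splits each input line and pre-processes it (lower-casing, stripping punctuation) into `(word, 1)` pairs, a shuffle groups the pairs by key, and the reduce step aggregates each group.

```python
import string
from collections import defaultdict


def map_step(line):
    """Split one input record and pre-process it into (key, value) pairs."""
    cleaned = line.lower().translate(str.maketrans("", "", string.punctuation))
    return [(word, 1) for word in cleaned.split()]


def shuffle(pairs):
    """Group every value under its key (Hadoop's framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Aggregate all values that share one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = [pair for line in lines for pair in map_step(line)]
    grouped = shuffle(mapped)
    return dict(reduce_step(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Big data, big clusters", "data and more data"]
    print(map_reduce(docs))
    # {'big': 2, 'data': 3, 'clusters': 1, 'and': 1, 'more': 1}
```

Because each `map_step` call sees only one record and each `reduce_step` call sees only one key's values, both can be spread across many machines. That independence is what lets the pattern scale.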

A MapReduce Example

• Count by suite on each floor

• Send per-suite, per-platform totals to the lobby

• Sort totals by platform

• Send two platform packets to the 10th, 20th, and 30th floors

• Tally up each platform

• Collect the tallies

• Merge the tallies into one spreadsheet
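The building example can be simulated in a few lines. This is a sketch that depends on several assumptions. The slide does not say what is being counted, so each suite's records are treated as hypothetical platform labels (`"A"` through `"F"`). Six platforms are assumed so that each of the 10th, 20th, and 30th floors receives two packets, and the building data and all function names are invented for illustration. In this reading, counting on each floor is the map step, sorting in the lobby is the shuffle/sort, the three floors' tallies are the reduce step, and the spreadsheet is the merged output. This mapping is our interpretation, not the author's.

```python
from collections import Counter, defaultdict

# Hypothetical input: floor -> suite -> list of platform labels observed there.
BUILDING = {
    1: {"101": ["A", "B", "A"], "102": ["C", "D"]},
    2: {"201": ["E", "F", "A"], "202": ["B", "B", "F"]},
}

# Assumed reducer floors, taken from the slide's "10th, 20th, 30th floor".
REDUCER_FLOORS = [10, 20, 30]


def count_on_floor(suites):
    """Map: a floor counts by suite, emitting (platform, suite, count) rows."""
    rows = []
    for suite, platforms in suites.items():
        for platform, count in Counter(platforms).items():
            rows.append((platform, suite, count))
    return rows


def sort_in_lobby(all_rows):
    """Shuffle/sort: the lobby sorts totals into one packet per platform."""
    packets = defaultdict(list)
    for platform, _suite, count in sorted(all_rows):
        packets[platform].append(count)
    return packets


def assign_packets(packets, floors):
    """Partition: deal platform packets out to the reducer floors in turn."""
    assignments = defaultdict(dict)
    for i, platform in enumerate(sorted(packets)):
        floor = floors[i % len(floors)]
        assignments[floor][platform] = packets[platform]
    return assignments


def tally(packets_for_floor):
    """Reduce: a reducer floor totals each platform it was sent."""
    return {platform: sum(counts) for platform, counts in packets_for_floor.items()}


def run_building_example(building, floors=REDUCER_FLOORS):
    mapped = [row for suites in building.values() for row in count_on_floor(suites)]
    assignments = assign_packets(sort_in_lobby(mapped), floors)
    # Collect each floor's tallies and merge them into one "spreadsheet".
    spreadsheet = {}
    for floor in floors:
        spreadsheet.update(tally(assignments.get(floor, {})))
    return dict(sorted(spreadsheet.items()))


if __name__ == "__main__":
    print(run_building_example(BUILDING))
    # {'A': 3, 'B': 3, 'C': 1, 'D': 1, 'E': 1, 'F': 2}
```

Each reducer floor only ever sees complete packets for its own platforms, so no two floors produce a tally for the same platform, and the final merge is a simple union.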


Data Scientists

Near synonyms:
• Statisticians
• “Quants”

But with:
• Subject matter expertise
• Good at knowing the right questions to ask

Abuse of term:
• Hadoop experts
• “R” developers (although that does overlap)
