HDFS and MapReduce
-by Uday Vakalapudi
Agenda
 Core elements of Hadoop
 Basic Hadoop Storage Hierarchy
 HDFS Default Storage Style
 Anatomy of a File Read
 Anatomy of a File Write
 Blocks & Block Caching
 HDFS Basic Filesystem Operations
 Copy with distcp for data backups
 Commissioning and Decommissioning Nodes
 MapReduce inspiration
 Brief MapReduce flow
 Job, Task, and Task Attempt IDs
 MapReduce example code (with compression codec)
Core elements of Hadoop
 HDFS: NameNode, DataNode
 MapReduce: JobTracker, TaskTracker
Basic Hadoop Storage Hierarchy
[Diagram: data center D1 contains the name node and two racks. Rack R1 holds nodes R1N1, R1N2, R1N3, R1N4; rack R2 holds nodes R2N1, R2N2, R2N3, R2N4.]

HDFS Default Storage Style
[Diagram: Data.csv is split into blocks B1, B2, B3. The name node's metadata maps each block to three datanodes across racks R1 and R2:]
B1: R2N1, R2N2, R1N1
B2: R2N3, R2N4, R1N1
B3: R2N3, R2N1, R1N1
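The block-to-node mapping in the diagram is consistent with HDFS's default placement idea: one replica on the writer's node and the other two on different nodes of a single remote rack. A minimal sketch of that idea follows, assuming the writer runs on R1N1. The names `Node`, `placeReplicas`, and `offset` are hypothetical, and real HDFS chooses candidates at random, so this toy version is not expected to reproduce the diagram exactly.

```scala
// Toy model of rack-aware replica placement; not the HDFS implementation.
object ReplicaPlacementSketch {
  final case class Node(rack: String, name: String)

  // Hypothetical helper: first replica on the writer's node, the other two on
  // two distinct nodes of a single remote rack. Real HDFS picks candidates at
  // random; `offset` stands in for that choice so the sketch is deterministic.
  def placeReplicas(writer: Node, cluster: Seq[Node], offset: Int): Seq[Node] = {
    val remote = cluster.filter(_.rack != writer.rack)
    require(remote.nonEmpty, "need at least one node on another rack")
    val rack = remote(offset % remote.size).rack
    val sameRack = remote.filter(_.rack == rack)
    require(sameRack.size >= 2, "need two nodes on the remote rack")
    val second = sameRack(offset % sameRack.size)
    val third = sameRack((offset + 1) % sameRack.size)
    Seq(writer, second, third)
  }

  def main(args: Array[String]): Unit = {
    // Two racks of four nodes each, named as in the diagram.
    val cluster = for {
      rack <- Seq("R1", "R2")
      i <- 1 to 4
    } yield Node(rack, s"${rack}N$i")
    val writer = Node("R1", "R1N1")
    Seq("B1", "B2", "B3").zipWithIndex.foreach { case (block, i) =>
      val replicas = placeReplicas(writer, cluster, i * 2)
      println(s"$block -> ${replicas.map(_.name).mkString(", ")}")
    }
  }
}
```

Keeping two of the three replicas on a second rack means that losing a whole rack still leaves at least one copy of every block.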
Anatomy of a File Read
Anatomy of a File Write
Blocks & Block Caching
 A disk's block size is the minimum amount of data that it can read or write.
 Filesystem blocks are typically a few kilobytes in size, whereas disk blocks are normally 512 bytes.
 HDFS, too, has the concept of a block, but it is a much larger unit: 128 MB by default.
 As in a filesystem for a single disk, files in HDFS are broken into block-sized chunks, which are stored as independent units.
 hdfs fsck /user/file.txt -files -blocks (lists the blocks that make up the file)
 For frequently accessed files, the blocks may be explicitly cached in the datanode's memory, in an off-heap block cache.
 By default, a block is cached in only one datanode's memory.
 The dfs.datanode.max.locked.memory property sets the maximum amount of memory a datanode may use for caching.
 Using hdfs cacheadmin we can add a cache pool, add a cache directive for a file or directory, and also give it a TTL (time-to-live).
 hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]
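To make the block arithmetic concrete, here is a small sketch of how a file's length maps onto fixed-size blocks, assuming the 128 MB default quoted above. `BlockSplitSketch` and `blockLengths` are illustrative names, not part of any Hadoop API, and the 300 MB file is a made-up example.

```scala
// Sketch: how a file of a given size maps onto fixed-size HDFS blocks.
object BlockSplitSketch {
  val DefaultBlockSize: Long = 128L * 1024 * 1024 // 128 MB default block size

  // Returns the length in bytes of each block; only the last may be shorter.
  def blockLengths(fileSize: Long, blockSize: Long = DefaultBlockSize): Seq[Long] = {
    require(fileSize >= 0 && blockSize > 0)
    val fullBlocks = (fileSize / blockSize).toInt
    val remainder = fileSize % blockSize
    Seq.fill(fullBlocks)(blockSize) ++ (if (remainder > 0) Seq(remainder) else Seq.empty[Long])
  }

  def main(args: Array[String]): Unit = {
    val fileSize = 300L * 1024 * 1024 // hypothetical 300 MB file
    val lengths = blockLengths(fileSize)
    println(s"${lengths.size} blocks: ${lengths.map(_ / (1024 * 1024)).mkString(" MB, ")} MB")
  }
}
```

For the 300 MB example this gives three blocks of 128 MB, 128 MB, and 44 MB. The final block holds only the remaining 44 MB rather than a full 128 MB.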

HDFS Basic Filesystem Operations
 hadoop fs -ls
 hadoop fs -lsr (deprecated; use -ls -R)
 hadoop fs -put localdir hdfsdir (or -copyFromLocal)
 hadoop fs -get hdfsdir localdir
 hadoop fs -rmr hdfsdir (deprecated; use -rm -r; -rmdir removes only empty directories)
 hadoop fs -getmerge <src> <localdst> [addnl]
Copy with distcp for data backups
 hadoop distcp file1 file2
 hadoop distcp dir1 dir2
 hadoop distcp -update dir1 dir2 (copies only the files that have changed; if you are unsure of the effect of a distcp operation, try it on a small test directory tree first)
 hadoop distcp -update -delete -p hdfs://namenode1/foo hdfs://namenode2/foo
 The -delete flag causes distcp to delete any files or directories from the destination that are not present in the source, and -p means that file status attributes like permissions, block size, and replication are preserved.
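The combined effect of -update and -delete can be pictured with a toy model over in-memory listings. This is only a sketch under simplifying assumptions: a file counts as changed when its size differs (the real tool's comparison is more involved), and `Listing`, `Plan`, and `plan` are hypothetical names rather than distcp internals.

```scala
// Toy model of `distcp -update -delete` on in-memory listings; not distcp itself.
object DistcpSyncSketch {
  // A listing maps a relative path to a file size in bytes (a simplification).
  type Listing = Map[String, Long]

  final case class Plan(toCopy: Set[String], toDelete: Set[String])

  def plan(source: Listing, dest: Listing, delete: Boolean): Plan = {
    // -update: copy only files missing from the destination or that differ.
    val toCopy = source.collect {
      case (path, size) if !dest.get(path).contains(size) => path
    }.toSet
    // -delete: drop destination files that no longer exist in the source.
    val toDelete = if (delete) dest.keySet -- source.keySet else Set.empty[String]
    Plan(toCopy, toDelete)
  }

  def main(args: Array[String]): Unit = {
    val source = Map("a.txt" -> 10L, "b.txt" -> 20L)
    val dest = Map("a.txt" -> 10L, "b.txt" -> 5L, "old.txt" -> 7L)
    println(plan(source, dest, delete = true))
  }
}
```

In this example only b.txt would be copied, and old.txt would be removed from the destination. Without the delete option, old.txt would be left in place.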
Commissioning and Decommissioning Nodes
 Commissioning new nodes
 Add the network addresses of the new nodes to the include file.
 Update the namenode with the new set of permitted datanodes using this command:
 % hdfs dfsadmin -refreshNodes
 Update the resource manager with the new set of permitted node managers using:
 % yarn rmadmin -refreshNodes
 Update the slaves file with the new nodes, so that they are included in future operations performed by the Hadoop control scripts.
 Start the new datanodes and node managers.
 Check that the new datanodes and node managers appear in the web UI.
 Decommissioning old nodes
 Add the network addresses of the nodes to be decommissioned to the exclude file. For HDFS the exclude file is set by the dfs.hosts.exclude property, and for YARN by the yarn.resourcemanager.nodes.exclude-path property.
 Update the namenode with the new set of permitted datanodes, using this command:
 % hdfs dfsadmin -refreshNodes
 Update the resource manager with the new set of permitted node managers using:
 % yarn rmadmin -refreshNodes
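How the include and exclude files interact for a datanode can be sketched as a small decision function. This follows the rule as it is commonly described (a node listed in both files may connect and is then decommissioned) and assumes that an empty include list admits every node. The host names and the `datanodeStatus` helper are hypothetical, and node managers may be treated differently.

```scala
// Sketch of how include/exclude membership is commonly described for datanodes.
object NodeAdmissionSketch {
  sealed trait Status
  case object MayConnect extends Status
  case object MayNotConnect extends Status
  case object ConnectAndDecommission extends Status

  // Assumption: an empty include set means "every node is included".
  def datanodeStatus(host: String, include: Set[String], exclude: Set[String]): Status = {
    val included = include.isEmpty || include.contains(host)
    val excluded = exclude.contains(host)
    (included, excluded) match {
      case (false, _)    => MayNotConnect
      case (true, false) => MayConnect
      case (true, true)  => ConnectAndDecommission
    }
  }

  def main(args: Array[String]): Unit = {
    val include = Set("r1n1", "r1n2", "r2n1") // hypothetical host names
    val exclude = Set("r1n2")                 // node being retired
    Seq("r1n1", "r1n2", "r9n9").foreach { h =>
      println(s"$h: ${datanodeStatus(h, include, exclude)}")
    }
  }
}
```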
MapReduce inspiration
 The name MapReduce comes from functional programming
- Map is the name of a higher-order function that applies a given function
to each element of a list. Sample in Scala:
val numbers = List(1,2,3,4,5)
numbers.map(x => x * x) == List(1,4,9,16,25)
- Reduce is the name of a higher-order function that analyzes a recursive
data structure and recombines, through use of a given combining
operation, the results of recursively processing its constituent parts,
building up a return value. Sample in Scala:
val numbers = List(1,2,3,4,5)
numbers.reduce(_ + _) == 15
Note: MapReduce takes an input, splits it into smaller parts, executes the code of
the mapper on every part, then gives all the results to one or more reducers
that merge them into the final output.
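The flow in the note can be imitated with ordinary Scala collections. The sketch below is a word count in which `mapper`, `reducer`, and `wordCount` are illustrative names. A `groupBy` stands in for the shuffle step that routes all values for one key to the same reducer, and no Hadoop classes are involved.

```scala
// Plain Scala collections standing in for the MapReduce phases; no Hadoop involved.
object WordCountSketch {
  // Map phase: emit a (word, 1) pair for every word in one input split.
  def mapper(split: String): Seq[(String, Int)] =
    split.toLowerCase.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

  // Reduce phase: combine all counts emitted for one key.
  def reducer(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  def wordCount(splits: Seq[String]): Map[String, Int] = {
    val mapped = splits.flatMap(mapper)                      // run the mapper on every split
    val shuffled = mapped.groupBy { case (word, _) => word } // group pairs by key
    shuffled.map { case (word, pairs) => reducer(word, pairs.map(_._2)) }
  }

  def main(args: Array[String]): Unit = {
    val splits = Seq("the quick brown fox", "the lazy dog", "the fox")
    wordCount(splits).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Each split is mapped independently of the others, which is what lets a real cluster run the mappers in parallel on different nodes.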

Brief MapReduce flow
Job, Task, and Task Attempt IDs
 application_1410450250506_0003
 job_1410450250506_0003
 task_1410450250506_0003_m_000003
 attempt_1410450250506_0003_m_000003_0
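The four identifiers above share a prefix structure: each one extends the previous one, so the job, task, and application IDs can be recovered from a task attempt ID. A small parsing sketch follows. `AttemptId` and `parse` are hypothetical helpers rather than Hadoop classes, and the field names reflect the usual reading of the format (cluster start timestamp, job counter, task type m or r, task number, attempt number).

```scala
// Sketch: deriving job and task identifiers from an attempt ID string.
object AttemptIdSketch {
  // Counters are kept as strings so that their zero padding is preserved.
  final case class AttemptId(clusterTimestamp: Long, jobSeq: String, taskType: Char,
                             taskSeq: String, attempt: Int) {
    def applicationId: String = s"application_${clusterTimestamp}_$jobSeq"
    def jobId: String = s"job_${clusterTimestamp}_$jobSeq"
    def taskId: String = s"task_${clusterTimestamp}_${jobSeq}_${taskType}_$taskSeq"
  }

  private val Pattern = """attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)""".r

  // Returns None when the string is not a well-formed attempt ID.
  def parse(id: String): Option[AttemptId] = id match {
    case Pattern(ts, job, kind, task, attempt) =>
      Some(AttemptId(ts.toLong, job, kind.head, task, attempt.toInt))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    parse("attempt_1410450250506_0003_m_000003_0").foreach { a =>
      println(a.applicationId)
      println(a.jobId)
      println(a.taskId)
      println(s"type=${a.taskType} attempt=${a.attempt}")
    }
  }
}
```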
MapReduce Word Count

 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
 
iot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptxiot paper presentation FINAL EDIT by kiran.pptx
iot paper presentation FINAL EDIT by kiran.pptx
 
@Call @Girls Kolkata 0000000000 Shivani Beautiful Girl any Time
@Call @Girls Kolkata 0000000000 Shivani Beautiful Girl any Time@Call @Girls Kolkata 0000000000 Shivani Beautiful Girl any Time
@Call @Girls Kolkata 0000000000 Shivani Beautiful Girl any Time
 

Introduction to HDFS and MapReduce

  • 1. HDFS and MapReduce, by Uday Vakalapudi
  • 2. Agenda
      • Core elements of Hadoop
      • Basic Hadoop Storage Hierarchy
      • HDFS Default Storage Style
      • Anatomy of a File Read
      • Anatomy of a File Write
      • Blocks & Block Caching
      • HDFS Basic Filesystem Operations
      • Copy with distcp for data backups
      • Commissioning and Decommissioning Nodes
      • MapReduce inspiration
      • Brief MapReduce flow
      • Job, Task, and Task Attempt IDs
      • MapReduce example code (with compression codec)
  • 3. Core elements of Hadoop
      • HDFS: Namenode, Datanode
      • MapReduce: JobTracker, TaskTracker
  • 4. Basic Hadoop Storage Hierarchy
      • [Diagram: a Name Node and Data center D1, which contains Rack R1 (nodes R1N1, R1N2, R1N3, R1N4) and Rack R2 (nodes R2N1, R2N2, R2N3, R2N4).]
  • 5. HDFS Default Storage Style
      • [Diagram: Data.csv is split into blocks B1, B2, and B3, and copies of each block are spread over the nodes of Rack R1 and Rack R2. The Name Node keeps the metadata listing the datanodes that hold each block; as extracted, B1: R2N1, R2N2, R1N1; B2: R2N3, R2N4, R1N1; B3: R2N3, R2N1, R1N1.]
  • 6. Anatomy of a File Read
  • 7. Anatomy of a File Write
  • 8. Blocks & Block Caching
      • A disk's block size is the minimum amount of data that it can read or write.
      • Filesystem blocks are typically a few kilobytes in size, whereas disk blocks are normally 512 bytes.
      • HDFS, too, has the concept of a block, but it is a much larger unit: 128 MB by default.
      • As in a filesystem for a single disk, files in HDFS are broken into block-sized chunks, which are stored as independent units.
      • To list the blocks that make up a file: hdfs fsck /user/file.txt -files -blocks
      • For frequently accessed files, the blocks may be explicitly cached in the datanode's memory, in an off-heap block cache.
      • By default, a block is cached in only one datanode's memory.
      • The dfs.datanode.max.locked.memory property sets the maximum amount of memory a datanode may lock for caching.
      • Using hdfs cacheadmin, we can add a cache pool, add a cache directive for a path, and give the directive a TTL (time-to-live):
      • hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]
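To make the block-sized chunking on slide 8 concrete, here is a minimal sketch of the arithmetic. It assumes the 128 MB default from the slide; the function name `split_into_blocks` is hypothetical and is not part of any Hadoop API.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default HDFS block size stated on the slide


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block-sized chunk of a file.

    Hypothetical helper for illustration only; HDFS does this internally.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last chunk only holds the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    for size in split_into_blocks(300 * MB):
        print(size // MB, "MB")
```

Under these assumptions a 300 MB file becomes two full 128 MB blocks plus one 44 MB block, and a file smaller than the block size occupies a single, smaller chunk.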
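  • 9. HDFS Basic Filesystem Operations
      • hadoop fs -ls
      • hadoop fs -lsr (recursive listing; deprecated in favor of -ls -R)
      • hadoop fs -put localdir hdfsdir (or -copyFromLocal)
      • hadoop fs -get hdfsdir localdir
      • hadoop fs -rmr hdfsdir (recursive delete; deprecated in favor of -rm -r; -rmdir removes an empty directory)
      • hadoop fs -getmerge <src> <localdst> [addnl] (concatenates the files under <src> into one local file; addnl adds a newline after each file)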
  • 10. Copy with distcp for data backups
      • hadoop distcp file1 file2
      • hadoop distcp dir1 dir2
      • hadoop distcp -update dir1 dir2 (copies only the files that have changed; if you are unsure of the effect of a distcp operation, try it on a small test directory tree first)
      • hadoop distcp -update -delete -p hdfs://namenode1/foo hdfs://namenode2/foo
      • The -delete flag causes distcp to delete any files or directories from the destination that are not present in the source, and -p means that file status attributes such as permissions, block size, and replication are preserved.
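The effect of combining -update and -delete on slide 10 can be sketched as a small planning function. This is only an illustration, not how distcp is implemented: each directory tree is modelled as a dict from relative path to a fingerprint string that stands in for whatever comparison distcp performs, and the name `plan_sync` is hypothetical.

```python
def plan_sync(
    src: dict[str, str], dst: dict[str, str], delete: bool = False
) -> tuple[list[str], list[str]]:
    """Plan an update-style sync from src to dst.

    src and dst map relative paths to a fingerprint (an assumed stand-in
    for the file comparison). Returns (paths to copy, paths to delete).
    """
    # Update semantics: copy files that are missing from the destination
    # or whose fingerprint differs from the source.
    to_copy = sorted(path for path, fp in src.items() if dst.get(path) != fp)
    # Delete semantics: remove destination files that the source lacks.
    to_delete = sorted(path for path in dst if path not in src) if delete else []
    return to_copy, to_delete


if __name__ == "__main__":
    source = {"a.csv": "v1", "b.csv": "v2"}
    backup = {"a.csv": "v1", "b.csv": "v1", "old.csv": "v1"}
    print(plan_sync(source, backup, delete=True))
```

In this toy run only b.csv is copied, because a.csv is unchanged, and old.csv is removed from the backup only because delete is requested.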
  • 11. Commissioning and Decommissioning Nodes
      • Commissioning new nodes
          • Add the network addresses of the new nodes to the include file.
          • Update the namenode with the new set of permitted datanodes using this command:
            % hdfs dfsadmin -refreshNodes
          • Update the resource manager with the new set of permitted node managers using:
            % yarn rmadmin -refreshNodes
          • Update the slaves file with the new nodes, so that they are included in future operations performed by the Hadoop control scripts.
          • Start the new datanodes and node managers.
          • Check that the new datanodes and node managers appear in the web UI.
      • Decommissioning old nodes
          • The nodes to decommission are listed in the exclude file, which is set for HDFS by the dfs.hosts.exclude property and for YARN by the yarn.resourcemanager.nodes.exclude-path property.
          • Update the namenode with the new set of permitted datanodes, using this command:
            % hdfs dfsadmin -refreshNodes
          • Update the resource manager with the new set of permitted node managers using:
            % yarn rmadmin -refreshNodes
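How the include and exclude files from slide 11 interact can be sketched as a lookup. The precedence below is an assumption based on the commonly documented HDFS behaviour (an empty include list admits every host, and a host in both lists may connect but is decommissioned); the slide does not spell it out, YARN treats excluded node managers differently, and the function name `datanode_status` is hypothetical.

```python
def datanode_status(host: str, include: set[str], exclude: set[str]) -> str:
    """Classify a datanode from the include and exclude host lists.

    Assumed HDFS precedence, for illustration only:
      - not in include               -> "rejected" (may not connect)
      - in include and in exclude    -> "decommissioning"
      - in include, not in exclude   -> "permitted"
    An empty include set is treated as "all hosts are included".
    """
    in_include = not include or host in include
    if not in_include:
        return "rejected"
    if host in exclude:
        return "decommissioning"
    return "permitted"


if __name__ == "__main__":
    include = {"r1n1", "r1n2", "r2n1"}
    exclude = {"r1n2"}
    for host in ("r1n1", "r1n2", "r2n4"):
        print(host, datanode_status(host, include, exclude))
```

Reading the lists this way matches the two procedures above: commissioning edits the include file and decommissioning edits the exclude file, and in both cases the refreshNodes commands make the daemons re-read them.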
  • 12. MapReduce inspiration
      • The name MapReduce comes from functional programming.
      • Map is the name of a higher-order function that applies a given function to each element of a list. Sample in Scala:
            val numbers = List(1,2,3,4,5)
            numbers.map(x => x * x) == List(1,4,9,16,25)
      • Reduce is the name of a higher-order function that analyzes a recursive data structure and, using a given combining operation, recombines the results of recursively processing its constituent parts, building up a return value. Sample in Scala:
            val numbers = List(1,2,3,4,5)
            numbers.reduce(_ + _) == 15
      • Note: MapReduce takes an input, splits it into smaller parts, executes the mapper code on every part, then gives all the results to one or more reducers that merge them into one result.
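The flow described in the note on slide 12 (split the input, run the mapper on every part, hand the results to reducers) can be sketched in memory with a word count. This is a toy, single-process illustration and not Hadoop code; the names `mapper`, `shuffle`, `reducer`, and `map_reduce` are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def mapper(line: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the mapped values by key, as happens between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values of one key with a combining operation."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(splits: list[str]) -> dict[str, int]:
    """Run the mapper on every split, then reduce each group of values."""
    mapped = [pair for split in splits for pair in mapper(split)]
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(map_reduce(["hdfs stores blocks", "mapreduce reads blocks"]))
```

In a real cluster each split would be handled by a separate map task and the grouping would move data between machines; here everything runs in one process to keep the three steps visible.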
  • 14. Job, Task, and Task Attempt IDs
      • Application ID: application_1410450250506_0003
      • Job ID: job_1410450250506_0003
      • Task ID: task_1410450250506_0003_m_000003
      • Task attempt ID: attempt_1410450250506_0003_m_000003_0
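The four IDs on slide 14 share the same components, which a small parser makes visible. This is a sketch that assumes the layout seen in the examples (a start timestamp, an application counter, m or r for map or reduce, a task number, and an attempt number); the function name `parse_attempt_id` is hypothetical and not a Hadoop API.

```python
import re

# Assumed layout: attempt_<timestamp>_<app counter>_<m|r>_<task>_<attempt>
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)")


def parse_attempt_id(attempt_id: str) -> dict[str, object]:
    """Derive the application, job, and task IDs from a task attempt ID."""
    match = ATTEMPT_RE.fullmatch(attempt_id)
    if match is None:
        raise ValueError(f"not a task attempt ID: {attempt_id!r}")
    timestamp, counter, kind, task, attempt = match.groups()
    return {
        "application_id": f"application_{timestamp}_{counter}",
        "job_id": f"job_{timestamp}_{counter}",
        "task_id": f"task_{timestamp}_{counter}_{kind}_{task}",
        "task_type": "map" if kind == "m" else "reduce",
        "task_number": int(task),
        "attempt_number": int(attempt),
    }


if __name__ == "__main__":
    for name, value in parse_attempt_id(
        "attempt_1410450250506_0003_m_000003_0"
    ).items():
        print(f"{name}: {value}")
```

Run on the attempt ID from the slide, the sketch reproduces the other three IDs listed above and shows that the attempt belongs to map task number 3, attempt 0.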