
Pig in Big Data


Presentation by Mayur Shirsat

PRN: 23070243046
INTRODUCTION

• Apache Pig is an abstraction over MapReduce.
• It is a tool/platform used to analyze large data sets by representing them as data flows.
• It is generally used with Hadoop.
• To analyze data using Apache Pig, programmers write scripts in Pig Latin.
WHY DO WE NEED PIG?

• Programmers who are not comfortable with Java often struggle when working
with Hadoop, especially when writing MapReduce tasks.
• Pig Latin is an SQL-like language and is easy to learn.
• Apache Pig provides many built-in operators to support data operations
like joins, filters, ordering etc.
PIG VS MAPREDUCE

Apache Pig
• Pig is a data flow language.
• Any novice programmer with a basic knowledge of SQL can work conveniently with Apache Pig.
• Apache Pig uses a multi-query approach, thereby reducing the length of the code to a great extent.
• There is no need for compilation. On execution, every Apache Pig operator is converted internally into a MapReduce job.

MapReduce
• MapReduce is a data processing paradigm.
• Exposure to Java is a must to work with MapReduce.
• MapReduce requires almost 20 times as many lines of code to perform the same task.
• MapReduce jobs have a long compilation process.
Word count code for Map Reduce in Java
Word count code in Apache Pig

This Apache Pig code takes a text file as input, splits each line into words, groups the words by
their value, counts the occurrences of each word, and finally stores the word counts in a
separate file.
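The data flow described above can be sketched in plain Python. This is only an illustration of the steps (split, group, count), not the Pig Latin script from the slide; the function name `word_count` and the sample lines are made up for the example, and the comments name the Pig operators that roughly correspond to each step.

```python
from collections import defaultdict


def word_count(lines):
    """Mimic the Pig word-count data flow on an in-memory list of lines."""
    # Roughly LOAD + FOREACH ... GENERATE FLATTEN(TOKENIZE(line)):
    # turn every input line into individual words.
    words = [word for line in lines for word in line.split()]

    # Roughly GROUP ... BY word: collect identical words into one "bag" each.
    groups = defaultdict(list)
    for word in words:
        groups[word].append(word)

    # Roughly FOREACH ... GENERATE group, COUNT(...): size of each bag.
    return {word: len(bag) for word, bag in groups.items()}


# Hypothetical sample input, standing in for the text file.
sample_lines = ["pig eats anything", "pig lives anywhere"]
result = word_count(sample_lines)

# Stand-in for STORE: print tab-separated word counts instead of writing a file.
for word, count in sorted(result.items()):
    print(f"{word}\t{count}")
```

In Pig the same flow runs as MapReduce jobs over files; here everything stays in memory, so the sketch only shows the shape of the computation.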
PIG VS OTHER TOOLS
PIG ARCHITECTURE
HOW TO INSTALL PIG
Step 1: Visit the website and download the pig-0.17.0.tar.gz file.
Step 2: After the download, extract the file.
Step 3: Open the environment settings.
Step 4: Set the path.
Step 5: Open the bin folder in the Pig directory.
Step 6: Edit the Windows command script file.
Step 7: Open Windows PowerShell as administrator.
APACHE PIG EXECUTION MODES
You can run Apache Pig in two modes, namely Local mode and MapReduce mode.

Local Mode
In this mode, all files are installed and run from your local host and local file system.
There is no need for Hadoop or HDFS. This mode is generally used for testing purposes.

MapReduce Mode
In MapReduce mode, we use Apache Pig to load and process data that exists in the Hadoop
Distributed File System (HDFS). In this mode, whenever we execute Pig Latin statements to
process the data, a MapReduce job is invoked in the back end to perform the corresponding
operation on the data in HDFS.
READING DATA
function − We have to choose a function from the set of load functions provided by Apache Pig (BinStorage, JsonLoader, PigStorage,
TextLoader).
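To give a feel for what a delimiter-based load function such as PigStorage does, here is a simplified Python sketch. It is not Pig's implementation: the helper `load_delimited`, its parameters, and the sample records are all hypothetical. It assumes one record per line, fields separated by a delimiter (tab by default, as with PigStorage), and an optional schema that names the fields.

```python
def load_delimited(lines, delimiter="\t", schema=None):
    """Split each line into fields; name the fields only if a schema is given."""
    records = []
    for line in lines:
        fields = tuple(line.rstrip("\n").split(delimiter))
        if schema is not None:
            # With a schema, fields can be addressed by name.
            records.append(dict(zip(schema, fields)))
        else:
            # Without a schema, fields can only be addressed by position,
            # similar to Pig's positional references $0, $1, ...
            records.append(fields)
    return records


# Hypothetical sample records, standing in for a comma-separated input file.
raw = ["1,Asha,Pune", "2,Ravi,Mumbai"]

by_position = load_delimited(raw, delimiter=",")
by_name = load_delimited(raw, delimiter=",", schema=("id", "name", "city"))

print(by_position[0][1])   # second field of the first record
print(by_name[1]["city"])  # "city" field of the second record
```

The two calls show the trade-off: loading without a schema is quicker to write, while a schema makes later steps easier to read.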
BASIC COMMANDS
PIG VS SQL

Apache Pig
• Pig Latin is a procedural language.
• In Apache Pig, schema is optional. We can store data without designing a schema (fields are then referenced by position as $0, $1, etc.).
• The data model in Apache Pig is nested relational.
• Apache Pig provides limited opportunity for query optimization.

SQL
• SQL is a declarative language.
• Schema is mandatory in SQL.
• The data model used in SQL is flat relational.
• There is more opportunity for query optimization in SQL.
PIG PHILOSOPHY
Pigs eat anything

Pigs live anywhere

Pigs are domestic animals


THANK YOU
