Pig in Big Data
PRN: 23070243046
INTRODUCTION
• Apache Pig reduces the length of the code to a great extent, whereas MapReduce needs many more lines of code to perform the same task.
• Apache Pig needs no separate compilation step: on execution, every Apache Pig operator is converted internally into a MapReduce job. MapReduce jobs, in contrast, have a long compilation process.
Word count code for MapReduce in Java
Word count code in Apache Pig
This Apache Pig code takes a text file as input, splits each line into words, groups the words by
their value, counts the occurrences of each word, and finally stores the word counts in a
separate file.
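The word count data flow described above can be imitated in plain Python to make each step concrete. This is only an illustration of what the Pig operators compute, not Pig code: the function names word_count and store are hypothetical, the sample text is made up, and whitespace splitting is used as an approximation of Pig's TOKENIZE.

```python
# Minimal Python sketch of the Pig word count data flow (not Pig itself).
# Each step is labelled with the Pig Latin operator it imitates.

def word_count(lines):
    """Return sorted (word, count) pairs for an iterable of text lines."""
    # TOKENIZE + FLATTEN: one record per word. Whitespace splitting is an
    # approximation; Pig's TOKENIZE also splits on some punctuation.
    words = [word for line in lines for word in line.split()]

    # GROUP words BY word: collect identical words into one bag per key.
    grouped = {}
    for word in words:
        grouped.setdefault(word, []).append(word)

    # FOREACH grouped GENERATE group, COUNT(words): the size of each bag.
    counts = [(word, len(bag)) for word, bag in grouped.items()]
    return sorted(counts)


def store(counts):
    # STORE ... USING PigStorage(): PigStorage writes tab-separated fields by default.
    return [f"{word}\t{count}" for word, count in counts]


if __name__ == "__main__":
    text = ["big data with pig", "pig counts words", "big pig"]
    for row in store(word_count(text)):
        print(row)
```

In Pig the same steps run as MapReduce jobs over files in the chosen file system; the sketch keeps everything in memory so the grouping and counting logic is easy to follow.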
PIG VS OTHER TOOLS
PIG ARCHITECTURE
HOW TO INSTALL PIG
Step 1: Visit the Apache Pig website and download the pig-0.17.0.tar.gz file.
Step 2: After the download completes, extract the file.
Step 3: Open the environment variable settings.
Step 4: Add the Pig path to the environment variables.
Step 5: Open the bin folder inside the Pig directory.
Step 6: Edit the Windows command script file in the bin folder.
Step 7: Open Windows PowerShell as administrator.
APACHE PIG EXECUTION MODES
You can run Apache Pig in two modes, namely, Local Mode and MapReduce (HDFS) Mode.
Local Mode
In this mode, all the files are installed and run from your local host and local file system.
There is no need for Hadoop or HDFS. This mode is generally used for testing purposes.
MapReduce Mode
MapReduce mode is where we load or process data that resides in the Hadoop Distributed File System
(HDFS) using Apache Pig. In this mode, whenever we execute Pig Latin statements to
process the data, a MapReduce job is invoked in the back end to perform the corresponding
operation on the data stored in HDFS.
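The mode is chosen when Pig is started, through the -x option of the pig launcher (pig -x local or pig -x mapreduce; MapReduce mode is the default when -x is omitted). The Python sketch below only assembles that command line. The helper name pig_command and the script name wordcount.pig are hypothetical, and actually running the list with subprocess.run assumes the pig launcher is installed and on the PATH.

```python
import shlex

# Execution modes covered in this section, as accepted by "pig -x <mode>".
PIG_MODES = ("local", "mapreduce")


def pig_command(mode, script=None):
    """Build the argument list that starts Pig in the given execution mode.

    Hypothetical helper: with no script, Pig opens its interactive Grunt shell.
    """
    if mode not in PIG_MODES:
        raise ValueError(f"unknown Pig execution mode: {mode!r}")
    args = ["pig", "-x", mode]
    if script is not None:
        args.append(script)  # run a Pig Latin script file instead of Grunt
    return args


if __name__ == "__main__":
    # Printed as a shell command line: pig -x local wordcount.pig
    print(shlex.join(pig_command("local", "wordcount.pig")))
```

Keeping the mode as a parameter makes it easy to test a script in local mode first and then rerun the same script unchanged in MapReduce mode against HDFS.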
READING DATA
Data is read with the LOAD operator (relation_name = LOAD 'input file path' USING function AS schema;). For function, we have to choose one of the load functions provided by Apache Pig (BinStorage, JsonLoader, PigStorage,
TextLoader).
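To show how the choice of load function changes what a line of input becomes, the Python sketch below imitates two of them: PigStorage splits each line into fields on a delimiter (tab by default), while TextLoader keeps the whole line as a single field. This is an illustration only; the function names and the sample record are hypothetical, and the AS schema step leaves out Pig's type casting.

```python
# Sketch of how two Pig load functions turn one line of text into a tuple.

def pig_storage(line, delimiter="\t"):
    """Imitate PigStorage: split one line into fields (tab-delimited by default)."""
    return tuple(line.rstrip("\n").split(delimiter))


def text_loader(line):
    """Imitate TextLoader: the whole line becomes a single-field tuple."""
    return (line.rstrip("\n"),)


def apply_schema(fields, schema):
    """Imitate the AS clause by naming each field.

    In this sketch missing fields become None (standing in for Pig's null),
    fields beyond the schema are ignored, and no type casting is done.
    """
    return {name: fields[i] if i < len(fields) else None
            for i, name in enumerate(schema)}


if __name__ == "__main__":
    record = "1,Asha,Pune\n"
    print(pig_storage(record, ","))  # three fields, like USING PigStorage(',')
    print(text_loader(record))       # one field holding the whole line
    print(apply_schema(pig_storage(record, ","), ["id", "name", "city"]))
```

PigStorage suits delimited, structured files, while TextLoader is the natural choice for free text such as the input of the word count example.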
BASIC COMMANDS
PIG VS SQL