Hadoop & Spark
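Features of HDFS
● Master/slave architecture: a NameNode (master) manages the file system namespace and metadata, while DataNodes (slaves) store the actual data.
● Each file is divided into blocks of a predetermined size (128 MB by default in Hadoop 2.x, configurable via dfs.blocksize).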
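The block layout can be made concrete with a small sketch that computes how a file of a given size would be split. It assumes the Hadoop 2.x default block size of 128 MB (the real value comes from dfs.blocksize); the function names are hypothetical and nothing here talks to a running cluster.

```shell
# Sketch: how HDFS splits a file into fixed-size blocks.
# Assumption: Hadoop 2.x default block size of 128 MB (dfs.blocksize).
BLOCK_SIZE=$((128 * 1024 * 1024))

# hdfs_block_count BYTES: print the number of blocks (ceiling division).
hdfs_block_count() {
  echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# hdfs_block_sizes BYTES: print each block's size, one per line.
# Every block is full except possibly the last one.
hdfs_block_sizes() {
  remaining=$1
  while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -ge "$BLOCK_SIZE" ]; then
      echo "$BLOCK_SIZE"
      remaining=$((remaining - BLOCK_SIZE))
    else
      echo "$remaining"
      remaining=0
    fi
  done
}
```

For example, a 300 MB file becomes three blocks of 128 MB, 128 MB and 44 MB. The DataNodes store these blocks, and the NameNode records which blocks make up the file.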
HDFS Commands
hdfs version : prints the installed Hadoop version
hdfs dfs -ls [-R] <path> : lists the files and folders at the specified path. With -R, the contents of subfolders are listed recursively as well
hdfs dfs -mkdir [-p] <path> : creates a folder. With -p, any parent folders that do not exist yet are created too
hdfs dfs -put <localSrc> <dest> : uploads a file or folder from the local file system to HDFS
hdfs dfs -get <srcHdfs> <localDest> : downloads a file or folder from HDFS to the local file system
hdfs dfs -mv <src> <dest> : moves a file or folder within HDFS
hdfs dfs -cp <src> <dest> : copies a file or folder within HDFS
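The commands above can be combined into a short session. This sketch assumes a hypothetical HDFS_BIN variable (defaulting to hdfs on the PATH) so the commands can be pointed at a specific install; the folder and file names are illustrative.

```shell
# Sketch: exercise mkdir, put, cp, mv, ls and get on one HDFS folder.
# Assumption: HDFS_BIN is a hypothetical variable, not part of Hadoop;
# it defaults to the "hdfs" launcher found on the PATH.
HDFS_BIN=${HDFS_BIN:-hdfs}

# hdfs_demo LOCAL_FILE HDFS_DIR LOCAL_DEST
hdfs_demo() {
  name=$(basename "$1")
  # Create the target folder, including any missing parents.
  "$HDFS_BIN" dfs -mkdir -p "$2" || return 1
  # Upload the local file into the folder.
  "$HDFS_BIN" dfs -put "$1" "$2/" || return 1
  # Copy the file within HDFS, then rename the copy with mv.
  "$HDFS_BIN" dfs -cp "$2/$name" "$2/$name.bak" || return 1
  "$HDFS_BIN" dfs -mv "$2/$name.bak" "$2/archive-$name" || return 1
  # List the folder tree recursively, then download the renamed copy.
  "$HDFS_BIN" dfs -ls -R "$2" || return 1
  "$HDFS_BIN" dfs -get "$2/archive-$name" "$3"
}
```

For example, hdfs_demo ./report.csv /user/alice/demo /tmp/report-copy.csv leaves report.csv and archive-report.csv in /user/alice/demo and a local copy in /tmp.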
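HDFS Commands (for admin)
- chown : changes the owner (and optionally the group) of files and folders
- du : shows the space used by files and folders
- df : shows the capacity, used space and free space of the file system
- cat : prints the contents of a file
- chmod : changes the permissions of files and folders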
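A typical use of these commands is preparing a user's home folder. The sketch assumes the same hypothetical HDFS_BIN variable (defaulting to hdfs) and an illustrative /user/<name> layout. Changing ownership with chown requires the HDFS superuser.

```shell
# Sketch: admin commands applied to one user's HDFS home folder.
# Assumptions: HDFS_BIN is a hypothetical variable (defaults to "hdfs");
# run as the HDFS superuser, since chown requires it.
HDFS_BIN=${HDFS_BIN:-hdfs}

# admin_home USER: give /user/USER to USER, restrict access, report usage.
admin_home() {
  home="/user/$1"
  # Make USER the owner of the folder and everything inside it.
  "$HDFS_BIN" dfs -chown -R "$1" "$home" || return 1
  # rwx for the owner, r-x for the group, nothing for others.
  "$HDFS_BIN" dfs -chmod -R 750 "$home" || return 1
  # Total space used by the folder, in human-readable units.
  "$HDFS_BIN" dfs -du -s -h "$home" || return 1
  # Capacity, used and free space of the whole file system.
  "$HDFS_BIN" dfs -df -h /
}
```

To inspect a file inside that folder, use hdfs dfs -cat with its path, for example hdfs dfs -cat /user/alice/notes.txt (an illustrative path).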
● Source: https://www.quickprogrammingtips.com/big-data/how-to-install-hadoop-on-mac-os-x-el-capitan.html
Versions
Step 1: Install Java
Verify the Java version installed on the system with java -version.
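The version check can also be scripted. This sketch assumes java is on the PATH; the helper names are hypothetical. The parser handles both the old 1.x numbering (1.8.0_121 gives 8) and the newer scheme (11.0.2 gives 11).

```shell
# Sketch: read the installed Java version and extract its major number.
# `java -version` writes to stderr, e.g.: java version "1.8.0_121"

# installed_java_version: print the quoted version string.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# java_major VERSION: print the major version number.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;  # old scheme: 1.<major>.<minor>_<update>
    *)   echo "$1" | cut -d. -f1 ;;  # Java 9+ scheme: <major>.<minor>.<patch>
  esac
}
```

For the JDK used later in this guide (jdk1.8.0_121), java_major "$(installed_java_version)" prints 8.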
Step 2: Configure SSH
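Hadoop's start scripts launch their daemons over SSH, so a single-node setup still needs passwordless SSH to localhost. On macOS, Remote Login must also be enabled (System Preferences > Sharing). This sketch assumes the key lives at ~/.ssh/id_rsa. SSH_DIR is a hypothetical override, and the function name is illustrative.

```shell
# Sketch: passwordless SSH to localhost for Hadoop's start scripts.
# Assumption: key pair at $SSH_DIR/id_rsa; SSH_DIR is a hypothetical override.
SSH_DIR=${SSH_DIR:-$HOME/.ssh}

setup_passwordless_ssh() {
  mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR" || return 1
  # Generate an RSA key with an empty passphrase only if none exists yet.
  if [ ! -f "$SSH_DIR/id_rsa" ]; then
    ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" || return 1
  fi
  # Authorize the public key once (idempotent), then restrict permissions.
  touch "$SSH_DIR/authorized_keys"
  key=$(cat "$SSH_DIR/id_rsa.pub") || return 1
  if ! grep -qxF -e "$key" "$SSH_DIR/authorized_keys"; then
    echo "$key" >> "$SSH_DIR/authorized_keys"
  fi
  chmod 600 "$SSH_DIR/authorized_keys"
}
```

Afterwards, ssh localhost should log in without a password prompt.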
Step 3: Install Hadoop
Download the Hadoop 2.7.3 binary release (hadoop-2.7.3.tar.gz, about 200 MB) from http://hadoop.apache.org/releases.html.
Extract the contents of the archive to a folder of your choice.
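The extraction step can be scripted as below. The sketch assumes the download is named hadoop-2.7.3.tar.gz and unpacks into a top-level hadoop-2.7.3/ folder. The install_hadoop helper and the destination folder are illustrative.

```shell
# Sketch: extract the Hadoop archive and point HADOOP_HOME at it.
# Assumption: ARCHIVE is <name>.tar.gz and unpacks into a top-level <name>/ folder.
install_hadoop() {  # install_hadoop ARCHIVE DEST_DIR
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  # The archive's top-level folder becomes the Hadoop home directory.
  HADOOP_HOME="$dest/$(basename "$archive" .tar.gz)"
  export HADOOP_HOME
  # Succeed only if the hdfs launcher is where we expect it.
  [ -x "$HADOOP_HOME/bin/hdfs" ]
}
```

For example, install_hadoop ~/Downloads/hadoop-2.7.3.tar.gz ~/opt sets HADOOP_HOME to ~/opt/hadoop-2.7.3. Running "$HADOOP_HOME/bin/hdfs" version should then report 2.7.3.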
Step 4: Configure Hadoop
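1. Set JAVA_HOME in etc/hadoop/hadoop-env.sh (adjust the JDK folder to the version installed on your system):
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home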
2. Modify the various Hadoop configuration files located in etc/hadoop to set up Hadoop and YARN properly.
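Both Step 4 edits can be applied from a script. The property values below follow Apache's single-node (pseudo-distributed) setup guide. They are an assumption, not necessarily the exact values used in this tutorial. HADOOP_HOME must point at the extracted folder, and the helper names are hypothetical.

```shell
# Sketch: apply the Step 4 edits from a script instead of by hand.
# Assumptions: HADOOP_HOME is set; values mirror Apache's single-node guide;
# helper names are hypothetical. Existing XML files are overwritten.

# 1. Replace any 'export JAVA_HOME=' line in hadoop-env.sh with the given JDK path.
set_hadoop_java_home() {  # set_hadoop_java_home JDK_HOME
  env_file="${HADOOP_HOME:?HADOOP_HOME is not set}/etc/hadoop/hadoop-env.sh"
  grep -v '^export JAVA_HOME=' "$env_file" > "$env_file.tmp"
  echo "export JAVA_HOME=$1" >> "$env_file.tmp"
  mv "$env_file.tmp" "$env_file"
}

# 2. Write a Hadoop XML config file from NAME VALUE pairs.
write_prop_file() {  # write_prop_file FILE NAME1 VALUE1 [NAME2 VALUE2 ...]
  file=$1; shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ $# -ge 2 ]; do
      echo "  <property><name>$1</name><value>$2</value></property>"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

configure_pseudo_distributed() {
  conf="${HADOOP_HOME:?HADOOP_HOME is not set}/etc/hadoop"
  write_prop_file "$conf/core-site.xml" fs.defaultFS hdfs://localhost:9000
  write_prop_file "$conf/hdfs-site.xml" dfs.replication 1
  write_prop_file "$conf/mapred-site.xml" mapreduce.framework.name yarn
  write_prop_file "$conf/yarn-site.xml" yarn.nodemanager.aux-services mapreduce_shuffle
}
```

On macOS, /usr/libexec/java_home -v 1.8 prints the JDK 8 home. Calling set_hadoop_java_home "$(/usr/libexec/java_home -v 1.8)" therefore avoids hard-coding the jdk1.8.0_121 folder name.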
If disk utilization goes above the configured threshold (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage in yarn-site.xml, 90% by default), YARN reports the node as unhealthy with the error "local-dirs are bad".
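To see in advance whether a machine would trip this check, compare df's usage figure with the threshold. The sketch assumes the 90% default and uses hypothetical helper names. If the disk is over the limit, either free space or raise the property in yarn-site.xml.

```shell
# Sketch: predict whether YARN's disk health check would flag a directory.
# Assumption: 90 mirrors the default of
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.

# disk_used_percent DIR: integer % in use on the file system holding DIR.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold USED [THRESHOLD]: succeed when USED exceeds THRESHOLD.
over_threshold() {
  awk -v used="$1" -v max="${2:-90}" 'BEGIN { exit !(used + 0 > max + 0) }'
}
```

For example, over_threshold "$(disk_used_percent /tmp)" && echo "YARN may mark this node unhealthy" warns when the disk holding the default data folder is more than 90% full.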
Step 5: Initialize Hadoop Cluster
● From a terminal window, switch to the Hadoop home folder.
● Run bin/hdfs namenode -format to initialize the metadata for the Hadoop cluster. This formats the HDFS file system and configures it on the local system. By default, the files are created in the /tmp/hadoop-<username> folder.
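The same initialization as a script. This sketch assumes HADOOP_HOME is set and hadoop.tmp.dir is left at its default. Formatting erases any existing HDFS metadata in the target folder, so run it only once on a fresh setup.

```shell
# Sketch: format the NameNode from the Hadoop home folder.
# Assumptions: HADOOP_HOME is set; hadoop.tmp.dir keeps its default value.
# Warning: formatting erases existing HDFS metadata in that location.
format_namenode() {
  # Subshell so the caller's working directory is left unchanged.
  (cd "${HADOOP_HOME:?HADOOP_HOME is not set}" && bin/hdfs namenode -format)
}

# default_hadoop_tmp_dir: folder where the default configuration keeps HDFS files.
default_hadoop_tmp_dir() {
  echo "/tmp/hadoop-$(id -un)"
}
```

After a successful format, ls "$(default_hadoop_tmp_dir)" should show the newly created folders.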
It is possible to change the default location of the NameNode metadata by adding the dfs.name.dir property (dfs.namenode.name.dir in Hadoop 2.x) to the hdfs-site.xml file. Similarly, the HDFS data block storage location can be changed using the dfs.data.dir property (dfs.datanode.data.dir in Hadoop 2.x).
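Here is a sketch of an hdfs-site.xml that moves both locations out of /tmp. It uses the Hadoop 2.x property names and keeps dfs.replication at 1 for a single node. The folder paths and the write_hdfs_site helper are illustrative assumptions.

```shell
# Sketch: write hdfs-site.xml with custom NameNode and DataNode folders.
# Assumptions: Hadoop 2.x property names; NAME_DIR and DATA_DIR are absolute
# paths chosen by you. The existing hdfs-site.xml is overwritten.
write_hdfs_site() {  # write_hdfs_site CONF_DIR NAME_DIR DATA_DIR
  cat > "$1/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$3</value>
  </property>
</configuration>
EOF
}
```

For example, write_hdfs_site "$HADOOP_HOME/etc/hadoop" /data/hadoop/name /data/hadoop/data. The new NameNode folder starts empty, so the NameNode has to be formatted again afterwards.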
Step 6: Start Hadoop Cluster
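In a standard Hadoop 2.x layout, the cluster is started with the sbin/start-dfs.sh and sbin/start-yarn.sh scripts. This sketch assumes HADOOP_HOME is set and passwordless SSH to localhost works. The start_cluster name is hypothetical.

```shell
# Sketch: start the HDFS and YARN daemons, then list running Java processes.
# Assumptions: HADOOP_HOME is set; passwordless SSH to localhost works.
start_cluster() {
  # Starts NameNode, DataNode and SecondaryNameNode.
  "${HADOOP_HOME:?HADOOP_HOME is not set}/sbin/start-dfs.sh" || return 1
  # Starts ResourceManager and NodeManager.
  "$HADOOP_HOME/sbin/start-yarn.sh" || return 1
  # jps ships with the JDK; if present, it should list the daemons above.
  if command -v jps >/dev/null 2>&1; then
    jps
  fi
}
```

In Hadoop 2.x, the NameNode web UI is then available at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.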
Step 7: Configure HDFS Home Directory
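HDFS resolves relative paths against /user/<username>, so that folder must exist before commands like hdfs dfs -put file.txt . work. This sketch assumes HADOOP_HOME is set and the cluster is running. The helper name is hypothetical.

```shell
# Sketch: create the current user's HDFS home folder and check it exists.
# Assumptions: HADOOP_HOME is set and HDFS is running.
create_hdfs_home() {
  user=$(id -un)
  hdfs="${HADOOP_HOME:?HADOOP_HOME is not set}/bin/hdfs"
  # -p also creates /user if it does not exist yet.
  "$hdfs" dfs -mkdir -p "/user/$user" || return 1
  "$hdfs" dfs -ls /user
}
```

Once the folder exists, relative paths such as hdfs dfs -ls . refer to /user/<username>.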
References
1. Paul Zikopoulos, Chris Eaton. 2011. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data (1st ed.). McGraw-Hill Osborne Media.
2. https://www.tutorialspoint.com/hadoop/hadoop_hdfs_overview.htm
3. https://www.quickprogrammingtips.com/big-data/how-to-install-hadoop-on-mac-os-x-el-capitan.html
4. http://blog.prabeeshk.com/blog/2016/12/07/install-apache-spark-2-on-ubuntu-16-dot-04-and-mac-os/
5. http://data-flair.training/blogs/top-hadoop-hdfs-commands-tutorial/