Retrieving File Data From HDFS using Python Snakebite
Last Updated: 16 Jun, 2022
Prerequisite: Hadoop Installation, HDFS
Snakebite is a Python library for communicating with HDFS. Using the client library provided by the Snakebite package, we can write Python code that works on HDFS. It uses protobuf messages to communicate directly with the NameNode, so it works with HDFS without making a system call to hdfs dfs. Snakebite does not support Python 3.
The hdfs dfs command provides many subcommands for operating on HDFS. The Snakebite client library likewise provides methods for retrieving data from HDFS. The text() method reads the data from a file stored on HDFS. Let's work through a quick task to see how to retrieve data from a file on HDFS.
Task: Retrieving File Data From HDFS.
Step 1: Create a text file with the name data.txt and add some data to it.
cd Documents/   # change to the Documents directory (choose any directory you like)
touch data.txt  # create an empty file
nano data.txt   # open the file in nano, a command-line text editor, and add some data
cat data.txt    # print the content of the file

Step 2: Send this data.txt file to HDFS with the help of the copyFromLocal command.
Syntax:
hdfs dfs -copyFromLocal /path 1 /path 2 .... /path n /destination
Use the command to send data.txt to the root directory of HDFS.
hdfs dfs -copyFromLocal /home/dikshant/Documents/data.txt /
Now check whether the file has reached the root directory of HDFS with the command below.
hdfs dfs -ls /

You can check it manually by visiting http://localhost:50070/ then Utilities -> Browse the file system.
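The same check can also be scripted. The sketch below is illustrative only: hdfs_has_entry is a hypothetical helper name, and it assumes a client object whose ls() method yields one dict per entry with a 'path' key, which is how the Snakebite client (introduced in Step 3) is expected to behave. Verify this against your installed version.

```python
def hdfs_has_entry(client, directory, name):
    """Return True if `directory` on HDFS contains an entry called `name`.

    `client` is assumed to expose ls(paths) yielding dicts with a 'path' key.
    """
    # Build the absolute path we expect to find, e.g. "/" + "data.txt".
    target = directory.rstrip("/") + "/" + name
    for entry in client.ls([directory]):
        if entry.get("path") == target:
            return True
    return False
```

With a real connection this would be called as hdfs_has_entry(Client('localhost', 9000), '/', 'data.txt'). Taking the client as a parameter keeps the helper usable with a stand-in object when no cluster is running.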

Step 3: Now our task is to read the data from the data.txt file we sent to HDFS. Create a file read_data.py in your local file system and add the Python code below to it.
Python
# import the client library
from snakebite.client import Client

# create a client connection to the HDFS NameNode
client = Client('localhost', 9000)

# text() yields the content of each listed file; print everything in data.txt
for line in client.text(['/data.txt']):
    print(line)
Client() explanation:
The Client() constructor accepts the following arguments:
- host (string): Hostname or IP address of the NameNode.
- port (int): RPC port of the NameNode.
- hadoop_version (int): Hadoop protocol version (default: 9).
- use_trash (boolean): Use trash when removing files.
- effective_user (string): Effective user for the HDFS operations (default: the current user).
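The host and port values normally match the fs.defaultFS setting in Hadoop's core-site.xml (for example hdfs://localhost:9000). As a minimal sketch, the hypothetical helper below splits such a URI into the two values that Client() expects. The name namenode_address and the fallback port 8020 are assumptions made for illustration, not part of Snakebite.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, which Snakebite targets

# Assumed fallback for URIs that omit the port; adjust to your cluster.
DEFAULT_RPC_PORT = 8020


def namenode_address(default_fs):
    """Split a URI such as 'hdfs://localhost:9000' into (host, port)."""
    parsed = urlparse(default_fs)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError("expected an hdfs://host:port URI, got %r" % default_fs)
    return parsed.hostname, parsed.port or DEFAULT_RPC_PORT
```

The result can then be passed on, for instance host, port = namenode_address('hdfs://localhost:9000') followed by Client(host, port, use_trash=False).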

Step 4: Run the read_data.py file and observe the result.
python read_data.py

We have successfully fetched the data from data.txt with the help of the client library.
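If the file content is needed as a list of lines rather than printed output, the read can be wrapped in a small function. This is a sketch under the assumption that text() yields the content of each file as a string. read_hdfs_lines is a hypothetical name, and the client is passed in so that any object with a compatible text() method works.

```python
def read_hdfs_lines(client, path):
    """Return the lines of one HDFS file as a list of strings.

    `client` is assumed to expose text(paths), yielding file content as strings.
    """
    lines = []
    for content in client.text([path]):
        # Split each yielded string on line boundaries, dropping the newlines.
        lines.extend(content.splitlines())
    return lines
```

With the connection from Step 3, read_hdfs_lines(client, '/data.txt') would return the lines of data.txt.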
We can also copy any file from HDFS to the local file system with Snakebite, using the copyToLocal() method. Create a file fetch_file.py and copy the Python code below into it.
Python
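from snakebite.client import Client

# create a client connection to the HDFS NameNode
client = Client('localhost', 9000)

# copy /data.txt from HDFS to the local Desktop directory and print each result
for result in client.copyToLocal(['/data.txt'], '/home/dikshant/Desktop'):
    print(result)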

Now run this Python file; you will see the output below.
python fetch_file.py

We can observe that the file has now been copied to the /home/dikshant/Desktop directory.
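When copying several files, it can help to separate the successful copies from the failed ones instead of printing each result. The sketch below assumes that each item yielded by copyToLocal() is a dict with 'path', 'result' and 'error' keys. Check this against the output of your Snakebite version. split_copy_results is a hypothetical helper name.

```python
def split_copy_results(results):
    """Separate copy results into copied paths and (path, error) failures.

    Each item is assumed to be a dict with 'path', 'result' and 'error' keys.
    """
    copied, failed = [], []
    for item in results:
        if item.get("result"):
            copied.append(item.get("path"))
        else:
            failed.append((item.get("path"), item.get("error")))
    return copied, failed
```

It would be used as copied, failed = split_copy_results(client.copyToLocal(['/data.txt'], '/home/dikshant/Desktop')).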
