Data Engineering Concepts #2 - Sending Data Using An API
by Bar Dadon - Dev Genius
Introduction
One of the main responsibilities of data engineers is to transfer data between
a source and a destination, and there are many ways to do it. Depending on the
problem, this job often requires a data engineer to build and maintain a
complex data pipeline. However, data pipelines are not the only way we can
transfer data between machines or services.
In many cases, we can meet the need by building a simple API that allows
authorized users to request data from our services.
What is an API?
An API is simply an interface that allows users to send HTTP requests over
the internet to a server. Using these HTTP requests, a user can interact with
various services on the server, such as querying a database or executing a
function.
The developers who create the API control which operations users can
activate when they send HTTP requests.
For example, we can create an API that, given the correct request, calls a
function that runs a query retrieving the five most active customer IDs in
the last month from a table called "customers".
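As a sketch of what that handler might run under the hood (the table layout, the database, and the definition of "active" are assumptions for illustration, not from the article):

import sqlite3

def top_five_active_customers(db_path="warehouse.db"):
    # Hypothetical schema: customers(customer_id, order_date);
    # "most active" is read here as "most orders in the last month"
    query = """
        SELECT customer_id
        FROM customers
        WHERE order_date >= date('now', '-1 month')
        GROUP BY customer_id
        ORDER BY COUNT(*) DESC
        LIMIT 5
    """
    with sqlite3.connect(db_path) as conn:
        return [row[0] for row in conn.execute(query)]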
APIs do have limitations, though. First, because APIs send data over the
internet, we can only send relatively small amounts of data in each request.
Also, if the data needs highly complex processing, the API will be slow and
inefficient. In those cases, we should create a data pipeline instead.
However, APIs can replace a pipeline when the data needed is lightweight
and there's no need for scheduling.
APIs also allow users to pull the data on their own. Users can interact with a
service whenever they choose, without having to ask a data engineer to run a
pipeline for them.
Example
To make this more concrete, let's build a simple API using Flask. This API
will allow users to send a GET request to our service. If the request is valid,
the API will scrape the website example.com and return the requested number
of letters from the page.
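First, let's create a virtual environment for the project and activate it. One way to do it, assuming Python 3's built-in venv module:

root@DESKTOP-3U7IV4I:/projects/api_example# python3 -m venv api_example
root@DESKTOP-3U7IV4I:/projects/api_example# source api_example/bin/activate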
To verify that we are currently in the virtual environment, the prompt should
look like this:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example#
Next, we need to pip install the libraries: flask, bs4 and requests.
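In the activated environment, that's a single command:

(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# pip install flask bs4 requests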
With everything installed, let's write a function that scrapes the page:

import requests
from bs4 import BeautifulSoup

def scrape_data(url="http://example.com/"):
    '''
    Scrape the text of every <p> element on a web page.

    Args:
    - url(str), default("http://example.com/")
    Returns:
    - text(str)
    '''
    def extract():
        # Request the page and fail loudly on anything but HTTP 200
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        # Parse the HTML and concatenate the text of every <p> tag
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())

if __name__ == "__main__":
    data = scrape_data()
    print(data)
Output:
Connection Successful
This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.More information...
The function seems to be working properly. Now let's build the API itself. I
will use Flask to create a local app that listens on port 5000.
Any user who sends a GET request to the URL localhost:5000/ will trigger
the function above and receive the text that we just scraped.
from flask import Flask
import requests
from bs4 import BeautifulSoup

app = Flask(__name__)

def scrape_data(url="http://example.com/"):
    '''Scrape the text of every <p> element at url.'''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())

# A GET request to / triggers the scrape and returns the text
@app.route('/')
def get_data():
    return scrape_data()

if __name__ == "__main__":
    app.run(port=5000)
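Assuming the script is saved as app.py, we can start the server like this:

(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# python app.py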
With the app running, we need to send a GET request to it. We can do that in
many different ways; there are tons of tools for the job, but the simplest is
the Linux command curl.
Let's use a curl command to grab this data and store it in a text file called
"scraped_data.txt":
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt localhost:5000/
We should now have all the scraped text in the text file:
scraped_data.txt
a. Adding an API key
For this simple example, let's say that the API key is 12345. We want to
modify the code so that only requests to the URL
localhost:5000/?api_key=12345 will be granted data. All other requests will
fail.
This ensures that only users who know the API key we chose are authorized
to send GET requests.
from flask import Flask, request
import requests
from bs4 import BeautifulSoup

app = Flask(__name__)
API_KEY = '12345'

def scrape_data(url="http://example.com/"):
    '''Scrape the text of every <p> element at url.'''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())

# One way to enforce the key: compare the api_key query parameter
# against API_KEY and reject everything else with a 401
@app.route('/')
def get_data():
    if request.args.get('api_key') != API_KEY:
        return "Unauthorized", 401
    return scrape_data()

if __name__ == "__main__":
    app.run(port=5000)
Now, let’s send a GET request, but this time with the API key 12345:
Great. Now only authorized users who know that the API key is 12345 can
scrape our data.
b. Controlling the amount of data
Next, let's allow users to control the amount of data they receive. Instead of
receiving all the text, users will be able to choose how many letters they
want by passing one more query parameter; I'll call it letters here. The code
can look like this:
from flask import Flask, request
import requests
from bs4 import BeautifulSoup

app = Flask(__name__)
API_KEY = '12345'

def scrape_data(url="http://example.com/"):
    '''Scrape the text of every <p> element at url.'''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())

# "letters" is the query parameter (name chosen here) that limits
# how many characters of the scraped text are returned
@app.route('/')
def get_data():
    if request.args.get('api_key') != API_KEY:
        return "Unauthorized", 401
    letters = request.args.get('letters', type=int)
    text = scrape_data()
    return text if letters is None else text[:letters]

if __name__ == "__main__":
    app.run(port=5000)
Now let’s say that we want only the first 100 letters. We can send a GET
request like this:
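(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt "localhost:5000/?api_key=12345&letters=100"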
And the result is a text file with only the first 100 letters:
scraped_data.txt — only the first 100 letters
As we can see, APIs are a useful way to send small amounts of data online
and enable users to access services that developers provide.
This concludes the article. I hope you had a good read and learned something
new. If you have any questions, don't hesitate to ask in the comments
section.