
Data Engineering Concepts #2 — Sending Data Using an API

Bar Dadon · Published in Dev Genius · 7 min read · Jul 17

Photo by Myriam Jessier on Unsplash

Introduction
One of the main responsibilities of data engineers is to transfer data between
a source and a destination. They can do this in many different ways.

Depending on the problem, this job often requires a data engineer to build
and maintain a complex data pipeline. However, data pipelines are not the
only way we can transfer data between machines or services.

In many cases, we can meet the need by building a simple API that lets
authorized users request data from our services.

What is an API?
An API is simply an interface that allows users to send HTTP requests over
the internet to a server. Using these HTTP requests, a user can interact with
various services on the server, such as querying a database or executing a
function.

The developers who create the API control which operations users can
activate when they send HTTP requests.

For example, we can create an API that, given the correct request, runs a
query that retrieves the five most active customer IDs over the last month
from a table called “customers”.
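As a rough sketch (the table schema and database file here are made up for
illustration, and SQLite stands in for whatever warehouse we use), the
function behind such an endpoint could look like this:

import sqlite3


def top_active_customers(db_path="warehouse.db"):
    # Hypothetical schema: customers(customer_id, activity_date)
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """
        SELECT customer_id
        FROM customers
        WHERE activity_date >= date('now', '-1 month')
        GROUP BY customer_id
        ORDER BY COUNT(*) DESC
        LIMIT 5
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]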

When to use an API instead of a data pipeline


APIs can be a great replacement for pipelines, but we should know when to
use them.

First, because APIs are used to send data over the internet, we can only send
relatively small amounts of data in each request. Also, if there’s a need for
highly complex processing of the data, then the API will be slow and
inefficient. In those cases, we should create a data pipeline instead.
However, APIs can replace a pipeline when the data needed is lightweight
and there’s no need for scheduling.

APIs also allow users to pull the data on their own. Users can interact with a
service whenever they choose, without having to ask a data engineer to run a
particular pipeline.

Of course, we can always use a hybrid approach: create a data pipeline for
transferring and processing large amounts of data into a repository of our
choice, then build an API that serves small amounts of that processed data
to users.
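A rough sketch of that hybrid pattern (the database file, table, and route
here are assumptions for illustration): the pipeline loads processed results
into a store, and a small Flask endpoint serves thin slices of it on demand:

import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/metrics/<name>')
def get_metric(name):
    # The pipeline is assumed to have already loaded processed
    # results into this table; the API only reads small slices.
    conn = sqlite3.connect('processed.db')
    rows = conn.execute(
        "SELECT value FROM metrics WHERE name = ?", (name,)
    ).fetchall()
    conn.close()
    return jsonify([r[0] for r in rows])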

Example
To make this more concrete, let's build a simple API using Flask. This API
will allow users to send a GET request to our service. If the request is valid,
the API will scrape the website example.com and return the requested
number of letters from the page.

http://example.com/

1. Setting the environment


To get started, let’s create a virtual environment:

root@DESKTOP-3U7IV4I:/projects# python3 -m venv api_example

Then activate it:

root@DESKTOP-3U7IV4I:/projects/api_example# source bin/activate

To verify that we are currently in the virtual environment, the prompt should
look like this:

(api_example) root@DESKTOP-3U7IV4I:/projects/api_example#

Next, we need to pip install the libraries: flask, bs4 and requests.

(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# pip install flask bs4 requests

Next, create a folder called “app” and a file app.py:


app/app.py

Great. Now we can build the API.

2. Building the API


First, let’s write the function for scraping the website example.com and
retrieving all the text we can find.

from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str)
          default("http://example.com/")
    Returns:
        - text(str)
    '''
    def extract():
        # Request the page and fail loudly on anything but a 200
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        # Concatenate the text of every <p> element on the page
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


if __name__ == "__main__":
    data = scrape_data()
    print(data)

Output:

Connection Successful
This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.More information...
The scraper seems to be working. Let's start building the API now. I will use
Flask to create a local app that listens on port 5000.

Any user who sends a GET request to the URL localhost:5000/ will activate
the above function and receive the text we just scraped.

from flask import Flask
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str)
          default("http://example.com/")
    Returns:
        - text(str)
    '''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


# Create a flask app
app = Flask(__name__)


# Implement a route to scrape data
@app.route('/')
def get_data():
    data = scrape_data()
    return data


# Run the app
if __name__ == "__main__":
    app.run(debug=True, host="localhost", port=5000)

To run the app, go to the folder “app” and run:


(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# cd app
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example/app# flask run

If we go to localhost:5000/ we will see the scraped text in our simple app:

Our app at: localhost:5000

3. Using the API


Now, let’s say that we are users that need this data and want to use this API
that the developers built. To access this data we need to send a GET request
to localhost:5000/.

We can do that in many different ways. There are tons of tools for that, the
simplest one is to just use the Linux command “curl”.

Let’s use a curl command to grab this data and store it in a text file called
“scraped_data.txt”
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.t

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   175  100   175    0     0    634      0 --:--:-- --:--:-- --:--:--   636

We should now have all the scraped text in the text file:

scraped_data.txt
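For completeness, the same request can be sent from Python with the
requests library instead of curl (a small sketch):

import requests

# Equivalent of: curl -o scraped_data.txt localhost:5000/
response = requests.get("http://localhost:5000/")
response.raise_for_status()  # fail loudly on a non-2xx status

with open("scraped_data.txt", "w") as f:
    f.write(response.text)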

4. Improving the API


Let’s go back to “playing” the developers. As the developers that built this
API, we are also tasked with adding some layer of security. We can’t allow
anyone that sends a simple GET request to grab out data.

a. Adding an API key


A very common way of adding a layer of security is by adding an API key.

For this simple example, let’s say that the API key is 12345. We want to
modify the code so that only requests to the URL
localhost:5000/api_key=12345 will be granted data. All other requests will
fail.

This ensures that only users who know the API key we chose can make
successful GET requests.

from flask import Flask
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str)
          default("http://example.com/")
    Returns:
        - text(str)
    '''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


# Create a flask app
app = Flask(__name__)

API_KEY = '12345'


# Implement a route to scrape data
@app.route('/api_key=<api_key>')
def get_data(api_key):
    # Only serve data when the supplied key matches
    if api_key != API_KEY:
        raise ConnectionRefusedError("Wrong API key!")
    else:
        data = scrape_data()
        return data


# Run the app
if __name__ == "__main__":
    app.run(debug=True, host="localhost", port=5000)

Now, let’s send a GET request, but this time with the API key 12345:

(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt localhost:5000/api_key=12345

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   175  100   175    0     0    653      0 --:--:-- --:--:-- --:--:--   655

Great. Now, only authorized users who know that the API key is 12345 can
scrape our data.
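As a side note, embedding the key in the URL path works for a demo, but a
more common convention is to pass it as a query string parameter, e.g.
localhost:5000/?api_key=12345. A minimal sketch of that variant of the
route, reusing the app, API_KEY, and scrape_data defined above:

from flask import request


@app.route('/')
def get_data_by_query():
    # Read the key from the query string: localhost:5000/?api_key=12345
    if request.args.get('api_key') != API_KEY:
        raise ConnectionRefusedError("Wrong API key!")
    return scrape_data()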
b. Controlling the amount of data

Next, let's allow users to control how much data they receive. Instead of
receiving all of the text, users will be able to choose how many letters they
want. It can look like this:

from flask import Flask
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str)
          default("http://example.com/")
    Returns:
        - text(str)
    '''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


# Create a flask app
app = Flask(__name__)

API_KEY = '12345'


# Implement a route to scrape data
@app.route('/api_key=<api_key>/number_of_letters=<number_of_letters>')
def get_data(api_key, number_of_letters):
    # Only serve data when the supplied key matches
    if api_key != API_KEY:
        raise ConnectionRefusedError("Wrong API key!")
    else:
        data = scrape_data()
        # Route parameters arrive as strings, so cast before slicing
        return data[0:int(number_of_letters)]


# Run the app
if __name__ == "__main__":
    app.run(debug=True, host="localhost", port=5000)

Now let’s say that we want only the first 100 letters. We can send a GET
request like this:

(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt localhost:5000/api_key=12345/number_of_letters=100

And the result is a text file with only the first 100 letters:
scraped_data.txt — only the first 100 letters
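One caveat worth noting: int(number_of_letters) raises a ValueError on
non-numeric input, and the raised ConnectionRefusedError surfaces to users
as a generic 500 server error. A hedged sketch of a friendlier version of the
same route, again reusing app, API_KEY, and scrape_data from above:

from flask import abort


@app.route('/api_key=<api_key>/number_of_letters=<number_of_letters>')
def get_data_validated(api_key, number_of_letters):
    if api_key != API_KEY:
        abort(401, "Wrong API key!")  # 401 Unauthorized instead of a 500
    if not number_of_letters.isdigit():
        abort(400, "number_of_letters must be a positive integer")
    return scrape_data()[:int(number_of_letters)]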

As we can see, APIs are a useful way to send small amounts of data online
and enable users to access services that developers provide.

This concludes the article. Hope you had a good read and learned something
new. If there are any questions, please don’t hesitate to ask in the comment
section.
