Python Projects With API
Minal Pandey
PART 1: Introducing APIs
What is an API?
If you’ve heard the term API before, chances are it’s been used not
to refer to APIs in general, but instead to a specific kind of API, the
web API. A web API allows for information or functionality to be
manipulated by other programs via the internet. For example, with
Twitter’s web API, you can write a program in a language like
Python or JavaScript that can perform tasks such as favoriting
tweets or collecting tweet metadata.
In programming more generally, the term API, short for
Application Programming Interface, refers to a part of a computer
program designed to be used or manipulated by another program,
as opposed to an interface designed to be used or manipulated by a
human. Computer programs frequently need to communicate
amongst themselves or with the underlying operating system, and
APIs are one way they do it. In this tutorial, however, we’ll be
using the term API to refer specifically to web APIs.
When to Create an API
In general, consider an API if:
1. Your data set is large, making download via FTP unwieldy or resource-
intensive.
2. Your users will need to access your data in real time, such as for display
on another website or as part of an application.
3. Your data changes or is updated frequently.
4. Your users only need access to a part of the data at any one time.
5. Your users will need to perform actions other than retrieve data, such as
contributing, updating, or deleting data.
If you have data you wish to share with the world, an API is one
way you can get it into the hands of others. However, APIs are not
always the best way of sharing data with users. If the size of the
data you are providing is relatively small, you can instead provide a
“data dump” in the form of a downloadable JSON, XML, CSV, or
SQLite file. Depending on your resources, this approach can be viable
up to a download size of a few gigabytes.
Remember that you can provide both a data dump and an API, and
individual users may find one or the other to better match their use
case. Open Library, for example, provides both a data dump and an
API, each of which serves different use cases for different users.
API Terminology
When using or building APIs, you will encounter these terms
frequently:
HTTP (Hypertext Transfer Protocol) is the primary means of
communicating data on the web. HTTP implements a number of
“methods,” which tell which direction data is moving and what
should happen to it. The two most common are GET, which pulls
data from a server, and POST, which pushes new data to a server.
URL (Uniform Resource Locator) - An address for a resource
on the web, such as https://programminghistorian.org/about. A
URL consists of a protocol (http://), domain
(programminghistorian.org), and optional path (/about). A URL
describes the location of a specific resource, such as a web page.
When reading about APIs, you may see the
terms URL, request, URI, or endpoint used to describe adjacent
ideas. This tutorial will prefer the terms URL and request to
avoid complication. You can follow a URL or make a GET
request in your browser, so you won’t need any special software
to make requests in this tutorial.
JSON (JavaScript Object Notation) is a text-based data storage
format that is designed to be easy to read for both humans and
machines. JSON is generally the most common format for
returning data through an API, XML being the second most
common.
REST (REpresentational State Transfer) is a philosophy that
describes some best practices for implementing APIs. APIs
designed with some or all of these principles in mind are called
REST APIs. While the API outlined in this lesson uses some
REST principles, there is a great deal of disagreement around
this term. For this reason, I do not describe the example APIs
here as REST APIs, but instead as web or HTTP APIs.
Using APIs
Why Use APIs as a Researcher?
The primary focus of this lesson is on creating an API, not
exploring or using an API that has already been implemented.
However, before we start building our own API, it may be useful to
discuss how APIs are useful for researchers. In this section, we’ll
see how APIs can be useful for approaching historical, textual, or
sociological questions using a “macroscopic” or “distant reading”
approach that makes use of relatively large amounts of information.
In doing so, we’ll familiarize ourselves with the basic elements of a
good API. Considering APIs from the perspective of a user will
come in useful when we begin to design our own API later in the
lesson.
An API Case Study: Sensationalism and Historical
Fires
Imagine that our research area is sensationalism and the press: has
newspaper coverage of major events in the United States become
more or less sensational over time? Narrowing the topic, we might
ask whether press coverage of, for example, urban fires has
increased or decreased with government reporting on fire-related
relief spending.
While we won’t be able to explore this question thoroughly, we can
begin to approach this research space by collecting historical data
on newspaper coverage of fires using an API—in this case,
the Chronicling America Historical Newspaper API. The
Chronicling America API allows access to metadata and text for
millions of scanned newspaper pages. In addition, unlike many
other APIs, it does not require an authentication process, allowing
us to explore the available data immediately without signing up for
an account.
Our initial goal in approaching this research question is to find all
newspaper stories in the Chronicling America database that use the
term “fire.” Typically, use of an API starts with its documentation.
On the Chronicling America API page, we find two pieces of
information critical for getting the data we want from the API: the
API’s base URL and the path corresponding to the function we
want to perform on the API—in this case, searching the database.
Our base URL is:
http://chroniclingamerica.loc.gov
All requests we make to the API must begin with this portion of the
URL. All APIs have a base URL like this one that is the same
across all requests to the API.
Our path is:
/search/pages/results/
If we combine the base URL and the path together into one URL,
we’ll have created a request to the Chronicling America API that
returns all available data in the database:
http://chroniclingamerica.loc.gov/search/pages/results/
If you visit the link above, you’ll see all items available in
Chronicling America (12,243,633 at the time of writing), not just
the entries related to our search term, “fire.” This request also
returns a formatted HTML view, rather than the structured view we
want to use to collect data.
According to the Chronicling America documentation, in order to
get structured data specifically relating to fire, we need to pass one
more kind of data in our request: query parameters.
http://chroniclingamerica.loc.gov/search/pages/results/?
format=json&proxtext=fire
The query parameters follow the ? in the request and are separated
from one another by the & symbol. The first query
parameter, format=json, changes the returned data from HTML to
JSON. The second, proxtext=fire, narrows the returned entries to
those that include our search term.
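As an aside, the same request can be made from Python with the requests library, which we will use throughout the later parts. This is a minimal sketch; the totalItems and items field names are assumptions about the JSON the API returns:
import requests

# same search as the URL above, expressed with a params dict
response = requests.get(
    "http://chroniclingamerica.loc.gov/search/pages/results/",
    params={"format": "json", "proxtext": "fire"},
)
data = response.json()
# "totalItems" and "items" are assumed field names of the JSON response
print(data["totalItems"])
print(data["items"][0]["title"])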
If you follow the above link in your browser, you’ll see a structured
list of the items in the database related to the search term “fire.”
The format of the returned data is called JSON, and is a structured
format that looks like this excerpt from the Chronicling America
results:
"city": [
"Washington"
],
"date": "19220730",
"title": "The Washington Herald.",
"end_year": 1939,
PART 2: Currency Converter in Python
In this part, we will build a currency converter in Python using several data sources, in the following order:
Scraping X-RATES
Scraping Xe
Scraping Yahoo Finance
Using ExchangeRate API
Using Fixer API
To get started, we have to install the required libraries for all the
methods below:
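The install command itself does not appear here; based on the imports used in the scripts below, something along these lines should cover it (package names assumed):
$ pip install requests bs4 python-dateutil yahoo_fin
Scraping X-RATES
The get_exchange_list_xrates() function used by the script below does not appear here. As a rough sketch of what it does, it scrapes the rates tables from x-rates.com; the URL pattern and HTML selectors are assumptions and may break if the site changes:
import requests
from bs4 import BeautifulSoup as bs
from dateutil.parser import parse

def get_exchange_list_xrates(currency, amount=1):
    # fetch the rates table page for the given source currency (URL assumed)
    content = requests.get(f"https://www.x-rates.com/table/?from={currency}&amount={amount}").content
    soup = bs(content, "html.parser")
    # timestamp shown next to the quoted rates (CSS class assumed)
    price_datetime = parse(soup.find_all("span", attrs={"class": "ratesTimestamp"})[1].text)
    exchange_rates = {}
    # every data row holds a currency name and its exchange rate
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            tds = tr.find_all("td")
            if len(tds) >= 2:
                try:
                    exchange_rates[tds[0].text.strip()] = float(tds[1].text)
                except ValueError:
                    continue
    return price_datetime, exchange_rates
With that in place, the script's entry point is: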
if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    amount = float(sys.argv[2])
    price_datetime, exchange_rates = get_exchange_list_xrates(source_currency, amount)
    print("Last updated:", price_datetime)
    pprint(exchange_rates)
Excellent, we use the built-in sys module to get the source currency and the amount from the command line.
Scraping Xe
Now let's scrape Xe. First, the imports:
import requests
from bs4 import BeautifulSoup as bs
import re
from dateutil.parser import parse
Now let's make a function that accepts the source currency, target
currency, and the amount we want to convert, and then returns the
converted amount along with the exchange rate date and time:
def convert_currency_xe(src, dst, amount):
    url = f"https://www.xe.com/currencyconverter/convert/?Amount={amount}&From={src}&To={dst}"
    content = requests.get(url).content
    soup = bs(content, "html.parser")
    exchange_rate_html = soup.find_all("p")[2]
    # get the last updated datetime
    last_updated_datetime = parse(re.search(r"Last updated (.+)", exchange_rate_html.parent.parent.find_all("div")[-2].text).group()[12:])
    return last_updated_datetime, get_digits(exchange_rate_html.text)
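The get_digits() helper used in the return statement is not shown here; it simply extracts the numeric value from the scraped text. A minimal sketch:
def get_digits(text):
    """Return the digits and dots found in `text` as a float."""
    return float("".join(c for c in text if c.isdigit() or c == "."))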
This time, we get the source and target currencies as well as the
amount from the command line, trying to convert 1000 EUR to USD:
$ python currency_converter_xe.py EUR USD 1000
Output:
Last updated datetime: 2022-02-01 13:04:00+00:00
1000.0 EUR = 1125.8987 USD
That's great! Xe usually updates every minute too, so it's real-time!
Scraping Yahoo Finance
Yahoo Finance provides financial news, currency data, stock quotes,
press releases, and financial reports. This section uses
the yahoo_fin library in Python to make a currency exchanger based
on Yahoo Finance data.
Importing the libraries:
import yahoo_fin.stock_info as si
from datetime import datetime, timedelta
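The conversion function itself does not appear here. Yahoo Finance quotes currency pairs as tickers such as EURUSD=X, so a sketch built on yahoo_fin's get_live_price() (the ticker format and the use of the current time as the quote time are assumptions) could be:
def convert_currency_yahoofin(src, dst, amount):
    # construct the currency pair ticker, e.g. "EURUSD=X" (format assumed)
    price = si.get_live_price(f"{src}{dst}=X")
    # yahoo_fin does not hand back the quote time here, so use the current time
    last_updated_datetime = datetime.now()
    return last_updated_datetime, price * amount
It can be run like the previous scripts, for example: $ python currency_converter_yahoofin.py EUR USD 1000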
Output:
Last updated datetime: 2022-02-01 13:26:34
1000.0 EUR = 1126.1261701583862 USD
Using ExchangeRate API
As mentioned at the beginning of this tutorial, if you want a more
reliable way to make a currency converter, you have to choose an
API for that. There are several APIs for this purpose; we have
picked two that are convenient and easy to get started with.
ExchangeRate API supports 161 currencies and offers 1,500 free
requests per month if you want to try it out. There is also an
open API that offers daily updated data, and that's what we are
going to use:
import requests
from dateutil.parser import parse
def get_all_exchange_rates_erapi(src):
    url = f"https://open.er-api.com/v6/latest/{src}"
    # request the open ExchangeRate API and convert to Python dict using .json()
    data = requests.get(url).json()
    if data["result"] == "success":
        # request successful
        # get the last updated datetime
        last_updated_datetime = parse(data["time_last_update_utc"])
        # get the exchange rates
        exchange_rates = data["rates"]
        return last_updated_datetime, exchange_rates
The above function requests the open API and returns the exchange
rates for all the currencies with the latest date and time. Let's use
this function to make a currency converter function:
def convert_currency_erapi(src, dst, amount):
    # get all the exchange rates
    last_updated_datetime, exchange_rates = get_all_exchange_rates_erapi(src)
    # convert by simply getting the target currency exchange rate and multiplying by the amount
    return last_updated_datetime, exchange_rates[dst] * amount
Running it:
$ python currency_converter_erapi.py EUR USD 1000
Output:
Last updated datetime: 2022-02-01 00:02:31+00:00
1000.0 EUR = 1120.0 USD
The rates update daily, and since it's an open API, it does not offer
exact real-time figures; you can freely sign up for an API key to get
more precise exchange rates.
Using Fixer API
One of the promising alternatives is Fixer API. It is a simple and
lightweight API for real-time and historical foreign exchange rates.
You can easily create an account and get the API key.
After you've done that, you can use the /convert endpoint to convert
from one currency to another. However, that's not included in the
free plan and requires upgrading your account.
There is the /latest endpoint, which does not require an upgrade and
works just fine with a free account. It returns the exchange rates for
your region's currency. We can pass the source and target
currencies we want to convert and calculate the exchange rate
between the two. Here's the function:
import requests
from datetime import datetime
API_KEY = "<YOUR_API_KEY_HERE>"
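The function body itself does not appear here, so the following is a minimal sketch against Fixer's /latest endpoint (the URL, the response fields, and the cross-rate calculation are assumptions based on Fixer's documentation):
def convert_currency_fixerapi_free(src, dst, amount):
    """Convert `amount` from `src` to `dst` using the free /latest endpoint (a sketch)."""
    url = f"http://data.fixer.io/api/latest?access_key={API_KEY}&symbols={src},{dst}"
    data = requests.get(url).json()
    if data.get("success"):
        rates = data["rates"]
        last_updated_datetime = datetime.fromtimestamp(data["timestamp"])
        # /latest quotes everything against the plan's base currency,
        # so compute the src->dst rate as a cross rate
        return last_updated_datetime, (rates[dst] / rates[src]) * amount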
Below is the function that uses the /convert endpoint in case you
have an upgraded account:
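That function does not appear here either; a corresponding sketch against Fixer's documented /convert endpoint (parameter and field names assumed):
def convert_currency_fixerapi(src, dst, amount):
    """Convert using the /convert endpoint (requires an upgraded plan; a sketch)."""
    url = f"http://data.fixer.io/api/convert?access_key={API_KEY}&from={src}&to={dst}&amount={amount}"
    data = requests.get(url).json()
    if data.get("success"):
        last_updated_datetime = datetime.fromtimestamp(data["info"]["timestamp"])
        return last_updated_datetime, data["result"]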
if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    destination_currency = sys.argv[2]
    amount = float(sys.argv[3])
    # free account
    last_updated_datetime, exchange_rate = convert_currency_fixerapi_free(source_currency, destination_currency, amount)
    # upgraded account, uncomment if you have one
    # last_updated_datetime, exchange_rate = convert_currency_fixerapi(source_currency, destination_currency, amount)
    print("Last updated datetime:", last_updated_datetime)
    print(f"{amount} {source_currency} = {exchange_rate} {destination_currency}")
Before running the script, make sure to replace API_KEY with the
API key you get when registering for an account.
Running the script:
Fullcode:
currency_converter_xrates.py
import requests
from bs4 import BeautifulSoup as bs
from dateutil.parser import parse
from pprint import pprint
if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    amount = float(sys.argv[2])
    price_datetime, exchange_rates = get_exchange_list_xrates(source_currency, amount)
    print("Last updated:", price_datetime)
    pprint(exchange_rates)
currency_converter_xe.py
import requests
from bs4 import BeautifulSoup as bs
import re
from dateutil.parser import parse

def convert_currency_xe(src, dst, amount):
    url = f"https://www.xe.com/currencyconverter/convert/?Amount={amount}&From={src}&To={dst}"
    content = requests.get(url).content
    soup = bs(content, "html.parser")
    exchange_rate_html = soup.find_all("p")[2]
    # get the last updated datetime
    last_updated_datetime = parse(re.search(r"Last updated (.+)", exchange_rate_html.parent.parent.find_all("div")[-2].text).group()[12:])
    return last_updated_datetime, get_digits(exchange_rate_html.text)
if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    destination_currency = sys.argv[2]
    amount = float(sys.argv[3])
    last_updated_datetime, exchange_rate = convert_currency_xe(source_currency, destination_currency, amount)
    print("Last updated datetime:", last_updated_datetime)
    print(f"{amount} {source_currency} = {exchange_rate} {destination_currency}")
currency_converter_yahoofin.py
import yahoo_fin.stock_info as si
from datetime import datetime, timedelta

if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    destination_currency = sys.argv[2]
    amount = float(sys.argv[3])
    last_updated_datetime, exchange_rate = convert_currency_yahoofin(source_currency, destination_currency, amount)
    print("Last updated datetime:", last_updated_datetime)
    print(f"{amount} {source_currency} = {exchange_rate} {destination_currency}")
currency_converter_erapi.py
import requests
from dateutil.parser import parse

def get_all_exchange_rates_erapi(src):
    url = f"https://open.er-api.com/v6/latest/{src}"
    # request the open ExchangeRate API and convert to Python dict using .json()
    data = requests.get(url).json()
    if data["result"] == "success":
        # request successful
        # get the last updated datetime
        last_updated_datetime = parse(data["time_last_update_utc"])
        # get the exchange rates
        exchange_rates = data["rates"]
        return last_updated_datetime, exchange_rates
if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    destination_currency = sys.argv[2]
    amount = float(sys.argv[3])
    last_updated_datetime, exchange_rate = convert_currency_erapi(source_currency, destination_currency, amount)
    print("Last updated datetime:", last_updated_datetime)
    print(f"{amount} {source_currency} = {exchange_rate} {destination_currency}")
currency_converter_fixerapi.py
import requests
from datetime import date, datetime

API_KEY = "<YOUR_API_KEY_HERE>"

if __name__ == "__main__":
    import sys
    source_currency = sys.argv[1]
    destination_currency = sys.argv[2]
    amount = float(sys.argv[3])
    # free account
    last_updated_datetime, exchange_rate = convert_currency_fixerapi_free(source_currency, destination_currency, amount)
    # upgraded account, uncomment if you have one
    # last_updated_datetime, exchange_rate = convert_currency_fixerapi(source_currency, destination_currency, amount)
    print("Last updated datetime:", last_updated_datetime)
    print(f"{amount} {source_currency} = {exchange_rate} {destination_currency}")
PART 3: Webhooks in Python with
Flask
Learn how to create a streaming application with real-time charting by
consuming webhooks with the help of Flask, Redis, SocketIO and other
libraries in Python.
Introduction
A webhook can be thought of as a type of API that is driven by
events rather than requests. Instead of one application making a
request to another to receive a response, a webhook is a service that
allows one program to send data to another as soon as a particular
event takes place.
Webhooks are sometimes referred to as reverse APIs, because
communication is initiated by the application sending the data
rather than the one receiving it. With web services becoming
increasingly interconnected, webhooks are seeing more action as a
lightweight solution for enabling real-time notifications and data
updates without the need to develop a full-scale API.
Webhooks usually act as messengers for smaller data. They help in
sending messages, alerts, notifications and real-time information
from the server-side application to the client-side application.
Let's say, for instance, you want your application to get notified
when tweets that mention a certain account and contain a specific
hashtag are published. Instead of your application continuously
asking Twitter for new posts meeting these criteria, it makes much
more sense for Twitter to send a notification to your application
only when such an event takes place.
This is the purpose of a webhook: instead of having to repeatedly
request the data (a polling mechanism), the receiving application can
sit back and get what it needs without having to send repeated
requests to another system.
Webhooks can open up a lot of possibilities.
Pre-requisites
As our requirements stand, the following components come into play: Flask, Flask-SocketIO, Redis, Faker, and requests (all installed below).
If this tutorial intrigues you and makes you want to dive into the
code immediately, you can check this repository for reviewing the
code used in this article.
Setup
Setting up the package is quite simple and straightforward. Of
course, you need Python 3 installed on your system, and it is highly
recommended to set up a virtual environment where we will install
the needed libraries:
$ pip install Faker==8.2.0 Flask==1.1.2 Flask-SocketIO==5.0.1 redis==3.5.3 requests==2.25.1
At the end of this tutorial, our folder structure will look like the
following:
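Based on the files referenced throughout this part, the layout is roughly the following (static asset names taken from the templates shown later):
.
├── config.py
├── init_producer.py
├── tasks_producer.py
├── app_producer.py
├── init_consumer.py
├── app_consumer.py
├── templates/
│   ├── producer.html
│   └── consumer.html
└── static/
    ├── css/   (bootstrap.min.css, Chart.min.css)
    └── js/    (jquery.min.js, socket.io.js, bootstrap.min.js, Chart.min.js)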
Let's start writing the actual code. First, let's define the
configuration parameters for our application within config.py :
#Application configuration File
################################
#Secret key that will be used by Flask for securely signing the session cookie
# and can be used for other security related needs
SECRET_KEY = 'SECRET_KEY'
#######################################
#Minimum Number Of Tasks To Generate
MIN_NBR_TASKS = 1
#Maximum Number Of Tasks To Generate
MAX_NBR_TASKS = 100
#Time to wait when producing tasks
WAIT_TIME = 1
#Webhook endpoint Mapping to the listener
WEBHOOK_RECEIVER_URL = 'http://localhost:5001/consumetasks'
#######################################
#Map to the REDIS Server Port
BROKER_URL = 'redis://localhost:6379'
#######################################
# init_producer.py
from flask import Flask
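The rest of init_producer.py is not shown here; since app_producer.py imports app from it and the settings live in config.py, it presumably just creates the Flask app and loads the configuration, roughly:
# assumed remainder of init_producer.py: create the app and load config.py
app = Flask(__name__)
app.config.from_object('config')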
# tasks_producer.py
import random
from faker.providers import BaseProvider
from faker import Faker
import config
import time
import requests
import json
import uuid
# Define a TaskProvider
class TaskProvider(BaseProvider):
def task_priority(self):
severity_levels = [
'Low', 'Moderate', 'Major', 'Critical'
]
return severity_levels[random.randint(0, len(severity_levels)-1)]
# Create a Faker instance and seeding to have the same results every time
we execute the script
# Return data in English
fakeTasks = Faker('en_US')
# Seed the Faker instance to have the same results every time we run the
program
fakeTasks.seed_instance(0)
# Assign the TaskProvider to the Faker instance
fakeTasks.add_provider(TaskProvider)
def send_webhook(msg):
    """
    Send a webhook to a specified URL
    :param msg: task details
    :return:
    """
    try:
        # Post a webhook message
        # default is a function applied to objects that are not serializable = it converts them to str
        resp = requests.post(config.WEBHOOK_RECEIVER_URL, data=json.dumps(msg, sort_keys=True, default=str),
                             headers={'Content-Type': 'application/json'}, timeout=1.0)
        # Raises an HTTPError if an error has occurred during the process (used for debugging).
        resp.raise_for_status()
    except requests.exceptions.HTTPError as err:
        #print("An HTTP Error occurred", repr(err))
        pass
    except requests.exceptions.ConnectionError as err:
        #print("An Error Connecting to the API occurred", repr(err))
        pass
    except requests.exceptions.Timeout as err:
        #print("A Timeout Error occurred", repr(err))
        pass
    except requests.exceptions.RequestException as err:
        #print("An Unknown Error occurred", repr(err))
        pass
    except:
        pass
    else:
        return resp.status_code

if __name__ == "__main__":
    for resp, total, msg in produce_bunch_tasks():
        pass
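The produce_bunch_tasks() generator driving the loop above (and streamed to the template in the next section) is not shown here. A sketch that matches how its results are unpacked elsewhere (a response status code, the total count, and the task message), built on the config values and on produce_task() from the full code listing at the end of this part:
def produce_bunch_tasks():
    """Generate a random batch of tasks and send each one as a webhook (a sketch)."""
    n = random.randint(config.MIN_NBR_TASKS, config.MAX_NBR_TASKS)
    batchid = str(uuid.uuid4())
    for i in range(1, n + 1):
        msg = produce_task(batchid, i)
        resp = send_webhook(msg)
        time.sleep(config.WAIT_TIME)
        yield resp, n, msg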
Now let's build our Flask app that emulates a service producing
tasks:
#app_producer.py
from flask import Response, render_template
from init_producer import app
import tasks_producer
def stream_template(template_name, **context):
app.update_template_context(context)
t = app.jinja_env.get_template(template_name)
rv = t.stream(context)
rv.enable_buffering(5)
return rv
@app.route("/", methods=['GET'])
def index():
return render_template('producer.html')
@app.route('/producetasks', methods=['POST'])
def producetasks():
print("producetasks")
return Response(stream_template('producer.html', data=
tasks_producer.produce_bunch_tasks() ))
if __name__ == "__main__":
app.run(host="localhost",port=5000, debug=True)
<!doctype html>
<html>
<head>
<title>Tasks Producer</title>
<style>
.content {
width: 100%;
}
.container{
max-width: none;
}
</style>
<meta name="viewport" content="width=device-width, initial-
scale=1.0"/>
</head>
<body class="container">
<div class="content">
<form method='post' id="produceTasksForm" action =
"/producetasks">
<button style="height:20%x;width:100%" type="submit"
id="produceTasks">Produce Tasks</button>
</form>
</div>
<div class="content">
<div id="Messages" class="content" style="height:400px;width:100%;
border:2px solid gray; overflow-y:scroll;"></div>
{% for rsp,total, msg in data: %}
<script>
var rsp = "{{ rsp }}";
var total = "{{ total }}";
var msg = "{{ msg }}";
var lineidx = "{{ loop.index }}";
//If the webhook request succeeds color it in blue else in red.
if (rsp == '200') {
rsp = rsp.fontcolor("blue");
}
else {
rsp = rsp.fontcolor("red");
}
//Add the details of the generated task to the Messages section.
document.getElementById('Messages').innerHTML += "<br>" +
lineidx + " out of " + total + " -- "+ rsp + " -- " + msg;
</script>
{% endfor %}
</div>
</body>
</html>
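Next comes init_consumer.py. Only the Socket.IO setup is shown below; because app_consumer.py imports app from it, the file presumably also creates its own Flask app and loads config.py, roughly like this (an assumption):
# assumed preamble of init_consumer.py
from flask import Flask
app = Flask(__name__)
app.config.from_object('config')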
# init_consumer.py
#Setup the Flask SocketIO integration while mapping the Redis Server.
from flask_socketio import SocketIO
socketio = SocketIO(app, logger=True, engineio_logger=True, message_queue=app.config['BROKER_URL'])

# app_consumer.py
#Execute on connecting
@socketio.on('connect', namespace='/collectHooks')
def socket_connect():
    # Display message upon connecting to the namespace
    print('Client Connected To NameSpace /collectHooks - ', request.sid)

#Execute on disconnecting
@socketio.on('disconnect', namespace='/collectHooks')
def socket_disconnect():
    # Display message upon disconnecting from the namespace
    print('Client disconnected From NameSpace /collectHooks - ', request.sid)
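The webhook receiver endpoint itself, the piece that turns the POSTed task into a Socket.IO message, is not shown here. A sketch, with the route path taken from WEBHOOK_RECEIVER_URL in config.py and the event name and namespace taken from consumer.html below (the room handling is simplified):
#Consume the webhook POSTed by the producer and forward it to the browsers
@app.route('/consumetasks', methods=['POST'])
def consume_tasks():
    if request.method == 'POST':
        # relay the received task to all clients on the /collectHooks namespace
        socketio.emit('msg', json.dumps(request.get_json()), namespace='/collectHooks')
        return 'OK'

#Join a room when the client asks for it (consumer.html emits 'join_room' on connect)
@socketio.on('join_room', namespace='/collectHooks')
def on_join_room():
    join_room(request.sid)

if __name__ == "__main__":
    socketio.run(app, host="localhost", port=5001, debug=True)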
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Tasks Consumer</title>
<link rel="stylesheet" href="
{{url_for('static',filename='css/bootstrap.min.css')}}">
<link rel="stylesheet" href="
{{url_for('static',filename='css/Chart.min.css')}}">
</head>
<body>
<div class="content">
<div id="Messages" class="content" style="height:200px;width:100%;
border:1px solid gray; overflow-y:scroll;"></div>
</div>
<div class="container">
<div class="row">
<div class="col-12">
<div class="card">
<div class="card-body">
<canvas id="canvas"></canvas>
</div>
</div>
</div>
</div>
</div>
<!-- import the jquery library -->
<script src="{{ url_for('static',filename='js/jquery.min.js') }}"></script>
<!-- import the socket.io library -->
<script src="{{ url_for('static',filename='js/socket.io.js') }}"></script>
<!-- import the bootstrap library -->
<script src="{{ url_for('static',filename='js/bootstrap.min.js') }}">
</script>
<!-- import the Chart library -->
<script src="{{ url_for('static',filename='js/Chart.min.js') }}"></script>
<script>
$(document).ready(function(){
const config = {
//Type of the chart - Bar Chart
type: 'bar',
//Data for our chart
data: {
labels: ['Low','Moderate','Major','Critical'],
datasets: [{
label: "Count Of Tasks",
//Setting a color for each bar
backgroundColor: ['green','blue','yellow','red'],
borderColor: 'rgb(255, 99, 132)',
data: [0,0,0,0],
fill: false,
}],
},
//Configuration options
options: {
responsive: true,
title: {
display: true,
text: 'Tasks Priority Matrix'
},
tooltips: {
mode: 'index',
intersect: false,
},
hover: {
mode: 'nearest',
intersect: true
},
scales: {
xAxes: [{
display: true,
scaleLabel: {
display: true,
labelString: 'Priority'
}
}],
yAxes: [{
display: true
,ticks: {
beginAtZero: true
}
,scaleLabel: {
display: true,
labelString: 'Total'
}
}]
}
}
};
const context = document.getElementById('canvas').getContext('2d');
//Creating the bar chart
const lineChart = new Chart(context, config);
//Reserved for websocket manipulation
var namespace='/collectHooks';
var url = 'http://' + document.domain + ':' + location.port + namespace;
var socket = io.connect(url);
//When connecting to the socket join the room
socket.on('connect', function() {
socket.emit('join_room');
});
//When receiving a message
socket.on('msg', function(data) {
    var msg = JSON.parse(data);
    var newLine = $('<li>' + 'Batch ID. = ' + msg.batchid + ' -- Task ID. = ' + msg.id + ' -- Owner = ' + msg.owner + ' -- Priority = ' + msg.priority + '</li>');
    newLine.css("color", "blue");
    $("#Messages").append(newLine);
    //Retrieve the index of the priority of the received message
    var lindex = config.data.labels.indexOf(msg.priority);
    //Increment the value of the priority of the received message
    config.data.datasets[0].data[lindex] += 1;
    //Update the chart
    lineChart.update();
});
});
</script>
</body>
</html>
Now let's test our program; please proceed as per the following steps:
Start the Redis server and make sure the Redis instance is running on TCP port 6379.
Run app_producer.py:
$ python app_producer.py
Open up another terminal and run app_consumer.py:
$ python app_consumer.py
Summary
Webhooks are an important part of the web and they are becoming
more popular. They allow your applications to exchange data
instantly and seamlessly.
While webhooks are similar to APIs, the two play different roles,
each with its own use case. Hopefully, this article has
expanded your understanding; remember that the key to getting
the most out of webhooks is to know when they are the right choice
for your application.
Fullcode:
config.py
#Application configuration File
################################
#Secret key that will be used by Flask for securely signing the session
cookie
# and can be used for other security related needs
SECRET_KEY = 'SECRET_KEY'
#######################################
#Minimum Number Of Tasks To Generate
MIN_NBR_TASKS = 1
#Maximum Number Of Tasks To Generate
MAX_NBR_TASKS = 100
#Time to wait when producing tasks
WAIT_TIME = 1
#Webhook endpoint Mapping to the listener
WEBHOOK_RECEIVER_URL = 'http://localhost:5001/consumetasks'
#######################################
#Map to the REDIS Server Port
BROKER_URL = 'redis://localhost:6379'
#######################################
tasks_producer.py
import random
from faker.providers import BaseProvider
from faker import Faker
import config
import time
import requests
import json
import uuid
# Define a TaskProvider
class TaskProvider(BaseProvider):
def task_priority(self):
severity_levels = [
'Low', 'Moderate', 'Major', 'Critical'
]
return severity_levels[random.randint(0, len(severity_levels)-1)]
# Create a Faker instance and seed it to have the same results every time we execute the script
# Return data in English
fakeTasks = Faker('en_US')
# Seed the Faker instance to have the same results every time we run the program
fakeTasks.seed_instance(0)
# Assign the TaskProvider to the Faker instance
fakeTasks.add_provider(TaskProvider)
# Generate A Fake Task
def produce_task(batchid, taskid):
# Message composition
message = {
'batchid': batchid, 'id': taskid, 'owner': fakeTasks.unique.name(),
'priority': fakeTasks.task_priority()
# ,'raised_date':fakeTasks.date_time_this_year()
# ,'description':fakeTasks.text()
}
return message
def send_webhook(msg):
    """
    Send a webhook to a specified URL
    :param msg: task details
    :return:
    """
    try:
        # Post a webhook message
        # default is a function applied to objects that are not serializable = it converts them to str
        resp = requests.post(config.WEBHOOK_RECEIVER_URL, data=json.dumps(msg, sort_keys=True, default=str),
                             headers={'Content-Type': 'application/json'}, timeout=1.0)
        # Raises an HTTPError if an error has occurred during the process (used for debugging).
        resp.raise_for_status()
    except requests.exceptions.HTTPError as err:
        #print("An HTTP Error occurred", repr(err))
        pass
    except requests.exceptions.ConnectionError as err:
        #print("An Error Connecting to the API occurred", repr(err))
        pass
    except requests.exceptions.Timeout as err:
        #print("A Timeout Error occurred", repr(err))
        pass
    except requests.exceptions.RequestException as err:
        #print("An Unknown Error occurred", repr(err))
        pass
    except:
        pass
    else:
        return resp.status_code

if __name__ == "__main__":
    for resp, total, msg in produce_bunch_tasks():
        pass
init_producer.py
app_producer.py
#Flask imports
from flask import Response, render_template
from init_producer import app
import tasks_producer

def stream_template(template_name, **context):
    app.update_template_context(context)
    t = app.jinja_env.get_template(template_name)
    rv = t.stream(context)
    rv.enable_buffering(5)
    return rv
@app.route("/", methods=['GET'])
def index():
return render_template('producer.html')
@app.route('/producetasks', methods=['POST'])
def producetasks():
print("producetasks")
return Response(stream_template('producer.html', data=
tasks_producer.produce_bunch_tasks() ))
if __name__ == "__main__":
app.run(host="localhost",port=5000, debug=True)
init_consumer.py
#Setup the Flask SocketIO integration while mapping the Redis Server.
from flask_socketio import SocketIO
socketio = SocketIO(app, logger=True, engineio_logger=True, message_queue=app.config['BROKER_URL'])
app_consumer.py
#Flask imports
from flask import render_template, request,session
from flask_socketio import join_room
from init_consumer import app, socketio
import json
import uuid
#Execute on connecting
@socketio.on('connect', namespace='/collectHooks')
def socket_connect():
    # Display message upon connecting to the namespace
    print('Client Connected To NameSpace /collectHooks - ', request.sid)

#Execute on disconnecting
@socketio.on('disconnect', namespace='/collectHooks')
def socket_disconnect():
    # Display message upon disconnecting from the namespace
    print('Client disconnected From NameSpace /collectHooks - ', request.sid)
templates/producer.html
<!doctype html>
<html>
<head>
<title>Tasks Producer</title>
<style>
.content {
width: 100%;
}
.container{
max-width: none;
}
</style>
<meta name="viewport" content="width=device-width, initial-
scale=1.0"/>
</head>
<body class="container">
<div class="content">
<form method='post' id="produceTasksForm" action =
"/producetasks">
<button style="height:20%x;width:100%" type="submit"
id="produceTasks">Produce Tasks</button>
</form>
</div>
<div class="content">
<div id="Messages" class="content" style="height:400px;width:100%;
border:2px solid gray; overflow-y:scroll;"></div>
{% for rsp,total, msg in data: %}
<script>
var rsp = "{{ rsp }}";
var total = "{{ total }}";
var msg = "{{ msg }}";
var lineidx = "{{ loop.index }}";
//If the webhook request succeeds color it in blue else in red.
if (rsp == '200') {
rsp = rsp.fontcolor("blue");
}
else {
rsp = rsp.fontcolor("red");
}
//Add the details of the generated task to the Messages section.
document.getElementById('Messages').innerHTML += "<br>" +
lineidx + " out of " + total + " -- "+ rsp + " -- " + msg;
</script>
{% endfor %}
</div>
</body>
</html>
templates/consumer.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Tasks Consumer</title>
<div class="container">
<div class="row">
<div class="col-12">
<div class="card">
<div class="card-body">
<canvas id="canvas"></canvas>
</div>
</div>
</div>
</div>
</div>
PART 4: Use YouTube API in Python
Note: If this is the first time you use Google APIs, you may need to
simply create an OAuth Consent screen and add your email as a
testing user.
Now that you have set up the YouTube API, place your credentials.json
in the current directory of your notebook/Python file, and let's get
started.
import urllib.parse as p
import re
import os
import pickle
# Google API client libraries (used by youtube_authenticate() below)
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]
SCOPES is a list of scopes for using the YouTube API; we're using this
one to view all YouTube data without any problems.
Now let's make the function that authenticates with YouTube API:
def youtube_authenticate():
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "credentials.json"
    creds = None
    # the file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)
    # if there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(client_secrets_file, SCOPES)
            creds = flow.run_local_server(port=0)
        # save the credentials for the next run
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)
    # build and return the YouTube API service object
    return build(api_service_name, api_version, credentials=creds)
Now that you have everything set up, let's begin with extracting
YouTube video details, such as title, description, upload time, and
even statistics such as view count, and like count.
The following function will help us extract the video ID (that we'll
need in the API) from a video URL:
def get_video_id_by_url(url):
"""
Return the Video ID from the video `url`
"""
# split URL parts
parsed_url = p.urlparse(url)
# get the video ID by parsing the query of the URL
video_id = p.parse_qs(parsed_url.query).get("v")
if video_id:
return video_id[0]
else:
raise Exception(f"Wasn't able to parse video URL: {url}")
def print_video_infos(video_response):
items = video_response.get("items")[0]
# get the snippet, statistics & content details from the video response
snippet = items["snippet"]
statistics = items["statistics"]
content_details = items["contentDetails"]
# get infos from the snippet
channel_title = snippet["channelTitle"]
title = snippet["title"]
description = snippet["description"]
publish_time = snippet["publishedAt"]
# get stats infos
comment_count = statistics["commentCount"]
like_count = statistics["likeCount"]
view_count = statistics["viewCount"]
# get duration from content details
duration = content_details["duration"]
# duration in the form of something like 'PT5H50M15S'
# parsing it to be something like '5:50:15'
parsed_duration = re.search(f"PT(\d+H)?(\d+M)?(\d+S)",
duration).groups()
duration_str = ""
for d in parsed_duration:
if d:
duration_str += f"{d[:-1]}:"
duration_str = duration_str.strip(":")
print(f"""\
Title: {title}
Description: {description}
Channel Title: {channel_title}
Publish time: {publish_time}
Duration: {duration_str}
Number of comments: {comment_count}
Number of likes: {like_count}
Number of views: {view_count}
""")
video_url = "https://www.youtube.com/watch?v=jNQXAC9IVRw&ab_channel=jawed"
# parse video ID from URL
video_id = get_video_id_by_url(video_url)
# make API call to get video info
response = get_video_details(youtube, id=video_id)
# print extracted video infos
print_video_infos(response)
We first get the video ID from the URL, and then we get the
response from the API call and finally print the data. Here is the
output:
This time we care about the snippet, and we use search() instead
of videos() like in the previously defined get_video_details() function.
Let's, for example, search for "python" and limit the results to only
2:
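The search wrapper and the example call do not appear here; a sketch built on the client's search().list() method (the wrapper name matches search_by_keyword.py in the full code listing):
def search(youtube, **kwargs):
    # only the snippet part is needed for search results
    return youtube.search().list(part="snippet", **kwargs).execute()

# search for the query 'python' and retrieve 2 items only
response = search(youtube, q="python", maxResults=2)
items = response.get("items")
for item in items:
    # each result carries a video ID we can pass to get_video_details()
    video_id = item["id"]["videoId"]
    video_response = get_video_details(youtube, id=video_id)
    print_video_infos(video_response)
    print("=" * 50)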
Now we can parse the channel URL. Let's define our functions to
call the YouTube API:
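The channel helpers themselves (get_channel_id_by_url(), get_channel_details() and get_channel_videos(), as imported in channel_details.py later) do not appear here. A simplified sketch, built on parse_channel_url() from utils.py and the channels() and search() endpoints (the /c/ custom-URL case is left out):
def get_channel_details(youtube, **kwargs):
    # snippet and statistics hold the fields printed below
    return youtube.channels().list(part="statistics,snippet,contentDetails", **kwargs).execute()

def get_channel_videos(youtube, **kwargs):
    # list a channel's videos through the search endpoint filtered by channelId
    return youtube.search().list(**kwargs).execute()

def get_channel_id_by_url(youtube, url):
    """Resolve a channel URL to a channel ID (simplified sketch)."""
    method, value = parse_channel_url(url)
    if method == "channel":
        # the URL already contains the channel ID
        return value
    elif method == "user":
        # look the channel up by its legacy username
        response = youtube.channels().list(forUsername=value, part="id").execute()
        items = response.get("items")
        if items:
            return items[0]["id"]
    raise Exception(f"Cannot find the ID of channel URL: {url}")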
channel_url = "https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ"
# get the channel ID from the URL
channel_id = get_channel_id_by_url(youtube, channel_url)
# get the channel details
response = get_channel_details(youtube, id=channel_id)
# extract channel infos
snippet = response["items"][0]["snippet"]
statistics = response["items"][0]["statistics"]
channel_country = snippet["country"]
channel_description = snippet["description"]
channel_creation_date = snippet["publishedAt"]
channel_title = snippet["title"]
channel_subscriber_count = statistics["subscriberCount"]
channel_video_count = statistics["videoCount"]
channel_view_count = statistics["viewCount"]
print(f"""
Title: {channel_title}
Published At: {channel_creation_date}
Description: {channel_description}
Country: {channel_country}
Number of videos: {channel_video_count}
Number of subscribers: {channel_subscriber_count}
Total views: {channel_view_count}
""")
# the following is grabbing channel videos
# number of pages you want to get
n_pages = 2
# counting number of videos grabbed
n_videos = 0
next_page_token = None
for i in range(n_pages):
params = {
'part': 'snippet',
'q': '',
'channelId': channel_id,
'type': 'video',
}
if next_page_token:
params['pageToken'] = next_page_token
res = get_channel_videos(youtube, **params)
channel_videos = res.get("items")
for video in channel_videos:
n_videos += 1
video_id = video["id"]["videoId"]
# easily construct video URL by its ID
video_url = f"https://www.youtube.com/watch?v={video_id}"
video_response = get_video_details(youtube, id=video_id)
print(f"================Video #
{n_videos}================")
# print the video details
print_video_infos(video_response)
print(f"Video URL: {video_url}")
print("="*40)
print("*"*100)
# if there is a next page, then add it to our parameters
# to proceed to the next page
if "nextPageToken" in res:
next_page_token = res["nextPageToken"]
We first get the channel ID from the URL, and then we make an
API call to get the channel details and print them.
After that, we specify the number of pages of videos we want to
extract. The default is ten videos per page, and we can also change
that by passing the maxResults parameter. We iterate on each page
and, for every video returned, make another API call to get its
details and print them along with the constructed video URL.
We're extracting the comment itself, the number of likes, and the
last updated date; you can explore the response dictionary to get
various other useful information.
You're free to edit the parameters we passed, such as increasing
maxResults or changing the order. Please check the documentation
page for this API endpoint.
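The get_comments() wrapper used in comments.py below is not defined in this excerpt; it is a thin wrapper around the commentThreads endpoint, roughly:
def get_comments(youtube, **kwargs):
    # commentThreads() returns the top-level comments of a video or channel
    return youtube.commentThreads().list(part="snippet", **kwargs).execute()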
Summary
YouTube Data API provides a lot more than what we covered here.
If you have a YouTube channel, you can upload, update and delete
videos, and much more.
Fullcode:
utils.py
import urllib.parse as p
import re
import os
import pickle
# Google API client libraries (used by youtube_authenticate() below)
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]
def youtube_authenticate():
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "credentials.json"
    creds = None
    # the file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)
    # if there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(client_secrets_file, SCOPES)
            creds = flow.run_local_server(port=0)
        # save the credentials for the next run
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)
    # build and return the YouTube API service object
    return build(api_service_name, api_version, credentials=creds)
def print_video_infos(video_response):
items = video_response.get("items")[0]
# get the snippet, statistics & content details from the video response
snippet = items["snippet"]
statistics = items["statistics"]
content_details = items["contentDetails"]
# get infos from the snippet
channel_title = snippet["channelTitle"]
title = snippet["title"]
description = snippet["description"]
publish_time = snippet["publishedAt"]
# get stats infos
comment_count = statistics["commentCount"]
like_count = statistics["likeCount"]
view_count = statistics["viewCount"]
# get duration from content details
duration = content_details["duration"]
# duration in the form of something like 'PT5H50M15S'
# parsing it to be something like '5:50:15'
parsed_duration = re.search(f"PT(\d+H)?(\d+M)?(\d+S)",
duration).groups()
duration_str = ""
for d in parsed_duration:
if d:
duration_str += f"{d[:-1]}:"
duration_str = duration_str.strip(":")
print(f"""
Title: {title}
Description: {description}
Channel Title: {channel_title}
Publish time: {publish_time}
Duration: {duration_str}
Number of comments: {comment_count}
Number of likes: {like_count}
Number of views: {view_count}
""")
def parse_channel_url(url):
"""
This function takes channel `url` to check whether it includes a
channel ID, user ID or channel name
"""
path = p.urlparse(url).path
id = path.split("/")[-1]
if "/c/" in path:
return "c", id
elif "/channel/" in path:
return "channel", id
elif "/user/" in path:
return "user", id
video_details.py
if __name__ == "__main__":
    # authenticate to YouTube API
    youtube = youtube_authenticate()
    video_url = "https://www.youtube.com/watch?v=jNQXAC9IVRw&ab_channel=jawed"
    # parse video ID from URL
    video_id = get_video_id_by_url(video_url)
    # make API call to get video info
    response = get_video_details(youtube, id=video_id)
    # print extracted video infos
    print_video_infos(response)
search_by_keyword.py
if __name__ == "__main__":
# authenticate to YouTube API
youtube = youtube_authenticate()
# search for the query 'python' and retrieve 2 items only
response = search(youtube, q="python", maxResults=2)
items = response.get("items")
for item in items:
# get the video ID
video_id = item["id"]["videoId"]
# get the video details
video_response = get_video_details(youtube, id=video_id)
# print the video details
print_video_infos(video_response)
print("="*50)
channel_details.py
from utils import (
    youtube_authenticate,
    get_channel_id_by_url,
    get_channel_details,
    get_channel_videos,
    get_video_details,
    print_video_infos
)
if __name__ == "__main__":
# authenticate to YouTube API
youtube = youtube_authenticate()
channel_url = "https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ"
# get the channel ID from the URL
channel_id = get_channel_id_by_url(youtube, channel_url)
# get the channel details
response = get_channel_details(youtube, id=channel_id)
# extract channel infos
snippet = response["items"][0]["snippet"]
statistics = response["items"][0]["statistics"]
channel_country = snippet["country"]
channel_description = snippet["description"]
channel_creation_date = snippet["publishedAt"]
channel_title = snippet["title"]
channel_subscriber_count = statistics["subscriberCount"]
channel_video_count = statistics["videoCount"]
channel_view_count = statistics["viewCount"]
print(f"""
Title: {channel_title}
Published At: {channel_creation_date}
Description: {channel_description}
Country: {channel_country}
Number of videos: {channel_video_count}
Number of subscribers: {channel_subscriber_count}
Total views: {channel_view_count}
""")
# the following is grabbing channel videos
# number of pages you want to get
n_pages = 2
# counting number of videos grabbed
n_videos = 0
next_page_token = None
for i in range(n_pages):
params = {
'part': 'snippet',
'q': '',
'channelId': channel_id,
'type': 'video',
}
if next_page_token:
params['pageToken'] = next_page_token
res = get_channel_videos(youtube, **params)
channel_videos = res.get("items")
for video in channel_videos:
n_videos += 1
video_id = video["id"]["videoId"]
# easily construct video URL by its ID
video_url = f"https://www.youtube.com/watch?v={video_id}"
video_response = get_video_details(youtube, id=video_id)
print(f"================Video #
{n_videos}================")
# print the video details
print_video_infos(video_response)
print(f"Video URL: {video_url}")
print("="*40)
# if there is a next page, then add it to our parameters
# to proceed to the next page
if "nextPageToken" in res:
next_page_token = res["nextPageToken"]
comments.py
if __name__ == "__main__":
# authenticate to YouTube API
youtube = youtube_authenticate()
# URL can be a channel or a video, to extract comments
url = "https://www.youtube.com/watch?v=jNQXAC9IVRw&ab_channel=jawed"
if "watch" in url:
# that's a video
video_id = get_video_id_by_url(url)
params = {
'videoId': video_id,
'maxResults': 2,
'order': 'relevance', # default is 'time' (newest)
}
else:
# should be a channel
channel_id = get_channel_id_by_url(url)
params = {
'allThreadsRelatedToChannelId': channel_id,
'maxResults': 2,
'order': 'relevance', # default is 'time' (newest)
}
# get the first 2 pages (2 API requests)
n_pages = 2
for i in range(n_pages):
# make API call to get all comments from the channel (including posts & videos)
response = get_comments(youtube, **params)
items = response.get("items")
# if items is empty, breakout of the loop
if not items:
break
for item in items:
comment = item["snippet"]["topLevelComment"]["snippet"]
["textDisplay"]
updated_at = item["snippet"]["topLevelComment"]["snippet"]
["updatedAt"]
like_count = item["snippet"]["topLevelComment"]["snippet"]
["likeCount"]
comment_id = item["snippet"]["topLevelComment"]["id"]
print(f"""\
Comment: {comment}
Likes: {like_count}
Updated At: {updated_at}
==================================\
""")
if "nextPageToken" in response:
# if there is a next page
# add next page token to the params we pass to the function
params["pageToken"] = response["nextPageToken"]
else:
# must be end of comments!!!!
break
print("*"*70)
PART 5: Use Gmail API in Python
Learn how to use Gmail API to send emails, search for emails by query,
delete emails, mark emails as read or unread in Python.
Gmail is by far the most popular mail service nowadays; it's used
by individuals and organizations alike. Many of its features are
enhanced with AI, including its security (and detection of fraudulent
emails) and its suggestions when writing emails.
In previous tutorials, we explained how you can send as well as
read emails with Python; if you haven't read them yet, I highly
recommend you check them out.
For this guide, we will explore some of the main features of the
Gmail API and write several Python scripts that are able to send
emails, search for emails, delete emails, and mark emails as read
or unread.
Note: If this is the first time you use Google APIs, you may need to
simply create an OAuth Consent screen and add your email as a
testing user.
Now that we're done setting up the API, let's start by importing the
necessary modules:
import os
import pickle
# Gmail API utils
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# for encoding/decoding messages in base64
from base64 import urlsafe_b64decode, urlsafe_b64encode
# for dealing with attachment MIME types
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage
from email.mime.audio import MIMEAudio
from email.mime.base import MIMEBase
from mimetypes import guess_type as guess_mime_type
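The SCOPES constant used by the authentication function below is not shown here; full access to the mailbox is typically requested with the following scope (an assumption):
# full read/write access to the Gmail mailbox (scope assumed)
SCOPES = ['https://mail.google.com/']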
Next, let's define a function that does the authentication with the
Gmail API and returns a service object that can be used later in all
of our upcoming functions:
def gmail_authenticate():
    creds = None
    # the file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)
    # if there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # save the credentials for the next run
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)
    return build('gmail', 'v1', credentials=creds)
This should look familiar if you have already used a Google API
before, such as the Google Drive API: the function reads
credentials.json and saves the resulting token to the token.pickle
file after you authenticate with Google in your browser, so the
second time you run the code you won't need to authenticate again.
The first run will prompt you in your default browser to accept the
permissions required by the app; if you see a window indicating the
app isn't verified, just head to Advanced and click on Go to Gmail
API Python (unsafe).
Sending Emails
First, let's start with the function that sends emails, we know that
emails can contain attachments, so we will define a function that
adds an attachment to a message, a message is an instance
of MIMEMultipart (or MIMEText , if it doesn't contain attachments):
# Adds the attachment with the given filename to the given message
def add_attachment(message, filename):
content_type, encoding = guess_mime_type(filename)
if content_type is None or encoding is not None:
content_type = 'application/octet-stream'
main_type, sub_type = content_type.split('/', 1)
if main_type == 'text':
fp = open(filename, 'rb')
msg = MIMEText(fp.read().decode(), _subtype=sub_type)
fp.close()
elif main_type == 'image':
fp = open(filename, 'rb')
msg = MIMEImage(fp.read(), _subtype=sub_type)
fp.close()
elif main_type == 'audio':
fp = open(filename, 'rb')
msg = MIMEAudio(fp.read(), _subtype=sub_type)
fp.close()
else:
fp = open(filename, 'rb')
msg = MIMEBase(main_type, sub_type)
msg.set_payload(fp.read())
fp.close()
filename = os.path.basename(filename)
msg.add_header('Content-Disposition', 'attachment', filename=filename)
message.attach(msg)
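The functions that actually assemble and send the message do not appear here; send_message() is what send_emails.py calls at the end of this part. A sketch using the Gmail API's users().messages().send() method with a base64url-encoded raw message (the our_email sender variable is an assumption):
# sender address used in the From header (assumed; set it to your own Gmail address)
our_email = "your_gmail_address@gmail.com"

def build_message(destination, obj, body, attachments=[]):
    if not attachments:
        message = MIMEText(body)
    else:
        message = MIMEMultipart()
        message.attach(MIMEText(body))
        for filename in attachments:
            add_attachment(message, filename)
    message['to'] = destination
    message['from'] = our_email
    message['subject'] = obj
    # the Gmail API expects the raw RFC 2822 message, base64url-encoded
    return {'raw': urlsafe_b64encode(message.as_bytes()).decode()}

def send_message(service, destination, obj, body, attachments=[]):
    return service.users().messages().send(
        userId="me",
        body=build_message(destination, obj, body, attachments)
    ).execute()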
In this section, we'll write Python code that takes a search query as
input and reads all the matched emails, printing basic email
information (To and From addresses, Subject and Date)
and the plain/text parts.
We'll also create a folder for each email based on its subject and
download the text/html content, as well as any file attached to the
email, saving it in the folder created.
Before we dive into the function that reads emails given a search
query, we are going to define two utility functions that we'll use:
# utility functions
def get_size_format(b, factor=1024, suffix="B"):
"""
Scale bytes to its proper byte format
e.g:
1253656 => '1.20MB'
1253656678 => '1.17GB'
"""
for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
if b < factor:
return f"{b:.2f}{unit}{suffix}"
b /= factor
return f"{b:.2f}Y{suffix}"
def clean(text):
# clean text for creating a folder
return "".join(c if c.isalnum() else "_" for c in text)
This will download and parse all emails that contain the "Python Code"
keyword; here is a part of the output:
==================================================
From: Python Code <email@domain.com>
To: "email@gmail.com" <email@gmail.com>
Subject: How to Play and Record Audio in Python
Date: Fri, 21 Feb 2020 09:24:58 +0000
Hello !
I have no doubt that you already encountered with an application that uses
sound (either recording or playing) and you know how useful is that !
<...SNIPPED..>
Hello,
A brute-force attack consists of an attack that submits many passwords with
the hope of guessing correctly.
<...SNIPPED...>
You'll also see folders created in your current directory for each
matched email. Each folder contains the HTML version of the
email, as well as any attachments if available.
Marking Emails as Read
def mark_as_read(service, query):
messages_to_mark = search_messages(service, query)
return service.users().messages().batchModify(
userId='me',
body={
'ids': [ msg['id'] for msg in messages_to_mark ],
'removeLabelIds': ['UNREAD']
}
).execute()
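Marking as unread and deleting work the same way: batchModify() can add the UNREAD label back, and batchDelete() removes the matched messages. Sketches of the two functions referenced in mark_emails.py and delete_emails.py below:
def mark_as_unread(service, query):
    messages_to_mark = search_messages(service, query)
    # add the UNREAD label back to every matched message
    return service.users().messages().batchModify(
        userId='me',
        body={
            'ids': [msg['id'] for msg in messages_to_mark],
            'addLabelIds': ['UNREAD']
        }
    ).execute()

def delete_messages(service, query):
    messages_to_delete = search_messages(service, query)
    # permanently delete every matched message in a single call
    return service.users().messages().batchDelete(
        userId='me',
        body={'ids': [msg['id'] for msg in messages_to_delete]}
    ).execute()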
Example run:
Fullcode:
common.py
import os
import pickle
# Gmail API utils
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# full read/write access to the Gmail mailbox (scope assumed; not shown in the original)
SCOPES = ['https://mail.google.com/']

def gmail_authenticate():
    creds = None
    # the file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)
    # if there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # save the credentials for the next run
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)
    return build('gmail', 'v1', credentials=creds)
send_emails.py
# for getting full paths to attachments
import os
# for encoding messages in base64
from base64 import urlsafe_b64encode
# for dealing with attachment MIME types
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage
from email.mime.audio import MIMEAudio
from email.mime.base import MIMEBase
from mimetypes import guess_type as guess_mime_type
# Adds the attachment with the given filename to the given message
def add_attachment(message, filename):
content_type, encoding = guess_mime_type(filename)
if content_type is None or encoding is not None:
content_type = 'application/octet-stream'
main_type, sub_type = content_type.split('/', 1)
if main_type == 'text':
fp = open(filename, 'rb')
msg = MIMEText(fp.read().decode(), _subtype=sub_type)
fp.close()
elif main_type == 'image':
fp = open(filename, 'rb')
msg = MIMEImage(fp.read(), _subtype=sub_type)
fp.close()
elif main_type == 'audio':
fp = open(filename, 'rb')
msg = MIMEAudio(fp.read(), _subtype=sub_type)
fp.close()
else:
fp = open(filename, 'rb')
msg = MIMEBase(main_type, sub_type)
msg.set_payload(fp.read())
fp.close()
filename = os.path.basename(filename)
msg.add_header('Content-Disposition', 'attachment', filename=filename)
message.attach(msg)
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Email Sender using Gmail API")
    parser.add_argument('destination', type=str, help='The destination email address')
    parser.add_argument('subject', type=str, help='The subject of the email')
    parser.add_argument('body', type=str, help='The body of the email')
    parser.add_argument('-f', '--files', type=str, help='email attachments', nargs='+')
    args = parser.parse_args()
    service = gmail_authenticate()
    send_message(service, args.destination, args.subject, args.body, args.files)
read_emails.py
import os
import sys
# for encoding/decoding messages in base64
from base64 import urlsafe_b64decode
from common import gmail_authenticate, search_messages
def get_size_format(b, factor=1024, suffix="B"):
"""
Scale bytes to its proper byte format
e.g:
1253656 => '1.20MB'
1253656678 => '1.17GB'
"""
for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
if b < factor:
return f"{b:.2f}{unit}{suffix}"
b /= factor
return f"{b:.2f}Y{suffix}"
def clean(text):
# clean text for creating a folder
return "".join(c if c.isalnum() else "_" for c in text)
if __name__ == "__main__":
    service = gmail_authenticate()
    # get emails that match the query you specify from the command line
    results = search_messages(service, sys.argv[1])
    print(f"Found {len(results)} results.")
    # for each email matched, read it (output plain/text to console & save HTML and attachments)
    for msg in results:
        read_message(service, msg)
mark_emails.py
from common import gmail_authenticate, search_messages
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Marks a set of emails as read or unread")
    parser.add_argument('query', help='a search query that selects emails to mark')
    parser.add_argument("-r", "--read", action="store_true", help='Whether to mark the message as read')
    parser.add_argument("-u", "--unread", action="store_true", help='Whether to mark the message as unread')
    args = parser.parse_args()
    service = gmail_authenticate()
    if args.read:
        mark_as_read(service, args.query)
    elif args.unread:
        mark_as_unread(service, args.query)
delete_emails.py
if __name__ == "__main__":
import sys
service = gmail_authenticate()
delete_messages(service, sys.argv[1])
PART 6: Use Shodan API in Python
Learn how to use Shodan API to make a script that searches
for public vulnerable servers, IoT devices, power plants and
much more using Python.
Public IP addresses are routed on the Internet, which means a
connection can be established between any host that has a public IP
and any other host connected to the Internet, as long as no firewall
filters the outgoing traffic. Because IPv4 is still the dominant
protocol on the Internet, it is possible and nowadays practical to
crawl the whole Internet.
There are a number of platforms that offer Internet scanning as a
service, to list a few; Shodan, Censys, and ZoomEye. Using these
services, we can scan the Internet for devices running a given
service, we can find surveillance cameras, industrial control
systems such as power plants, servers, IoT devices and much more.
The difficulty with doing this task manually is that most of the
instances should have their login credentials changed. So, to find
accessible DVWA instances, it's necessary to try default credentials
on each of the detected instances, we'll do that with Python:
import shodan
import time
import requests
import re
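The credential-checking function that the next paragraph describes isn't included here; what follows is a hedged sketch of how has_valid_credentials() could look, assuming DVWA's default login form field names (username, password, Login, user_token) and its default credentials (admin/password):

def has_valid_credentials(host, port):
    # hypothetical sketch: try DVWA's default credentials on a detected instance
    try:
        login_url = f"http://{host}:{port}/login.php"
        session = requests.Session()
        response = session.get(login_url, timeout=10)
        # extract the CSRF token (user_token) from the login page
        match = re.search(r"user_token'\s+value='([0-9a-f]+)'", response.text)
        if not match:
            return False
        token = match.group(1)
        data = {"username": "admin", "password": "password", "Login": "Login", "user_token": token}
        response = session.post(login_url, data=data, timeout=10)
        # a successful login redirects away from login.php
        return "login.php" not in response.url
    except requests.RequestException:
        return False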
The above function sends a GET request to the DVWA login page,
to retrieve the user_token, then sends a POST request with the
default username and password, and the CSRF token, and then it
checks whether the authentication was successful or not.
Let's write a function that takes a query and iterates over the pages of Shodan search results; for each host on each page, we call the has_valid_credentials() function:
# searches on shodan using the given query, and iterates over each page of the results
def query_shodan(query):
    print("[*] querying the first page")
    first_page = request_page_from_shodan(query)
    total = first_page['total']
    already_processed = len(first_page['matches'])
    result = process_page(first_page)
    page = 2
    while already_processed < total:
        # this break is here for testing so you don't exhaust your monthly API quota;
        # remove it to process all pages
        break
        print(f"[*] querying page {page}")
        current_page = request_page_from_shodan(query, page=page)
        already_processed += len(current_page['matches'])
        result += process_page(current_page)
        page += 1
    return result
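The request_page_from_shodan() and process_page() helpers aren't shown here either; a minimal sketch using the shodan library (the API key placeholder and the retry logic are assumptions) could be:

# initialize the Shodan API client with your key
SHODAN_API_KEY = "<YOUR_SHODAN_API_KEY>"
api = shodan.Shodan(SHODAN_API_KEY)

def request_page_from_shodan(query, page=1):
    """Request a single page of Shodan search results, retrying on transient errors."""
    while True:
        try:
            return api.search(query, page=page)
        except shodan.APIError:
            # back off briefly on rate limiting or transient errors
            time.sleep(5)

def process_page(page):
    """Check every host on a page and return those that accept default credentials."""
    result = []
    for match in page['matches']:
        ip = match['ip_str']
        port = match['port']
        if has_valid_credentials(ip, port):
            print(f"[+] valid credentials at {ip}:{port}")
            result.append((ip, port))
    return result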
As you can see, this Python script works and reports hosts that have default credentials on their DVWA instances.
Summary
In this part, we used the Shodan API to search for Internet-facing DVWA instances and checked which of them still accept the default credentials.
Note that you can start the conversation with the /start command.
Summary
Telegram offers a very convenient API for developers, allowing them to extend its use beyond end-to-end communication; we've seen in this tutorial how it can be used to implement a bot with multiple states.
I advise you to learn more about the features the API offers, such as handling images and files sent from users, payments, and much more.
Writing Telegram bots was fun, wasn't it? You can use natural language processing and build an AI model for a question-answering chatbot. In fact, check the tutorial where we made a conversational AI chatbot!
Fullcode:
telegram_bot.py
import telegram
import telegram.ext
import re
from random import randint
In this tutorial, we will make a Python script that is able to get the page ranking of your domain using the CSE API. Before we dive into it, I need to make sure you have the CSE API set up and ready to go; if that's not the case, please check the tutorial on getting started with the Custom Search Engine API in Python.
Once you have your search engine up and running, go ahead and
install requests so we can make HTTP requests with ease:
pip3 install requests
import requests
import urllib.parse as p

# get the API KEY here: https://developers.google.com/custom-search/v1/overview
API_KEY = "<INSERT_YOUR_API_KEY_HERE>"
# get your Search Engine ID on your CSE control panel
SEARCH_ENGINE_ID = "<INSERT_YOUR_SEARCH_ENGINE_ID_HERE>"
# target domain you want to track
target_domain = "bbc.com"
# target keywords
query = "google custom search engine api python"
Again, please check this tutorial, in which I show you how to get the API_KEY and SEARCH_ENGINE_ID. target_domain is the domain you want to search for, and query is the target keyword. For instance, if you want to track stackoverflow.com for the "convert string to int python" keyword, then you put them in target_domain and query respectively.
Now, CSE enables us to see the first 10 result pages; each search page has 10 results, so there are 100 URLs in total to check. The code block below is responsible for iterating over each page and searching for the domain name in the results:
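The original code block isn't reproduced here; what follows is a minimal sketch of that loop, using the standard Custom Search JSON API endpoint and the variables defined above (the loop logic itself is an assumption):

for page in range(1, 11):
    print("[*] Going for page:", page)
    # each page has 10 results; "start" is the 1-based index of the first result
    start = (page - 1) * 10 + 1
    url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={SEARCH_ENGINE_ID}&q={p.quote_plus(query)}&start={start}"
    data = requests.get(url).json()
    found = False
    for i, search_item in enumerate(data.get("items", []), start=1):
        link = search_item.get("link")
        # compare the result's domain with the target domain
        domain = p.urlparse(link).netloc
        if domain.endswith(target_domain):
            rank = (page - 1) * 10 + i
            print(f"[+] {target_domain} is found on rank #{rank} for keyword: '{query}'")
            print("[+] Title:", search_item.get("title"))
            print("[+] Snippet:", search_item.get("snippet"))
            print("[+] URL:", link)
            found = True
            break
    if found:
        break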
Awesome, this website ranks third for that keyword. Here is another example run:
[*] Going for page: 1
[*] Going for page: 2
[+] bbc.com is found on rank #13 for keyword: 'make a bitly url
shortener in python'
[+] Title: How to Make a URL Shortener in Python - Python Code
[+] Snippet: Learn how to use Bitly and Cuttly APIs to shorten
long URLs programmatically
using requests library in Python.
[+] URL: https://www.bbc.com/
This time it went to the 2nd page, as it didn't find it on the first page. As mentioned earlier, it will go all the way up to page 10 and then stop.
Summary
Alright, there you have the script. I encourage you to add to it and customize it; for example, make it accept multiple keywords for your site and add custom alerts to notify you whenever a position changes (up or down). Good luck!
Fullcode:
page_ranking.py
import requests
import urllib.parse as p
Bitly API
Cuttly API
After that, grab the account name that we're going to need in the code, as shown in the following image:
# account credentials
username = "o_3v0ulxxxxx"
password = "your_password_here"
The username is the account name I just showed you how to get, and the password is the actual password of your Bitly account, so you should replace them with your own credentials.
If you read the Bitly API documentation carefully, you'll see that
we need an access token to make API calls to get the shortened
URL, so let's create a new access token:
# get the access token
auth_res = requests.post("https://api-ssl.bitly.com/oauth/access_token", auth=(username, password))
if auth_res.status_code == 200:
    # if response is OK, get the access token
    access_token = auth_res.content.decode()
    print("[!] Got access token:", access_token)
else:
    print("[!] Cannot get access token, exiting...")
    exit()
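The actual shorten call isn't shown at this point in the book; a minimal sketch against Bitly's v4 /shorten endpoint (endpoint and field names per Bitly's public API documentation) could look like this:

# the URL you want to shorten
url = "https://www.bbc.com/topic/using-apis-in-python"
# shorten the URL using the access token obtained above
shorten_res = requests.post(
    "https://api-ssl.bitly.com/v4/shorten",
    json={"long_url": url},
    headers={"Authorization": f"Bearer {access_token}"}
)
if shorten_res.status_code in (200, 201):
    # the shortened link is in the "link" field of the JSON response
    print("Shortened URL:", shorten_res.json()["link"])
else:
    print("[!] Error shortening URL:", shorten_res.json())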
api_key = "64d1303e4ba02f1ebba4699bc871413f0510a"
# the URL you want to shorten
url = "https://www.bbc.com/topic/using-apis-in-python"
# preferred name in the URL
api_url = f"https://cutt.ly/api/api.php?key={api_key}&short={url}"
# or
# api_url = f"https://cutt.ly/api/api.php?key={api_key}&short=
{url}&name=some_unique_name"
# make the request
data = requests.get(api_url).json()["url"]
if data["status"] == 7:
# OK, get shortened URL
shortened_url = data["shortLink"]
print("Shortened URL:", shortened_url)
else:
print("[!] Error Shortening URL:", data)
Simply replace your API key in api_key and the URL you want to shorten in url, and you're good to go. Here is my output:
Shortened URL: https://cutt.ly/mpAOd1b
Note that you can specify a unique name, and the result will be something like https://cutt.ly/some_unique_name; you can accomplish that by simply adding the name parameter to the GET request URL.
Summary
Excellent, now you know how to shorten your URLs using both the Bitly and Cuttly shorteners! Note that these providers offer more endpoints for clicks, statistics, and more; you should check their documentation for more detail.
Fullcode:
bitly_shortener.py
import requests
# account credentials
username = "o_3v0ulxxxxx"
password = "your_password_here"
cuttly_shortener.py
import requests
api_url = f"https://cutt.ly/api/api.php?key={api_key}&short={url}"
# or
# api_url = f"https://cutt.ly/api/api.php?key={api_key}&short={url}&name=some_unique_name"
Note that Googletrans makes API calls to the Google Translate web service; if you want reliable production use, consider using the official API or building your own machine translation model.
First, let's install it using pip:
pip3 install googletrans
Translating Text
Importing necessary libraries:
from googletrans import Translator, constants
from pprint import pprint
This will print the original text and language along with the
translated text and language:
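The translation snippet itself isn't reproduced here; a minimal sketch (the example string is an assumption) looks like this:

# init the Google Translate API wrapper
translator = Translator()
# translate a Spanish text to English (English is the default destination)
translation = translator.translate("Hola Mundo")
# print the original and translated text along with the detected/destination languages
print(f"{translation.origin} ({translation.src}) --> {translation.text} ({translation.dest})")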
Output:
You can also check other translations and some other extra data:
{'all-translations': [['interjection',
['How are you doing?', "What's up?"],
[['How are you doing?', ["Wie geht's?"]],
["What's up?", ["Wie geht's?"]]],
"Wie geht's?",
9]],
'confidence': 1.0,
'definitions': None,
'examples': None,
'language': [['de'], None, [1.0], ['de']],
'original-language': 'de',
'possible-mistakes': None,
'possible-translations': [['Wie gehts ?',
None,
[['How are you ?', 1000, True, False],
["How's it going ?", 1000, True, False],
['How are you?', 0, True, False]],
[[0, 11]],
'Wie gehts ?',
0,
0]],
'see-also': None,
'synonyms': None,
'translation': [['How are you ?', 'Wie gehts ?', None, None, 1]]}
That's a lot of data to benefit from: you have all the possible translations, the confidence, definitions, and even examples.
Translating List of Phrases
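Googletrans also accepts a list of strings; the snippet isn't shown here, but a minimal sketch (the example phrases and destination language are assumptions) could be:

# translate several phrases at once by passing a list
translations = translator.translate(["Good morning", "How are you?"], dest="fr")
for translation in translations:
    print(f"{translation.origin} ({translation.src}) --> {translation.text} ({translation.dest})")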
Output:
# detect a language
detection = translator.detect("नमस्ते दुनिया")
print("Language code:", detection.lang)
print("Confidence:", detection.confidence)
This will print the code of the detected language along with
confidence rate (1.0 means 100% confident):
Language code: hi
Confidence: 1.0
This will return the language code; to get the full language name, you can use the LANGUAGES dictionary provided by Googletrans:
print("Language:", constants.LANGUAGES[detection.lang])
Output:
Language: hindi
Supported Languages
As you may know, Google Translate supports more than 100 languages; let's print all of them:
# print all available languages
print("Total supported languages:", len(constants.LANGUAGES))
print("Languages:")
pprint(constants.LANGUAGES)
If you get HTTP 5xx errors with this library, it's because Google has temporarily blocked your IP address; this can happen when you use the library heavily. You'll need to consider using proxies by passing a proxy dictionary to the proxies parameter of the Translator() class, or use the official API as discussed.
Also, I've written a quick Python script that will allow you to translate text as well as documents from the command line; check it here.
Fullcode:
translator.py
translate_doc.py
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Simple Python script to
translate text using Google Translate API (googletrans wrapper)")
parser.add_argument("target", help="Text/Document to translate")
parser.add_argument("-s", "--source", help="Source language, default is
Google Translate's auto detection", default="auto")
parser.add_argument("-d", "--destination", help="Destination language,
default is English", default="en")
args = parser.parse_args()
target = args.target
src = args.source
dest = args.destination
if os.path.isfile(target):
# translate a document instead
# get basename of file
basename = os.path.basename(target)
# get the path dir
dirname = os.path.dirname(target)
try:
filename, ext = basename.split(".")
except:
# no extension
filename = basename
ext = ""
Usage:
Output:
positional arguments:
target Text/Document to translate
optional arguments:
-h, --help show this help message and exit
-s SOURCE, --source SOURCE
Source language, default is Google Translate's auto
detection
-d DESTINATION, --destination DESTINATION
Destination language, default is English
Output:
'Hello'
PART 11: Use Google Drive API in Python
Learn how you can use Google Drive API to list files, search for specific
files or file types, download and upload files from/to Google Drive in
Python.
Google Drive enables you to store your files in the cloud and access them anytime, anywhere in the world. In this tutorial, you will learn how to list your Google Drive files, search over them, download stored files, and even upload local files into your drive programmatically using Python.
To get started, let's install the required libraries for this tutorial:
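The install command isn't shown here; based on the libraries imported in the code below (googleapiclient, google_auth_oauthlib, tabulate, plus requests and tqdm for the download script), something like this should cover them:

pip3 install google-api-python-client google-auth-httplib2 google-auth-oauthlib tabulate requests tqdm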
def get_gdrive_service():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # return Google Drive API service
    return build('drive', 'v3', credentials=creds)
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 5 files the user has access to.
    """
    service = get_gdrive_service()
    # Call the Drive v3 API
    results = service.files().list(
        pageSize=5, fields="nextPageToken, files(id, name, mimeType, size, parents, modifiedTime)").execute()
    # get the results
    items = results.get('files', [])
    # list the retrieved files & folders
    list_files(items)
def list_files(items):
    """given items returned by Google Drive API, prints them in a tabular way"""
    if not items:
        # empty drive
        print('No files found.')
    else:
        rows = []
        for item in items:
            # get the File ID
            id = item["id"]
            # get the name of file
            name = item["name"]
            try:
                # parent directory ID
                parents = item["parents"]
            except KeyError:
                # has no parents
                parents = "N/A"
            try:
                # get the size in nice bytes format (KB, MB, etc.)
                size = get_size_format(int(item["size"]))
            except KeyError:
                # not a file, may be a folder
                size = "N/A"
            # get the Google Drive type of file
            mime_type = item["mimeType"]
            # get last modified date time
            modified_time = item["modifiedTime"]
            # append everything to the list
            rows.append((id, name, parents, size, mime_type, modified_time))
        print("Files:")
        # convert to a human readable table
        table = tabulate(rows, headers=["ID", "Name", "Parents", "Size", "Type", "Modified Time"])
        # print the table
        print(table)
See my output:
Files:
ID                                 Name                            Parents                  Size      Type                          Modified Time
---------------------------------  ------------------------------  -----------------------  --------  ----------------------------  ------------------------
1FaD2BVO_ppps2BFm463JzKM-gGcEdWVT  some_text.txt                   ['0AOEK-gp9UUuOUk9RVA']  31.00B    text/plain                    2020-05-15T13:22:20.000Z
1vRRRh5OlXpb-vJtphPweCvoh7qYILJYi  google-drive-512.png            ['0AOEK-gp9UUuOUk9RVA']  15.62KB   image/png                     2020-05-14T23:57:18.000Z
1wYY_5Fic8yt8KSy8nnQfjah9EfVRDoIE  bbc.zip                         ['0AOEK-gp9UUuOUk9RVA']  863.61KB  application/x-zip-compressed  2019-08-19T09:52:22.000Z
1FX-KwO6EpCMQg9wtsitQ-JUqYduTWZub  Nasdaq 100 Historical Data.csv  ['0AOEK-gp9UUuOUk9RVA']  363.10KB  text/csv                      2019-05-17T16:00:44.000Z
1shTHGozbqzzy9Rww9IAV5_CCzgPrO30R  my_python_code.py               ['0AOEK-gp9UUuOUk9RVA']  1.92MB    text/x-python                 2019-05-13T14:21:10.000Z
These are the files in my Google Drive. Notice that the Size column is scaled to a human-readable format; that's because we used the get_size_format() function in list_files(). Here is the code for it:
def get_size_format(b, factor=1024, suffix="B"):
    """
    Scale bytes to its proper byte format
    e.g:
    1253656 => '1.20MB'
    1253656678 => '1.17GB'
    """
    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
        if b < factor:
            return f"{b:.2f}{unit}{suffix}"
        b /= factor
    return f"{b:.2f}Y{suffix}"
def upload_files():
    """
    Creates a folder and uploads a file to it
    """
    # authenticate account
    service = get_gdrive_service()
    # folder details we want to make
    folder_metadata = {
        "name": "TestFolder",
        "mimeType": "application/vnd.google-apps.folder"
    }
    # create the folder
    file = service.files().create(body=folder_metadata, fields="id").execute()
    # get the folder id
    folder_id = file.get("id")
    print("Folder ID:", folder_id)
    # upload a text file
    # first, define file metadata, such as the name and the parent folder ID
    file_metadata = {
        "name": "test.txt",
        "parents": [folder_id]
    }
    # upload
    media = MediaFileUpload("test.txt", resumable=True)
    file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    print("File created, id:", file.get("id"))

if __name__ == '__main__':
    upload_files()
After I ran the code, a new folder was created in my Google Drive. We used a text file for demonstration, but you can upload any type of file you want. Check the full code of uploading files to Google Drive.
Search for Files and Directories
Google Drive enables us to search for files and directories using the previously used list() method, just by passing the 'q' parameter; the function below takes the Drive API service and a query and returns the filtered items:
def search(service, query):
    # search for the file
    result = []
    page_token = None
    while True:
        response = service.files().list(q=query,
                                        spaces="drive",
                                        fields="nextPageToken, files(id, name, mimeType)",
                                        pageToken=page_token).execute()
        # iterate over filtered files
        for file in response.get("files", []):
            result.append((file["id"], file["name"], file["mimeType"]))
        page_token = response.get('nextPageToken', None)
        if not page_token:
            # no more files
            break
    return result

if __name__ == '__main__':
    main()
Output:
ID Name Type
--------------------------------- ------------- ----------
15gdpNEYnZ8cvi3PhRjNTvW8mdfix9ojV test.txt text/plain
1FaE2BVO_rnps2BFm463JwPN-gGcDdWVT some_text.txt text/plain
def download():
    service = get_gdrive_service()
    # the name of the file you want to download from Google Drive
    filename = "bbc.zip"
    # search for the file by name
    search_result = search(service, query=f"name='{filename}'")
    # get the GDrive ID of the file
    file_id = search_result[0][0]
    # make it shareable
    service.permissions().create(body={"role": "reader", "type": "anyone"}, fileId=file_id).execute()
    # download file
    download_file_from_google_drive(file_id, filename)

if __name__ == '__main__':
    download()
This will search for the bbc.zip file, download it and save it in your
working directory. Check the full code.
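The download_file_from_google_drive() helper isn't shown at this point; the repository's version streams the shareable link with requests and tqdm, but a minimal alternative sketch using the Drive API's own download facility (get_media() with MediaIoBaseDownload) could be:

import io
from googleapiclient.http import MediaIoBaseDownload

def download_file_from_google_drive(file_id, destination):
    # re-use the cached credentials to get a Drive service object
    service = get_gdrive_service()
    # request the file's binary content
    request = service.files().get_media(fileId=file_id)
    with io.FileIO(destination, "wb") as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            # download the file chunk by chunk and report progress
            status, done = downloader.next_chunk()
            print(f"[*] Download {int(status.progress() * 100)}%.")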
Summary
Alright, there you have it. These are basically the core
functionalities of Google Drive. Now you know how to do them in
Python without manual mouse clicks!
Remember, whenever you change the SCOPES list, you need to delete the token.pickle file so you can authenticate to your account again with the new scopes. See this page for further information, along with a list of scopes and their explanations.
Feel free to edit the code to accept file names as parameters to download or upload them. Go and try to make the script as dynamic as possible by introducing the argparse module to make some useful scripts. Let's see what you build!
Fullcode:
list_files.py
import pickle
import os
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from tabulate import tabulate

# Drive API scope: read-only access to file metadata (adjust to your needs)
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']

def get_gdrive_service():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # return Google Drive API service
    return build('drive', 'v3', credentials=creds)
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 5 files the user has access to.
    """
    service = get_gdrive_service()
    # Call the Drive v3 API
    results = service.files().list(
        pageSize=5, fields="nextPageToken, files(id, name, mimeType, size, parents, modifiedTime)").execute()
    # get the results
    items = results.get('files', [])
    # list the retrieved files & folders
    list_files(items)

def get_size_format(b, factor=1024, suffix="B"):
    """
    Scale bytes to its proper byte format
    e.g:
    1253656 => '1.20MB'
    1253656678 => '1.17GB'
    """
    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
        if b < factor:
            return f"{b:.2f}{unit}{suffix}"
        b /= factor
    return f"{b:.2f}Y{suffix}"

def list_files(items):
    """given items returned by Google Drive API, prints them in a tabular way"""
    if not items:
        # empty drive
        print('No files found.')
    else:
        rows = []
        for item in items:
            # get the File ID
            id = item["id"]
            # get the name of file
            name = item["name"]
            try:
                # parent directory ID
                parents = item["parents"]
            except KeyError:
                # has no parents
                parents = "N/A"
            try:
                # get the size in nice bytes format (KB, MB, etc.)
                size = get_size_format(int(item["size"]))
            except KeyError:
                # not a file, may be a folder
                size = "N/A"
            # get the Google Drive type of file
            mime_type = item["mimeType"]
            # get last modified date time
            modified_time = item["modifiedTime"]
            # append everything to the list
            rows.append((id, name, parents, size, mime_type, modified_time))
        print("Files:")
        # convert to a human readable table
        table = tabulate(rows, headers=["ID", "Name", "Parents", "Size", "Type", "Modified Time"])
        # print the table
        print(table)

if __name__ == '__main__':
    main()
upload_files.py
import pickle
import os
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.http import MediaFileUpload

# Drive API scope: per-file access to files created or opened by the app (adjust to your needs)
SCOPES = ['https://www.googleapis.com/auth/drive.file']

def get_gdrive_service():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # return Google Drive API service
    return build('drive', 'v3', credentials=creds)
def upload_files():
    """
    Creates a folder and uploads a file to it
    """
    # authenticate account
    service = get_gdrive_service()
    # folder details we want to make
    folder_metadata = {
        "name": "TestFolder",
        "mimeType": "application/vnd.google-apps.folder"
    }
    # create the folder
    file = service.files().create(body=folder_metadata, fields="id").execute()
    # get the folder id
    folder_id = file.get("id")
    print("Folder ID:", folder_id)
    # upload a text file
    # first, define file metadata, such as the name and the parent folder ID
    file_metadata = {
        "name": "test.txt",
        "parents": [folder_id]
    }
    # upload
    media = MediaFileUpload("test.txt", resumable=True)
    file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    print("File created, id:", file.get("id"))

if __name__ == '__main__':
    upload_files()
search_files.py
import pickle
import os
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from tabulate import tabulate

# Drive API scope: read-only access to file metadata (adjust to your needs)
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']

def get_gdrive_service():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # return Google Drive API service
    return build('drive', 'v3', credentials=creds)
def search(service, query):
    # search for the file
    result = []
    page_token = None
    while True:
        response = service.files().list(q=query,
                                        spaces="drive",
                                        fields="nextPageToken, files(id, name, mimeType)",
                                        pageToken=page_token).execute()
        # iterate over filtered files
        for file in response.get("files", []):
            result.append((file["id"], file["name"], file["mimeType"]))
        page_token = response.get('nextPageToken', None)
        if not page_token:
            # no more files
            break
    return result

def main():
    # filter to text files
    filetype = "text/plain"
    # authenticate Google Drive API
    service = get_gdrive_service()
    # search for files that have the type text/plain
    search_result = search(service, query=f"mimeType='{filetype}'")
    # convert to a table to print nicely
    table = tabulate(search_result, headers=["ID", "Name", "Type"])
    print(table)

if __name__ == '__main__':
    main()
download_files.py
import pickle
import os
import re
import io
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.http import MediaIoBaseDownload
import requests
from tqdm import tqdm

# Drive API scope: full access, needed here to share and download arbitrary files (adjust to your needs)
SCOPES = ['https://www.googleapis.com/auth/drive']

def get_gdrive_service():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # initiate Google Drive service API
    return build('drive', 'v3', credentials=creds)
def download():
    service = get_gdrive_service()
    # the name of the file you want to download from Google Drive
    filename = "bbc.zip"
    # search for the file by name
    search_result = search(service, query=f"name='{filename}'")
    # get the GDrive ID of the file
    file_id = search_result[0][0]
    # make it shareable
    service.permissions().create(body={"role": "reader", "type": "anyone"}, fileId=file_id).execute()
    # download file
    download_file_from_google_drive(file_id, filename)

if __name__ == '__main__':
    download()
PART 12: Convert Text to Speech in Python
Learn how to perform speech synthesis by converting text to speech both online and offline using the gTTS and pyttsx3 libraries in Python.
Speech synthesis (or Text to Speech) is the computer-generated
simulation of human speech. It converts human language text into
human-like speech audio. In this tutorial, you will learn how you can
convert text to speech in Python.
In this tutorial, we won't be building neural networks and training a model to achieve results, as that is pretty complex and hard to do. Instead, we are going to use some APIs and engines that offer it. There are a lot of APIs out there that offer this service; one of the commonly used ones is Google Text-to-Speech. In this tutorial, we will play around with it, along with another offline library called pyttsx3.
It's pretty straightforward to use this library; you just need to pass the text to the gTTS object, which is an interface to Google Translate's Text-to-Speech API:
# make request to google to get synthesis
tts = gtts.gTTS("Hello world")
Up to this point, we have sent the text and retrieved the actual audio speech from the API; let's save this audio to a file:
# save the audio file
tts.save("hello.mp3")
Awesome, you'll see a new file appear in the current directory; let's play it using the playsound module installed previously:
# play the audio file
playsound("hello.mp3")
And that's it! You'll hear a robot saying what you just told it to say!
It isn't available only in English; you can use other languages as well by passing the lang parameter:
# in spanish
tts = gtts.gTTS("Hola Mundo", lang="es")
tts.save("hola.mp3")
playsound("hola.mp3")
If you don't want to save it to a file and just want to play it directly, you should use tts.write_to_fp(), which accepts an io.BytesIO() object to write into; check this link for more information.
To get the list of available languages, use this:
# all available languages along with their IETF tag
print(gtts.lang.tts_langs())
To get started with this library, open up a new Python file and
import it:
import pyttsx3
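The initialization snippet isn't reproduced here; a minimal sketch (the example sentence is an assumption) that also explains the 200 shown in the output below could be:

# initialize the Text-to-speech engine
engine = pyttsx3.init()
# the text we want to speak (example sentence)
text = "Python is a great programming language"
# get the current speaking rate and print it
rate = engine.getProperty("rate")
print(rate)
# also grab the available voices, used further below
voices = engine.getProperty("voices")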
Output:
200
Alright, let's change this to 300 (make the speaking rate much
faster):
# setting new voice rate (faster)
engine.setProperty("rate", 300)
engine.say(text)
engine.runAndWait()
Or slower:
# slower
engine.setProperty("rate", 100)
engine.say(text)
engine.runAndWait()
My machine has three speaker voices; let's use the second one, for example:
# set another voice
engine.setProperty("voice", voices[1].id)
engine.say(text)
engine.runAndWait()
You can also save the audio as a file using the save_to_file() method,
instead of playing the sound using say() method:
# saving speech audio into a file
engine.save_to_file(text, "python.mp3")
engine.runAndWait()
A new MP3 file will appear in the current directory, check it out!
Summary
Great, that's it for this tutorial, I hope that will help you build your
application, or maybe your own virtual assistant in Python.
To conclude, if you want to use a more reliable synthesis, Google
TTS API is your choice, if you just want to make it work a lot
faster and without an Internet connection, you should use
the pyttsx3 library.
Here are the documentation for both libraries:
Fullcode:
tts_google.py
import gtts
from playsound import playsound
# in spanish
tts = gtts.gTTS("Hola Mundo", lang="es")
tts.save("hola.mp3")
playsound("hola.mp3")
tts_pyttsx3.py
import pyttsx3
# the text to speak (example sentence)
text = "Hello world"
# initialize Text-to-speech engine
engine = pyttsx3.init()
# slower
engine.setProperty("rate", 100)
engine.say(text)
engine.runAndWait()
import requests
from pprint import pprint
# github username
username = "x4nth055"
# url to request
url = f"https://api.github.com/users/{username}"
# make the request and return the json
user_data = requests.get(url).json()
# pretty print JSON data
pprint(user_data)
{'avatar_url': 'https://avatars3.githubusercontent.com/u/37851086?v=4',
'bio': None,
'blog': 'https://www.bbc.com',
'company': None,
'created_at': '2018-03-27T21:49:04Z',
'email': None,
'events_url': 'https://api.github.com/users/x4nth055/events{/privacy}',
'followers': 93,
'followers_url': 'https://api.github.com/users/x4nth055/followers',
'following': 41,
'following_url':
'https://api.github.com/users/x4nth055/following{/other_user}',
'gists_url': 'https://api.github.com/users/x4nth055/gists{/gist_id}',
'gravatar_id': '',
'hireable': True,
'html_url': 'https://github.com/x4nth055',
'id': 37851086,
'login': 'x4nth055',
'name': 'Rockikz',
<..SNIPPED..>
A lot of data, that's why using requests library alone won't be
handy to extract this ton of data manually, as a
result, PyGithub comes into the rescue.
Getting Public Repositories of a User
Let's get all the public repositories of that user using the PyGithub library we just installed:
import base64
from github import Github
from pprint import pprint
# Github username
username = "x4nth055"
# pygithub object
g = Github()
# get that user by username
user = g.get_user(username)
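The loop that produced the output below isn't shown; a minimal sketch is simply iterating over the user's repositories:

# iterate over all public repositories of the user and print them
for repo in user.get_repos():
    print(repo)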
Here is my output:
Repository(full_name="x4nth055/aind2-rnn")
Repository(full_name="x4nth055/awesome-algeria")
Repository(full_name="x4nth055/emotion-recognition-using-speech")
Repository(full_name="x4nth055/emotion-recognition-using-text")
Repository(full_name="x4nth055/food-reviews-sentiment-analysis")
Repository(full_name="x4nth055/hrk")
Repository(full_name="x4nth055/lp_simplex")
Repository(full_name="x4nth055/price-prediction")
Repository(full_name="x4nth055/product_recommendation")
Repository(full_name="x4nth055/pythoncode-tutorials")
Repository(full_name="x4nth055/sentiment_analysis_naive_bayes")
def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    for content in repo.get_contents(""):
        print(content)
    try:
        # repo license
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except:
        pass
======================================================
==============================================
Full name: x4nth055/pythoncode-tutorials
Description: The Python Code Tutorials
Date created: 2019-07-29 12:35:40
Date of last push: 2020-04-02 15:12:38
Home Page: https://www.bbc.com
Language: Python
Number of forks: 154
Number of stars: 150
--------------------------------------------------
Contents:
ContentFile(path="LICENSE")
ContentFile(path="README.md")
ContentFile(path="ethical-hacking")
ContentFile(path="general")
ContentFile(path="images")
ContentFile(path="machine-learning")
ContentFile(path="python-standard-library")
ContentFile(path="scapy")
ContentFile(path="web-scraping")
License: MIT License
<..SNIPPED..>
I've truncated the output, as it would include all repositories and their information. You can see that we used the repo.get_contents("") method to retrieve all the files and folders of that repository; PyGithub parses them into ContentFile objects. Use dir(content) to see other useful fields.
Also, if you have private repositories, you can access them by authenticating your account (using the correct credentials) with PyGithub as follows:
username = "username"
password = "password"
# authenticate to github
g = Github(username, password)
# get the authenticated user
user = g.get_user()
for repo in user.get_repos():
print_repo(repo)
Note that GitHub has deprecated password-based authentication for its API; for new code, pass a personal access token instead (e.g., Github("your_access_token")). The GitHub API is quite rich; you can search for repositories by a specific query, just like you do on the website:
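The search snippet isn't reproduced here; a minimal sketch using PyGithub's search_repositories() (the query string is an assumption) could be:

# search repositories by query, most-starred first
g = Github()
repositories = g.search_repositories(query="machine learning", sort="stars", order="desc")
# print details of the first few results
for i, repo in enumerate(repositories):
    if i >= 3:
        break
    print_repo(repo)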
Fullcode:
get_user_details.py
import requests
from pprint import pprint
# github username
username = "x4nth055"
# url to request
url = f"https://api.github.com/users/{username}"
# make the request and return the json
user_data = requests.get(url).json()
# pretty print JSON data
pprint(user_data)
# get name
name = user_data["name"]
# get blog url if there is
blog = user_data["blog"]
# extract location
location = user_data["location"]
# get email address that is publicly available
email = user_data["email"]
# number of public repositories
public_repos = user_data["public_repos"]
# get number of public gists
public_gists = user_data["public_gists"]
# number of followers
followers = user_data["followers"]
# number of following
following = user_data["following"]
# date of account creation
date_created = user_data["created_at"]
# date of account last update
date_updated = user_data["updated_at"]
# urls
followers_url = user_data["followers_url"]
following_url = user_data["following_url"]
# print all
print("User:", username)
print("Name:", name)
print("Blog:", blog)
print("Location:", location)
print("Email:", email)
print("Total Public repositories:", public_repos)
print("Total Public Gists:", public_gists)
print("Total followers:", followers)
print("Total following:", following)
print("Date Created:", date_created)
print("Date Updated:", date_updated)
get_user_repositories.py
import base64
from github import Github
import sys
def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    for content in repo.get_contents(""):
        print(content)
    try:
        # repo license
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except:
        pass
search_github_repositories.py
def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    for content in repo.get_contents(""):
        print(content)
    try:
        # repo license
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except:
        pass

print("="*100)
print("="*100)
creating_and_deleting_files.py
In CSE, you can customize your engine so that it searches for results on specific websites, or only on your own website. For this tutorial, however, we will enable our search engine to search the entire web.
Setting Up a CSE
First, you need to have a Google account to set up your search
engine. After that, head to the CSE page and sign in to Custom
Search Engine as shown in the following figure:
After you log in to your Google account, a new panel will appear that looks something like this:
You can include the websites you want in your search results, choose the language of your search engine, and set its name. Once finished, you'll be redirected to this page:
Using CSE API in Python
Now, to use your Search Engine in Python, you need two things.
First, you need to get your Search Engine ID; you can easily find it in the CSE control panel:
Second, you have to generate a new API key. Head to the Custom Search JSON API page and click on the "Get a Key" button; a new window will appear where you need to create a new project (you can name it whatever you want) and click the Next button. After that, you'll have your API key; here is my result: