Lab No.9

DEPARTMENT OF ELECTRICAL ENGINEERING
Faculty Member: ____________________ Date: ________________
Semester:_____________ Section: ________________
DS-401: Introduction to Data Science
Lab 10: Databases. MySQL-Python Integration
Name    Reg. No    Lab Evaluation (5 Marks)    Lab Report (15 Marks)
Objectives
In this lab, we are introduced to more advanced SQL features (functions, joins, subqueries) that are
commonly used when querying databases. Then, we will connect to the database and run
queries using Python.
SQL. Functions
SQL functions are used to perform calculations on data. Functions can be aggregate (using
multiple values to calculate a result) or scalar (applied on a single value).
SQL Aggregate Functions
SQL aggregate functions return a single value calculated from multiple values in a column.
• AVG() - returns the average value
• COUNT() - returns the number of rows
• FIRST() - returns the first value (not available in MySQL)
• LAST() - returns the last value (not available in MySQL)
• MAX() - returns the largest value
• MIN() - returns the smallest value
• SUM() - returns the sum of values

-- AVG() --
SELECT AVG(column_name) FROM TABLE_NAME;
-- COUNT() --
SELECT COUNT(column_name) FROM TABLE_NAME;
SELECT COUNT(*) FROM TABLE_NAME;
SELECT COUNT(DISTINCT column_name) FROM TABLE_NAME;
-- In MySQL, FIRST() is replaced by: --
SELECT column_name FROM TABLE_NAME
ORDER BY column_name ASC
LIMIT 1;
-- In MySQL, LAST() is replaced by: --
SELECT column_name FROM TABLE_NAME
ORDER BY column_name DESC
LIMIT 1;
-- MAX() --
SELECT MAX(column_name) FROM TABLE_NAME;
-- MIN() --
SELECT MIN(column_name) FROM TABLE_NAME;
-- SUM() --
SELECT SUM(column_name) FROM TABLE_NAME;
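The aggregate functions above can be tried out without a MySQL server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL and a made-up three-column employee table. It also shows that aggregates skip NULL values, while COUNT(*) counts every row.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL "company" database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    # Hypothetical sample rows; the last employee has no salary recorded (NULL).
    [(1, "Ann", 1000.0), (2, "Bob", 2000.0), (3, "Cid", 2000.0), (4, "Dee", None)],
)

# All aggregates are computed in one pass over the table and return a single row.
row = conn.execute(
    "SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary), "
    "AVG(salary), MIN(salary), MAX(salary), SUM(salary) FROM employee"
).fetchone()
print(row)
conn.close()
```

COUNT(*) reports 4 rows, COUNT(salary) reports 3 because the NULL salary is ignored, and COUNT(DISTINCT salary) reports 2 because two employees share the same salary.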

The (SELECT) DISTINCT statement is used to return only distinct values. For example, the
following query returns the distinct positions in the employee table, effectively listing all
positions (ids) that are assigned to employees in a simple way.
SELECT DISTINCT position_id FROM employee;
The GROUP BY statement is used in conjunction with the aggregate functions to group the
result-set by one or more columns. For example, the following query returns the number of
employees for each department. The aggregate function is COUNT(), which counts the
employees within each department.
-- Get the number of employees for each department --
SELECT department_id, COUNT(id) AS employee_count FROM employee
GROUP BY department_id;
GROUP BY defines how the aggregate function is applied. What happens if you remove the
GROUP BY statement? What happens if you use GROUP BY with the employee id instead?
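The HAVING clause filters groups after aggregation. It is used instead of WHERE for
conditions on aggregate functions, because WHERE is evaluated before the rows are grouped
and cannot reference aggregate results. For example, the following query returns the number
of employees for each department having more than 5 employees.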
-- Get the number of employees for each department having more than 5 employees --
SELECT department_id, COUNT(id) AS employee_count FROM employee
GROUP BY department_id
HAVING COUNT(id) > 5;
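The effect of GROUP BY and HAVING can be checked on a small data set. This is a sketch only, assuming sqlite3 as a stand-in for MySQL and hypothetical data with six employees in department 1 and three in department 2.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER, name TEXT)"
)
# Hypothetical data: employees 1..6 belong to department 1, employees 7..9 to department 2.
sample = [(i, 1 if i <= 6 else 2, f"emp{i}") for i in range(1, 10)]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", sample)

# One result row per department.
grouped = conn.execute(
    "SELECT department_id, COUNT(id) AS employee_count FROM employee "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# HAVING keeps only the groups whose aggregate satisfies the condition.
large = conn.execute(
    "SELECT department_id, COUNT(id) AS employee_count FROM employee "
    "GROUP BY department_id HAVING COUNT(id) > 5"
).fetchall()

print(grouped)
print(large)
conn.close()
```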

SQL Scalar Functions
SQL scalar functions return a single value based on the input value.
• UCASE() - converts a field to uppercase
• LCASE() - converts a field to lowercase
• MID() - extracts characters from a text field
• LEN() - returns the length of a text field (LENGTH() or CHAR_LENGTH() in MySQL)
• ROUND() - rounds a numeric field to the number of decimals specified
-- UCASE() --
SELECT UCASE(column_name) FROM TABLE_NAME;
-- LCASE() --
SELECT LCASE(column_name) FROM TABLE_NAME;
-- MID() --
SELECT MID(column_name, START, LENGTH) AS some_name FROM TABLE_NAME;
-- LEN() is not available in MySQL; use LENGTH() (bytes) or CHAR_LENGTH() (characters) --
SELECT CHAR_LENGTH(column_name) FROM TABLE_NAME;
-- ROUND() --
SELECT ROUND(column_name, decimals) FROM TABLE_NAME;
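Scalar functions are applied to each row independently. The sketch below assumes sqlite3 as a stand-in for MySQL and therefore uses the spellings UPPER(), LOWER(), SUBSTR() and LENGTH(), which MySQL also accepts alongside UCASE(), LCASE() and MID(). The sample row is made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Big Hoss', 1234.567)")

# Each function is evaluated once per row; string positions in SUBSTR start at 1.
row = conn.execute(
    "SELECT UPPER(name), LOWER(name), SUBSTR(name, 1, 3), LENGTH(name), ROUND(salary, 1) "
    "FROM employee"
).fetchone()
print(row)
conn.close()
```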

SQL. Querying multiple tables
Relational databases are usually structured to avoid duplication of data. For example, a
department may be associated with several employees through a one-to-many relationship
defined by a FOREIGN KEY constraint. While simple queries can retrieve data from each
table, it is often necessary to combine information from multiple tables, for example
selecting a list of employees with information about the department they belong to, filtering
data using the WHERE clause with information from other related tables, or computing more
complex aggregates. For such queries, SQL offers two main constructs: JOINs and
subqueries.
Using JOINS can be more efficient than running multiple queries (and then combining the
data in the application). The DBMS handles the optimal retrieval of data based on indexing
and performs the data transformations internally. This is also useful when requesting data
from a remote location as it requires less data to be transferred between the DBMS and the
application server.
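The two constructs can be compared side by side. The following sketch assumes sqlite3 as a stand-in for MySQL and a hypothetical pair of department and employee tables. It selects the employees of one department once with a subquery and once with a JOIN, and both forms return the same rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        department_id INTEGER REFERENCES department(id),
        name TEXT
    );
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employee VALUES (1, 1, 'Ann'), (2, 2, 'Bob'), (3, 1, 'Cid');
    """
)

# Subquery: the inner SELECT produces the department ids used by the outer filter.
with_subquery = conn.execute(
    "SELECT name FROM employee "
    "WHERE department_id IN (SELECT id FROM department WHERE name = 'Engineering') "
    "ORDER BY name"
).fetchall()

# JOIN: the tables are matched row by row, then filtered on the department name.
with_join = conn.execute(
    "SELECT emp.name FROM employee AS emp "
    "INNER JOIN department AS dept ON dept.id = emp.department_id "
    "WHERE dept.name = 'Engineering' ORDER BY emp.name"
).fetchall()

print(with_subquery, with_join)
conn.close()
```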
JOINs
JOINs are used to retrieve data from two or more tables based on the relationships between
them. The ON clause specifies the condition used to match records between tables, typically
by comparing a FOREIGN KEY column with the PRIMARY KEY column it references.
Consider the following example to list all employees with the name of their position. A way
to write this in SQL is by using the WHERE clause for matching the position_id FOREIGN KEY
with the id PRIMARY KEY in the one-to-many relationship between employee and position.
This method produces the same result as an INNER JOIN.
-- List the employees name, and the name of their position --
SELECT emp.name AS employee, pos.name AS position
FROM employee AS emp, position AS pos
WHERE pos.id = emp.position_id;
JOINs often perform better than equivalent subqueries, although they can be more difficult
to write for complex queries.
(INNER) JOIN
INNER JOIN is used to return ONLY the matching rows from both tables based on the
matching condition:
-- List the employees name and the name of their department --
SELECT emp.name AS employee, dept.name AS department
FROM employee AS emp
INNER JOIN department AS dept
ON dept.id = emp.department_id;
Multiple related tables can be joined according to the database schema. For example,
consider we want to list all employees along with the name of the department they belong
to and their position. In the company database (example), the query requires an INNER
JOIN between 3 tables: employee (get the employee names), department (get the name of
the department), position (get the name of the position).
-- List the employees name, the name of their department and position --
SELECT emp.name AS employee, dept.name AS department, pos.name AS position
FROM employee AS emp
INNER JOIN department AS dept
ON dept.id = emp.department_id
INNER JOIN position AS pos
ON pos.id = emp.position_id;
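The equivalence between the WHERE-based form and INNER JOIN can be verified on sample data. This is a minimal sketch assuming sqlite3 as a stand-in for MySQL and made-up rows. One employee has no position, which shows that an INNER JOIN returns only the matching rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE position (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, position_id INTEGER, name TEXT);
    INSERT INTO position VALUES (1, 'Manager'), (2, 'Engineer');
    -- Cid has no position assigned (NULL foreign key)
    INSERT INTO employee VALUES (1, 1, 'Ann'), (2, 2, 'Bob'), (3, NULL, 'Cid');
    """
)

# Implicit join: comma-separated tables matched in the WHERE clause.
implicit = conn.execute(
    "SELECT emp.name, pos.name FROM employee AS emp, position AS pos "
    "WHERE pos.id = emp.position_id ORDER BY emp.name"
).fetchall()

# Explicit join: the same matching condition expressed with INNER JOIN ... ON.
explicit = conn.execute(
    "SELECT emp.name, pos.name FROM employee AS emp "
    "INNER JOIN position AS pos ON pos.id = emp.position_id ORDER BY emp.name"
).fetchall()

print(implicit)
print(explicit)
conn.close()
```

Both queries return Ann and Bob only. Cid is dropped because no row in position matches a NULL position_id.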
LEFT (OUTER) JOIN
LEFT JOIN returns ALL the rows from the first (left) table, together with the matching rows
from the second table. The difference from INNER JOIN is that if for some rows there is no
matching record in the second table, the columns from the second table will be NULL in the
result. For example, consider the following query to list all employees with the name
of their manager. When using INNER JOIN, the general manager will be missing from the
list. With LEFT JOIN, the general manager will be listed, with the manager name
(which does not exist) set to NULL.
-- List the employees name and the name of their manager --
SELECT emp.name AS employee_name, manager.name AS manager_name
FROM employee AS emp
LEFT JOIN employee AS manager ON emp.manager_id = manager.id
ORDER BY manager_name ASC;
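The difference between the two join types on the manager example can be reproduced with a few rows. The sketch assumes sqlite3 as a stand-in for MySQL and a hypothetical three-person company in which 'Big Hoss' is the general manager with no manager of his own.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, manager_id INTEGER, name TEXT);
    INSERT INTO employee VALUES (1, NULL, 'Big Hoss'), (2, 1, 'Ann'), (3, 1, 'Bob');
    """
)

# The employee table is joined with itself: "emp" is the employee, "manager" is their boss.
inner_rows = conn.execute(
    "SELECT emp.name, manager.name FROM employee AS emp "
    "INNER JOIN employee AS manager ON emp.manager_id = manager.id ORDER BY emp.id"
).fetchall()

left_rows = conn.execute(
    "SELECT emp.name, manager.name FROM employee AS emp "
    "LEFT JOIN employee AS manager ON emp.manager_id = manager.id ORDER BY emp.id"
).fetchall()

print(inner_rows)  # the general manager is missing
print(left_rows)   # the general manager appears with None (NULL) as manager name
conn.close()
```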
RIGHT (OUTER) JOIN
RIGHT JOIN is used to return rows from both tables based on the matching condition. The
difference from INNER JOIN is that ALL the rows from the second table will be returned and
if for some rows there is no matching record on the first table, the result for that column(s)
will be NULL.
FULL (OUTER) JOIN
FULL JOIN combines LEFT JOIN and RIGHT JOIN. Where rows in the tables do not match,
the result set will have NULL values for every column of the table that lacks a matching row.
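MySQL does not provide a FULL OUTER JOIN keyword, so the result is commonly emulated as the UNION of a LEFT JOIN and a RIGHT JOIN. The sketch below assumes sqlite3 as a stand-in for MySQL and made-up data. The RIGHT JOIN half is written as a LEFT JOIN with the tables swapped, which is equivalent and also runs on older SQLite versions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER, name TEXT);
    -- 'Sales' has no employees, and Bob has no department
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employee VALUES (1, 1, 'Ann'), (2, NULL, 'Bob');
    """
)

# First half: all employees, with or without a department.
# Second half: all departments, with or without employees.
# UNION merges both halves and removes the rows that appear in both.
full_rows = conn.execute(
    "SELECT emp.name, dept.name FROM employee AS emp "
    "LEFT JOIN department AS dept ON dept.id = emp.department_id "
    "UNION "
    "SELECT emp.name, dept.name FROM department AS dept "
    "LEFT JOIN employee AS emp ON dept.id = emp.department_id"
).fetchall()

print(full_rows)
conn.close()
```

Note that UNION also collapses genuinely duplicated rows, so this emulation is only exact when the selected columns identify each row uniquely.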
CROSS JOIN
CROSS JOIN is the simplest form of JOIN, which matches each row from one table to all rows
of another table (like the Cartesian product in set theory). The result of a CROSS JOIN can be
filtered by using a WHERE clause which may then produce the equivalent of an INNER JOIN.
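-- List all the employees names, and all positions --
SELECT emp.name AS employee, pos.name AS position
FROM employee AS emp
CROSS JOIN position AS pos;
-- List all the employees names with their positions using CROSS JOIN and WHERE --
SELECT emp.name AS employee, pos.name AS position
FROM employee AS emp
CROSS JOIN position AS pos
WHERE emp.position_id = pos.id;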
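The size of a CROSS JOIN result is the product of the row counts of the two tables. A small sketch, assuming sqlite3 as a stand-in for MySQL and hypothetical rows (three employees, two positions), makes this visible and shows how the WHERE filter reduces the product to the INNER JOIN result.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE position (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, position_id INTEGER, name TEXT);
    INSERT INTO position VALUES (1, 'Manager'), (2, 'Engineer');
    INSERT INTO employee VALUES (1, 1, 'Ann'), (2, 2, 'Bob'), (3, 1, 'Cid');
    """
)

# Every employee is paired with every position: 3 x 2 = 6 rows.
all_pairs = conn.execute(
    "SELECT emp.name, pos.name FROM employee AS emp CROSS JOIN position AS pos"
).fetchall()

# Filtering the pairs on the key relationship keeps one row per employee.
filtered = conn.execute(
    "SELECT emp.name, pos.name FROM employee AS emp CROSS JOIN position AS pos "
    "WHERE emp.position_id = pos.id ORDER BY emp.name"
).fetchall()

print(len(all_pairs))
print(filtered)
conn.close()
```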

Python database connection. Data structures.
Databases can be accessed from other applications as well. A MySQL driver,
mysql-connector-python, is required to access the MySQL Server using Python. With the MySQL
driver installed in the Python environment, the main steps in working with SQL from Python
are as follows:
import mysql.connector

# Connect to the database server using the provided credentials
# Note: user="root" for admin account
mydb = mysql.connector.connect(
    host="localhost",
    port=3306,
    user="ewis_student",
    passwd="ewis2020",
    database="company",
)
# Open a cursor that returns each row as a dictionary
mycursor = mydb.cursor(dictionary=True)
# Execute SQL query
mycursor.execute("SELECT * FROM employee")
# Get results from the query using the cursor
myresult = mycursor.fetchall()
# Close the cursor
mycursor.close()
The database can be located on the same machine (use localhost), but it can also be on a
remote machine, in the same network or in the cloud (use IP address/URL). The port is set
to 3306 by default, and this corresponds to the port that the database server is running on.
The (database) cursor is a control structure that enables traversal, retrieval, addition, and
removal of database records in a sequential way. The following steps describe using cursors
for database access:
• Declare a cursor that defines a result set.
• Open the cursor to establish the result set.
• Fetch the data from the cursor into local variables, one row at a time.
• Close the cursor when done.
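The steps above can be sketched with the standard Python database cursor interface. The example assumes sqlite3 as a stand-in for the MySQL connection so that it runs without a server. Cursors from mysql.connector expose the same execute(), fetchone() and close() methods. The table contents are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# Declare the cursor and open it on a result set by executing a query.
cursor = conn.cursor()
cursor.execute("SELECT id, name FROM employee ORDER BY id")

# Fetch one row at a time; fetchone() returns None when the result set is exhausted.
names = []
row = cursor.fetchone()
while row is not None:
    names.append(row[1])
    row = cursor.fetchone()

# Close the cursor when done, then the connection.
cursor.close()
conn.close()
print(names)
```

Fetching row by row keeps memory use low for large result sets, whereas fetchall() loads every row at once.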
The data is retrieved by default as a list of tuples (tuples are like lists, with the
difference that they are immutable). Take for example the result of a query, returning a list
of employees in the company database:
[
(1, 1, 1, None, 'Big Hoss', Decimal('10000.00'), datetime.datetime(2018, 10, 18, 0, 0)),
(20, 2, 4, 3, 'Captain America', Decimal('10000.00'), datetime.datetime(2019, 4, 26, 0, 0))
]
Using the dictionary=True argument when creating the cursor, the data can be retrieved as
a list of dictionaries such as:
[{
'id':1,
'department_id':1,
'position_id':1,
'manager_id':None,
'name':'Big Hoss',
'salary':Decimal('10000.00'),
'hire_date':datetime.datetime(2018,10,18,0,0)
}, {
'id':20,
'department_id':2,
'position_id':4,
'manager_id':3,
'name':'Captain America',
'salary':Decimal('10000.00'),
'hire_date':datetime.datetime(2019,4,26,0,0)
}]
While the dictionary results are useful for working with data in a Python application, the CSV
format is often used for processing data and saving result files. For this, the conversion
between a list of dictionaries (the results from the cursor) and a CSV list can be done in
Python as follows:
# Function to extract CSV data from a dictionary list
def get_csv_data(result):
    csvdata = []
    for elem in result:
        csvdata.append([elem[k] for k in elem])
    return csvdata
The data can then be written into a CSV file from Python:
def write_csv_file(data):
    with open("result.csv", "w") as f:
        for d in data:
            f.write(",".join([str(col) for col in d]) + "\n")
The CSV format and file content will look like this:
1,1,1,None,Big Hoss,10000.00,2018-10-18 00:00:00
20,2,4,3,Captain America,10000.00,2019-04-26 00:00:00
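Joining values with commas by hand breaks as soon as a value itself contains a comma. As an alternative sketch, Python's standard csv module can write the list of dictionaries directly, adding a header row and quoting values where needed. The helper name dicts_to_csv and the sample value 'Hoss, Big' are made up for illustration, and the output is written to an in-memory buffer rather than a file.

```python
import csv
import io
from decimal import Decimal


def dicts_to_csv(rows):
    """Return CSV text (with a header row) for a list of dictionaries."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # The column order is taken from the keys of the first dictionary.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows in the shape returned by a dictionary cursor.
rows = [
    {"id": 1, "manager_id": None, "name": "Hoss, Big", "salary": Decimal("10000.00")},
    {"id": 20, "manager_id": 3, "name": "Captain America", "salary": Decimal("10000.00")},
]
csv_text = dicts_to_csv(rows)
print(csv_text)
```

The name containing a comma is wrapped in double quotes, and None is written as an empty field instead of the text "None".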
For working with tables, the pandas library can be used. It simplifies many operations
such as extracting columns, CRUD operations (creating, reading, updating and deleting
data), filtering and data analysis. Pandas is often used in Machine Learning workflows. Its
main data structure, the DataFrame, can be built from various structures and file formats:
lists, dictionaries, CSV files, Excel files, etc.
In the following example, data is retrieved from a cursor and added to a pandas
DataFrame:
import pandas as pd

# mydb is the connection opened in the earlier example
mycursor = mydb.cursor(dictionary=True)
mycursor.execute("SELECT * FROM employee")
myresult = mycursor.fetchall()
mycursor.close()
# create the pandas dataframe with the results from the database (in dictionary format)
df = pd.DataFrame(myresult)
# the dataframe can be printed in the console
print(df)
# the dataframe can be converted to list
dlist = df.values.tolist()
print(dlist)
# columns can be extracted from the dataframe
names = df["name"].values
print(names)
# the following way works too
salaries = list(df["salary"])
# then we may want to do some additional data processing, for example
# converting to float (real numbers representation in Python)
salaries = [float(sal) for sal in salaries]
LAB TASKS
1. Using aggregate functions, write an SQL query to calculate the sum of all salaries in the
company database.
2. Using aggregate functions, write an SQL query to calculate the average salary in the
company database.
3. Write an SQL query to return a list of employees with the following information about
them:
• the name
• the name of their department
• the size of their team (number of employees within the same department)
4. Write an SQL query to return a list of employees with the following information about
them:
• the name
• the name of their position
• the min/average/max salary of the employees in the same position
5. Write an SQL query to return a list of employees with the following information about
them:
• the name in uppercase
• the location of their department
6. Write an SQL query to return a list of employees (name, salary) having a higher salary
than the average salary within the company.
7. Write an SQL query to return a list of employees (name, salary) having a higher salary
than their manager.
8. Write an SQL query to return the employees (name, salary) having the highest salary
within each department.
