Lab No.9
In this lab, we introduce more advanced SQL features (functions, joins, subqueries) that are commonly used when querying databases. Then, we will connect to the database from Python and run SQL queries.
SQL. Functions
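SQL functions are used to perform calculations on data. Functions can be aggregate (operating on multiple values, such as all the values in a column) or scalar (operating on a single input value).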
SQL aggregate functions return a single value calculated from multiple values in a column.
-- COUNT() --
LIMIT 1;
LIMIT 1;
-- MAX() --
-- MIN() --
-- SUM() --
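The aggregate functions listed above can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the company database. The table is reduced to a few columns, and the rows are hypothetical: only 'Big Hoss' and 'Captain America' come from this lab. COUNT, MAX, MIN, SUM and AVG behave the same way in MySQL for this query.

```python
import sqlite3

# In-memory stand-in for the company database (hypothetical sample rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER,
                       name TEXT, salary REAL);
INSERT INTO employee VALUES
  (1, 1, 'Big Hoss', 10000), (2, 1, 'Ann', 4000),
  (3, 2, 'Bob', 5000), (20, 2, 'Captain America', 10000);
""")

# Each aggregate collapses the whole salary column into a single value
count, max_salary, min_salary, total, average = conn.execute("""
SELECT COUNT(*), MAX(salary), MIN(salary), SUM(salary), AVG(salary)
FROM employee
""").fetchone()
print(count, max_salary, min_salary, total, average)  # 4 10000.0 4000.0 29000.0 7250.0
```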
The following query returns the distinct positions in the employee table, effectively listing all the positions that are assigned to at least one employee.
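A minimal sketch of the DISTINCT keyword follows. It uses Python's sqlite3 in-memory database as a stand-in for MySQL, with hypothetical rows (the position_id values of 'Big Hoss' and 'Captain America' follow the sample data shown later in this lab). Several employees share position 4, yet it is returned only once.

```python
import sqlite3

# Hypothetical employee rows: three employees share position_id 4
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, position_id INTEGER, name TEXT);
INSERT INTO employee VALUES
  (1, 1, 'Big Hoss'), (2, 4, 'Ann'), (3, 4, 'Bob'), (20, 4, 'Captain America');
""")

# DISTINCT removes duplicate values, so each position appears once
positions = [r[0] for r in conn.execute(
    "SELECT DISTINCT position_id FROM employee ORDER BY position_id")]
print(positions)  # [1, 4]
```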
The GROUP BY statement is used in conjunction with aggregate functions to group the result set by one or more columns. For example, the following query returns the number of employees for each department. The aggregate function is COUNT(), which counts the rows (employees) in each group:
SELECT department_id, COUNT(*)
FROM employee
GROUP BY department_id;
GROUP BY defines how the aggregate function is applied. What happens if you remove the
GROUP BY statement? What happens if you use GROUP BY with the employee id instead?
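To explore the two questions above, the sketch below runs three variants against a hypothetical in-memory SQLite table (a stand-in for MySQL):
- grouping by department,
- no grouping at all,
- grouping by the primary key.

One difference to keep in mind: in recent MySQL versions the ONLY_FULL_GROUP_BY SQL mode is enabled by default. In that mode, selecting department_id next to COUNT(*) without GROUP BY raises an error, so the ungrouped variant below selects only COUNT(*).

```python
import sqlite3

# Hypothetical rows: department 1 has 2 employees, department 2 has 3
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER, name TEXT);
INSERT INTO employee VALUES
  (1, 1, 'Big Hoss'), (2, 1, 'Ann'), (3, 2, 'Bob'),
  (20, 2, 'Captain America'), (21, 2, 'Eve');
""")

# One row per department: COUNT(*) is computed within each group
per_dept = conn.execute("""
SELECT department_id, COUNT(*) FROM employee
GROUP BY department_id ORDER BY department_id""").fetchall()

# Without GROUP BY the whole table is a single group: one total count
total = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Grouping by the primary key puts each row in its own group: every count is 1
per_id = conn.execute("SELECT id, COUNT(*) FROM employee GROUP BY id").fetchall()

print(per_dept)  # [(1, 2), (2, 3)]
print(total)     # 5
print(per_id)
```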
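The HAVING clause filters the groups produced by GROUP BY, because WHERE cannot be used with aggregate functions. For example, the following query returns the number of employees for each department having more than 5 employees:
-- Get the number of employees for each department having more than 5 employees --
SELECT department_id, COUNT(*)
FROM employee
GROUP BY department_id
HAVING COUNT(*) > 5;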
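The HAVING query above can be checked against a small hypothetical dataset. The sketch uses Python's sqlite3 in-memory database as a stand-in for MySQL, with department 1 given 6 employees and department 2 given 2. Only the group whose count passes the HAVING condition is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER)")
# Hypothetical data: 6 employees in department 1, 2 in department 2
conn.executemany("INSERT INTO employee (department_id) VALUES (?)",
                 [(1,)] * 6 + [(2,)] * 2)

# WHERE would filter rows before grouping; HAVING filters the groups afterwards
big_depts = conn.execute("""
SELECT department_id, COUNT(*) FROM employee
GROUP BY department_id
HAVING COUNT(*) > 5""").fetchall()
print(big_depts)  # [(1, 6)]
```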
SQL scalar functions return a single value based on the input value.
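Common scalar functions include:
- UCASE(): converts a text field to upper case.
- LCASE(): converts a text field to lower case.
- MID(): extracts characters from a text field.
- LENGTH(): returns the length of a text field (LEN() in SQL Server).
- ROUND(): rounds a number to the specified number of decimals.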
-- UCASE() --
-- LCASE() --
-- MID() --
-- LENGTH() --
-- ROUND() --
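The scalar functions can be tried on literal values with Python's sqlite3 module. SQLite does not provide UCASE(), LCASE() or MID(), so the sketch uses the standard names UPPER(), LOWER() and SUBSTR(). MySQL also accepts UPPER(), LOWER() and SUBSTRING(); the MySQL-specific aliases are noted in the SQL comments. Each function returns one value per input value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Scalar functions applied to literal values (one result per input value)
row = conn.execute("""
SELECT UPPER('Big Hoss'),          -- MySQL alias: UCASE()
       LOWER('Big Hoss'),          -- MySQL alias: LCASE()
       SUBSTR('Big Hoss', 5, 4),   -- MySQL alias: MID(); start position is 1-based
       LENGTH('Big Hoss'),         -- MySQL: LENGTH() counts bytes, CHAR_LENGTH() characters
       ROUND(1234.5678, 2)
""").fetchone()
print(row)
```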
SQL. Querying multiple tables
Relational databases are usually structured to avoid duplication of data. For example, a department's name is stored once in the department table, and each employee refers to it through the department_id column, a relationship defined by a FOREIGN KEY constraint. While simple queries can be used to retrieve data from each table, it is often necessary to combine information from multiple tables. Examples include selecting a list of employees with information about the department they belong to, filtering data in the WHERE clause using information from other related tables, and more complex aggregate functions. For such complex queries, there are two main approaches: JOINs and subqueries.
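A minimal sketch of the two approaches answering the same question, "which employees work in the Operations department?". It uses Python's sqlite3 in-memory database as a stand-in for MySQL. The department names and the employee 'Ann' are hypothetical. The subquery first looks up the department id and the outer query filters on it, while the JOIN combines both tables and filters on the joined name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY,
                       department_id INTEGER REFERENCES department(id), name TEXT);
INSERT INTO department VALUES (1, 'Management'), (2, 'Operations');
INSERT INTO employee VALUES (1, 1, 'Big Hoss'), (2, 2, 'Ann'),
                            (20, 2, 'Captain America');
""")

# Subquery: the inner SELECT finds the department id, the outer SELECT filters on it
by_subquery = conn.execute("""
SELECT name FROM employee
WHERE department_id IN (SELECT id FROM department WHERE name = 'Operations')
ORDER BY name""").fetchall()

# JOIN: combine both tables, then filter on the joined department name
by_join = conn.execute("""
SELECT emp.name FROM employee AS emp
INNER JOIN department AS dept ON dept.id = emp.department_id
WHERE dept.name = 'Operations'
ORDER BY emp.name""").fetchall()

print(by_subquery == by_join, by_join)  # True [('Ann',), ('Captain America',)]
```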
Using JOINs can be more efficient than running multiple queries (and then combining the
data in the application). The DBMS handles the optimal retrieval of data based on indexing
and performs the data transformations internally. This is also useful when requesting data
from a remote location as it requires less data to be transferred between the DBMS and the
application server.
JOINs
JOINs are used to retrieve data from two or more tables based on the relationships between them. The ON clause specifies the condition used to match records between the tables, typically a FOREIGN KEY column of one table being equal to the PRIMARY KEY of the other.
Consider listing all employees with the name of their position. One way to write this in SQL is to use the WHERE clause to match the position_id FOREIGN KEY of the employee table with the id of the position table.
JOINs often perform better than equivalent subqueries, although they can be harder to write and read.
(INNER) JOIN
INNER JOIN is used to return ONLY the matching rows from both tables based on the
matching condition:
ON dept.id = emp.department_id;
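The WHERE-based matching described earlier and an explicit INNER JOIN return the same rows. The sketch below checks this with Python's sqlite3 in-memory database as a stand-in for MySQL. The position names and the employee 'Ann' are hypothetical; the position_id values of 'Big Hoss' and 'Captain America' follow the lab's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE position (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, position_id INTEGER, name TEXT);
INSERT INTO position VALUES (1, 'General Manager'), (4, 'Engineer');
INSERT INTO employee VALUES (1, 1, 'Big Hoss'), (2, 4, 'Ann'),
                            (20, 4, 'Captain America');
""")

# Implicit join: both tables in FROM, keys matched in WHERE
implicit = conn.execute("""
SELECT emp.name, pos.name FROM employee AS emp, position AS pos
WHERE pos.id = emp.position_id ORDER BY emp.id""").fetchall()

# Explicit join: the matching condition moves into the ON clause
explicit = conn.execute("""
SELECT emp.name, pos.name FROM employee AS emp
INNER JOIN position AS pos ON pos.id = emp.position_id
ORDER BY emp.id""").fetchall()

print(implicit == explicit)  # True
print(explicit)
```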
Multiple related tables can be joined according to the database schema. For example, suppose we want to list all employees along with the name of the department they belong to and their position. In the company database (example), the query requires two INNER JOINs: one between employee and department, and one between employee and position:
-- List the employees' names, the names of their departments and their positions --
ON dept.id = emp.department_id
ON pos.id = emp.position_id;
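A runnable sketch of the three-table join, using Python's sqlite3 in-memory database as a stand-in for MySQL. The department_id and position_id values of the two employees follow the lab's sample data. The department and position names are hypothetical. Each INNER JOIN adds one table, matched through its own foreign key column in employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE position (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER,
                       position_id INTEGER, name TEXT);
INSERT INTO department VALUES (1, 'Management'), (2, 'Operations');
INSERT INTO position VALUES (1, 'General Manager'), (4, 'Engineer');
INSERT INTO employee VALUES (1, 1, 1, 'Big Hoss'), (20, 2, 4, 'Captain America');
""")

# Each INNER JOIN adds one table, matched through a foreign key of employee
rows = conn.execute("""
SELECT emp.name, dept.name, pos.name
FROM employee AS emp
INNER JOIN department AS dept ON dept.id = emp.department_id
INNER JOIN position AS pos ON pos.id = emp.position_id
ORDER BY emp.id""").fetchall()
print(rows)
```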
LEFT (OUTER) JOIN
LEFT JOIN is used to return rows from both tables based on the matching condition. The difference from INNER JOIN is that ALL the rows from the first (left) table are returned. If a row has no matching record in the second table, the columns from the second table are NULL for that row. For example, consider the following query to list all employees with the name of their manager. With INNER JOIN, the general manager will be missing from the list, because he has no manager. With LEFT JOIN, the general manager will be listed, with NULL as the name of his manager:
AS emp
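The manager example joins the employee table with itself (a self-join), using a second alias for the managers. The sketch below runs it with Python's sqlite3 in-memory database as a stand-in for MySQL. It follows the lab's sample data, where 'Big Hoss' has no manager_id and 'Captain America' has manager_id 3; employee 3 ('Ann') is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, manager_id INTEGER, name TEXT);
INSERT INTO employee VALUES (1, NULL, 'Big Hoss'), (3, 1, 'Ann'),
                            (20, 3, 'Captain America');
""")

# Self-join: "mgr" is a second alias of the same employee table
query = """
SELECT emp.name, mgr.name FROM employee AS emp
{join} employee AS mgr ON mgr.id = emp.manager_id
ORDER BY emp.id"""

inner = conn.execute(query.format(join="INNER JOIN")).fetchall()
left = conn.execute(query.format(join="LEFT JOIN")).fetchall()
print(inner)  # Big Hoss is missing: he has no manager
print(left)   # Big Hoss is kept, with None (NULL) as his manager's name
```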
RIGHT (OUTER) JOIN
RIGHT JOIN is used to return rows from both tables based on the matching condition. The difference from INNER JOIN is that ALL the rows from the second (right) table are returned. If a row has no matching record in the first table, the columns from the first table are NULL for that row.
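"A RIGHT JOIN B" returns the same rows as "B LEFT JOIN A". The sketch relies on this to list every department, including one with no employees. It uses Python's sqlite3 in-memory database as a stand-in for MySQL. SQLite versions older than 3.39 do not support RIGHT JOIN, so the query is written in its swapped LEFT JOIN form, with the MySQL RIGHT JOIN version given in a comment. The department names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER, name TEXT);
INSERT INTO department VALUES (1, 'Management'), (2, 'Operations'), (3, 'Research');
INSERT INTO employee VALUES (1, 1, 'Big Hoss'), (20, 2, 'Captain America');
""")

# MySQL form: SELECT emp.name, dept.name FROM employee AS emp
#             RIGHT JOIN department AS dept ON dept.id = emp.department_id;
# Equivalent form with the tables swapped and LEFT JOIN:
rows = conn.execute("""
SELECT emp.name, dept.name FROM department AS dept
LEFT JOIN employee AS emp ON dept.id = emp.department_id
ORDER BY dept.id""").fetchall()
print(rows)  # 'Research' has no employees, so its employee name is None
```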
FULL (OUTER) JOIN
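FULL JOIN combines LEFT JOIN and RIGHT JOIN: all rows from both tables are returned. Where rows in the tables do not match, the result set has NULL values for every column of the table that lacks a matching row. Note that MySQL does not support FULL JOIN directly; it can be emulated by combining a LEFT JOIN and a RIGHT JOIN with UNION.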
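A sketch of the UNION-based emulation of FULL JOIN, using Python's sqlite3 in-memory database as a stand-in for MySQL. In MySQL the second SELECT would normally be a RIGHT JOIN; here it is written as the equivalent swapped LEFT JOIN so that it also runs on older SQLite versions. The data is hypothetical: 'Ann' has no department and 'Research' has no employees, so both appear with a NULL on the missing side. UNION also removes duplicate rows, which here merges the matched pair that both halves return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER, name TEXT);
INSERT INTO department VALUES (1, 'Management'), (3, 'Research');
INSERT INTO employee VALUES (1, 1, 'Big Hoss'), (2, NULL, 'Ann');
""")

# First SELECT keeps all employees, second keeps all departments;
# UNION merges the two results and drops duplicate rows
rows = conn.execute("""
SELECT emp.name, dept.name FROM employee AS emp
LEFT JOIN department AS dept ON dept.id = emp.department_id
UNION
SELECT emp.name, dept.name FROM department AS dept
LEFT JOIN employee AS emp ON dept.id = emp.department_id
""").fetchall()
print(rows)
```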
CROSS JOIN
CROSS JOIN is the simplest form of JOIN: it matches each row from one table with all rows of another table (like the Cartesian product in set theory). The result of a CROSS JOIN can be filtered using a WHERE clause, which may then produce the equivalent of an INNER JOIN.
-- List all the employees' names with their positions using CROSS JOIN and WHERE --
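The sketch below shows both sides of the CROSS JOIN description, using Python's sqlite3 in-memory database as a stand-in for MySQL. Without a filter, 2 employees and 3 positions give 2 × 3 = 6 pairs. The WHERE clause keeps only the pairs whose keys match, which is the INNER JOIN result. The position names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE position (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, position_id INTEGER, name TEXT);
INSERT INTO position VALUES (1, 'General Manager'), (4, 'Engineer'), (5, 'Intern');
INSERT INTO employee VALUES (1, 1, 'Big Hoss'), (20, 4, 'Captain America');
""")

# Unfiltered: every employee paired with every position (2 x 3 = 6 rows)
all_pairs = conn.execute(
    "SELECT emp.name, pos.name FROM employee AS emp CROSS JOIN position AS pos"
).fetchall()

# WHERE keeps only matching pairs, like an INNER JOIN on the same condition
matched = conn.execute("""
SELECT emp.name, pos.name FROM employee AS emp CROSS JOIN position AS pos
WHERE pos.id = emp.position_id ORDER BY emp.id""").fetchall()

print(len(all_pairs))  # 6
print(matched)
```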
The MySQL driver mysql-connector-python is required to access the MySQL Server from Python (it can be installed with pip install mysql-connector-python). With the driver installed in the Python environment, the main steps in working with SQL from Python are as follows:
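import mysql.connector

# Connect to the database server
mydb = mysql.connector.connect(
    host="localhost",
    port=3306,
    user="ewis_student",
    passwd="ewis2020",
    database="company"
)
# Open a cursor
mycursor = mydb.cursor(dictionary=True)
# Execute a query (example query) and fetch all the resulting rows
mycursor.execute("SELECT * FROM employee")
myresult = mycursor.fetchall()
# Close the cursor and the connection when done
mycursor.close()
mydb.close()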
The database can be located on the same machine (use localhost), but it can also be on a remote machine, in the same network or in the cloud (use its IP address or URL). The MySQL port is 3306 by default, and it must match the port on which the database server is listening.
The (database) cursor is a control structure that enables traversal, retrieval, addition, and
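removal of database records in a sequential way. The following steps describe using cursors:
1. Declare the cursor and the query that defines its result set.
2. Open the cursor to establish the result set.
3. Fetch the data from the cursor into local variables, one row at a time.
4. Close the cursor when done.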
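The cursor steps can be traced with Python's built-in sqlite3 module, which follows the same DB-API 2.0 interface (PEP 249) as mysql-connector-python: execute(), fetchone() and close(). The in-memory table and its rows are hypothetical stand-ins for the company database. With a MySQL cursor, reading every row, as this loop does, also avoids leaving unread results behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO employee VALUES (1, 'Big Hoss'), (20, 'Captain America');
""")

cursor = conn.cursor()                                       # create the cursor
cursor.execute("SELECT id, name FROM employee ORDER BY id")  # define the result set
rows_read = []
row = cursor.fetchone()                  # fetch one row at a time (None at the end)
while row is not None:
    emp_id, name = row                   # copy the row into local variables
    rows_read.append((emp_id, name))
    row = cursor.fetchone()
cursor.close()                           # close the cursor when done
conn.close()
print(rows_read)  # [(1, 'Big Hoss'), (20, 'Captain America')]
```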
The data is retrieved by default in the form of tuples (similar to lists, except that tuples are immutable). Take for example the result of a query returning a list of employees:
[
    (1, 1, 1, None, 'Big Hoss', Decimal('10000.00'), datetime.datetime(2018, 10, 18, 0, 0)),
]
Using the dictionary=True argument when creating the cursor, the data can be retrieved as a list of dictionaries that map each column name to its value:
[{
    'id': 1,
    'department_id': 1,
    'position_id': 1,
    'manager_id': None,
    'name': 'Big Hoss',
    'salary': Decimal('10000.00'),
    'hire_date': datetime.datetime(2018, 10, 18, 0, 0)
}, {
    'id': 20,
    'department_id': 2,
    'position_id': 4,
    'manager_id': 3,
    'name': 'Captain America',
    'salary': Decimal('10000.00'),
    'hire_date': datetime.datetime(2019, 4, 26, 0, 0)
}]
While the dictionary results are useful for working with data in a Python application, the CSV format is often used for processing data and saving result files. For this, a list of dictionaries (the results from the cursor) can be converted into a list of CSV rows in Python as follows:
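# Function to extract CSV data from a dictionary list
def get_csv_data(result):
    csvdata = []
    if result:
        csvdata.append(list(result[0].keys()))  # header row: the column names
        csvdata.extend(list(row.values()) for row in result)  # one row per record
    return csvdata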
The data can then be written into a CSV file from Python:
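import csv
def write_csv_file(data, filename="result.csv"):  # filename: example default
    with open(filename, "w", newline="") as f:
        csv.writer(f).writerows(data)  # one CSV line per row in data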
The CSV format and file content will look like this:
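A sketch of the CSV text produced from the two sample records shown earlier. It follows the same idea as get_csv_data (a header row of column names, then one row of values per record). It writes to an in-memory io.StringIO buffer instead of a file so the content can be printed. The csv module writes None as an empty field and converts Decimal and datetime values with str().

```python
import csv
import datetime
import io
from decimal import Decimal

# The two sample records from the dictionary example (as returned by the cursor)
result = [
    {"id": 1, "department_id": 1, "position_id": 1, "manager_id": None,
     "name": "Big Hoss", "salary": Decimal("10000.00"),
     "hire_date": datetime.datetime(2018, 10, 18, 0, 0)},
    {"id": 20, "department_id": 2, "position_id": 4, "manager_id": 3,
     "name": "Captain America", "salary": Decimal("10000.00"),
     "hire_date": datetime.datetime(2019, 4, 26, 0, 0)},
]

# Header row from the keys, then one row of values per record
rows = [list(result[0].keys())] + [list(r.values()) for r in result]

# Write into an in-memory buffer to show the resulting CSV text
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```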
For working with tables, the pandas library can be used. It simplifies many operations such as extracting columns, CRUD operations (creating, reading, updating and deleting data), filtering and data analysis. Pandas is often used in Machine Learning. Its main data structure, the DataFrame, can be built from various data structures (such as a list of dictionaries) and file formats (such as CSV).
In the following example, data is retrieved from a cursor and added to a pandas DataFrame:
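import pandas as pd

mycursor = mydb.cursor(dictionary=True)
mycursor.execute("SELECT * FROM employee")  # example query
res = mycursor.fetchall()
mycursor.close()
# create the pandas dataframe with the results from the database (in dictionary format)
df = pd.DataFrame(res)
print(df)
# convert the dataframe into a list of rows (each row is a list)
dlist = df.values.tolist()
print(dlist)
# extract a single column as a NumPy array
names = df["name"].values
print(names)
# extract a single column as a Python list
salaries = list(df["salary"])
print(salaries)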
LAB TASKS
1. Using aggregate functions, write an SQL query to calculate the sum of all salaries in the
company database.
2. Using aggregate functions, write an SQL query to calculate the average salary in the
company database.
3. Write an SQL query to return a list of employees with the following information about
them:
the name
the size of their team (number of employees within the same department)
4. Write an SQL query to return a list of employees with the following information about
them:
the name
5. Write an SQL query to return a list of employees with the following information about
them:
6. Write an SQL query to return a list of employees (name, salary) having a higher salary
7. Write an SQL query to return a list of employees (name, salary) having a higher salary
8. Write an SQL query to return the employees (name, salary) having the highest salary