LAB: String Patterns, Sorting & Grouping: HR Database
To complete this lab you will use the Db2 database service on IBM Cloud, as you did
in the previous lab. There are three parts to this lab:
I. Creating tables
II. Loading data into tables
III. Composing and running queries
If you do not yet have access to Db2 on IBM Cloud, please refer to the Lab Instructions
in Module/Week 1.
Rather than creating each table manually by typing the DDL commands in the SQL
editor, you will execute a script containing the CREATE TABLE commands for all the
tables. The following step-by-step instructions show how to do this:
4) Once the statements are in the SQL Editor tool, you can run the queries against
the database by selecting the "Run All" button.
5) On the right side of the SQL editor window you will see a Result section. Clicking
on a query in the Result section will show the execution details of the job: whether it
ran successfully, or had any errors or warnings. Ensure your queries ran
successfully and created all the tables.
6) Now you can look at the tables you created. Navigate to the three-bar menu icon,
select "Explore", then click on “Tables”.
Select the Schema corresponding to your Db2 userid. It typically starts with 3
letters (not SQL) followed by 5 numbers (for example “QWX76809”, although yours
will be different). Then on the right side of the screen you should see the 5 newly
created tables listed – DEPARTMENTS, EMPLOYEES, JOBS, JOB_HISTORY, and
LOCATIONS (plus any other tables you may have created in previous labs, e.g.
INSTRUCTOR, TEST, etc.).
Click on any of the tables and you will see its SCHEMA definition (that is, the list of
columns, their data types, etc.).
Now let us see how data can be loaded into Db2. We could manually insert each
row into the table one by one, but that would take a long time. Instead, Db2 (and
almost every other database) allows you to load data from .CSV files.
Please follow the steps below, which explain the process of loading data into the
tables we created earlier.
1) Download the 5 required data source files from the lab page in the course
(“Employees.csv”, “Departments.csv”, “Jobs.csv”, “JobsHistory.csv”, “Locations.csv”) to
your computer.
2) First let us learn how to load data into the Employees table that we created
earlier. From the 3 bar menu icon, select "Load" then “Load Data”:
On the Load page that opens ensure "My Computer" is selected as the source. Click
on the "browse files" link.
3) Choose the file "Employees.csv" that you downloaded to your computer and click
"Open".
4) Once the File is selected click "Next" in the bottom right corner.
NOTE: If you only see 2-3 schemas and not your Db2 schema, scroll down in
that list until you see the one in which you previously created the tables.
It will show all the tables that have been created in this Schema previously, including
the Employees table. Select the EMPLOYEES table, and choose “Overwrite table
with new data” then click “Next”.
6) Since our source data files do not contain any rows with column labels, turn off
the setting for "Header in first row". Also, click on the down arrow next to "Date
format" and choose "MM/DD/YYYY" since that is how the date is formatted in our
source file.
7) Click "Next". Review the Load settings and click "Begin Load" in the top-right
corner.
8) After Loading is complete you will notice that we were successful in loading all 10
rows of the Employees table. If there are any Errors or Warnings you can view them
on this screen.
9) You can see the data that was loaded by clicking on "View Table". Alternatively,
you can go to the Explore page, select the correct schema, then the
EMPLOYEES table, and click "View Data".
10) Now it's your turn to load the remaining 4 tables of the HR database – Locations,
JobHistory, Jobs, and Departments. Please follow the steps above to load the data
from the remaining source files.
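As a rough local illustration of what the load settings above mean, the sketch below uses Python's built-in sqlite3 module as a stand-in for Db2. The table layout, the column names, and the two sample lines are assumptions made up for this example; they are not the lab's real files. It treats every line as data (no header row), converts MM/DD/YYYY dates to ISO dates, and empties the table first, in the spirit of “Overwrite table with new data”.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Made-up, header-less sample lines (NOT the contents of the lab's Employees.csv).
SAMPLE_CSV = "E1001,John,Thomas,01/09/1976,100000\nE1002,Alice,James,07/31/1972,80000\n"


def load_rows(conn, text):
    """Insert every CSV line as a data row (no header row) and return the row count."""
    rows = []
    for emp_id, f_name, l_name, b_date, salary in csv.reader(io.StringIO(text)):
        # Parse MM/DD/YYYY and store the date in ISO form (YYYY-MM-DD).
        iso_date = datetime.strptime(b_date, "%m/%d/%Y").date().isoformat()
        rows.append((emp_id, f_name, l_name, iso_date, int(salary)))
    conn.execute("DELETE FROM EMPLOYEES")  # comparable to overwriting the table
    conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical five-column layout, assumed only for this sketch.
conn.execute(
    "CREATE TABLE EMPLOYEES (EMP_ID TEXT PRIMARY KEY, F_NAME TEXT, L_NAME TEXT, "
    "B_DATE TEXT, SALARY INTEGER)"
)
loaded = load_rows(conn, SAMPLE_CSV)
first = conn.execute("SELECT EMP_ID, B_DATE FROM EMPLOYEES ORDER BY EMP_ID").fetchone()
print(loaded, first)  # 2 ('E1001', '1976-01-09')
```

If "Header in first row" were left on for a header-less file, the first data line would be treated as column labels and skipped, which is why the lab turns that setting off.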
Question 1: Were there any warnings loading data into the JOBS table? What can
be done to resolve this?
Hint: View the data loaded into this table and pay close attention to the JOB_TITLE
column.
Question 2: Did all rows from the source file load successfully into the DEPARTMENTS
table? If not, are you able to figure out why not?
Hint: Look at the warning. Also, note the Primary Key for this table.
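To see in general how a primary key affects a load, the sketch below inserts made-up rows into a hypothetical two-column table, again using Python's sqlite3 module as a local stand-in for Db2. The table name DEPT_DEMO, its columns, and the sample rows are assumptions for illustration only; the exact warning text you see in Db2 will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the lab's DEPARTMENTS table has its own definition.
conn.execute("CREATE TABLE DEPT_DEMO (DEPT_ID TEXT PRIMARY KEY NOT NULL, DEP_NAME TEXT)")

# Made-up source rows; the last one repeats the key value "5".
source_rows = [("2", "Architect Group"), ("5", "Software Group"), ("5", "Software Group")]

rejected = []
for row in source_rows:
    try:
        conn.execute("INSERT INTO DEPT_DEMO VALUES (?, ?)", row)
    except sqlite3.IntegrityError as err:
        # A primary key value must be unique, so the repeated key is refused.
        rejected.append((row, str(err)))

loaded = conn.execute("SELECT COUNT(*) FROM DEPT_DEMO").fetchone()[0]
print(loaded, len(rejected))  # 2 1
```

Comparing the number of rows in the source file with the number of rows actually loaded is a quick way to spot this kind of rejection.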
You created the tables for the HR database schema and also learned how to load
data into these tables. Now try and work on a few advanced DML queries that were
introduced in this module.
Follow these steps to create and run the queries indicated below:
3) Check the Logs created under the Results section. Looking at the contents of the
Log explains whether the SQL statement ran successfully. Also look at the query
results to ensure the output is what you expected.
Query 2: Retrieve all employees who were born during the 1970s.
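One possible approach to a "born in a given decade" query is a string pattern on the birth date. The sketch below is a minimal local illustration using Python's sqlite3 module rather than Db2; the column names (EMP_ID, F_NAME, B_DATE) and the sample rows are assumptions, and the dates are stored as ISO text so that the pattern '197%' matches any year from 1970 to 1979.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns and made-up rows, used only for this illustration.
conn.execute("CREATE TABLE EMPLOYEES (EMP_ID TEXT, F_NAME TEXT, B_DATE TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?, ?)",
    [("E1", "John", "1976-01-09"), ("E2", "Alice", "1972-07-31"), ("E3", "Steve", "1980-08-10")],
)

# '%' matches any sequence of characters, so '197%' matches dates starting with 197.
born_1970s = conn.execute(
    "SELECT F_NAME FROM EMPLOYEES WHERE B_DATE LIKE '197%' ORDER BY F_NAME"
).fetchall()
print(born_1970s)  # [('Alice',), ('John',)]
```

On a true DATE column, a range test such as BETWEEN '1970-01-01' AND '1979-12-31' is an alternative that does not rely on how the date is rendered as text.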
Query 5A: For each department ID retrieve the number of employees in the
department.
[Hint: Use COUNT(*) to count the rows in each group, and then GROUP BY]
Query 5B: For each department retrieve the number of employees in the
department, and the average employee salary in the department.
[Hint: Use COUNT(*) to count the rows in each group, and the AVG() function to
compute average salaries, and then group]
Query 5C: Label the computed columns in the result set of Query 5B as
“NUM_EMPLOYEES” and “AVG_SALARY”.
Query 5E: In Query 5D limit the result to departments with fewer than 4
employees.
[Hint: Use HAVING after the GROUP BY, and use the COUNT() function in the
HAVING clause instead of the column label.
Note: The WHERE clause filters individual rows before they are grouped, whereas the
HAVING clause filters the groups produced by GROUP BY]
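The grouping hints above can be sketched in one small, self-contained example. It uses Python's sqlite3 module as a local stand-in for Db2, and the column names (DEP_ID, SALARY) and sample rows are assumptions made up for illustration. The query counts rows per department, averages the salaries, labels both computed columns, keeps only groups with fewer than 4 employees, and sorts the result by the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns and made-up rows, used only for this illustration.
conn.execute("CREATE TABLE EMPLOYEES (EMP_ID TEXT, DEP_ID TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?, ?)",
    [
        ("E1", "2", 100000), ("E2", "2", 80000),
        ("E3", "5", 50000), ("E4", "5", 60000), ("E5", "5", 70000), ("E6", "5", 60000),
        ("E7", "7", 65000),
    ],
)

query = """
    SELECT DEP_ID, COUNT(*) AS NUM_EMPLOYEES, AVG(SALARY) AS AVG_SALARY
    FROM EMPLOYEES
    GROUP BY DEP_ID
    HAVING COUNT(*) < 4      -- filters groups, so the aggregate is repeated here
    ORDER BY AVG_SALARY
"""
result = conn.execute(query).fetchall()
print(result)  # [('7', 1, 65000.0), ('2', 2, 90000.0)]
```

Department "5" has 4 employees in this made-up data, so the HAVING clause drops it; a WHERE clause could not do this, because the count does not exist until the rows have been grouped.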
[Hint: Department name is in the DEPARTMENTS table. So your query will need to
retrieve data from more than one table. Don’t worry if you are not able to figure this
one out … we’ll cover working with multiple tables in the next lesson.]
In this lab you learned how to work with string patterns, sorting result sets and
grouping result sets.
Lab Solutions
3) Run the queries. The contents of the Log show whether each SQL statement ran
successfully. Here are the results for the queries:
Query 1: Output
Query 2: Output
Query 3: Output
Note that in the Query below “D” and “E” are aliases for the table names. Once you
define an alias like “D” in your query, you can simply write “D.COLUMN_NAME”
rather than the full form “DEPARTMENTS.COLUMN_NAME”.
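As a generic illustration of table aliases (not the lab's own solution query), the sketch below joins two small made-up tables using Python's sqlite3 module as a local stand-in for Db2. The column names DEPT_ID, DEP_NAME, and DEP_ID and the sample rows are assumptions for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns and made-up rows, used only for this illustration.
conn.execute("CREATE TABLE DEPARTMENTS (DEPT_ID TEXT PRIMARY KEY, DEP_NAME TEXT)")
conn.execute("CREATE TABLE EMPLOYEES (EMP_ID TEXT, DEP_ID TEXT)")
conn.executemany(
    "INSERT INTO DEPARTMENTS VALUES (?, ?)",
    [("2", "Architect Group"), ("5", "Software Group")],
)
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [("E1", "2"), ("E2", "5"), ("E3", "5")],
)

# "E" and "D" are aliases, so E.DEP_ID stands for EMPLOYEES.DEP_ID
# and D.DEP_NAME stands for DEPARTMENTS.DEP_NAME.
rows = conn.execute(
    """
    SELECT D.DEP_NAME, COUNT(*) AS NUM_EMPLOYEES
    FROM EMPLOYEES E
    JOIN DEPARTMENTS D ON E.DEP_ID = D.DEPT_ID
    GROUP BY D.DEP_NAME
    ORDER BY D.DEP_NAME
    """
).fetchall()
print(rows)  # [('Architect Group', 1), ('Software Group', 2)]
```

Aliases also remove ambiguity when both tables contain a column with the same name, since each reference states which table it comes from.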