Overview
In this tutorial, learn about Structured Query Language (SQL), including:
- Using basic SQL commands
- Performing basic data manipulation
This tutorial is a brief introduction to the SQL concepts that you need to know for the LPI 102 exam.
Databases and SQL
So far in this series of tutorials, you used flat text files to store data. Flat text files can be suitable for relatively small amounts of data, but they do not lend themselves well to storing large amounts of data or to querying that data. Over the years, several kinds of databases were developed for that purpose, including hierarchical and network databases, but the most common is now the relational database. The relational database is based on the ideas of E. F. Codd, who worked for IBM® and published the seminal paper “A Relational Model of Data for Large Shared Data Banks” in 1970. Several relational database systems exist today, including commercial products such as IBM DB2®, IBM Informix®, and Oracle Database, and open source projects such as MySQL, PostgreSQL SQLite, and MariaDB (a fork of MySQL). Relational databases use SQL as a data definition and query language.
This tutorial helps you prepare for Objective 105.3 in Topic 105 of the Linux Server Professional (LPIC-1) exam 102. The objective has a weight of 2.
Prerequisites
To get the most from the tutorials in this series, you need a basic knowledge of Linux and a working Linux system on which you can practice the commands covered in this tutorial. You should be familiar with GNU and UNIX® commands. Sometimes, different versions of a program format output differently, so your results might not always look exactly like the listings shown here.
For this tutorial, you need a database, such as MariaDB, and the documentation for your database.
The SQL examples in this tutorial are largely distribution and database independent. Most use the open source MariaDB version 10.0.21 on Fedora 23 with a 4.2.6 kernel. I also include a few examples that use IBM DB2 Express-C version 10.5, a no-charge community edition of the IBM DB2 data server, on CentOS 6.7 with a 2.6.32 kernel. By comparing these examples, you can get an idea of the kinds of differences that you might encounter when you work with multiple database systems. If you need to implement portable database programming or scripts, you need to learn more about the ISO/ANSI SQL standards than I can cover here.
Some database-manipulation commands are database-specific. Small differences in SQL syntax, particularly for nonstandard extensions, also exist. Consult the documentation for the database that you are using, as necessary.
Databases, tables, columns, and rows
A relational database consists of a set of tables. Think of each row of data in the table as a record, with each column of the table corresponding to the fields in the record for the corresponding row. The data in a column is all of the same type— such as character, integer, date, or binary data (such as images). By using structured data in this way, you can construct meaningful queries such as “find all the employees who were hired after a certain date” or “find all the parcels of land that are larger than 0.25 acres.”
The data in a relational database can have a unique value in each row, such as an employee ID for an employee record, a parcel ID for a municipal land database, or a user ID for a computer system. If so, you probably create an index for that column, and you probably designate one such column as your primary key for the table. Keys and indexes help the database engine to speed up retrieval of your data.
When you install a database, you usually install a database server. You might also install a client, or you can access the server through programmed applications or the command line. The server usually runs as a daemon process, and your client or application usually connects to it over a TCP/IP connection. You must take steps to secure the installation and prevent access that you do not want to support. Database installation and security are beyond the scope of the LPI 102 exam objectives.
Many Linux distributions come with packages for MariaDB, SQLite, PostgreSQL, and MySQL. For this tutorial, I primarily use MariaDB. See “Distributions Which Include MariaDB,” or use the appropriate package-management tool for your distribution to check what is already packaged for your distribution. MariaDB is an open source fork of MySQL, so mysql
is part of many of the command names.
Getting started with MariaDB
I give you a few brief tips here to help you get running with MariaDB on a distribution that packages it. I use Fedora 23 in this tutorial. Start by installing the mariadb
and mariadb-server
packages, which pull in several other packages that are needed.
Next, run the mysql_secure_installation
command as a user with root authority. By running this command, you:
- Set up a root database user with a password
- Remove the anonymous user that is initially installed for testing
- Ensure that the root user can log in only from the local system
- Remove the test database (optional)
- Reload the privilege tables to have the preceding changes take effect immediately
The database root user is not the system root user and should not have the same password. You can rerun mysql_secure_installation
if you need to make changes.
For this tutorial, I use the employee sample database that is available as part of the test_db package from GitHub. Listing 1 shows the steps that I used to install the database on my system.
Listing 1. Installing the sample employee database
[ian@attic-f23 ~]$ unzip -q test_db-master.zip
[ian@attic-f23 ~]$ cd test_db-master
[ian@attic-f23 test_db-master]$ mysql -u root -p <employees.sql
Enter password:
INFO
CREATING DATABASE STRUCTURE
INFO
storage engine: InnoDB
INFO
LOADING departments
INFO
LOADING employees
INFO
LOADING dept_emp
INFO
LOADING dept_manager
INFO
LOADING titles
INFO
LOADING salaries
data_load_time_diff
NULL
The default InnoDB engine used in is fine for this tutorial’s purposes. See Related topics for more information on database engines.
Your first database
Now that MariaDB is installed on your system, you can start to see what you have. Listing 2 uses the mysqlshow
command to show the databases that I have installed. The -u
option specifies the database root user, and the -p
option tells mysqlshow
to prompt you for the password that you defined when you ran the mysql_secure_installation
command.
Listing 2. What databases do I have?
[ian@attic-f23 ~]$ mysqlshow -u root -p
Enter password:
+--------------------+
| Databases |
+--------------------+
| employees |
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
You can see that I have five databases: the employees
database that I just created, the test
database that I did not delete, and three other databases. Database programs usually include several databases to describe the database itself, and you see them in Listing 2.
The mysqlshow
command is an convenience tool for quickly listing information about databases, tables, and columns. MariaDB (and MySQL) include an interactive command-line interface (CLI) program called mysql
that is similar to a database shell. DB2 also has a CLI program, called db2
. As with shells such as bash, you can pass a single command to either of these database shells, or you can run an interactive session with many commands. Listing 3 shows the database information via the mysql
command with the -e
option to execute a single database command.
Listing 3. Listing MariaDB databases
[ian@attic-f23 ~]$ mysql -u root -p -e "show databases"
Enter password:
+--------------------+
| Database |
+--------------------+
| employees |
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
If you also installed DB2 Express-C, you created a user named db2inst1 (the default) to manage the database. Listing 4 shows how to use the db2
command to get corresponding information about DB2 databases.
Listing 4. Listing DB2 databases
[ian@attic4-cent ~]$ db2 list database directory
System Database Directory
Number of entries in the directory = 1
Database 1 entry:
Database alias = SAMPLE
Database name = SAMPLE
Local database directory = /home/db2inst1
Database release level = 10.00
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname =
Alternate server port number =
Exploring tables and columns
Before you can extract information from a database, you need to know what is in the database. In Listing 5, you see how to:
- Start the
mysql
interactive database shell and connect to theemployees
database using the root ID that you created - Use the
show tables
command to see what tables are in theemployees
database - Use the
describe
command to find out what columns are in theemployees
table in theemployees
database
Listing 5. Showing MariaDB table and column information
[ian@attic-f23 ~]$ mysql -u root -p employees
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.0.21-MariaDB MariaDB Server
Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [employees]> show tables;
+----------------------+
| Tables_in_employees |
+----------------------+
| current_dept_emp |
| departments |
| dept_emp |
| dept_emp_latest_date |
| dept_manager |
| employees |
| salaries |
| titles |
+----------------------+
8 rows in set (0.00 sec)
MariaDB [employees]> describe employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
Each field (or column) in the employees
table has several attributes:
- Field: The name of the column.
- Type: A data type for the column. Many data types have a maximum length. For example,
int(11)
specifies an integer that can hold 11 digits, andvarchar(16)
specifies a variable-length character string with up to 16 bytes of data. See the “” section for more on data types. - Null: Tells whether the column is allowed to have null (empty) values.
- Key: The key type if the column is a key. A primary (
PRI
) key must contain unique values that are not null. - Default: Specifies what default value is placed in the column if a record is added and no data is supplied for the column.
- Extra: Specifies extra attributes (for example,
auto_increment
, which is used to create a unique incrementing number such as a serial number).
If you need help in the shell, use the help
(or ?
) command. Listing 6 shows the help output for the describe
command.
Listing 6. Help for the MariaDB describe command
MariaDB [employees]> ? describe
Name: 'DESCRIBE'
Description:
Syntax:
{DESCRIBE | DESC} tbl_name [col_name | wild]
DESCRIBE provides information about the columns in a table. It is a
shortcut for SHOW COLUMNS FROM. These statements also display
information for views. (See [HELP SHOW COLUMNS].)
col_name can be a column name, or a string containing the SQL "%" and
"_" wildcard characters to obtain output only for the columns with
names matching the string. There is no need to enclose the string
within quotation marks unless it contains spaces or other special
characters.
MariaDB> DESCRIBE City;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
| Name | char(35) | NO | | | |
| Country | char(3) | NO | UNI | | |
| District | char(20) | YES | MUL | | |
| Population | int(11) | NO | | 0 | |
+------------+----------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
The description for SHOW COLUMNS provides more information about the
output columns (see [HELP SHOW COLUMNS]).
URL: https://mariadb.com/kb/en/library/describe/
Notice that the help shows DESCRIBE
in all-uppercase letters. In general, SQL command names are not case sensitive. Whether names for other objects — such as databases, tables, or columns — are case sensitive depends on your database program and the platform that it runs on. See Related topics for more information on names in MariaDB.
SQL data types
You saw examples of SQL data types in Show 4. Typical SQL data types fall broadly into four categories:
- String data types store fixed-length or variable-length character or binary strings and also large objects. Examples include
CHAR(8)
,VARCHAR(240)
,BINARY(12)
,VARBINARY(500)
, andBLOB(200000)
. - Numeric data types store fixed-point or floating-point numbers. Examples include
SMALLINT
(16 bits),INT
orINTEGER
(32-bits),BIGINT
(64 bits),FLOAT
(single precision floating point),DOUBLE
(double precision floating point), andDECIMAL
(a packed decimal number with a decimal point). Integer data can be signed or unsigned. - Boolean data types store
TRUE
orFALSE
values. - Date and time values store dates and times that can be used in comparisons. Examples include
DATE
andTIME
.
These examples are not exhaustive, and different database programs might expand on them. For example, DB2 supports CLOB
for character large objects and DBCLOB
for large objects that contain double-byte character data. MariaDB supports TINYINT
for a 1-byte integer and ENUM
to store enumeration data, as you saw for the gender
field in Listing 5.
Note: The use of enumeration data types in SQL is somewhat controversial. A more traditional approach uses a reference table with a foreign key. Search the Internet to learn more about the debate over enumeration data types in databases.
Selecting data from tables
You have now seen examples of databases, tables, and the types of data in the tables. The next step is to get data out of the tables. You retrieve data by using the SELECT
statement. You can select everything in the table, or you can select from particular columns. Listing 7 shows what is in the departments
table, how to select everything from it via SELECT `SELECT`, and then how to select only the department names.
Listing 7. Basic use of the SELECT command
MariaDB [employees]> describe departments;
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept_no | char(4) | NO | PRI | NULL | |
| dept_name | varchar(40) | NO | UNI | NULL | |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
MariaDB [employees]> select * from departments;
+---------+--------------------+
| dept_no | dept_name |
+---------+--------------------+
| d009 | Customer Service |
| d005 | Development |
| d002 | Finance |
| d003 | Human Resources |
| d001 | Marketing |
| d004 | Production |
| d006 | Quality Management |
| d008 | Research |
| d007 | Sales |
+---------+--------------------+
9 rows in set (0.00 sec)
MariaDB [employees]> select dept_name from departments;
+--------------------+
| dept_name |
+--------------------+
| Customer Service |
| Development |
| Finance |
| Human Resources |
| Marketing |
| Production |
| Quality Management |
| Research |
| Sales |
+--------------------+
9 rows in set (0.00 sec)
Sorting and choosing
When you select everything from the departments
table as in Listing 7, the output is sorted by the department name. If you want to sort your output data according to the values in one or more columns, you use the ORDER BYORDERBY
clause. Specify a comma-separated list of columns. You can also specify ASC
for ascending sort (the default) or DESC
for descending sort order. Listing 8 shows how to sort your output by dept_no
in descending order.
Listing 8. Sorting query output
MariaDB [employees]> select dept_name,dept_no from departments order by dept_no desc;
+--------------------+---------+
| dept_name | dept_no |
+--------------------+---------+
| Customer Service | d009 |
| Research | d008 |
| Sales | d007 |
| Quality Management | d006 |
| Development | d005 |
| Production | d004 |
| Human Resources | d003 |
| Finance | d002 |
| Marketing | d001 |
+--------------------+---------+
9 rows in set (0.00 sec)
You can choose which data to display by using the WHERE
clause. You can compare columns, or compare a column value against a single value. Such a single value — called a scalar— can be a constant (which must be enclosed in quotation marks) or the result of a scalar function, such as today’s date. Use Boolean AND
, OR
, and NOT
operators to define more-complex conditions. Listing 9 shows some examples, using the departments
table.
Listing 9. Using the WHERE clause to limit data output
MariaDB [employees]> select * from departments where dept_no > 'd007';
+---------+------------------+
| dept_no | dept_name |
+---------+------------------+
| d008 | Research |
| d009 | Customer Service |
+---------+------------------+
2 rows in set (0.00 sec)
MariaDB [employees]> select * from departments
-> WHERE dept_name = 'Customer Service' OR (
-> dept_no > 'd002' AND dept_no <= "d006"
-> );
+---------+--------------------+
| dept_no | dept_name |
+---------+--------------------+
| d009 | Customer Service |
| d005 | Development |
| d003 | Human Resources |
| d004 | Production |
| d006 | Quality Management |
+---------+--------------------+
5 rows in set (0.00 sec)
Notice in Listing 9 that the output from the more-complex WHERE
clause is not sorted. Use the ORDER
clause for your desired sort. Notice also that you can enter SQL commands on multiple lines. MariaDB gives you a ->
prompt on the second and subsequent lines. I remove these secondary prompts from subsequent examples so that you can copy and paste the command directly into your own system more easily.
You can also do pattern matching and direct comparison by using LIKE
. Most SQL databases support the following two wildcard characters:
%
matches any number of characters, including the empty string._
matches any single character.
Listing 10 shows how to find all department names that contain the string es
.
Listing 10. Using the WHERE clause with LIKE for pattern matching
MariaDB [employees]> select * from departments WHERE dept_name LIKE '%es%';
+---------+-----------------+
| dept_no | dept_name |
+---------+-----------------+
| d003 | Human Resources |
| d008 | Research |
| d007 | Sales |
+---------+-----------------+
3 rows in set (0.00 sec)
Some databases, including MariaDB, support regular-expression pattern matching. See the documentation for your database for more information on this topic.
SQL functions for columns
You have seen some simple comparison operators. SQL also has several functions that you can use, including:
COUNT
: Counts the number of returned rows.DISTINCT
: Selects only distinct values.MAX
andMIN
: Select the maximum or minimum value of a pair of values.NOW
: Returns the current date and time.DATEDIFF
: Subtracts two dates and returns the number of days between them.DAY
: Returns the day of the week corresponding to a given date.LEAST
: Finds the least of a set of values.
Many other such functions are available, so consult your documentation to learn about them.
Listing 11 shows you how to count the number of rows in two different tables.
Listing 11. Using COUNT to count the number of rows in a table
MariaDB [employees]> # How many employees?
MariaDB [employees]> select count(*) from employees;
+----------+
| count(*) |
+----------+
| 300024 |
+----------+
1 row in set (0.08 sec)
MariaDB [employees]> # How many managers?
MariaDB [employees]> select count(*) from dept_manager;
+----------+
| count(*) |
+----------+
| 24 |
+----------+
1 row in set (0.00 sec)
Listing 12 shows how to use the DISTINCT
function to see if some managers were hired on the same date as others.
MariaDB [employees]> describe dept_manager;
+-----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| dept_no | char(4) | NO | PRI | NULL | |
| from_date | date | NO | | NULL | |
| to_date | date | NO | | NULL | |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
MariaDB [employees]> select DISTINCT(from_date) from dept_manager order by from_date;
+------------+
| from_date |
+------------+
| 1985-01-01 |
| 1988-09-09 |
| 1988-10-17 |
| 1989-05-06 |
| 1989-12-17 |
| 1991-03-07 |
| 1991-04-08 |
| 1991-09-12 |
| 1991-10-01 |
| 1992-03-21 |
| 1992-04-25 |
| 1992-08-02 |
| 1992-09-08 |
| 1994-06-28 |
| 1996-01-03 |
| 1996-08-30 |
+------------+
16 rows in set (0.00 sec)
Only 16 rows are returned. You could have combined COUNT
and DISTINCT
to find this number, but you also have the actual start dates in Listing 12. You don’t yet have any idea which date or dates have multiple managers starting.
Suppose you want to find out how long a manager has been managing. You can calculate the difference between the from_date
and to_date
, but how does this work for someone who is currently managing? Tables often use a marker of some kind to indicate a current date — possibly a NULL
value, or possibly a date far in the future. The dept_manager
table uses a future date for this purpose. Someone whose to_date
is later than today’s date is still managing. Use the NOW
function to get the current time stamp, or use the CURDATE
function to get the current date only. Listing 13 shows one way to find how many days each manager has been a manager.
Listing 13. Finding how long managers have been managing
MariaDB [employees]> # Show marker for managers still managing
MariaDB [employees]> select max(to_date) from dept_manager;
+--------------+
| max(to_date) |
+--------------+
| 9999-01-01 |
+--------------+
1 row in set (0.00 sec)
MariaDB [employees]> # Calculate duration of management in days
MariaDB [employees]> select emp_no,
datediff(least(to_date, curdate()),from_date)
from dept_manager;
+--------+-----------------------------------------------+
| emp_no | datediff(least(to_date, curdate()),from_date) |
+--------+-----------------------------------------------+
| 110022 | 2464 |
| 110039 | 8869 |
| 110085 | 1811 |
| 110114 | 9522 |
| 110183 | 2636 |
| 110228 | 8697 |
| 110303 | 1347 |
| 110344 | 1423 |
| 110386 | 1489 |
| 110420 | 7074 |
| 110511 | 2671 |
| 110567 | 8662 |
| 110725 | 1586 |
| 110765 | 859 |
| 110800 | 1020 |
| 110854 | 7868 |
| 111035 | 2256 |
| 111133 | 9077 |
| 111400 | 2288 |
| 111534 | 9045 |
| 111692 | 1385 |
| 111784 | 1422 |
| 111877 | 1212 |
| 111939 | 7314 |
+--------+-----------------------------------------------+
24 rows in set (0.00 sec)
Using aliases
When you have a long expression such as datediff(least(to_date,curdate()),from_date)
, you might want to shorten the column heading or provide an alias for the expression because you plan to use it elsewhere in your query — for example, in a WHERE
clause. Listing 14 shows how to use aliases to provide shorter headings, to find out how many years all the current managers have been managing.
Listing 14. Using aliases for shorter headings
MariaDB [employees]> select emp_no AS Employee,
(datediff(least(to_date, curdate()),from_date)/365.25) as Years
from dept_manager where to_date > curdate();
+----------+---------+
| Employee | Years |
+----------+---------+
| 110039 | 24.2820 |
| 110114 | 26.0698 |
| 110228 | 23.8111 |
| 110420 | 19.3676 |
| 110567 | 23.7153 |
| 110854 | 21.5414 |
| 111133 | 24.8515 |
| 111534 | 24.7639 |
| 111939 | 20.0246 |
+----------+---------+
9 rows in set (0.00 sec)
Grouping data with GROUP BY
Sometimes you want aggregate information from a table. For example, you want to know how many employees are in each $10K salary range. You can use the GROUP BY
clause to group your data for this purpose. Listing 15 shows how.
Listing 15. Aggregating data by using GROUP BY
MariaDB [employees]> describe salaries;
+-----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| salary | int(11) | NO | | NULL | |
| from_date | date | NO | PRI | NULL | |
| to_date | date | NO | | NULL | |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
MariaDB [employees]> select salary DIV 10000 as 10K, count(*) as Number
from salaries where to_date > curdate() group by 10K;
+------+--------+
| 10K | Number |
+------+--------+
| 3 | 85 |
| 4 | 20220 |
| 5 | 44666 |
| 6 | 56236 |
| 7 | 49128 |
| 8 | 32351 |
| 9 | 19939 |
| 10 | 10611 |
| 11 | 4729 |
| 12 | 1645 |
| 13 | 421 |
| 14 | 78 |
| 15 | 15 |
+------+--------+
13 rows in set (1.42 sec)
Further selection by using HAVING
In Group 1, the values in the Number
column are computed from aggregate data. Suppose that you are only interested in salary ranges that have 5,000 or fewer employees in the range. Your first idea might be to use a WHERE
clause, but you cannot use it for intermediate result data that is computed as an aggregate. You need to use a HAVING
clause to restrict the intermediate results to a subset that has a particular condition or combination of conditions. Listing 16 shows how to find salary ranges that have 5,000 or fewer employees in the range.
Listing 16. Using the HAVING clause
MariaDB [employees]> select salary DIV 10000 as 10K, count(*) as Number
from salaries where to_date > curdate() group by 10K
HAVING Number <= 5000;
+------+--------+
| 10K | Number |
+------+--------+
| 3 | 85 |
| 11 | 4729 |
| 12 | 1645 |
| 13 | 421 |
| 14 | 78 |
| 15 | 15 |
+------+--------+
6 rows in set (1.45 sec)
Getting data from multiple tables by using JOIN
So far, the examples in this tutorial each use a single table. The salaries
table and the dept_manager
table contain the employee number but not the name or other employee information. Employee information is kept in the employees
table. Keeping data in separate tables is how SQL design eliminates (or at least reduces) both redundancy in data storage and the associated risk of data being updated in one place but not another.
When you want to extract information such as the name and gender of all current managers, you need to get this information from both the dept_manager
table and the employees
table. For this purpose, you use the JOIN
clause with a conditional expression that specifies the join condition. Most commonly, you join two tables on a single field, where the field value is the same in both tables — for example, the emp_no
field that occurs in both the dept_manager
table and the employees
table.
Listing 17 shows how to use JOIN
to find the name and gender of all current managers. (Note the use of the aliases e
and m
for the dept_manager
table and the employees
table.)
Listing 17. Using the JOIN clause for current manager information
MariaDB [employees]> SELECT e.first_name, e.last_name, e.gender
FROM employees as e JOIN dept_manager as m
ON e.emp_no = m.emp_no
WHERE m.to_date > now()
ORDER BY e.last_name;
+------------+------------+--------+
| first_name | last_name | gender |
+------------+------------+--------+
| Leon | DasSarma | F |
| Oscar | Ghazalie | M |
| Hilary | Kambil | F |
| Isamu | Legleitner | F |
| Vishwani | Minakawa | M |
| Dung | Pesch | M |
| Karsten | Sigstam | F |
| Yuchang | Weedman | M |
| Hauke | Zhang | M |
+------------+------------+--------+
9 rows in set (0.00 sec)
If you want salary instead of gender as an output column, you must join the salaries
table to the other two tables, as shown in Listing 18. (I added some parentheses, which I often find helpful in constructing complex queries.)
Listing 18. Using the JOIN clause for current manager salaries
MariaDB [employees]> SELECT e.first_name, e.last_name, s.salary
FROM (employees as e JOIN dept_manager as m ON e.emp_no = m.emp_no)
JOIN salaries as s on e.emp_no = s.emp_no
WHERE m.to_date > now() AND s.to_date > now()
ORDER BY e.last_name;
+------------+------------+--------+
| first_name | last_name | salary |
+------------+------------+--------+
| Leon | DasSarma | 74510 |
| Oscar | Ghazalie | 56654 |
| Hilary | Kambil | 79393 |
| Isamu | Legleitner | 83457 |
| Vishwani | Minakawa | 106491 |
| Dung | Pesch | 72876 |
| Karsten | Sigstam | 65400 |
| Yuchang | Weedman | 58745 |
| Hauke | Zhang | 101987 |
+------------+------------+--------+
9 rows in set (0.00 sec)
The JOIN
examples in Join 1 and Join 2 are called inner joins: They find the rows from both tables that match the join condition. The three other common kinds of joins are:
- LEFT JOIN
LEFTJOIN
finds all rows from the left table and the rows from the right table that match the join condition. - RIGHT JOIN
RIGHTJOIN
finds all rows from the right table and the rows from the left table that match the join condition. - OUTER JOIN
OUTERJOIN
combines the results of LEFT JOINLEFTJOIN
and RIGHT JOINRIGHTJOIN
. Many databases, including IBM DB2, call this a FULL OUTER JOINFULLOUTERJOIN
.
The INNER JOININNERJOIN
is the most common and is the default if no join type is specified. Listing 19 illustrates a LEFT JOINLEFTJOIN
to show selected employees and their from_date
and to_date
if they happen to be a manager. In this example, I also use a LIMIT
value to limit the output to a maximum of 15 rows.
Listing 19. Using the LEFT JOIN clause
MariaDB [employees]> SELECT e.emp_no, e.first_name, e.last_name, s.salary, m.from_date, m.to_date
FROM (employees as e LEFT JOIN dept_manager as m ON e.emp_no = m.emp_no)
JOIN salaries as s on e.emp_no = s.emp_no AND s.to_date > now()
WHERE e.last_name LIKE 'Kambi%' AND e.first_name > 'G'
ORDER BY e.last_name, e.first_name limit 15;
+--------+------------+-----------+--------+------------+------------+
| emp_no | first_name | last_name | salary | from_date | to_date |
+--------+------------+-----------+--------+------------+------------+
| 431582 | Gaurav | Kambil | 118128 | NULL | NULL |
| 252478 | Gaurav | Kambil | 69516 | NULL | NULL |
| 487991 | Gift | Kambil | 115960 | NULL | NULL |
| 204311 | Gil | Kambil | 96756 | NULL | NULL |
| 416604 | Gonzalo | Kambil | 80009 | NULL | NULL |
| 236164 | Hausi | Kambil | 66130 | NULL | NULL |
| 412003 | Hausi | Kambil | 83213 | NULL | NULL |
| 111534 | Hilary | Kambil | 79393 | 1991-04-08 | 9999-01-01 |
| 295702 | Huei | Kambil | 49498 | NULL | NULL |
| 77408 | Idoia | Kambil | 67122 | NULL | NULL |
| 271049 | Jianhao | Kambil | 58393 | NULL | NULL |
| 216820 | JiYoung | Kambil | 87541 | NULL | NULL |
| 206261 | Jongsuk | Kambil | 78396 | NULL | NULL |
| 250164 | Josyula | Kambil | 98835 | NULL | NULL |
| 289558 | Jouko | Kambil | 51393 | NULL | NULL |
+--------+------------+-----------+--------+------------+------------+
15 rows in set (0.14 sec)
Using subselects
Sometimes you are interested in only part of the data in a query, and you want to manipulate just that part. For this purpose, you can use a subselect (also called a subquery), which is essentially a SELECT
within another SELECT
. You can also use a subselect in a FROM
, WHERE
, or HAVING
clause. Sometimes the work done by a subselect can also be done by a JOIN
. With so many possibilities, I show you two examples to get you started.
The titles
table in the employees
database shows which positions an employee has held. Some employees have held multiple positions. Listing 20 shows how to find the positions held by three employees. Note the use of the IN
operator to select the results from a set and the BETWEEN
operator to select employee numbers between two values.
Listing 20. Employees having held different positions
MariaDB [employees]> describe titles;
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| title | varchar(50) | NO | PRI | NULL | |
| from_date | date | NO | PRI | NULL | |
| to_date | date | YES | | NULL | |
+-----------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
MariaDB [employees]> select * from titles where emp_no IN (10001, 10004, 499666);
+--------+--------------------+------------+------------+
| emp_no | title | from_date | to_date |
+--------+--------------------+------------+------------+
| 10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |
| 10004 | Engineer | 1986-12-01 | 1995-12-01 |
| 10004 | Senior Engineer | 1995-12-01 | 9999-01-01 |
| 499666 | Assistant Engineer | 1987-10-18 | 1994-10-18 |
| 499666 | Engineer | 1994-10-18 | 2001-10-18 |
| 499666 | Senior Engineer | 2001-10-18 | 9999-01-01 |
+--------+--------------------+------------+------------+
6 rows in set (0.00 sec)
MariaDB [employees]> select * from titles where emp_no BETWEEN 10001 AND 10004;
+--------+-----------------+------------+------------+
| emp_no | title | from_date | to_date |
+--------+-----------------+------------+------------+
| 10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |
| 10002 | Staff | 1996-08-03 | 9999-01-01 |
| 10003 | Senior Engineer | 1995-12-03 | 9999-01-01 |
| 10004 | Engineer | 1986-12-01 | 1995-12-01 |
| 10004 | Senior Engineer | 1995-12-01 | 9999-01-01 |
+--------+-----------------+------------+------------+
5 rows in set (0.01 sec)
You already know how to use GROUP BYGROUPBY
and COUNT
to find out how many positions each employee has held. What if you only want to know how many employees have held a single position, how many have held two positions, and so on? Listing 21 shows how you construct a query to find the number of positions held by each employee, then how to use this query as a subselect in a query to answer the question of how many employees have held a single position, how many have held two positions, and so on. I name the subquery subq
. The subquery returns only a single column: jobs
. The main query refers to this column by using the qualified name subq.jobs
.
Listing 21. Using a subselect to find out how many employees have held multiple positions
MariaDB [employees]> SELECT count(*) as jobs from titles group by emp_no limit 5;
+------+
| jobs |
+------+
| 1 |
| 1 |
| 1 |
| 2 |
| 2 |
+------+
5 rows in set (0.00 sec)
MariaDB [employees]> SELECT jobs as '# jobs', count(subq.jobs) as '# employees'
FROM (
SELECT count(*) as jobs from titles group by emp_no
) subq
group by jobs;
+--------+-------------+
| # jobs | # employees |
+--------+-------------+
| 1 | 159754 |
| 2 | 137256 |
| 3 | 3014 |
+--------+-------------+
3 rows in set (0.31 sec)
Now suppose that you would like to know who the top 10 salary earners are in the company. Listing 22 shows another subselect that returns this information.
Listing 22. Using a subselect to find top salary earners
MariaDB [employees]> SELECT e.emp_no, e.last_name, e.first_name,
(
SELECT MAX(salary) FROM salaries s
WHERE e.emp_no = s.emp_no
) max_sal
FROM employees e ORDER BY max_sal desc limit 10;
+--------+-----------+------------+---------+
| emp_no | last_name | first_name | max_sal |
+--------+-----------+------------+---------+
| 43624 | Pesch | Tokuyasu | 158220 |
| 254466 | Mukaidono | Honesty | 156286 |
| 47978 | Whitcomb | Xiahua | 155709 |
| 253939 | Luders | Sanjai | 155513 |
| 109334 | Alameldin | Tsutomu | 155377 |
| 80823 | Baca | Willard | 154459 |
| 493158 | Meriste | Lidong | 154376 |
| 205000 | Griswold | Charmane | 153715 |
| 266526 | Chenoweth | Weijing | 152710 |
| 237542 | Hatcliff | Weicheng | 152687 |
+--------+-----------+------------+---------+
10 rows in set (2.66 sec)
I mentioned that you can sometimes do the work of a subselect by using JOIN
. Listing 23 shows how you can find the top 10 salary earners by using a JOIN
. Note that this query executes faster than the subselect of Listing 22. In general, prefer a join over a subselect if performance matters. The performance difference depends on your database engine.
Listing 23. Using a JOIN to find top salary earners
MariaDB [employees]> SELECT e.emp_no, e.last_name, e.first_name, max(s.salary) as max_sal
FROM employees e JOIN salaries s
ON e.emp_no = s.emp_no
GROUP BY e.emp_no ORDER BY max_sal desc limit 10;
+--------+-----------+------------+---------+
| emp_no | last_name | first_name | max_sal |
+--------+-----------+------------+---------+
| 43624 | Pesch | Tokuyasu | 158220 |
| 254466 | Mukaidono | Honesty | 156286 |
| 47978 | Whitcomb | Xiahua | 155709 |
| 253939 | Luders | Sanjai | 155513 |
| 109334 | Alameldin | Tsutomu | 155377 |
| 80823 | Baca | Willard | 154459 |
| 493158 | Meriste | Lidong | 154376 |
| 205000 | Griswold | Charmane | 153715 |
| 266526 | Chenoweth | Weijing | 152710 |
| 237542 | Hatcliff | Weicheng | 152687 |
+--------+-----------+------------+---------+
10 rows in set (2.16 sec)
Interesting results with ENUM
values
In the “” section, I mentioned that enumeration types in SQL are somewhat controversial. The gender
column in the employees
table is an ENUM
with values M
and F
. These values look like characters but are actually stored internally as integers. They are sorted in an ORDER BYORDERBY
clause according to their internal numeric representation, which can give a possibly surprising result. Listing 24 shows the first five employees sorted by gender.
Listing 24. Ordering output via ENUM fields
MariaDB [employees]> SELECT * FROM (
SELECT * FROM employees LIMIT 5
) x ORDER BY x.gender;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 |
| 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 |
| 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 |
| 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 |
| 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 |
+--------+------------+------------+-----------+--------+------------+
5 rows in set (0.00 sec)
It might surprise you to see the M
values listed before the F
values. If you want the alphabetic collating order, use a CAST
or CONVERT
in the ORDER BYORDERBY
clause. For example:
ORDER BY CAST(x.gender AS CHAR)
Creating, changing, and deleting data and tables
You learned about tables and how to explore their structure. You also learned how to create SQL queries to find answers from the data in one or more tables. In the remainder of this tutorial, I show you how to create and manipulate tables and how to insert, update, and delete the data in your tables.
Creating tables
Suppose that you decide to create a reference table for employee gender, rather than use an enumeration. You decide to key off the initial of the gender, which is M
or F
in English. You also want a second column for the full word, MALE
or FEMALE
. The first step is to create the table by using CREATE TABLECREATETABLE
. The basic form of CREATE TABLECREATETABLE
provides a table name followed by a list of columns, indexes, and constraints. Listing 25 shows how to create the gender
table with two columns, a single character as the primary key, and a variable character field of up to 20 characters.
Listing 25. Creating a new table
MariaDB [employees]> CREATE TABLE gender (
code CHAR(1) NOT NULL,
gender VARCHAR(20) NOT NULL,
PRIMARY KEY (code)
);
Query OK, 0 rows affected (0.23 sec)
MariaDB [employees]> describe gender;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| code | char(1) | NO | PRI | NULL | |
| gender | varchar(20) | NO | | NULL | |
+--------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Inserting, updating, and deleting data
Now that you have an empty table, you need to populate it with data. Use the INSERT
command to load one or more rows of data. You specify a list of columns that you want to insert data into, and you then specify one or more lists of matching values. Listing 26 shows how to add the two rows that you planned.
Listing 26. Inserting data in the gender table
MariaDB [employees]> INSERT INTO gender (code, gender)
VALUES
('F', 'FEMALE'),
('M', 'MALE');
Query OK, 2 rows affected (0.03 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [employees]> select * from gender;
+------+--------+
| code | gender |
+------+--------+
| F | FEMALE |
| M | MALE |
+------+--------+
2 rows in set (0.00 sec)
Assume next that you think some employees might need to be added to the database before you know their gender, so you decide to add a third possibility for UNASSIGNED
gender. Listing 27 shows another way to insert data: by using the SET
clause.
Listing 27. Inserting a new row in the gender table
MariaDB [employees]> INSERT INTO gender SET code='U', gender='UNSIGNED';
Query OK, 1 row affected (0.04 sec)
MariaDB [employees]> select * from gender;
+------+----------+
| code | gender |
+------+----------+
| F | FEMALE |
| M | MALE |
| U | UNSIGNED |
+------+----------+
3 rows in set (0.00 sec)
The sample database I am using has only two gender values. Suppose that your company adopts a gender diversity policy and you need additional values for gender diverse employees. Use either the VALUES
option or SET
option with INSERT INTO
to insert more rows into your table.
Because of a typographical error, I inserted UNSIGNED
when I meant to insert UNASSIGNED
. Use the UPDATE
command to fix this mistake, as shown in Listing 28.
Listing 28. Updating a table value
MariaDB [employees]> UPDATE gender SET gender='UNASSIGNED' WHERE code='U';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [employees]> select * from gender;
+------+------------+
| code | gender |
+------+------------+
| F | FEMALE |
| M | MALE |
| U | UNASSIGNED |
+------+------------+
3 rows in set (0.00 sec)
Now the human resources manager tells you that the company always knows a new employee’s gender before the employee is added to the database, so you need to delete the entry for unassigned gender. Use the DELETE
command, as shown in Listing 29.
Listing 29. Deleting a table row
MariaDB [employees]> DELETE FROM gender WHERE code='U';
Query OK, 1 row affected (0.04 sec)
MariaDB [employees]> select * from gender;
+------+--------+
| code | gender |
+------+--------+
| F | FEMALE |
| M | MALE |
+------+--------+
2 rows in set (0.00 sec)
Altering tables
Sometimes you want to change a table. Some of the many things you can do include:
- Rename a column.
- Add a new column.
- Delete a column.
- Change data type of a column.
- Change constraints on a column.
Listing 30 shows how to alter the gender
table to add a new column, gender_fr
for a French version of the gender
column, load the new French data, then rename the existing gender
column to gender_en
.
Listing 30. Altering the gender table
MariaDB [employees]> ALTER TABLE gender ADD COLUMN gender_fr VARCHAR(20) AFTER gender;
Query OK, 0 rows affected (0.38 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [employees]> UPDATE gender SET gender_fr='MASCULIN' WHERE code='M';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [employees]> UPDATE gender SET gender_fr='FMININ' WHERE code='F';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [employees]> ALTER TABLE gender CHANGE COLUMN gender gender_en VARCHAR(20);
Query OK, 0 rows affected (0.38 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [employees]> select * from gender;
+------+-----------+-----------+
| code | gender_en | gender_fr |
+------+-----------+-----------+
| F | FEMALE | FÉMININ |
| M | MALE | MASCULIN |
+------+-----------+-----------+
2 rows in set (0.00 sec)
Other ways to create tables and views
You can also create a table and populate it with data from a SELECT
. Listing 31 shows a simple example of how to create a table of female employees.
Listing 31. Creating a table of female employees
MariaDB [employees]> create table female_employees
select * from employees where gender='F';
Query OK, 120051 rows affected (3.22 sec)
Records: 120051 Duplicates: 0 Warnings: 0
When you create a table this way, the new table probably will not inherit all the attributes of the source table. Compare the table descriptions in Listing 32. Notice that the female_employees
table has no key.
Listing 32. Comparing the employees and female_employees tables
MariaDB [employees]> describe employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
MariaDB [employees]> describe female_employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int(11) | NO | | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
Something else to consider if you create a table from existing data is how you plan to keep the data in the two tables synchronized in the future. In this case, you probably want a VIEW
, which enables you to use the result of the SELECT
as if it were a table. The data stays in the underlying table or tables, and only a single copy needs to be updated. Listing 33 shows how to create a view for male employees. Note the word AS
between the view name and the SELECT
.
Listing 33. Creating a view for male employees
MariaDB [employees]> CREATE VIEW male_employees AS
select * from employees where gender='M';
Query OK, 0 rows affected (0.05 sec)
MariaDB [employees]> select * from male_employees limit 5;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 |
| 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 |
| 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 |
| 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 |
| 10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15 |
+--------+------------+------------+-----------+--------+------------+
5 rows in set (0.00 sec)
If you want to know which tables in a database are really views, a command is available to help you. Different databases use different commands. In MariaDB use:
show full tables
When you create a table, you can combine your column specifications and the result of a SELECT
. For the final example of table creation, Listing 34 shows you one way to make a copy of the employees table with a CHAR(1)
field for gender
instead of the enumeration and how to set this value by using an IF
statement.
Listing 34. Making a revised copy of the employees table
MariaDB [employees]> CREATE TABLE employees_new (
emp_no int(11) NOT NULL,
birth_date date NOT NULL,
first_name varchar(14) NOT NULL,
last_name varchar(16) NOT NULL,
gender CHAR(1),
hire_date date NOT NULL,
PRIMARY KEY (emp_no)
) select
emp_no, birth_date, first_name, last_name,
IF(gender = 'M', 'M', 'F') as gender,
hire_date from employees;
Query OK, 300024 rows affected (5.92 sec)
Records: 300024 Duplicates: 0 Warnings: 0
MariaDB [employees]> describe employees_new;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | char(1) | YES | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
The creation and maintenance of foreign keys is beyond the scope of this introductory tutorial. However, you can still use a JOIN
to join the employees_new
and gender
table to perform queries such as those that I illustrate in Listing 35.
Listing 35. Joining the employees_new and gender tables
MariaDB [employees]> select e.emp_no, e.first_name, e.last_name, e.gender,
g.gender_en, g.gender_fr
from employees_new e join gender g on e.gender=g.code
limit 5;
+--------+------------+-----------+--------+-----------+-----------+
| emp_no | first_name | last_name | gender | gender_en | gender_fr |
+--------+------------+-----------+--------+-----------+-----------+
| 10001 | Georgi | Facello | M | MALE | MASCULIN |
| 10002 | Bezalel | Simmel | F | FEMALE | FÉMININ |
| 10003 | Parto | Bamford | M | MALE | MASCULIN |
| 10004 | Chirstian | Koblick | M | MALE | MASCULIN |
| 10005 | Kyoichi | Maliniak | M | MALE | MASCULIN |
+--------+------------+-----------+--------+-----------+-----------+
5 rows in set (0.00 sec)
Deleting tables and databases
I showed you how to use DELETE
to delete rows from a table. If you want to drop a column, rather than a row, you need to use ALTER TABLEALTERTABLE
. For example, if you want to delete the gender
column from the female_employees
table, use the command illustrated in Listing 36.
Listing 36. Deleting the gender column from the female_employees table
MariaDB [employees]> ALTER TABLE female_employees DROP COLUMN gender;
Query OK, 0 rows affected (4.32 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [employees]> describe female_employees;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| emp_no | int(11) | NO | | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
To drop the female_employees
table and male_employees
view, use the DROP TABLEDROPTABLE
and DROP VIEWDROPVIEW
commands shown in Listing 37.
Listing 37. Deleting the female_employees table and male_employees view
MariaDB [employees]> drop table female_employees;
Query OK, 0 rows affected (0.14 sec)
MariaDB [employees]> drop view male_employees;
Query OK, 0 rows affected (0.00 sec)
If you want to remove the entire employees
database and all of its tables, use the DROP DATABASEDROPDATABASE
command, as shown in Listing 38.
Listing 38. Deleting the entire employees database
MariaDB [employees]> drop database employees;
Query OK, 10 rows affected (0.92 sec)
This concludes your brief introduction to SQL. I have only scratched the surface of a large subject, so I encourage you to use this tutorial as a starting point for further exploration.