Introduction
When there are a lot of foreign key constraints in a database, you may find it difficult to visualize the foreign key relationships among tables. This article discusses how to write SQL queries to find the foreign key relationships in DB2 for Linux, UNIX, and Windows.
The following variations will be discussed.
- Given a foreign key parent table, return the RI (Referential Integrity) children and descendants, as well as the RI relationship paths from the parent table to these children and descendants.
- Modify the query provided to return results for all tables in the database.
Sample schema
The sample schema shown in Listing 1 will be used for the examples shown in this article.
Sample schema
set schema newton;
create table grandparent (i1 int not null primary key, i2 int, i3 int);
create table parent (i1 int not null primary key, i2 int);
create table parent2 (i1 int not null primary key, i2 int);
create table child (i1 int not null primary key, i2 int, i3 int);
create table grandchild (i1 int not null primary key, i2 int, i3 int);
alter table parent add constraint fkp1 foreign key (i2) references grandparent;
alter table parent2 add constraint fkp2 foreign key (i2) references grandparent;
alter table child add constraint fk1 foreign key (i2) references parent;
alter table child add constraint fk2 foreign key (i3) references parent2;
alter table grandchild add constraint fk3 foreign key (i2) references child;
alter table grandchild add constraint fk4 foreign key (i3) references parent2;
create table gp (i1 int not null, i2 int not null, i3 int, primary key (i1, i2));
create table p1 (i1 int not null primary key, i2 int, i3 int);
create table c11 (i1 int not null primary key, i2 int);
create table c12 (i1 int not null primary key, i2 int);
alter table p1 add constraint fkp1 foreign key (i2, i3) references gp;
alter table c11 add constraint fkc11 foreign key (i2) references p1;
alter table c12 add constraint fkc12 foreign key (i2) references p1;
alter table gp add constraint fkgp1 foreign key (i2) references c12;
create table self (i1 int not null primary key, i2 int);
alter table self add constraint fk_self foreign key (i2) references self;
How to display all of the RI constraints
In the simplest form, you can obtain a listing of all the foreign key constraints by querying the catalog view SYSCAT.REFERENCES.
SELECT * FROM SYSCAT.REFERENCES
The results can be joined with SYSCAT.KEYCOLUSE to find the foreign key columns.
To produce a comma-separated list of foreign key columns used in RI constraints, you can use the LISTAGG() aggregate function when joining SYSCAT.REFERENCES with SYSCAT.KEYCOLUSE, as shown in Listing 2.
LISTAGG() aggregate function
select substr(R.reftabschema,1,12) as P_Schema, substr(R.reftabname,1,12) as PARENT,
substr(R.tabschema,1,12) as C_Schema, substr (R.tabname,1,12) as CHILD,
substr(R.constname,1,12) as CONSTNAME,
substr(LISTAGG(C.colname,', ') WITHIN GROUP (ORDER BY C.colname),1,20) as FKCOLS
from syscat.references R, syscat.keycoluse C
where R.constname = C.constname and R.tabschema = C.tabschema and R.tabname = C.tabname
group by R.reftabschema, R.reftabname, R.tabschema, R.tabname, R.constname;
The results will look similar to what is shown in Listing 3.
LISTAGG() aggregate function output
P_SCHEMA PARENT C_SCHEMA CHILD CONSTNAME FKCOLS
------------ ------------ ------------ ------------ ------------ ----------------
NEWTON PARENT NEWTON CHILD FK1 I2
NEWTON PARENT2 NEWTON CHILD FK2 I3
NEWTON CHILD NEWTON GRANDCHILD FK3 I2
NEWTON PARENT2 NEWTON GRANDCHILD FK4 I3
NEWTON P1 NEWTON C11 FKC11 I2
NEWTON P1 NEWTON C12 FKC12 I2
NEWTON C12 NEWTON GP FKGP1 I2
NEWTON GP NEWTON P1 FKP1 I2 , I3
NEWTON GRANDPARENT NEWTON PARENT FKP1 I2
NEWTON GRANDPARENT NEWTON PARENT2 FKP2 I2
NEWTON SELF NEWTON SELF FK_SELF I2
However, in a complex database, the relationship between a parent table and its non-immediate descendants will not be obvious from the results of the simple query described previously.
Displaying all of the foreign key children and descendants of a given table
You could write a recursive query to traverse the RI relationships. However, if you are using DB2 for Linux, UNIX, and Windows, Version 9.7 or later, a hierarchical query is a better alternative to traverse the RI relationships in SYSCAT.REFERENCES recursively. See the Related topics section for more information.
Use of hierarchical queries in DB2 requires the setting of the DB2_COMPATIBILITY_VECTOR:
db2set DB2_COMPATIBILITY_VECTOR=08
db2stop
db2start
Hierarchical queries allow the specification of the START WITH and CONNECT BY clauses.
For example, the query (Query 1) shown in Listing 4 returns all RI descendants and the unique path to them from the root table NEWTON.GRANDPARENT.
Query 1
with
root_parents (root_parent_schema, root_parent_name) AS
(select * from table(values ('NEWTON', 'GRANDPARENT')))
select
substr(connect_by_root reftabname,1,11) as root,
substr (level, 1,5) as lvl,
-- substr(reftabschema,1,6) as parent_schema,
substr(reftabname,1,11) as parent,
-- substr(tabschema,1,6) as child_schema,
substr(tabname,1,10) as child,
substr(constname,1,5) as cnst,
substr(sys_connect_by_path(reftabname, '->') || '->' ||
substr(tabname,1,20),1,42) as chain
from syscat.references
start with (reftabschema,reftabname) in (select root_parent_schema,
root_parent_name from root_parents)
connect by nocycle prior tabname = reftabname
and tabschema = reftabschema;
Note: The schema name has been removed from the output of all queries for better formatting.
Query 1 will return the results shown in Listing 5.
Query 1 output
ROOT LVL PARENT CHILD CNST CHAIN
----------- ----- ----------- ---------- ----- ------------------------------------------
GRANDPARENT 1 GRANDPARENT PARENT FKP1 ->GRANDPARENT->PARENT
GRANDPARENT 2 PARENT CHILD FK1 ->GRANDPARENT->PARENT->CHILD
GRANDPARENT 3 CHILD GRANDCHILD FK3 ->GRANDPARENT->PARENT->CHILD->GRANDCHILD
GRANDPARENT 1 GRANDPARENT PARENT2 FKP2 ->GRANDPARENT->PARENT2
GRANDPARENT 2 PARENT2 CHILD FK2 ->GRANDPARENT->PARENT2->CHILD
GRANDPARENT 3 CHILD GRANDCHILD FK3 ->GRANDPARENT->PARENT2->CHILD->GRANDCHILD
GRANDPARENT 2 PARENT2 GRANDCHILD FK4 ->GRANDPARENT->PARENT2->GRANDCHILD
Note: The query will display all of the unique paths from the given root table to each of its descendant tables. If there is a diamond shape RI path, both paths will be shown in the result. As shown in the previous example, both of the following paths will be shown.
- GRANDPARENT->PARENT->CHILD->GRANDCHILD
- GRANDPARENT->PARENT2->CHILD->GRANDCHILD
The following are several advantages of using hierarchical queries over traditional recursive queries.
- The recursion is naturally handled by the START WITH and CONNECT BY clauses without the need to write a recursive query.
- The pseudocolumn LEVEL automatically returns the level of the table from root parent.
- The scalar function sys_connect_by_path() builds a string representing a path from the root to a node in hierarchical queries. In the previous example, the function is used in the CHAIN column to build the RI relationship chain from the root table.
- If there is any RI cycle, for example, GP → P1 → C12 → GP, the NOCYCLE clause allows you to direct the recursion to ignore the duplicated rows in the cycle.
Displaying the foreign key children and descendants relationship for all tables with no RI cycles in the database
If there are no RI cycles in the database, you can modify the common table expression root_parents to include all the tables with at least one child but no parent, as shown in Query 2 in Listing 6.
Query 2
with
root_parents (root_parent_schema, root_parent_name) AS
(select tabschema, tabname
from syscat.tables
where parents = 0 and children > 0)
select
substr(connect_by_root reftabname,1,11) as root,
substr (level, 1,5) as lvl,
-- substr(reftabschema,1,6) as parent_schema,
substr(reftabname,1,11) as parent,
-- substr(tabschema,1,6) as child_schema,
substr(tabname,1,10) as child,
substr(constname,1,5) as cnst,
substr(sys_connect_by_path(reftabname, '->') || '->' ||
substr(tabname,1,20),1,42) as chain
from syscat.references
start with (reftabschema,reftabname) in (select root_parent_schema,
root_parent_name from root_parents)
connect by prior tabname = reftabname and tabschema = reftabschema;
However, Query 2 will not include tables that are in an RI cycle, because such tables will have parents > 0 and children > 0.
Displaying the foreign key children and descendant relationships which may have RI cycles
With user input of root tables in RI cycles
From a business logic point of view, some tables in an RI cycle will be a better candidate as a root parent table in the RI cycle. However, it will not be easy for a database to determine which table in an RI cycle should be the root parent table.
If there are not too many RI cycles in the system, you can write a semi-automatic query to manually specify one table per each RI cycle as the root parent table.
In Query 3, shown in Listing 7, by adding a VALUES clause in the UNION ALL operations in the common table expression root_parents to manually specify the root table in each RI cycle, the query will return all the RI children and descendants in the database.
Query 3
with
root_parents (root_parent_schema, root_parent_name) AS
(select tabschema, tabname
from syscat.tables
where parents = 0 and children > 0
UNION ALL
select * from table(values ('NEWTON', 'GP'), ('NEWTON', 'SELF')))
select
substr(connect_by_root reftabname,1,11) as root,
substr (level, 1,3) as lvl,
-- substr(reftabschema,1,6) as parent_schema,
substr(reftabname,1,11) as parent,
-- substr(tabschema,1,6) as child_schema,
substr(tabname,1,10) as child,
substr(constname,1,7) as cnstnam,
substr(sys_connect_by_path(reftabname, '->') || '->' ||
substr(tabname,1,20),1,42) as chain
from syscat.references
start with (reftabschema,reftabname) in (select root_parent_schema,
root_parent_name from root_parents)
connect by NOCYCLE prior tabname = reftabname and tabschema = reftabschema;
The result will look similar to what is shown in Listing 8.
Query 3 output
ROOT LVL PARENT CHILD CNSTNAM CHAIN
----------- --- ----------- ---------- ------- ------------------------------------------
SELF 1 SELF SELF FK_SELF ->SELF->SELF
GRANDPARENT 1 GRANDPARENT PARENT FKP1 ->GRANDPARENT->PARENT
GRANDPARENT 2 PARENT CHILD FK1 ->GRANDPARENT->PARENT->CHILD
GRANDPARENT 3 CHILD GRANDCHILD FK3 ->GRANDPARENT->PARENT->CHILD->GRANDCHILD
GRANDPARENT 1 GRANDPARENT PARENT2 FKP2 ->GRANDPARENT->PARENT2
GRANDPARENT 2 PARENT2 CHILD FK2 ->GRANDPARENT->PARENT2->CHILD
GRANDPARENT 3 CHILD GRANDCHILD FK3 ->GRANDPARENT->PARENT2->CHILD->GRANDCHILD
GRANDPARENT 2 PARENT2 GRANDCHILD FK4 ->GRANDPARENT->PARENT2->GRANDCHILD
GP 1 GP P1 FKP1 ->GP->P1
GP 2 P1 C11 FKC11 ->GP->P1->C11
GP 2 P1 C12 FKC12 ->GP->P1->C12
GP 3 C12 GP FKGP1 ->GP->P1->C12->GP
Without user input of root tables in RI cycles
If you prefer full automation and do not care which table in an RI cycle is selected as the root table, as long as one representative from each RI cycle is included, the following procedure will store one representative table from each RI cycle in a temporary table called CYCLEROOTS. At the end, the procedure uses the table names stored in CYCLEROOTS, shown previously in Listing 8, to display all RI chains in the database, as shown in Listing 9.
Query 3 to display all RI chains
-- If needed, create the user temporary tablespace for the temporary table.
CREATE BUFFERPOOL BUFFERPOOL4K PAGESIZE 4K;
CREATE USER TEMPORARY TABLESPACE STMPTSP4 PAGESIZE 4K BUFFERPOOL BUFFERPOOL4K;
-- create the temporary table to store one representative from each RI cycle as root
create GLOBAL TEMPORARY TABLE SESSION.CYCLEROOTS (SCHEMANAME VARCHAR(128),
TABNAME VARCHAR(128));
--#SET TERMINATOR @
-- procedure to display RI chains in the database
CREATE PROCEDURE newton.FIND_RI_CHAINS ()
DETERMINISTIC
NO EXTERNAL ACTION
DYNAMIC RESULT SETS 1
BEGIN
DECLARE CYCLESCHEMA VARCHAR(128);
DECLARE CYCLETABLE VARCHAR(128);
DECLARE ROWS_FETCHED BIGINT;
DECLARE C_CYCLETABLES CURSOR;
-- This query will return the final result set after
-- temporary table SESSION.CYCLEROOTS has been populated.
DECLARE C_RESULTS CURSOR WITH RETURN TO CLIENT FOR
WITH
ROOT_PARENTS (ROOT_PARENT_SCHEMA, ROOT_PARENT_NAME) AS
(SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES
WHERE PARENTS = 0 AND CHILDREN > 0
UNION ALL
SELECT * FROM SESSION.CYCLEROOTS),
HIERARCHY (ROOT, LEVEL, REFTABSCHEMA, REFTABNAME, TABSCHEMA, TABNAME,
CONSTNAME, CHAIN) AS
(SELECT
CONNECT_BY_ROOT REFTABNAME,
LEVEL,
REFTABSCHEMA AS P_SCHEMA,
REFTABNAME AS PARENT,
TABSCHEMA AS C_SCHEMA,
TABNAME AS CHILD,
CONSTNAME AS CNSTNAM,
SUBSTR(SYS_CONNECT_BY_PATH(REFTABNAME, '->') || '->' ||
SUBSTR(TABNAME,1,20),1,42) AS CHAIN
FROM SYSCAT.REFERENCES
START WITH (REFTABSCHEMA,REFTABNAME) IN (SELECT ROOT_PARENT_SCHEMA,
ROOT_PARENT_NAME
FROM ROOT_PARENTS)
CONNECT BY NOCYCLE PRIOR TABNAME = REFTABNAME AND TABSCHEMA =
REFTABSCHEMA)
SELECT
SUBSTR(H.root,1,11) AS ROOT,
CAST (H.LEVEL AS CHAR(2)) as LVL,
-- SUBSTR(H.REFTABSCHEMA,1,6) as P_SCHEMA,
SUBSTR(H.REFTABNAME,1,11) as PARENT,
-- SUBSTR(H.TABSCHEMA,1,6) as C_SCHEMA,
SUBSTR(H.TABNAME,1,10) as CHILD,
SUBSTR(H.CONSTNAME,1,7) as CNSTNAM,
SUBSTR(H.CHAIN,1,42) as CHAIN
FROM HIERARCHY H;
-- initialize temporary table
DELETE FROM SESSION.CYCLEROOTS;
-- this query will return the remaining tables that are in RI cycles
SET C_CYCLETABLES = CURSOR FOR
WITH
ROOT_PARENTS (ROOT_PARENT_SCHEMA, ROOT_PARENT_NAME) AS
(SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES
WHERE PARENTS = 0 AND CHILDREN > 0
UNION ALL
SELECT * FROM SESSION.CYCLEROOTS),
HIERARCHY (ROOT, LEVEL, REFTABSCHEMA, REFTABNAME, TABSCHEMA,
TABNAME, CONSTNAME, CHAIN) AS
(SELECT CONNECT_BY_ROOT REFTABNAME AS ROOT, LEVEL,
REFTABSCHEMA AS PARENT_SCHEMA,
REFTABNAME AS PARENT,
TABSCHEMA AS CHILD_SCHEMA,
TABNAME AS CHILD,
CONSTNAME AS CONSTNAME,
SUBSTR(SYS_CONNECT_BY_PATH(REFTABNAME, '->') || '->' ||
SUBSTR(TABNAME,1,20),1,50) AS CHAIN
FROM SYSCAT.REFERENCES
START WITH (REFTABSCHEMA,REFTABNAME) IN (SELECT ROOT_PARENT_SCHEMA,
ROOT_PARENT_NAME FROM ROOT_PARENTS)
CONNECT BY NOCYCLE PRIOR TABNAME = REFTABNAME AND
TABSCHEMA = REFTABSCHEMA)
SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES
WHERE CHILDREN > 0
EXCEPT
SELECT REFTABSCHEMA, REFTABNAME FROM HIERARCHY H;
OPEN C_CYCLETABLES;
-- Just select the first table as a root table from the remaining tables that are
-- in RI cycles
FETCH C_CYCLETABLES INTO CYCLESCHEMA, CYCLETABLE;
SET ROWS_FETCHED = CURSOR_ROWCOUNT(C_CYCLETABLES);
-- Keep looping until the result set from C_CYCLETABLES is empty.
WHILE (ROWS_FETCHED > 0) DO
-- insert the select representative into temporary table SESSION.CYCLEROOTS
INSERT INTO SESSION.CYCLEROOTS VALUES (CYCLESCHEMA, CYCLETABLE);
CLOSE C_CYCLETABLES;
-- restart the cursor. The result set will be different from the
-- the previous iterations of the loop. All the tables that
-- are in the same RI cycle as the newly added representative
-- will not show up in the result set from the next iteration of the
-- loop.
OPEN C_CYCLETABLES;
FETCH FROM C_CYCLETABLES INTO CYCLESCHEMA, CYCLETABLE;
SET ROWS_FETCHED = CURSOR_ROWCOUNT(C_CYCLETABLES);
END WHILE;
CLOSE C_CYCLETABLES;
-- When the loop exits, one representative from each RI cycle will have
-- been added to the temporary table SESSION.CYCLEROOTS. We will now
-- open C_RESULTS using the populated temporary table SESSION.CYCLEROOTS.
OPEN C_RESULTS;
END@
--#SET TERMINATOR ;
CALL newton.FIND_RI_CHAINS();
DROP TABLE SESSION.CYCLEROOTS;
The result will look similar to what is shown in Listing 10.
Query 3 to display all RI chains output
Result set 1
--------------
ROOT LVL PARENT CHILD CNSTNAM CHAIN
---------- --- ----------- ---------- ------- ------------------------------------------
SELF 1 SELF SELF FK_SELF ->SELF->SELF
GRANDPARENT 1 GRANDPARENT PARENT FKP1 ->GRANDPARENT->PARENT
GRANDPARENT 2 PARENT CHILD FK1 ->GRANDPARENT->PARENT->CHILD
GRANDPARENT 3 CHILD GRANDCHILD FK3 ->GRANDPARENT->PARENT->CHILD->GRANDCHILD
GRANDPARENT 1 GRANDPARENT PARENT2 FKP2 ->GRANDPARENT->PARENT2
GRANDPARENT 2 PARENT2 CHILD FK2 ->GRANDPARENT->PARENT2->CHILD
GRANDPARENT 3 CHILD GRANDCHILD FK3 ->GRANDPARENT->PARENT2->CHILD->GRANDCHILD
GRANDPARENT 2 PARENT2 GRANDCHILD FK4 ->GRANDPARENT->PARENT2->GRANDCHILD
C12 1 C12 GP FKGP1 ->C12->GP
C12 2 GP P1 FKP1 ->C12->GP->P1
C12 3 P1 C11 FKC11 ->C12->GP->P1->C11
C12 3 P1 C12 FKC12 ->C12->GP->P1->C12
12 record(s) selected.
Return Status = 0
Note that table C12 has been selected as the representative of the RI cycle involving GP->P1->C12->GP.
Conclusion
This article showed how to write a hierarchy query to find the foreign key relationships in the database. In particular, it has given an example of the query to find the foreign key children and descendant tables of a given table. Examples of queries and procedures to find all the foreign key relationships of all tables in the database, both with and without user input of the root tables in RI cycles, have also been given.