SQL performance tuning is a never ending battle. Throughout this technote, I am going to provide some tips on how users can themselves identify problematic queries & take countermeasures to rectify it along with SQL best practices which can be adopted to write a optimized query.

Unnecessary SQL

  • Avoid unnecessary execution of SQL
  • Consider accomplishing as much as possible with a single call, rather than multiple calls

Rows Returned

  • Minimize the number of rows searched and/or returned
  • Code predicates to limit the result to only the rows needed
  • Avoid generic queries that do not have a WHERE clause

Column Selection

  • Minimize the number of columns retrieved and/or updated
  • Specify only the columns needed
  • Avoid SELECT *
  • Extra columns increases row size of the result set
  • Retrieving very few columns can encourage index-only access

Singleton SELECT vs. Cursor
If a single row is returned Singleton SELECT .. INTO outperforms a Cursor & error when more than 1 row is returned.

If multiple rows are returned Cursor requires overhead of OPEN, FETCH, and CLOSE

For Row Update:
When the selected row must be retrieved first:
Use FOR UPDATE OF clause with a CURSOR

Using a Singleton SELECT
the row can be updated by another program after the singleton SELECT but before the subsequent UPDATE, causing a possible data integrity issue

Use For Fetch Only

  • When a SELECT statement is used only for data retrieval – use FOR FETCH ONLY
  • FOR READ ONLY clause provides the same function – and is ODBC compliant
  • Enables DB2 to use ‘block fetch’
  • Monitor the performance to decide which is best for each situation

Avoid Sorting

  • >DISTINCT – always results in a sort
  • UNION – always results in a sort
  • UNION ALL – does not sort, but retains any duplicates
  • ORDER BY– May be faster if columns are indexed & use it to guarantee the sequence of the data
  • GROUP By – specify only columns that need to be grouped, may be faster if the columns are indexed & do not include extra columns in SELECT list or GROUP BY because DB2 must sort the rows.

Subselects

DB2 processes the subselect (inner select) first before the outer select. You may be able to improve performance of complex queries by coding a complex predicate in a subselect. Applying the predicate in the subselect may reduce the number of rows returned

Use Inline Views

Inline views allow the FROM clause of a SELECT statement to contain another SELECT statement. We may enhance performance of the outer select by applying predicates in the inner select. Useful when detail and aggregated data must be returned in a single query

Indexes

Create indexes for columns you frequently:

  • ORDER BY
  • GROUP BY (better than a DISTINCT)
  • SELECT DISTINCT
  • JOIN
  • Several factors determine whether the index will be used.

Avoid Data Conversion

When comparing column values to host variables – use the same Data Type & Length. When DB2 must convert data, available indexes are sometimes not used

Join Predicates

  • Response time -> determined mostly by the number of rows participating in the join.
  • Provide accurate join predicates
  • Never use a JOIN without a predicate
  • Join ON indexed columns
  • Use Joins over subqueries
  • When the results of a join must be sorted – limiting the ORDER BY to columns of a single table can avoid a sort
  • specifying columns from multiple tables causes a sort
  • Favor coding explicit INNER and LEFT OUT joins over RIGHT OUTER joins
  • EXPLAIN converts RIGHT to LEFT join

Example: Outer Join With A Local Predicate
SELECT emp.empno, emp.lastname, dept.deptname
FROM emp LEFT OUTER JOIN dept
ON emp.workdept = dept.deptno
WHERE emp.salary > 50000.00;

Works correctly but… the outer join is performed first, before any rows are filtered out.

Example: Outer Join Using An Inline View
SELECT emp.empno, emp.lastname, dept.deptname
FROM (SELECT empno, lastname
FROM emp WHERE salary > 50000.00) as e
LEFT OUTER JOIN dept
ON emp.workdept = dept.deptno

Works better… applies the inner join predicates first, reducing number of rows to be joined.

OR vs. UNION

  • OR requires Stage 2 processing
  • Consider rewriting the query as the union of 2 SELECTs, making index access possible
  • UNION ALL avoids the sort, but duplicates are included
  • Monitor and EXPLAIN the query to decide which is best

Use BETWEEN

BETWEEN is usually more efficient than <= predicate and the >= predicate

Except when comparing a host variable to 2 columns
Stage 2 : WHERE
:hostvar BETWEEN col1 and col2
Stage 1: WHERE
Col1 <= :hostvar AND col2 >= :hostvar

Use IN Instead of Like

If you know that only a certain number of values exist and can be put in a list
Use IN or BETWEEN
IN ('Value1', 'Value2', 'Value3')
or
BETWEEN :valuelow AND :valuehigh
Rather than:
LIKE 'Value_'

Use LIKE With Care

Avoid the % or the _ at the beginning because it prevents DB2 from using a matching index and may cause a scan
Use the % or the _ at the end to encourage index usage

Avoid NOT
Predicates formed using NOT are Stage 1. But they are not indexable
For Subquery – when using negation logic:
Use NOT Exists , DB2 tests non-existence instead of NOT IN ,DB2 must materialize the complete result set.

Use Exists

Use EXISTS to test for a condition and get a True or False returned by DB2 and not return any rows to the query:
SELECT col1 FROM table1
WHERE EXISTS
(SELECT 1 FROM table2
WHERE table2.col2 = table1.col1)

Code the Most Restrictive Predicate First

After the indexes, place the predicate that will eliminate the greatest number of rows first
Know your data
Race, Gender, Type of Student, Year, Term

Avoid Arithmetic in Predicates

An index is not used for a column when the column is in an arithmetic expression.
Stage 1 but not indexable

SELECT col1
FROM table1
WHERE col2 = :hostvariable + 10

Limit Scalar Function Usage

Scalar functions are not indexable .But, you can use scalar functions to offload work from the application program
Examples:DATE,SUBSTR,CHAR etc.

Other Cautions

  • Predicates that contain concatenated columns are not indexable
  • SELECT Count(*) can be expensive
  • CASE Statement – powerful but can be expensive

Filter Factors for Predicates

Filter factor is based on the number of rows that will be filtered out by the predicate

  • A ratio that estimates I/O costs
  • The lower the filter factor, the lower the cost, and in general, the more efficient the query

Filter Factor Formulas – use FIRSTKEYCARDF column from the SYSINDEXES table of the Catalog

If there are no statistics for the indexes, the default filter factors are used
The lowest default filter factor is .01:
Column BETWEEN Value1 AND Value2
Column LIKE ‘char%’

Equality predicates have a default filter factor of .04:
Column = value
Column = :hostvalue
ColumnA = ColumnB (of different tables)
Column IS NULL

Comparative Operators have a default filter factor of .33
Column <, <=, >, >= value

IN List predicates have a filter factor of .04 * (list size)
Column IN (list of values)

Not Equal predicates have a default filter factor of .96:
Column <> value
Column <> :hostvalue
ColumnA <> ColumnB (of different tables)

Not List predicates have a filter factor of 1 – (.04 * (list size))
Column NOT IN (list of values)

Other Not Predicates that have a default filter factor of .90
Column NOT BETWEEN Value1 and Value2
Column NOT IN (non-correlated subquery)
Column <> ALL (non-correlated subquery)

Column Matching

With a composite index, the column matching stops at one predicate past the last equality predicate.
See Example in the handout that uses a 4 column index.

(C1 = :hostvar1 AND C2 = :hostvar2 AND C3 = (non column expression) AND C4 > :hostvar4)
Stage 1 – Indexable with 4 matching columns

(C1 = :hostvar1 AND C2 BETWEEN :hostvar2 AND :hostvar3 AND C3 = :hostvar4)
Stage 1 – Indexable with 2 matching columns

(C1 > value1 AND C2 = :hostvar2 AND C2 IN (value1, value2, value3, value4))
Stage 1 – Indexable with 1 matching column

(C1 = :hostvar1 AND C2 LIKE 'ab%xyz_1' AND C3 NOT BETWEEN :hostvar3 AND :hostvar4 AND C4 = value1)
Indexable with C1 = :hostvar1 AND C2 LIKE 'ab%xyz_1'

Stage 1 – LIKE ‘ab%xyz_1’ AND C3 NOT BETWEEN :hostvar3 AND :hostvar4 AND C4 = value1

With two indexes: C1.C2 and C3.C4

(C1 = :hostvar1 AND C2 LIKE :hostvar2) OR
(C3 = (non column expression) AND C4 > :hostvar4)

Multiple Index Access
1 column matching of first index
2 columns matching on second index
LIKE will be Stage 2

Order of Predicate Evaluation

  1. Indexed predicates
  2. Non-indexed predicates – Stage 1 then Stage 2

Within each of the groups above, predicates are evaluated in this sequence:

  1. Equality predicates, including single element IN list predicates
  2. Range and NOT NULL predicates
  3. All other predicates
  4. If multiple predicates are of the exact same type, they are evaluated in the order in which they are coded in the predicate.

REORG & RUNSTATS

Runstats Utility

  • updates the catalog tables with information about the tables in your system
  • used by the Optimizer for determining the best access path for a SQL statement

Reorg Utility

  • reorganizes the data in your tables

1 comment on"SQL Coding Strategies and Guidelines"

  1. Thank you for valuable tips. Applied some of these in my site http://lithiumbatterywale.com and improved loading time.

Join The Discussion

Your email address will not be published. Required fields are marked *