Making DB2 case-insensitive

Archived content

Archive date: 2020-09-24

This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.

Introduction

New capabilities with DB2 9.5 fixpack 1

Locale-aware Unicode collation algorithm-based collations were introduced in DB2 9.5 fixpack 1. These collations can be tailored to ignore case and/or accents. When a database is created with a collation that ignores case, all database operations become case insensitive. It is necessary to create a new Unicode database to use these collations.

To create a database with an English collation that is case insensitive, use the following statement:

 CREATE DATABASE testdb 
 USING CODESET UTF-8 TERRITORY US 
 COLLATE USING UCA500R1_LEN_S2

Selecting from tables in this database ignores case.

 SELECT lastname FROM employee 
 ORDER BY lastname 
 
 LASTNAME 
 ----------------- 
 abrams 
 Adams 
 allison 
 Alomar 
 
 4 record(s) selected.

Case insensitivity applies to all operations in the database.

 SELECT lastname FROM employee 
 WHERE lastname LIKE 'AL%' 
 ORDER BY lastname 
 
 LASTNAME 
 ----------------- 
 allison 
 Alomar 
 
 2 record(s) selected.

I answer a lot of questions about DB2®, and one of the common ones I see is about how to make DB2 case-insensitive. The short answer is that DB2 for Linux®, UNIX®, and Windows® Version 7.2 is case sensitive: BLAIR is not the same as Blair. Case sensitivity is powerful (why else would Netscape give you the choice of 'Match case' when you do a search?), and a relational database can search more quickly if it knows that Greenland is not a match for greenland. However, in all probability, your users probably want your application to return 'MacInnis' if they submit a search on 'Macinnis'. Case insensitivity is important if you can't control the data coming into the database; basically, in any circumstance in which users enter data for themselves, such as their names and addresses, or when you are loading outside data. Allowing creation of a case-insensitive database in DB2 is an important requirement for future releases.

All samples in this article were tested with DB2 Version 7, FixPak 6 on Windows 2000. They will have the same behavior on UNIX as well.

Using functions to provide case insensitivity

However, there are things you can do to provide case-insensitive support at the database design level. All members of the DB2 family supply the following functions, which can all be used to fold lowercase data in CHAR and VARCHAR fields to uppercase:

TRANSLATE, which can also be used for substitution.
UCASE.
UPPER (an alternative syntax for UCASE).

See the SQL Reference manual for the syntax and options for these functions.

LCASE and LOWER fold uppercase data to lowercase.

Here are some examples of these functions in use:

     SELECT UPPER(lastname) FROM employee

1              
‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑
HAAS
THOMPSON
KWAN

     SELECT LCASE(lastname) FROM employee

1
‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑
haas
thompson
kwan

So, although your data may be stored in mixed case, you can present it in consistent uppercase or lowercase by using views. For example:

CREATE VIEW upperemp
         (empno, firstnme, midinit, lastname, workdept, phoneno,
             hiredeate, edlevel, birthdate, salary, bonus, comm)
AS SELECT UPPER(empno), UPPER(firstnme), UPPER(midinit), UPPER(lastname),
UPPER(workdept), UPPER(phoneno), hiredate, edlevel, birthdate,
      salary, bonus, comm
 FROM employee

If you select against the view, you can guarantee that results will be evaluated and returned in uppercase. This is important for sorting. Respecting mixed case means that lowercase a is sequenced before uppercase A, resulting in allison coming before Adams, not the way these would be ordered in an address book:

     SELECT lastname FROM employee ORDER BY lastname

LASTNAME
‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑
abrams
allison
Adams
Alomar

  4 record(s) selected.

Selecting against the view provides the data in the desired order:

     SELECT lastname FROM upperemp ORDER BY lastname

COL
‑‑‑‑‑‑‑‑‑‑
ABRAMS
ADAMS
ALLISON
ALOMAR

What about INSERT, UPDATE and DELETE?

To keep things simple, use INSERT, UPDATE and DELETE on the base table, not the view. This is mainly because the SQL Reference specifies the conditions upon which views can be deletable, updatable or insertable:

INSERTs are not allowed.
You can update columns that are not based on the UPPER function.
You can delete through the view.

With these rules, operations like these can be performed against your view:

DELETE FROM upperemp WHERE lastname='GEYER'


UPDATE upperemp SET comm=50 WHERE lastname='KWAN'

The next release of DB2 on the Linux, Windows®, and UNIX® platforms is planned to contain two features that loosen these restrictions:

A view will be insertable if it contains any column that is updatable. Columns based on expressions are omitted in the column list and the base values are set to defaults.
Second, the release is planned to have INSTEAD OF triggers, like this for INSERT, which directs the update to go to the base table, thereby allowing the use of INSERTs on the view:

CREATE TRIGGER upper_emp_ins INSTEAD OF INSERT ON upperemp
REFERENCING NEW AS new DEFAULTS NULL FOR EACH ROW MODE DB2SQL
INSERT INTO employee
VALUES(new.empno, new.firstnme, new.midinit,
       new.lastname, new.workdept, new.phoneno,
       new.hiredate, new.edlevel, new.birthdate,
       new.salary, new.bonus, new.comm);

UPDATE could be handled through this sort of trigger to allow updates of the UPPER() columns through the view:

CREATE TRIGGER upper_emp_ins INSTEAD OF UPDATE ON upperemp
REFERENCING NEW AS new DEFAULTS NULL FOR EACH ROW MODE DB2SQL
UPDATE employee
SET (empno, firstnme, midinit, lastname, workdept, phoneno,
     hiredate, edlevel, birthdate, salary, bonus, comm)
= (new.empno, new.firstnme, new.midinit,
   new.lastname, new.workdept, new.phoneno,
   new.hiredate, new.edlevel, new.birthdate,
   new.salary, new.bonus, new.comm)
      WHERE old.empno = empno;

Selecting and ordering rows

All selects, including subselects within INSERT, UPDATE and DELETE statements, should reference the view to achieve consistent and expected behavior when ordering or searching on character data. For applications with embedded SQL, you can use a function, like UPPER, to fold the search arguments in WHERE clauses:

SELECT lastname FROM upperemp WHERE lastname = UPPER('Geyer')

You can also use the UPPER function to fold host variables to uppercase:

SELECT lastname FROM upperemp WHERE lastname = UPPER(:lstname)

Using indexes with data folded to uppercase

By using case functions and views, you can store, insert, update, and insert data in mixed case, but retrieve, order and search only in uppercase. There's one more thing to consider: when you use a case function, this normally forces a table scan. Generally, table scans are more expensive than using an index. One of DB2 Version 7's new features, generated columns, can be used to store data in both mixed and uppercase, and still allow the benefit of an index when searching on the data in uppercase. You wouldn't usually store all columns in both mixed and uppercase: just the ones that are frequently used in search predicates (most likely the same columns you want to create indexes on).

Here's how you do it:

If the standard access method is a search on name, use a generated column to store the name in uppercase:
```
CREATE TABLE employee (name VARCHAR(10),
      Name_up GENERATED ALWAYS AS (UPPER(name)))
```
Now create an index over the uppercase version of the column:
```
CREATE INDEX name_ind ON employee (name_up)
```
Now, the following query can get the benefit of the index, and avoid a table scan:
```
SELECT name FROM employee WHERE UPPER(name) = 'MACINNIS'
```

One small caveat: If you create the generated column using UPPER, you must code the SELECT with the UPPER function to get the benefit of an index scan. Similarly, if you go with UCASE in the generated column, the SELECT has to use UCASE as well. It makes good sense to choose which version of the functions you want to use (LOWER and UPPER vs. UCASE and LCASE) and be consistent in using them.

Case sensitivity and object names

All database object names (tables, views, columns and so on) are stored in the catalog tables in uppercase unless the identifier is delimited. If you use a delimited name to create the identifier, the exact case of the name is stored in the catalog tables.

An identifier, such as a column name or table name, is treated as case insensitive when used in an SQL statement unless it is delimited. For example, assume that the following statements are issued:

CREATE TABLE MyTable   (id INTEGER)
CREATE TABLE "YourTable" (id INTEGER)

Two tables -- MYTABLE and YourTable -- will exist.

Now, the following two statements are equivalent:

SELECT FROM MyTable
SELECT FROM MYTABLE

However, the second statement below will fail with TABLE NOT FOUND since there is no table named YOURTABLE:

SELECT FROM "YourTable"    // executes without error
SELECT FROM YourTable      // error, table not found

Conclusion

The approaches I describe in this article give you the ability to store data as you always have, including the ability to load mixed case data into a table without cleansing it first. These approaches also give DB2 the benefit of searching on data without having to equate upper and lowercase characters; any relational database that does case insensitive search has an algorithm somewhere folding all character data to one case for comparison and sorting. While this is convenient, it is not a free operation, and you're paying for it in processing time within the database software. The approach described in this article gives you the flexibility to decide which columns should be case-insensitive.

Products

Languages

Technologies

All Events

External Resources