Diagnosing corruption when using IBM DB2

Introduction

Perhaps regarded as one of the most troublesome business issues, database corruption often takes place subtly and affects businesses adversely. In simple words, corruption can be defined as any unintentional entry in database. Corruption issues may result in the severe performance hit of the system. In some cases, it can result in frequent system crashes and can become a business-critical downtime situation. Database corruption can occur at any layer, from DB2 to the operating system and right down to the hardware layer. Hence, it is important to understand and troubleshoot it by involving all possible affected layers and collecting any diagnostic data that might be required as soon as possible, while the data is available.

In this article, you learn why a database can go offline when it encounters corruption. You also learn to analyze symptoms for corruption and differentiate between easy-to-fix and catastrophic failures. This article sheds light on corruption issues when using IBM DB2 and helps DB2 users understand and choose the best approach in dealing with such critical and high-impact issues.

This article begins with a discussion about possible sources of corruption. It then explains the following tasks:

Identifying and troubleshooting corruption: Identify and categorize corruption in the database when using DB2, with the help of sample symptom messages that appear in db2diag.log. Corruption issues can be broadly classified in five categories: data page corruption (or table corruption), index corruption, CBIT corruption, log corruption, and packed descriptor corruption.
Using db2dart and INSPECT to identify corruption: Get insight into useful DB2 commands, db2dart and INSPECT, to check database corruption.
Approaches to recover from corruption: Once a corruption issue is identified, how to approach these cases, what data to collect, and how to recover from the situation is crucial. Learn possible recovery approaches and how to choose between available options.
Preventive strategies to avoid possible corruption: Best practices are discussed.

Sources

Database corruption can occur during writing, reading, storage, transmission, or processing, which introduces unintended changes to the original data. Some of the common reasons of corruption include:

Corrupt file system is one the most common reasons for corruption in a database. Abrupt system shutdowns, power surges, file system double-mount, migrating disks, file system-level activities like checking and repairing file system (using utilities like fsck on Linux®) when the database is up and running, and using Ctrl+Alt+Delete while a file is open, viruses can all introduce unintentional changes in database.
Hardware failure.
Memory corruption.
DB2 defect.
I/O and network issues (issues in fiber adapter, switches, and more).
Incorrect application coding.
Inconsistency in the value of the page in the buffer pool (sqldPage) and the one stored in the file system.
Overwriting of disk data causes corruption.
User interference with critical configuration files of database, log files, log control files, and more can put database in inconsistent state

Having said that, corruption can be due to various reasons, and finding out exactly what caused data corruption is challenging. In most cases, it is caused by file system issues and hardware problems.

Identify and troubleshoot

For a DBMS, page is the smallest unit of data for memory allocation performed by the operating system for a program and transfer between main memory and any other auxiliary store, such as a hard drive. So when you say something in a database is corrupt, you actually mean some pages in a database are corrupt.

Panic is a method DB2 uses to cause itself to crash if there is an error condition it cannot handle gracefully. When a page corruption is detected by DB2, it halts all processing by means of a controlled crash (panic), because it can't determine database integrity. This is also to prevent further damage or loss of data.

A number of error messages are dumped in db2diag.log when DB2 encounters corruption in database. When an outage occurs and automatic first occurrence data capture (FODC) is enabled, data is collected based on symptoms. FODC data will be collected automatically on DB2 9.5 when one of the following conditions is met:

FOCD_Trap when an instance wide trap has occurred.
FODC_Panic when a DB2 engine detects an incoherence and decides not to continue.
FODC_BadPage when a bad page has been detected.
FODC_DBMarkedBad when a database has been marked bad due to an error.

It is crucial to involve OS and hardware support in order to gather required information like OS diagnostics (for example, errpt –a, snap, and fileplace outputs on AIX®), as well as any and all hardware diagnostics (state saves, error logs, etc). It is important to ensure that there is adequate disk space for critical file systems like dump space and log directories to ensure that critical events are captured in entirety.

Looking at the db2diag.log, you can confirm if panic is due to corruption or another reason. Below, you will see how to identify and categorize corruption in DB2. Following are some of the most common db2diag.log error messages that identify corruption.

Data page corruption

Data page corruption indicates corruption in actual data in a table. A number of messages are logged to the db2diag.log when data corruption is detected. These messages are necessary to identify the affected object. Listing 1 shows a sample error message in db2diag.log.

2012-02-04-03.13.05.261082-360 I3442A358          LEVEL: Error
PID     : 393470               TID  : 1           PROC : db2pfchr 0
INSTANCE: inst1                NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbReadAndReleaseBuffers,
probe:13
RETCODE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
DIA8400C A bad page was encountered.
.
.
.
2012-02-04-03.13.05.264301-360 I3801A437          LEVEL: Error
PID     : 393470               TID  : 1           PROC : db2pfchr 0
INSTANCE: inst1                NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbReadAndReleaseBuffers,
probe:13
DATA #1 : String, 158 bytes
Obj={pool:9;obj:20;type:0} State=x27 Parent={9;20}, EM=1120, PP0=1152
Page=51235 Cont=17 Offset=2848 BlkSize=15
sqlbReadAndReleaseBuffers error: num-pages=16

As the above error messages indicate, DB2 encountered a bad page for tablespace ID 9 and table ID 20. The object type filed is marked 0, which indicates data page corruption.

You can query the catalog table to determine which table has corrupt pages:

db2 "select char(tabname,20), char(tabschema,20) from
  syscat.tables where tableid=20 and tbspaceid=9"

Note that DB2 dumps bad page errors only for those pages it has tried to access. It does not necessarily mean that only those pages are corrupt. You need to explicitly check all pages in your database to find the extent of corruption, with the help of the db2dart or INSPECT commands.

Similar error messages are dumped in db2diag.log for corruption in temporary tables. If the data type field in db2diag.log error message holds a value greater than 128, it indicates corruption in the temporary table. If the object type is 3, it indicates LOB data corruption in a table.

Index corruption

An index is a database object that contains an ordered set of pointers that refer to rows in a base table. There are many flavors of index-related corruption problems, including:

Unique index duplicates with different RID(s) or the same RID(s)
Multiple index entries pointing to the same RID
Out-of-place index key (wrong index key order)
Row exists, but index keys do not exist in any or some of the indices
Index entry pointing to an empty data slot or unused data slot or RID is invalid
Incorrect previous or next index page pointers, incorrect high key or other corruptions on index pages

Following are sample error message for index corruption.

2012-01-30-01.35.50.952434+000 I29308542A2532     LEVEL: Severe
PID     : 1175792              TID  : 33926       PROC : db2sysc 0
INSTANCE: inst1                NODE : 000         DB   : SAMPLE
APPHDL  : 0-7                  APPID: *LOCAL.inst1.120130013528
AUTHID  : TP0ADM
EDUID   : 33926                EDUNAME: db2redow (TP0) 0
FUNCTION: DB2 UDB, buffer pool services, sqlb_verify_page, probe:3
MESSAGE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
DIA8400C A bad page was encountered.
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 23 bytes
Page verification error
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes
23046981
DATA #4 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 72 bytes
Obj: {pool:9;obj:11076;type:1} Parent={8;11076}

As you can see, error messages have object type:1, which indicates index page corruption. The index resides in tablespace ID 9, and the index ID is 11076. This index is on a table with tablespace ID 8, table ID 11076. You can retrieve base table name and index name by querying catalog tables.

The above snippet from db2diag.log indicates a bad page for index. Other common errors in db2diag.log that point to index corruption are SQLI_NOKEY and row not found from index function.

CBIT corruption

CBITs are a method used by DB2 to verify that a page being read into the buffer pool from disk is not a partial page or has not been changed from some form of corruption.

The basic idea behind CBITs is that a bit from each sector (512 bytes) on a page is set to the same value before writing the page. Before DB2 flushes a page to disk, the checksum is calculated and recorded on the page. When a page is read back in the buffer pool, this checksum is recalculated and checked against the stored value. If some of the bits are different, it indicates a partial page write or disk corruption.

2012-03-12-04.45.17.559235-240 I1104A2616 LEVEL: Severe
PID : 2551866          TID : 1    PROC : db2pfchr
INSTANCE: inst1           NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbVerifyCBITS, probe:1110
MESSAGE : ZRC=0x86020019=-2046689255=SQLB_CSUM "Bad Page, Checksum Error"
DIA8426C A invalid page checksum was found for page "".
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 95 bytes
CBIT verification error
bitExpected is 0, userByte is 33, sector 7 (from head of page, 0 based)
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes

Note that changes made to the page outside of DB2 (file system failures, disk failures, etc.) will not be noticed if the corruption does not touch the bytes holding the CBITs. CBIT errors ( checksum errors ) are most of the time due to hardware or OS error, either during the initial write or during the read.

Log corruption

The transaction log in DB2 is simply a record of all changes that have taken place in the database. To keep track of changes made by transactions, a method is needed to timestamp changes to data as well as to timestamp log records. In DB2, this timestamp mechanism is performed using a Log Sequence Number (LSN). In case you come across a corrupt log, you may have an error message in db2diag.log similar to:

2010-06-07-12.37.21.143998+120 I8673583A553       LEVEL: Severe
PID     : 2498668              TID  : 27358       PROC : db2sysc 56
INSTANCE: inst1              NODE : 056         DB   : SAMPLE
APPHDL  : 998-22947            APPID: *N998.inst1.100607192315
AUTHID  : LOADDSF
EDUID   : 27358                EDUNAME: db2agntp (SAMPLE) 56
FUNCTION: DB2 UDB, data protection services, sqlpgrlg, probe:291
DATA #1 : < reformatted >
Error -2028994519 when reading LSN 00000B1C261B4FD3 from log file
S0119292.LOG tellMe 1 dpsAcbFlags 1000 setSkipOutputBuf 0

2010-06-07-12.37.21.144202+120 I8674137A487       LEVEL: Severe
PID     : 2498668              TID  : 27358       PROC : db2sysc 56
INSTANCE: inst1              NODE : 056         DB   : SAMPLE
APPHDL  : 998-22947            APPID: *N998.inst1.100607192315
AUTHID  : LOADDSF
EDUID   : 27358                EDUNAME: db2agntp (SAMPLE) 56
FUNCTION: DB2 UDB, data protection services, sqlpgrlg, probe:291
DATA #1 : < preformatted >
HeadLsn 00000B1B8B996EB3, copyLookForLsn 00000B1C261B4FD3

2010-06-07-12.37.21.158065+120 I8675153A549       LEVEL: Error
PID     : 2498668              TID  : 27358       PROC : db2sysc 56
INSTANCE: inst1              NODE : 056         DB   : SAMPLE
APPHDL  : 998-22947            APPID: *N998.inst1.100607192315
AUTHID  : LOADDSF
EDUID   : 27358                EDUNAME: db2agntp (SAMPLE) 56
FUNCTION: DB2 UDB, data protection services, sqlptudo, probe:1010
RETCODE : ZRC=0x87100029=-2028994519=SQLP_BADLSN "Invalid LSN value."
DIA8538C An invalid log sequence number (LSN), the value was "".

If logs are corrupt, it can result into a severe problem during roll-forward of the database, crash recovery, HADR logs reply, etc., where you actually reply to logs. Database roll-forward enables you to maintain consistency in the database. It recovers a database by applying transactions recorded in the database log files. Roll-forward is a process invoked after a database or a tablespace backup image has been restored.

Packed descriptor corruption

A packed descriptor is a column within the system catalog tables that DB2 uses to identify the details of a database object. If it is corrupt for some reason, you see errors in db2diag.log as shown below.

2011-08-22-20.50.00.922275-300 I154161182E497      LEVEL: Severe
PID     : 14152                TID  : 184633256288 PROC : db2sysc 0
INSTANCE: inst1             NODE : 000          DB   : SAMPLE
APPHDL  : 0-64465              APPID: 161.166.44.48.27625.11082301341
AUTHID  : DTADM1
EDUID   : 606                  EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, catcache support, sqlrlc_systables_fetch_from_disk,
probe:200
MESSAGE : Corrupt PD->length in table:DTWF    .JBPM_TASKINSTANCE

2011-08-22-20.50.00.922476-300 I154161680E1588     LEVEL: Severe
PID     : 14152                TID  : 184633256288 PROC : db2sysc 0
INSTANCE: inst1             NODE : 000          DB   : SAMPLE
APPHDL  : 0-64465              APPID: 161.166.44.48.27625.11082301341
AUTHID  : DTADM1
EDUID   : 606                  EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, catcache support, sqlrlc_systables_fetch_from_disk,
probe:201
MESSAGE : Length in PD=3792, LFD length=10104, DMS length=10104
DATA #1 : String, 11 bytes
Corrupt PD:
DATA #2 : Dumped object of size 3792 bytes at offset 1386248, 53 bytes
/volumes/work/db2dump/xnawmtp5/14152.606.000.dump.bin
DATA #3 : LOB Descriptor, PD_TYPE_LOB_DESCRIPTOR, 60 bytes
SQLDX_LD: Size:60
    x0000        lfd_check                        0x49
    x0001        lfd_version                      6
    x0002        lfd_numsegs                      1
    x0003        lfd_flags                        0x00
    x0004        lfd_size                         10104
    x000C        lfd_life_lsn                     0000009860A6FF2D
    x0014        lfd_mini_numsegs                 0
    x0015        lfd_first                        4
    x0016        lfd_descsize                     60
    x0018        lfd_last_pages                   16
    x001C        lfd_last_bytes                   10104
    x0038        lfd_dir                          Regular Directory
Offsets
        lfd_dir[0]: 33888 (16K)
    Hexdump of LOB descriptor follows:
        4906 0100 0000 0000 7827 0000 0000 0098
        60A6 FF2D 0004 3C00 1000 0000 7827 0000
        0000 0000 0000 0000 0000 0000 0000 0000
        0000 0000 0000 0000 6084 0000

If you have corrupt PD, the table becomes inaccessible.

Using db2dart and INSPECT to identify corruption

db2dart is a command that can be used to verify the architectural correctness of databases and the objects within them. It can also be used to extract data from tables that might otherwise be inaccessible due to corruption.

db2dart should never be run against a database that still has active connections. db2dart accesses the data and metadata in a database by reading them directly from disk. So, if there are connections, db2dart will be unaware about pages in the buffer pool, control structures in memory, etc. and may report false errors as a result. Similarly, if you run db2dart against a database that requires crash recovery or that has not completed roll-forward recovery, you may get inconsistent results.

To display all possible options, execute the db2dart utility without any parameters. Some options of db2dart require user input, such as tablespace ID and table ID, which are prompted for if not explicitly specified in the command line.

Inspecting databases, table spaces, and tables using db2dart

The default behavior for db2dart is to inspect the entire database. Only the database name must be provided in this case. By default, db2dart will create a report file with the name databaseName.RPT. For single-partition database environments, the file is created in the current directory. For multiple-partition database environments, the file is created under a subdirectory in the diagnostic directory. The subdirectory is called DARTnnnn, where nnnn is the partition number.

If a database is large and you are only interested in one tablespace, you can use the /TS option. When using this, you must provide the tablespace ID on the command line (by specifying the /TSI parameter) or you can let db2dart prompt you for it. If you do not know the tablespace ID, you can obtain it via DB2 LIST TABLESPACES.

Similarly, a single table and its associated objects (LOBs, indices, etc.) can be inspected using the /T option. When using this, you must provide the table name or object ID and the ID of the tablespace in which the table resides. To determine the object ID and tablespace ID for a table, you can query the SYSIBM.SYSTABLES catalog table.

Dumping formatted table data using db2dart

If a tablespace or table becomes corrupt for any reason (for example, due to a bad disk or disk controller), attempts to access the table through SQL may not work. The SQL statement may fail with an error or the database may be marked bad and all connections will be dropped.

If this happens, it may be necessary to extract all the data possible so the tablespace and table can be rebuilt. In such a situation, the /DDEL option of db2dart can be used to extract the table data and place it into a delimited ASCII. Following sessions discuss these options in details.

INSPECT command

The INSPECT command is similar to the db2dart command. It allows you to check databases, tablespaces, and tables for architectural integrity by checking the pages of the database for page consistency. A significant difference between the two commands is that the database needs to be deactivated before you run db2dart, whereas INSPECT requires a database connection and can be run while there are simultaneous active connections to the database.

Unlike db2dart, the INSPECT command cannot be used to format and dump data pages, format and dump index pages, format data rows to delimited ASCII, and mark an index invalid. So, it can only inspect database or its objects online for any corruption.

Approaches to recover from corruption

Database corruption issues are sometimes not very straightforward and need IBM's support expertise to choose the best possible way out of the situation. Here, you will see how to tackle the most common types of corruption issues and look for the best possible recovery options. When corruption is detected, determining the problem source is usually of secondary importance. You would want to correct current situation first.

Recovering from data page corruption

As discussed, you need to identify corrupt pages using tools like db2dart. Looking at the db2dart report file generated, you can gauge the extent of corruption. Depending on the amount of data corrupt and complexity involved, you need to decide on the best possible recovery plan. Here are some options to recover:

If there is a backup of the database, restore the database and roll forward to the end of the logs. This is the cleanest approach if feasible, preferably if the database size is small.
You could also restore the tablespace and roll forward to the end of the logs. This can be the best option if the corruption is localized.
If you have other ways of recreating the data for the corrupt table or have a copy of the table data, drop and recreate the table. You need the DDL of the table, and if you have the data for the table, you should be able to drop the table, recreate the table using the DDL, and recreate the data through whatever means you might have.
If you do not have a valid backup image and any way to recreate the table, you can use db2dart with /ddel to salvage data from a corrupt table. Before that, you need to have the DDL of the table using db2look. You can fetch the DDL of a corrupt table using:
```
  db2look ‑d <dbname> ‑e
    ‑z <schema_name> ‑t <table_name> ‑o <output_file_name>
```
Here is an example of using db2dart with /ddel to salvage data from a corrupt table: db2dart <dbname> /ddel. This command requires four input values: a table object ID or table name, tablespace ID, page number to start with, and number of pages. The number of pages can be a specific value or a large enough value (for example, 999999999) to extract all pages in the table. Also, if a particular page in the table contains too much damage, the db2dart /DDEL may need to be executed more than once for the range of pages and omit the damaged page. The dumped delimited ASCII file is encoded in the database code page. The db2dart command does not perform code page conversions. Once you have salvages all the data from corrupt table, you can check the output file *.DEL to make sure all the data exists. Once that is done, you can drop the corrupt table and recreate it later with data extracted using db2dart. Note that db2dart /DDEL does not work against lob data.
If you have a way to recreate the corrupt tablespace, you can mark the corrupt tablespace in drop pending state using restart database. You can recreate corrupt tablespace later.
If none of the above options are feasible or give some error during extracting data, dropping the table while table recreate etc., approach IBM support for help. IBM support may help you to drop the corrupt table, initialize corrupt pages to NULL, etc., depending on the situation.

Recovering from index corruption

If there are indications of index corruption in db2diag.log and/or db2dart reports, you can mark an index invalid using db2dart and get rid of bad indices. You can recreate indices later.

db2dart has an option to mark an index invalid and make it drop pending. You can mark a corrupt index invalid with db2dart /MI. For example: `db2dart

/MI /TSI 9 /OI 11076`.

You can decide when to recreate an index by setting the INDEXREC parameter to restart, access, etc. To recreate indices when an application tries to access the index, you can update INDEXREC to access: db2 update db cfg for <dbname> INDEXREC ACCESS.

Once you have invalidated the bad index and updated INDEXREC, you can connect to the database.

If the problem is an index NOKEY error, then a number of db2dart commands are printed in the db2diag.log file. You can run these db2dart <dbname> /di commands to dump formatted index data for root-cause analysis of index corruption if required. You need to save these commands using grep on UNIX® or Find on Windows® and save them in a file. Edit the file and replace DBNAME with the database name. If the problem has been hit multiple times, there could be duplicate entries. You only need to keep the latest set of db2dart commands. To confirm the bad index using db2dart, you can issue db2dart <dbname> /t /tsi <tablespace_id> /oi <table_id> where tablespace_id and table_id are the tablespace ID and object ID of the base table on which the index is defined.

Recovering from CBIT errors

To fix CBIT corruption errors, examine the extent of the problem by running a db2dart on at least the corrupt table (better to run it against the whole database). You can decide on approaches discussed above depending on whether CBIT errors are on the data page or index. The most feasible option to recover from CBIT errors is to restore the database or tablespace (if errors are localized).

Recovering from log corruption

Log corruption is a matter of concern during log reply. You need to replay logs during database or tablespace roll-forward, log replay on HADR standby, and crash recovery. In the event of log corruption, the database may be fine and only the log may be damaged.

If you have errors due to bad logs during roll-forward, the very first thing that can be done is to check which log file DB2 is reporting error for. db2flsn can be used to return the name of the file that contains the log record identified by a specified log sequence number (LSN). So if you have 'bad_lsn' messages in db2diag.log, you can use db2flsn to find the corresponding log file.

If it is for a missing log file or a log file from an incorrect log chain, you can look for the correct log file. If roll-forward fails due to a corrupt log, you can go for point-in-time roll-forward. The specified point in time for the roll-forward operation must be equal to or later than minimum recovery time. The minimum recovery time (MRT) is the earliest point in time during a roll-forward when a database is consistent. If you cannot roll forward logs to at least MRT, you need to contact IBM support for assistance. Another option would be to restore from an offline database backup and not roll forward the logs. Transactions in logs would not be applied to the database in this case.

If you have issues in crash recovery due to a bad log file, you need to restore recent backup or contact IBM support for assistance.

In HADR terminology, the server that processes transactions is known as primary and the partner database that receives logs and replays them is referred to as standby. On HADR standby, during log reply, standby may crash due to a bad log. You can check db2diag.log on standby to figure out the bad log file and try shipping a good copy of that log file from primary. Once a good log file is in place, you can start HADR:

Run start on HADR of standby node: db2 start hadr on db <dbname> as standby.
Run start of HADRon primary node: db2 start hadr on db <dbname> as primary.

If the above attempt fails, you might need to reconfigure HADR with fresh backup from primary and restore it on standby. If you pass a copy of the bad log file to IBM DB2 support, they could examine its contents to see what was in it that might give some indication of what went wrong.

You can use db2cklog to check the validity of archive log files to determine whether the log files can be used during roll-forward recovery of a database or tablespace. Either a single archive log file or a range of archive log files can be checked. A log file that returns an error during validation by db2cklog causes the recovery operation to fail.

Recovering from packed descriptor corruption

You can use the db2cat tool to fix the corrupt the packed descriptor. You need to consult IBM support before modifying a packed descriptor. Perform the following steps to modify the packed descriptor:

export DB2SVCPW=<service_password from IBM support>
db2cat -d <dbname> -s <schema> -n <tablename> -f <raw pd output file> -o <message file>
db2cat -d <dbname> -s <schema> -n <tablename> -g <generated pd output file> -o <message file>
db2cat -d <dbname> -s <schema> -n <tablename> -r <generated pd output file> -o <message file> (use the file generated with -g as the input for the replace option "-r").
Export the data from the table (if required)
Drop the table
Recreate the table (if required)
Import the user data (if required)

You can now run the verify option of db2cat on the database again: db2cat -d <dbname> -s % -n % -v.

If you do not need the tables, you can drop them. Otherwise, you should replace the packed descriptor, extract the data, and then drop and recreate the table.

Preventive strategies to avoid possible corruption

Database corruption can be subtle and difficult to detect. So there could never be a tool developed that would be able to detect every single corruption that could possibly happen. Corruptions and procedures that risk the possibility of corruption should always be avoided.

It is essential to understand various preventive measures to avoid possible downtime and database crash. Here are some of the best practices to help to identify corruption in a database well before a system crash:

Keep track of all changes.
Be on the latest fixpack and, if possible, the latest version of DB2 and operating system (if applicable).
Regular check on file system health, network issues.
As much as possible, shut down DB2 gracefully.
Run db2dart against your database when offline to check for corruption. If you don't have the luxury of downtime required to run db2dart on the production database, restore recent production backups on to test machines and run db2dart. Alternately, you can also use INSPECT when database is online. It can work as early detection or proactive health-checking for corruption.
Have a good backup policy. Backup does not detect corruption in a page, so it is recommended to have a strong backup policy and enough backup generations.
Disk configurations like RAID help minimize data corruption by using redundant disks to back up data.
Good power backup to combat corruption due to power surges.
Track latest defects in IBM DB2 and operating system.

Conclusion

The article discussed the most common corruption issues when using IBM DB2. It has demonstrated various symptom messages of corruption issues in db2diag.log, how to identify type of corruption based on these messages, and how to troubleshoot these issues. It also described the db2dart and INSPECT commands, which are helpful in dealing with corruption issues.

Products

Languages

Technologies

All Events

External Resources