Management of large databases and tables has its own challenges. As tables grow in size, it sometimes becomes easier to manage their data in chunks or by limited ranges, rather than operate on all of the data in those large tables in their entirety. This methodology of managing data is also especially useful when chunks or ranges of new data are added or old data is removed from tables at frequent intervals (for example, roll-in and roll-out operations typical in data warehousing environments).
DB2 9 includes table (range) partitioning capabilities for easier management, improved performance, and greater scalability of large databases.
What is table partitioning
Table partitioning (sometimes referred to as range partitioning) is a data organization scheme in which table data is divided across multiple storage objects, called data partitions (not to be confused with database partitions or DPF), according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space, or a combination of both.
DB2 9 supports data partitions or data ranges based on a variety of attributes. A commonly used partitioning scheme is the date, where you may decide to clump together data in data partitions such as by year or month. You could also have numeric attributes for partitioning, for instance, records with IDs from one up to 1 million are be stored in one data partition, IDs from 1 million to 2 million in another data partition, and so on. Or for example, you could have records for customers with names starting with A-C in one data partition, D-M in the second data partition, N-Q in a third data partition, and R-Z in the last data partition.
Although you have the option of referring to data partitions by names or numbers (useful for data partition operations), they can be completely transparent to applications. That is, applications can continue to access data by specifying column and table names, and do not need to worry about which data partition(s) the data resides in.
The table partitioning functionality in DB2 for Linux®, UNIX®, and Windows® is similar to equivalent capabilities in DB2 for z/OS®, Informix® Dynamic Server, and Informix Extended Parallel Server. The DB2 for z/OS, DB2 for iSeries™, and DB2 for Linux, UNIX, and Windows use a common base syntax, but implement a different subset. DB2 for Linux, UNIX, and Windows uses an abbreviated syntax that is actually much more concise than the syntax of either of the other two products.
Table partitioning benefits
The benefits of table partitioning in DB2 9 include:
- Improved manageability for large tables
- Increased query performance through data partition elimination
- Fast online data roll-in / roll-out
- Better optimization of storage costs
- Larger table capacity
Improved manageability: DB2 9 allows the various data partitions to be administered independently. For example, you can choose to back up and restore individual data partitions instead of entire tables. This lets you break down time-consuming maintenance operations into a series of smaller operations.
Increased query performance: The DB2 optimizer is data partition aware. Therefore, during query execution, only the relevant data partitions are scanned. Eliminating the need to scan data partitions that are not impacted by the query can result in improved performance.
Figure 1. Scanning only relevant partitions
Fast roll-in / roll-out: DB2 9 allows data partitions to be easily added or removed from the table without having to take the database offline. This ability can be particularly useful in a data warehouse environment where you often need to load or delete data to run decision-support queries. For example, a typical insurance data warehouse may have three years of claims history. As each month is loaded and rolled-in into the warehouse, the oldest month can be archived and removed (rolled-out) from the active table. This method of rolling out data partitions is also more efficient as it does not need to log delete operations, which would be the case when deleting specific data ranges.
Better optimization of storage costs: Table partitioning in DB2 9 lets you integrate better with hierarchical storage models. By only using your fastest and most expensive storage hardware for only the most active data partitions, DB2 9 allows you to optimize your overall storage costs and improve performance. If most of your queries only run against the last three months of data, you have to option to assign slower and less expensive storage hardware to older data.
Larger table capacity: Without partitioning, there are limits on the maximum amount of data a storage object, and hence a table, can hold. However, by dividing the contents of the table into multiple storage objects or data partitions, each capable of supporting as much data as in a non-partitioned table, you can effectively create databases that are virtually unlimited in size.
Greater index placement flexibility: DB2 9 allows indexes for partitioned tables to be stored in their own storage objects (table spaces), as opposed to being in the same storage object as the non-partitioned table. This index placement flexibility is particularly useful for performing faster index operations (such as drop index, online index create, and index reorganization), managing table growth, and reduced I/O contention, providing more efficient concurrent access to the index data for the table.
Create and use table partitions
DB2 provides a lot of flexibility for creating partitioned tables. Say you have one year’s worth of data and you want to partition it by date, so that each quarter resides in a separate data partition. The following
create table syntax illustrates how to do this easily. You could also use the graphical DB2 Control Center for creating and managing data partitions.
Listing 1. Create table code to partition by date
CREATE TABLE orders(id INT, shipdate DATE, …) PARTITION BY RANGE(shipdate) ( STARTING '1/1/2006' ENDING '12/31/2006' EVERY 3 MONTHS )
This results in a table being created with four data partitions, each with three months of data.
Figure 2. Table with 4 data partitions
As the following example illustrates, it is also possible to specify the data partition ranges explicitly, have open ended ranges, and give names to each data partition. Naming data partitions is useful for performing data partition operations such as
DETACH (more on this to follow).
Listing 2. Create table code to specify data partition ranges explicitly
CREATE TABLE orders(id INT, shipdate DATE, …) PARTITION BY RANGE(shipdate) ( PARTITION q4_05 STARTING MINVALUE, PARTITION q1_06 STARTING '1/1/2006', PARTITION q2_06 STARTING '4/1/2006', PARTITION q3_06 STARTING '7/1/2006', PARTITION q4_06 STARTING '10/1/2006' ENDING ‘12/31/2006' )
With table partitioning, you can easily roll-in and roll-out data instantaneously. The
DETACH options in the
ALTER TABLE command facilitate these operations. Using
DETACH, an existing data partition or range of values is split off into a standalone table. The rolled-out (detached) table can be dropped, archived, moved to slower storage, and so on.
Now say you want to keep only one year’s (2006) data in the orders table created in the last example. You can
DETACH the partition containing the old (pre-2006) data.
Listing 3. Coding detach in the partition
ALTER TABLE orders DETACH PARTITION qold INTO oldorders
Similarly, attaching (rolling-in) data as a new data partition in a table is just as easy. The data that is to be rolled-in is first loaded into a separate (staging) table, transformed or cleansed if needed, and then attached to an existing table as shown in the following example.
Listing 4. Attaching data as a new data partition
// CREATE TABLE neworders // load / insert desired data into neworders // transform or cleanse new data if neeeded ALTER TABLE orders ATTACH PARTITION q1_07 STARTING '01/01/2007' ENDING '03/31/2007' FROM TABLE neworders // COMMIT // SET INTEGRITY … // COMMIT
Before the rolled-in data is visible to applications, the
SET INTEGRITY statement needs issued (to validate the new data and perform maintenance for global indexes) and the work commited.
Figure 3 shows the rolled-in and rolled-out data partitions in the orders table used in the previous examples.
Figure 3. Rolled-in and rolled-out data partitions
Table partitioning and other organization schemes
Table partitioning in DB2 9 can be used in isolation or in combination with other data organization schemes. Each clause of the
CREATE TABLE statement includes an algorithm to indicate how the data should be organized. The following three clauses demonstrate the levels of data organization that can be used together in any combination:
DISTRIBUTE BY— Spreads data evenly across database partitions. Use this clause to enable intraquery parallelism and distribute the load across each database partition. This concept is known as database partitioning and is enabled using the Database Partitioning Feature (DPF) in DB2.
PARTITION BY— Groups rows with similar values of a single dimension in the same data partition. This concept is known as table partitioning.
ORGANIZE BY— Groups rows with similar values on multiple dimensions in the same table extent. This concept is known as multidimensional clustering (MDC).
This syntax allows for consistency between the clauses as well as allowing for future algorithms of data organization. Combining the
DISTRIBUTE BY and
PARTITION BY clauses of the
CREATE TABLE statement allows data to be spread across database partitions spanning multiple table spaces.
DB2 9 is the first data server to support all three methods of grouping data at the same time. This is a major innovation in improving data management and information availability.
Figure 4 illustrates all three DB2 data organization schemes being used in conjunction with each other:
Figure 4. Three DB2 data organization schemes
Table partitioning in DB2 9 provides powerful capabilities for managing large volumes easily and quickly. You are encouraged to download DB2 9 and try out this feature for yourself to see how you can benefit from this and other enhanced features in this new version of DB2.