Join us for Code @ Think 2019 | San Francisco | February 12 – 15 Register now Limited availability
By Rav Ahuja | Published May 24, 2006 - Updated December 13, 2010
Management of large databases and tables has its own challenges. As tables grow in size, it sometimes becomes easier to manage their data in chunks or by limited ranges, rather than operate on all of the data in those large tables in their entirety. This methodology of managing data is also especially useful when chunks or ranges of new data are added or old data is removed from tables at frequent intervals (for example, roll-in and roll-out operations typical in data warehousing environments).
DB2 9 includes table (range) partitioning capabilities for easier management, improved performance, and greater scalability of large databases.
Table partitioning (sometimes referred to as range partitioning) is a data organization scheme in which table data is divided across multiple storage objects, called data partitions (not to be confused with database partitions or DPF), according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space, or a combination of both.
DB2 9 supports data partitions or data ranges based on a variety of attributes. A commonly used partitioning scheme is the date, where you may decide to clump together data in data partitions such as by year or month. You could also have numeric attributes for partitioning, for instance, records with IDs from one up to 1 million are be stored in one data partition, IDs from 1 million to 2 million in another data partition, and so on. Or for example, you could have records for customers with names starting with A-C in one data partition, D-M in the second data partition, N-Q in a third data partition, and R-Z in the last data partition.
Although you have the option of referring to data partitions by names or numbers (useful for data partition operations), they can be completely transparent to applications. That is, applications can continue to access data by specifying column and table names, and do not need to worry about which data partition(s) the data resides in.
The table partitioning functionality in DB2 for Linux®, UNIX®, and Windows® is similar to equivalent capabilities in DB2 for z/OS®, Informix® Dynamic Server, and Informix Extended Parallel Server. The DB2 for z/OS, DB2 for iSeries™, and DB2 for Linux, UNIX, and Windows use a common base syntax, but implement a different subset. DB2 for Linux, UNIX, and Windows uses an abbreviated syntax that is actually much more concise than the syntax of either of the other two products.
The benefits of table partitioning in DB2 9 include:
Greater index placement flexibility
Improved manageability: DB2 9 allows the various data partitions to be administered independently. For example, you can choose to back up and restore individual data partitions instead of entire tables. This lets you break down time-consuming maintenance operations into a series of smaller operations.
Increased query performance: The DB2 optimizer is data partition aware. Therefore, during query execution, only the relevant data partitions are scanned. Eliminating the need to scan data partitions that are not impacted by the query can result in improved performance.
Fast roll-in / roll-out: DB2 9 allows data partitions to be easily added or removed from the table without having to take the database offline. This ability can be particularly useful in a data warehouse environment where you often need to load or delete data to run decision-support queries. For example, a typical insurance data warehouse may have three years of claims history. As each month is loaded and rolled-in into the warehouse, the oldest month can be archived and removed (rolled-out) from the active table. This method of rolling out data partitions is also more efficient as it does not need to log delete operations, which would be the case when deleting specific data ranges.
Better optimization of storage costs: Table partitioning in DB2 9 lets you integrate better with hierarchical storage models. By only using your fastest and most expensive storage hardware for only the most active data partitions, DB2 9 allows you to optimize your overall storage costs and improve performance. If most of your queries only run against the last three months of data, you have to option to assign slower and less expensive storage hardware to older data.
Larger table capacity: Without partitioning, there are limits on the maximum amount of data a storage object, and hence a table, can hold. However, by dividing the contents of the table into multiple storage objects or data partitions, each capable of supporting as much data as in a non-partitioned table, you can effectively create databases that are virtually unlimited in size.
Greater index placement flexibility: DB2 9 allows indexes for partitioned tables to be stored in their own storage objects (table spaces), as opposed to being in the same storage object as the non-partitioned table. This index placement flexibility is particularly useful for performing faster index operations (such as drop index, online index create, and index reorganization), managing table growth, and reduced I/O contention, providing more efficient concurrent access to the index data for the table.
“We are very pleased with some of the new features we’ve tested in DB2 9 …The enhancements to ALTER TABLE work well, improve productivity, and save time. The ability to place indexes anywhere for partitioned tables is also particularly useful when we run out of space in the designated table space.” – Ellen Reys-Klebaner, Chief Database Architect, Visa
DB2 provides a lot of flexibility for creating partitioned tables. Say you have one year’s worth of data and you want to partition it by date, so that each quarter resides in a separate data partition. The following create table syntax illustrates how to do this easily. You could also use the graphical DB2 Control Center for creating and managing data partitions.
CREATE TABLE orders(id INT, shipdate DATE, ...)
PARTITION BY RANGE(shipdate)
STARTING '1/1/2006' ENDING '12/31/2006'
EVERY 3 MONTHS
This results in a table being created with four data partitions, each with three months of data.
As the following example illustrates, it is also possible to specify the data partition ranges explicitly, have open ended ranges, and give names to each data partition. Naming data partitions is useful for performing data partition operations such as DETACH (more on this to follow).
CREATE TABLE orders(id INT, shipdate DATE, ...)
PARTITION BY RANGE(shipdate)
PARTITION q4_05 STARTING MINVALUE,
PARTITION q1_06 STARTING '1/1/2006',
PARTITION q2_06 STARTING '4/1/2006',
PARTITION q3_06 STARTING '7/1/2006',
PARTITION q4_06 STARTING '10/1/2006'
ENDING b 12/31/2006'
With table partitioning, you can easily roll-in and roll-out data instantaneously. The ATTACH and DETACH options in the ALTER TABLE command facilitate these operations. Using DETACH, an existing data partition or range of values is split off into a standalone table. The rolled-out (detached) table can be dropped, archived, moved to slower storage, and so on.
Now say you want to keep only one year’s (2006) data in the orders table created in the last example. You can DETACH the partition containing the old (pre-2006) data.
ALTER TABLE orders
DETACH PARTITION qold INTO oldorders
Similarly, attaching (rolling-in) data as a new data partition in a table is just as easy. The data that is to be rolled-in is first loaded into a separate (staging) table, transformed or cleansed if needed, and then attached to an existing table as shown in the following example.
// CREATE TABLE neworders
// load / insert desired data into neworders
// transform or cleanse new data if neeeded
ALTER TABLE orders
ATTACH PARTITION q1_07
FROM TABLE neworders
// SET INTEGRITY ...
Before the rolled-in data is visible to applications, the SET INTEGRITY statement needs issued (to validate the new data and perform maintenance for global indexes) and the work commited.
Figure 3 shows the rolled-in and rolled-out data partitions in the orders table used in the previous examples.
Table partitioning in DB2 9 can be used in isolation or in combination with other data organization schemes. Each clause of the CREATE TABLE statement includes an algorithm to indicate how the data should be organized. The following three clauses demonstrate the levels of data organization that can be used together in any combination:
This syntax allows for consistency between the clauses as well as allowing for future algorithms of data organization. Combining the DISTRIBUTE BY and PARTITION BY clauses of the CREATE TABLE statement allows data to be spread across database partitions spanning multiple table spaces.
DB2 9 is the first data server to support all three methods of grouping data at the same time. This is a major innovation in improving data management and information availability.
Figure 4 illustrates all three DB2 data organization schemes being used in conjunction with each other:
Table partitioning in DB2 9 provides powerful capabilities for managing large volumes easily and quickly. You are encouraged to download DB2 9 and try out this feature for yourself to see how you can benefit from this and other enhanced features in this new version of DB2.
Back to top