Overview
The purpose of this computation is to identify the storage consumed by new deployments on existing PDOA databases.

This has two parts:

  1. Computing the size of tables newly created as part of a deployment
  2. Computing the size of columns newly added to existing tables as part of a deployment

We will refer to the table below when computing uncompressed column sizes throughout the remainder of this paper.

Standard used for computation of uncompressed column size:

Data Type       Space Occupied (Bytes)   Example
SMALLINT        2
INT             4
BIGINT          8
DECIMAL(N,P)    floor(N/P) + 1           DECIMAL(16,3) = floor(16/3) + 1 = 6
DATE            4
TIME            3
TIMESTAMP       7
VARCHAR(N)      N + 4                    VARCHAR(100) = 100 + 4 = 104
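The sizing convention above can be captured in a short helper. This is an illustrative sketch (the function name `column_size` is my own, not part of any tooling), assuming the floor-division reading of the DECIMAL rule that the examples use:

```python
# Fixed-width types from the convention table above
FIXED_SIZES = {"SMALLINT": 2, "INT": 4, "INTEGER": 4, "BIGINT": 8,
               "DATE": 4, "TIME": 3, "TIMESTAMP": 7}

def column_size(data_type, length=0, scale=0):
    """Uncompressed size in bytes of one column, per the convention table."""
    dt = data_type.upper()
    if dt in FIXED_SIZES:
        return FIXED_SIZES[dt]
    if dt == "VARCHAR":
        return length + 4            # VARCHAR(N) -> N + 4
    if dt == "DECIMAL":
        return length // scale + 1   # DECIMAL(N,P) -> floor(N/P) + 1
    raise ValueError("unhandled data type: " + data_type)

print(column_size("DECIMAL", 16, 3))  # 6
print(column_size("VARCHAR", 100))    # 104
```

Summing `column_size` over all columns of a table gives its uncompressed row size, which is used in the computations below.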



The database used in the examples is IBM DB2 UDB.

Solution Overview

A. Elaboration of New Methodology

A.1: To compute the size of newly created tables for a new deployment

Computing the size of a table requires the average row size of the table along with the number of rows present in it. The database system catalog view SYSCAT.TABLES contains the AVGROWSIZE column, which represents the compressed row size of each table.

The formula for table size computation, along with an example, is illustrated below:

Table_Size_GB = (avgrowsize * card) / 1024 / 1024 / 1024
* avgrowsize: compressed row size of the table
* card: number of rows in the table

A.1.1 Example:

For illustration purposes, we consider the newly created ZTE table

IN_FACT.FCT_UPP_PRICE_PLAN

bash-3.2$ db2 "select avgrowsize from syscat.tables where tabschema='IN_FACT' and tabname='FCT_UPP_PRICE_PLAN' with ur"

AVGROWSIZE
----------
       205

  1 record(s) selected.

bash-3.2$ db2 "select count(*) as card from in_fact.FCT_UPP_PRICE_PLAN with ur"

CARD
-----------
   46583531

  1 record(s) selected.

As illustrated above, avgrowsize = 205 and card = 46583531.
Substituting these values into the formula:
Table_Size_GB = (205 * 46583531) / 1024 / 1024 / 1024 = 8.89 GB (approx.)
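The arithmetic above can be checked with a few lines of Python (the helper name `table_size_gb` is my own):

```python
def table_size_gb(avgrowsize, card):
    # Table_Size_GB = (avgrowsize * card) / 1024 / 1024 / 1024
    return avgrowsize * card / 1024 / 1024 / 1024

# Values from the queries above: avgrowsize = 205, card = 46583531
print(round(table_size_gb(205, 46583531), 2))  # 8.89
```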

A.2: To compute the size of newly added columns in an existing table for a new deployment

Computing the size of newly added columns after a deployment is a bit tricky. Again it requires the AVGROWSIZE column from the database system catalog, which contains the compressed row size of the existing table across all columns (including the newly added ones).

The formulas for these computations are listed below:

Compressed Size of Existing Table = (avgrowsize * card) / 1024 / 1024 / 1024
Uncompressed Size of Existing Table = (Uncompressed Row Size of Existing Table * card) / 1024 / 1024 / 1024
Uncompressed Row Size of Old Table = Uncompressed Row Size of Existing Table - Uncompressed Row Size of Newly Added Columns
Uncompressed Size of Old Table = (Uncompressed Row Size of Old Table * card) / 1024 / 1024 / 1024

* avgrowsize: compressed row size of the table, obtained from the database system catalog (SYSCAT.TABLES)
* Uncompressed Row Size of Existing Table: computed from the data types of all columns and the space each should ideally occupy
* Uncompressed Row Size of Old Table: Uncompressed Row Size of Existing Table minus the sum of the uncompressed sizes of the newly added columns (each computed from its data type)

Compressed Size of Old Table (all columns - new columns) = (Compressed Size of Existing Table (all columns) / Uncompressed Size of Existing Table (all columns)) * Uncompressed Size of Old Table (all columns - new columns)

Increase in size of the table because of the new columns = Compressed Size of Existing Table - Compressed Size of Old Table
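Taken together, these formulas reduce to a single function. The sketch below (the function name is my own) scales the old table's uncompressed size by the observed compression ratio of the existing table:

```python
def new_column_growth_gb(avgrowsize, card, uncomp_row_existing, uncomp_row_new_cols):
    """Estimated growth (GB) attributable to newly added columns."""
    gb = 1024 ** 3
    compressed_existing = avgrowsize * card / gb
    uncompressed_existing = uncomp_row_existing * card / gb
    # Old table = existing table minus the newly added columns
    uncomp_row_old = uncomp_row_existing - uncomp_row_new_cols
    uncompressed_old = uncomp_row_old * card / gb
    # Apply the existing table's compression ratio to the old-table size
    compressed_old = compressed_existing / uncompressed_existing * uncompressed_old
    return compressed_existing - compressed_old
```

For instance, `new_column_growth_gb(221, 9855559, 278, 24)` returns roughly 0.175 GB, in line with the worked example that follows.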

A.2.1 Example:

For illustration purposes, we use the existing table AGGR.AGG_ASPIRE_DIST_MART_DAILY.

Command used for extracting the data below:

db2 -x describe table aggr.AGG_ASPIRE_DIST_MART_DAILY | awk '{print $1":"$3":"$4":"$5}'

Table Showing Computation of Uncompressed Row Size of Existing Table:

Column Name Data Type Length Scale Size (Bytes)
CIRCLE_ID SMALLINT 2 0 2
DIST_DSM_ID VARCHAR 30 0 34
DIST_FTA_ID VARCHAR 30 0 34
DIST_VTOPUP_ID VARCHAR 30 0 34
EVNT_DT DATE 4 0 4
PRIM_VTOPUP_CNT INTEGER 4 0 4
PRIM_VTOPUP_AMT DECIMAL 18 6 4
SECD_VTOPUP_CNT INTEGER 4 0 4
SECD_VTOPUP_AMT DECIMAL 18 6 4
ACTV_RETL_CNT INTEGER 4 0 4
NEW_RETL_CNT INTEGER 4 0 4
ACTV_DSE_CNT INTEGER 4 0 4
DIST_STATUS VARCHAR 1 0 5
LOAD_TIMESTAMP TIMESTAMP 10 6 7
AVG_TOT_BILLCUT_PER_DSE DECIMAL 16 4 5
AVG_VALID_BILLCUT_PER_DSE DECIMAL 16 4 5
AVG_PLANNED_BILLCUT_PER_DSE DECIMAL 16 4 5
DIST_BAL_AMT DECIMAL 18 6 4
DSM_PRIM_VTOPUP_AMT DECIMAL 22 6 4
DSM_PRIM_PAPER_AMT DECIMAL 22 6 4
DSM_PRIM_ISF_AMT DECIMAL 22 6 4
DSM_PRIM_DONGLE_AMT DECIMAL 22 6 4
DIST_MSISDN VARCHAR 21 0 25
AVG_STCK_HOLD_BASE_MTD DECIMAL 16 4 5
DSM_PRIM_PAPER_QTY INTEGER 4 0 4
DSM_PRIM_ISF_QTY INTEGER 4 0 4
DSM_PRIM_DONGLE_QTY INTEGER 4 0 4
DSM_SIM_BILLING_QTY INTEGER 4 0 4
DSM_SIM_BILLING_AMT DECIMAL 18 6 4
TOL_CNT_MTD INTEGER 4 0 4
DTOL_CNT_MTD INTEGER 4 0 4
AOL_CNT_MTD INTEGER 4 0 4
FAO_CNT_MTD INTEGER 4 0 4
TAO_CNT_MTD INTEGER 4 0 4
MNP_AOL_CNT_MTD INTEGER 4 0 4
DAO INTEGER 4 0 4
DATA_FRC_OL_CNT_MTD INTEGER 4 0 4
OL_5K_WITH_50_PER_DROP_MTD INTEGER 4 0 4
OL_1ACT_WITH_50_PER_DROP_MTD INTEGER 4 0 4
COCA_PAID_AMT DECIMAL 16 4 5
Uncompressed Row Size of Existing Table = 278 Bytes


Table showing computation of uncompressed row size for the newly added columns:

Column Name Data Type Length Scale Size (Bytes)
DIST_BAL_AMOUNT DECIMAL 18 6 4
DSM_PRIM_VTOPUP_AMT DECIMAL 22 6 4
DSM_PRIM_PAPER_AMT DECIMAL 22 6 4
DSM_PRIM_ISF_AMT DECIMAL 22 6 4
DSM_PRIM_DONGLE_AMT DECIMAL 22 6 4
DSM_SIM_BILLING_AMT DECIMAL 18 6 4
Uncompressed Row Size of New Columns = 24 Bytes
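As a cross-check, the new-column total can be recomputed directly from the DECIMAL rule (floor(N/P) + 1); the column list is copied from the table above:

```python
# (name, length N, scale P) for each newly added column; all are DECIMALs
new_columns = [
    ("DIST_BAL_AMOUNT", 18, 6),
    ("DSM_PRIM_VTOPUP_AMT", 22, 6),
    ("DSM_PRIM_PAPER_AMT", 22, 6),
    ("DSM_PRIM_ISF_AMT", 22, 6),
    ("DSM_PRIM_DONGLE_AMT", 22, 6),
    ("DSM_SIM_BILLING_AMT", 18, 6),
]
# DECIMAL(N,P) occupies floor(N/P) + 1 bytes under our convention
row_size_new = sum(n // p + 1 for _, n, p in new_columns)
print(row_size_new)  # 24
```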



Uncompressed Row Size of Old Table = Uncompressed Row Size of Existing Table - Uncompressed Row Size of Newly Added Columns

Uncompressed Row Size of Old Table = 278 - 24 = 254 Bytes

bash-3.2$ db2 "select avgrowsize from syscat.tables where tabschema='AGGR' and tabname='AGG_ASPIRE_DIST_MART_DAILY' with ur"

AVGROWSIZE
----------
       221

  1 record(s) selected.

Compressed Row Size of Existing Table = 221 Bytes

bash-3.2$ db2 "select count_big(*) from aggr.AGG_ASPIRE_DIST_MART_DAILY with ur"

1
---------------------------------
                         9855559.

  1 record(s) selected.

Compressed Size of Existing Table = (avgrowsize * card) / 1024 / 1024 / 1024

Compressed Size of Existing Table = (221 * 9855559) / 1024 / 1024 / 1024 = 2.03 GB (approx.)

Uncompressed Size of Existing Table = (Uncompressed Row Size of Existing Table * card) / 1024 / 1024 / 1024

Uncompressed Size of Existing Table = (278 * 9855559) / 1024 / 1024 / 1024 = 2.55 GB (approx.)


Uncompressed Size of Old Table = (Uncompressed Row Size of Old Table * card) / 1024 / 1024 / 1024

Uncompressed Size of Old Table = (254 * 9855559) / 1024 / 1024 / 1024 = 2.33 GB (approx.)

Compressed Size of Old Table (all columns - new columns) = (Compressed Size of Existing Table (all columns) / Uncompressed Size of Existing Table (all columns)) * Uncompressed Size of Old Table (all columns - new columns)

Compressed Size of Old Table = (2.03 / 2.55) * 2.33 GB = 1.86 GB (approx.)

Hence, the increase in size due to the addition of the new columns = (2.03 - 1.86) = 0.17 GB.
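Running the whole example end to end without intermediate rounding gives a quick sanity check (the unrounded figures come out near 1.85 GB and 0.175 GB; the 1.86 GB and 0.17 GB above differ slightly because the two-decimal intermediates were rounded):

```python
gb = 1024 ** 3
card = 9855559  # row count of AGGR.AGG_ASPIRE_DIST_MART_DAILY

compressed_existing = 221 * card / gb        # ~2.03 GB
uncompressed_existing = 278 * card / gb      # ~2.55 GB
uncompressed_old = (278 - 24) * card / gb    # ~2.33 GB

# Scale the old table's uncompressed size by the compression ratio
compressed_old = compressed_existing / uncompressed_existing * uncompressed_old
increase = compressed_existing - compressed_old
print(round(compressed_old, 3), round(increase, 3))
```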
