With Big SQL v4.2, you can now create your own user-defined aggregate (UDA) functions in cases where a built-in aggregate function does not exist. Defining your own UDA consists of defining 4 component routines (INITIALIZE, ACCUMULATE, MERGE, and FINALIZE) along with an aggregate interface function.

The aggregate interface function is the entry point to your UDA and you define it by specifying the names of the component routines, the return data type of the UDA, and state variables. The state variables are used to store intermediate values during the processing of the aggregation.

The INITIALIZE component routine has the state variables as output parameters. It is responsible for initializing the state variables.

The ACCUMULATE component routine has a single row of the result set we are aggregating over as input parameters and the state variables as input/output parameters. It is responsible for accumulating the single row’s values into the state variables.

The MERGE component routine has one set of state variable values as input parameters and another set of state variable values as input/output parameters. It is responsible for merging the 2 sets of state variable values into a single set of state variable values.

The FINALIZE component routine has the final set state variable values as input parameters and the final result as the output parameter. It is responsible for computing the final result value based on the final set of state variable values.

An example of how to create a UDA to compute the sum of squares follows. The sum of squares is used is various contexts in statistics, such as variance, standard deviation, mean squared error, and residual sum of squares. The sum of squares is computed using the following formula:
formula
where n is the number of values.

The DDL for the 4 component routines and the aggregate interface function are shown below:

CREATE OR REPLACE PROCEDURE sumsq_initialize(OUT sum DOUBLE, OUT sumofsquare DOUBLE, OUT count BIGINT)
LANGUAGE C
SPECIFIC sumsq_initialize
PARAMETER STYLE SQL
NOT FENCED
NO SQL
EXTERNAL NAME 'mylib!sumsq_initialize'

CREATE OR REPLACE PROCEDURE sumsq_accumulate(IN input DOUBLE, INOUT sum DOUBLE, INOUT sumofsquare DOUBLE, INOUT count BIGINT)
LANGUAGE C
SPECIFIC sumsq_accumulate
PARAMETER STYLE SQL
NOT FENCED
NO SQL
EXTERNAL NAME 'mylib!sumsq_accumulate';

CREATE OR REPLACE PROCEDURE sumsq_merge(IN sum DOUBLE, IN sumofsquare DOUBLE, IN count BIGINT, INOUT mergesum DOUBLE, INOUT mergesumofsquare DOUBLE, INOUT mergecount BIGINT)
LANGUAGE C
SPECIFIC sumsq_merge
PARAMETER STYLE SQL
NOT FENCED
NO SQL
EXTERNAL NAME 'mylib!sumsq_merge';

CREATE OR REPLACE FUNCTION sumsq_finalize(sum DOUBLE, sumofsquare DOUBLE, count BIGINT)
RETURNS DOUBLE
LANGUAGE C
SPECIFIC sumsq_finalize
PARAMETER STYLE SQL
NOT FENCED
NO SQL
CALLED ON NULL INPUT
EXTERNAL NAME 'mylib!sumsq_finalize';

CREATE OR REPLACE FUNCTION sumsq(DOUBLE)
RETURNS DOUBLE
SPECIFIC sumsq
AGGREGATE WITH (sum DOUBLE, sumofsquare DOUBLE, count BIGINT)
USING
INITIALIZE PROCEDURE sumsq_initialize
ACCUMULATE PROCEDURE sumsq_accumulate
MERGE PROCEDURE sumsq_merge
FINALIZE FUNCTION sumsq_finalize;

The state variable is declared with the AGGREGATE WITH clause.

The following is the C code implementation of the component routines. In this example, the C code is in a file named mylib.c.

#include <memory.h>
#include <sql.h>
#include <sqlca.h>
#include <sqlda.h>
#include <sqludf.h>
#include <string.h>

#ifdef __cplusplus
extern "C"
#endif
SQL_API_RC SQL_API_FN sumsq_initialize
(
  SQLUDF_DOUBLE *outSum,
  SQLUDF_DOUBLE *outSumofsquare,
  SQLUDF_BIGINT *outCount,
  SQLUDF_NULLIND *outSumNullInd,
  SQLUDF_NULLIND *outSumofsquareNullInd,
  SQLUDF_NULLIND *outCountNullInd,
  SQLUDF_TRAIL_ARGS )
{
  /* Initialize state variables */
  *outSum = 0.0;
  *outSumofsquare = 0.0;
  *outCount = 0;

  /* Set each state variables' null indicator to not null */
  *outSumNullInd = 0;
  *outSumofsquareNullInd = 0;
  *outCountNullInd = 0;

  return 0;
}

#ifdef __cplusplus
extern "C"
#endif
SQL_API_RC SQL_API_FN sumsq_accumulate
(
  SQLUDF_DOUBLE *inInput,
  SQLUDF_DOUBLE *inoutSum,
  SQLUDF_DOUBLE *inoutSumofsquare,
  SQLUDF_BIGINT *inoutCount,
  SQLUDF_NULLIND *inInputNullInd,
  SQLUDF_NULLIND *inoutSumNullInd,
  SQLUDF_NULLIND *inoutSumofsquareNullInd,
  SQLUDF_NULLIND *inoutCountNullInd,
  SQLUDF_TRAIL_ARGS )
{
  if (SQLUDF_NULL(inInputNullInd))
  {
    return 0;
  }

  *inoutSum += *inInput;
  *inoutSumofsquare += (*inInput * *inInput);
  ++(*inoutCount);
  return 0;
}


#ifdef __cplusplus
extern "C"
#endif
SQL_API_RC SQL_API_FN sumsq_merge
(
  SQLUDF_DOUBLE *inSum,
  SQLUDF_DOUBLE *inSumofsquare,
  SQLUDF_BIGINT *inCount,
  SQLUDF_DOUBLE *inoutSum,
  SQLUDF_DOUBLE *inoutSumofsquare,
  SQLUDF_BIGINT *inoutCount,
  SQLUDF_NULLIND *inSumNullInd,
  SQLUDF_NULLIND *inSumofsquareNullInd,
  SQLUDF_NULLIND *inCountNullInd,
  SQLUDF_NULLIND *inoutSumNullInd,
  SQLUDF_NULLIND *inoutSumofsquareNullInd,
  SQLUDF_NULLIND *inoutCountNullInd,
  SQLUDF_TRAIL_ARGS )
{
  *inoutSum += *inSum;
  *inoutSumofsquare += *inSumofsquare;
  *inoutCount += *inCount;
  return 0;
}

#ifdef __cplusplus
extern "C"
#endif
SQL_API_RC SQL_API_FN sumsq_finalize
(
  SQLUDF_DOUBLE *inSum,
  SQLUDF_DOUBLE *inSumofsquare,
  SQLUDF_BIGINT *inCount,
  SQLUDF_DOUBLE *outResult,
  SQLUDF_NULLIND *inSumNullInd,
  SQLUDF_NULLIND *inSumofsquareNullInd,
  SQLUDF_NULLIND *inCountNullInd,
  SQLUDF_NULLIND *outResultNullInd,
  SQLUDF_TRAIL_ARGS )
{
  *outResultNullInd = 0;
  *outResult = *inSumofsquare - (*inSum * *inSum / *inCount);
  return (0);
}

It is compiled into a binary library using the following commands:

gcc -m64 -fpic -I/home/bigsql/sqllib/include -c mylib.c -D_REENTRANT
gcc -m64 -fpic -shared -o mylib mylib.o -Wl,-rpath,/home/bigsql/sqllib/lib64 -L/home/bigsql/sqllib/lib64 -ldb2 -lpthread

You need to place the binary library mylib into the directory /home/bigsql/sqllib/function on all nodes. You can also use Java to implement the component routines, but in general C routines will perform better.

Here is a simple query showing the UDA in action:

select * from tab1

C1                      
------------------------
  +1.00000000000000E+000
  +2.00000000000000E+000
  +3.00000000000000E+000
  +4.00000000000000E+000

  4 record(s) selected.


select sumsq(c1) from tab1

1                       
------------------------
  +5.00000000000000E+000

  1 record(s) selected.

The following figure shows the roles that each of the component routines plays toward computing the final result.
uda_blog_drawing

Join The Discussion

Your email address will not be published. Required fields are marked *