User Defined Functions (UDFs) are very commonly used to encapsulate business logic for data processing at more granular level. You can write UDFs to perform custom calculations, data extraction, data manipulation, and other transformations meant to be repeatedly used.

You can use UDFs in Big SQL as part of the select list of a query or in where clause to filter data. They are allowed everywhere a SELECT statement is used like in create …as select… or insert…select statements to transform data from one form into some other table.

[code language=”sql”] select area(radius) from radii_tab;
select count(*) from radii_tab where area(radius) < 100;[/code]

C User Defined Functions (UDFs)

Here is a sample C UDF example that returns area of a circle –

1. Create a file called area_circle.c as:

[code language=”cpp”] void cirArea(double *radius, double *outarea)
{
double pi=3.14;
*outarea = pi * (*radius) * (*radius);
}
[/code]

2. Create a makefile as:

[code language=”bash”] all: libmath.so
libmath.so: area_circle.o
gcc -shared -o libmath.so area_circle.o
area_circle.o: area_circle.c
gcc -c area_circle.c -O3
clean:
rm -f area_circle.o libmath.so
[/code]
3. Compile the .c using the makefile by running:

[code language=”bash”]make[/code]

4. Copy the .so to ~bigsql/sqllib/function on all nodes of the Big SQL cluster.

5. Register the UDF to invoke the .so as:

[code language=”sql”] CREATE FUNCTION cirArea (FLOAT)
RETURNS FLOAT
EXTERNAL NAME ‘libmath.so!cirArea’
LANGUAGE C
NO SQL
NO EXTERNAL ACTION
PARAMETER STYLE SQL
DETERMINISTIC
ALLOW PARALLEL
@[/code]

There are different styles of passing parameters like DB2 GENERAL, JAVA, and SQL. Refer to documentation for more details.

Java User Defined Functions (UDFs)
You can find a sample Java UDF on this tutorial link.

Performance Considerations
C UDFs perform better than Java UDFs in general.

Listed below are the options that allow better performance, but it is important to choose the right options, as applicable in your specific environment, when registering the UDF. For details on when to use which and implications of the options, refer to documentation.

NOT FENCED
DETERMINISTIC
ALLOW PARALLEL
NO EXTERNAL ACTION

User Defined Table Functions (UDTFs)
While UDFs process the arguments passed in, and return one data element, UDTFs are user-defined functions that are used to return a set of rows with subsequent executions of the function.

Here is a sample UDTF definition:

[code language=”sql”] CREATE OR REPLACE FUNCTION HIVE.GETALLMEMBERS (
jsonDocument varchar(32672)
)
RETURNS TABLE (
key VARCHAR(128),
value VARCHAR(128)
)
SPECIFIC GETALLMEMBERS
EXTERNAL NAME ‘com.ibm.biginsights.bigsql.hiveudf.getAllMembersUDTF!exec’
LANGUAGE JAVA
PARAMETER STYLE DB2GENERAL
FENCED
NO SQL
DISALLOW PARALLEL
FINAL CALL
NO EXTERNAL ACTION @
[/code]

The function above points to a function in a class in which the ‘exec’ function accepts three types of calls:

UDF.SQLUDF_TF_OPEN = When the table is opened to read from, the function is called with this type once per scan.

UDF.SQLUDF_TF_FETCH = The function is called with this type when fetching rows from this table function. So this is the portion of code that returns individual rows of the form (key varchar(128), value varchar(128)) as defined in the function definition above.

UDF.SQLUDF_TF_CLOSE = When the read finishes, the table is closed and the function is called with this type.

Here is a pseudo-code:

[code language=”java”] import COM.ibm.db2.app.UDF;
public class getAllMembersUDTF extends UDF
{
List keys = new ArrayList();
Iterator keysIter;
List values = new ArrayList();
Iterator valuesIter;
public void exec (String document, String outKey, String outValue) throws Exception
{
switch (getCallType())
{
case UDF.SQLUDF_TF_OPEN:
// Populate keys and values from the document string
keysIter = keys.iterator();
valuesIter = values.iterator();
break;
case UDF.SQLUDF_TF_FETCH:
if (keysIter == null || ! keysIter.hasNext()) {
setSQLstate("02000");
return;
}
if (keysIter.hasNext()) {
set(2, keysIter.next());
set(3, valuesIter.next());
}
break;
default:
break;
}
}
}
[/code]

For more details about about the UDF class, refer to this documentation link.

The above UDTF is shown in Java for illustration purpose only, but the same is possible with a C UDTF as well as Stored Procedures. Here is a sample C UDTF for json_tuple using the libjson as above for get_json_object.

[code language=”C”]#include <string.h>
#include "sqludf.h"
#include "libjson/libjson.h"

struct SCRATCHDATA
{
JSONNODE *jsonnode;
SQLUDF_SMALLINT nextKey;
};

char *get_value(JSONNODE *node, char *key)
{
JSONNODE *out_json = json_get(node, key);
if ((json_type(out_json) == JSON_ARRAY) || (json_type(out_json) == JSON_NODE))
return json_write(out_json);
else
return json_as_string(out_json);
}

extern "C"
void SQL_API_FN json_tuple(
SQLUDF_VARCHAR *in_json_text,
SQLUDF_VARCHAR *in_key1,
SQLUDF_VARCHAR *in_key2,
SQLUDF_VARCHAR *in_key3,
SQLUDF_VARCHAR *out_val1,
SQLUDF_VARCHAR *out_val2,
SQLUDF_VARCHAR *out_val3,
SQLUDF_NULLIND *in_json_text_nullind,
SQLUDF_NULLIND *in_key1_nullind,
SQLUDF_NULLIND *in_key2_nullind,
SQLUDF_NULLIND *in_key3_nullind,
SQLUDF_NULLIND *out_val1_nullind,
SQLUDF_NULLIND *out_val2_nullind,
SQLUDF_NULLIND *out_val3_nullind,
SQLUDF_TRAIL_ARGS_ALL)
{
struct SCRATCHDATA *sp;
sp = (struct SCRATCHDATA *) SQLUDF_SCRAT->data;

switch (SQLUDF_CALLT)
{
case SQLUDF_TF_OPEN:
if (*in_json_text_nullind == -1) {
strcpy( SQLUDF_STATE, "02000");
break;
}

sp->jsonnode = json_parse(in_json_text);
sp->nextKey = 1;
break;
case SQLUDF_TF_FETCH:
switch (sp->nextKey) {
case -1:
strcpy( SQLUDF_STATE, "02000");
break;
case 1:
strcpy(out_val1, get_value(sp->jsonnode, in_key1));
strcpy(out_val2, get_value(sp->jsonnode, in_key2));
strcpy(out_val3, get_value(sp->jsonnode, in_key3));
*out_val1_nullind = 0;
*out_val2_nullind = 0;
*out_val3_nullind = 0;
sp->nextKey = -1;
break;
}
break;
case SQLUDF_TF_CLOSE:
json_delete(sp->jsonnode);
break;
}
}[/code]

Register the UDTF as –

[code language=”SQL”]ECHO CREATE FUNCTION hive.json_tuple @
CREATE OR REPLACE FUNCTION hive.json_tuple
(
JSON_TEXT varchar(32672),
KEY1 varchar(4096),
KEY2 varchar(4096),
KEY3 varchar(4096)
)
RETURNS TABLE (
value1  VARCHAR(50),
value2  VARCHAR(50),
value3  VARCHAR(50)
)
SPECIFIC hive.json_tuple
DETERMINISTIC
NO EXTERNAL ACTION
LANGUAGE C
PARAMETER STYLE SQL
DISALLOW PARALLEL
NO DBINFO
NO SQL
SCRATCHPAD
NO FINAL CALL
NOT FENCED
THREADSAFE
PROGRAM TYPE SUB
EXTERNAL NAME ‘libbigsqljson_tuple.so!json_tuple’
@[/code]

 

The function accepts 3 keys and returns the values as a table row, so it should be used as

[code language=”SQL”]select * from table(hive.json_tuple(‘{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}’, ‘Foo’, ‘Bar’, ‘Quux’))[/code]

You can also use lateral keyword to join it with some other json_tuple UDTF as

[code language=”SQL”]select name from table(hive.json_tuple(‘{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}’, ‘Foo’, ‘Bar’, ‘Quux’)) v1 (foo, bar, quux), LATERAL (select * from table(hive.json_tuple(v1.quux, ‘QuuxName’, ‘QuuxId’, ‘QuuxId’)) v2(name, id1, id2) ) [/code]

If tuple is the column from a json_tuple_test table having the json string, you could join that table to json_tuple as

[code language=”SQL”]select name from json_tuple_test, LATERAL (select * from table(hive.json_tuple(tuple, ‘QuuxName’, ‘QuuxId’, ‘QuuxId’)) v2(name, id1, id2) ) [/code]

If you would like to use json_tuple with different number of keys, just repeat the above process for the desired signatures.

Hive built-in functions
In BigInsights 3.0 Fix Pack 1, there is support for a majority of Hive built-in functions. See Hive User-defined functions.

C UDF for get_json_object
The get_json_object function is available as a Java UDF in the fixpack. Here is a C UDF I attempted for the same using libjson library. Note that the content is not supported by IBM. Also, review license requirements from the libjson library. Compile the code below along with the libjson.so library to build a UDF library called libbigsqljson.so and then register the UDF as shown below.

get_json_object.c

[code language=”cpp”] #include<string.h>
#include "libjson/libjson.h"
void find_at_top_level(JSONNODE *node, char *path, char*out_string)
{
JSONNODE *out_json = NULL;
int i=0;
char *sublevelpath = NULL;
int arr_subscript = -1;
if ((path[0] == ‘$’) && (path[1] == ‘.’))
path+=2;
int end = strlen(path);
while (i<end)
{
if (path[i] == ‘.’)
{
if ((i+1) < end) sublevelpath = path+i+1;
path[i] = ‘\0’;
break;
}
if (path[i] == ‘\[‘)
{
path[i] = ‘\0’;
int j =i+1;
while (path[j++] != ‘]’);
path[j] = ‘\0’;
if (path[i+1] != ‘*’) arr_subscript = atoi(path+i+1);
if ((j + 1) < end) sublevelpath = path+j+1;
break;
}
i++;
}
if (json_type(node) == JSON_ARRAY)
{
int first = 1;
for (i=0; i < json_size(node); i++)
{
if (first == 1)
{
strcat(out_string, "[");
first = 0;
}
else
strcat(out_string, "[");
find_at_top_level(json_at(node, i), path, out_string);
}
if (first == 0) strcat(out_string, "]" );
}
else
{
out_json = json_get(node, path);
if (arr_subscript != -1) out_json = json_at(out_json, arr_subscript);
}
if (out_json == NULL)
return;
if (sublevelpath != NULL)
find_at_top_level(out_json, sublevelpath, out_string);
else
{
if ((arr_subscript != -1) || (json_type(out_json) == JSON_ARRAY))
strcat(out_string, json_write(out_json));
else
strcat(out_string, json_as_string(out_json));
}
}
void get_json_object(char *in_json_text, char *in_json_path,
char *out_json_object)
{
JSONNODE *n = json_parse(in_json_text);
strcpy(out_json_object, "");
find_at_top_level(n, in_json_path, out_json_object);
json_delete(n);
}
[/code]

get_json_object.sql

[code language=”sql”] CREATE OR REPLACE FUNCTION cudf.get_json_object
(
JSON_TEXT varchar(32672),
JSON_PATH varchar(4096)
)
RETURNS VARCHAR(32672)
SPECIFIC cudf.get_json_object
DETERMINISTIC
NO EXTERNAL ACTION
LANGUAGE C
PARAMETER STYLE SQL
NO DBINFO
NO SQL
NO SCRATCHPAD
NO FINAL CALL
NOT FENCED
THREADSAFE
PROGRAM TYPE SUB
EXTERNAL NAME ‘libbigsqljson.so!get_json_object'[/code]

Conclusion

With Big SQL you have the flexibility to customize data processing for individual business requirements. There are various options available for this and they are quite easy to start implementing and deploying as shown above.

3 comments on"User-Defined Functions (UDFs) in Big SQL v3.0"

  1. Adidas Schuhe Braun Damen
    Insurance plan is a must-have in numerous aspects of our everyday lives. Vehicle insurance, medical insurance, daily life and home insurance all guard us in one way or another from financial failures. The price of these guidelines can easily tally up, and among the finest steps you can take is choose a well-set up insurer that offers numerous insurance policy varieties. By bundling many different protection kinds together, you might conserve a considerable amount of money all your insurance.

    https://www.verwaiste-eltern-heilbronn.de/images/ver2/511-le-coq-sportif-schuhe-sneaker.jpg

    Cut back by replacing high priced, labeled products with items that you already individual. As an example, it will save you money by using points, like baking soda pop to eradicate unsightly stains and smelly messes. Use the Internet to your advantage and search for different natural home remedies which you can use that happen to be affordable.

    https://www.chinamassagesofa.de/images/chi2/10826-puma-velvet-rot.jpg

  2. Adidas Tubular Run Big
    When you are possessing a hard time conceiving, be sure to view your doctor and also have them manage checks. Plenty of good reasons that you may possibly not being expecting a baby, most of which can be treated. Also, be sure your spouse is tested, since the dilemma may be with him or his semen.

    https://www.treguier-immobilier.fr/images/treguier-immobilier/21666-sneakers-valentino.jpg

    Make sure to do the best to broaden your portfolio if you are investing in the stock market. This way, you will be greater insulated in opposition to downturns in the market. Distributing your money out over several organizations and marketplace sectors, indicates that you will be not as likely to shed money over the long term.

    https://www.soc16.fr/images/socfr/3308-chaussures-armani-femme-pas-cher.jpg

  3. Reebok Rose Gold White
    If you would like residence-university your kids, you need to talk about the complete program your self prior to deciding to train everything to your son or daughter. You should be comfortable with each and every subject to help you explain all things in straightforward terminology. If you want aid, usually do not be reluctant to sign up for classes.

    https://www.stasi-live-haft.de/images/sta2/3102-adidas-x-15.4.jpg

    Find out concerning how to finest guard your own personal details. Particularly if you have very delicate facts about it. It will give you the ability to have all your certain info fully erased when someone falters to penetrate the proper pass word 10 times in a successive get. This could Ave you from a major head ache afterwards.

    https://www.angermuender-sommerkonzerte.de/images/ang2/5460-beige-pumps-riemchen.jpg

Join The Discussion

Your email address will not be published. Required fields are marked *