Big SQL enables users to create their own SQL functions that can be invoked in queries. User-defined functions (UDFs) promote code re-use and reduce query complexity. They can be written to return a single (scalar) value or a result set (table). Programmers can write UDFs in SQL or any supported programming languages (such as Java and C). For simplicity, this lab focuses on SQL UDFs.
After you complete this lab, you will understand how to:
- Create scalar and table UDFs written in SQL.
- Incorporate procedural logic in your UDFs.
- Invoke UDFs in Big SQL queries.
- Drop UDFs.
Allow 1 – 1.5 hours to complete this lab.
Please note that this lab discusses only some of the capabilities of Big SQL scalar and table UDFs. For an exhaustive list of all the capabilities, please see the BigInsights 4.0 knowledge center (http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.welcome.doc/doc/welcome.html).
Prior to starting this lab, you must be familiar with how to use the Big SQL command line (JSqsh), and you must have created the sample GOSALESDW tables. If necessary, work through earlier lab exercises on these topics.
This UDF lab was originally developed by Uttam Jain (email@example.com) with contributions from Cynthia M. Saracco. Please post questions or comments to the forum on Hadoop Dev at https://developer.ibm.com/hadoop/support/.
Big SQL provides many built-in functions to perform common computations. An example is dayname(), which takes a date/timestamp and returns the corresponding day name, such as Friday.
Often, organizations need to perform some customized or complex operation on their data that’s beyond the scope of any built-in-function. Big SQL allows users to embed their customized business logic inside a user-defined function (UDF) and write queries that call these UDFs.
As mentioned earlier, Big SQL supports two types of UDFs:
- Scalar UDF: These functions take one or more values as input and return a single value as output. For example, a scalar UDF can take three values (price of an item, percent discount on that item, and percent sales tax) to compute the final price of that item.
- Table UDF: These functions take one or more values as input and return a whole table as output. For example, a table UDF can take single value (department-id) as input and return a table of employees who work in that department. This result set could have multiple columns, such as employee-id, employee-first-name, employee-last-name.
Once created, UDFs can be incorporated into queries in a variety of ways, as you’ll soon see.
In this lab, you will first set up your environment for UDF development and then explore how to create and invoke UDFs through various exercises.
Ready to get started?
In this section, you will set up your JSqsh environment for UDF development.
__1. If necessary, launch JSqsh using the connection to your bigsql database. (This was covered in an earlier lab.)
__2. Reset the default SQL terminator character to “@”:
\set terminator = @;
Because some of the UDFs you will be developing involve multiple SQL statements, you must reset the JSqsh default termination character so that the semi-colon following each SQL statement in your UDF is not interpreted as the end of the CREATE FUNCTION statement for your UDF.
__3. Validate that the terminator was effectively reset:
__4. Inspect the output from the command (a subset of which is shown below), and verify that the terminator property is set to @.
You’re now ready to create your first Big SQL UDF.
In this section, you will create a scalar SQL UDF to compute final price of a particular item that was sold. Your UDF will require several input parameters:
- unit sale price: Price of one item
- quantity: Number of units of this item being sold in this transaction
- % discount: Discount on the item (computed before tax)
- % sales-tax: Sales tax (computed after discount)
As you might expect, your UDF will return a single value – the final price of the item.
After creating and registering the UDF, you will invoke it using some test values to ensure that it behaves correctly. Afterwards, you will invoke it in a query, passing in values from columns in a table as input to your function.
__1. Create a UDF named new_final_price in the gosalesdw schema:
CREATE OR REPLACE FUNCTION gosalesdw.new_final_price ( quantity INTEGER, unit_sale_price DOUBLE, discount_in_percent DOUBLE, sales_tax_in_percent DOUBLE ) RETURNS DOUBLE LANGUAGE SQL RETURN (quantity * unit_sale_price) * DOUBLE(1 - discount_in_percent / 100.0) * DOUBLE(1 + sales_tax_in_percent / 100.0) @
__2. Review the logic of this function briefly. This first line creates the function, which is defined to take four input parameters. The RETURNS clause indicates that a single (scalar) value of type DOUBLE will be returned. The function’s language is as SQL. Finally, the last two lines include the function’s logic, which simply performs the necessary arithmetic operations to calculate the final sales price of an item.
__3. After creating the function, test it using some sample values. A simple way to do this is with the VALUES clause shown here:
VALUES gosalesdw.new_final_price (1, 10, 20, 8.75)@
__4. Verify that result returned by your test case is
__5. Next, use the UDF in a query to compute the final price for items listed in sales transactions in the SLS_SALES_FACT table. Note that this query uses values from two columns in the table as input for the quantity and unit price and two user-supplied values as input for the discount rate and sales tax rate.
SELECT sales_order_key, quantity, unit_sale_price, gosalesdw.new_final_price(quantity, unit_sale_price, 20, 8.75) as final_price FROM sls_sales_fact ORDER BY sales_order_key FETCH FIRST 10 ROWS ONLY@
__6. Inspect the results.
__7. Now invoke your UDF in the WHERE clause of a query. (Scalar UDFs can be included anywhere in a SQL statement that a scalar value is expected.) This query is similar to your previous query expect that it includes a WHERE clause to restrict the result set to items with a file price of greater than 7000.
-- scalar UDF can be used wherever a scalar value is expected, -- for example in WHERE clause SELECT sales_order_key, quantity, unit_sale_price, gosalesdw.new_final_price(quantity, unit_sale_price, 20, 8.75) as final_price FROM sls_sales_fact WHERE gosalesdw.new_final_price(quantity, unit_sale_price, 20, 8.75) > 7000 ORDER BY sales_order_key FETCH FIRST 10 ROWS ONLY@
__8. Note that your results no longer include rows with items priced at 7000 or below.
In the previous lab, you used the fully-qualified UDF name (GOSALESDW.NEW_FINAL_PRICE) in your VALUES or SELECT statements. (GOSALESDW is the schema name and NEW_FINAL_PRICE is the function name.)
A UDF with the same name and input parameters can be specified in more than one schema, so providing Big SQL with the fully qualified function name identifies the function you want to execute. With Big SQL, you can also specify a list of schemas in a special register called “CURRENT PATH” (also called “CURRENT FUNCTION PATH”). When Big SQL encounters an unqualified UDF (in which no schema name specified), it will look for the UDF in the schemas specified in CURRENT PATH.
In this lab, you’ll learn how to set the CURRENT PATH and invoke your function without specifying a schema name.
__1. To begin, determine the values of your current function path by issuing either of these two statements:
VALUES CURRENT PATH@
VALUES CURRENT FUNCTION PATH@
__2. Verify that the results are similar to this:
__3. Add the GOSALESDW schema to the current path:
SET CURRENT FUNCTION PATH = CURRENT FUNCTION PATH, "GOSALESDW"@
__4. Inspect your function path setting again:
VALUES CURRENT FUNCTION PATH@
__5. Verify that the GOSALESDW schema is now in the path:
__6. Re-run the query you executed earlier, but this time remove the GOSALESDW schema from the function name with you invoke it:
SELECT sales_order_key, quantity, unit_sale_price, new_final_price(quantity, unit_sale_price, 20, 8.75) as final_price FROM sls_sales_fact ORDER BY sales_order_key FETCH FIRST 10 ROWS ONLY@
Note that Big SQL will automatically locate your UDF and successfully execute your query.
__7. Inspect the results.
Quite often, you may find it useful to incorporate conditional logic in your UDFs. In this section, you will learn how to include IF/ELSE statements to calculate the final price of an item based on a varying discount rate. To keep your work simple, you will create a modified version of the previous UDF that includes the following logic:
- If the unit price is 0 to 10, use a discount rate of X%
- If the unit price is 10 to 100, use a discount rate of Y%
- If the unit price is greater than 100, use a discount rate of Z%
The three different discount rates (X, Y, and Z) are based on input parameters.
__1. Create a UDF named new_final_price_v2 in the gosalesdw schema:
CREATE OR REPLACE FUNCTION gosalesdw.new_final_price_v2 ( quantity INTEGER, unit_sale_price DOUBLE, discount_in_percent_if_price_0_t0_10 DOUBLE, discount_in_percent_if_price_10_to_100 DOUBLE, discount_in_percent_if_price_greater_than_100 DOUBLE, sales_tax_in_percent DOUBLE ) RETURNS DOUBLE LANGUAGE SQL BEGIN ATOMIC DECLARE final_price DOUBLE; SET final_price = -1; IF unit_sale_price <= 10
THEN SET final_price = (quantity * unit_sale_price) * DOUBLE(1 - discount_in_percent_if_price_0_t0_10 / 100.0) * DOUBLE(1 + sales_tax_in_percent / 100.0) ; ELSEIF unit_sale_price <= 100 THEN SET final_price = (quantity * unit_sale_price) * DOUBLE(1 - discount_in_percent_if_price_10_to_100 / 100.0) * DOUBLE(1 + sales_tax_in_percent / 100.0) ;
ELSE SET final_price = (quantity * unit_sale_price) * DOUBLE(1 - discount_in_percent_if_price_greater_than_100 / 100.0) * DOUBLE(1 + sales_tax_in_percent / 100.0) ; END IF; RETURN final_price;
__2. Review the logic of this function briefly. As shown on lines 3 – 8, the function requires 6 input parameters. The first two represent the quantity ordered and the base unit price of each. The next three parameters specify different discount rates. The final input parameter represents the sales tax. The body of this function uses various conditional logic clauses (IF, THEN, ELSEIF, and ELSE) to calculate the final price of an item based on the appropriate discount rate and sales tax. Note that the unit price of the item determines the discount rate applied.
__3. Test your function’s logic using sample data values:
VALUES gosalesdw.new_final_price_v2 (1, 100, 10, 20, 30, 8.75)@
__4. Verify that the result is
If desired, review the function’s logic to confirm that this is the correct value based on the input parameters. Note that 1 item was ordered at a price of $100, qualifying it for a 20% discount (to $80). Sales tax of 8.75% on $80 is $7, which results in a final item price of $87.
__5. Now invoke your UDF in a query to report the final sales prices for various items recorded in your SLS_SALES_FACT table:
SELECT sales_order_key, quantity, unit_sale_price, gosalesdw.new_final_price_v2(quantity, unit_sale_price, 10,20,30, 8.75) as final_price FROM sls_sales_fact ORDER BY sales_order_key FETCH FIRST 10 ROWS ONLY @
__6. Inspect the results.
Big SQL enables you to include loops in your scalar UDFs. In this section, you’ll use a WHILE loop to create a mathematical function for factorials. As a reminder, the factorial of a non-negative integer N is the product of all positive integers less than or equal to N. In other words,
factorial(N) = N * (N-1) * (N-2) ……* 1
As an example,
factorial(5) = 5 * 4 * 3 * 2 * 1 = 120
__1. Create a scalar UDF named gosalesdw.factorial that uses a WHILE loop to perform the necessary multiplication operations.
-- WHILE-DO loop in scalar UDF -- This example is independent of gosalesdw tables -- Given a number n (n >= 1), returns its factorial -- as long as it is in INTEGER range.
-- Create scalar UDF with WHILE-DO loop CREATE OR REPLACE FUNCTION gosalesdw.factorial(n INTEGER) RETURNS INTEGER LANGUAGE SQL BEGIN ATOMIC DECLARE n2 INTEGER; DECLARE res INTEGER; SET res = n; SET n2 = n; loop1:
WHILE (n2 >= 2) DO SET n2 = n2 - 1; SET res = res * n2; END WHILE loop1; RETURN res;
__2. Review the logic of this function. Note that two variables are declared and set to the value of the input parameter. The first variable (res) holds the result of the computation. Its value changes as the body of the WHILE loop is executed. The second variable (n2) controls the loop’s execution and serves as part of the calculation of the factorial.
__3. Test your function supplying different input parameters:
-- The output of factorial(5) should be 120 VALUES gosalesdw.factorial(5)@ -- The output of factorial(7) should be 5040 VALUES gosalesdw.factorial(7)@
__4. Optionally, drop your function.
drop function gosalesdw.factorial@
Note that if you try to invoke your function again, you will receive an error message similar to this:
No authorized routine named "FACTORIAL" of type "FUNCTION" having compatible arguments was found.. SQLCODE=-440, SQLSTATE=42884, DRIVER=3.68.61
[State: 56098][Code: -727]: An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-440", SQLSTATE "42884" and message tokens "FACTORIAL|FUNCTION".. SQLCODE=-727, SQLSTATE=56098, DRIVER=3.68.61
As you might expect, Big SQL also supports FOR-DO loops in SQL-bodied UDFs. In this exercise, you’ll create a function to calculate the sum of the top 5 sales for a given day.
__1. Create a scalar UDF named gosalesdw.sum_sale_total_top_5. Note that this UDF references the SLS_SALES_FACT table that you created in an earlier lab.
-- FOR-DO loop and a SELECT statement inside scalar UDF
-- Given order_day_key, returns sum of sale_total
-- for first 5 sales with given order_day_key. Order by sale_total
-- Create UDF with FOR-DO loop and a SELECT statement inside CREATE OR REPLACE FUNCTION gosalesdw.sum_sale_total_top_5(input_order_day_key INTEGER) RETURNS DOUBLE LANGUAGE SQL READS SQL DATA BEGIN ATOMIC DECLARE result DOUBLE; DECLARE counter INTEGER; SET result = 0; SET counter = 5; FOR v1 AS
SELECT sale_total FROM sls_sales_fact WHERE order_day_key = input_order_day_key ORDER BY sale_total DESC DO
IF counter > 0 THEN SET result = result + sale_total; SET counter = counter - 1; END IF; END FOR;
__1. Review the logic of this function. Note that the FOR loop begins by retrieving SALE_TOTAL values from the SLS_SALES_FACT table based on the order key day provided as input. These results are ordered, and the DO block uses a counter to control the number of times it will add a SALE_TOTAL value to the result. In this example, that will occur 5 times.
__2. Finally, use this UDF to compute the sum of the top 5 sales on a specific order day key (20040112).
-- The output of this function call should be 925973.09000 VALUES (gosalesdw.sum_sale_total_top_5(20040112)) @
Now that you’ve created several scalar UDFs, it’s time to explore how you can create a simple UDF that will return a result set. Such UDFs are called table UDFs because they can return multiple columns and multiple rows.
In this lab, you will create a table UDF that returns information about the items sold on a given day input by the user. The result set will include information about the sales order, the quantity of items, the pre-discounted sales price, and the final sales price (including tax and a discount). In doing so, your table UDF will call a scalar UDF you created previously: new_final_price_v2.
__1. Create a table UDF named gosalesdw.sales_summary. Note that this UDF references the SLS_SALES_FACT table that you created in an earlier lab.
-- Table UDF
-- given an order_day_key, returns some desired fields and
-- new_final_price for that order_day_key
-------------------------------- -- Create a simple table UDF
CREATE OR REPLACE FUNCTION gosalesdw.sales_summary(input_order_day_key INTEGER) RETURNS TABLE(sales_order_key INTEGER, quantity INTEGER, sale_total DOUBLE, new_final_price DOUBLE) LANGUAGE SQL READS SQL DATA RETURN SELECT sales_order_key, quantity, sale_total, gosalesdw.new_final_price_v2(quantity, unit_sale_price, 10,20,30, 8.75) FROM sls_sales_fact WHERE order_day_key = input_order_day_key @
__2. Inspect the logic in this function. Note that it includes a READS SQL DATA clause (because the function SELECTs data from a table) and that the RETURNS clause specifies a TABLE with columns and data types. Towards the end of the function is the query that drives the result set that is returned. As mentioned earlier, this query invokes a scalar UDF that you created earlier.
__3. Invoke your table UDF in the FROM clause of a query, supplying an input parameter of 20040112 to your function for the order day key.
-- use it in the FROM clause SELECT t1.* FROM TABLE (gosalesdw.sales_summary(20040112)) AS t1 ORDER BY sales_order_key FETCH FIRST 10 ROWS ONLY @
__4. Inspect your output.
As you might imagine, the bodies of table UDFs aren’t limited to queries. Indeed, you can write table UDFs that contain IF/ELSE, WHILE/DO, FOR-DO, and many more constructs. Consult the BigInsights Knowledge Center for details.
As you saw in an earlier exercise, you can drop UDFs with the DROP FUNCTION statement. In addition, you can create multiple UDFs with the same name (even in the same schema) if their input parameters differ enough so that Big SQL can identify which should be called during a query. Such UDFs are said to be “overloaded”. When working with overloaded UDFs, you must use the DROP SPECIFIC FUNCTION statement to properly identify which UDF bearing the same name should be dropped.
In this lab, you’ll explore the concepts of overloading functions and dropping a specific function. To keep things simple and focused on the topics at hand, the UDFs will be trivial – they will simply increment a supplied INTEGER or DOUBLE value by 1.
__1. Create a scalar UDF that increments an INTEGER value.
-- Create a scalar UDF CREATE FUNCTION increment_by_one(p1 INT) RETURNS INT LANGUAGE SQL SPECIFIC increment_by_one_int RETURN p1 + 1 @
Note that the SPECIFIC clause provides a unique name for the function that we can later reference it when we need to drop this function.
__2. Create a scalar UDF that increments a DOUBLE value.
-- Create another scalar UDF with same name (but different specific name) CREATE FUNCTION increment_by_one(p1 DOUBLE) RETURNS DOUBLE LANGUAGE SQL SPECIFIC increment_by_one_double RETURN p1 + 1 @
__3. Attempt to drop the increment_by_one function without referencing the specific name you included in each function.
-- If we try to drop the function using DROP FUNCTION statement, -- Big SQL will throw Error : SQLCODE=-476, SQLSTATE=42725, because -- Big SQL needs to know which function should be dropped DROP FUNCTION increment_by_one@
Note that this statement will fail because Big SQL isn’t certain which of the two increment_by_one functions you intended to drop.
__4. Drop the function that requires an INTEGER as its input parameter. Reference the function’s specific name in a DROP SPECIFIC FUNCTION statement.
-- User must drop using specific name DROP SPECIFIC FUNCTION increment_by_one_int@
__5. Now drop the remaining increment_by_one function. Since we only have 1 function by this name in this schema, we can issue a simple DROP FUNCTION statement:
-- Now we have only one function with this name, so we can use -- simple DROP FUNCTION statement. DROP FUNCTION increment_by_one@
What if you didn’t include a SPECIFIC clause (i.e., a specific name) in your UDF definition? Big SQL will explicitly provide one, and you can query the system catalog tables to identify it. Let’s explore that scenario.
__6. Create a simple scalar UDF again.
-- Create a UDF CREATE FUNCTION increment_by_one(p1 INT) RETURNS INT LANGUAGE SQL RETURN p1 + 1 @
__7. Create another scalar UDF with the same name (but different input parameter)
-- Create another scalar UDF with same name (but different input parm) CREATE FUNCTION increment_by_one(p1 DOUBLE) RETURNS DOUBLE LANGUAGE SQL RETURN p1 + 1 @
__8. Query the Big SQL catalog for specific names for these functions:
-- Query catalog for specific name: SELECT ROUTINENAME, SPECIFICNAME, PARM_COUNT, RETURN_TYPENAME FROM SYSCAT.ROUTINES WHERE ROUTINENAME = 'INCREMENT_BY_ONE' @
__9. Inspect the output, noting the different names assigned to your functions. (Your output may vary from that shown below.)
__10. If desired, drop each of these UDFs. Remember that you will need to reference the specific name of the first UDF that you drop when you execute the DROP SPECIFIC FUNCTION statement.