JSON is used for data exchange. Before generating JSON data, it is important to know what JSON is, how it looks, what components it comprises, and on which syntax and grammar rules it is based.

JSON basics

JavaScript Object Notation (JSON) is an open-standard file format that uses human-readable text to transmit data. JSON data consists of a sequence of Unicode characters that are arranged in a hierarchical form based on the JSON value syntax and grammar. The JSON syntax is derived from the JavaScript programming language syntax. JavaScript itself is one of the three core technologies of the World Wide Web Consortium (W3C). JSON is a completely independent text format and many programming languages include functions or methods for generating and consuming JSON data.

JSON data can include the following structural tokens:

  • Curly brackets { }
    Curly brackets are used for defining JSON objects.
  • Square brackets [ ]
    Square brackets are used for defining JSON arrays.
  • Colon
    Key:value pairs within JSON objects are separated by a colon.
  • Comma
    Multiple key:value pairs in objects or multiple elements in an array are separated by a comma.

JSON values represent one of the following data types:

  • String
    A string value includes a sequence of zero or more Unicode characters. The string values are embedded in double quotes (“). Double quotes and backslashes () within JSON strings must be escaped with a backslash. Instead of Unicode characters it is also possible to pass Unicode hex values which start with \u followed by the four-digit hexadecimal Unicode number.

    Several characters, escaped with a backslash, have a specific meaning:

    • \b = backspace
    • \f = form feed
    • \n = line feed
    • \r = carriage return
    • \t = character tabulation

    Figure 1: JSON string JSON string

    The following example shows an array with a sequence of objects. All values within the objects are strings. The special characters, backslash () and double quotes (“), are escaped with an additional backslash. There is also one value including a Unicode Hex value (\u00c5), representing the Swedish A character.

    Figure 2: JSON string values JSON string values

  • Numbers
    A JSON number consists of a sequence of digits between 0 and 9. If a decimal separator is needed, the period must be used (even in environments or countries where the comma is used as decimal separator). For numeric values between -0.9 and 0.9, a leading zero in front of the decimal point is required. Negative values must be specified with a leading minus sign. For decimal or float values, the exponential E-notation with mantissa and exponent may be used.

    Figure 3: JSON number JSON number

    The next figure shows an array with several objects. All values within the objects are numeric, with positive or negative values, with or without decimal positions, and with or without mantissa and exponent.

    Figure 4: Numeric values
    Numeric values

  • Boolean literals true and false
    The boolean values true and false without leading and trailing double quotes can be used as values in JSON objects or arrays.

    The following figure shows an object where all values are boolean values.

    Figure 5: Boolean literals
    Boolean literals

  • NULL values
    The literal null without leading and trailing double quotes represents a NULL value within a JSON object or array.

    The following figure shows an object containing NULL values.

    Figure 6: Null literals
    Null literals

  • Objects
    A JSON object starts and ends with a curly bracket and consists of zero, one or multiple key:value pairs. The key or name that describes the value must be a string. The value can be a string or a number, one of the literals (true, false, null), another (nested) object, or a (nested) array. Key and value are separated by a colon. If an object consists of multiple key:value pairs, the key:value pairs are separated from each other with a comma.

    Figure 7: JSON object syntax JSON object syntax

    The following example shows an object with four different key:value pairs. Three of the values are string and the fourth value is a number.

    Figure 8: JSON object
    JSON object

  • Arrays
    A JSON array starts and ends with a square bracket. A JSON array represents a list of zero or more values. The values can be the literals (true, false, null), strings or numbers, (nested) objects, or (nested) arrays. The array elements are separated by a comma.

    Figure 9: JSON array syntax
    JSON array syntax

    The following figure shows an array with six elements. The second array consists of three objects. Each object includes the type and number key:value pair.

    Figure 10: JSON arraysg
    JSON arrays

JSON support in IBM Db2 for i

IBM® Db2® for i has been adding SQL support for JSON since October 2015. The support is largely completed for those using IBM i 7.2 or later versions, if the latest Db2 PTF Group is installed. To use JSON with SQL, clients need to confirm that they are at IBM i 7.2 or later and have installed the latest Db2 PTF Group.

The following figure gives an overview of JSON support in Db2 for i.

Figure 11: JSON support in Db2 for i – JSON functions and predicates
JSON support in Db2 for i – JSON functions and predicates

Sample data

To keep it simple, we will generate JSON data based on a few very basic tables and views. The first one is a table named LOBSTAFF which includes the employee number and address data for six employees.

The following figure shows the content of the LOBSTAFF table.

Figure 12: LOBSTAFF table – sample data
LOBSTAFF table – sample data

The second one, SALESV05 is a view based on a sales table, but accumulates the sales per customer and sales year. The sales per customer and year are split over 12 columns representing the month in which the sales occurred.

Figure 13: SALESV05 – Yearly sales per customer
SALESV05 – Yearly sales per customer

The SALESV06 view matches the SALESV05 view, except in months with no sales, a NULL value is returned.

Figure 14: SALESV06 – Yearly sales per customer with NULL values
SALESV06 – Yearly sales per customer with NULL values

JSON_OBJECT – Generating objects

With the JSON_OBJECT function (nested) JSON objects can be generated. The key:value pairs are automatically embedded in curly brackets ( {} ) and key and value are separated by a colon ( : ).

The key value itself is directly converted into a string, that is, surrounded with double quotes and the special characters (double quotes and backslashes) if included are escaped.

The value is converted depending on the Db2 data type. The value itself can represent a literal (true, false or null), a string, a number, a (nested) JSON object or a (nested) JSON array. Values in Db2 data types that are not supported by JSON, for examples the DATE or TIME data types are automatically converted into strings. If a date or time value is converted into a character string, the International Standards Organization (ISO) format is used.

There are two syntax alternatives for coding JSON objects with the JSON_OBJECT function:

JSON_OBJECT(KEY KeyExpression1 VALUE ValueExpression1,
            KEY KeyEspression2 VALUE ValueExpression2, …);
JSON_OBJECT(KeyExpression1: ValueExpression1,
            KeyExpression2: ValueExpression2, …);

In the first syntax, alternatives are KEY and VALUE part of the syntax. KEY can optionally be specified, while VALUE is mandatory.

The second syntax alternative requires no key words, but a key expression and a value, separated with a colon.

If multiple key:value pairs have to be included in the object, they have to be separated by commas, independent of whichever syntax alternative is used.

The key expression can be any Unicode string, including double quotes (“) and backslashes ( \ ). Both special characters are automatically escaped in the string.

Note: Key names are case sensitive.

The value expression can be any literal value, column value, the result of concatenating multiple columns, the result of a scalar user-defined function (UDF), or a full select that returns a single value.

In the following example four different independent JSON objects are generated. The first object with the EmployeeNo key includes the integer value of the EMPLOYEENO column. The next two objects, LastName and FirstName are based on the character columns NAME and FIRSTNAME. The last object with the City key name is composed of the column values of the ZIPCODE and CITY columns.

The example shows two queries which produce the same result.

Figure 15: Basic JSON objects
Basic JSON objects

To generate JSON objects with multiple key:value pairs, the key:value pairs must be defined as in the previous example. But all key:value pairs must be defined in the same JSON_OBJECT function, separated by a comma, independent of the syntax alternative that is used.

In the following example, a single object containing a key:value pair for the employee number, the last and first names, and the city is generated.

Figure 16: JSON object with multiple key:value pairs
JSON object with multiple key:value pairs

Returning data type

The generated JSON data is by default returned as a 2 gigabyte (GB) character large object (CLOB) in coded character set identifier (CCSID) 1208 (UTF8). However, it is possible to convert the JSON result into any character data type with any CCSID (except 65535) and in any valid length.

Note: Double-byte character set (DBCS) data must use Unicode encoding.

To modify the output format, use the RETURNING clause.

Syntax:
JSON_OBJECT(Key:value pairs Definition
            RETURNING data-type(Length) CCSID xxxxx)

Note: Returning the JSON data with a shorter length than 2 GB, may speed up the data exchange.

The JSON object data created in the following example is returned as UTF-8 varying character field with a maximum length of 128 characters (VARCHAR(128) CCSID 1208).

Figure 17: JSON object returned in UTF-8 as VARCHAR(128)
JSON object returned in UTF-8 as VARCHAR(128)

Objects containing null literal values

If a value expression returns a NULL value, it is automatically converted into the JSON null literal, that is, null without leading and trailing double quotes is returned.

In the following example, an object with the key:value pairs LastName and FirstName is built.

The LOBSTAFF table does not contain any NULL values, but there is an employee (Meier und Sohn) without first name. With the aid of the NULLIF scalar function, the FIRSTNAME column value is converted into a NULL value, if it does not include anything else than blanks (NULLIF(FirstName, '') ).

When running the query, the first name value for the employee Meier und Sohn is returned as a null literal.

Figure 18: JSON object including a null literal
JSON object including a null literal

Omitting key:value pairs with NULL values

Sometimes, you may not want to return a key:value pair containing a NULL value, but ignore the key:value pair instead.

Whether a key:value pair with a NULL value is returned or not depends on whether NULL ON NULL or ABSENT ON NULL has been specified on the JSON_OBJECT function:

Syntax:
JSON_OBJECT(Key:value pairs Definition
            ***NULL ON NULL or ABSENT ON NULL***
             RETURNING DataType(Length) CCSID xxxxx)

NULL ON NULL is the default value. So, key:value pairs containing a NULL value are always included in the object. The NULL value is converted into a null literal.

If ABSENT ON NULL is specified, key:value pairs containing a NULL value are omitted.

Note: ABSENT ON NULL (or NULL ON NULL) can only be specified once for the JSON_OBJECT function. It is not possible to decide for specific key:value pairs whether they have to be omitted or not. In this way, within the same object either all key:value pairs with NULL values are returned or none of the key:value pairs with NULL values are returned.

In the following example an object with the LastName and FirstName key:value pairs is generated. As in the previous example, a missing first name is converted into a NULL value. But this time the ABSENT ON NULL clause is added to the JSON_OBJECT function. Because of this specification, the first name key:value pair for the employee Meier und Sohn is dropped.

Figure 19: JSON object with the ABSENT ON NULL specification
JSON object with the ABSENT ON NULL specification

Objects with boolean literals

If a key:value pair must include a boolean literal (true/false), true and/or false must be passed as a character constant value. Currently, there is no automatic conversion of 0/1 or ON/OFF into true/false.

If a character constant value is passed as a value in a key:value pair, it is by default converted into a string. That means if we pass true or false, these values are handled as regular string constants and embedded into double quotes.

To prevent a JSON value from being converted into a string, the FORMAT JSON specification can be added to any key:value pair definition in the JSON_OBJECT function.

Syntax:
JSON_OBJECT(KEY KeyExpression1 VALUE ValueExpression1 **FORMAT JSON** ,
            KEY KeyExpression2 VALUE ValueExpression2,
            ...
            KEY KeyExpressionN VALUE ValueExpressionN
            NULL ON NULL or ABSENT ON NULL
            RETURNING DataType(Length) CCSID xxxxx )

JSON_OBJECT(KeyExpression1: ValueExpression1 **FORMAT JSON** ,
            KeyExpression2: ValueExpression2,
            ...
            KeyExpressionN: ValueExpressionN
            NULL ON NULL or ABSENT ON NULL
            RETURNING DataType(Length) CCSID xxxxx )

Note: The FORMAT JSON specification is not only relevant for boolean values but also for objects or arrays generated in Common Table Expressions (CTE) or subselect statements and then included as a value in a key:value pair.

In the following example, objects with boolean values are generated. The values in the first column are generated without specifying the FORMAT JSON clause, and consequently the true/false values are converted into strings. For the second column, FORMAT JSON is added to the key:value pair definition. Due to this specification, true and false are included as JSON literals.

Figure 20: JSON object with boolean literals
JSON object with boolean literals

Normally the boolean values are not hardcoded as in the previous example, instead the information is read from a column in a database table. In most cases, the column values are not true and false, but 0/1 or N/Y. There is no automatic conversion, but you can use a CASE expression for converting the column values into true or false. The FORMAT JSON clause must then be specified after the CASE expression otherwise true and false are handled as string.

In the following example, two independent JSON objects are generated. Both objects consist of the key:value pairs Time, Open/Close, and Open. The Open/Close value is based on the column OPNCLS, which includes the values 0 and 1. These values are converted with the aid of a CASE expression into true or false.

The first object is generated without the FORMAT JSON specification for the Open/Close key:value pair and consequently true and false are converted into strings.

In the second object FORMAT JSON is specified for the Open/Close key:value pair and true and false are returned as literals.

Figure 21: JSON objects with boolean literals based on column values
JSON objects with boolean literals based on column values

Nested JSON objects

If a JSON object must be nested, that is, the value of any key:value pair in the object represents another object, the JSON_OBJECT function can be specified instead of the value.

The following figure shows a nested object with the employee information for a single employee. The first key:value pair in the object represents the Employee, through the employee number. The next value for the key Name is a JSON object that itself consists of the key:value pairs LastName and FirstName. The third key:value pair, includes the address information embedded in another object. The Address object includes the key:value pairs Street, City, and Country.

Figure 22: Nested JSON object containing employee data
Nested JSON object containing employee data

The nested object with the employee data for the employee number 40, can be generated with the following SELECT statement. The JSON_OBJECT function for generating the Name and Address JSON objects is specified instead of a value.

Figure 23: Generate a nested JSON object with employee data
Generate a nested JSON object with employee data

Note: If an expression contains (multiple) nested object definitions, it is not necessary to specify FORMAT JSON for the key:value pair in which the nested object is defined (even though the generated object must not be converted into a string).

However, FORMAT JSON has to be specified if the Object is created in a CTE or a Sub-Select.

Nested objects based on CTEs

Until now we generated very simple JSON objects. As long as the data is read from a single table, and only a few nested objects over a few levels have to be returned, everything can be done in a single step.

… unfortunately real life is not as easy!

The easiest technique to generate complex JSON data with multiple nested objects and arrays is to use CTEs.

CTEs are part of a SELECT statement. The first CTE starts with WITH, and the SELECT statement for the CTE must be embedded in parenthesis. If multiple CTEs are needed, other CTEs can be coded after the first one separated by a comma. All CTEs must be specified before the main or final SELECT. CTEs can be compared with temporary tables or views. CTEs can be used in other CTEs of the current SELECT statement or in the final SELECT statement, but CTEs are only valid within the current SELECT statement.

When generating complex JSON data, the first (few) CTEs should be used for providing the raw data. In the subsequent CTEs JSON objects and arrays, beginning from the lowest level along with some key values for joining the CTEs should be generated. The higher level objects and arrays which include the previously generated objects and/or arrays should be built in the next CTEs and so on. In the last step, in the final SELECT, everything is put together and the JSON data is provided.

Attention: If objects or arrays are generated in CTEs or nested subselects and the generated object or array is included as a value in an object in a subsequent CTE or the final SELECT, FORMAT JSON must be specified. If FORMAT JSON is not specified, the data is handled as string and all leading and trailing double quotes and backslashes included in the JSON object are escaped.

The next SQL statement will generate the following JSON data:

Figure 24: Nested JSON containing more detailed employee data
Nested JSON containing more detailed employee data

The SQL statement for generating the JSON data includes two CTEs. In the first CTE (x) the requested employee data is selected (WHERE EmployeeNo=40). Additionally, the objects on the lowest level, ObjName (consisting of the “LastName” and the “FirstName”) and ObjAddress (consisting of “Town”, “PostalCode”, “Country” and “Street”) are generated.

The next CTE (y) is based on the results of the x CTE. The object ObjEmpl, consisting of the “EmployeeNo” and the 2 in the x CTE generated objects is generated. To prevent the objects from being converted into strings, FORMAT JSON is specified in the key:value definition for the keys, “Name” and “Address”.

In the final SELECT statement, the root object is generated, consisting of the ObjEmpl information generated in the y CTE and the department (“Department”) and the cost center (“CostCenter”). Because ObjEmpl must not be converted into a string, FORMAT JSON is added to the appropriate key:value definition.

Figure 25: Generating nested JSON objects in composition with CTEs
Generating nested JSON objects in composition with CTEs

Including duplicate key:value pairs

Duplicate key:value pairs are commonplace in JSON documents.

Before Db2 PTF Group SF99702 level 23 and Db2 PTF Group SF99703 level 11, defining multiple key:value pairs with the same key name in the JSON_OBJECT function resulted in an error with the SQLSTATE 22030 and SQLCODE -16400.

After those Db2 PTF Groups are applied, the behavior is changed to allow duplicate key names within JSON documents. Db2 for i also added in a new SQL syntax to allow the programmer to control whether duplicate keys should be disallowed. The behavior change was documented in the Memorandum To Users.

The WITHOUT UNIQUE KEYS / WITH UNIQUE KEYS clause is added to the JSON_OBJECT and the JSON_OBJECTAGG functions. By specifying this clause, the programmer can decide whether or not a JSON object can include multiple key:value pairs.

Note: If you want SQL to enforce unique keys, use the new clause. If your data will never generate duplicate keys, omitting the clause will result in better performance.

Syntax:
JSON_OBJECT(Key:value pairs Definition
            NULL ON NULL or ABSENT ON NULL
            WITHOUT UNIQUE KEYS or WITH UNIQUE KEYS
            RETURNING DataType(Length) CCSID xxxxx)

In the following SQL statement, an object with three key:value pairs with the Phone key name will no longer cause any problems. An error is only returned if WITH UNIQUE KEYS is explicitly specified.

Figure 26: Objects with duplicate key names
Objects with duplicate key names

JSON_ARRAY – Generating JSON arrays

JSON arrays are generated with the JSON_ARRAY scalar function. The array elements are enclosed in square brackets [ ] and separated from each other with a comma. The elements themselves can be strings, numbers, boolean literals, null literals, JSON objects, or other JSON arrays. A single array can include elements with different types, that is, the elements in the same array can be numbers, strings, objects, literals, and arrays.

A JSON array can be created in the following two ways:

  • By explicitly specifying a list of values
  • Based on a SELECT statement where a single column is selected

JSON array based on a list of values

The easiest way for building an array is just to list the different values in the JSON_ARRAY function.

Syntax:
JSON_ARRAY(ValueExpression1, ValueExpression2, …)
`

The table in the following figure shows the SALESV05 view, in which all sales are aggregated per customer and sales year. The sales are split into multiple columns depending on the month in which the sales occurred. In this example, the sales for year 2014 are displayed.

Figure 27: SALESV05 – Sales per customer for a given year
SALESV05 – Sales per customer for a given year

For building an array containing the monthly sales per customer (and year), the columns JAN (January) to DEC (December) have to be listed in the JSON_ARRAY function, as in the following example.

Figure 28: JSON array with monthly sales per customer for a given year
JSON array with monthly sales per customer for a given year

Arrays can be embedded in JSON objects, that is, the JSON_ARRAY function can be specified as value in a key:value pair definition in the JSON_OBJECT function.

In the following example, an object containing a JSON array with the monthly sales is generated. In addition to the sales array (SalesMonth), the customer (Customer) and sales year (SalesYear) information is integrated in separate key:value pairs.

Figure 29: JSON object including a JSON array
JSON object including a JSON array

JSON array based on a SELECT statement

If an array must be generated that includes multiple values of the same column in a table or view, the array can be generated based on a SELECT statement that is specified within the JSON_ARRAY function.

Syntax:
JSON_ARRAY((Select Expression From …))

Note: The SELECT statement must be embedded in an extra pair of parenthesis. Otherwise an error is returned.

In the following example, two independent arrays based on two separate SELECT statements are generated. The first array includes all distinct EmployeeNo of the LOBSTAFF table and the second array includes all distinct Cities of the LOBSTAFF table.

Because multiple employees live in the same city, the second array includes only four elements while the first array includes 6 elements.

Figure 30: JSON arrays based on SELECT statements
JSON arrays based on SELECT statements

Note: The sequence of the array elements depends on the ORDER BY clause specified in the SELECT statement. If no ORDER BY is specified, the sequence of the elements depends on the query optimizer’s decision.

In other words, without ORDER BY, the sequence of the returned data may or may not match the desired sequence.

JSON array based on a SELECT statement including JSON objects

Array elements cannot be only strings or numbers, but also JSON objects or literals. If you want to prevent objects or literals to be converted into strings, the FORMAT JSON clause must be specified in the JSON_ARRAY function, independent of whether the objects are directly specified or built by using a SELECT statement.

Syntax:

JSON_ARRAY(ValueExpression1 FORMAT JSON,
ValueExpression2 FORMAT JSON …)
JSON_ARRAY((Sub-Select) FORMAT JSON)

In the following example, an array of objects containing the zip code (PostalCode) and the city (City) is built. The JSON object is defined through the JSON_OBJECT function within the SELECT statement. For preventing the JSON object from being converted into a string, FORMAT JSON must be specified after the closing parenthesis of the SELECT statement. Without the FORMAT JSON clause, the object is handled like a string, that is, double quotes and backslashes within object will be escaped.

Figure 31: JSON array based on a SELECT statement including JSON objects
JSON array based on a SELECT statement including JSON objects

Integrating NULL values in the JSON_ARRAY function

Like the JSON_OBJECT function, the NULL ON NULL/ABSENT ON NULL clause can be specified in the JSON_ARRAY function. In contrast to the JSON_OBJECT function, the default value for the JSON_ARRAY function is ABSENT ON NULL. That means, NULL values are ignored by default. If NULL values have to be returned, NULL ON NULL must be explicitly specified.

Syntax:

JSON_ARRAY(ValueExpressions …
       ABSENT ON NULL or NULL ON NULL)
JSON_ARRAY((Sub-Select)
       ABSENT ON NULL or NULL ON NULL)

The SALESV06 view returns exactly the same values as the SALESV05 view. If there are no monthly sales, in SALESV06 a NULL value is returned while in SALESV05 0 is returned.

The following figure shows the sales for all customers in January and February in the year 2014 from both views.

Figure 32: Compare monthly sales with and without NULL values
Compare monthly sales with and without NULL values

In the following examples, three different arrays containing the sales in January 2014 are generated. The first array is based on a SELECT statement accessing the SALESV05 view. If there are no monthly sales, 0 is returned. The array contains five elements, one for each customer.

The second array is based on a SELECT statement accessing the SALESV06 view. If there are no monthly sales, a NULL value is returned. Because in January 2014 we had only sales with three customers and ABSENT ON NULL is the default, the array contains only three elements.

The third array matches the second array definition, but this time the NULL ON NULL clause is specified. Now the array contains five elements, three of them with values and two of them with null literals.

Figure 33: Array with and without the NULL ON NULL clause
Array with and without the NULL ON NULL clause

JSON_ARRAYAGG – Generating an array over multiple rows

Up to this point, we generated (nested) objects and arrays based on information located in a single row. With the JSON_ARRAY function in composition with a SELECT statement, it is possible to accumulate information from multiple rows.

In our examples we generated independent and even nested objects and arrays in multiple rows. The next step is to build arrays by accumulating the JSON data generated for multiple rows in an array. This requirement can be achieved with the JSON_ARRAYAGG aggregate function.

Syntax: JSON_ARRAYAGG(JSONExpression)

In the following example, three independent arrays are created by accumulating row values. With JSON_ARRAYAGG, it is possible to generate all arrays with a single SELECT statement.

The first array includes all employee numbers (numeric values). The second array contains all names (string values). The third array contains JSON objects. The JSON object is defined directly within the JSON_ARRAYAGG function.

Figure 34: JSON arrays based on the JSON_ARRAYAGG function
JSON arrays based on the JSON_ARRAYAGG function

Aggregating JSON objects generated in CTEs

The aggregated JSON object in the previous example is generated directly within the JSON_ARRAYAGG function. However, if the object is created in a CTE or a (nested) subselect statement, the FORMAT JSON clause must be specified for avoiding the object to be converted into a string.

Syntax: JSON_ARRAYAGG(JSONExpression FORMAT JSON)

In the following example an array containing nested objects in multiple rows is generated. The objects to be aggregated are generated in two CTEs.

In the first CTE, the ObjName object, consisting of the first name (FirstName) and the last name (LastName) is generated.

The ObjName object is embedded in the ObjEmpl object, which is defined in the second CTE, and also includes a key:value pair with the employee number (EmployeeNo).

Finally, an array accumulating all ObjName objects over all rows is built. FORMAT JSON is specified in the JSON_OBJECT and the JSON_ARRAYAGG function, if a previously generated object is included.

Figure 35: Aggregated JSON array with JSON objects built in CTEs
Aggregated JSON array with JSON objects built in CTEs

Sequence of the accumulated elements

By default, the sequence of the array elements depends on the query optimizer’s decision. If the array elements should be ordered, use the ORDER BY clause within the JSON_ARRAYAGG function.

In the ORDER BY clause, multiple columns separated by a comma can be specified and ascending (ASC) or descending (DESC) sequence specifications can be included.

Syntax:

JSON_ARRAYAGG(JSONExpression FORMAT JSON
ORDER BY Clause)

In the following example, an object containing the city and zip code is generated for each row. The resulting rows are accumulated and embedded in an array with the JSON_ARRAYAGG function. The JSON array is in ascending order by city name.

Figure 36: JSON_ARRAYAGG with ORDER BY clause
JSON_ARRAYAGG with ORDER BY clause

Integrating NULL values in the JSON_ARRAYAGG function

If an array element is a NULL value, the element is omitted in the JSON_ARRAYAGG function as in the JSON_ARRAY function.

If NULL values should be returned, the NULL ON NULL clause must be added to the JSON_ARRAYAGG function.

Syntax:

JSON_ARRAYAGG(JSONExpression FORMAT JSON
ORDER BY Clause
NULL ON NULL or ABSENT ON NULL)

In the following example, three arrays are returned. The first array returns January sales and NULL values are ignored. The second array includes NULL values, converted to 0 with the COALESCE scalar function. The third array uses the NULL ON NULL clause to return the JSON null literal instead of zero.

Figure 37: JSON_ARRAYAGG – Integrating NULL values
JSON_ARRAYAGG

JSON_OBJECTAGG – Generating an object over multiple rows

With the JSON_OBJECTAGG function, a JSON object with key:value pairs from multiple rows can be generated. The particular of the generated JSON object is, that the key name is the value of the key name expression in each row.

Syntax: JSON_OBJECTAGG(KEY KeyNameExpression VALUE ValueExpression)

The JSON object generated in the next example includes the customer numbers as key values and the total sales as value for each customer.

Figure 38: Generating a JSON object with the JSON_OBJECTAGG function
Generating a JSON object with the JSON_OBJECTAGG function

The result of the value expression can be a number, string, boolean, null literal, JSON array, or a different JSON object.

In the following example, a JSON object is built over multiple rows with the customer number as the key name and the value as an array with the sales for the months January, February, and March.

Figure 39: Generating an object with the JSON_OBJECTAGG function with a value array
Generating an object with the JSON_OBJECTAGG function with a value array

Omitting key: value pairs with NULL values

If the value of an aggregated key:value pair is a NULL value, the key:value pair is not omitted but included in the object with a null literal value by default.

For omitting key:value pairs with NULL values, an ABSENT ON NULL clause can be added to the JSON_OBJECTAGG function.

Note: The default value in the JSON_OBJECTAGG function is NULL ON NULL, like in the JSON_OBJECT function. The default for the JSON_ARRAY and JSON_ARRAYAGG functions is ABSENT ON NULL.

Syntax:

JSON_OBJECTAGG(KEY KeyNameExpression VALUE ValueExpression
ABSENT ON NULL or NULL ON NULL)

In the following example, an object with the sales of each customer in January 2014 is generated. In January 2014, there has been no sales for two of the customers. The customers are included in the JSON object, but the value is returned as a null literal.

In the second example, the ABSENT ON NULL clause is added to the JSON_OBJECTAGG function and the key:value pairs for the customers without sales are omitted.

Figure 40: JSON_ObjectAgg with the ABSENT ON NULL clause
JSON_ObjectAgg with the ABSENT ON NULL clause

Generating (more) complex JSON data – Examples

Until now we generated JSON data based on a single SQL table. But normally, the JSON data requested is more complex. Data is spread over multiple database tables, and the relationship between the data is quite often not 1:1. The JSON data must include multiple nested objects, arrays, and so on.

The hardest part when generating JSON data is not the use of the JSON publishing functions but the translation from the relational data of the database into the hierarchical form of the JSON data.

In either way, it is possible to generate complex JSON data with a single SELECT statement, with several CTEs. Here are a few more complex examples:

Determining employees per town

The LOBSTAFF table includes information about all employees. Among this information are the first and last name and the address. Multiple employees can live in the same city. We need to generate a JSON document with an array of objects, one object for each distinct city in the LOBSTAFF table.

The object per city first includes the Address key:values pair with an object containing the zip code (ZipCode) and the City (City). After the address information, the object per city also includes the key:values pair Employee with an array of objects. In the array, there is one object for each employee living in the appropriate city. The employee object includes the first (FirstName) and last (LastName) names of the employee.

The following figure shows the structure of the JSON data. Two employees (Herrmann Bauer and Birgitta Hauser) are living in Dietzenbach.

Figure 41: JSON data – Employees per town
JSON data – Employees per town

The next figure shows the SELECT statement for generating this data.

In the x CTE, the data from the LOBSTAFF table is condensed on the zip code and city (GROUP BY ZipCode, City). For each zip code and city, an object is built. Because zip code and city are listed in the GROUP BY cause, an object based on the zip code and city columns can be generated.

With the employee information, it is a little more complex. As the employee is not specified in the GROUP BY clause, an aggregate function is needed for accumulating the information. JSON data is accumulated over multiple rows with the JSON_ARRAYAGG function. Using the JSON_ARRAYAGG function will link all of the generated objects containing the first and last names for the city.

In the final SELECT statement, the objects per zip code and city generated in the x CTE are aggregated in the EmployeePerTown object by running the JSON_ARRAYAGG function.

Figure 42: SELECT statement – Employees per town
SELECT statement – Employees per town

Delivered orders in December 2015

In the next example we have to generate a complex JSON document for all orders (including order header and order detail information) delivered in December 2015.

The following figure shows the structure of the JSON data. The DeliveredOrders object includes an array with all orders delivered in December 2015. The order object includes first the order header and address information (Company, OrderNo, CustomerInfo, DeliveryDate, OrderType, and IncoTerms).

The customer information is included in a nested object, containing the customer number (CustomerNo), the customer name (CustomerName), and an object with the address information, that is, Street, PostalCode, and City.

Additionally, the order object includes an array of objects with information about all order positions. The array contains an object for each order position. The Positions object includes the position number (PositionNo), the delivery quantity (DeliveryQuantity), the delivery value (DeliveryValue), and an object with the item information, such as ItemNo, ItemDescription, PricePerUnit and Currency.

Figure 43: JSON data – Delivered orders in December 2015
JSON data – Delivered orders in December 2015

To achieve this request the order header table (ORDERHDRX) has to be joined with the address table (ADDRESSX), where the address information is located. Because we also need the order position information, the order header table has to be joined with the order positions table (ORDERDETX), too. The item description and the price per unit are located in the item master table (ITEMMASTX). To get this information, the item master table has to be joined with the order position table.

To generate JSON data for the delivered orders in a specific date range, six CTEs are used. In the OrderAddr CTE (first CTE) the order header table and the address table are joined together. Orders with a delivery date in December 2015 are selected (WHERE DELDATE BETWEEN '2015-01-01' and '2015-12-31'). Only the columns containing the relevant information are listed.

Additionally, several column values are converted with CASE expressions into the data that are expected to be returned (OrderTypeText and DelTermsText columns). The customer name (CustName) consist of the concatenated values in the CustName1 and CustName2 columns. The delivery date is converted into a character representation of the date in the ISO format (DelDateChar column). In all character columns with a fixed length, the leading and trailing blank spaces are trimmed off.

Figure 44: CTE – Order header and address information
CTE – Order header and address information

The order position and item raw data for the orders delivered in December 2015 are prepared in the OrderAddr CTE. In this CTE, the OrderAddr CTE is joined with the order position table (ORDERDETX) for determining the delivered positions. The order positions table is joined with the item master table (ITEMMASTX).

Only positions with a delivery quantity are selected (WHERE DELQTY > 0).

A new column is generated containing the delivery value (DELVALUE) which must be returned in the JSON document. The delivery value is calculated by multiplying the delivery quantity (DELQTY column) with the price per unit (PRICE column). The result is formatted as DECIMAL(11, 2).

In all character columns with a fixed length, the leading and trailing blank spaces are trimmed off.

Figure 45: CTE – Order detail and item master information
CTE – Order detail and item master information

Now, after the raw data has been determined, we can start building the JSON data.

Next, the CTEAddress CTE (a nested object containing the address information from the OrderAddr CTE) is generated. The JSON object includes the customer number (CustomerNo), the concatenated customer names 1 and 2 (Name) and an object with the address information. The address object includes the street (Street), the zip code (PostalCode) and the city (City).

The customer number column is needed for joining the JSON data in a subsequent CTE with the OrderAddr CTE.

Figure 46: CTE – Generating address information JSON data
CTE – Generating address information JSON data

In the CTEObjPos CTE, for each order position, a nested object containing the order position and the item information is generated. The position object includes the order position (PositionNo), an object with the item information, the delivery quantity (DeliveryQty), and the delivery value (DeliveryValue) which is prepared in the CTEOrdAddr CTE.

Company and OrderNo are needed for joining the CTEObjPos CTE with the OrderAddr CTE in a subsequent CTE.

Figure 47: CTE – Generating order detail information JSON data
CTE – Generating order detail information JSON data

In the CTEArrPos CTE, the previously generated order detail objects are aggregated for each company and order no using the JSON_ARRAYAGG function.

Company and order number are not only necessary for condensing the data but also for joining the result in a subsequent CTE with the CTEOrdAddr CTE.

Figure 48: CTE – Accumulating order position JSON data
CTE – Accumulating order position JSON data

In the CTEOrdHdr CTE, the JSON data for each order delivered in December 2015 is generated.

The OrderAddr CTE is joined with the CTEAddress CTE (containing the customer JSON data) and the CTEArrPos CTE (containing array with all order positions for each order).

Based on the joined information for each order, an object is generated containing the company (Company), the order number (OrderNo), the nested object with the customer data (CustomerInfo), the delivery date (DeliveryDate), the order type (OrderType), the delivery terms (IncoTerms), and the order positions array (Positions).

Figure 49: CTE – Generating the JSON data per order
CTE – Accumulating order position JSON data

In the final SELECT statement, the JSON data for the different orders is accumulated by running the JSON_ARRAYAGG function. The result is embedded in the DeliveredOrders object.

Figure 50: Final SELECT – Accumulating the order data
Final SELECT – Accumulating the order data

Conclusion

SQL provides many kinds of JSON publishing functions which allow even complex JSON data to be generated.

The hardest part when creating JSON data is not the use of the JSON publishing function, but the conversion of the relational data in the database into the hierarchical form of the JSON data. In combination with CTEs, even complex requests can be organized in a readable and maintainable way.