JSON is used for data exchange. Before generating JSON data, it is important to know what JSON is, how it looks, what components it comprises, and on which syntax and grammar rules it is based.
JSON basics
JavaScript Object Notation (JSON) is an open-standard file format that uses human-readable text to transmit data. JSON data consists of a sequence of Unicode characters that are arranged in a hierarchical form based on the JSON value syntax and grammar. The JSON syntax is derived from the JavaScript programming language syntax. JavaScript itself is one of the three core technologies of the World Wide Web Consortium (W3C). JSON is a completely independent text format and many programming languages include functions or methods for generating and consuming JSON data.
JSON data can include the following structural tokens:
- Curly brackets { }
Curly brackets are used for defining JSON objects. - Square brackets [ ]
Square brackets are used for defining JSON arrays. - Colon
Key:value pairs within JSON objects are separated by a colon. - Comma
Multiple key:value pairs in objects or multiple elements in an array are separated by a comma.
JSON values represent one of the following data types:
String
A string value includes a sequence of zero or more Unicode characters. The string values are embedded in double quotes (“). Double quotes and backslashes () within JSON strings must be escaped with a backslash. Instead of Unicode characters it is also possible to pass Unicode hex values which start with \u followed by the four-digit hexadecimal Unicode number.Several characters, escaped with a backslash, have a specific meaning:
- \b = backspace
- \f = form feed
- \n = line feed
- \r = carriage return
- \t = character tabulation
Figure 1: JSON string
The following example shows an array with a sequence of objects. All values within the objects are strings. The special characters, backslash () and double quotes (“), are escaped with an additional backslash. There is also one value including a Unicode Hex value (\u00c5), representing the Swedish A character.
Figure 2: JSON string values
Numbers
A JSON number consists of a sequence of digits between 0 and 9. If a decimal separator is needed, the period must be used (even in environments or countries where the comma is used as decimal separator). For numeric values between -0.9 and 0.9, a leading zero in front of the decimal point is required. Negative values must be specified with a leading minus sign. For decimal or float values, the exponential E-notation with mantissa and exponent may be used.Figure 3: JSON number
The next figure shows an array with several objects. All values within the objects are numeric, with positive or negative values, with or without decimal positions, and with or without mantissa and exponent.
Figure 4: Numeric values
Boolean literals true and false
The boolean values true and false without leading and trailing double quotes can be used as values in JSON objects or arrays.The following figure shows an object where all values are boolean values.
Figure 5: Boolean literals
NULL values
The literal null without leading and trailing double quotes represents a NULL value within a JSON object or array.The following figure shows an object containing NULL values.
Figure 6: Null literals
Objects
A JSON object starts and ends with a curly bracket and consists of zero, one or multiple key:value pairs. The key or name that describes the value must be a string. The value can be a string or a number, one of the literals(true, false, null)
, another (nested) object, or a (nested) array. Key and value are separated by a colon. If an object consists of multiple key:value pairs, the key:value pairs are separated from each other with a comma.Figure 7: JSON object syntax
The following example shows an object with four different key:value pairs. Three of the values are string and the fourth value is a number.
Figure 8: JSON object
Arrays
A JSON array starts and ends with a square bracket. A JSON array represents a list of zero or more values. The values can be the literals (true, false, null), strings or numbers, (nested) objects, or (nested) arrays. The array elements are separated by a comma.Figure 9: JSON array syntax
The following figure shows an array with six elements. The second array consists of three objects. Each object includes the type and number key:value pair.
Figure 10: JSON arraysg
JSON support in IBM Db2 for i
IBM® Db2® for i has been adding SQL support for JSON since October 2015. The support is largely completed for those using IBM i 7.2 or later versions, if the latest Db2 PTF Group is installed. To use JSON with SQL, clients need to confirm that they are at IBM i 7.2 or later and have installed the latest Db2 PTF Group.
The following figure gives an overview of JSON support in Db2 for i.
Figure 11: JSON support in Db2 for i – JSON functions and predicates
Sample data
To keep it simple, we will generate JSON data based on a few very basic tables and views. The first one is a table named LOBSTAFF
which includes the employee number and address data for six employees.
The following figure shows the content of the LOBSTAFF
table.
Figure 12: LOBSTAFF table – sample data
The second one, SALESV05
is a view based on a sales table, but accumulates the sales per customer and sales year. The sales per customer and year are split over 12 columns representing the month in which the sales occurred.
Figure 13: SALESV05 – Yearly sales per customer
The SALESV06
view matches the SALESV05
view, except in months with no sales, a NULL value is returned.
Figure 14: SALESV06 – Yearly sales per customer with NULL values
JSON_OBJECT – Generating objects
With the JSON_OBJECT
function (nested) JSON objects can be generated. The key:value pairs are automatically embedded in curly brackets ( {} ) and key and value are separated by a colon ( : ).
The key value itself is directly converted into a string, that is, surrounded with double quotes and the special characters (double quotes and backslashes) if included are escaped.
The value is converted depending on the Db2 data type. The value itself can represent a literal (true, false or null
), a string, a number, a (nested) JSON object or a (nested) JSON array. Values in Db2 data types that are not supported by JSON, for examples the DATE or TIME data types are automatically converted into strings. If a date or time value is converted into a character string, the International Standards Organization (ISO) format is used.
There are two syntax alternatives for coding JSON objects with the JSON_OBJECT
function:
JSON_OBJECT(KEY KeyExpression1 VALUE ValueExpression1,
KEY KeyEspression2 VALUE ValueExpression2, …);
JSON_OBJECT(KeyExpression1: ValueExpression1,
KeyExpression2: ValueExpression2, …);
In the first syntax, alternatives are KEY
and VALUE
part of the syntax. KEY
can optionally be specified, while VALUE
is mandatory.
The second syntax alternative requires no key words, but a key expression and a value, separated with a colon.
If multiple key:value pairs have to be included in the object, they have to be separated by commas, independent of whichever syntax alternative is used.
The key expression can be any Unicode string, including double quotes (“) and backslashes ( \ ). Both special characters are automatically escaped in the string.
Note: Key names are case sensitive.
The value expression can be any literal value, column value, the result of concatenating multiple columns, the result of a scalar user-defined function (UDF), or a full select that returns a single value.
In the following example four different independent JSON objects are generated. The first object with the EmployeeNo
key includes the integer value of the EMPLOYEENO
column. The next two objects, LastName
and FirstName
are based on the character columns NAME
and FIRSTNAME
. The last object with the City
key name is composed of the column values of the ZIPCODE
and CITY
columns.
The example shows two queries which produce the same result.
Figure 15: Basic JSON objects
To generate JSON objects with multiple key:value pairs, the key:value pairs must be defined as in the previous example. But all key:value pairs must be defined in the same JSON_OBJECT
function, separated by a comma, independent of the syntax alternative that is used.
In the following example, a single object containing a key:value pair for the employee number, the last and first names, and the city is generated.
Figure 16: JSON object with multiple key:value pairs
Returning data type
The generated JSON data is by default returned as a 2 gigabyte (GB) character large object (CLOB) in coded character set identifier (CCSID) 1208 (UTF8). However, it is possible to convert the JSON result into any character data type with any CCSID (except 65535) and in any valid length.
Note: Double-byte character set (DBCS) data must use Unicode encoding.
To modify the output format, use the RETURNING
clause.
Syntax:
JSON_OBJECT(Key:value pairs Definition
RETURNING data-type(Length) CCSID xxxxx)
Note: Returning the JSON data with a shorter length than 2 GB, may speed up the data exchange.
The JSON object data created in the following example is returned as UTF-8 varying character field with a maximum length of 128 characters (VARCHAR(128) CCSID 1208)
.
Figure 17: JSON object returned in UTF-8 as VARCHAR(128)
Objects containing null literal values
If a value expression returns a NULL value, it is automatically converted into the JSON null
literal, that is, null
without leading and trailing double quotes is returned.
In the following example, an object with the key:value pairs LastName
and FirstName
is built.
The LOBSTAFF
table does not contain any NULL values, but there is an employee (Meier und Sohn) without first name. With the aid of the NULLIF
scalar function, the FIRSTNAME column value is converted into a NULL
value, if it does not include anything else than blanks (NULLIF(FirstName, '') )
.
When running the query, the first name value for the employee Meier und Sohn is returned as a null
literal.
Figure 18: JSON object including a null literal
Omitting key:value pairs with NULL values
Sometimes, you may not want to return a key:value pair containing a NULL value, but ignore the key:value pair instead.
Whether a key:value pair with a NULL value is returned or not depends on whether NULL ON NULL
or ABSENT ON NULL
has been specified on the JSON_OBJECT
function:
Syntax:
JSON_OBJECT(Key:value pairs Definition
***NULL ON NULL or ABSENT ON NULL***
RETURNING DataType(Length) CCSID xxxxx)
NULL ON NULL
is the default value. So, key:value pairs containing a NULL value are always included in the object. The NULL
value is converted into a null
literal.
If ABSENT ON NULL
is specified, key:value pairs containing a NULL value are omitted.
Note: ABSENT ON NULL
(or NULL ON NULL
) can only be specified once for the JSON_OBJECT
function. It is not possible to decide for specific key:value pairs whether they have to be omitted or not. In this way, within the same object either all key:value pairs with NULL values are returned or none of the key:value pairs with NULL values are returned.
In the following example an object with the LastName
and FirstName
key:value pairs is generated. As in the previous example, a missing first name is converted into a NULL value. But this time the ABSENT ON NULL
clause is added to the JSON_OBJECT
function. Because of this specification, the first name key:value pair for the employee Meier und Sohn is dropped.
Figure 19: JSON object with the ABSENT ON NULL specification
Objects with boolean literals
If a key:value pair must include a boolean literal (true/false), true
and/or false
must be passed as a character constant value. Currently, there is no automatic conversion of 0/1 or ON/OFF into true/false.
If a character constant value is passed as a value in a key:value pair, it is by default converted into a string. That means if we pass true
or false
, these values are handled as regular string constants and embedded into double quotes.
To prevent a JSON value from being converted into a string, the FORMAT JSON
specification can be added to any key:value pair definition in the JSON_OBJECT
function.
Syntax:
JSON_OBJECT(KEY KeyExpression1 VALUE ValueExpression1 **FORMAT JSON** ,
KEY KeyExpression2 VALUE ValueExpression2,
...
KEY KeyExpressionN VALUE ValueExpressionN
NULL ON NULL or ABSENT ON NULL
RETURNING DataType(Length) CCSID xxxxx )
JSON_OBJECT(KeyExpression1: ValueExpression1 **FORMAT JSON** ,
KeyExpression2: ValueExpression2,
...
KeyExpressionN: ValueExpressionN
NULL ON NULL or ABSENT ON NULL
RETURNING DataType(Length) CCSID xxxxx )
Note: The FORMAT JSON
specification is not only relevant for boolean values but also for objects or arrays generated in Common Table Expressions (CTE) or subselect statements and then included as a value in a key:value pair.
In the following example, objects with boolean values are generated. The values in the first column are generated without specifying the FORMAT JSON
clause, and consequently the true/false values are converted into strings. For the second column, FORMAT JSON
is added to the key:value pair definition. Due to this specification, true
and false
are included as JSON literals.
Figure 20: JSON object with boolean literals
Normally the boolean values are not hardcoded as in the previous example, instead the information is read from a column in a database table. In most cases, the column values are not true
and false
, but 0/1 or N/Y. There is no automatic conversion, but you can use a CASE
expression for converting the column values into true or false. The FORMAT JSON
clause must then be specified after the CASE
expression otherwise true
and false
are handled as string.
In the following example, two independent JSON objects are generated. Both objects consist of the key:value pairs Time, Open/Close
, and Open
. The Open/Close
value is based on the column OPNCLS
, which includes the values 0 and 1. These values are converted with the aid of a CASE
expression into true
or false
.
The first object is generated without the FORMAT JSON
specification for the Open/Close
key:value pair and consequently true
and false
are converted into strings.
In the second object FORMAT JSON
is specified for the Open/Close
key:value pair and true
and false
are returned as literals.
Figure 21: JSON objects with boolean literals based on column values
Nested JSON objects
If a JSON object must be nested, that is, the value of any key:value pair in the object represents another object, the JSON_OBJECT
function can be specified instead of the value.
The following figure shows a nested object with the employee information for a single employee. The first key:value pair in the object represents the Employee
, through the employee number. The next value for the key Name
is a JSON object that itself consists of the key:value pairs LastName
and FirstName
. The third key:value pair, includes the address information embedded in another object. The Address
object includes the key:value pairs Street
, City
, and Country
.
Figure 22: Nested JSON object containing employee data
The nested object with the employee data for the employee number 40, can be generated with the following SELECT
statement. The JSON_OBJECT
function for generating the Name
and Address
JSON objects is specified instead of a value.
Figure 23: Generate a nested JSON object with employee data
Note: If an expression contains (multiple) nested object definitions, it is not necessary to specify FORMAT JSON
for the key:value pair in which the nested object is defined (even though the generated object must not be converted into a string).
However, FORMAT JSON
has to be specified if the Object is created in a CTE or a Sub-Select.
Nested objects based on CTEs
Until now we generated very simple JSON objects. As long as the data is read from a single table, and only a few nested objects over a few levels have to be returned, everything can be done in a single step.
… unfortunately real life is not as easy!
The easiest technique to generate complex JSON data with multiple nested objects and arrays is to use CTEs.
CTEs are part of a SELECT
statement. The first CTE starts with WITH
, and the SELECT
statement for the CTE must be embedded in parenthesis. If multiple CTEs are needed, other CTEs can be coded after the first one separated by a comma. All CTEs must be specified before the main or final SELECT
. CTEs can be compared with temporary tables or views. CTEs can be used in other CTEs of the current SELECT
statement or in the final SELECT
statement, but CTEs are only valid within the current SELECT
statement.
When generating complex JSON data, the first (few) CTEs should be used for providing the raw data. In the subsequent CTEs JSON objects and arrays, beginning from the lowest level along with some key values for joining the CTEs should be generated. The higher level objects and arrays which include the previously generated objects and/or arrays should be built in the next CTEs and so on. In the last step, in the final SELECT
, everything is put together and the JSON data is provided.
Attention: If objects or arrays are generated in CTEs or nested subselects and the generated object or array is included as a value in an object in a subsequent CTE or the final SELECT, FORMAT JSON must be specified. If FORMAT JSON is not specified, the data is handled as string and all leading and trailing double quotes and backslashes included in the JSON object are escaped.
The next SQL statement will generate the following JSON data:
Figure 24: Nested JSON containing more detailed employee data
The SQL statement for generating the JSON data includes two CTEs. In the first CTE (x) the requested employee data is selected (WHERE EmployeeNo=40
). Additionally, the objects on the lowest level, ObjName (consisting of the “LastName” and the “FirstName”) and ObjAddress (consisting of “Town”, “PostalCode”, “Country” and “Street”) are generated.
The next CTE (y) is based on the results of the x CTE. The object ObjEmpl, consisting of the “EmployeeNo” and the 2 in the x CTE generated objects is generated. To prevent the objects from being converted into strings, FORMAT JSON is specified in the key:value definition for the keys, “Name” and “Address”.
In the final SELECT
statement, the root object is generated, consisting of the ObjEmpl information generated in the y CTE and the department (“Department”) and the cost center (“CostCenter”). Because ObjEmpl must not be converted into a string, FORMAT JSON
is added to the appropriate key:value definition.
Figure 25: Generating nested JSON objects in composition with CTEs
Including duplicate key:value pairs
Duplicate key:value pairs are commonplace in JSON documents.
Before Db2 PTF Group SF99702 level 23 and Db2 PTF Group SF99703 level 11, defining multiple key:value pairs with the same key name in the JSON_OBJECT
function resulted in an error with the SQLSTATE 22030 and SQLCODE -16400.
After those Db2 PTF Groups are applied, the behavior is changed to allow duplicate key names within JSON documents. Db2 for i also added in a new SQL syntax to allow the programmer to control whether duplicate keys should be disallowed. The behavior change was documented in the Memorandum To Users.
The WITHOUT UNIQUE KEYS / WITH UNIQUE KEYS
clause is added to the JSON_OBJECT
and the JSON_OBJECTAGG
functions. By specifying this clause, the programmer can decide whether or not a JSON object can include multiple key:value pairs.
Note: If you want SQL to enforce unique keys, use the new clause. If your data will never generate duplicate keys, omitting the clause will result in better performance.
Syntax:
JSON_OBJECT(Key:value pairs Definition
NULL ON NULL or ABSENT ON NULL
WITHOUT UNIQUE KEYS or WITH UNIQUE KEYS
RETURNING DataType(Length) CCSID xxxxx)
In the following SQL statement, an object with three key:value pairs with the Phone
key name will no longer cause any problems. An error is only returned if WITH UNIQUE KEYS
is explicitly specified.
Figure 26: Objects with duplicate key names
JSON_ARRAY – Generating JSON arrays
JSON arrays are generated with the JSON_ARRAY
scalar function. The array elements are enclosed in square brackets [ ] and separated from each other with a comma. The elements themselves can be strings, numbers, boolean literals, null literals, JSON objects, or other JSON arrays. A single array can include elements with different types, that is, the elements in the same array can be numbers, strings, objects, literals, and arrays.
A JSON array can be created in the following two ways:
- By explicitly specifying a list of values
- Based on a SELECT statement where a single column is selected
JSON array based on a list of values
The easiest way for building an array is just to list the different values in the JSON_ARRAY
function.
Syntax:
JSON_ARRAY(ValueExpression1, ValueExpression2, …)
`
The table in the following figure shows the SALESV05
view, in which all sales are aggregated per customer and sales year. The sales are split into multiple columns depending on the month in which the sales occurred. In this example, the sales for year 2014 are displayed.
Figure 27: SALESV05 – Sales per customer for a given year
For building an array containing the monthly sales per customer (and year), the columns JAN
(January) to DEC
(December) have to be listed in the JSON_ARRAY
function, as in the following example.
Figure 28: JSON array with monthly sales per customer for a given year
Arrays can be embedded in JSON objects, that is, the JSON_ARRAY
function can be specified as value in a key:value pair definition in the JSON_OBJECT
function.
In the following example, an object containing a JSON array with the monthly sales is generated. In addition to the sales array (SalesMonth
), the customer (Customer
) and sales year (SalesYear
) information is integrated in separate key:value pairs.
Figure 29: JSON object including a JSON array
JSON array based on a SELECT statement
If an array must be generated that includes multiple values of the same column in a table or view, the array can be generated based on a SELECT
statement that is specified within the JSON_ARRAY
function.
Syntax:
JSON_ARRAY((Select Expression From …))
Note: The SELECT
statement must be embedded in an extra pair of parenthesis. Otherwise an error is returned.
In the following example, two independent arrays based on two separate SELECT
statements are generated. The first array includes all distinct EmployeeNo
of the LOBSTAFF
table and the second array includes all distinct Cities
of the LOBSTAFF
table.
Because multiple employees live in the same city, the second array includes only four elements while the first array includes 6 elements.
Figure 30: JSON arrays based on SELECT statements
Note: The sequence of the array elements depends on the ORDER BY
clause specified in the SELECT
statement. If no ORDER BY
is specified, the sequence of the elements depends on the query optimizer’s decision.
In other words, without ORDER BY
, the sequence of the returned data may or may not match the desired sequence.
JSON array based on a SELECT statement including JSON objects
Array elements cannot be only strings or numbers, but also JSON objects or literals. If you want to prevent objects or literals to be converted into strings, the FORMAT JSON
clause must be specified in the JSON_ARRAY
function, independent of whether the objects are directly specified or built by using a SELECT
statement.
Syntax:
JSON_ARRAY(ValueExpression1 FORMAT JSON,
ValueExpression2 FORMAT JSON …)
JSON_ARRAY((Sub-Select) FORMAT JSON)
In the following example, an array of objects containing the zip code (PostalCode
) and the city (City
) is built. The JSON object is defined through the JSON_OBJECT
function within the SELECT
statement. For preventing the JSON object from being converted into a string, FORMAT JSON
must be specified after the closing parenthesis of the SELECT
statement. Without the FORMAT JSON
clause, the object is handled like a string, that is, double quotes and backslashes within object will be escaped.
Figure 31: JSON array based on a SELECT statement including JSON objects
Integrating NULL values in the JSON_ARRAY function
Like the JSON_OBJECT
function, the NULL ON NULL/ABSENT ON NULL
clause can be specified in the JSON_ARRAY
function. In contrast to the JSON_OBJECT
function, the default value for the JSON_ARRAY
function is ABSENT ON NULL
. That means, NULL
values are ignored by default. If NULL
values have to be returned, NULL ON NULL
must be explicitly specified.
Syntax:
JSON_ARRAY(ValueExpressions …
ABSENT ON NULL or NULL ON NULL)
JSON_ARRAY((Sub-Select)
ABSENT ON NULL or NULL ON NULL)
The SALESV06
view returns exactly the same values as the SALESV05
view. If there are no monthly sales, in SALESV06
a NULL value is returned while in SALESV05
0 is returned.
The following figure shows the sales for all customers in January and February in the year 2014 from both views.
Figure 32: Compare monthly sales with and without NULL values
In the following examples, three different arrays containing the sales in January 2014 are generated. The first array is based on a SELECT
statement accessing the SALESV05
view. If there are no monthly sales, 0 is returned. The array contains five elements, one for each customer.
The second array is based on a SELECT
statement accessing the SALESV06
view. If there are no monthly sales, a NULL
value is returned. Because in January 2014 we had only sales with three customers and ABSENT ON NULL
is the default, the array contains only three elements.
The third array matches the second array definition, but this time the NULL ON NULL
clause is specified. Now the array contains five elements, three of them with values and two of them with null
literals.
Figure 33: Array with and without the NULL ON NULL clause
JSON_ARRAYAGG – Generating an array over multiple rows
Up to this point, we generated (nested) objects and arrays based on information located in a single row. With the JSON_ARRAY
function in composition with a SELECT
statement, it is possible to accumulate information from multiple rows.
In our examples we generated independent and even nested objects and arrays in multiple rows. The next step is to build arrays by accumulating the JSON data generated for multiple rows in an array. This requirement can be achieved with the JSON_ARRAYAGG
aggregate function.
Syntax:
JSON_ARRAYAGG(JSONExpression)
In the following example, three independent arrays are created by accumulating row values. With JSON_ARRAYAGG
, it is possible to generate all arrays with a single SELECT
statement.
The first array includes all employee numbers (numeric values). The second array contains all names (string values). The third array contains JSON objects. The JSON object is defined directly within the JSON_ARRAYAGG
function.
Figure 34: JSON arrays based on the JSON_ARRAYAGG function
Aggregating JSON objects generated in CTEs
The aggregated JSON object in the previous example is generated directly within the JSON_ARRAYAGG
function. However, if the object is created in a CTE or a (nested) subselect statement, the FORMAT JSON
clause must be specified for avoiding the object
to be converted into a string.
Syntax:
JSON_ARRAYAGG(JSONExpression FORMAT JSON)
In the following example an array containing nested objects in multiple rows is generated. The objects to be aggregated are generated in two CTEs.
In the first CTE, the ObjName object, consisting of the first name (FirstName
) and the last name (LastName
) is generated.
The ObjName
object is embedded in the ObjEmpl
object, which is defined in the second CTE, and also includes a key:value pair with the employee number (EmployeeNo
).
Finally, an array accumulating all ObjName objects over all rows is built. FORMAT JSON
is specified in the JSON_OBJECT
and the JSON_ARRAYAGG
function, if a previously generated object is included.
Figure 35: Aggregated JSON array with JSON objects built in CTEs
Sequence of the accumulated elements
By default, the sequence of the array elements depends on the query optimizer’s decision. If the array elements should be ordered, use the ORDER BY
clause within the JSON_ARRAYAGG
function.
In the ORDER BY clause, multiple columns separated by a comma can be specified and ascending (ASC
) or descending (DESC
) sequence specifications can be included.
Syntax:
JSON_ARRAYAGG(JSONExpression FORMAT JSON
ORDER BY Clause)
In the following example, an object containing the city and zip code is generated for each row. The resulting rows are accumulated and embedded in an array with the JSON_ARRAYAGG
function. The JSON array is in ascending order by city name.
Figure 36: JSON_ARRAYAGG with ORDER BY clause
Integrating NULL values in the JSON_ARRAYAGG function
If an array element is a NULL value, the element is omitted in the JSON_ARRAYAGG
function as in the JSON_ARRAY
function.
If NULL values should be returned, the NULL ON NULL
clause must be added to the JSON_ARRAYAGG
function.
Syntax:
JSON_ARRAYAGG(JSONExpression FORMAT JSON
ORDER BY Clause
NULL ON NULL or ABSENT ON NULL)
In the following example, three arrays are returned. The first array returns January sales and NULL values are ignored. The second array includes NULL values, converted to 0 with the COALESCE
scalar function. The third array uses the NULL ON NULL
clause to return the JSON null
literal instead of zero.
Figure 37: JSON_ARRAYAGG – Integrating NULL values
JSON_OBJECTAGG – Generating an object over multiple rows
With the JSON_OBJECTAGG
function, a JSON object with key:value pairs from multiple rows can be generated. The particular of the generated JSON object is, that the key name is the value of the key name expression in each row.
Syntax:
JSON_OBJECTAGG(KEY KeyNameExpression VALUE ValueExpression)
The JSON object generated in the next example includes the customer numbers as key values and the total sales as value for each customer.
Figure 38: Generating a JSON object with the JSON_OBJECTAGG function
The result of the value expression can be a number, string, boolean, null
literal, JSON array, or a different JSON object.
In the following example, a JSON object is built over multiple rows with the customer number as the key name and the value as an array with the sales for the months January, February, and March.
Figure 39: Generating an object with the JSON_OBJECTAGG function with a value array
Omitting key: value pairs with NULL values
If the value of an aggregated key:value pair is a NULL value, the key:value pair is not omitted but included in the object with a null literal value by default.
For omitting key:value pairs with NULL values, an ABSENT ON NULL
clause can be added to the JSON_OBJECTAGG
function.
Note: The default value in the JSON_OBJECTAGG
function is NULL ON NULL
, like in the JSON_OBJECT
function. The default for the JSON_ARRAY
and JSON_ARRAYAGG
functions is ABSENT ON NULL
.
Syntax:
JSON_OBJECTAGG(KEY KeyNameExpression VALUE ValueExpression
ABSENT ON NULL or NULL ON NULL)
In the following example, an object with the sales of each customer in January 2014 is generated. In January 2014, there has been no sales for two of the customers. The customers are included in the JSON object, but the value is returned as a null
literal.
In the second example, the ABSENT ON NULL
clause is added to the JSON_OBJECTAGG
function and the key:value pairs for the customers without sales are omitted.
Figure 40: JSON_ObjectAgg with the ABSENT ON NULL clause
Generating (more) complex JSON data – Examples
Until now we generated JSON data based on a single SQL table. But normally, the JSON data requested is more complex. Data is spread over multiple database tables, and the relationship between the data is quite often not 1:1. The JSON data must include multiple nested objects, arrays, and so on.
The hardest part when generating JSON data is not the use of the JSON publishing functions but the translation from the relational data of the database into the hierarchical form of the JSON data.
In either way, it is possible to generate complex JSON data with a single SELECT
statement, with several CTEs. Here are a few more complex examples:
Determining employees per town
The LOBSTAFF table includes information about all employees. Among this information are the first and last name and the address. Multiple employees can live in the same city. We need to generate a JSON document with an array of objects, one object for each distinct city in the LOBSTAFF table.
The object per city first includes the Address
key:values pair with an object containing the zip code (ZipCode
) and the City (City
). After the address information, the object per city also includes the key:values pair Employee
with an array of objects. In the array, there is one object for each employee living in the appropriate city. The employee object includes the first (FirstName
) and last (LastName
) names of the employee.
The following figure shows the structure of the JSON data. Two employees (Herrmann Bauer and Birgitta Hauser) are living in Dietzenbach.
Figure 41: JSON data – Employees per town
The next figure shows the SELECT
statement for generating this data.
In the x CTE, the data from the LOBSTAFF table is condensed on the zip code and city (GROUP BY
ZipCode, City). For each zip code and city, an object is built. Because zip code and city are listed in the GROUP BY
cause, an object based on the zip code and city columns can be generated.
With the employee information, it is a little more complex. As the employee is not specified in the GROUP BY
clause, an aggregate function is needed for accumulating the information. JSON data is accumulated over multiple rows with the JSON_ARRAYAGG
function. Using the JSON_ARRAYAGG
function will link all of the generated objects containing the first and last names for the city.
In the final SELECT
statement, the objects per zip code and city generated in the x CTE are aggregated in the EmployeePerTown
object by running the JSON_ARRAYAGG
function.
Figure 42: SELECT statement – Employees per town
Delivered orders in December 2015
In the next example we have to generate a complex JSON document for all orders (including order header and order detail information) delivered in December 2015.
The following figure shows the structure of the JSON data. The DeliveredOrders
object includes an array with all orders delivered in December 2015. The order object includes first the order header and address information (Company
, OrderNo
, CustomerInfo
, DeliveryDate
, OrderType
, and IncoTerms
).
The customer information is included in a nested object, containing the customer number (CustomerNo
), the customer name (CustomerName
), and an object with the address information, that is, Street
, PostalCode
, and City
.
Additionally, the order object includes an array of objects with information about all order positions. The array contains an object for each order position. The Positions
object includes the position number (PositionNo
), the delivery quantity (DeliveryQuantity
), the delivery value (DeliveryValue
), and an object with the item information, such as ItemNo
, ItemDescription
, PricePerUnit
and Currency
.
Figure 43: JSON data – Delivered orders in December 2015
To achieve this request the order header table (ORDERHDRX
) has to be joined with the address table (ADDRESSX
), where the address information is located. Because we also need the order position information, the order header table has to be joined with the order positions table (ORDERDETX
), too. The item description and the price per unit are located in the item master table (ITEMMASTX
). To get this information, the item master table has to be joined with the order position table.
To generate JSON data for the delivered orders in a specific date range, six CTEs are used. In the OrderAddr
CTE (first CTE) the order header table and the address table are joined together. Orders with a delivery date in December 2015 are selected (WHERE DELDATE BETWEEN '2015-01-01' and '2015-12-31'
). Only the columns containing the relevant information are listed.
Additionally, several column values are converted with CASE
expressions into the data that are expected to be returned (OrderTypeText
and DelTermsText
columns). The customer name (CustName
) consist of the concatenated values in the CustName1
and CustName2
columns. The delivery date is converted into a character representation of the date in the ISO format (DelDateChar
column). In all character columns with a fixed length, the leading and trailing blank spaces are trimmed off.
Figure 44: CTE – Order header and address information
The order position and item raw data for the orders delivered in December 2015 are prepared in the OrderAddr
CTE. In this CTE, the OrderAddr
CTE is joined with the order position table (ORDERDETX
) for determining the delivered positions. The order positions table is joined with the item master table (ITEMMASTX
).
Only positions with a delivery quantity are selected (WHERE DELQTY > 0
).
A new column is generated containing the delivery value (DELVALUE
) which must be returned in the JSON document. The delivery value is calculated by multiplying the delivery quantity (DELQTY
column) with the price per unit (PRICE
column). The result is formatted as DECIMAL(11, 2)
.
In all character columns with a fixed length, the leading and trailing blank spaces are trimmed off.
Figure 45: CTE – Order detail and item master information
Now, after the raw data has been determined, we can start building the JSON data.
Next, the CTEAddress
CTE (a nested object containing the address information from the OrderAddr
CTE) is generated. The JSON object includes the customer number (CustomerNo
), the concatenated customer names 1 and 2 (Name
) and an object with the address information. The address object includes the street (Street
), the zip code (PostalCode
) and the city (City
).
The customer number column is needed for joining the JSON data in a subsequent CTE with the OrderAddr
CTE.
Figure 46: CTE – Generating address information JSON data
In the CTEObjPos
CTE, for each order position, a nested object containing the order position and the item information is generated. The position object includes the order position (PositionNo
), an object with the item information, the delivery quantity (DeliveryQty
), and the delivery value (DeliveryValue
) which is prepared in the CTEOrdAddr
CTE.
Company and OrderNo are needed for joining the CTEObjPos
CTE with the OrderAddr
CTE in a subsequent CTE.
Figure 47: CTE – Generating order detail information JSON data
In the CTEArrPos
CTE, the previously generated order detail objects are aggregated for each company and order no using the JSON_ARRAYAGG
function.
Company and order number are not only necessary for condensing the data but also for joining the result in a subsequent CTE with the CTEOrdAddr
CTE.
Figure 48: CTE – Accumulating order position JSON data
In the CTEOrdHdr
CTE, the JSON data for each order delivered in December 2015 is generated.
The OrderAddr
CTE is joined with the CTEAddress
CTE (containing the customer JSON data) and the CTEArrPos
CTE (containing array with all order positions for each order).
Based on the joined information for each order, an object is generated containing the company (Company
), the order number (OrderNo
), the nested object with the customer data (CustomerInfo
), the delivery date (DeliveryDate
), the order type (OrderType
), the delivery terms (IncoTerms
), and the order positions array (Positions
).
Figure 49: CTE – Generating the JSON data per order
In the final SELECT
statement, the JSON data for the different orders is accumulated by running the JSON_ARRAYAGG
function. The result is embedded in the DeliveredOrders
object.
Figure 50: Final SELECT – Accumulating the order data
Conclusion
SQL provides many kinds of JSON publishing functions which allow even complex JSON data to be generated.
The hardest part when creating JSON data is not the use of the JSON publishing function, but the conversion of the relational data in the database into the hierarchical form of the JSON data. In combination with CTEs, even complex requests can be organized in a readable and maintainable way.