Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

What’s New in SQL:2016


In December 2016, ISO released a new version of the international SQL standard (ISO/IEC 9075:2016). It supersedes the previous version from 2011.

This article is a brief overview of the new features being introduced into the SQL language. Strictly speaking, this article covers the additions to part 2 of the standard (SQL/Foundation), i.e. the most commonly used part.

This article also shows the availability of these features among six major databases. Note that respective figures—shown below—only reflect whether or not the databases support the features in the way described by the standard. For example, an X in the JSON row does not mean the database has no JSON support—it simply means the database doesn’t support the JSON features described by the SQL standard. As a matter of fact, all tested databases support JSON in some way—but not necessarily in the way described by the standard.0

BigQueryDb2 (LUW)dMariaDBbMySQLbOracle DBacPostgreSQLSQL ServerbSQLitebRow pattern recognitionJSONlistaggcast(… format …)Polymorphic table functions
  1. In the from clause
  2. Some functions conform (by “coincidence”)
  3. Some minor omissions
  4. No on overflow clause • Limited distinct

Before going through all the shiny new features, let’s look at some trivia: part 2 of the SQL:2016 standard has 1732 pages—that’s 260 pages more (~18%) than the 2011 edition. It introduces 44 new optional features (+14%). Let’s take a look at them…

Row Pattern Recognition

Row pattern recognition captures groups of rows that follow a pattern. Rows in matched groups can be filtered, grouped, and aggregated. The pattern itself is described with a simple regular expression syntax.

The main use of row pattern recognition is to check time series for patterns. However, the match_recognize clause combines aspects of the where, group by, having and over clauses (window functions) so it is also useful in many other cases.

I have given a presentation about row pattern recognition. It discusses several examples in two implementation variants: with and without the new match_recognize clause. The examples cover some typical use cases, and also some atypical use cases for row pattern recognition:

  • Consecutive events: identifying sessions in a web-log; tolerating gaps in sequences

  • Top-N per group (might be faster than window functions!)

  • Time intervals: finding gaps (e.g. for scheduling)

  • Time intervals: closing gaps (creating new rows)

If you understand German, you can watch the video recording here.

Readers interested in more details may refer to the technical report “Row Pattern Recognition in SQL” (ISO/IEC TR 19075-5:2016) available for free at ISO.

BigQueryDb2 (LUW)MariaDBMySQLOracle DBPostgreSQLSQL ServerSQLitefrom clausewindow clausefull aggregate support

JSON

SQL has been supporting arrays, multisets (nested tables) and composite types for 18 years. In 2003, SQL was given a native XML type. Now it has JSON support.

The following sections briefly describe the key parts of the new standard JSON functionality. A more detailed discussion is available in the technical report by ISO.

No Native JSON Type

Even though XML and JSON are somewhat similar—they are documents with nested structures—their integration into SQL is quite different. The most striking difference is that the standard does not define a native JSON type like it does for XML. Instead, the standard uses strings to store JSON data. Update: ISO/IEC 9075-2:2023 introduced a JSON type (T801).

Note that this does not prevent vendors from providing a JSON type. The standard just defines functions that interpret strings as JSON data. To claim conformity to the standard’s JSON features, a database must support these string-based functions. Vendors are still free to add a JSON type and related functions. They are even allowed to provide the standard’s JSON functions for their native JSON type—a very reasonable option in my opinion.

The only annoying consequence of using strings for JSON data is the error handling.1 If there was a JSON type, parsing errors could only happen when casting a string into the JSON type. Instead, the standard defines an on error clause for all functions that interpret strings as JSON data.

json_value(json, '$.id' null on error)

The default is error or error. The string '$.id' is a JSON path (described below).

Note that is still possible to declare table columns that only accept valid JSON data:

CREATE TABLE … (
   jcol CLOB CHECK (jcol IS JSON)
)

The check constraint uses the new is json predicate to test whether the string contains valid JSON data or not. The is json predicate can even test for a specific JSON types and is thus more flexible than a single JSON type:

<expr> is [not] json [ value | array | object | scalar ]
BigQueryaDb2 (LUW)MariaDBaMySQLaOracle DBbcPostgreSQLSQL ServerSQLiteais [not] json predicateon [ error | empty ] clauses
  1. See Alternatives
  2. Accepts unquoted object keys: {a: 1} • No type constraints: … is json [array|object|scalar]
  3. No unknown on error. No expressions in default ... on [ error | empty ]

JSON Formats

The SQL standard allows database vendors to support different JSON formats. The one described by RFC 7159 is the mandatory default.

Functions that generate or parse JSON data accept the optional format clause to specify which format to use (format json is default). The alternatives offered might have a very different syntax (like BSON).

Creating JSON Data

The standard defines the following functions to create JSON strings:

json_object([key] <expr> value <expression> [,…])

Creates a JSON object. The keywords key (optional) and value introduce the attribute name and value:

json_object( key 'id'   value 1234
           ,     'name' value 'Markus')

The standard also accepts a colon (:) between key and value:2

json_object( 'id': 1234
           , 'name': 'Markus')

A comma is only used to list multiple key/value pairs. This is a noteworthy difference to json_object (MySQL, SQLite) and json_build_object (PostgreSQL): they use the comma for both separating keys from values and listing multiple key/value pairs.

json_array([<expr>[,…]])

Creates a JSON array from the values provided.

json_array(<query>)

Creates a JSON array from the values returned by <query>. The query must return exactly one column.

json_arrayagg(<expr> [order by …])

Creates a JSON array from the values of a group (like array_agg):

SELECT json_arrayagg(col [order by seq])
  FROM …
 GROUP BY x

Note that the optional order by clause3 is inside the parentheses—not in a within group clause as for listagg.

json_objectagg([key] <expr> value <expression>)

Creates a JSON object from the key/value pairs of a group. It uses the same syntax to denote the key and value as json_object:

SELECT json_objectagg(k value v)
  FROM …
 GROUP BY x

An order by clause is not allowed because JSON objects are unordered key/value pairs.

BigQuerydDb2 (LUW)aeMariaDBdgiMySQLdgOracle DBahPostgreSQLbbSQL ServereSQLitecfcjson_objectjson_arrayjson_objectaggjson_arrayagg(… order by …)
  1. No colon syntax (T814)
  2. No colon syntax (T814) • No key uniqueness constraint (T830): [with|without] unique [keys]
  3. Requires the -DSQLITE_ENABLE_JSON1 compile-time option prior to SQLite 3.38.0
  4. Defaults to absent on null • No construction by query: json_array(select …)
  5. No construction by query: json_array(select …)
  6. Defaults to absent on null • No construction by query: json_array(select …) • Requires the -DSQLITE_ENABLE_JSON1 compile-time option prior to SQLite 3.38.0
  7. Supports comma (,) instead of values or colon (:)
  8. No colon syntax (T814) • Supports comma (,) instead of values or colon (:)
  9. No (ABSENT | NULL) ON NULL

Accessing JSON Items

The following functions use the so-called SQL/JSON path language (described below) to access parts of a JSON document. They interpret strings as JSON data and thus accept the on error clause.

json_exists(<json>, <path>)

Tests whether a specific path exists in JSON document. It evaluates to true, false or unknown and can be used in the where clause directly:

WHERE json_exists(jcol, '$.name')
json_value(<json>, <path> [returning <type>])

Extracts a scalar JSON value—everything except object and array—and returns it as a native SQL type. The optional returning clause performs a typecast. Without a returning clause, json_value returns a string.

json_query(<json>, <path> …)

Extracts a part out of JSON document and returns it as a JSON string. The main differences to json_value are: (1) it can extract any JSON type; (2) it always returns a string; (3) it can extract multiple elements from a JSON document.

Due to these differences, there are two special cases:

Multiple hits

By default, json_query raises an exception if the JSON path matches more than one element (like json_value). The optional with [ conditional | unconditional ] [array] wrapper clause wraps the result into an array and returns all hits.

JSON path denotes a single string

Json_query generally returns JSON data. If the JSON path matches a single JSON string, the result is still enclosed in double quotes with JSON special characters escaped. The omit quotes [on scalar string] clause returns the raw data instead (like json_value).

Finally, there is the json_table function: it is basically a table function—i.e. it is used in the from clause.

json_table(<json>, <path> columns …)

Transforms JSON data into a table. Json_table does not introduce any new functionality but is often simpler (and faster) than using multiple json_value calls.4

Please refer to ISO’s technical report (paragraph 5.3.4) for the full syntax. As a teaser, consider the following example. It transforms a JSON document (an array containing two objects) into a table:

[{id: 1, name: "Marvin"},
 {id: 2, name: "Arthur"}
]
SELECT jt.*
  FROM t
     , JSON_TABLE
       ( jcol
       , '$[*]'
         COLUMNS (id   NUMERIC      PATH '$.id',
                  name VARCHAR(255) PATH '$.name'
                 )
       ) jt

The json_table function produces one row for each element matched by the JSON path $[*]. The columns clause declares the names and types of the result columns and the JSON paths to the actual values (relative to the main JSON path).

idname
1Marvin
2Arthur
BigQueryafDb2 (LUW)bdgMariaDBbdhMySQLbhOracle DBehPostgreSQLadhSQL ServeradiSQLiteccjjson_existsjson_valuejson_queryjson_table
  1. Defaults to error on errorReturning clause with limited functinality
  2. Returning clause with limited functinality
  3. Requires the -DSQLITE_ENABLE_JSON1 compile-time option prior to SQLite 3.38.0
  4. No quotes behavior: [ keep | omit ] quotes
  5. with unconditional wrapper seems to be buggy
  6. Alternative: json_query_array, unnest and json_value
  7. Without plan clause • Requires error on error and a strict JSON/Path for the main expression
  8. Without plan clause
  9. Alternative: openjson
  10. Alternative: json_each and json_extract

JSON Path

The SQL standard specifies a path language for JSON.5 It “adopts many features of ECMAscript, though it is neither a subset nor a superset of ECMAscript.”6

In the JSON path language, the dollar sign ($) represents the current context element, the period (.) an object member, and the brackets ([]) an array element. The following examples illustrate this:

$.name

Denotes the value of the name attribute of the current JSON object.

$[0]

Denotes the first element of the current JSON array.

$.events[last]

Denotes the last element of the array stored in the attribute events of the current JSON object.

$.events[0 to 4, last]

Denotes the first five and the last array element of the array stored in the attribute events of the current JSON object.

$.*

Denotes the values of all attributes of the current JSON object.

JSON path supports filter expressions in the form ?(<expression>). In filter expressions, the at sign (@) denotes the current context.

$.* ?(@.type()=="number")

Denotes all attribute values of type number.

$.events[*] ?⁠(exists(@.name))

Denotes all array elements of the attribute events that contain a JSON object with the attribute name.

$?⁠(@.name starts with "Lorem")

Denotes the full JSON document if it is a JSON object that has an attribute name that contains a string that starts with Lorem.

Other functions available in filter expressions include size() (array size), ceiling(), floor(), abs(), and datetime() (for parsing, see below).

The SQL/JSON path language defines two modes: lax, which is the default, and strict. The mode can be specified by adding a prefix to the JSON path expression (e.g. 'strict $.name'). The strict mode triggers error handling (subject to the on error clause) for all errors. This includes accessing non-existing object members and using an array accessor on a scalar value or JSON object.

The lax mode suppresses these errors. If required, it unwraps arrays or wraps scalar values so that the document structure and JSON path expression fit to each other. The lax mode allows working with variable document structures without adjusting the JSON path to each document.

BigQueryacjDb2 (LUW)bdkoMariaDBelMySQLemOracle DBfhinPostgreSQLSQL ServeragjpSQLiteagjJSON path: member accessor .…JSON path: array accessor .[…]JSON path: filter expression ?(…)JSON path: item methods .…()JSON path: default mode (lax)JSON path: keyword laxJSON path: strict modeJSON path: named variables
  1. No wildcard (.*)
  2. No quoted key names (."…")
  3. No expressions • No last • No ranges ([… to …])
  4. No expressions • No last • No ranges ([… to …]) • No multiple subscripts (T833)
  5. No expressions • No multiple subscripts (T833)
  6. No expressions
  7. No expressions • No last • No ranges ([… to …]) • No multiple subscripts (T833) • No wildcard ($[*])
  8. Comparisons cannot have two expressions • Like_regex doesn’t accept flags • No (…) is unknown
  9. No .datetime() (but .date() and .timestamp()) • Only in ?(…): abs, ceiling, floor, double • No .keyvalue()
  10. No unwrap • No wrap
  11. SQL/JSON path syntax errors not reported (not even with ERROR ON ERROR)
  12. No unwrap • Wildcard array accessor (.[*]) doesn’t wrap • SQL/JSON path syntax errors not reported
  13. No unwrap
  14. Filter expression (? (…)) doesn’t unwrap
  15. Requires ERROR ON ERROR to report SQL/JSON path syntax errors • Wrong SQLSTATEs
  16. Wrong SQLSTATEs

What’s missing in SQL/JSON

The SQL standard does not provide functions to update parts of JSON documents (like json_set in MySQL, PostgreSQL, and SQLite or json_modify in SQL Server).

If you like this page, you might also like …

… to subscribe my mailing lists, get free stickers, buy my book or join a training.

Date and Time Formatting and Parsing

Formatting dates and times (temporal data) is one of the gaps in the SQL standard that was filled in pretty much every database—of course, every vendor filled it differently. SQL:2016 finally added this functionality to the standard.

The SQL standard uses a format template such as 'YYYY-MM-DD' to describe how to format or parse temporal data. The following table summarized the available mnemonics—notice the similarities with the “datetime format models” used by the Oracle database.

MnemonicMeaningextract field
YYYY | YYY | YY | YYearYEAR
RRRR | RRRounded year7
MMMonthMONTH
DDDay of monthDAY
DDDDay of year
HH | HH1212 hour
HH2424 hourHOUR
MIMinuteMINUTE
SSSecond of minuteSECOND
SSSSSSecond of day
FF1 | … | FF9Fraction(in SECOND)
A.M. | P.M.AM or PM
TZHTime zone hourTIMEZONE_HOUR
TZMTime zone minuteTIMEZONE_MINUTE

Remember that the extract expression (good old SQL-928) can access the individual components of temporal types. I’ve added the respective extract field names for reference.

Format templates can be used in two ways: (1) in the JSON path method datetime (see above); (2) in a cast specification:

CAST(<datetime> AS <char string type> [FORMAT <template>])
CAST(<char string> AS <datetime type> [FORMAT <template>])
BigQueryDb2 (LUW)bMariaDBMySQLcOracle DBaPostgreSQLSQL ServerSQLitecast(… format …)JSON path: .datetime(…)extract(… from <datetime>)
  1. Offers .date() and .timestamp()
  2. No time zone fields.
  3. No time zone fields. SECOND does not include fractions. Use SECOND_MICROSECOND.

Listagg

Listagg is a new ordered set function that resembles the group_concat and string_agg functions offered by some databases. It transforms values from a group of rows into a delimited string.

The minimal syntax is:

LISTAGG(<expr>, <separator>) WITHIN GROUP(ORDER BY …)

Listagg accepts the optional on overflow clause to define the behavior if the result becomes too long:

LISTAGG(<expr>, <separator> ON OVERFLOW …)

The default is on overflow error. The on overflow truncate clause prevents the overflow by only concatenating as many elements as the result type can accommodate. Furthermore, the on overflow truncate clause allows you to specify how to terminate the result:

ON OVERFLOW TRUNCATE [<filler>] WITH[OUT] COUNT

The optional <filler> defaults to three periods (...) and this is added as the last element if truncation happens. If with count is specified, the number of omitted values is put in parentheses and appended to the result.

I have written a more detailed article about listagg. Please keep the type-safe alternatives in mind, which are a better choice in many cases.

BigQueryDb2 (LUW)aMariaDBMySQLOracle DBPostgreSQLSQL ServerSQLitelistagg(…) within group (…)listagg(… on overflow …)listagg(distinct …)
  1. If ordered by the aggregated values: listagg(distinct X,…) within group (order by X)

Trigonometric and Logarithmic Functions

SQL:2016 introduces trigonometric functions (sin, cos, tan, asin, acos, atan, sinh, cosh, tanh), a general logarithm function (log(<base>, <value>)) and a shortcut for the logarithm with base 10 (log10(<value>)).

Note that the natural logarithm ln(<value>) was introduced with SQL:2003.

BigQueryDb2 (LUW)MariaDBMySQLOracle DBPostgreSQLSQL ServeraSQLitesin, cos, tanasin, acos, atansinh, cosh, tanhlog(<base>, <value>)log10(<value>)ln(<value>)
  1. SQL Server expects the arguments in reverse order: LOG(<value>, <base>)

Polymorphic Table Functions

SQL table functions—standardized with SQL:2003—are functions that return tables. Table functions have to specify the names and types of the columns they return—the so-called row-type—at the time of creation:

CREATE FUNCTION <name> (<parameters>)
       RETURNS TABLE (c1 NUMERIC(10,2), c2 TIMESTAMP)
   ...

This function returns a table with two columns: c1 and c2 with the respective types.

Once declared, table functions can be used in the from and join clauses similarly to regular tables.9

Prominent examples of table functions are:

SQL:2016 introduces polymorphic table functions (PTF) that don’t need to specify the result type upfront. Instead, they can provide a describe component procedure that determines the return type at run time. Neither the author of the PTF nor the user of the PTF need to declare the returned columns in advance.

PTFs as described by SQL:2016 are not yet available in any tested database.

Update 2018: Oracle Database 18c

Oracle Database 18c introduced polymorphic table functions with a proprietary syntax. See Polymorphic Table Functions (PTF).

Interested readers may refer to the free technical report “Polymorphic table functions in SQL” released by ISO. The following are some of the examples discussed in the report:

  • CSVreader, which reads the header line of a CSV file to determine the number and names of the return columns

  • Pivot (actually unpivot), which turns column groups into rows (example: phonetype, phonenumber)

  • TopNplus, which passes through N rows per partition and one extra row with the totals of the remaining rows

Personally, I think many of the examples from the technical report could be solved using other SQL features.10 Implementing the json_table function as a PTF might have been an interesting example.

Miscellaneous Features

Join … using now accepts as (F404):

FROM A
JOIN B USING (…) AS correlation_name

There is a new type decfloat[(<precision>)] (T076).

Named arguments in function and procedure invocations (name => value) are no longer limited to call statements (T524).

The default clause is allowed for arguments to functions (T525) and inout arguments (T523).

BigQueryDb2 (LUW)MariaDBMySQLOracle DBPostgreSQLSQL ServeraSQLitejoin … using … asdecfloat typeNamed arguments outside callDefaults for function arguments
  1. Join ... using not supported at all

Feature Taxonomy: Obsolete and New Features

For the sake of completeness: part 2 of SQL:2016 introduces no new mandatory features.11

The feature T581, “Regular expression substring function” has been deprecated (“The functionality is essentially subsumed by Feature F844, “SUBSTRING_REGEX”12). Note that feature T141, “SIMILAR predicate” has been deprecated in 2011.13

The following table lists the new optional features in ISO/IEC 9075-2:2016 (compared to 2011).

Feature IDFeature Name
B200Polymorphic table functions
B201More than one PTF generic table parameter
B202PTF copartitioning
B203More than one copartition specification
B204PRUNE WHEN EMPTY
B205Pass-through columns
B206PTF descriptor parameters
B207Cross products of partitionings
B208PTF component procedure interface
B209PTF extended names
F404Range variable for common column names
R010Row pattern recognition: FROM clause
R020Row pattern recognition: WINDOW clause
R030Row pattern recognition: full aggregate support
T076DECFLOAT data type
T523Default values for INOUT parameters of SQL-invoked procedures
T524Named arguments in routine invocations other than a CALL statement
T525Default values for parameters of SQL-invoked functions
T622Trigonometric functions
T623General logarithm functions
T624Common logarithm functions
T625LISTAGG
T811Basic SQL/JSON constructor functions
T812SQL/JSON: JSON_OBJECTAGG
T813SQL/JSON: JSON_ARRAYAGG with ORDER BY
T814Colon in JSON_OBJECT or JSON_OBJECTAGG
T821Basic SQL/JSON query operators
T822SQL/JSON: IS JSON WITH UNIQUE KEYS predicate
T823SQL/JSON: PASSING clause
T824JSON_TABLE: specific PLAN clause
T825SQL/JSON: ON EMPTY and ON ERROR clauses
T826General value expression in ON ERROR or ON EMPTY clauses
T827JSON_TABLE: sibling NESTED COLUMNS clauses
T828JSON_QUERY
T830Enforcing unique keys in SQL/JSON constructor functions
T831SQL/JSON path language: strict mode
T832SQL/JSON path language: item method
T833SQL/JSON path language: multiple subscripts
T834SQL/JSON path language: wildcard member accessor
T835SQL/JSON path language: filter expressions
T836SQL/JSON path language: starts with predicate
T837SQL/JSON path language: regex_like predicate
T838JSON_TABLE: PLAN DEFAULT clause
T839Formatted cast of datetimes to/from character strings

You can’t catch up on 20 years of SQL evolution in one day. Subscribe the newsletter via E-Mail, Bluesky or RSS to gradually catch up and to keep modern-⁠sql.com on your radar.

About the Author

Photo of Markus Winand

Markus Winand provides insights into SQL and shows how different systems support it at modern-sql.com. Previously he made use-the-index-luke.com, which is still actively maintained. Markus can be hired as trainer, speaker and consultant via winand.at.

Buy the Book

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy now!
(paperback and/or PDF)

Paperback also available at Amazon.com.

Hire Markus

Markus offers SQL training and consulting for developers working at companies of all sizes.
Learn more »

Footnotes

  1. Without any doubt, the “driver-based solution” provided by IBM DB2 is not even close to the JSON functionality described by the standard.

  2. Arguably, performance is another far-reaching consequence—even though I think a good implementation can significantly mitigate its impact.

  3. Optional feature T814 — Colon in JSON_OBJECT or JSON_OBJECTAGG

  4. Optional feature T813 — SQL/JSON: JSON_ARRAYAGG with ORDER BY.

  5. More importantly, json_table can produce multiple rows. This can be done by other means as well, but the syntax becomes bulky.

  6. As there is no other established international standard for a JSON path language (like XQuery for XML), the SQL standard entirely defines the syntax and semantics of this new SQL/JSON path language.

  7. TR 19075-6:2017 §6.1

  8. RR may have different rules than YY when filling the gaps (IV096, ISO/IEC 9075-2:2023 §9.51 GR 6biv)

  9. This is from intermediate SQL-92, now part of T052.

  10. Syntactic sugar required: table(). Lateral is also often used in the context of table functions.

  11. CSVreader: SQL/MED comes to my mind (which lacks polymorphism). However, for a CSV reader, arrays might be a suitable solution.

    Unpivot: use join lateral (values (phtype1, phonenumber1), (phtype2, phonenumber2)) (with an on clause to filter null if needed).

    TopNplus: use match_recognize (like shown in my slides).

  12. Based on the numbering of features in the tables “Feature taxonomy and definition for mandatory features” in 9075-2:2011 vs 9075-2:2016.

  13. SQL:2016-2 Annex D.

  14. SQL:2016 also deprecates “the use of <similar predicate part 2> in <case expression>”.

Connect with Markus Winand

Subscribe mailinglistsSubscribe the RSS feedMarkus Winand on LinkedInMarkus Winand on XINGMarkus Winand on TwitterMarkus Winand on Bluesky
Copyright 2015-2025 Markus Winand. All righs reserved.
Legal | Contact | NO WARRANTY | Trademarks | Privacy and GDPR