Fast lossless JSON parse event streaming, in JavaScript.

xtao-org/jsonhilo

JsonHilo.js

Minimal lossless JSON parse event streaming, akin to SAX.


Handcrafted by Darius J Chuck.

Donate directly via Stripe   or   Buy Me a Coffee at ko-fi.com   Postaw mi kawę na buycoffee.to


Fast, modular, and dependency-free.

Provides two interfaces: a high-level one and a low-level one.

Written in runtime-independent JavaScript.

Works in Deno, Node.js, and the browser.

Status

Stable.

Passes standards-compliance tests and performs well in benchmarks.

Battle-tested.

Installation

Node.js

An npm package is available:

npm i @xtao-org/jsonhilo

Deno and the browser

Import modules directly from deno.land/x:

import {JsonHigh} from 'https://deno.land/x/jsonhilo@v0.3.7/mod.js'

Or from a CDN such as jsDelivr:

import {JsonHigh} from 'https://cdn.jsdelivr.net/gh/xtao-org/jsonhilo@v0.3.7/mod.js'

Quickstart

See a basic example in demo/basic.js, pasted below:

import {JsonHigh} from '@xtao-org/jsonhilo'
const stream = JsonHigh({
  openArray: () => console.log('<array>'),
  openObject: () => console.log('<object>'),
  closeArray: () => console.log('</array>'),
  closeObject: () => console.log('</object>'),
  key: (key) => console.log(`<key>${key}</key>`),
  value: (value) => console.log(`<value type="${typeof value}">${value}</value>`),
})
stream.chunk('{"tuple": [null, true, false, 1.2e-3, "[demo]"]}')

This uses the simplified high-level interface built on top of the more powerful low-level core.

Features

Runtime-independent

The library logic is written in modern JavaScript and relies upon some of its features, standard modules in particular.

Beyond that it does not use any runtime-specific features and should work in any modern JavaScript environment. It was tested in Deno, Node.js, and the browser.

That said, the primary target runtime is Deno, and tests depend on it.

Lossless

Unlike any other known streaming JSON parser, JsonHilo provides a low-level interface for lossless parsing, i.e. it is possible to recover the exact input, including whitespace and string escape sequences, from parser events.

This feature can be used to implement accurate translators from JSON to other representations (see Rationale), syntax highlighters (demo below), JSON scanners that search for substrings in strings on-the-fly, without first loading them into memory, and more.
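To see why this matters, compare a lossy round trip through the built-in JSON object. The snippet below is a small illustration in standard JavaScript only. It does not use JsonHilo, and the variable names exist just for this example. After parsing and re-serializing, the whitespace, the escape sequence, and the original spelling of the number are all gone.

```javascript
// A JSON text with extra whitespace, an escaped character, and "1.0".
const input = '{ "a":  "\\u0041",\n  "n": 1.0 }'

// JSON.parse keeps only the values; JSON.stringify picks its own formatting.
const roundTripped = JSON.stringify(JSON.parse(input))

console.log(roundTripped)           // {"a":"A","n":1}
console.log(roundTripped === input) // false: the original text cannot be recovered
```

A lossless event stream keeps this information available, so a consumer can reproduce the input exactly.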

[Image: the syntax highlighting demo, demo/highlight.js]

Modular

The library is highly modular with a fully independent core, around which various adapters and extensions are built, including an easy-to-use high-level interface.

JsonLow

The core module is JsonLow.js. It has no dependencies, so it can be used on its own. It is very minimal and optimized for maximum performance and accuracy, as well as minimum memory footprint. It provides the most fine-grained control over the parsing process. The events generated by the parser carry enough information to losslessly recreate the input exactly, including whitespace.

See JsonLow.d.ts for type information and demo/highlight.js for a usage example.

JsonHigh

JsonHigh.js is the high-level module which provides a more convenient interface. It is composed of auxiliary modules and adapters built around the core. It is optimized for convenience and provides similar functionality and granularity to other streaming parsers, such as Clarinet or creationix/jsonparse.

See JsonHigh.d.ts for type information and Quickstart for a usage example.

Parameters

JsonHigh is called with an object which contains named event handlers that are invoked during parsing. All handlers are optional and described below.
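Because every handler is optional, a consumer can pass only what it needs. As a rough sketch, the hypothetical makeDepthTracker helper below (not part of JsonHilo) builds a handler object that measures the maximum nesting depth. It uses only the four structure handlers and end, which are described under Events. The handlers are plain functions, so the example exercises them by hand instead of running the parser.

```javascript
// Hypothetical helper, not part of JsonHilo: builds a handler object that
// tracks how deeply arrays and objects are nested.
function makeDepthTracker() {
  let depth = 0
  let maxDepth = 0
  const open = () => { depth += 1; if (depth > maxDepth) maxDepth = depth }
  const close = () => { depth -= 1 }
  return {
    openArray: open,
    openObject: open,
    closeArray: close,
    closeObject: close,
    // No key or value handlers: primitives are simply ignored.
    end: () => maxDepth,
  }
}

// Simulate by hand the events expected for {"a": [[1]]}.
const tracker = makeDepthTracker()
tracker.openObject()
tracker.openArray()
tracker.openArray()
tracker.closeArray()
tracker.closeArray()
tracker.closeObject()
console.log(tracker.end()) // 3
```

Assuming the behaviour described in this README, the same object could be passed directly as JsonHigh(makeDepthTracker()), and stream.end() would then return the depth.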

Return value

JsonHigh returns a stream object with two methods:

  • chunk, which accepts a JSON chunk to parse and returns the stream object for chaining.
  • end, which takes no arguments and signals that the current JSON document is finished. If there is no error, it calls the corresponding end event handler and passes that handler's return value back to the caller.
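To illustrate how the two methods fit together, the sketch below feeds a string to a stream in fixed-size pieces and then finishes the document. feedInChunks is a hypothetical helper, not part of the library. It only assumes an object with the chunk and end methods described above. Splitting on code point boundaries is a precaution taken in this sketch, not a documented requirement.

```javascript
// Hypothetical helper: feeds `text` to `stream` in pieces of `size` code points.
function feedInChunks(stream, text, size) {
  // Array.from iterates a string by code point, so surrogate pairs stay whole.
  const codePoints = Array.from(text)
  for (let i = 0; i < codePoints.length; i += size) {
    stream.chunk(codePoints.slice(i, i + size).join(''))
  }
  // Whatever end() gives back (a result or an error) goes to the caller.
  return stream.end()
}

// Stand-in stream for demonstration; with the library this would be
// the object returned by JsonHigh({...}).
const received = []
const fakeStream = {
  chunk(piece) { received.push(piece); return this },
  end() { return 'done' },
}
const result = feedInChunks(fakeStream, '["😀", 1]', 3)
console.log(received.length) // 3 pieces; the emoji is never split
console.log(result)          // done
```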

Events

There are 4 event handlers without arguments which indicate start and end of structures:

  • openArray: an array started ([)
  • closeArray: an array ended (])
  • openObject: an object started ({)
  • closeObject: an object ended (})

And 2 event handlers with one argument which capture primitives:

  • key: an object's key ended. The argument of the handler contains the key as a JavaScript string.
  • value: a primitive JSON value ended. The argument of the event contains the corresponding JavaScript value: true, false, null, a number, or a string.

Finally, there is the argumentless end event handler which is called by the end method of the stream to confirm that the parsed JSON document is complete and valid.

Note that an event handler won't be called if there is an error in the parsed JSON, see error handling.
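The events above are enough to rebuild an ordinary JavaScript value. The following sketch shows one way to do it with a stack of open containers. makeTreeBuilder is a hypothetical helper written for this example, not part of JsonHilo, and the events are simulated by hand in the order one would expect for the Quickstart document.

```javascript
// Hypothetical helper: returns handlers that rebuild a JavaScript value
// from the high-level events.
function makeTreeBuilder() {
  const stack = []      // containers that are currently open
  let pendingKey = null // last key seen, waiting for its value
  let root              // the finished top-level value

  // Put a value into the innermost open container, or make it the root.
  const attach = (value) => {
    if (stack.length === 0) { root = value; return }
    const parent = stack[stack.length - 1]
    if (Array.isArray(parent)) parent.push(value)
    else { parent[pendingKey] = value; pendingKey = null }
  }
  const open = (container) => { attach(container); stack.push(container) }
  const close = () => { stack.pop() }

  return {
    openArray: () => open([]),
    openObject: () => open({}),
    closeArray: close,
    closeObject: close,
    key: (k) => { pendingKey = k },
    value: attach,
    end: () => root,
  }
}

// Events for {"tuple": [null, true, false, 1.2e-3, "[demo]"]}, simulated by hand.
const h = makeTreeBuilder()
h.openObject()
h.key('tuple')
h.openArray()
for (const v of [null, true, false, 1.2e-3, '[demo]']) h.value(v)
h.closeArray()
h.closeObject()
console.log(JSON.stringify(h.end())) // {"tuple":[null,true,false,0.0012,"[demo]"]}
```

Since end passes its handler's return value to the caller, passing these handlers to JsonHigh should make stream.end() return the rebuilt value. For a ready-made version of this idea, see JsonStrum under See also.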

Error handling

If there is an error when parsing a chunk, an Error is thrown whose message contains a serialized JSON object with details.

If there is an error at the end, that error is returned to the caller. The user-provided end event handler is not called, so it should not contain any cleanup code.
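For errors thrown while parsing a chunk, a wrapper along the following lines can turn the exception into a plain value. safeChunk is a hypothetical helper, not part of the library. The shape of the details object is not specified here, so the sketch passes it through untouched and falls back to the raw message if the message is not pure JSON.

```javascript
// Hypothetical helper: calls stream.chunk and reports failure as a value.
function safeChunk(stream, chunk) {
  try {
    stream.chunk(chunk)
    return { ok: true }
  } catch (e) {
    let details
    try {
      // The message is described as carrying a serialized JSON object.
      details = JSON.parse(e.message)
    } catch {
      // Assumption: if it does not parse, keep the message text as-is.
      details = e.message
    }
    return { ok: false, details }
  }
}
```

A caller can then branch on the ok flag and stop feeding further chunks after the first failure.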

Cleanup

To run cleanup code at the end of parsing a document regardless of whether there was an error or not, don't put that code in the end handler. Instead put it after .end(), like so:

// ...
stream.end()
cleanup()

If you want to also handle an error, you can use the isError helper:

import {isError} from '@xtao-org/jsonhilo'

// ...

const ret = stream.end()
if (isError(ret)) { handle(ret) } // handle error
cleanup()

If your error handler can throw, you can use try-catch-finally:

import {isError} from '@xtao-org/jsonhilo'

// ...

const ret = stream.end()
try { if (isError(ret)) { handle(ret) } }
catch (e) { /* optional */ }
finally { cleanup() }

Fast

Achieving optimal performance without sacrificing simplicity and correctness was a design goal of JsonHilo. This goal was realized and for applications without extreme performance requirements JsonHilo should be more than fast enough.

It may be worth noting however that using pure JavaScript for extremely performance-sensitive applications is ill-advised and that nothing can replace individual case-by-case benchmarks.

It is difficult to find a parser that can be sensibly compared with JsonHilo. The one that comes the closest and is fairly widely known is Clarinet. It is the only low-level streaming JSON parser featured on JSON.org and the fastest one I could find.

xtao-org/jsonhilo-benchmarks contains simple benchmarks used to compare the performance of JsonHilo with Clarinet and jq (a fast and versatile command-line JSON processor).

According to these benchmarks, for validating JSON (just parsing without any further processing) JsonHilo is the fastest, followed by jq, which is in turn faster than Clarinet. Overall, for comparable tasks, the low-level JsonHilo interface is up to 2x faster than Clarinet, whereas the high-level interface is on par.

Again, these results need to be taken with a grain of salt, and there is no replacement for individual benchmarks. Use whatever suits your case best. In most cases, relative performance should not be the only factor to take into account.

Factors which make a fair comparison between JsonHilo and Clarinet problematic are mentioned below.

Differences between JsonHilo and Clarinet

The major differences that make the comparison of the two problematic are:

  • Clarinet is not fully ECMA-404-compliant, as measured by JSON Parsing Test Suite by Nicolas Seriot -- it accepts certain invalid JSON and rejects certain valid JSON. JsonHilo is designed to parse the JSON grammar correctly and so can pass the ECMA-404-compliance test suite. JsonHilo is overall safer to use with unknown inputs -- it can very well be used as a validator.
  • JsonHilo fundamentally operates on individual Unicode code points as opposed to strings, chunks, or characters. Performance-wise this may be an advantage or a disadvantage, depending on how the input is structured (it may need conversion).
  • Even though low-level processing with JsonHilo may be overall significantly faster than Clarinet, the fact that the former does not use regular expressions to parse the input while the latter does may lead to a narrower performance gap between the two.
  • JsonHilo is overall simpler in terms of code complexity, making it easier to adjust or audit. The code is also significantly smaller in size than Clarinet, even taking into account the optional high-level interfaces laid on top of the tiny core.
  • JsonHilo's core is more low-level and amenable to extension.
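On the point about input conversion: when the input arrives as UTF-8 bytes (for example from a file or a socket), a multi-byte character can be split across two buffers. One way to handle this, sketched below, is the standard TextDecoder in streaming mode, which holds back incomplete sequences until the next call. makeByteFeeder is a hypothetical helper, not part of JsonHilo. It only assumes a stream with chunk and end methods.

```javascript
// Hypothetical helper: decodes UTF-8 byte buffers and forwards text to `stream`.
function makeByteFeeder(stream) {
  const decoder = new TextDecoder('utf-8')
  return {
    write(bytes) {
      // stream: true keeps a trailing partial sequence for the next call.
      const text = decoder.decode(bytes, { stream: true })
      if (text.length > 0) stream.chunk(text)
    },
    end() {
      // Flush whatever the decoder is still holding, then finish the document.
      const rest = decoder.decode()
      if (rest.length > 0) stream.chunk(rest)
      return stream.end()
    },
  }
}

// Demonstration with a stand-in stream; the emoji's 4 bytes are split in half.
const encoded = new TextEncoder().encode('["😀"]')
const pieces = []
const feeder = makeByteFeeder({
  chunk(text) { pieces.push(text); return this },
  end() { return 'finished' },
})
feeder.write(encoded.slice(0, 4)) // '[', '"', and the first 2 bytes of the emoji
feeder.write(encoded.slice(4))    // the remaining 2 bytes, '"', and ']'
const feederResult = feeder.end()
console.log(pieces.join('')) // ["😀"]
```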

Streaming-friendly

By default the parser is streaming-friendly by accepting the following:

Standards-compliant

The streaming-friendly features can be suppressed by Ecma404.js, an adapter module which provides full ECMA-404/RFC 8259 compliance.

This is confirmed by passing the JSON Parsing Test Suite by Nicolas Seriot, available under test/JSONTestSuite.

Tests can be run with Deno as follows:

deno test --allow-read

Unicode-compatible

The core logic operates on Unicode code points -- in line with spec -- rather than code units or characters.
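The difference between code units and code points is visible in plain JavaScript, as the short illustration below shows. It uses only standard string methods and is independent of JsonHilo.

```javascript
// A JSON string containing one emoji (U+1F600).
const text = '"😀"'

// String length counts UTF-16 code units; the emoji takes two of them.
console.log(text.length) // 4

// Iterating a string (for...of, spread, Array.from) yields whole code points.
const codePoints = Array.from(text, (ch) => ch.codePointAt(0))
console.log(codePoints.length)          // 3
console.log(codePoints[1].toString(16)) // 1f600
```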

Rationale

JsonHilo was initially written to enable fast lossless translation between JSON and Jevko, as no suitable JSON parser in JavaScript existed.

I decided to release this as a separate library, because I was tinkering with Deno and found that there was no streaming JSON parser available at all for Deno.

See also

JsonStrum -- a high-level wrapper over JsonHilo which emits fully parsed objects and arrays.

License

Released under the MIT license.

Support this project

I prefer to share my creations for free. However, living and creating without money is not possible for me, so I ask companies and people who are willing and able to support this project. Every symbolic cup of coffee counts!

Donate directly via Stripe   or   Buy Me a Coffee at ko-fi.com   Postaw mi kawę na buycoffee.to

Paid support and online assistance

If you prefer, you can get paid help and support, including direct online assistance, related to JsonHilo.js through Githelp.

At the moment this is a limited opportunity to try an early version of Githelp.


A stand-alone part of the TAO-JSON project.

© 2024 xtao.org