
2022-05-04

Why Ballerina is a language

A new programming language is a lot of work, and the chances of any new programming language getting traction are small. Many new languages are created. Very few make it.

Perhaps even more work than the language is the platform - all the other things that are needed to make users productive with the language: standard library, package manager, IDE, debugger, documentation system, testing tools, etc. One way to reduce the cost of a new language is to leverage an existing platform, such as Java or .NET, and rely on that platform for some of the needed functionality.

Ballerina is a new programming language, and is also a platform. Although it's implemented on top of the JVM, it does not embrace the JVM. It is designed with the goal that we can do another implementation that does not use the JVM, and user code will run unchanged. (We do provide JVM interop features, but that is specifically for when you want to interop with existing JVM code.)

This raises an obvious question. Why? In this post, I want to address this question by explaining what there is in the Ballerina language and platform that could not be done except with a new language and platform. These can be grouped into three areas:

  • networking: networking abstractions, which are part of the language, and implementations of those abstractions provided by the standard library;
  • data and types: the kinds of values that the language operates on and the ways that the type system provides to describe these;
  • concurrency: how the language enables the program to describe concurrent execution of code and control concurrent access to mutable state.

These three areas are fundamental: they could not be grafted onto another language. They are also deeply interconnected. In addition, there are some supporting features that are not so fundamental, but which together provide significant value.

This blog is not a complete answer to the "Why?" question. Ballerina's development is funded by WSO2 and WSO2's ultimate goal is to create a product that is useful to its customers. But Ballerina is not itself the product: both Ballerina the language and Ballerina the platform are free and open source. The product is separate: it's a cloud service that takes advantage of Ballerina's capabilities.

Network abstractions

The Ballerina language provides abstractions for both network services and network clients, but knows nothing about specific protocols. Protocol-specific library code is needed to make these abstractions available for a specific protocol. The standard library includes this for a range of protocols, including HTTP, GraphQL, and gRPC.

The network abstractions for clients are more straightforward than for services. For clients, the network abstraction consists of a distinct kind of object, called a client object, which has a distinct kind of method, called a remote method, which represents outbound network messages. The standard library supports a protocol by providing an implementation of a client object for that protocol. The language provides a distinctive syntax for remote method calls on client objects and syntactically restricts where such calls can appear. This enables the Ballerina VS Code extension to provide a graphical view of a function or program, which uses a sequence diagram to show the interactions between client objects and remote services. This graphical view always remains in sync with the textual view, and both views are editable.
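For example, here is a minimal sketch of a client-side interaction using the standard library's http module (the URL and path are illustrative):

import ballerina/http;
import ballerina/io;

public function main() returns error? {
    // An http:Client is a client object. The -> syntax marks a remote
    // method call, which is what the sequence-diagram view renders as an
    // interaction with a remote service.
    http:Client apiClient = check new ("http://localhost:8080");
    string greeting = check apiClient->get("/greeting");
    io:println(greeting);
}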

For services, Ballerina provides a distinct kind of object, called a service object. A remote method on a service object represents a network-callable method. Incoming network messages are dispatched to service objects by using objects implementing the language-defined Listener type. The standard library supports a protocol for services by providing an implementation of the Listener type for the protocol. The language also provides convenient syntax for a module to construct a Listener object, and to define a service object and attach it to a Listener object.
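A minimal sketch of this syntax for HTTP (the port and paths are illustrative): a module-level listener declaration constructs the Listener object, and the service declaration defines a service object and attaches it.

import ballerina/http;

// Construct a Listener object at module level.
listener http:Listener httpListener = new (8080);

// Define a service object and attach it to the listener.
service /greeting on httpListener {
    resource function get .() returns string {
        return "Hello, World!";
    }
}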

For many languages, the execution model of a program is simply to call a function, which represents the entry point of the program. In Ballerina, the services defined by a program's modules are the network entry points of the program, and this is incorporated into the execution model of a Ballerina program. When a program is executed, every module will first be initialized; this will construct and connect up the module's Listener and service objects. After all modules have been initialized, the program enters a listening phase, which makes the Listeners start accepting network input. The execution model also deals with shutting down services.

The language-provided network abstractions make a program's interaction with the network explicit. This is used to provide network observability. It is also the basis for the code-to-cloud support, which uses compiler extensions to generate artifacts needed for deployment to different cloud platforms (K8s, Azure, AWS).

Service objects support the concept of a resource method, which enables a more data-oriented view of services. This can be thought of as a network-oriented generalization of OO getter/setter methods, where get/set is generalized to the protocol-defined method (e.g. the HTTP method name get/put/post) and the property name is generalized to a path. The standard library provides implementations of this for HTTP and GraphQL services. This avoids the pain that comes from having to artificially combine the HTTP method name and the resource path into a single identifier (what OpenAPI calls the operationId). We are working on extending the resource method concept to client objects.

Normally when a service and client use a request-response message exchange pattern the remote method on the service can use its return value to provide its response to a request. But this is not always sufficient: in some cases, the service may want to control what happens if there is an error in sending the response; in other cases, they may be using a more complex message exchange pattern. Ballerina models this by passing a client object as an argument to the service's remote method; the service's remote method calls remote methods on this client object to send messages back to the client.
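A sketch of this pattern for HTTP (the names are illustrative): the listener passes an http:Caller client object to the method, and the method responds by calling a remote method on it, which lets it handle any error in sending the response.

import ballerina/http;

service /notify on new http:Listener(8081) {
    resource function post .(http:Caller caller) returns error? {
        // Respond by calling a remote method on the caller client object;
        // an error in sending the response surfaces here.
        check caller->respond("accepted");
    }
}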

Data and types

Plain data

One of the most fundamental aspects of Ballerina is its focus on plain data. This is called anydata in Ballerina and is analogous to the POD (Plain Old Data) concept in C++. It is pure data, independent of processing that might be applied to the data.

Messages exchanged by network protocols are represented by plain data; the implementations of network protocols can automatically serialize plain data in a format appropriate to the protocol. In particular, plain data can be directly serialized to and from JSON in a simple, natural way.

The whole Ballerina platform is designed to maximize use of plain data. Objects, which bundle methods with data, are not plain data, and the platform uses plain data rather than objects, unless the specific functionality provided by objects is needed. Services and clients are represented as objects; the parameters and return values of remote methods are plain data.

Structured data throughout the platform is represented using the built-in map and array types, which are plain data, rather than using library-defined collection types. In addition to maps and arrays, Ballerina provides a built-in table type, which allows for collections with arbitrary plain data keys (maps have only string keys as in JSON); tables are automatically transformed into arrays of objects when serializing to JSON. The table type provides enough power that even sophisticated, complex programs can be written using only the language-provided collection types.

Ballerina has a structural type system, which has several features typically found in schema languages, such as unions and open records. The overall result is that Ballerina types for plain data work well as schemas for network messages. Subtyping is simple and flexible because it is semantic: types are thought of as sets of values, with subtyping corresponding to the subset relationship between the corresponding sets. For example, a value of a user-defined record type is also a map. This allows the platform to easily convert between user-defined types and generic types (like anydata). Converting to the generic type is a no-op, because of the subtype relationship. In the other direction, the platform uses a language capability (similar to a type cast) to validate and convert the value to a user-defined type.
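A sketch of both directions, using the language's cloneWithType conversion (the record type is illustrative):

type Person record {
    string name;
    int age;
};

public function demo() returns error? {
    json payload = {name: "Ann", age: 32};
    // Validate and convert generic data down to the user-defined type.
    Person p = check payload.cloneWithType();
    // The other direction is a no-op: every Person value is already anydata.
    anydata data = p;
}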

Types for services

The user defines a service by writing resource methods or remote methods. For the HTTP and GraphQL protocols, the user can define types (most often record types) and use them for the parameters and return values. The platform's Listener implementation for the protocol makes this just work: the incoming messages will be validated and converted using the parameter types. Annotations can be used to fine-tune this, for example to control whether a method parameter should come from a query parameter or the payload.
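For example, a sketch of an HTTP resource method with a typed payload parameter (the record type and paths are illustrative); the listener validates and converts the incoming JSON before the method is called:

import ballerina/http;

type NewAlbum record {
    string title;
    string artist;
};

service /albums on new http:Listener(9090) {
    resource function post .(@http:Payload NewAlbum album) returns string {
        // album has already been validated against the NewAlbum type.
        return "added " + album.title;
    }
}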

The platform can also use the service definitions to generate an IDL. For HTTP, this would be OpenAPI. The types specified for the parameters and return value are converted to a JSON schema.

This works for GraphQL in a similar way: the GraphQL Listener exposes a GraphQL service; it constructs the GraphQL schema for this service from the types in the resource methods; GraphQL introspection is used to make the schema available to clients at runtime.

For gRPC, the platform uses an IDL-first approach (the gRPC community's preferred approach). The platform allows a Ballerina service definition stub to be generated from the gRPC service definitions.

Types for clients

The platform uses two approaches to allow clients to work with typed data. The remote method on the generic client class can at runtime convert the response to a user-specified type passed as an argument.
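A sketch of this first approach with the generic HTTP client (the Album type and URL are illustrative); here the expected type is passed implicitly, inferred from the context:

import ballerina/http;

type Album record {
    string title;
    string artist;
};

public function fetchAlbums() returns Album[]|error {
    http:Client apiClient = check new ("http://localhost:9090");
    // The response payload is validated and converted to Album[] at runtime.
    Album[] albums = check apiClient->get("/albums");
    return albums;
}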

Alternatively, the platform can generate an application-specific client class from the service's IDL. Note that this supports the same graphical view as the generic client. For GraphQL, the platform can generate an application-specific typed client using a user-specified set of GraphQL queries.

Concurrency

The graphical view of a function as a sequence diagram provided by the VS Code extension shows not only how the function accesses network services, but also the concurrent logic of the function. The language's worker-based concurrency primitives are designed for this graphical view. In the sequence diagram, each worker is represented by a vertical lifeline, and a message passed between workers is represented by a horizontal arrow between the corresponding lifelines. The textual representation is more complex, and requires the compiler to pair up sends and receives. The compiler can also detect potential deadlocks. These primitives have limited expressiveness compared to the concurrency primitives offered by most languages, but are much easier and safer to use in cases where this expressiveness is sufficient.
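A sketch of the worker primitives (the worker names are illustrative): each named worker corresponds to a lifeline, and the send/receive pair corresponds to a message arrow between lifelines.

import ballerina/io;

public function main() {
    worker A {
        int n = 42;
        // Send a message to worker B; the compiler pairs this send
        // with the matching receive in B.
        n -> B;
    }
    worker B {
        // Receive the message sent by worker A.
        int received = <- A;
        io:println(received);
    }
}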

Ballerina allows programmers to make use of shared mutable state in a familiar way, yet the platform also allows user-defined services to be executed in parallel, with a compile-time safety guarantee that this will not cause data races. This leverages a combination of language features: a simple locking primitive, read-only types, and a concept of isolation. The last of these is a complex, multi-faceted feature, but the compiler can infer it within a single module. The overall effect is that the compiler can check whether a service's access to mutable state is always properly locked; if it is, then the Listener implementation allows parallel execution of that service; if not, then the compiler can tell the user that they need to add locks.
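A sketch of these features working together (the counter is illustrative): an isolated module-level variable may only be accessed inside a lock statement, which is what allows the compiler to prove that access is properly locked.

// An isolated module-level variable: the compiler requires that all
// access to it happens inside a lock statement.
isolated int requestCount = 0;

isolated function countRequest() returns int {
    lock {
        // Safe: no other strand can access requestCount concurrently.
        requestCount += 1;
        return requestCount;
    }
}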

The platform uses asynchronous IO throughout, but this is not exposed to the programmer. Async functions are not distinguished as a separate kind of function. The programmer can instead think in terms of logical threads of control, which Ballerina calls strands; these are similar to virtual threads proposed for Java, or goroutines in Go.

Supporting features

Transactions

The language provides transaction-related features, which make it easier to code robust transaction logic and enable some logic errors to be caught at compile-time. (Note that this is not transactional memory.) These rely on there being a transaction manager provided by the runtime and standard library.

The language accommodates distributed transactions by allowing service and client remote and resource methods to be transaction-aware. It also supports a form of compensation by allowing participants in a distributed transaction to register code to be run when a distributed transaction completes. What makes distributed transactions work is not so much the language as the runtime and standard libraries: these provide a distributed transaction manager and support for transactions in the HTTP listener and client implementation.
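A minimal sketch of the local transaction statement (debit and credit are hypothetical stand-ins for real work); the compiler checks that every successful path through the block reaches a commit:

function debit() returns error? {
    // hypothetical stand-in for real work
}

function credit() returns error? {
    // hypothetical stand-in for real work
}

public function transfer() returns error? {
    transaction {
        // If either step fails, the transaction is rolled back and the
        // error is returned from the function.
        check debit();
        check credit();
        check commit;
    }
}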

Databases

The standard library provides support for accessing SQL databases. A SQL database is accessed using a client object. Database transactions integrate with the language's transaction features by making the remote methods on the SQL client object be transactional.

Data with types defined by the SQL schema are transformed into Ballerina values having user-defined record types using the same language features that other network clients use to transform data received from the server.

SQL queries are represented using a template feature similar to JavaScript template literals. This allows Ballerina values representing query parameters to be automatically converted to SQL values.

The language-provided stream type is used to return the results of a query. The language-integrated query feature can be applied directly to streams, allowing program code to further refine the query results or combine them with the results of queries on other databases, without having to keep the full result in memory.
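A sketch combining these pieces (the module names, table, and record type are illustrative):

import ballerina/sql;
import ballerinax/mysql;

type Employee record {
    string name;
    string dept;
};

public function namesInDept(mysql:Client db, string dept) returns string[]|error {
    // The backtick template keeps dept as a query parameter rather than
    // splicing it into the SQL text.
    stream<Employee, sql:Error?> results =
        db->query(`SELECT name, dept FROM employees WHERE dept = ${dept}`);
    // Language-integrated query consumes the stream without materializing
    // the full result set first.
    return check from Employee e in results
        select e.name;
}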

Configuration data

Most real-life programs need access to configuration data at runtime. Ballerina has language support for this. The language support consists just of allowing specific module-level variables to be declared as configurable; there can be a default for the value or it can be required to be specified in the configuration. The runtime uses a TOML file to initialize configurable variables.
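A sketch of configurable variables (the names are illustrative):

// Values are supplied via Config.toml at startup, for example:
//   port = 9090
//   dbUrl = "postgresql://localhost/mydb"
configurable int port = 8080;    // has a default, so configuration is optional
configurable string dbUrl = ?;   // no default, so it must be configured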

Although this is a very simple language feature, it combines with other Ballerina language features (types, plain data, read-only) to provide a powerful capability: the structure and type of all configuration input to a program is known at compile time, which greatly facilitates the management of the data by higher-level layers.

Language-integrated query

Ballerina provides a language-integrated query feature, which is a generalization of the list comprehensions found in many programming languages. The syntax is similar to C# LINQ declarative query syntax. But whereas the semantics of the C# LINQ syntax are defined in terms of a desugaring into method calls, the semantics of the Ballerina query syntax (which are inspired by XQuery FLWOR expressions) are defined directly in terms of operations on Ballerina's built-in collection types.

The table collection type and query are designed to work nicely together. A table is similar to a list of records with a primary key; list comprehensions extend more smoothly to tables than to maps, where the key and value are separate. Queries have a join clause that turns into a hash join when used with tables.

Query allows many data transformations to be written in a declarative way, using expressions rather than statements, which enables a graphical user interface based on data flow.
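A sketch of a query expression with a join (the record types and threshold are illustrative):

type Employee record {
    string name;
    int deptId;
    decimal salary;
};

type Department record {
    int id;
    string name;
};

function wellPaid(Employee[] employees, Department[] departments) returns string[] {
    // A declarative pipeline: filter, join, order, project.
    return from var e in employees
        where e.salary > 100000d
        join var d in departments on e.deptId equals d.id
        order by e.name
        select e.name + " (" + d.name + ")";
}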

XML

Ballerina has a separate xml data-type, modeled after XQuery, which also counts as plain data.

The platform supports two ways of serializing xml values. When the entire network message is XML, then the xml value is serialized as an XML document. When an xml value is included within a structure serialized as JSON, the xml value is serialized as a JSON string. This is convenient when the xml value is being used to represent HTML.

The language-integrated query feature also works with xml: XML structures can be used as input and/or output to a query. This combines with a specialized XPath-like XML-navigation syntax.
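A sketch of xml values with navigation and query (the document is illustrative):

import ballerina/io;

public function main() {
    xml catalog = xml `<catalog><book><title>TAOCP</title></book><book><title>SICP</title></book></catalog>`;
    // XPath-like navigation: select the child elements named book.
    xml books = catalog/<book>;
    // Language-integrated query over an xml sequence, producing xml.
    xml titles = from var book in books
        select book/<title>;
    io:println(titles);
}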

Future

The long-term vision for Ballerina includes a number of important features which are not yet implemented, but which have a foundation in existing language features.

  • Event streams (unbounded streams of records with timestamps). Be able to both generate them and query them (using various kinds of windows). Related to this is more first-class support for a client subscribing to a stream of events from the server.
  • Network security. Language support to help the user avoid network security problems (we have experimented with a feature similar to tainting in Perl); this can leverage the explicitness of network interactions in Ballerina.
  • Service choreography. Be able to write a single description that describes how multiple services interact and use that to derive the types of individual services. This could handle services implemented in other programming languages by using Ballerina service types as an IDL.
  • Workflow. Support long-running process execution. Be able to suspend a program and later resume it as a result of an incoming network message. This also requires that transactions get better support for compensation.

2019-09-12

Ballerina Programming Language - Part 1: Concept

In the previous post, I talked about the context for Ballerina. In this post, I want to explain what kind of programming language it is. We can summarize this as a number of design goals:

  1. Provide abstractions for networking
  2. Use sequence diagrams as the visual model
  3. Minimize cognitive load
  4. Leverage familiarity
  5. Enable a semantically-rich static program model
  6. Provide a complete platform, not just a language
  7. Allow multiple implementations, based on different runtime environments

This is not exactly the list we started out with: it has evolved in the light of experience.

I will talk about each of these points in turn. The first two of these are a bit different. They correspond to the two fundamental features that make Ballerina unique.

Networking

The primary function of an ESB is to send and receive network messages. So language-level support for this is central to the Ballerina project.

On the sending side, the key abstraction is a remote method. A remote method is part of a client object; there is a distinct syntax for calling remote methods. A program sends a message by calling a remote method; the return value of the remote method can describe the response to that message. Implementation of remote methods is typically provided by library code or is auto-generated; each protocol will have its own client object implementation. Application code calls remote methods on client objects.

On the receiving side, the key abstraction is a resource method; a resource method is part of a service. Application code provides services by implementing resource methods. This works in conjunction with listener objects. Implementation of listener objects is typically provided by library code; each protocol will have its own listener object implementation. Listener objects call resource methods on services provided by application code.

There's a final twist that ties together the sending and receiving side: resource methods are typically passed a client object to allow them to send messages back to the client.

So what is gained by providing language-level support for networking?

Most importantly, it enables a visual model that shows the program's behaviour in terms of these abstractions: the visual model can show how the program interacts using network messages. This links up with the second unique feature of Ballerina - the use of sequence diagrams as the visual model.

It also provides a purpose-designed syntax, which does not require the developer to jump through a series of hoops. You use the language-provided syntax and it just works. It's as easy as writing a function. This is not all that important for a large program. But many programs that perform integration tasks are small, and with small programs reducing the ceremony matters. You could compare this with how AWK takes care of opening files and iterating over each line of the file: it's not hard to do, but for a small program the fact that AWK takes care of this for you is a significant convenience.

Related to this is that Ballerina's model of program execution incorporates the concept of running as a service. You don't have to write an explicit loop waiting for network requests until you get a signal. The language runtime deals with all that for you. Again, not revolutionary, but it makes a difference.

The final advantage relates to typing. At the moment the type of a resource method is just a function type, and the type of a service is just a collection of these function types. But we want to do better than this. The type of a resource method should capture not just the type of its parameters and return value, but the type of the messages that it expects to receive, and the type of the message that it will send in response. (The former is at the moment partially captured by annotations, which can be generated from, for example, Swagger/OpenAPI descriptions.) Furthermore, the type of a service should capture not just the type of each message exchange separately, but also the relationship between the exchanges. This is usually called session typing and is an active area of research.

Sequence diagrams

WSO2's experience from working with customers over many years has been that drawing a sequence diagram is typically the best way to describe visually how services interact. An ESB's visual model is typically based on a dataflow model, which works well for simple cases but is not as expressive. So one big idea underlying Ballerina is that you should be able to visualize a function or program as a sequence diagram.

It is important to understand that the visualization of Ballerina code as a sequence diagram is not simply a matter of tooling that is layered on top of the Ballerina language. It took me a long time to really grok Sanjiva's concept for how the language relates to sequence diagrams. My initial reaction was that it seemed to me like a category error. Sequence diagrams are just a kind of picture. What's that got to do with the syntax and semantics of a programming language?

The concept is to design the syntax and semantics of the language's abstractions for sending network messages, for in-process message passing and for concurrency so that they have a close correspondence to sequence diagrams. This enables a bidirectional mapping between the textual representation of a function in Ballerina syntax and the visual representation of the function as a sequence diagram. The sequence diagram representation fully shows the behaviour of the function as it relates to concurrency and network interaction.

The closest analogy I can think of is Visual Basic. The visual model of a UI as a form is integrated with the language semantics to make writing a Windows GUI application much easier than before. Ballerina is trying to do something similar but for a different domain. You could think of it as Visual Basic for the Cloud.

Cognitive load

Programming languages differ in the demands they make of a programmer. One way to look at this is in terms of different developer personas, such as Microsoft's Einstein, Elvis and Mort personas. But it's hard to do that without implying that one kind of developer is inherently superior to another, and I don't think that's a helpful way to look at things. I prefer to think of it like this: a programming language both gives and takes. It gives abstractions to make it convenient to express solutions, and it gives the ability to detect classes of errors at compile time. But it takes intellectual effort to understand the abstractions that are provided and to fit the solution into those abstractions. In other words, it relieves the programmer of some of the cognitive load required to write and maintain a program, but it also imposes its own cognitive load. Every programming language needs to strike a balance between what it gives and what it takes that is appropriate for the kind of program for which it is intended to be suitable.

For Ballerina, the goal has been for it to make only modest demands of the programmer. Integration tasks are often quite mundane; people just want to get things working and move on. But these integrations, although mundane, can be critically important to a business: so they need to be reliable and they need to be maintainable. So the language tries to nudge programmers in the direction of doing things in a reliable and maintainable way.

Familiarity

One way to reduce cognitive load is to take advantage of people's familiarity with programming languages. Specifically, Ballerina tries to take advantage of familiarity with programming languages in the C family, such as C, C++, Java, JavaScript and C#. This applies to both syntax and semantics. It is not a hard and fast rule, but a guideline: don't be different from C without a good reason, and elegance does not by itself count as a good reason. A good example would be the rules for operator precedence: the C rules are quite a bit different from what I would design if I was starting from scratch, but the benefits from better rules just aren't enough to make it worth being different from all the other languages in the C family.

Semantically-rich static program model

I have struggled to find the right phrase to describe this. It is a generalization of static typing. The idea is that the language should enable programs to describe their semantic properties in a machine-readable way. The objective is to enable tools to construct a model of the program that incorporates these properties, and then use that model to help the developer write correct programs. This ties up with the cognitive load point. A semantically-rich model enables more powerful tools, which help reduce the effective cognitive load on the programmer.

"Static" means that a tool can build a model of the program just by analysing the source code, without needing to execute the program. Often this is called "compile-time", but that doesn't seem appropriate for the way an IDE will use this model. Visual Basic used to call it "design time", but that seems a bit narrow too: continued maintenance is just as important as initial design.

For types, it means we want a static type system. But our approach to static typing is pragmatic. The static type system is there to help the programmer. We don't want the static system to be so sophisticated or so inflexible that it becomes an obstacle to writing programs. The goal is not to statically type as much as possible, but to statically type to the extent that it is likely to be helpful to the programmer writing the kinds of program for which we intend Ballerina to be used.

Types are just one kind of semantic richness. There are many others.

  • Sequence diagrams depend on building a model of the program where sends and receives are matched up.
  • Documentation that is structured, not simply free-form comments, can be checked for consistency with the program, and can be made available through the IDE.
  • Properties of services and listeners can be used to automate deployment of Ballerina programs to the cloud.

Platform

There's a distinction between the core of a programming language, which defines the syntax of the language and the semantics of that syntax, and the surrounding ecosystem. Often the core comes first, and the ecosystem develops organically as the core gains popularity. A lot of the utility of any language comes from the surrounding ecosystem.

In Ballerina, we refer to the core as "the language" - it's the part that's defined in the language specification. With Ballerina, the language has been designed in conjunction with key components of the surrounding ecosystem, which we call the "platform".

The platform includes:

  • a standard library
  • a centralized module repository, and the tooling needed to support that
  • a documentation system (based on Markdown)
  • a testing framework
  • extensions/plug-ins for popular IDEs (notably Visual Studio Code).

This all takes a lot of work, and is a big factor in why Ballerina has required such a large investment of resources from WSO2.

Multiple implementations

Although the current implementation of Ballerina compiles to JVM byte codes, Ballerina is emphatically not a JVM language. We are planning to do an implementation that compiles directly to native code and we've started to look at using LLVM for this. I suspect that an implementation targeting WebAssembly will also be important long-term.

We have been careful to ensure that the language semantics, particularly as regards concurrency, are not tied to the JVM. This was part of the motivation for the initial proof-of-concept implementation approach, which compiled into bytecode for its own virtual machine (BVM), which was then interpreted by a runtime written in Java. Although the 1.0 implementation compiles directly to the JVM, it is not an entirely straightforward mapping; it takes some tricks to implement the Ballerina concurrency semantics (similar to how Kotlin implements coroutines).

There are languages that are defined by an implementation and there are languages defined by a specification. For a language with multiple implementations, it is much better if the language is defined by a specification, rather than by the idiosyncrasies of a particular implementation. The Ballerina language is defined by its specification. This specification does not in any way depend on Java.

Initially, the specification was a partial description of the implementation. But now we have evolved to a situation where the implementation is done based on the specification. From a language design point of view, we are ready for multiple implementations. It is "just" a matter of finding resources to do the implementation. One of my hopes in writing this sequence of blog posts is that somebody outside WSO2 will feel inspired to do their own implementation. We would be more than happy to work with anybody who wants to take this on.

Conclusion

There has long been a distinction, originally due to John Ousterhout, between systems programming languages, and scripting or glue languages. Systems programming languages are statically typed, high performance and designed for programming in the large. Scripting/glue languages are dynamically typed, low performance and designed for programming in the small.

There are elements of truth in this distinction, but I see it more as a spectrum than as a dichotomy, and I see Ballerina as being somewhere in the middle of that spectrum. It has static typing, but it's much less rigid than the kind of static typing that systems programming languages have. It is capable of decent performance: it should be possible to make it quite a bit faster than Python, but it will never rival Rust or C++. It's not designed for programs with hundreds of thousands of lines of code, but it's also not designed for one-liners. Here's how I would place Ballerina on the spectrum relative to some other languages:

  • Assembly
  • Rust, C
  • C++
  • Go, Java, C#
  • Ballerina
  • TypeScript
  • Python, JavaScript
  • PowerShell, Bourne shell, TCL, AWK

Go is a bit hard to place relative to Java/C#. In some ways, it's more on the systems side (no VM); in some ways, it's more on the scripting side (typing). I would put Ballerina between Go and TypeScript.

In future posts, I will get into the concrete language features that these design goals have led us to. The details of the core language are in the language specification. The rest of the platform does not yet have proper specifications, but there is lots of documentation on the web site.

2019-09-11

Ballerina Programming Language - Part 0: Context

Well, it's been 9 years since my last blog post. It's been an eventful period in real life: I got married, we have two children, I became a Thai citizen, built a house and had major back surgery.

For the last 18 months, I have been working on the design of a new programming language called Ballerina. Version 1.0 of Ballerina has just been released, so now is a good time to start explaining what it's all about. In subsequent posts, I will delve into the technical details, but in this post I want to provide some context: the "who" and the "why".

TL;DR Ballerina was designed to be the core of a language-centric, cloud-native approach to enterprise integration. It comes from WSO2, which is a major open source enterprise integration vendor. I have been working on the language design and specification. I think it has potential beyond the world of enterprise integration.

The main person behind Ballerina is Sanjiva Weerawarana. I've known Sanjiva since around 1999 (20 years!), when we were both on the W3C XSL WG doing XPath and XSLT 1.0. Sanjiva at that time was working for IBM Research (where his boss at one point was Sharon Adler, who I had worked with on the ISO DSSSL committee).

This was the era of peak XML, before JSON was invented, and people were using XML for all sorts of things for which it was not very well-suited, including SOAP and the whole Web Services stack built on top of that. Sanjiva worked on several important parts of that including WSDL and BPEL.

Around 2005, Sanjiva decided he wanted to leave IBM and start a company with some fellow IBMers. He is from Sri Lanka, and wanted to go back. At that time, I was working for the Thai government. I had persuaded them to start an open source promotion activity, and I was running that for them (one day I should write a blog about that).

On Boxing Day 2004, there was a huge tsunami in the Indian Ocean, which was a disaster for several countries including Thailand and Sri Lanka. As part of the recovery process, the Thai government had organized an international IT conference in Phuket at the beginning of 2005. Sanjiva came to talk about Sahana, which was an effort started in Sri Lanka to use open source to help with recovery from the tsunami.

On the sidelines of the conference Sanjiva pitched me the idea for the company, at that time called Serendib Systems (the word serendipity comes from Serendip, which is an old name for Sri Lanka). The idea was to do open source related to web services, based in Sri Lanka. It was at the intersection of a number of my main interests at the time (XML and open source in developing countries), and I had confidence in Sanjiva, so it wasn't a hard decision to invest.

The name was changed to WSO2 (WS as in web services, O2 as in oxygen), Sanjiva took the role of CEO and I joined the board. WSO2 has grown steadily in the 14 years since it was founded, and now has about 600 employees. It has remained an open source company and it has developed a comprehensive open source enterprise integration platform. You may well never have heard of WSO2; we have always been rather better at the technical side of things than the marketing side. But we are actually a major vendor in the open source enterprise integration space, with lots of global Fortune 500 customers. In fact, there’s some Gartner report that says we are the world’s #1 open source integration vendor, although I’m not quite sure on what metric.

For quite some time, the workhorse of enterprise integration has been the Enterprise Service Bus (ESB). An ESB sends and receives network messages over a variety of transports, and there is a configuration language, typically in XML, that describes the flow of these messages. The configuration language can be seen as a domain-specific language (DSL) for integration. It supports abstractions like mediators, endpoints, proxy services and scheduled tasks, which allow a given message flow to be described at a higher-level than would be possible if the equivalent code were written in a programming language such as Java or Go. ESB products (including WSO2's) typically include a GUI for editing the configuration language. The ESB's higher-level abstractions allow for a much more useful graphical view than would be possible with a solution that was written in a programming language.

The fact that an ESB is not a full programming language has important consequences. It means that at a certain point you fall off a cliff: there are things you simply cannot express in the XML configuration language. ESBs typically deal with this by allowing you to write extensions in Java. In practice, this means that complex solutions are written as a combination of XML configuration and Java extensions. This creates a number of problems. First, the ESB is tied to Java. 10 years ago that wasn't really a problem, but increasingly Java is the new COBOL. The cool kids are interested in Go, TypeScript or Rust and would not even consider Java. Oracle's stewardship of Java does not help. Second, the Java extensions are a black box as far as the graphical interface is concerned. Third, using multiple languages creates additional complexity for many aspects of the software development process: build, deployment, debugging. Fourth, it's bad in terms of the cognitive load that it places on the developer team: the developers have to learn two quite different languages, and continually switch gears between them.

The other fundamental problem with the ESB concept is that it is designed for a centralized deployment model. The idea is that the IT department of an enterprise runs the Enterprise Service Bus for the entire enterprise. It is not only the large footprint of an ESB that pushes in this direction, but also the licensing model: ESBs are typically not cheap and are licensed on a per-server basis. If you think of the XML configuration language as a domain-specific programming language, and of the ESB as the runtime for that language, you in effect have one large program, controlling integration across the entire enterprise. Furthermore, this program is not written in a pleasant, modern programming language, with support for modularity, but is rather just a pile of XML. As you can imagine, this is not good for agility or DevOps.

This is the background that led to the creation of Ballerina. The high-level goal is to provide the foundation of a new approach to enterprise integration that is a better fit for current computing trends than the ESB. Obviously, the cloud is a hugely important part of this. The Ballerina concept evolved over a number of years. I see three stages:

  1. Let’s do a better DSL that looks more like a programming language!
  2. Let’s make it full programming language!
  3. Let’s take a shot at becoming a mainstream programming language!

Stage 2 marks the start of the Ballerina project, and was when the name was chosen; that happened in August 2016.

My first involvement with Ballerina was at the beginning of 2017, when Sanjiva asked me to help with the design of the language support for XML. But I only started to get really deeply involved in Ballerina in February 2018. At that point there was already a working, proof-of-concept implementation. Sanjiva asked me to help write a language specification.

When I started, we did not think it would take all that long for me to write a specification. We were completely wrong about that! It's been 18 months already, and it is still a work-in-progress. What happened is that as we dug into the details of the language, it became apparent that there was a lot of scope for improvement in the design. The job turned out to be more about refining and evolving the language design, rather than just documenting what had been implemented. As it became clearer that the goal was eventually to become a mainstream programming language, the quality bar for the implementation needed to be raised.

Sanjiva's primary area of expertise is distributed systems, and WSO2's collective expertise is centered around enterprise middleware, rather than programming language design and implementation. When they started the Ballerina project, I think they underestimated the enormity of the project that they had taken on, as did I to some extent. As I have been wrestling with the Ballerina language design, I have gained a much better appreciation of just how hard programming language design is. I have looked at many other programming languages for inspiration. I've been incredibly impressed by how good the current generation of programming languages are. I would particularly highlight TypeScript, Go, Rust and Kotlin. Each of them has a very different language concept, but every one of them has done an amazing job of designing a programming language that realizes their concept. I take my hat off to their designers.

I should say something about what a version of 1.0 means. First, I should explain that we make a distinction between the implementation version and the specification version. 1.0.0 is the implementation version. Language specifications are labelled chronologically (it's a living standard!). The 1.0.0 implementation is based on the language specification labelled 2019R3, which means the 3rd release of 2019.

1.0 does not mean that we have got either the language design or implementation to where we want it. If we lived in a world unsullied by commercial or competitive reality, we could easily spend a couple of years extending and improving the design and implementation. But WSO2 is not a huge company, and we have already made a very substantial investment in Ballerina (of the order of 50 engineers over 3 years). So we need to get something out there, so that we can get some proof points to justify continued investment. The benchmark for 1.0 is whether it works better for enterprise integration than our current ESB-based product. It needs to be sufficiently stable and performant that we can support it in production for enterprise customers.

We also have a reasonable degree of alignment between the language specification and the compiler: what the compiler implements is a subset of what the specification describes, with a couple of caveats. The first caveat is that there are some non-core features that are not quite stable. These are labelled "preview" in the specification. We expect to stabilise these soon, and that will involve some minor incompatible changes. The second caveat is that the implementation has some experimental features, which are not in the specification; we plan that the language will eventually include features that provide similar functionality.

The language design described by the current specification has two fundamental features that are unique (at least not part of any mainstream programming language). Its combination of other features is also unique: each feature is individually in some language, but no language has all of them. I think the language design is interesting not just for enterprise integration, but for any application which is mainly about combining services, whether consuming them or providing them. As things move to the cloud, more and more applications will fall into this category. Although the current state of the language design is interesting, I think the potential is even more interesting. Over the next year or two, we will stabilize more of the integration-oriented language features, which will make Ballerina quite different from any other programming language. Unfortunately, it takes a lot of work to get the general-purpose features solid and that has to be done before the more domain-specific features can be finalized.

Overall, the 2019R3 language design and the 1.0 implementation are an initial, stable step, but there is still a long way to go.

In future posts, I will get into the design of the language. In the meantime, you can try out the implementation and read the specification. The design process was initially quite closed, but has gradually become more open. Most of the discussion on the spec happens in issues in the spec's GitHub repository. Major new language features have public proposals. Comments and suggestions are welcome; the best way to provide input is to open a new issue.

See the next post in the series.

2010-12-18

More on MicroXML

There's been lots of useful feedback to my previous post, both in the comments and on xml-dev, so I thought I would summarize my current thinking.

It's important to be clear about the objectives. First of all, MicroXML is not trying to replace or change XML.  If you love XML just as it is, don't worry: XML is not going away.  Relative to XML, my objectives for MicroXML are:

  1. Compatible: any well-formed MicroXML document should be a well-formed XML document.
  2. Simpler and easier: easier to understand, easier to learn, easier to remember, easier to generate, easier to parse.
  3. HTML5-friendly, thus easing the creation of documents that are simultaneously valid HTML5 and well-formed XML.

JSON is a good, simple, extensible format for data.  But there's currently no good, simple, extensible format for documents. That's the niche I see for MicroXML. Actually, extensible is not quite the right word; generalized (in the SGML sense) is probably better: I mean something that doesn't build in tag names with predefined semantics. HTML5 is extensible, but it's not generalized.

There are a few technical changes that I think are desirable.

  • Namespaces. It's easier to start simple and add functionality later, rather than vice-versa, so I am inclined to start with the simplest thing that could possibly work: no colons in element or attribute names (other than xml:* attributes); "xmlns" is treated as just another attribute. This makes MicroXML backwards compatible with XML Namespaces, which I think is a big win.
  • DOCTYPE declaration.  Allowing an empty DOCTYPE declaration <!DOCTYPE foo> with no internal or external subset adds little complexity and is a huge help on HTML5-friendliness. It should be a well-formedness constraint that the name in the DOCTYPE declaration match the name of the document element.
  • Data model. It's a fundamental part of XML processing that <foo/> is equivalent to <foo></foo>.  I don't think MicroXML should change that, which means that the data model should not have a flag saying whether an element uses the empty-element syntax. This is inconsistent with HTML5, which does not allow these two forms to be used interchangeably. However, I think the goal of HTML5-friendliness has to be balanced against the goal of simple and easy and, in this case, I think simple and easy wins. For the same reason, I would leave the DOCTYPE declaration out of the data model.

Here's an updated grammar.

# Documents
document ::= comments (doctype comments)? element comments
comments ::= (comment | s)*
doctype ::= "<!DOCTYPE" s+ name s* ">"
# Elements
element ::= startTag content endTag
          | emptyElementTag
content ::= (element | comment | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
endTag ::= '</' name s* '>'
# Attributes
attribute ::= attributeName s* '=' s* attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                 | "'" ((attributeValueChar - "'") | charRef)* "'"
attributeValueChar ::= char - ('<'|'&')
attributeName ::= "xml:"? name
# Data characters
dataChar ::= char - ('<'|'&'|'>')
# Character references
charRef ::= decCharRef | hexCharRef | namedCharRef
decCharRef ::= '&#' [0-9]+ ';'
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
# Comments
comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
# Enforce the HTML5 restriction that comments cannot start with '>' or '->'
commentContentStart ::= (char - ('-'|'>')) | ('-' (char - ('-'|'>')))
# As in XML 1.0
commentContentContinue ::= (char - '-') | ('-' (char - '-'))
# Names
name ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
# White space
s ::= #x9 | #xA | #xD | #x20
# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]

2010-12-13

MicroXML

There's been a lot of discussion on the xml-dev mailing list recently about the future of XML.  I see a number of different possible directions.  I'll give each of these possible directions a simple name:

  • XML 2.0 - by this I mean something that is intended to replace XML 1.0, but has a high degree of backward compatibility with XML 1.0;
  • XML.next - by this I mean something that is intended to be a more functional replacement for XML, but is not designed to be compatible (however, it would be rich enough that there would presumably be a way to translate JSON or XML into it);
  • MicroXML - by this I mean a subset of XML 1.0 that is not intended to replace XML 1.0, but is intended for contexts where XML 1.0 is, or is perceived as, too heavyweight.

I am not optimistic about XML 2.0. There is a lot of inertia behind XML, and anything that is perceived as changing XML is going to meet with heavy resistance.  Furthermore, backwards compatibility with XML 1.0 and XML Namespaces would limit the potential for producing a clean, understandable language with really substantial improvements over XML 1.0.

XML.next is a big project, because it needs to tackle not just XML but the whole XML stack. It is not something that can be designed by a committee from nothing; there would need to be one or more solid implementations that could serve as a basis for standardization.  Also given the lack of compatibility, the design will have to be really compelling to get traction. I have a lot of thoughts about this, but I will leave them for another post.

In this post, I want to focus on MicroXML. One obvious objection is that there is no point in doing a subset now, because the costs of XML complexity have already been paid.  I have a number of responses to this. First, XML complexity continues to have a cost even when XML parsers and other tools have been written; it is an ongoing cost to users of XML and developers of XML applications. Second, the main appeal of MicroXML should be to those who are not using XML, because they find XML overly complex. Third, many specifications that support XML are in fact already using their own ad-hoc subsets of XML (eg XMPP, SOAP, E4X, Scala). Fourth, this argument applied to SGML would imply that XML was pointless.

HTML5 is another major factor. HTML5 defines an XML syntax (ie XHTML) as well as an HTML syntax. However, there are a variety of practical reasons why XHTML, by which I mean XHTML served as application/xhtml+xml, isn't common on the Web. For example, IE doesn't support XHTML; Mozilla doesn't incrementally render XHTML.  HTML5 makes it possible to have "polyglot" documents that are simultaneously well-formed XML and valid HTML5.  I think this is potentially a superb format for documents: it's rich enough to represent a wide range of documents, it's much simpler than full HTML5, and it can be processed using XML tools. There's a W3C WD for this. The WD defines polyglot documents in a slightly different way, requiring them to produce the same DOM when parsed as XHTML as when parsed as HTML; I don't see much value in this, since I don't see much benefit in serving documents as application/xhtml+xml.  The practical problem with polyglot documents is that they require the author to obey a whole slew of subtle lexical restrictions that are hard to enforce using an XML toolchain and a schema language. (Schematron can do a bit better here than RELAX NG or XSD.)

So one of the major design goals I have for MicroXML is to facilitate polyglot documents.  More precisely the goal is that a document can be guaranteed to be a valid polyglot document if:

  1. it is well-formed MicroXML, and
  2. it satisfies constraints that are expressed purely in terms of the MicroXML data model.

Now let's look in detail at what MicroXML might consist of. (When I talk about HTML5 in the following, I am talking about its HTML syntax, not its XML syntax.)

  • Specification. I believe it is important that MicroXML has its own self-contained specification, rather than being defined as a delta on existing specifications.
  • DOCTYPE declaration. Clearly the internal subset should not be allowed.  The DOCTYPE declaration itself is problematic. HTML5 requires valid HTML5 documents to start with a DOCTYPE declaration.  However, HTML5 uses DOCTYPE declarations in a fundamentally different way to XML: instead of referencing an external DTD subset which is supposed to be parsed, it tells the HTML parser what parsing mode to use.  Another factor is that almost the only thing that the XML subsets out there agree on is to disallow the DOCTYPE declaration.  So my current inclination is to disallow the DOCTYPE declaration in MicroXML. This would mean that MicroXML does not completely achieve the goal I set above for polyglot documents. However, you would be able to author a <body> or a <section> or an <article> as MicroXML; this would then have to be assembled into a valid HTML5 document by a separate process (albeit a very simple one). It would be great if HTML5 provided an alternate way (using attributes or elements) to declare that an HTML document be parsed in standards mode. Perhaps a boolean "standard" attribute on the <meta> element?
  • Error handling. Many people in the HTML community view XML's draconian error handling as a major problem.  In some contexts, I have to agree: it is not helpful for a user agent to stop processing and show an error, when a user is not in a position to do anything about the error. I believe MicroXML should not impose any specific error handling policy; it should restrict itself to specifying when a document is conforming and specifying the instance of the data model that is produced for a conforming document. It would be possible to have a specification layered on top of MicroXML that would define detailed error handling (as for example in the XML5 specification).
  • Namespaces. This is probably the hardest and most controversial issue. I think the right answer is to take a deep breath and just say no. One big reason is that HTML5 does not support namespaces (remember, I am talking about the HTML syntax of HTML5). Another reason is that the basic idea of binding prefixes to URIs is just too hard; the WHATWG wiki has a good page on this. The question then becomes how MicroXML handles the problems that XML Namespaces addresses. What do you do if you need to create a document that combines multiple independent vocabularies? I would suggest two mechanisms:
    • I would support the use of the xmlns attribute (not xmlns:x, just bare xmlns). However, as far as the MicroXML data model is concerned, it's just another attribute. It thus works in a very similar way to xml:lang: it would be allowed only where a schema language explicitly permits it; semantically it works as an inherited attribute; it does not magically change the names of elements.
    • I would also support the use of prefixes.  The big difference is that prefixes would be meaningful and would not have to be declared.  Conflicts between prefixes would be avoided by community cooperation rather than by namespace declarations.  I would divide prefixes into two categories: prefixes without any periods, and prefixes with one or more periods.  Prefixes without periods would have a lightweight registration procedure (ie a mailing list and a wiki); prefixes with periods would be intended for private use only and would follow a reverse domain name convention (e.g. com.jclark.foo). For compatibility with XML tools that require documents to be namespace-well-formed, it would be possible for MicroXML documents to include xmlns:* attributes for the prefixes it uses (and a schema could require this). Note that these would be attributes from the MicroXML perspective. Alternatively, a MicroXML parser could insert suitable declarations when it is acting as a front-end for a tool that expects a namespace-well-formed XML infoset.
  • Comments. Allowed, but restricted to be HTML5-compatible; HTML5 does not allow the content of a comment to start with > or ->.
  • Processing instructions. Not allowed. (HTML5 does not allow processing instructions.)
  • Data model.  The MicroXML specification should define a single, normative data model for MicroXML documents. It should be as simple as possible (a minimal code sketch of the model follows the end of this list):
    • The model for a MicroXML document consists of a single element.
    • Comments are not included in the normative data model.
    • An element consists of a name, attributes and content.
    • A name is a string. It can be split into two parts: a prefix, which is either empty or ends in a colon, and a local name.
    • Attributes are a map from names to Unicode strings (sequences of Unicode code-points).
    • Content is an ordered sequence of Unicode code-points and elements.
    • An element probably also needs to have a flag saying whether it's an empty element. This is unfortunate, but HTML5 does not treat an empty element as equivalent to a start-tag immediately followed by an end-tag: elements like <br> cannot have an end-tag, and elements that can have content, such as <a>, cannot use the empty-element syntax even if they happen to be empty. (It would be really nice if this could be fixed in HTML5.)
  • Encoding. UTF-8 only. Unicode in the UTF-8 encoding is already used for nearly 50% of the Web. See this post from Google.  XML 1.0 also requires support for UTF-16, but UTF-16 is not in my view used sufficiently on the Web to justify requiring support for UTF-16 but not other more widely used encodings like US-ASCII and ISO-8859-1.
  • XML declaration. Not allowed. Given UTF-8 only and no DOCTYPE declarations, it is unnecessary. (HTML5 does not allow XML declarations.)
  • Names. What characters should be allowed in an element or attribute name? I can see three reasonable choices here: (a) XML 1.0 4th edition, (b) XML 1.0 5th edition or (c) the ASCII-only subset of XML name characters (same in 4th and 5th editions). I would incline to (b) on the basis that (a) is too complicated and (c) loses too much expressive power.
  • Attribute value normalization. I think this has to go.  HTML5 does not do attribute value normalization. This means that it is theoretically possible for a MicroXML document to be interpreted slightly differently by an XML processor than by a MicroXML processor.  However, I think this is unlikely to be a problem in practice.  Do people really put newlines in attribute values and rely on their being turned into spaces?  I doubt it.
  • Newline normalization. This should stay.  It makes things simpler for users and application developers.  HTML5 has it as well.
  • Character references.  Without DOCTYPE declarations, only the five built-in character entities can be referenced. Things could be simplified a little by allowing only hex or only decimal numeric character references, but I don't think this is worthwhile.
  • CDATA sections. I think it is best to disallow them. (HTML5 allows CDATA sections only in foreign elements.) XML 1.0 does not allow the three-character sequence ]]> to occur in content. This restriction becomes even more arbitrary and ugly when you remove CDATA sections, so I think it is simpler just to require > to always be entered using a character reference in content.
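
To make the data model concrete, here is a minimal sketch of it in Python. The language and all the names are mine, purely for illustration; nothing here is part of any proposal.

from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Element:
    # A name is a string; its prefix is either empty or ends in a colon.
    name: str
    # Attributes map names to Unicode strings.
    attributes: Dict[str, str] = field(default_factory=dict)
    # Content is an ordered sequence of code points (single-character
    # strings here) and elements.
    content: List[Union[str, 'Element']] = field(default_factory=list)
    # Flag recording whether the empty-element syntax was used.
    empty: bool = False

def split_name(name: str):
    # Split a name into (prefix, local name); the prefix is either
    # empty or ends in a colon.
    colon = name.rfind(':')
    return name[:colon + 1], name[colon + 1:]

# split_name('com.jclark.foo:bar') == ('com.jclark.foo:', 'bar')
# split_name('p') == ('', 'p')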

Here's a complete grammar for MicroXML (using the same notation as the XML 1.0 Recommendation):

# Documents
document ::= (comment | s)* element (comment | s)*
element ::= startTag content endTag
          | emptyElementTag
content ::= (element | comment | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
emptyElementTag ::= '<' name (s+ attribute)* s* '/>'
endTag ::= '</' name s* '>'
# Attributes
attribute ::= name s* '=' s* attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                 | "'" ((attributeValueChar - "'") | charRef)* "'"
attributeValueChar ::= char - ('<'|'&')
# Data characters
dataChar ::= char - ('<'|'&'|'>')
# Character references
charRef ::= decCharRef | hexCharRef | namedCharRef
decCharRef ::= '&#' [0-9]+ ';'
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
# Comments
comment ::= '<!--' (commentContentStart commentContentContinue*)? '-->'
# Enforce the HTML5 restriction that comments cannot start with '>' or '->'
commentContentStart ::= (char - ('-'|'>')) | ('-' (char - ('-'|'>')))
# As in XML 1.0
commentContentContinue ::= (char - '-') | ('-' (char - '-'))
# Names
name ::= (simpleName ':')? simpleName
simpleName ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
# White space
s ::= #x9 | #xA | #xD | #x20
# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]
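
To give a feel for how little machinery the grammar needs, here is a sketch of a recursive-descent parser for its element/attribute core in Python. It omits comments and most error checking, accepts names loosely rather than implementing the nameStartChar/nameChar tables, and drops the empty-element flag from its results; it is an illustration, not a conforming processor.

import re

NAMED = {'amp': '&', 'lt': '<', 'gt': '>', 'quot': '"', 'apos': "'"}
NAME = re.compile(r'[^ \t\r\n/>=]+')   # deliberately lax

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def parse_document(self):
        # document ::= s* element s*   (comments omitted)
        self.skip_space()
        element = self.parse_element()
        self.skip_space()
        assert self.pos == len(self.text), 'junk after document element'
        return element

    def parse_element(self):
        # element ::= startTag content endTag | emptyElementTag
        assert self.text[self.pos] == '<'
        self.pos += 1
        name = self.parse_name()
        attributes = {}
        while True:
            self.skip_space()
            if self.text.startswith('/>', self.pos):
                self.pos += 2
                return (name, attributes, [])
            if self.text.startswith('>', self.pos):
                self.pos += 1
                break
            attributes[self.parse_name()] = self.parse_attribute_value()
        return (name, attributes, self.parse_content(name))

    def parse_content(self, name):
        # content ::= (element | dataChar | charRef)*  then the end-tag
        content = []
        while not self.text.startswith('</', self.pos):
            c = self.text[self.pos]
            if c == '<':
                content.append(self.parse_element())
            elif c == '&':
                content.append(self.parse_char_ref())
            else:
                content.append(c)
                self.pos += 1
        self.pos += 2
        assert self.parse_name() == name, 'mismatched end-tag'
        self.skip_space()
        assert self.text[self.pos] == '>'
        self.pos += 1
        return content

    def parse_attribute_value(self):
        # attribute ::= name s* '=' s* attributeValue
        self.skip_space()
        assert self.text[self.pos] == '='
        self.pos += 1
        self.skip_space()
        quote = self.text[self.pos]
        self.pos += 1
        value = []
        while self.text[self.pos] != quote:
            if self.text[self.pos] == '&':
                value.append(self.parse_char_ref())
            else:
                value.append(self.text[self.pos])
                self.pos += 1
        self.pos += 1
        return ''.join(value)

    def parse_char_ref(self):
        # charRef ::= decCharRef | hexCharRef | namedCharRef
        end = self.text.index(';', self.pos)
        ref = self.text[self.pos + 1:end]
        self.pos = end + 1
        if ref.startswith('#x'):
            return chr(int(ref[2:], 16))
        if ref.startswith('#'):
            return chr(int(ref[1:]))
        return NAMED[ref]

    def parse_name(self):
        m = NAME.match(self.text, self.pos)
        self.pos = m.end()
        return m.group()

    def skip_space(self):
        while self.pos < len(self.text) and self.text[self.pos] in '\t\n\r ':
            self.pos += 1

# Parser('<doc x="1">hi &amp; bye</doc>').parse_document()
# => ('doc', {'x': '1'}, ['h', 'i', ' ', '&', ' ', 'b', 'y', 'e'])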

2010-11-24

XML vs the Web

Twitter and Foursquare recently removed XML support from their Web APIs, and now support only JSON.  This prompted Norman Walsh to write an interesting post, in which he summarised his reaction as "Meh". I won't try to summarise his post; it's short and well worth reading.

From one perspective, it's hard to disagree.  If you're an XML wizard with a decade or two of experience with XML and SGML before that, if you're an expert user of the entire XML stack (eg XQuery, XSLT2, schemas), if most of your data involves mixed content, then JSON isn't going to be supplanting XML any time soon in your toolbox.

Personally, I got into XML not to make my life as a developer easier, nor because I had a particular enthusiasm for angle brackets, but because I wanted to promote some of the things that XML facilitates, including:

  • textual (non-binary) data formats;
  • open standard data formats;
  • data longevity;
  • data reuse;
  • separation of presentation from content.

If other formats start to supplant XML, and they support these goals better than XML, I will be happy rather than worried.

From this perspective, my reaction to JSON is a combination of "Yay" and "Sigh".

It's "Yay", because for important use cases JSON is dramatically better than XML.  In particular, JSON shines as a programming language-independent representation of typical programming language data structures.  This is an incredibly important use case and it would be hard to overstate how appallingly bad XML is for this. The fundamental problem is the mismatch between programming language data structures and the XML element/attribute data model of elements. This leaves the developer with three choices, all unappetising:

  • live with an inconvenient element/attribute representation of the data;
  • descend into XML Schema hell in the company of your favourite data binding tool;
  • write reams of code to convert the XML into a convenient data structure.

With JSON, by contrast, especially in a dynamic programming language, you can get a reasonable in-memory representation just by calling a library function.
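
To make the contrast concrete, here is the JSON side in Python (the payload is an invented example):

import json

payload = '{"id": 42, "tags": ["xml", "json"], "author": {"name": "John"}}'
data = json.loads(payload)      # one library call
print(data["author"]["name"])   # prints: John

# The same information as XML would typically arrive as something like
#   <entry id="42"><tag>xml</tag><tag>json</tag>
#     <author><name>John</name></author></entry>
# and you would still have to write code (or configure a data binding
# tool) to decide which attributes and repeated elements map to which
# fields and lists, and how to turn "42" into a number.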

Norman argues that XML wasn't designed for this sort of thing. I don't think the history is quite as simple as that. There were many different individuals and organisations involved with XML 1.0, and they didn't all have the same vision for XML. The organisation that was perhaps most influential in terms of getting initial mainstream acceptance of XML was Microsoft, and Microsoft was certainly pushing XML as a representation for exactly this kind of data. Consider SOAP and XML Schema; a lot of the hype about XML and a lot of the specs built on top of XML for many years were focused on using XML for exactly this sort of thing.

Then there are the specs. For JSON, you have a 10-page RFC, with the meat being a mere 4 pages. For XML, you have XML 1.0, XML Namespaces, XML Infoset, XML Base, xml:id, XML Schema Part 1 and XML Schema Part 2. Now you could actually quite easily take XML 1.0, ditch DTDs, add XML Namespaces, xml:id, xml:base and XML Infoset and end up with a reasonably short (although more than 10 pages), coherent spec. (I think Tim Bray even did a draft of something like this once.) But in 10 years the W3C and its membership have not cared enough about simplicity and coherence to take any action on this.

Norman raises the issue of mixed content. This is an important issue, but I think the response of the average Web developer can be summed up in a single word: HTML. The Web already has a perfectly good format for representing mixed content. Why would you want to use JSON for that?  If you want to embed HTML in JSON, you just put it in a string. What could be simpler? If you want to embed JSON in HTML, just use <script> (or use an alternative HTML-friendly data representation such as microformats). I'm sure Norman doesn't find this a satisfying response (nor do I really), but my point is that appealing to mixed content is not going to convince the average Web developer of the value of XML.

There's a bigger point that I want to make here, and it's about the relationship between XML and the Web.  When we started out doing XML, a big part of the vision was about bridging the gap from the SGML world (complex, sophisticated, partly academic, partly big enterprise) to the Web, about making the value that we saw in SGML accessible to a broader audience by cutting out all the cruft. In the beginning XML did succeed in this respect. But this vision seems to have been lost sight of over time, to the point where there's a gulf between the XML community and the broader Web developer community; all the stuff that's been piled on top of XML, together with the huge advances in the Web world in HTML5, JSON and JavaScript, has combined to make XML perceived as an overly complex, enterprisey technology that doesn't bring any value to the average Web developer.

This is not a good thing for either community (and it's why part of my reaction to JSON is "Sigh"). XML misses out by not having the innovation, enthusiasm and traction that the Web developer community brings with it, and the Web developer community misses out by not being able to take advantage of the powerful and convenient technologies that have been built on top of XML over the last decade.

So what's the way forward? I think the Web community has spoken, and it's clear that what it wants is HTML5, JavaScript and JSON. XML isn't going away but I see it being less and less a Web technology; it won't be something that you send over the wire on the public Web, but just one of many technologies that are used on the server to manage and generate what you do send over the wire.

In the short-term, I think the challenge is how to make HTML5 play more nicely with XML. In the longer term, I think the challenge is how to use our collective experience from building the XML stack to create technologies that work natively with HTML, JSON and JavaScript, and that bring to the broader Web developer community some of the good aspects of the modern XML development experience.

2010-02-12

A tour of the open standards used by Google Buzz

The thing I find most attractive about Google Buzz is its stated commitment to open standards:

We believe that the social web works best when it works like the rest of the web — many sites linked together by simple open standards.

So I took a bit of time to look over the standards involved. I’ll focus here on the standards that are new to me.

One key design decision in Google Buzz is that individuals in the social web should be identifiable by email addresses (or at least strings that look like email addresses).  On balance I agree with this decision: although it is perhaps better from a purist Web architecture perspective to use URIs for this, I think email addresses work much better from a UI perspective.

Google Buzz therefore has some standards to address the resulting discovery problem: how to associate metadata with something that looks like an email address. There are two key standards here:

  • XRD. This is a simple XML format developed by the OASIS XRI TC for representing metadata about a resource in a generic way. This looks very reasonable and I am happy to see that it is free of any XRI cruft. It seems quite similar to RDDL.
  • WebFinger. This provides a mechanism for getting from an email address to an XRD file.  It’s a two-step process based on HTTP.  First you do an HTTP GET on a well-known URI constructed using the domain part of the email address (the well-known URI follows the Defining Well-Known URIs and host-meta Internet Drafts). This per-domain XRD file provides (amongst other things) a URI template that tells you how to construct a URI for an email address in that domain; dereferencing this URI will give you an XRD representation of metadata related to that email address.  There have been some noises about a JSON serialization, which makes sense: JSON seems like a good fit for this problem. A sketch of the two-step lookup follows this list.
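
Here is a sketch of the lookup in Python. The XRD handling is deliberately naive, and the "lrdd" rel value, the "{uri}" template placeholder and the acct: prefix are my reading of the drafts, not something normative:

import urllib.request
import xml.etree.ElementTree as ET

XRD_NS = '{http://docs.oasis-open.org/ns/xri/xrd-1.0}'

def webfinger(email):
    domain = email.split('@')[1]
    # Step 1: fetch the per-domain XRD from the well-known URI.
    host_meta = urllib.request.urlopen(
        'https://%s/.well-known/host-meta' % domain).read()
    # Step 2: find the URI template for per-user lookups, expand it,
    # and dereference it to get the XRD for the individual.
    for link in ET.fromstring(host_meta).iter(XRD_NS + 'Link'):
        if link.get('rel') == 'lrdd' and link.get('template'):
            user_uri = link.get('template').replace('{uri}', 'acct:' + email)
            return urllib.request.urlopen(user_uri).read()
    raise LookupError('no per-user template for ' + domain)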

One of the many interesting things you can do with such a discovery mechanism is to associate a public key with an individual.  There’s a spec called Magic Signatures that defines this.  Magic Signatures correctly eschews all the usual X.509 cruft, which is completely unnecessary here; all you need is a simple RSA public key.  My one quibble would be that it invents its own format for public keys, when there is already a perfectly good standard format for this: the DER encoding of the RSAPublicKey ASN.1 structure (defined by RFC 3447/PKCS#1), as used by eg OpenSSL.

Note that for this to be secure, WebFinger needs to fetch the XRD files in a secure way, which means either using SSL or signing the XRD file using XML-DSig; in both these cases it is leveraging the existing X.509 infrastructure. The key architectural decision here is to use the X.509 infrastructure to establish trust at the domain level, and then to use Web technologies to extend that chain of trust from the domain to the individual. From a deployment perspective, I think this will work well for things like Gmail and Facebook, where you have many users per domain.  The challenge will be to make it work well for things like Google Apps for your Domain, where the number of users per domain may be few.  At the moment, Google Apps requires the domain administrator only to set up some DNS records.  The problem is that DNS isn’t secure (at least until DNSSEC is widely deployed).  Here’s one possible solution: the user’s domain (e.g. jclark.com) would have an SRV record pointing to a host in the provider’s domain (e.g. foo.google.com); the XRD is fetched using HTTP, but is signed using XML-DSig and an X.509 certificate for the user’s domain.  The WebFinger service provider (e.g. Google) would take care of issuing these certificates, perhaps with flags to limit their usage to WebFinger (Google already verifies domain control as part of the Google Apps setup process). The trusted roots here might be different from the normal browser-vendor-determined HTTPS roots.

The other part of Magic Signatures is billed as a simpler alternative to XML-DSig which also works for JSON. The key idea here is to avoid the whole concept of signing an XML information item and thus avoid the need for canonicalization.  Instead you sign a byte sequence, which is encoded in base64 as the content of an XML element (or as a JSON string).  I don’t agree with the idea of always requiring base64 encoding of the content to be signed: that seems to unnecessarily throw away many of the benefits of a textual format.  Instead, when the byte sequence that you are signing is representing a Unicode string, you should be able to represent the Unicode string directly as the content of an XML element or as a JSON string, using the built-in quoting mechanisms of XML (character references/entities and CDATA sections) or JSON. The Unicode string that results from XML or JSON parsing would be UTF-8 encoded before the standard signature algorithm is applied. A more fundamental problem with Magic Signatures is that it loses the key feature of XML-DSig (particularly with enveloped signatures) that applications that don’t know or care about signing can still understand the signed data, simply by ignoring the signature.  I completely sympathize with the desire to avoid the complexity of XML-DSig, but I’m unconvinced that Magic Signatures is the right way to do so. Note that XRD has a dependency on XML-DSig, but it specifies a very limited profile of XML-DSig, which radically reduces the complexity of XML-DSig processing. For JSON, I think i

There are also standards that extend Atom. The simplest are just content extensions:

  • Atom Activity Extensions provides semantic markup for social networking activities (such as "liking" something or posting something). This makes good sense to me.
  • Media RSS Module provides extensions for dealing with multimedia content. These were originally designed by Yahoo for RSS. I don't yet understand how these interact with existing Atom/AtomPub mechanisms for multimedia (content/@src, link).

There are also protocol extensions:

  • PubSubHubbub provides a scalable way of getting near-realtime updates from an Atom feed. The Atom feed includes a link to a “hub”.  An aggregator can then register with the hub to be notified when a feed is updated. When a publisher updates a feed, it pings the hub and the hub then updates all the aggregators that have registered with it.  This is intended for server-based aggregators, since the hub uses HTTP POST to notify aggregators. A sketch of the subscribe and publish requests follows this list.
  • Salmon makes feed aggregation two-way.  Suppose user A uses only social networking site X and user B uses only social networking site Y. If user A wants to network with B, then typically either A has to join Y or B has to join X.  This pushes the world in the direction of having one dominant social network (i.e. Facebook). In the long-term I don’t think this is a good thing.  The above extensions solve part of the problem. X can expose a profile for A that links to an Atom feed, and Y can use this to provide B with information about A. But there’s a problem.  Suppose B wants to comment on one of A’s entries.  How can Y ensure that B’s comment flows back to X, where A can see it?  Note that there may be another user C on another social networking site Z that may want to see B’s comment on A’s entry. The basic idea is simple: the Atom feed for A exposed by X links to a URI to which comments can be posted.  The heavy lifting of Salmon is done by Magic Signatures.  Signing the Atom entries is the key to allowing sites to determine whether to accept comments.
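
Here, roughly, is what the two PubSubHubbub requests look like, sketched in Python. The hub URL is invented, and the parameter names are from my reading of the 0.3 draft:

import urllib.parse
import urllib.request

HUB = 'https://hub.example.com/'   # taken from the feed's hub link

def subscribe(topic_url, callback_url):
    # The aggregator asks the hub to POST updates of topic_url to
    # callback_url; the hub then verifies the callback out of band.
    body = urllib.parse.urlencode({
        'hub.mode': 'subscribe',
        'hub.topic': topic_url,
        'hub.callback': callback_url,
        'hub.verify': 'async',
    }).encode()
    urllib.request.urlopen(HUB, data=body)

def publish_ping(topic_url):
    # The publisher tells the hub that topic_url has changed; the hub
    # re-fetches the feed and POSTs new entries to every registered
    # callback.
    body = urllib.parse.urlencode({
        'hub.mode': 'publish',
        'hub.url': topic_url,
    }).encode()
    urllib.request.urlopen(HUB, data=body)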

Google seems to be planning to use the Open Web Foundation (OWF) for some of these standards.  Although the OWF’s list of members includes many names that I recognize and respect, I don’t really understand why we need the OWF. It seems very similar to the IETF in its emphasis on individual participation.  What was the perceived deficiency in the IETF that motivated the formation of the OWF?

2010-02-06

Mac Day 1

I decided to dip my toe in the Mac world and buy a Mac mini. If I decide to make the switch, I will probably end up getting a fully tricked out MacBook Pro, but I'm not ready for that yet and I want to wait for the expected MacBook Pro refresh.

I've been using it for 24 hours.

Likes

  • The hardware is beautiful. The attention to detail is fantastic. Somebody has taken the time to think about even something as mundane as the power cord (it's less stiff than normal power cords and curls nicely). The whole package exudes quality.
  • It's reassuring to have something Unix-like underneath.
  • Mostly things "just work".
  • The dock is quite pretty and intuitive.
  • Set up was smooth and simple.

Dislikes

  • The menu bar is an abomination. When you have a large screen, it makes no sense to have the menus always at the top left of the screen, which may well be far from the application window.
  • On-screen font rendering seems less good than on Windows. I notice this particularly in Safari. It's tolerable, but the Mac is definitely a step down in quality here.
  • I was surprised how primitive the application install, update and removal experience was. I miss apt-get. Many updates seem to require a restart.
  • I don't like the wired Apple mouse. Although it looks nice, clicking is not as easy as with a cheap, conventional mouse, plus the lead is way too short.

Minor nits

  • How is a new user supposed to find the web browser? The icon is a compass (like the iPhone icon that gives a real compass) and the tooltip says "Safari".
  • A Safari window with tabs looks ugly to me: there's this big band of gray and black at the top of the window.
  • Not convinced DisplayPort has sufficient benefits over HDMI to justify a separate standard.
  • I couldn't find a way of playing a VCD using the standard applications. I ended up downloading VLC, which worked fine.
  • The Magnification preference on the Dock was not on by default, even though it was enabled in the introductory Apple video.

So far I've installed:

  • NeoOffice
  • Adium (didn't work well with MSN, which is the dominant chat system in Thailand, so I will probably remove it)
  • Microsoft Messenger
  • Emacs
  • Blogo, which I am using to write this. Is there a better free equivalent to Windows Live Writer?
  • VLC
  • Skype

I plan to install

  • XCode
  • iWork

Any other software I should install? Should I be using something other than Safari as my Web browser?


2010-01-02

XML Namespaces

One of my New Year’s resolutions is to blog more.  I don’t expect I’ll have much more success with this than I usually do with my New Year’s resolutions, but at least I can make a start.

I have been continuing to have a dialog with some folks at Microsoft about M.  This has led me to do a lot of thinking about what is good and bad about the XML family of standards.

The standard I found it most hard to reach a conclusion about was XML Namespaces.  On the one hand, the pain that is caused by XML Namespaces seems massively out of proportion to the benefits that they provide.  Yet every step in the process that led to the current situation with XML Namespaces seems reasonable.

  1. We need a way to do distributed extensibility (somebody should be able to choose a name for an element or attribute that won’t conflict with anybody else’s name without having to check with some central naming authority).
  2. The one true way of naming things on the Web is with a URI.
  3. XML is supposed to be human readable/writable so we can’t expect people to put URIs in every element/attribute name, so we need a shorter human-friendly name and a way to bind that to a URI.
  4. Bindings need to nest so that XML Namespace-generating processes can stream, and so that one document can easily be embedded in another.
  5. XML Namespace processing should be layered on top of XML 1.0 processing.
  6. Content and attribute values can contain strings that represent element and attribute names; these strings should be handled uniformly with names that the XML parser recognizes as element and attribute names.

I would claim that the aspect of XML Namespaces that causes pain is the URI/prefix duality: the thing that occurs in the document (the prefix + local name) is not the same as the thing that is semantically significant (the namespace URI + local name).  As soon as you accept this duality, I believe you are doomed to a significant extra layer of complexity.

The need for this duality stemmed from the use of URIs for names. As far as I remember, there was actually no discussion in the XML WG on this point when we were doing XML Namespaces: it was treated as axiomatic that URIs were the right thing to use here. But this is where I believe XML Namespaces went wrong.

From a purely practical point of view, the argument for naming namespaces with URIs is that you can do a GET on the URI and get something human- or machine-readable back that tells you about the semantics of the namespace.  I have two responses to this:

  • This is a capability that is occasionally useful, but it’s not that useful.  The utility here is of a completely different order of magnitude compared to the disutility that results from the prefix/URI duality.  Of course, if you are an RDF aficionado, you probably disagree.
  • You can make names resolvable without using URIs.  For example, a MIME-type X/Y can be made resolvable by having a convention that it maps to http://www.iana.org/assignments/media-types/X/Y; or, if you have a dotted DNS-style name (e.g. org.example.bar.foo), you can use DNS TXT records to make it resolvable.
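
To make this concrete, here is a sketch of both conventions in Python (the TXT-record step is left as a comment, since the exact convention would need to be defined):

def mime_type_uri(mime_type):
    # By the convention suggested above, a MIME type X/Y resolves to
    # the corresponding IANA registry entry.
    return 'http://www.iana.org/assignments/media-types/' + mime_type

def dns_style_name_to_domain(name):
    # org.example.bar.foo -> foo.bar.example.org; a DNS TXT record at
    # that domain (under some agreed convention) could then carry a
    # pointer to documentation for the name.
    return '.'.join(reversed(name.split('.')))

# mime_type_uri('application/xml')
#   -> 'http://www.iana.org/assignments/media-types/application/xml'
# dns_style_name_to_domain('org.example.bar.foo') -> 'foo.bar.example.org'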

From a more theoretical point of view, I think the insistence on URIs for namespaces is paying insufficient attention to the distinction between instances of things and types of things.  The Web works as well as it does because there is an extraordinarily large number of instances of things (ie Web pages) and a relatively very small number of types of things (ie MIME types).  Completely different considerations apply to naming instances and naming types: both the scale and the goals are completely different.  URIs are the right way to name instances of things on the Web; it doesn’t follow that they are the right way to name types of things.

I also have a (not very well substantiated) feeling that using URIs for namespaces tends to increase coupling between XML documents and their processing.  An example is that people tend to assume that you can determine the XML schema for a document just by looking at the namespace URI of the document element.

What lessons can we draw from this?

For XML, what is done is done.  As far as I can tell, there is zero interest amongst major vendors in cleaning up or simplifying XML. I have only two small suggestions, one for XML language designers and one for XML tool vendors:

  • For XML language designers, think whether it is really necessary to use XML Namespaces. Don’t just mindlessly stick everything in a namespace because everybody else does.  Using namespaces is not without cost. There is no inherent virtue in forcing users to stick xmlns=”…” on the document element.
  • For XML tool vendors, make sure your tool has good support for documents that don’t use namespaces.  For example, don’t make the namespace URI be the only way to automatically find a schema for a document.

What about future formats?  First, I believe there is a real problem here and a format should define a convention (possibly with some supporting syntax) to solve the problem. Second, a solution that involves a prefix/URI duality is probably not a good approach.

Third, a purely registry-based solution imposes centralization in situations where there’s no need. On the other hand, a purely DNS-based solution puts all extensions on the same level, when in reality from a social perspective extensions are very different: an extension that has been standardized or has a public specification is very different from an ad hoc extension used by a single vendor.  It’s good if a technology encourages cooperation and coordination.

My current thinking is that a blend of registry- and DNS-based approaches would be nice.  For example, you might have something like this:

  • names consist of one or more components separated by dots;
  • usually names consist of a single component, and their meaning is determined contextually;
  • names consisting of multiple components are used for extensions; the initial component must be registered (the registration process can be as lightweight as adding an entry to a wiki, as the WHATWG does for HTML5 rel values);
  • there is a well-known URI for each registered initial component;
  • one registered initial component is “dns”: the remaining components are a reversed DNS name (Mark Nottingham had an Internet-Draft along these lines for MIME types); there’s some way of resolving such a name into a URI.
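
To show the shape this might take, here is a sketch in Python; the registry contents and the URI conventions are invented placeholders:

# Invented placeholder: a lightweight registry mapping initial
# components to well-known URIs (in reality a wiki or similar).
REGISTRY = {
    'xsl': 'http://example.org/ns-registry/xsl',
}

def resolve_name(name):
    components = name.split('.')
    if len(components) == 1:
        # Single-component names are interpreted contextually.
        return None
    initial, rest = components[0], components[1:]
    if initial == 'dns':
        # dns.com.jclark.foo: the rest is a reversed DNS name, so it
        # would resolve via the domain foo.jclark.com (by some
        # convention still to be defined).
        return 'http://%s/' % '.'.join(reversed(rest))
    if initial in REGISTRY:
        return REGISTRY[initial]
    raise ValueError('unregistered initial component: ' + initial)

# resolve_name('dns.com.jclark.foo') -> 'http://foo.jclark.com/'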

Some other people’s thinking on this that I’ve found helpful: Mark Nottingham, Jeni Tennison, Tim Bray (and the rest of that xml-dev thread).

2009-03-23

Getting involved with M

I spent last week in Redmond talking to Microsoft about M and Oslo.  The question at the back of my mind before I went was "Does M really have the potential in the long term to be an interesting and useful alternative to XML?".  My tentative answer is yes.  Here's why:

  • Although M, as it is today, is interesting in a number of ways, it is obviously a long way from being a serious alternative to XML (at least for the kinds of application I am interested in). One of my concerns was that I would hear "It's too late to change that". I never did: I was pleasantly surprised that Microsoft are still willing to make fundamental changes to M.
  • Microsoft recognize that M's long-term potential would be severely limited if it was a proprietary, Microsoft-only technology.  I believe they realize that M needs to end up as a genuinely open standard.  They've already made initial steps towards a more open process for M. On the other hand, they don't believe in design by committee.  (And having seen some of the abominations that design by committee can produce, I can certainly sympathise with that.) There's a senior Microsoft guy that gets to make the final call on all design decisions. In other words, it's a benevolent dictator model.  I'm OK with this in principle (although I like it even better when I'm the benevolent dictator).  I think it's worked really well in a number of cases (C# and Python spring to mind).  But obviously it all depends on the qualities of the particular benevolent dictator.  From my interactions so far, he seems like a really smart guy and he's willing to listen.
  • Microsoft is addressing the whole stack.  An alternative to XML needs to provide not only an alternative to XML itself but also alternatives to XSD/RELAX NG and XQuery/XSLT.
  • Microsoft seem to be designing things in a principled way; they are paying attention to the relevant CS theory. For example, ML seems to be a major influence. They are making an effort to produce something clean, elegant, even beautiful, rather than doing just enough to get a product out.
  • Microsoft seem willing to take documents seriously. This is a make or break issue for me, because the kind of data I care about most is documents and M, as it is today, is not useful for documents. This was probably the issue we spent the most time on; I talked a lot about the importance of mixed content. One of the Microsoft team suggested the goal of using M to do the M specification.  I think this sort of dogfooding will be very helpful in ensuring M works well for documents.

Of course, it's early days yet, and it's hard to tell how much leverage M will get, but there's enough potential to make me want to be involved.