
Just a Theory

By David E. Wheeler

⛰️ Postgres Ecosystem Summit EU

Given the success of the Extension Ecosystem Summit at PGConf.dev back in May, my colleague Floor Drees has organized a sequel, the Extension Ecosystem Summit EU on Tuesday, October 22, at the Divani Caravel Hotel in Athens. That’s “Day 0” at the same hotel as PGConf.eu. Tembo, Percona, Xata, and Timescale co-sponsor.

While the May event took the form of an open-space technology (OST)-style unconference aimed at extension developers, the EU event aims to inform an audience of Postgres users about the history and some exemplary use cases for extensions. From the invite:

Join us for a gathering to explore the current state and future of Postgres extension development, packaging, and distribution. Bring your skills and your devices and start contributing to tooling underpinning many large Postgres installations.

  • Jimmy Angelakos - pg_statviz: pg_statviz is a minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
  • Adam Hendel (Tembo) - pgmq: pgmq is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. Adam has been pgmq’s maintainer since 2023, and will present a journey from pure Rust → pgrx → pl/pgsql.
  • Alastair Turner (Percona) - pg_tde: pg_tde offers transparent encryption of table contents at rest, through a Table Access Method extension. Percona has developed pg_tde to deliver the benefits of encryption at rest without requiring intrusive changes to the Postgres core.
  • Gülçin Yıldırım Jelínek (Xata) - pgzx: pgzx is a library for developing PostgreSQL extensions written in Zig.
  • Mats Kindahl (Timescale) - TimescaleDB (C), pgvectorscale (Rust), and pgai (Python): maintaining extensions written in different languages.

I will also deliver the opening remarks, including a brief history of Postgres extensibility. Please join us if you’re in the area or planning to attend PGConf.eu. See you there!

Release: pgxn_meta v0.1.0

Following the recent spate of work drafting RFCs for a binary distribution format and Meta Spec v2, and bearing in mind the vote to implement PGXN v2 apps in Rust, over the last few weeks I’ve switched gears to write some code.

First I wrote JSON Schemas for the v1 spec and then wrote a full test suite using the boon crate. Next I wrote and tested JSON Schemas for the v2 spec. This process informed revisions to the RFC, uncovering inconsistencies and unnecessary complexities.
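
For a sense of what that validation looks like, here’s a minimal sketch using boon and serde_json. The schema path and file names here are placeholders, not the layout of the actual project:

use boon::{Compiler, Schemas};
use serde_json::Value;
use std::{error::Error, fs::File};

fn main() -> Result<(), Box<dyn Error>> {
    // Compile the distribution schema. The path is a placeholder.
    let mut schemas = Schemas::new();
    let mut compiler = Compiler::new();
    let idx = compiler.compile("schema/v2/distribution.schema.json", &mut schemas)?;

    // Load the META.json to validate and report the result.
    let meta: Value = serde_json::from_reader(File::open("META.json")?)?;
    match schemas.validate(&meta, idx) {
        Ok(()) => println!("META.json is OK"),
        Err(e) => eprintln!("{e}"),
    }
    Ok(())
}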

With META.json file JSON Schema validation fully worked up, I decided to work the functionality into a proper Rust crate and CLI to eventually replace the old PGXN::Meta::Validator Perl module and its validate_pgxn_meta CLI. This turned out to be a decent Rust starter project, requiring a fairly simple crate and CLI, but it also allowed me to develop patterns to build and release binaries for a variety of platforms and architectures.

As a result, I’m happy to announce the release today of the pgxn_meta crate and pgxn_meta CLI v0.1.0, available for download on these platforms, thanks to cross and houseabsolute/actions-rust-cross:

  • darwin-amd64
  • darwin-arm64
  • freebsd-amd64
  • freebsd-i686
  • illumos-amd64
  • linux-amd64
  • linux-arm
  • linux-arm64
  • linux-i686
  • linux-powerpc
  • linux-powerpc64
  • linux-powerpc64le
  • linux-riscv64
  • linux-s390x
  • linux-sparc64
  • netbsd-amd64
  • solaris-amd64
  • solaris-sparcv9
  • windows-amd64
  • windows-arm64
  • windows-i686

Download the archive file appropriate to your platform, decompress it, and put the pgxn_meta (or pgxn_meta.exe) binary in your path. Or use the universal binary installer (ubi) to install it:

ubi --project pgxn/meta --exe pgxn_meta --in ~/bin

And of course you can use cargo to compile it from source:

cargo install pgxn_meta

Usage is simple: just run pgxn_meta in a directory containing the META.json file to validate:

❯ pgxn_meta 
META.json is OK

And optionally pass it the name of the file, as in this example parsing a test file with no version property:

❯ pgxn_meta corpus/invalid.json 
Error: "corpus/invalid.json jsonschema validation failed with https://pgxn.org/meta/v2/distribution.schema.json#\n- at '': missing properties 'version'"```

That’s it!

What’s Next?

Now that I’ve implemented validation and figured out multi-platform binary support for Rust apps, my next tasks are to:

  • Implement a pattern to convert a v1 META.json to the v2 format
  • Create a pattern to merge multiple META.json files into one
  • Write code to build PGXS extensions into trunk packages
  • Develop patterns to satisfy third-party dependencies for multiple platforms

Should keep me busy for a few weeks. Updates as I have them.

To Preload, or Not to Preload

The Tembo Blog published a post by yours truly last week about when to preload shared libraries and when not to:

Recently I’ve been trying to figure out when a Postgres extension’s shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions should they be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type.

I view this post as a kind of proto-chapter for an imagined book about developing extensions that I’d like to work on someday. I learned quite a lot researching it and responding to extensive feedback from more knowledgeable community members. It resulted in updates to the PGXN Meta preload property that I hope will inform binary distribution in the future. More on that soon.

RFC: PGXN Meta Spec v2

Two bits of news on the “PGXN v2” project.

PGXN RFCs: The Book

First, I’ve moved the RFC process (again, sorry) from PGXN Discussions, which were a bit fussy about Markdown formatting and don’t support inline comments, to the PGXN RFCs project, where use of pull requests on CommonMark Markdown documents addresses these issues. This process borrows heavily from the Rust RFCs project, right down to publishing accepted RFCs as a “book” site.

So I’d also like to introduce rfcs.pgxn.org, a.k.a., the PGXN RFCs Book.

It currently houses only one RFC: Meta Spec v1, dating from 2010. This document defines the structure of the META.json file required in archives published on PGXN.

But I expect many more RFCs to be drafted in the coming years, starting with draft RFC–2, the binary distribution RFC I POCed a few weeks ago. There has already been some great feedback in that pull request, in addition to the previous discussion. More eyes will make it even better.

PGXN Meta Spec v2 RFC

Last week I also iterated on the PGXN Metadata Sketch several times to produce draft RFC–3: Meta Spec v2. This represents a major reworking of the original spec in an attempt to meet the following goals:

  • Allow more comprehensive dependency specification, to enable packagers to identify and install system dependencies and dependencies from other packaging systems, like PyPI and CPAN
  • Adopt more industry-standard formats like SPDX License Expressions and purls
  • Improve support for multiple types of Postgres extensions, including apps, LOADable modules, background workers, and TLEs
  • Improve curation and evaluation via categories, badging, and additional download links

There’s a lot here, but I hope the result can better serve the community for the next decade, and enable lots of new services and features.

The proof will be in the application, so my next task is to start building the tooling to turn PGXN distributions into binary distributions. I expect experimentation will lead to additional iterations, but feel confident that the current state of both RFC–2 and RFC–3 is on the right track.

Introducing Go SQL/JSON Path and Playground

For a personal project, I needed to parse and execute PostgreSQL-compatible jsonpath expressions.1 So I’ve spent just about every spare evening and weekend the last several months porting Postgres jsonpath to Go, and it’s finally ready to ship.

Introducing Go SQL/JSON, featuring the path package. This project provides full support for all of the PostgreSQL 17 jsonpath features2 in the Go programming language. An example:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/theory/sqljson/path"
	"github.com/theory/sqljson/path/exec"
)

func main() {
	// Parse some JSON.
	var value any
	err := json.Unmarshal([]byte(`{"a":[1,2,3,4,5]}`), &value)
	if err != nil {
		log.Fatal(err)
	}

	// Parse a path expression and execute it on the JSON.
	p := path.MustParse("$.a[*] ? (@ >= $min && @ <= $max)")
	res, err := p.Query(
		context.Background(),
		value,
		exec.WithVars(exec.Vars{"min": float64(2), "max": float64(4)}),
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%v\n", res)
    // Output: [2 3 4]
}

I think the API is decent, but may implement better patterns as I discover them. Overall I’m quite satisfied with how it turned out, and just how well its implementation and performance compare to the original.

🛝 Playground

But why stop there? One of the nice things about this project is that Go supports compiling applications into WebAssembly (a.k.a. Wasm) via Go WebAssembly. Borrowing from the Goldmark project, I created and published the sqljson/path playground and populated the docs with links for all of its examples.

Now anyone can experiment with SQL/JSON path expressions, and share links to demonstrate patterns and techniques. The Playground is a stateless JavaScript/Wasm web application: data persists only in permalink URLs.3

🛝 Try this example Playground permalink right now!4

The Path Ahead

I’ve enjoyed learning how to implement a lexer, a goyacc parser, an AST, and an execution engine. The Playground was a bonus bit of fun!

I’m stoked to build cool stuff on this package, but don’t know whether anyone else will find it useful. If you do — or just enjoy messing about on the Playground, let me know!


  1. “Whatever for,” you ask? Well, aside from wanting to see if I could do it, this post describes a POC. Now I’m working to create the real thing — done right and entirely from scratch. ↩︎

  2. Well, nearly full. The only missing feature is the datetime(template) method. See also the comprehensive compatibility notes. ↩︎

  3. And whatever data GitHub Pages collect 😔. ↩︎

  4. JSON borrowed from MDN. ↩︎

Patch: Postgres ABI and API Guidance

Update 2024-11-14

I forgot to update this post at the time, but on July 31, Peter Eisentraut committed the patch adding ABI and API guidance to the C language documentation. I only noticed because today’s releases contained a modified ABI that broke a number of extensions. See the hackers thread for details.

TL;DR

If you’re a Postgres extension developer interested in understanding what to expect from core API and ABI stability, please review and give feedback on this patch (or pull request) adding ABI and API Guidance to the documentation.


In my PGConf.dev report a couple days ago, I mentioned that a few actionable items came out of the Improving extensions in core unconference session. One was the need to document the heretofore unofficial policy for API and ABI stability between major and, especially, minor versions of Postgres.

A frequent topic at the Extension Summit and Mini-Summits and a number of PGConf.dev sessions has been concern regarding compatibility changes between minor releases of Postgres. At Mini Summit Five, for example, Yurii Rashkovskii presented a few examples of such changes, leading him to conclude, along with several others in the community, that C API-using extensions can only be used when built against the minor release with which they’re used.

In the Unconference session, core committers reported that such changes are carefully made, and rarely, if ever, affect extensions compiled for different minor releases of the same major version. Furthermore, they carefully make such changes to avoid compatibility issues. In the case Yurii found, for example, a field was added to a struct’s padding, without affecting the ordering of other fields, thus minimizing the risk of runtime failures.
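
To see why that kind of change is benign, here is a tiny Rust analogue with repr(C) structs standing in for the C structs in question; an illustration, not the actual Postgres code:

use std::mem::size_of;

// Adding a field that fits in existing trailing padding leaves the struct's
// size, alignment, and the offsets of existing fields unchanged.
#[repr(C)]
struct Before {
    a: u64,
    b: u8, // seven bytes of trailing padding follow
}

#[repr(C)]
struct After {
    a: u64,
    b: u8,
    c: bool, // the new field occupies one of those padding bytes
}

fn main() {
    assert_eq!(size_of::<Before>(), size_of::<After>());
    println!("both structs are {} bytes", size_of::<Before>());
}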

It became clear that, although the committers follow a policy — and read new committers into it via patch review — it’s not documented anywhere. The result has been a bunch of sturm und drang amongst extension developers unsure what level of compatibility to depend on and what changes to expect.

The week after the conference, I started a pgsql-hackers thread proposing to document the committer policy. Following some discussion and review of potential ABI breaks in minor releases, the consensus seemed to be that the committers strive to avoid such breaks, that they’re quite uncommon in minor releases, and that most of the reported issues were due to using more obscure APIs.

As a result, we started drafting a policy, and after a few iterations, Peter Eisentraut pulled things together from the perspective of a core team member, reframed as “Server API and ABI Guidance”. I converted it into a patch (and pull request) to add it to the C Language docs. A key statement on minor releases:

In general, extension code that compiles and works with a minor release should also compile and work with any other minor release of the same major version, past or future.

I hope this document clarifies things. Even if it’s not as strict as some might hope, it at least documents the project approach to compatibility, so we have a better idea what to expect when using the C APIs. If you see gaps, or you have additional questions, please respond to the pgsql-hackers thread — or the pull request (I’ll propagate comments to hackers).

PGConf.dev 2024

In addition to the afore-blogged Extension Summit, I also attended a slew of the regular PGConf.dev sessions, gave a talk on the future of the extension ecosystem, socialized with extension authors and core developers, and joined discussions in a number of unconference sessions. Some notes on selected talks and events:

Sessions

I enjoyed The road to new SQL/JSON features, where Álvaro Herrera gave a brief history of SQL/JSON in Postgres, starting with the JSON type in 9.2 (2012), JSONB in 2014, and SQL standard jsonpath in Postgres 12 (2019). Getting the SQL/JSON syntax finished turned out to be substantially more difficult, thanks to parsing issues. It took many attempts and a couple of reversions before most of the functionality was completed last year and included in Postgres 16. The forthcoming Postgres 17 finishes the work, with the standard fully supported except for “the JSON_TABLE plan param and json simplified accessor.”

It’s a great time to use Postgres for JSON object storage and management.

In Anarchy in the Database, subtitled “A Survey and Evaluation of Database Management System Extensibility”, Abigale Kim described her Master’s thesis work investigating Postgres extension incompatibilities. Installing and running tests for pairs of extensions, she found a number of conflicts and issues, such as a bug when Citus was paired with auto_explain (fixed in May). In all, 17% of pairs failed! Abi also found that 19% of extensions contain code copied from the Postgres core; pageinspect is 75% copied code!

Abi advocates for adding an extension manager into core, with well-defined hooks to manage extension load order and to streamline enabling and disabling extensions. Very interesting research, highlighting the need to think more deeply about how best to enable and empower the extension ecosystem.

Jeff Davis and Jeremy Schneider gave a thorough overview of Collations from A to Z. The problem rose to wide attention about six years ago when a libc upgrade changed a collation, leading to data loss, crashes, and duplicate primary keys. Ideally, sort orders would never change. But humans gotta human, language will evolve, and the order of things will need to be updated. In such situations, one must be aware of the changes and reindex or rebuild all indexes (and replace hot standbys, which can’t be reindexed).

I very much appreciated the context, as the ongoing issue with collations and upgrades has confused me. Should application authors choose collations or should DBAs? The new builtin collation provider in PostgreSQL 17 tries to bridge the gap by supporting unchanging Unicode code-point collation ordering that’s reasonably meaningful to humans. But I also realize that, for some projects with no need for human sort ordering, the C collation is more than sufficient.

In her keynote, When Hardware and Databases Collide, Margo Seltzer offered a provocation: Could PostgreSQL adopt something like CXL to scale to a virtually infinite pool of memory? Could one build a “complete fabric of CXL switches to turn an entire data center into a database”? I have no idea! It sure sounds like it could enable gigantic in-memory databases.

Tricks from in-memory databases by Andrey Borodin mostly went over my head, but each of the experiments sped things up a few percentage points. Together they might add up to something.

The Making PostgreSQL Hacking More Inclusive panel was terrific, and much-needed. I’m grateful that Amit Langote, Masahiko Sawada, and Melanie Plageman shared their experiences as up-and-coming non-white-male committers. I think the resulting discussion will help drive new inclusion initiatives in the PostgreSQL community, such as session moderator Robert Haas’s recently-announced Mentoring Program for Code Contributors.

Oh, and I gave a talk, The future of the extension ecosystem, in which I expanded on my mini-summit talk to suss out the needs of various members of the extension ecosystem (authors, users, DBAs, industry) and our plans to meet those needs in PGXN v2.

Unconference

I also participated in the Friday Unconference. Abi, Yurii, and I led a discussion on Improving extensions in core. We discussed the need for an ABI stability policy, extension management, smoke testing (including for conflicts between extensions), a coalition to advocate for extensions in core (since launched as the Postgres Extension Developers Coalition), inline extensions, WASM-based extensions, and server installation immutability. Great discussions and a few actionable outcomes, some of which I’ve been working on. More soon in future posts.

In Increase Community Participation, we talked about the challenges for broadening the PostgreSQL contributor community, attracting and retaining contributors, recognizing contributions, and how to address issues of burnout and allow people to “retire”. I joined the discourse on how we could adopt or at least support GitHub workflows, such as pull requests, to encourage more patch review in a familiar environment. Personally, I’ve been creating pull requests in my fork for my patches for this very reason.

We also touched on training and mentoring new contributors (hence the mentoring announcement) and changes to postgresql.org, notably adding dedicated pages for each project governance committee, especially for the Contributors Committee (there’s a Contributors Committee?), as well as information for how to become a contributor and be listed on the contributor page.

Final Thoughts

I attended PGCon from 2009 to 2014, and always enjoyed the camaraderie in Ottawa every year. Most people went to the same pub after sessions every night (or for some part of each evening), where random connections and deep technical nerdery would continue into the small hours, both indoors and out. The Black Oak was a highlight of the conference for me, every year.

In the intervening years I got busy with non-Postgres work and scaled back my participation. I finally returned in 2023 (other than a virtual unconference in 2022), and found it much the same, although the Black Oak had closed, and now there were two or three pubs where people went, diluting the social pool a bit — though still a highlight.

As the new iteration of the Postgres Developer Conference, PGConf.dev is a worthy successor. Vancouver was a nice city for it, and people brought the same energy as always. I connected with far more people, and more meaningfully, than at any other conference. But other than the reception and dinner on Wednesday, there was no one (or three) place where people tended to aggregate into the wee hours. Or at least I wasn’t aware of it. The end of PGCon is bittersweet for me, but I’m happy to continue to participate in PGConf.dev.

See you next year!

POC: Distributing Trunk Binaries via OCI

A couple months ago, Álvaro Hernández suggested that Postgres extensions should be distributed as OCI (née Docker) images:

It’s all about not reinventing the wheel, and leveraging the ecosystem around OCI. Many of the problems (solutions) in building, packaging and distributing extensions are already solved by OCI: there’s a whole ecosystem of tools around OCI that provide additional benefits in terms of tooling, infrastructure and common knowledge.

As a relatively experienced Docker image builder and distributor, I found this idea intriguing. I wasn’t familiar with the OCI Image Manifest Specification, which defines how to build OCI images containing arbitrary files, or “artifacts”. But if we could adopt an existing protocol and federated registry system like OCI/Docker, it would save pretty significant development time over building our own — plus we’d be adopting and potentially contributing to a standard.

After PGConf.dev, I decided to see if I could work out how to distribute packages in the recently-proposed trunk format such that an OCI/Docker-style image URL could be used to install a version of an extension compiled for the appropriate architecture.

Thanks to the denizens of the #oras and #zot channels on the CNCF Slack, I extended the trunk format POC in pg-semver PR 69 to build the necessary JSON manifest files, push them to a registry, and then pull and install the architecturally-appropriate package. Here’s how it works.

Metadata generation

First, I extended trunk.mk, which builds a trunk package, with a few more targets that create the JSON files with metadata necessary to build OCI manifests. The files that make trunk now also generates are:

{extension}_annotations.json
OCI standard annotations describing a package, including license, vendor, and URLs. The semver_annotations.json file looks like this:
{
  "org.opencontainers.image.created": "2024-06-20T18:07:24Z",
  "org.opencontainers.image.licenses": "PostgreSQL",
  "org.opencontainers.image.title": "semver",
  "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
  "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
  "org.opencontainers.image.vendor": "PGXN",
  "org.opencontainers.image.ref.name": "0.32.1",
  "org.opencontainers.image.version": "0.32.1",
  "org.opencontainers.image.url": "https://github.com/theory/pg-semver"
}
{package_name}_config.json
An object with fields appropriate for OCI platform specification, plus the creation date. Here are the contents of semver-0.32.1+pg16-darwin-23.5.0-arm64_config.json:
{
  "os": "darwin",
  "os.version": "23.5.0",
  "architecture": "arm64",
  "created": "2024-06-20T18:07:24Z"
}
{package_name}_annotations.json
An object defining annotations to use in an image, built for a specific platform, all under the special key $manifest to be used later by the ORAS CLI to put them in the right place. semver-0.32.1+pg16-darwin-23.5.0-arm64_annotations.json example:
{
  "$manifest": {
    "org.opencontainers.image.created": "2024-06-20T18:07:24Z",
    "org.opencontainers.image.title": "semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk",
    "org.opencontainers.image.licenses": "PostgreSQL",
    "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
    "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.vendor": "PGXN",
    "org.opencontainers.image.ref.name": "0.32.1",
    "org.opencontainers.image.version": "0.32.1",
    "org.opencontainers.image.url": "https://github.com/theory/pg-semver",
    "org.pgxn.trunk.pg.version": "16.3",
    "org.pgxn.trunk.pg.major": "16",
    "org.pgxn.trunk.pg.version_num": "160003",
    "org.pgxn.trunk.version": "0.1.0"
  }
}

The org.opencontainers.image keys are the same as in semver_annotations.json, while the new org.pgxn.trunk annotations are intended for an install client to find the image appropriate for the version of Postgres, although that functionality isn’t part of this POC.

The only changes to the Makefile to support these annotations are the addition of a DESCRIPTION variable to populate org.opencontainers.image.description and a REPO_URL to populate org.opencontainers.image.source. trunk.mk includes a couple other new variables, too: TITLE (defaults to EXTENSION), VENDOR (defaults to “PGXN”), and URL (defaults to REPO_URL).

Publishing Images

The new shell script push_trunk uses the ORAS CLI and jq to build the necessary manifest files and push them to an OCI registry. It currently works with only two trunk files like those built in the trunk POC. It first “pushes” the trunks to a locally-created OCI layout, then constructs manifests associating the SHA ID of each just-pushed image with its annotations and platform configuration, and writes them into an image index manifest. Finally, it pushes the complete OCI layout described by the index to a remote registry.

If that sounds like a lot of steps, you’re right, it adds up. But the result, following a precedent established by Homebrew (as described in this issue) is multiple images for different platforms indexed at a single URI. Once we publish the two trunks:

./push_trunk localhost:5000/theory/semver:0-32.1 \
    semver-0.32.1+pg16-darwin-23.5.0-arm64 \
    semver-0.32.1+pg16-linux-amd64

We can fetch the manifests. The address for the image index is that first parameter, localhost:5000/theory/semver:0-32.1; we fetch the manifest with the command

oras manifest fetch localhost:5000/theory/semver:0-32.1

Which returns:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1285,
      "digest": "sha256:1a14997eb380f9641cba6193c001eb630319f345d76ef07aee37f86fafcdbe0b",
      "platform": {
        "os": "linux",
        "architecture": "amd64"
      },
      "annotations": {
        "org.pgxn.trunk.pg.version": "16.3",
        "org.pgxn.trunk.pg.major": "16",
        "org.pgxn.trunk.pg.version_num": "160003",
        "org.pgxn.trunk.version": "0.1.0"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1302,
      "digest": "sha256:385fcfe6b33c858c3f126fb4284afe23ba8c2f7c32db8a50a607dfece6dd9162",
      "platform": {
        "os": "darwin",
        "os.version": "23.5.0",
        "architecture": "arm64"
      },
      "annotations": {
        "org.pgxn.trunk.pg.version": "16.3",
        "org.pgxn.trunk.pg.major": "16",
        "org.pgxn.trunk.pg.version_num": "160003",
        "org.pgxn.trunk.version": "0.1.0"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-06-21T13:55:01Z",
    "org.opencontainers.image.licenses": "PostgreSQL",
    "org.opencontainers.image.title": "semver",
    "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
    "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.vendor": "PGXN",
    "org.opencontainers.image.ref.name": "0.32.1",
    "org.opencontainers.image.version": "0.32.1",
    "org.opencontainers.image.url": "https://github.com/theory/pg-semver"
  }
}

Note the manifests array, which lists images associated with this URI. The first one is for amd64 linux and the second for arm64 darwin. They also contain the org.pgxn.trunk annotations that would allow filtering for an appropriate Postgres version. The idea is to download an index like this, find the manifest information for the appropriate platform and Postgres version, and download it. To get the darwin image, pull it by its digest:

oras pull localhost:5000/theory/semver:0-32.1@sha256:385fcfe6b33c858c3f126fb4284afe23ba8c2f7c32db8a50a607dfece6dd9162

Which downloads the file:

$ ls -1 *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk

Nice! The OCI protocol allows for filtering on platform directly, skipping the need to download and examine the image index. This is how docker pull --platform works, but is general to OCI. We can fetch a manifest with this command:

oras manifest fetch --platform linux/amd64 localhost:5000/theory/semver:0-32.1

Which returns not the image index, but the manifest for the Linux image:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.pgxn.trunk.layer.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:90c8d6f2e67cba09f8178648ad95a6b31e51c0d902058bf396f9e7e5f50c8dfd",
    "size": 84
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:b52b292cabe3ca479673ab68d3ea647802a86f15059c3e19ed24d5a7688159c3",
      "size": 61983,
      "annotations": {
        "org.opencontainers.image.title": "semver-0.32.1+pg16-linux-amd64.trunk"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-06-21T17:55:13Z",
    "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
    "org.opencontainers.image.licenses": "PostgreSQL",
    "org.opencontainers.image.ref.name": "0.32.1",
    "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.title": "semver-0.32.1+pg16-linux-amd64.trunk",
    "org.opencontainers.image.url": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.vendor": "PGXN",
    "org.opencontainers.image.version": "0.32.1",
    "org.pgxn.trunk.pg.major": "16",
    "org.pgxn.trunk.pg.version": "16.3",
    "org.pgxn.trunk.pg.version_num": "160003",
    "org.pgxn.trunk.version": "0.1.0"
  }
}

Or we can pull the file by platform with:

rm *.trunk
oras pull --platform linux/amd64 localhost:5000/theory/semver:0-32.1

And now the Linux image has been downloaded:

$ ls -1 *.trunk
semver-0.32.1+pg16-linux-amd64.trunk

Pretty nice! These examples use zot running in a local Docker container, but could just as easily use the Docker registry (docker.io) or the GitHub registry (ghcr.io) — which is where Homebrew stores its images (e.g., sqlite 3.46.0).
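
OCI platform selection knows nothing about Postgres versions, though, so a client that wants to match on the org.pgxn.trunk annotations would have to fetch the image index and choose a manifest itself. Here’s a rough sketch of that selection in Rust with serde_json — hypothetical code, not part of the POC — keyed to the index JSON shown above:

use serde_json::Value;

/// Return the digest of the first manifest in the image index that matches
/// the given OS, architecture, and Postgres major version, if any.
fn select_digest(index: &Value, os: &str, arch: &str, pg_major: &str) -> Option<String> {
    index["manifests"].as_array()?.iter().find_map(|m| {
        let ok = m["platform"]["os"] == os
            && m["platform"]["architecture"] == arch
            && m["annotations"]["org.pgxn.trunk.pg.major"] == pg_major;
        if ok { m["digest"].as_str().map(String::from) } else { None }
    })
}

fn main() -> serde_json::Result<()> {
    // Read an image index (as returned by `oras manifest fetch`) from stdin.
    let index: Value = serde_json::from_reader(std::io::stdin())?;
    if let Some(digest) = select_digest(&index, "linux", "amd64", "16") {
        println!("localhost:5000/theory/semver:0-32.1@{digest}");
    }
    Ok(())
}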

Installation

With these manifests configured and pushed, changes to install_trunk use this knowledge to download from the registry instead of relying on an existing file (as implemented for the trunk POC). Now we call it like so:

./install_trunk localhost:5000/theory/semver:0-32.1

First, it assembles platform information from uname, then pulls the platform-specific image with this oras command:

oras pull --no-tty --plain-http \
     --format 'go-template={{(first .files).path}}' \
     --platform "$platform" "$trunk"

As before, it downloads the image appropriate for the platform. The --format option, meanwhile, causes it to also download annotations and extract the path for the downloaded file. So in addition to downloading the file, it also emits its full path:

/tmp/pgxn/semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk

The script proceeds to unpack the image with that file name and continues with the installation process as before.

Demo

The last new file in the PR is docker_compose.yml, which sets up an amd64 Linux container for building an extension for Postgres 16, and a zot container to push to and pull from. I used it to build this POC and record a demo.

To use it yourself, run these commands with docker_compose.yml:

git clone https://github.com/theory/pg-semver.git
cd pg-semver
git checkout -b trunk-oci origin/trunk-oci
docker compose up -d

This clones the pg-semver repository, checks out the trunk-oci branch, and fires up the containers. Wait a couple minutes for Postgres to start and be configured, then, assuming you can build against Postgres 16 on your local machine, you can follow the same steps. The commands in the demo are:

make trunk
docker compose exec linux bash
make clean
make trunk
exit
ls -1 *.trunk
ls -1 *.json
./push_trunk localhost:5000/theory/semver:0.32.1 \
    semver-0.32.1+pg16-darwin-23.5.0-arm64 \
    semver-0.32.1+pg16-linux-amd64
./install_trunk localhost:5000/theory/semver:0.32.1
docker compose exec linux bash
./install_trunk zot:5000/theory/semver:0.32.1
exit

You might need to adjust the first trunk image name if your local configuration is not the same as mine.

Concept Proven

Honestly, this POC far exceeded my expectations. It worked great! Not only does the trunk format seem to work well, but distributing via OCI registries is even better! It brings a bunch of benefits:

  • We could build a community registry that automatically builds images for PGXN releases for a variety of platforms. This could grow to become the default method for installing extensions, perhaps via a command such as pgxn trunk install theory/semver.
  • Anyone can use any other registry, and the tooling will work with it. Just as you can pull Docker images from docker.io, you can also pull them from ghcr.io, quay.io, or any other OCI-compliant registry. The same applies here. Extension authors can build and publish trunks to their own registries if they like.
  • Better yet, organizations can build extension registries for their own use cases, to complement the community registry. Think internal registries for private extensions, or commercial registries that offer additional features, such as security scans or curation.

Super promising! I’m just about ready to get to work building this stuff, though I anticipate a few challenges:

  • We’ll need a way to find the latest version (tag) of a release. I’m sure this is do-able, since Homebrew does it. There must be some other index for tags (ghcr.io/homebrew/core/sqlite:latest doesn’t return a result, alas).
  • In addition to filtering on platform specification when pulling an image, it would be nice to filter on other attributes, such as the org.pgxn.trunk annotations defining the Postgres version. For now it will be fine for the CLI to download an image index and find the right image, but additional server-side filtering would be very nice.
  • We’ll need to support extensions that can run on any architecture, such as pure SQL extensions. I think this will be pretty easy by publishing a single tagged image instead of an image index.
  • If we build a community registry, where should it be hosted? Homebrew uses ghcr.io, presumably avoiding hosting costs, but it might be nice to have a specific community registry, perhaps at trunk.pgxn.org or perhaps oci.postgresql.org.
  • If we do host a registry, might we want to allow extension authors to publish their own trunks within their namespaces? How might that be organized?

I can imagine workable solutions to these relatively minor challenges. As long as we can encapsulate them into the commands for a single command-line client, it should work out well.

Can’t wait to get started. What do you think?

POC: PGXN Binary Distribution Format

In an effort to make discussion of PGXN and related improvements as accessible as possible, I’ve set up PGXN Discussions on GitHub. Of course GitHub created default categories for all sorts of conversation, and all is welcome, in accordance with the PostgreSQL Code of Conduct.

But I hope more people will be able to find, read, comment on, and even write their own RFCs than was possible on the Postgres Wiki or on Just a Theory. Therefore, please have a look at Proposal-2: Binary Distribution Format, which draws inspiration from the Python wheel format and Trunk to define a packaging format that allows for platform and PostgreSQL version matching to quickly install pre-compiled binary PostgreSQL extension packages.

The proposal has the details, but the basic idea is that files to be installed are stored in directories named for pg_config directory configurations. Then all an installer has to do is install the files in those subdirectories into the pg_config-specified directories.
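
As a sketch of that flow, here’s a hypothetical Rust outline that asks pg_config where each subdirectory’s files belong. It only prints the mapping; a real installer, like the POC below, would copy the files:

use std::{path::Path, process::Command};

/// Ask pg_config where a given kind of file belongs.
fn pg_config(flag: &str) -> String {
    let out = Command::new("pg_config").arg(flag).output().expect("run pg_config");
    String::from_utf8_lossy(&out.stdout).trim().to_string()
}

fn main() {
    // Map package subdirectories to the pg_config flags that locate their targets.
    let targets = [("share", "--sharedir"), ("doc", "--docdir"), ("pkglib", "--pkglibdir")];

    for (subdir, flag) in targets {
        let src = Path::new("pgsql").join(subdir);
        // A real installer would recursively copy src into the target directory.
        println!("install {} -> {}", src.display(), pg_config(flag));
    }
}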

POC

I ran this idea past some colleagues, and they thought it worth exploring. But the proposal itself didn’t feel sufficient. I wanted to prove that it could work.

So I created a proof-of-concept (POC) implementation in just about the quickest way I could think of and applied it to the semver extension in PR 68. Here’s how it works.

trunk.mk

A new file, trunk.mk, dupes all of the install targets from PGXS and rejiggers them to install into the proposed package directory format. The Makefile simply imports trunk.mk:

--- a/Makefile
+++ b/Makefile
@@ -22,6 +22,7 @@ endif
 
 PGXS := $(shell $(PG_CONFIG) --pgxs)
 include $(PGXS)
+include ./trunk.mk
 
 all: sql/$(EXTENSION)--$(EXTVERSION).sql
 

And now there’s a trunk target that uses those packaging targets. Here’s its output on my arm64 Mac (after running make):

$ make trunk
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver'
ginstall -c -m 644 .//semver.control 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension/'
ginstall -c -m 644 .//sql/semver--0.10.0--0.11.0.sql .//sql/semver--0.11.0--0.12.0.sql .//sql/semver--0.12.0--0.13.0.sql .//sql/semver--0.13.0--0.15.0.sql .//sql/semver--0.15.0--0.16.0.sql .//sql/semver--0.16.0--0.17.0.sql .//sql/semver--0.17.0--0.20.0.sql .//sql/semver--0.2.1--0.2.4.sql .//sql/semver--0.2.4--0.3.0.sql .//sql/semver--0.20.0--0.21.0.sql .//sql/semver--0.21.0--0.22.0.sql .//sql/semver--0.22.0--0.30.0.sql .//sql/semver--0.3.0--0.4.0.sql .//sql/semver--0.30.0--0.31.0.sql .//sql/semver--0.31.0--0.31.1.sql .//sql/semver--0.31.1--0.31.2.sql .//sql/semver--0.31.2--0.32.0.sql .//sql/semver--0.32.1.sql .//sql/semver--0.5.0--0.10.0.sql .//sql/semver--unpackaged--0.2.1.sql  'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/'
ginstall -c -m 755  src/semver.dylib 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/'
gmkdir -p '/Users/david/.pgenv/pgsql-16.3/lib/bitcode/src/semver'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode'/src/semver/src/
ginstall -c -m 644 src/semver.bc 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode'/src/semver/src/
cd 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode' && /opt/homebrew/Cellar/llvm/18.1.6/bin/llvm-lto -thinlto -thinlto-action=thinlink -o src/semver.index.bc src/semver/src/semver.bc
ginstall -c -m 644 .//doc/semver.mmd 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver/'
ginstall -c -m 644 .//README.md .//LICENSE .//Changes 'semver-0.32.1+pg16-darwin-23.5.0-arm64/'
rm -f "semver-0.32.1+pg16-darwin-23.5.0-arm64/digests"
cd "semver-0.32.1+pg16-darwin-23.5.0-arm64/" && find * -type f | xargs shasum --tag -ba 256 > digests
tar zcvf semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk semver-0.32.1+pg16-darwin-23.5.0-arm64
a semver-0.32.1+pg16-darwin-23.5.0-arm64
a semver-0.32.1+pg16-darwin-23.5.0-arm64/LICENSE
a semver-0.32.1+pg16-darwin-23.5.0-arm64/Changes
a semver-0.32.1+pg16-darwin-23.5.0-arm64/trunk.json
a semver-0.32.1+pg16-darwin-23.5.0-arm64/README.md
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/digests
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.3.0--0.4.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.30.0--0.31.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.21.0--0.22.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.32.1.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.10.0--0.11.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.13.0--0.15.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.31.1--0.31.2.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.31.2--0.32.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--unpackaged--0.2.1.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.12.0--0.13.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.17.0--0.20.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.2.1--0.2.4.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.16.0--0.17.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.22.0--0.30.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.20.0--0.21.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.15.0--0.16.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.2.4--0.3.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.31.0--0.31.1.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.11.0--0.12.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.5.0--0.10.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension/semver.control
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver/semver.mmd
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/semver.dylib
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver.index.bc
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver/src
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver/src/semver.bc

The trunk target compresses everything into the resulting trunk file:

$ ls -1 *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk

This should work the same everywhere PGXS works. Here’s the output in an amd64 Linux container1 mounted to the same directory:

# make trunk
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/share/extension'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/share/semver'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver'
install -c -m 644 .//semver.control 'semver-0.32.1+pg16-linux-amd64/pgsql/share/extension/'
install -c -m 644 .//sql/semver--0.10.0--0.11.0.sql .//sql/semver--0.11.0--0.12.0.sql .//sql/semver--0.12.0--0.13.0.sql .//sql/semver--0.13.0--0.15.0.sql .//sql/semver--0.15.0--0.16.0.sql .//sql/semver--0.16.0--0.17.0.sql .//sql/semver--0.17.0--0.20.0.sql .//sql/semver--0.2.1--0.2.4.sql .//sql/semver--0.2.4--0.3.0.sql .//sql/semver--0.20.0--0.21.0.sql .//sql/semver--0.21.0--0.22.0.sql .//sql/semver--0.22.0--0.30.0.sql .//sql/semver--0.3.0--0.4.0.sql .//sql/semver--0.30.0--0.31.0.sql .//sql/semver--0.31.0--0.31.1.sql .//sql/semver--0.31.1--0.31.2.sql .//sql/semver--0.31.2--0.32.0.sql .//sql/semver--0.32.1.sql .//sql/semver--0.5.0--0.10.0.sql .//sql/semver--unpackaged--0.2.1.sql  'semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/'
install -c -m 755  src/semver.so 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/'
mkdir -p '/usr/lib/postgresql/16/lib/bitcode/src/semver'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode'/src/semver/src/
install -c -m 644 src/semver.bc 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode'/src/semver/src/
cd 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode' && /usr/lib/llvm-16/bin/llvm-lto -thinlto -thinlto-action=thinlink -o src/semver.index.bc src/semver/src/semver.bc
install -c -m 644 .//doc/semver.mmd 'semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver/'
install -c -m 644 .//README.md .//LICENSE .//Changes 'semver-0.32.1+pg16-linux-amd64/'
rm -f "semver-0.32.1+pg16-linux-amd64/digests"
cd "semver-0.32.1+pg16-linux-amd64/" && find * -type f | xargs shasum --tag -ba 256 > digests
tar zcvf semver-0.32.1+pg16-linux-amd64.trunk semver-0.32.1+pg16-linux-amd64
semver-0.32.1+pg16-linux-amd64/
semver-0.32.1+pg16-linux-amd64/LICENSE
semver-0.32.1+pg16-linux-amd64/Changes
semver-0.32.1+pg16-linux-amd64/trunk.json
semver-0.32.1+pg16-linux-amd64/README.md
semver-0.32.1+pg16-linux-amd64/pgsql/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver/src/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver/src/semver.bc
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver.index.bc
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/semver.so
semver-0.32.1+pg16-linux-amd64/pgsql/doc/
semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver/
semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver/semver.mmd
semver-0.32.1+pg16-linux-amd64/pgsql/share/
semver-0.32.1+pg16-linux-amd64/pgsql/share/extension/
semver-0.32.1+pg16-linux-amd64/pgsql/share/extension/semver.control
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.3.0--0.4.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.30.0--0.31.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.21.0--0.22.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.32.1.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.10.0--0.11.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.13.0--0.15.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.31.1--0.31.2.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.31.2--0.32.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--unpackaged--0.2.1.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.12.0--0.13.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.17.0--0.20.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.2.1--0.2.4.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.16.0--0.17.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.22.0--0.30.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.20.0--0.21.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.15.0--0.16.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.2.4--0.3.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.31.0--0.31.1.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.11.0--0.12.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.5.0--0.10.0.sql
semver-0.32.1+pg16-linux-amd64/digests

Pretty much the same, as expected. Now we have two trunks:

$ ls -1 *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
semver-0.32.1+pg16-linux-amd64.trunk

The package name format is:

{package}-{version}+{pgversion}-{os}-{os_version}-{architecture}.trunk

Here you see the same package, version, and Postgres version, but then the OSes differ, macOS includes the optional OS version, and then the architectures differ. This will allow an install client to download the appropriate trunk.
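
To make the format concrete, here’s a naive parsing sketch in Rust — a hypothetical helper that assumes the version itself contains no hyphen and that only the OS version segment is optional:

/// Split a trunk file name into package, version, Postgres version, OS,
/// optional OS version, and architecture.
fn parse_trunk_name(name: &str) -> Option<(String, String, String, String, Option<String>, String)> {
    let base = name.strip_suffix(".trunk")?;
    let (pkg_ver, platform) = base.split_once('+')?;
    let (package, version) = pkg_ver.rsplit_once('-')?;
    let parts: Vec<&str> = platform.split('-').collect();
    let (pg, os) = (parts.first()?, parts.get(1)?);
    let (os_version, arch) = match &parts[2..] {
        [arch] => (None, *arch),
        [osv, arch] => (Some(osv.to_string()), *arch),
        _ => return None,
    };
    Some((
        package.to_string(),
        version.to_string(),
        pg.to_string(),
        os.to_string(),
        os_version,
        arch.to_string(),
    ))
}

fn main() {
    let name = "semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk";
    println!("{:?}", parse_trunk_name(name));
    // Some(("semver", "0.32.1", "pg16", "darwin", Some("23.5.0"), "arm64"))
}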

Note the directories into which files are copied under a top-level directory with that format (without the .trunk extension):

  • SHAREDIR files go into pgsql/share
  • DOCDIR files go into pgsql/doc
  • PKGLIB files go into pgsql/pkglib

What else is there?

$ ls -lah semver-0.32.1+pg16-linux-amd64
total 64
-rw-r--r--@ 1 david  staff    12K Jun 20 13:56 Changes
-rw-r--r--@ 1 david  staff   1.2K Jun 20 13:56 LICENSE
-rw-r--r--@ 1 david  staff   3.5K Jun 20 13:56 README.md
-rw-r--r--  1 david  staff   3.2K Jun 20 13:56 digests
drwxr-xr-x  5 david  staff   160B Jun 20 13:56 pgsql
-rw-r--r--  1 david  staff   1.1K Jun 20 13:56 trunk.json

Changes, LICENSE, README.md are simply copied from the source. The digests file contains checksums in the BSD digest format for every file in the package, aside from digests itself. Here are the first 3:

$ head -3 semver-0.32.1+pg16-linux-amd64/digests
SHA256 (Changes) = 98b5e87b8dc71604df4b743b1d80ef2fe40d96809a5fbad2a89ab97584bd9c01
SHA256 (LICENSE) = ff48c81463d79e2a57da57ca1af983c3067e51a8ff84c60296c6fbf0624a0531
SHA256 (README.md) = 99f7c59f796986777f873e78f47f7d44f5ce2deee645b4be3199f0a08dedc22d

This format makes it easy to validate all the files, as well as to adjust and update the hash algorithm over time.
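
A verifier only has to parse each line, hash the named file, and compare. Here’s a hypothetical sketch in Rust using the sha2 and hex crates; the POC does its own verification, so this is just an illustration of the format:

use sha2::{Digest, Sha256};
use std::{error::Error, fs};

/// Check every "SHA256 (path) = hex" line in a BSD-style digests file,
/// resolving paths relative to the package directory `dir`.
fn verify_digests(dir: &str) -> Result<(), Box<dyn Error>> {
    for line in fs::read_to_string(format!("{dir}/digests"))?.lines() {
        let (name, want) = line.split_once(" = ").ok_or("malformed digest line")?;
        let path = name
            .strip_prefix("SHA256 (")
            .and_then(|s| s.strip_suffix(')'))
            .ok_or("unsupported digest algorithm")?;
        let got = hex::encode(Sha256::digest(fs::read(format!("{dir}/{path}"))?));
        println!("{path}: {}", if got == want { "OK" } else { "FAILED" });
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn Error>> {
    verify_digests("semver-0.32.1+pg16-linux-amd64")
}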

Finally, the trunk.json file contains metadata about the extension and the system and Postgres on which the system was built:

{
  "trunk": "0.1.0",
  "package": {
    "name": "semver",
    "version": "0.32.1",
    "language": "c",
    "license": "PostgreSQL"
  },
  "postgres": {
    "version": "16.3",
    "major": "16",
    "number": 160003,
    "libs": "-lpgcommon -lpgport -lselinux -lzstd -llz4 -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lm ",
    "cppflags": "-I. -I./ -I/usr/include/postgresql/16/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 ",
    "cflags": "-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -fvisibility=hidden",
    "ldflags": "-L/usr/lib/x86_64-linux-gnu -Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-16/lib  -Wl,--as-needed"
  },
  "platform": {
    "os": "linux",
    "arch": "amd64"
  }
}

The trunk proposal doesn’t specify the contents (yet), but the idea is to include information for an installing application to verify that a package is appropriate to install on a platform and Postgres version.
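
A client-side compatibility check against that metadata might look something like this sketch, assuming serde and serde_json; the field names follow the trunk.json above, and a real client would take the host values from pg_config and uname rather than hard-coding them:

use serde::Deserialize;
use std::{error::Error, fs};

// Just the fields needed for a compatibility check; serde ignores the rest.
#[derive(Deserialize)]
struct TrunkMeta {
    postgres: Postgres,
    platform: Platform,
}

#[derive(Deserialize)]
struct Postgres {
    major: String,
}

#[derive(Deserialize)]
struct Platform {
    os: String,
    arch: String,
}

fn main() -> Result<(), Box<dyn Error>> {
    let meta: TrunkMeta = serde_json::from_str(&fs::read_to_string("trunk.json")?)?;

    // Host values; a real client would derive these from pg_config and uname.
    let (host_os, host_arch, host_pg_major) = ("linux", "amd64", "16");

    if meta.platform.os != host_os || meta.platform.arch != host_arch {
        return Err(format!("package built for {}/{}", meta.platform.os, meta.platform.arch).into());
    }
    if meta.postgres.major != host_pg_major {
        return Err(format!("package built for PostgreSQL {}", meta.postgres.major).into());
    }
    println!("Package is compatible; proceeding with the install");
    Ok(())
}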

install_trunk

Now we have some packages in the proposed format. How do we install them? The install_trunk script is a POC installer. Let’s take it for a spin on macOS:

 1  $ ./install_trunk semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
 2  Unpacking semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
 3  Verifying all checksums...Changes: OK
 4  LICENSE: OK
 5  README.md: OK
 6  pgsql/pkglib/bitcode/src/semver/src/semver.bc: OK
 7  pgsql/pkglib/bitcode/src/semver.index.bc: OK
 8  pgsql/pkglib/semver.dylib: OK
 9  pgsql/doc/semver/semver.mmd: OK
10  pgsql/share/extension/semver.control: OK
11  pgsql/share/semver/semver--0.3.0--0.4.0.sql: OK
12  pgsql/share/semver/semver--0.30.0--0.31.0.sql: OK
13  pgsql/share/semver/semver--0.21.0--0.22.0.sql: OK
14  pgsql/share/semver/semver--0.32.1.sql: OK
15  pgsql/share/semver/semver--0.10.0--0.11.0.sql: OK
16  pgsql/share/semver/semver--0.13.0--0.15.0.sql: OK
17  pgsql/share/semver/semver--0.31.1--0.31.2.sql: OK
18  pgsql/share/semver/semver--0.31.2--0.32.0.sql: OK
19  pgsql/share/semver/semver--unpackaged--0.2.1.sql: OK
20  pgsql/share/semver/semver--0.12.0--0.13.0.sql: OK
21  pgsql/share/semver/semver--0.17.0--0.20.0.sql: OK
22  pgsql/share/semver/semver--0.2.1--0.2.4.sql: OK
23  pgsql/share/semver/semver--0.16.0--0.17.0.sql: OK
24  pgsql/share/semver/semver--0.22.0--0.30.0.sql: OK
25  pgsql/share/semver/semver--0.20.0--0.21.0.sql: OK
26  pgsql/share/semver/semver--0.15.0--0.16.0.sql: OK
27  pgsql/share/semver/semver--0.2.4--0.3.0.sql: OK
28  pgsql/share/semver/semver--0.31.0--0.31.1.sql: OK
29  pgsql/share/semver/semver--0.11.0--0.12.0.sql: OK
30  pgsql/share/semver/semver--0.5.0--0.10.0.sql: OK
31  trunk.json: OK
32  Done!
33  Verifying compatibility with Trunk package 0.1.0
34  Verifying compatibility with PostgreSQL 16.3
35  Verifying compatibility with darwin/arm64:23.5.0
36  Installing doc into /Users/david/.pgenv/pgsql-16.3/share/doc...Done
37  Installing pkglib into /Users/david/.pgenv/pgsql-16.3/lib...Done
38  Installing share into /Users/david/.pgenv/pgsql-16.3/share...Done

Most of the output here is verification:

  • Lines 3-32 verify the checksums of each file in the package
  • Line 33 verifies the version of the Trunk format
  • Line 34 verifies Postgres version compatibility
  • Line 35 verifies platform compatibility

And now, with all the verification complete, it installs the files. It does so by iterating over the subdirectories of the pgsql directory and installing them into the appropriate directory defined by pg_config. To wit:

  • Line 36 installs files from pgsql/doc into pg_config --docdir
  • Line 37 installs files from pgsql/pkglib into pg_config --pkglibdir
  • Line 38 installs files from pgsql/share into pg_config --sharedir

And that’s it. Here’s where it put everything:

❯ (cd ~/.pgenv/pgsql-16.3 && find . -name '*semver*')
./lib/bitcode/src/semver
./lib/bitcode/src/semver/src/semver.bc
./lib/bitcode/src/semver.index.bc
./lib/semver.dylib
./share/extension/semver.control
./share/semver
./share/semver/semver--0.3.0--0.4.0.sql
./share/semver/semver--0.30.0--0.31.0.sql
./share/semver/semver--0.21.0--0.22.0.sql
./share/semver/semver--0.32.1.sql
./share/semver/semver--0.10.0--0.11.0.sql
./share/semver/semver--0.13.0--0.15.0.sql
./share/semver/semver--0.31.1--0.31.2.sql
./share/semver/semver--0.31.2--0.32.0.sql
./share/semver/semver--unpackaged--0.2.1.sql
./share/semver/semver--0.12.0--0.13.0.sql
./share/semver/semver--0.17.0--0.20.0.sql
./share/semver/semver--0.2.1--0.2.4.sql
./share/semver/semver--0.16.0--0.17.0.sql
./share/semver/semver--0.22.0--0.30.0.sql
./share/semver/semver--0.20.0--0.21.0.sql
./share/semver/semver--0.15.0--0.16.0.sql
./share/semver/semver--0.2.4--0.3.0.sql
./share/semver/semver--0.31.0--0.31.1.sql
./share/semver/semver--0.11.0--0.12.0.sql
./share/semver/semver--0.5.0--0.10.0.sql
./share/doc/semver
./share/doc/semver/semver.mmd

Looks like everything’s installed in the right place. Does it work?

# psql -c "CREATE EXTENSION semver; SELECT '1.2.3'::semver"
CREATE EXTENSION
 semver 
--------
 1.2.3
(1 row)

Very nice. What about on Linux?

./install_trunk semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk 
Unpacking semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
Verifying all checksums...Changes: OK
LICENSE: OK
README.md: OK
pgsql/pkglib/bitcode/src/semver/src/semver.bc: OK
pgsql/pkglib/bitcode/src/semver.index.bc: OK
pgsql/pkglib/semver.dylib: OK
pgsql/doc/semver/semver.mmd: OK
pgsql/share/extension/semver.control: OK
pgsql/share/semver/semver--0.3.0--0.4.0.sql: OK
pgsql/share/semver/semver--0.30.0--0.31.0.sql: OK
pgsql/share/semver/semver--0.21.0--0.22.0.sql: OK
pgsql/share/semver/semver--0.32.1.sql: OK
pgsql/share/semver/semver--0.10.0--0.11.0.sql: OK
pgsql/share/semver/semver--0.13.0--0.15.0.sql: OK
pgsql/share/semver/semver--0.31.1--0.31.2.sql: OK
pgsql/share/semver/semver--0.31.2--0.32.0.sql: OK
pgsql/share/semver/semver--unpackaged--0.2.1.sql: OK
pgsql/share/semver/semver--0.12.0--0.13.0.sql: OK
pgsql/share/semver/semver--0.17.0--0.20.0.sql: OK
pgsql/share/semver/semver--0.2.1--0.2.4.sql: OK
pgsql/share/semver/semver--0.16.0--0.17.0.sql: OK
pgsql/share/semver/semver--0.22.0--0.30.0.sql: OK
pgsql/share/semver/semver--0.20.0--0.21.0.sql: OK
pgsql/share/semver/semver--0.15.0--0.16.0.sql: OK
pgsql/share/semver/semver--0.2.4--0.3.0.sql: OK
pgsql/share/semver/semver--0.31.0--0.31.1.sql: OK
pgsql/share/semver/semver--0.11.0--0.12.0.sql: OK
pgsql/share/semver/semver--0.5.0--0.10.0.sql: OK
trunk.json: OK
Done!
Verifying compatibility with Trunk package 0.1.0
Verifying compatibility with PostgreSQL 16.3
Verifying compatibility with linux/amd64:6.5.11-linuxkit 
Trunk package contains darwin binaries but this host runs linux

Looks goo—oops! Look at that last line. It detected an attempt to install Darwin binaries and rejected it. That’s because I tried to install semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk 🤦🏻‍♂️.

Works with the right binary, though:

# ./install_trunk semver-0.32.1+pg16-linux-amd64.trunk 
Unpacking semver-0.32.1+pg16-linux-amd64.trunk
Verifying all checksums...Changes: OK
LICENSE: OK
README.md: OK
pgsql/pkglib/bitcode/src/semver/src/semver.bc: OK
pgsql/pkglib/bitcode/src/semver.index.bc: OK
pgsql/pkglib/semver.so: OK
pgsql/doc/semver/semver.mmd: OK
pgsql/share/extension/semver.control: OK
pgsql/share/semver/semver--0.3.0--0.4.0.sql: OK
pgsql/share/semver/semver--0.30.0--0.31.0.sql: OK
pgsql/share/semver/semver--0.21.0--0.22.0.sql: OK
pgsql/share/semver/semver--0.32.1.sql: OK
pgsql/share/semver/semver--0.10.0--0.11.0.sql: OK
pgsql/share/semver/semver--0.13.0--0.15.0.sql: OK
pgsql/share/semver/semver--0.31.1--0.31.2.sql: OK
pgsql/share/semver/semver--0.31.2--0.32.0.sql: OK
pgsql/share/semver/semver--unpackaged--0.2.1.sql: OK
pgsql/share/semver/semver--0.12.0--0.13.0.sql: OK
pgsql/share/semver/semver--0.17.0--0.20.0.sql: OK
pgsql/share/semver/semver--0.2.1--0.2.4.sql: OK
pgsql/share/semver/semver--0.16.0--0.17.0.sql: OK
pgsql/share/semver/semver--0.22.0--0.30.0.sql: OK
pgsql/share/semver/semver--0.20.0--0.21.0.sql: OK
pgsql/share/semver/semver--0.15.0--0.16.0.sql: OK
pgsql/share/semver/semver--0.2.4--0.3.0.sql: OK
pgsql/share/semver/semver--0.31.0--0.31.1.sql: OK
pgsql/share/semver/semver--0.11.0--0.12.0.sql: OK
pgsql/share/semver/semver--0.5.0--0.10.0.sql: OK
trunk.json: OK
Done!
Verifying compatibility with Trunk package 0.1.0
Verifying compatibility with PostgreSQL 16.3
Verifying compatibility with linux/amd64:6.5.11-linuxkit 
Installing doc into /usr/share/doc/postgresql-doc-16...Done
Installing pkglib into /usr/lib/postgresql/16/lib...Done
Installing share into /usr/share/postgresql/16...Done

# psql -U postgres -c "CREATE EXTENSION semver; SELECT '1.2.3'::semver"
CREATE EXTENSION
 semver 
--------
 1.2.3
(1 row)

RFC

Any PGXS project can try out the pattern; please do! Just download trunk.mk and install_trunk, include trunk.mk in your Makefile, install shasum, jq, and rsync (if you don’t already have them), and give it a try.
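
For a typical PGXS extension, the Makefile change is minimal. Here’s a sketch; the extension name and file list are placeholders, and see trunk.mk itself for the targets it defines:

EXTENSION = semver
DATA      = $(wildcard sql/*.sql)

PG_CONFIG ?= pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

# Pull in the trunk packaging rules after the PGXS include.
include trunk.mk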

The intent of this POC is to prove the pattern; this is not a complete or shippable solution. Following a comment period, I expect to build a proper command-line client (and SDK) to package up artifacts generated from a few build systems, including PGXS and pgrx.

Whether you try it out or not, I welcome a review of the proposal and your comments on it. I’d like to get this right, and have surely overlooked some details. Let’s get to the best binary packaging format we can.


  1. I used the pgxn-tools image, started Postgres, and installed the necessary tools with the command pg-start 16 rsync jq↩︎

Compiling Postgres with LLVM

A quick post on the need to use a compatible Clang compiler when building Postgres with LLVM support. TL;DR: always point the CLANG variable to the correct Clang binary when compiling Postgres --with-llvm.

The Problem

I’m working on a POC for Postgres binary packaging (more on that soon) and wanted to try it with LLVM support, which enables JIT compilation and emits bitcode for inlining. So I installed LLVM from Homebrew on my Mac and built a new Postgres with --with-llvm and a pointer to llvm-config, as described in the docs:

brew install llvm
./configure \
    --prefix=$HOME/pgsql-devel \
    --with-llvm \
    LLVM_CONFIG=/opt/homebrew/opt/llvm/bin/llvm-config
make -j8
make install

No problems, excellent. Now let’s run the tests:

$ make check
# output elided
1..222
# 37 of 222 tests failed.
# The differences that caused some tests to fail can be viewed in the file "src/test/regress/regression.diffs".
# A copy of the test summary that you see above is saved in the file "src/test/regress/regression.out".
make[1]: *** [check] Error 1
make: *** [check] Error 2

This was a surprise! A quick look at that regression.diffs file shows:

+FATAL:  fatal llvm error: Unsupported stack probing method
+server closed the connection unexpectedly
+	This probably means the server terminated abnormally
+	before or while processing the request.
+connection to server was lost

Yikes, the server is crashing! What’s in the log file, src/test/regress/log/postmaster.log? (Took a while to find it, thanks depesz!):

2024-06-18 14:13:52.369 EDT client backend[49721] pg_regress/boolean FATAL:  fatal llvm error: Unsupported stack probing method

Same error. I tried with both the current master branch and the Postgres 16 release branch and got the same result. I pulled together what data I could and opened an LLVM issue.

The Solution

After a few hours, one of the maintainers got back to me:

The error message is LLVM reporting the backend can’t handle the particular form of “probe-stack” attribute in the input LLVM IR. So this is likely a bug in the way postgres is generating LLVM IR: please file a bug against Postgres. (Feel free to reopen if you have some reason to believe the issue is on the LLVM side.)

Okay so maybe it’s actually a Postgres bug? Seems odd, given the failures on both master and Postgres 16, but I wrote to pgsql-hackers about it, where Andres Freund quickly figured it out:

I suspect the issue might be that the version of clang and LLVM are diverging too far. Does it work if you pass CLANG=/opt/homebrew/opt/llvm/bin/clang to configure?

I gave it a try:

make clean
./configure \
    --prefix=$HOME/pgsql-devel \
    --with-llvm \
    LLVM_CONFIG=/opt/homebrew/opt/llvm/bin/llvm-config \
    CLANG=/opt/homebrew/opt/llvm/bin/clang

make -j8
make install

And then make check:

$ make check
# output elided
1..222
# All 222 tests passed.

Yay, that worked! So what happened? Well, take a look at this:

$ which clang                                                    
/usr/bin/clang

That’s Clang as installed by the Xcode CLI tools. Apparently the Clang that generates the bitcode and the LLVM that consumes it can diverge enough to be incompatible, so one has to be sure to use a Clang that matches the LLVM version. Conveniently, the Homebrew LLVM formula includes a matching Clang; all we have to do is tell the Postgres configure script where to find it.
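
If you’re unsure which Clang a build will pick up, a quick version check makes the mismatch obvious (paths assume the Homebrew layout used above):

# Xcode CLI tools Clang vs. the Clang bundled with Homebrew LLVM.
/usr/bin/clang --version
/opt/homebrew/opt/llvm/bin/clang --version
# The LLVM version Postgres will link against; it should match the Clang above.
/opt/homebrew/opt/llvm/bin/llvm-config --version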

Pity the Xcode CLI package doesn’t include LLVM; it would avoid the problem altogether.

Upshot

Always point the CLANG variable to the correct Clang binary when compiling Postgres --with-llvm. I’ve updated my pgenv configuration, which depends on some other Homebrew-installed libraries and plenv-installed Perl, to do the right thing on macOS:

PGENV_CONFIGURE_OPTIONS=(
    --with-perl
    "PERL=$HOME/.plenv/shims/perl"
    --with-libxml
    --with-uuid=e2fs
    --with-zlib
    --with-llvm
    LLVM_CONFIG=/opt/homebrew/opt/llvm/bin/llvm-config
    CLANG=/opt/homebrew/opt/llvm/bin/clang
    --with-bonjour
    --with-openssl # Replaced with --with-ssl=openssl in v14
    --enable-tap-tests
    PKG_CONFIG_PATH=/opt/homebrew/opt/icu4c/lib/pkgconfig
    'CPPFLAGS=-I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/openssl/include -I/opt/homebrew/opt/libxml2/include -I/opt/homebrew/opt/icu4c/include'
    'CFLAGS=-I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/openssl/include -I/opt/homebrew/opt/libxml2/include -I/opt/homebrew/opt/icu4c/include'
    'LDFLAGS=-L/opt/homebrew/opt/readline/lib -L/opt/homebrew/opt/openssl/lib -L/opt/homebrew/opt/libxml2/lib -L/opt/homebrew/opt/icu4c/lib'
)

And now perhaps this post has helped you fix the same problem.

🏔 Extension Ecosystem Summit 2024

Logo for PGConf.dev

The PostgreSQL Extension Ecosystem Summit took place at PGConf.dev in Vancouver on May 28, 2024 and it was great! Around 35 extension developers, users, and fans gathered for an open-space technology (OST)-style unconference. I opened with a brief presentation (slides) to introduce the Summit Theme:

  • Extension issues, designs and features
  • Development, packaging, installation, discovery, docs, etc.
  • Simplify finding, understanding, and installing
  • Towards ideal ecosystem of the future
  • For authors, packagers, DBAs, and users
  • Lots of problems, challenges, decisions
  • Which do you care about?
  • Collaborate, discover, discuss, document
  • Find answers, make decisions, set directions
  • Inform the PGXN v2 project

Before the Summit my co-organizers and I had put up large sticky notes with potential topics, and after reviewing the four principles and one law of OST, we collectively looked them over and various people offered to lead discussions. Others volunteered to take notes and later published them on the community wiki. Here’s our report.

Extension Metadata

Samay Sharma of Tembo took point on this discussion, while David Wagoner of EDB took notes. The wide-ranging discussion among the five participants covered taxonomies, versioning, system dependencies, packaging & discoverability, development & compatibility, and more.

The discoverability topic particularly engaged the participants, as they brainstormed features such as user comments & ratings, usage insights, and test reporting. They settled on the idea of two types of metadata: developer-provided metadata, such as external dependencies (software packages, other extensions the extension depends on, etc.), and user metadata, such as ratings. I’m gratified by how closely this hews to the metadata sketch’s proposed packaging (author) and registry (third party) metadata.

Binary Distribution Format

I led this session, while Andreas “ads” Scherbaum took notes. I proposed to my four colleagues an idea I’d been mulling for a couple months for an extension binary distribution format inspired by Python wheel. It simply includes pre-compiled files in subdirectories named for each pg_config directory config. The other half of the idea, inspired by an Álvaro Hernández blog post, is to distribute these packages via OCI — in other words, just like Docker images. The participants agreed it was an interesting idea to investigate.
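
Concretely, that’s the layout the POC earlier in this post produces: package metadata at the top level and pre-compiled files under a pgsql/ prefix, in subdirectories that map to pg_config directories. An illustrative listing:

semver-0.32.1+pg16-linux-amd64.trunk
├── trunk.json                  # package metadata
├── Changes, LICENSE, README.md
└── pgsql/
    ├── doc/     → pg_config --docdir
    ├── pkglib/  → pg_config --pkglibdir
    └── share/   → pg_config --sharedir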

We spent much of the rest of the time reviewing and trying to understand the inherent difficulty of upgrading binary extensions: there’s a period between when an extension package is upgraded (via Yum, Apt, etc.) and when ALTER EXTENSION UPDATE updates it in the database. If the new binary doesn’t work with old versions, it will break (and potentially crash Postgres!) until the databases update. This can be especially difficult in, say, a data analytics environment that uses the extension in multiple databases and functions, where users may not have the bandwidth to run ALTER EXTENSION UPDATE and update any code that depends on the extension.
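
One way to see the lag is to compare the default version the upgraded package put on disk with the version installed in a given database; any mismatch means ALTER EXTENSION ... UPDATE still needs to run there:

psql -c "SELECT name, default_version, installed_version
           FROM pg_available_extensions
          WHERE installed_version IS DISTINCT FROM default_version"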

This issue is best solved by defensive coding of the C library to keep it working for new and old versions of an extension, but this complicates maintenance.

Other topics included the lack of support for multiple versions of extensions at one time (which could solve the upgrade problem), and determining the upgrade/downgrade order of versions, because the Postgres core enforces no version standard.

ABI/API discussion

Yurii Rashkovskii took point on this session while David Christensen took notes. Around 25 attendees participated. The discussion focused on issues of API and ABI compatibility in the Postgres core. Today virtually the entire code base is open for use by extension developers — anything in header files. Some recent research revealed a few potentially-incompatible changes in minor releases of Postgres, leading some to conclude that extensions must be compiled and distributed separately for every minor release. The group brainstormed improvements for this situation. Ideas included:

  • Spelunking the source to document and categorize APIs for extensions
  • Documenting color-coded safety classifications for APIs: green, yellow, or red
  • Designing and providing a better way to register and call hooks (observability, administration, isolation, etc.), rather than the simple functions Postgres offers today
  • Developing a test farm to regularly build and test extensions, especially ahead of a core release
  • And of course creating more hooks, such as custom relation type handling, per-database background workers, a generic node visitor pattern, and better dependency handling

Including/Excluding Extensions in Core

Keith Fiske led the discussion and took notes for this session, which drew 10-15 or so attendees. It combined two topics: when an extension should be brought into core and when a contrib extension should be removed from core. The central point was the adoption of new features in core that replace the functionality of, and therefore reduce the need for, some extensions.

Replacing an extension with core functionality simplifies things for users. However, the existence of an extension might prevent core from ever adding its features. Extensions can undergo faster, independent development cycles without burdening the committers with more code to maintain. This independence encourages more people to develop extensions, and potentially compels core to better support extensions overall (e.g., through better APIs/ABIs).

Contrib extensions currently serve, in part, to ensure that the extension infrastructure itself is regularly tested. Replacing them with core features would reduce the test coverage, although one participant proposed a patch to add such tests to core itself, rather than as part of contrib extensions.

The participants collaborated on a list of contrib extensions to consider merging into core:

  • amcheck
  • pageinspect
  • pg_buffercache
  • pg_freespacemap
  • pg_visibility
  • pg_walinspect
  • pgstattuple

They also suggested moving extension metadata (SQL scripts and control files) from disk to catalogs and adding support for installing and using multiple versions of an extension at one time (complicated by shared libraries), perhaps by the adoption of more explicit extension namespacing.

Potential core changes for extensions, namespaces, etc.

Yurii Rashkovskii and David Christensen teamed up on this session, as well (notes). 15-20 attendees brainstormed core changes to improve extension development and management. These included:

  • File organization/layout, such as putting all the files for an extension in a single directory and moving some files to the system catalog.
  • Provide a registry of “safe” extensions that can be installed without a superuser.
  • Adding a GUC to configure a second directory for extensions, to enable immutable Postgres images (e.g., Docker, Postgres.app). The attendees consider this a short-term fix, but still useful. (Related: I started a pgsql-hackers thread in April for a patch to do just this.)
  • The ability to support multiple versions of an extension at once, via namespacing, came up in this session, as well.
  • Participants also expressed a desire to support duplicate names through deeper namespacing. Fundamentally, the problem of namespace collision comes down to the issue of un-relocatable extensions.

Until Next Time

I found it interesting how many topics cropped up multiple times in separate sessions. By my reading, the most-cited topics were:

  • The need to install and use multiple versions of an extension
  • A desire for deeper namespacing, in part to allow for multiple versions of an extension
  • A pretty strong desire for an ABI compatibility policy and clearer understanding of extension-friendly APIs

I expect to put some time into these topics; indeed, I’ve already started a Hackers thread proposing an ABI policy.

I greatly enjoyed the discussions and attention given to a variety of extension-related topics at the Summit. So much enthusiasm and intelligence in one place just makes my day!

I’m thinking maybe we should plan to do it again next year. What do you think? Join the #extensions channel on the Postgres Slack with your ideas!

POSETTE 2024

The POSETTE elephant logo

Last week, I attended and presented at POSETTE, An Event for Postgres. Here’s a selection of the presentations I found worthy of attention.

Good Talks

Comparing Postgres connection pooler support for prepared statements by Jelte Fennema-Nio. Jelte did a great job outlining the challenges he encountered adding protocol-level prepared query support to PgBouncer. So many edge cases! Very much a worthwhile effort, and an important contribution. In the Discord “hallway track”, Jelte said he has some ideas how to add support for LISTEN/NOTIFY, which also requires connection affinity. Alas, there’s no protocol-level support, so it’ll be tricky. I suspect I’ll eventually move the PGXN Bot to something like pgmq to avoid the issue.

How to work with other people by Jimmy Angelakos and Floor Drees. Jimmy & Floor capably highlighted issues of neurodiversity and mental health in the Postgres community and the workplace. I greatly appreciate the increasing awareness of and discussions about these issues, which for far too long have remained hidden or misunderstood. All too often they still are. The more people talk about them, the more they’ll be accepted and the better things will become. Love seeing this.

Even JSONB In Postgres Needs Schemas by Chris Ellis. Chris concisely introduced the concept of JSON validation via check constraints to ensure the integrity of data. He started with simple validation with json_typeof(), moved to more comprehensive validation of various parts of a JSON or JSONB object, and ultimately full JSON Schema validation with the pg_jsonschema extension. Having recently written my own JSON Schema extension, I was happy to see this topic receive more attention. The tool Chris developed to convert a JSON Schema to a SQL function seems super neat, too — a great way to bootstrap a check constraint from a JSON Schema where no such extension exists, such as on the big cloud providers.
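
The progression looks roughly like this: hand-rolled checks with core jsonb functions first, with full JSON Schema validation delegated to an extension like pg_jsonschema where it’s available. This is my own illustrative sketch, not code from the talk:

-- Structural validation with core jsonb operators and functions.
CREATE TABLE events (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload jsonb  NOT NULL,
    CHECK (jsonb_typeof(payload) = 'object'),
    CHECK (payload ? 'type' AND jsonb_typeof(payload->'type') = 'string')
);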

I Also Spoke

I also gave a talk, State of the Postgres Extension Ecosystem, expanding upon my mini-Summit session. I think it came out pretty well, and hope it helps get more people interested in extensions and in solving the challenges of finding, evaluating, installing, and using them everywhere. Links:

Next Year

I found POSETTE a very nice Postgres conference. I applaud its commitment to a fully-virtual venue. In-person get-togethers are great, but not everyone can travel to them for reasons of cost, time, family, health, and more. Better still, the speakers recorded their presentations in advance, which allowed us to fully participate in the discussion during our talks! (I mostly used my time to offer corrections and links to relevant resources.)

For those interested in Postgres, I heartily endorse this free, fully remote conference. Perhaps I’ll “see” you there next year.

Update 2024-06-24: Added link to PDF with notes.

Mini Summit Six

Last week, a few members of the community got together for the sixth and final Postgres Extension Ecosystem Mini-Summit. Follow these links for the video and slides:

Or suffer through my interpolation of YouTube’s auto-generated transcript, interspersed with chat activity, if you are so inclined.

Introduction

  • I opened the meeting, welcomed everyone, and introduced myself as host. I explained that today I’d give a brief presentation on the list of issues I’ve dreamed up and jotted down over the last couple of mini-summits as potential topics to take on at the Summit in Vancouver on May 28th.

Presentation

  • These are things that I’ve written down as I’ve been thinking through the whole architecture myself, but also that have come up in these Summits. I’m thinking that we could get some sense of the topics that we want to actually cover at the summit. There is room for about 45 people, and I assume we’ll break up “unconference style” into four or five working groups. People can move to corners, hallways, or outdoors to discuss specific topics.

  • Recall that at the first mini-summit I showed a list of potential topics that might come up as we think through the issues in the ecosystem. I left off with the prompt “What’s important to you?” We hope to surface the most important issues to address at the summit and create a hierarchy. To that end, I’ve created this Canva board1 following Open Space Technology2 to set things up, with the rules and an explanation for how it works.

  • I expect one of us (organizers) to give a brief introduction at the start of the summit to outline the principles of Open Space Technology, which are similar to unconferences.

  • Open Space Technology principles are:

    • Whoever comes are the right people
    • Whatever happens is the only thing that could happen
    • Whenever it starts is the right time (but we start at 2 p.m. and we have only three hours, so we’ll try to make the best of it)
    • When it’s over it’s over
    • And whatever happens is the right place
  • There is also a “Law of Mobility”. If you start out attending a session or discussion about one topic and decide you want to do something else, you can wander over to another session. Open Space Technology calls these people “bumblebees”, who cross-pollinate between topics. “Butterflies” are the people who hover around a particular topic to make it happen.

  • And “Come to be Surprised” about what will come up.

  • I’ve split the potential topics into Post-its. We might have four or five spaces. Pick a space, pick a session; we have two hour-long sessions. I assume we’ll have 15-30 minutes to open the Summit, do intros, and split up the sessions; then have people do an hour on one topic and an hour on a second topic. At the end, we’ll do the readout, in which we talk about the decisions we came to.

  • If you’re interested in facilitating any of these topics, simply drag it in and stick your name on it.

  • First I thought I’d briefly go over the list of topics as I’ve imagined them. I posted the list on Slack a couple weeks ago and added to it as things have come up in the discussions. But I want to give a high level view of what these brief descriptions mean.

  • This is ad-hoc; I don’t have anything super planned. Please feel free to jump in at any time! I think I’ve turned on “talking permitted” for everybody, or stick your hand up and we’ll be glad to figure out other stuff, especially if you’re thinking of other topics or related things, or if you think things should be merged.

  • Any questions or thoughts or comments?

  • I put the topics in broad categories. There’s some crossover, but the first one I think of is metadata. I’ve thought about metadata a fair bit, and drafted an RFC for the kinds of things to put in an updated metadata standard, like:

    • How do you specify third-party dependencies? For example, PostGIS depends on additional libraries; how can those be specified in an ideally platform neutral way within the metadata?

    • How to specify the different types of extensions there are? Stephen wrote a blog post last year about this: you have CREATE EXTENSION extensions, LOAD command extensions, background workers, applications, and more. You have things that need shared_preload_libraries and things that don’t. How do we describe those things about an extension within a distribution package?

    • Taxonomies have come up a few times. PGXN currently allows extension authors to put an arbitrary number of tags into their META.json file. Maybe in part because of the precedent of the stuff that I released early on, people mostly put stuff in there to describe it, like “fdw”, “function”, or “JSON”. Some of the newer binary distribution packaging systems, in particular Trunk, have a curated list of categories that they assign. So there might be different ways we want to classify stuff.

      Another approach is crates.io, which has a canonical list of categories (or “slugs”) that authors can assign. These are handy because they group things together in a more useful way, like “these are related to data analytics” or “these are related to vector search” — as opposed to the descriptive tags PGXN has now. So, what ought that to look like? What kind of controls should we have? And who might want to use it?

    • How would we specify system requirements? For example, “this package requires only a subset of OSes,” or the version of an OS, or the version of Postgres, or CPU features. Steven’s mentioned vector-based ones a few times, but there are also things like encryption instructions provided by most chips. Or the CPU architecture, like “this supports aarch64 but not amd64.” How should we specify that?

    • I covered categorization under taxonomies

    • Versioning. I blogged about this a couple months ago. I’m reasonably sure we should just stick to SemVer, but it’s worth bringing up.

  • Thoughts on metadata, or stuff I’ve left out? This is in addition to the stuff that’s in the META.json spec. It leaves room for overlap with core stuff. How do we create one sort of metadata for everything, that might subsume the control file as well as the metadata spec or trunk.toml?

    • Jeremy S in chat: So far this is seeming like a good recap of ground that’s been covered, questions & topics that have been raised. Great to see how broad it’s been
  • The next category is the source registry. This is thinking through how we should evolve the PGXN root registry for distributing extension source code. There are questions like identity, namespacing, and uniqueness.

    • These are broad categories, but identity is about how you identify yourself to the system and claim ownership over something.

    • What sort of namespacing should we use? Most systems, including PGXN, just use an arbitrary string, and you own that string from your first release. But other registries, like Go, allow you to use domain-based namespacing for packages. This is really nice because it allows a lot more flexibility, such as the ability to switch between different versions or forks.

    • Then there’s the level of uniqueness of the namespacing. This is kind of an open question. Another approach I thought of is that, rather than the string that names your extension distribution being unique, it could be your username plus the string. That makes it easier when somebody abandons something and somebody else forks it under a new username. Then maybe people can switch more easily, to account for and handle that sort of evolution in a way that single-string uniqueness makes trickier.

    • Distributed versus centralized publishing. I’ve written about this a couple times. I am quite attracted to the Go model where packages are not centrally distributed but are in three or four supported Version Control Systems, and as long as they use SemVers and appropriate tags, anybody can use them. The centralized index just indexes a package release the first time it’s pulled. This is where host names come into play as part of the namespacing. It allows the system to be much more distributed. Now Go caches all of them in a number of different regions, so when you download stuff it goes through the Go stuff. When you say “give me the XYZ package,” it’ll generally give you the cached version, but will fall back on the repositories as well. So there’s still the centralized stuff.

      I think there’s a lot to that, and it goes along with the namespacing issue. But there are other ideas at play as well. For example, almost all the other source code distribution systems just use a centralized system: crates.io, CPAN, npm, and all the rest.

      And maybe there are other questions to consider, like is there some sort of protocol we should adopt as an abstraction, such as Docker, where Docker is not a centralized repository other than hub.docker.com. Anyone can create a new Docker repository, give it a host name, and then it becomes something that anybody can pull from. It’s much more distributed. So there are a number of ideas to think through.

    • Binary packaging and distribution patterns. I have a separate slide that goes into more detail, but there are implications for source code distribution, particularly with the metadata but perhaps other things. We also might want to think through how it might vary from source distribution.

    • Federated distribution gets at the Docker idea, or the OCI idea that Alvaro proposed a few weeks ago. Stuff like that.

    • What services and tools to improve or build. This goes to the fundamental question of why we’ve had all these new packaging systems pop up in the last year or so. People were saying “there are problems that aren’t solved by PGXN.” How do we as a community collectively decide what are the important bits and what we should build and provide. Features include developer tools, command line clients, search & browse, and discovery.

    • Stats, reports, and badging. This is another fundamental problem that some of the emerging registries have tried to address: How do you find something? How do you know if it’s any good? How do you know who’s responsible for it? How do you know whether there’s some consensus across the community to use it? The topic, then, is what sort of additional metadata could we provide at the registry level to include some hint about these issues. For example, a system to regularly fetch stars and statistical analysis of a GitHub or a Bitbucket project. Or people wanted review sites or the ability to comment on systems.

      There’s also badging, in particular badges for build and test matrices for extensions, which would encourage people to better support broad arrays of Postgres versions and platforms. There could be badges for that, so you can see how well an extension supports various platforms. And any other sort of badging, like quality badging. The idea is to brainstorm what sorts of things might be useful there, and what might be best to build first: the low-hanging fruit.

  • Any questions, comments, thoughts, or additional suggestions on the root registry?

Interlude

  • Steven Miller: So the idea is there are topics on the left and then they get lined up into the schedule? So there are five different rooms, so horizontally aligned items are at the same time?

  • David Wheeler (he/him): Correct. These are session one and these are session two.

  • Jeremy S: I was kind of waiting to jump to that. It seemed like you were just doing a review of all the topics we’ve covered, but I was waiting until you got through everything to bring that up.

  • Steven Miller: Oh yeah, good call, good call.

  • Jeremy S: I have the same kind of question/concern. This is a great list of topics, now what do we want to do with the time in Vancouver? David, do you think we need to go through everything on the list? How do you want to spend the time today?

  • David Wheeler (he/him): I was trying to do a quick review just so people knew what these words mean. If you all feel like you have a good idea, or you want to add topics of your own, please do!

  • Jeremy S: Like I commented in the chat, it’s amazing to see how much ground we’ve covered, and it’s good to have a quick recap. It’s 9:22 right now Pacific time — 22 after the hour wherever you are — I just want to make sure we don’t run out of time going through everything.

  • David Wheeler (he/him): I agree, I’ll make it work. I can speed up a little. I know I can be verbose about some of this stuff.

  • David G. Johnston: Unless the ones from India, in which case they have a half-hour time zone.

  • David Wheeler (he/him): I was gonna say! [Laughs]

Presentation Continues

  • Binary packaging. This is the problem that PGXMan and trunk have tried to solve with varying degrees of success. I think it’d be worthwhile for us to think through as a community what, ideally, should a community-provided binary packaging system look like?

    • And what’s the format? Do we want to do tarballs, do OCI like Alvaro proposed? Do we want something like RPM or Apt or Python wheels? That’s actually something I’m super interested to get into. There was a question that came up two weeks ago in Yurii’s presentation. I think Daniele suggested that the Python wheel package format allows you to put dynamic libs into the wheel. That’s pretty interesting and worth looking into as well.

    • How do we go about building a community-based binary packaging registry? How do we do the build farming, what platforms and architectures and OSes would it support, and what sort of security, trust, and verification? And the centralization: who runs it, who’s responsible for it, how should it work at a high level?

    • Philippe Noël in chat: Phil from ParadeDB here (pg_search, pg_analytics, pg_lakehouse) — First minisummit I can attend, glad to be here

  • Thanks for coming, Philippe! Again, interrupt me anytime.

  • The next topic is developer tooling. Developer tooling today is kind of all over the place. There’s a PGXN client, there’s the PGXN utils client (which doesn’t compile anymore, as far as I can tell), there’s pgrx stuff, and maybe a few other things. What sorts of tools would be useful for developers who actually develop and build extensions?

    • CLIs and APIs can do metadata management, or scaffolding your source code and adding new features through some sort of templating system.

    • The packaging and publishing system, based on how we ultimately elect to distribute source code and how we ultimately elect to distribute binary code. How does that get packaged up, with the namespacing and all the decisions we made, to be as easy as possible for the developer?

    • What build pipelines do we support? Today PGXS and pgrx are perhaps the most common, but I’ve seen GNU autoconf configure stuff and stuff that uses Rust or Go or Python-based builds. Do we want to support those? And how do we integrate them with our binary packaging format and where Postgres expects to put stuff?

      I think this is an important topic. One of the things I’ve been dealing with as I’ve talked to the people behind Apache Age and a couple other projects is how they put stuff in /usr/local by default. I suggest that it’d be better if it went where pg_config wants to put it. How do we want to go about integrating those things?

    • Tooling for CI/CD workflows to make it as easy as possible to test across a variety of platforms, Postgres versions, and those pipelines.

  • Kind of a broad community topic here. This gets to where things are hosted. There’s a Postgres identity service that does OAuth 2; is that something we want to plug into? Is there a desire for the community to provide the infrastructure for the systems, or at least the core systems of PGXN v2? Who would support it? The people who work on the development ideally would also handle the devops, but should work closely with whoever provides the infrastructure to make sure it’s all copacetic. And that there’s a plan for when something happens. People exit the community for whatever reason; how will systems continue to be maintained? I don’t think there’s a plan today for PGXN.

  • Another topic is documentation. How do we help engineers figure out how to build extensions; tutorials and references for all the things and all the various details. Do we end up writing a book, or do we just have very specifically-focused tutorials like, “So you want to build a foreign data wrapper; here’s a guide for that.” Or you just need to write a background worker, here’s an example repository to clone. What should those things look like?

    • CREATE EXTENSION
    • Hooks
    • Background workers
    • CLI apps/services
    • Web apps
    • Native apps

    This also kind of covers the variety of different kinds of extensions we might want to package and distribute.

  • Then there’s all the stuff that I filed under “core,” because I think it impacts the core Postgres project and how it may need to evolve or we might want it to evolve over time. One is the second extension directory; there’s a patch pending now, but it’ll probably be deferred until Postgres 17 ships; it’s on hold while we’re in the freeze. This is a patch that Christoph Berg wrote for the Debian distribution; it allows you to have a second destination directory for your extensions where Postgres knows to find stuff, including shared object libraries. This would make it easier for projects like Postgres.app and for immutable Docker containers to mount a new directory and have all the stuff be there.

  • I would love to see some sort of more coherent idea of what an extension package looks like, where, if I install pgTAP, all of its files are in a single subdirectory that Postgres can access. Right now it’s in package config, and the sharedir and the libdir and the docdir — it’s kind of spread all over.

  • Should there be a documentation standard, in the way you have JavaDoc and rustdoc and Godoc, where docs are integrated into the source code, so it’s easy to use, and there’s tooling to build effective documentation. Today, people mostly just write short READMEs and leave it at that, which is not really sufficient for a lot of projects.

  • There’s the longstanding idea of inline extensions that Dimitri proposed back as far as 2013, something they called “units”. Oracle calls them “packages” or “modules”. Trusted Language Extensions start making a stab at the problem, trying to make something like inline extensions with the tooling we have today. How should that evolve? What sorts of ideas do we want to adapt to make it so that you don’t have to have physical access to the file system to manage your extensions? Where you could do it all over SQL or libpq.

  • How can we minimize restarts? A lot of extensions require loading DSOs in the shared_preload_libraries config, which requires a cluster restart. How can we minimize that need? There are ways to minimize restarts; it’s just a broad category I threw in.

  • What Namespacing is there? I touched on this topic when I wrote about Go Namespacing a while ago. My current assumption is, if we decided to support user/extension_string or hostname/user/extension_string namespacing for package and source distribution, Postgres itself still has to stick to a single string. How would we like to see that evolve in the future?

  • What kind of sandboxing, code signing, security, and trust could be built into the system? Part of the reason they’ve resisted having a second extension directory up to now is to have one place where everything is, where the DBA knows where things are, and it’s a lot easier to manage there. But it’s also because otherwise people will put junk in there. Are there ideas we can borrow from other projects or technologies where anything in some directory is sandboxed? And how is it sandboxed? Is it just for a single database or a single user? Do we have some sort of code signing we can build into the system so that Postgres verifies an extension when you install it? What other kinds of security and trust could we implement?

    This is a high level, future-looking topic that occurred to me, but it comes up especially when I talk to the big cloud vendors.

  • An idea I had is dynamic module loading. It came up during Jonathan’s talk, where there was a question about how one could use Rust crates in PL/Rust, a trusted language. Well, a DBA has to approve a pre-installed list of crates that’s on the file system where PL/Rust can load them. But what if there was a hook where, in PL/Perl for example, you use Thing and a hook in the Perl use command knows to look in a table that the DBA manages and can load it from there. Just a funky idea I had, a way to get away from the file system and more easily let people, through permissions, be able to load modules in a safe way.

  • A topic that came up during Yurii’s talk was binary compatibility of minor releases, or some sort of ABI stability. I’d be curious to bring up with hackers the idea of formalizing something there. Although it has seemed mostly pretty stable over time to me, that doesn’t mean it’s been fully stable. I’d be curious to hear about the exceptions.

So! That’s my quick review. I did the remainder of them in 11 minutes!

Discussion

  • Jeremy S: Well done.

  • David Wheeler (he/him): What I’d like to do is send an email to all the people who are registered to come to The Summit in two weeks, as well as all of you, to be able to access this board and put stars or icons or something — stickers which you can access —

  • Jeremy S: I do feel like there’s something missing from the board. I don’t know that it’s something we would have wanted to put on sooner, but I kind of feel like one of the next steps is just getting down into the trenches and looking at actual extensions, and seeing how a lot of these topics are going to apply once we start looking at the list. I was looking around a bit.

    It’s funny; I see a mailing list thread from a year or two ago where, right after Joel made his big list of 1,000 extensions, he jumped on the hackers list and said, “hey, could we stick this somewhere like on the wiki?” And it looks like nobody quite got around to doing anything like that. But that’s where I was thinking about poking around, maybe starting to work on something like that.

    But I think once we start to look at some of the actual extensions, it’ll help us with a lot of these topics, kind of figure out what we’re talking about. Like when you’re trying to figure out dependencies, once you start to figure out some of the actual extensions where this is a problem and other ones where it’s not, it might help us to be a lot more specific about the problem that we’re trying to solve. Or whether it’s versioning, which platform something is going to build on, all that kind of stuff. That’s where I was thinking there’s a topic, or maybe a next step, that’s missing. Or you were talking about how many extensions even build today: if you go through the extensions on PGXN right now, how many of them even work at all? So, starting to work down that list.

  • David Wheeler (he/him): So, two thoughts on that. One is: create a sticky with the topic you want and stick it in a place that’s appropriate, or create another category if you think that’s relevant.

  • Jeremy S: It’s kind of weird, because what I envision — what I want to do on the wiki; I’ll see if I can start this off today, I have rights to make a Postgres Wiki page — is to make a list of extensions, like a table, where down the left are the extensions and across the top is where each extension is distributed today. So, just extensions that are already distributed in multiple places. I’m not talking about the stuff that’s in core, because it’s a given that that’s everywhere. But something like pg_cron or PGAudit: anybody who has extensions probably has them. That gives some sense of the extensions that everybody already packages. Those are obviously really important extensions, because everybody is including them.

    And then the next thing I wanted to do was the same thing, with the list of those extensions on the left but a column for each of the categories you have here. For, say, PGAudit, for the stuff across the top — metadata, registry, packaging, developer stuff — are there packaging concerns? For PGAudit, go down the list of registry topics like identity: where is the source for PGAudit, is the definitive upstream GitLab, is it GitHub, is it git.postgresql.org? I could go right down the list of each of these topics for PGAudit, and then go down the list of all of your topics for pg_hint_plan. That’s another big one; pg_hint_plan is all over the place. Each of your topics I could take and apply to each of the top 10 extensions, and there might be different things that rise to the surface for pg_hint_plan than there are for, say, pgvector.

  • David Wheeler (he/him): That sounds like a worthwhile project to me, and it could be a useful reference for any of these topics. Also a lot of work!

  • Jeremy S: Well, another way to think about Vancouver might be, instead of splitting people up by these topics — I’m spitballing here, this might be a terrible idea — but you could take a list of 20 or 30 important extensions, split people up into groups, and say, “here’s five extensions for you, now cover all these topics for your five extensions.” You might have one group that’s looking at pg_hint_plan and pgvector and PGAudit, and then a different group that has pg_cron and whatever else we come up with. That’s just another way you could slice it up.

  • David Wheeler (he/him): Yeah, I think you’re thinking about it the inverse of the way I’ve been thinking of it. I guess mine is perhaps a little more centralized and top-down, and that comes from having worked on PGXN in the past and thinking about what we’d like to build in the future. But there’s no reason it couldn’t be bottom-up from those things. I will say, when I was working on the metadata RFC, I did work through an example of some actually really fussy extension — I don’t remember which one it was — or no, I think it was the ML extension.3 I think that could be a really useful exercise.

    But the idea of the Open Technology Space is that you can create a sticky, make a pitch for it, and have people vote by putting a star or something on it. I’m hoping that we can try to figure out which ones we feel are the most important, but ultimately anybody can grab one of these and say “I want to own this, I’m putting it in session one,” and put your name on it. They can be anything, for the most part.

  • Jeremy S: Sure. I think I don’t totally grok the Canva board and how that all maps out, but at the end of the day whatever you say we’re doing in Vancouver I’m behind it 100%.

  • David Wheeler (he/him): I’m trying to make it as open as possible. If there’s something you want to talk about, make a sticky.

  • Jeremy S: I’ll add a little box. I’m not sure how this maps to what you want to do with the time in Vancouver.

  • David Wheeler (he/him): Hopefully this will answer the question. First we’ll do an intro and welcome and talk about the topics, give people time to look at them; I want to send it in advance so people can have a sense of it in advance. I know the way they do the Postgres unconference that’s been the last day of PGCon for years: they have people come and put a sticky or star or some sort of sticker on the topics they like, and then they pick the ones that have the most, and those are the ones they line up in here [the agenda]. But the idea of the Open Technology stuff is a person can decide on whatever topic they want, they can create their sticky, they can put it in the slot they want and whatever space they want, and —

  • Jeremy S: Ooooh, I think I get it now. Okay, I didn’t realize that’s what you were doing with the Canva board. Now I get it.

  • David Wheeler (he/him): Yeah, I was trying to more or less do an unconference thing, but because we only have three hours, try to have a solid idea of the topics we want to address before we get there.

  • Jeremy S: I don’t know though. Are you hoping a whole bunch of people are going to come in here and like put it — Okay, it took me five or ten minutes to even realize what you were doing, and I don’t have high hopes that we’ll get 20 people to come in and vote on the Post-it notes in the next seven days.

  • David Wheeler (he/him): Yeah, maybe we need to… These instructions here are meant to help people understand that and if that needs to be tweaked, let’s do it.

    • David G. Johnston in chat: How many people are going to in this summit in Vancouver?
    • David G. Johnston in chat: Is the output of a session just discussions or are action items desired?
  • Steven Miller: I have another question. Are people invited to present at the Summit if they’re not physically present at the Summit? And then same question for viewership

  • David Wheeler (he/him): I don’t think they are providing remote stuff at the Summit

  • Steven Miller: okay

  • David Wheeler (he/him): David, last I heard there were 42 people registered. I think we have space for 45. We can maybe get up to 50 with some standing room, and there’s a surprisingly large number of people (laughs).

    • David G. Johnston in chat: So average of 10 in each space?
  • Jeremy S: Have you gone down the list of names and started to figure out who all these people are? Cuz that’s another thing. There might be people who have very little background and just thought “this sounds like an interesting topic.” How those people would contribute and participate would be very different from someone who’s been working with extensions for a long time.

  • David Wheeler (he/him): David, yeah, and we can add more spaces or whatever if it makes sense, or people can just arbitrarily go to a corner. Because it’s an unconference they can elect to do whatever interests them. I’m just hoping to have like the top six things we think are most important to get to ahead of time.

    Jeremy, Melanie sent me the list of participants, and I recognized perhaps a quarter of the names as people who’re pretty involved in the community; the rest I don’t know at all. So I think it’s going to be all over the map.

  • Steven Miller: So would it work if somebody wanted to do a presentation, they can. They grab stickies from the left and then you could also duplicate stickies because maybe there’d be some overlap, and then you put them in a session. But there’s basically supposed to be only one name per field, and that’s who’s presenting.

  • David Wheeler (he/him): You can put however many names on it as you want. Open technology usually says there’s one person who’s facilitating and another person should take notes.

  • Steven Miller: Okay.

  • David Wheeler (he/him): But whatever works! The way I’m imagining it is, people say, “Okay, I want to talk to other people about, and make some decisions about, I don’t know, documentation standards.” So they go off to a corner and they talk about it for an hour. There are some notes. And in the final half hour we’ll have readouts from those, from whatever was talked about there.

  • Steven Miller: These are small working sessions really; it’s not like a conference presentation. Okay, got it.

  • David Wheeler (he/him): Yeah. I mean, somebody might come prepared with a brief presentation if they want to set the context. [Laughs] Which is what I was trying to do for the overall thing here. But the idea is these are working sessions, like “here’s the thing we want to look at,” and we want to have some recommendations, or figure out the parameters, or have a plan — maybe — at the end of it. My ideal, personally, is that at the end of this we have a good idea of what the most important topics are to address early on in the process of building out the ecosystem of the future, so we can start planning for how to execute on those proposals and decisions. That’s how I’m thinking about it.

  • Steven Miller: Okay, yeah I see.

  • Jeremy S: This sounds a lot like the CoffeeOps meetups that I’ve been to. They have a similar process where you use physical Post-it notes and vote on topics and then everybody drops off into groups based on what they’re interested in.

  • David Wheeler (he/him): Yeah it’s probably the same thing, the Open Technology stuff.

  • Steven Miller: Maybe we should do one field so we kind of get an idea?

  • David Wheeler (he/him): Sure. Let’s say somebody comes along and there are a bunch of stickers on this one [drops stickers on the sticky labeled “Identity, namespacing, and uniqueness”]. So we know that it’s something people really want to talk about. So if somebody will take ownership of it, they can control-click, select “add your name”, find a slot that makes sense (and we may not use all of these), and drag it there. So “I’m going to take the first session to talk about this.” Then people can put the stickies on it over here [pastes stickers onto the topic sticky in the agenda], so you have some sense of how many people are interested in attending and talking about that topic. But there are no hard and fast rules.

    Whether or not they do that, say, “David wants to talk about identity, namespacing, and uniqueness in the core registry,” we’re going to do that in the first session. We’ll be in the northeast corner of the room — I’m going to try to get access to the room earlier in the day so I can have some idea of how it breaks up, and I’ll tweak the Canva to add stuff as appropriate.

    • David G. Johnston in chat: Same thing multiple times so people don’t miss out on joining their #2 option?
    • David G. Johnston in chat: How about #1, #2, #3 as labels instead of just one per person?
  • Jeremy S: Are you wanting us to put Post-it notes on the agenda now, before we know what’s been voted for?

  • David Wheeler (he/him): Yep! Especially if there’s some idea you had, Jeremy. If there’s stuff you feel is missing or would be a different approach, stick it in here. It may well be that not that many people are interested in what I’ve come up with, but they want to talk about those five extensions.

  • David Wheeler (he/him): (Reading comment from David G. Johnston): “One, two, and three as labels instead of just one per person?” David, I’m sorry, I don’t follow.

  • David G. Johnston: So basically like ranked choice. If you’re gonna do core one time and binary packaging one time, and they’re running at the same time, well, I want to do both. I want to do core — that’s my first choice — I want to do binary packaging — that’s my second choice. If I had to choose, I’d go to number one. But if you have enough people saying “I want to see this, that’s my number two option,” you run binary packaging twice, not conflicting with core, so you can get more people.

  • David Wheeler (he/him): I see, have people stick numbers on the topics that most interest them. Let’s see here… [pokes around the Canva UX, finds stickers with numbers.] There we go. I’ll stick those somewhere that’s reasonable so people can rank them if they want, their top choices.

    This is all going to be super arbitrary and unscientific. The way I’ve seen it happen before is people just drop stars on stuff and say, okay this one has four and this one has eight so we definitely want to talk about that one, who’s going to own it, that sort of thing. I think what makes sense is to send this email to all the participants in advance; hopefully people will take a look, have some sense of it, and maybe put a few things on. Then, those of us who are organizing it and will be facilitating on the day, we should meet like a day or two before, go over it, and make some decisions about what we definitely think should be covered, what things are open, and get a little more sense of how we want to run things. Does that make sense?

  • Jeremy S: Yeah, I think chatting ahead of time would be a good idea. It’ll be interesting to see how the Canva thing goes and what happens with it.

  • David Wheeler (he/him): It might be a mess! Whatever! But the answer is that whatever happens this is the right place. Whenever it starts is the right time. Whatever happens could only happen here. It’s super arbitrary and free, and we can adapt as much as we want as it goes.

  • David Wheeler (he/him): I think that’s it. Do you all feel like you have some sense of what we want to do?

  • Jeremy S: Well not really, but that’s okay! [Laughs]

  • Steven Miller: Okay, so here’s what we are supposed to do. Are we supposed to go find people who might be interested to present — they will already be in the list of people who are going to Vancouver. Then we talk to them about these Post-its and we say, “would you like to have a small discussion about one of these things? If you are, then put a sticky note on it.” And then we put the sticky notes in the fields, and we have a list of names associated with the sticky notes. Like, maybe Yurii is interested in binary distribution, and then maybe David is also interested in that. So there’s like three or four people in each section, and we’re trying to make sure that if you’re interested in multiple sections you get to go to everything?

  • David Wheeler (he/him): Yeah you can float and try to organize things. I put sessions in here assuming people would want to spend an hour, but maybe a topic only takes 15 minutes.

  • David G. Johnston: Staying on my earlier thought on what people want to see: people who are willing to present and can present on multiple things, if we have a gold star for who’s willing to actually present on a topic. So here’s a topic: I got eight people who want to see it but only one possible presenter. Or I got five possible presenters and three possible viewers. But you have that dynamic of ranked choice for both “I’ll present stuff” or “I’m only a viewer.”

  • David Wheeler (he/him): I think that typically these things are self-organizing. Somebody says, “I want to do this, I will facilitate, and I need a note taker.” But they negotiate amongst themselves about how they want to go about doing it. I don’t think it necessarily has to be a formal presentation, and usually these things are not. Usually it’s like somebody saying, “here’s what this means, this is the topic we’re going to try to cover, these are the decisions we want to make. Go!”

  • Jeremy S: You’re describing the unconference component of PGCon that has been done in the past.

  • David Wheeler (he/him): More or less, yes

  • Jeremy S: So should we just come out and say this is an unconference? Then everybody knows what you’re talking about really fast, right?

  • David Wheeler (he/him): Sure, sure, yeah. I mean —

  • Jeremy S: We’re just doing the same thing as – yeah.

  • David Wheeler (he/him): Yeah, I try to capture that here but we can use the word “unconference” for sure. [Edits the Canva to add “an unconference session” to the title.] There we go.

  • Steven Miller: I imagine there are people who might be interested to present but they just aren’t in this meeting right now. So maybe we need to go out and advertise this to people.

  • David Wheeler (he/him): Yeah, I want to draft an email to send to all the attendees. Melanie told me we can send an email to everybody who’s registered.

  • Jeremy S: And to be clear it’s full, right? Nobody new can register at this point?

  • David Wheeler (he/him): As far as I know, but I’m not sure how hard and fast the rules are. I don’t think any more people can register, but it doesn’t mean other people won’t wander in. People might have registered and then not come because they’re in the patch session or something.

    So I volunteer to draft that email today or by tomorrow and share it with the Slack channel for feedback. Especially if you’re giving me notes to clarify what things mean, because it seems like there are more questions and confusions about how it works than I anticipated — in part because it’s kind of unorganized by design [chuckles].

  • David Wheeler (he/him): Oh, that’s a good thing to include, Jeremy; that’s a good call. But also to try to maximize participation of the people who’re planning to be there. It may be that they say, “Oh this sounds interesting,” or whatever, so I’ll add some different stickers to this for some different meanings, like “I’m interested” or “I want to take ownership of this” or “this is my first, second, third, or fourth choice”. Sound good?

  • Steven Miller: Yes, it sounds good to me!

  • David Wheeler (he/him): Thanks Steven.

  • Jeremy S: Sounds good, yeah.

  • David Wheeler (he/him): All right, great! Thanks everybody for coming!


  1. Hit the #extensions channel on the Postgres Slack for the link! ↩︎

  2. In the meeting I kept saying “open technology” but meant Open Space Technology 🤦🏻‍♂️. ↩︎

  3. But now I can look it up. It was pgml, for which I mocked up a META.json↩︎

Extension Summit Topic Review

Boy howdy that went fast.

This Wednesday, May 15, the final Postgres extension ecosystem mini-summit will review topics covered in previous Mini-Summits, various Planet PostgreSQL posts, the #extensions channel on the Postgres Slack and the Postgres Discord. Following a brief description of each, we’ll determine how to reduce the list to the most important topics to take on at the Extension Ecosystem Summit at PGConf.dev in Vancouver on May 28. I’ll post a summary later this week along with details for how to participate in the selection process.

In the meantime, here’s the list as of today:

  • Metadata:
    • Third-party dependencies
    • Types of extensions
    • Taxonomies
    • System requirements (OS, version, CPU, etc.)
    • Categorization
    • Versioning
  • Registry:
    • Identity, namespacing, and uniqueness
    • Distributed vs. centralized publishing
    • Binary packaging and distribution patterns
    • Federated distribution
    • Services and tools to improve or build
    • Stats, Reports, Badging: (stars, reviews, comments, build & test matrices, etc.)
  • Packaging:
    • Formats (e.g., tarball, OCI, RPM, wheel, etc.)
    • Include dynamic libs in binary packaging format? (precedent: Python wheel)
    • Build farming
    • Platforms, architectures, and OSes
    • Security, trust, and verification
  • Developer:
    • Extension developer tools
    • Improving the release process
    • Build pipelines: Supporting PGXS, pgrx, Rust, Go, Python, Ruby, Perl, and more
  • Community:
    • Community integration: identity, infrastructure, and support
    • How-Tos, tutorials, documentation for creating, maintaining, and distributing extensions
    • Docs/references for different types of extensions: CREATE EXTENSION, hooks, background workers, CLI apps/services, web apps, native apps, etc.
  • Core:
    • Second extension directory (a.k.a. variable installation location, search path)
    • Keeping all files in a single directory
    • Documentation standard
    • Inline extensions: UNITs, PACKAGEs, TLEs, etc.
    • Minimizing restarts
    • Namespacing
    • Sandboxing, code signing, security, trust
    • Dynamic module loading (e.g., use Thing in PL/Perl could try to load Thing.pm from a table of acceptable libraries maintained by the DBA)
    • Binary compatibility of minor releases and/or ABI stability

Is your favorite topic missing? Join us at the mini-summit or drop suggestions into the #extensions channel on the Postgres Slack.

PGXN Language Poll Result

Back on March 28, I asked the Postgres community whether new services for PGXN v2 should be written in Go, Rust, or “some of each”. I went so far as to create a poll, which ran through April 12. A month later you might reasonably be wondering what became of it. Has David been refusing to face reality and accept the results?

The answer is “no”. Or at least I don’t think so. Who among us really knows ourselves? The poll has provided the results since it closed, but I suspect few have looked. So here they are:

Candidate               Votes   % All Votes
🦀 Rust                   102         60.4%
🐿️ Go                      53         31.4%
🐿️ + 🦀 Some of each       13          7.7%

🦀 Rust is the clear winner.

I don’t know whether some Rust brigade descended upon the poll, but the truth is that the outcome was blindingly apparent within a day of posting the poll. So much so that I decided to get ahead of things and try writing a pgrx extension. I released jsonschema on PGXN on April 30. Turned out to be kind of fun, and the pgrx developers kindly answered all my questions and even made a new release to simplify integration testing, now included in the pgrx-build-test utility in the pgxn-tools Docker image.

But I digress. As a result of this poll and chatting with various holders of stakes at work and haunting the #extensions Slack channel, I plan to use Rust for all new PGXN projects — unless there is an overwhelmingly compelling reason to use something else for a specific use case.

Want to help? Rustaceans welcome! Check out the project plan or join us in the #extensions channel on the Postgres Slack.

Mini Summit Five

The video for Yurii Rashkovskii’s presentation at the fifth Postgres Extension Ecosystem Mini-Summit last week is up. Links:

Here’s my interpolation of YouTube’s auto-generated transcript, interspersed with chat activity.

Introduction

Presentation

  • Yurii: Today I’m going to be talking about universally buildable extensions. This is going to be a shorter presentation, but the point of it is to create some ideas, perhaps some takeaways, and actually provoke a conversation during the call. It would be really amazing to explore what others think, so without further ado…

  • I’m with Omnigres, where we’re building a lot of extensions. Often they push the envelope of what extensions are supposed to do. For example, one of our first extensions is an HTTP server that embeds a web server inside of Postgres. We had to do a lot of unconventional things. We have other extensions uniquely positioned to work both on developer machines and production machines — because we serve the developers and devops market.

  • The point of Omnigres is turning Postgres into an application runtime — or an application server — so we really care how extensions get adopted. When we think about application developers, they need to be able to use extensions while they’re developing, not just in production or on some remote server. They need extensions to work on their machine.

  • The thing is, not everybody is using Linux. Other people use macOS and Windows, and we have to account for that. There are many interesting problems associated with things like dependencies.

  • So there’s a very common approach used by those who try to orchestrate such setups and by some package managers: operating out of a container. The idea is that with a container you can create a stable environment where you bring all the dependencies that your extension would need, and you don’t have to deal with the physical reality of the host machine. Whether it’s a developer machine, CI machine, or production machine, you always have the same environment. That’s definitely a very nice property.

  • However, there are some interesting concerns that we have to be aware of when we operate out of a container. One is specifically mapping resources. When you have a container you have to map how many cores are going there, memory, how we map our volumes (especially on Docker Desktop), how we connect networking, how we pass environment variables.

  • That means whenever you’re running your application — especially locally, especially in development — you’re always interacting with that environment and you have to set it up. This is particularly problematic with Docker Desktop on macOS and Windows because these are not the same machines. You’re operating out of a virtual machine instead of your host machine, and obviously containers are Linux-specific, so it’s always Linux.

  • What we found is that often times it really makes a lot of sense to test extensions, especially those written in C, on multiple platforms. Because in certain cases bugs, especially critical memory-related bugs, don’t show up on one platform but show up on another. That’s a good way to catch pretty severe bugs.

  • There are also other interesting, more rare concerns. For example, you cannot access the host GPU through Docker Desktop on macOS or through Colima. If you’re building something that could use the host GPU and would otherwise work on that machine, it’s just not accessible. If you’re working on something ML-related, that can be an impediment.

  • This also makes me wonder: what are the other reasons why we’re using containers? One reason that stood out very prominently was that Postgres always has paths embedded during compile time. That makes it very difficult to ship extensions universally across different installations, different distributions. I wonder if that is one of the bigger reasons why we want to ship Postgres as a Docker container: so that we always have the same path regardless of where it’s running.

  • Any questions so far about Docker containers? Also if there’s anybody who is operating a Docker container setup — especially in their development environment — if you have any thoughts, anything to share: what are the primary reasons for you to use a Docker container in your development environment?

    • Jeremy S in chat: When you say it’s important to test on multiple platforms, do you mean in containers on multiple platforms, or directly on them?

    • Jeremy S in chat: That is - I’m curious if you’ve found issues, for example, with a container on Mac/windows that you wouldn’t have found with just container on linux

  • Daniele: Probably similarity with the production deployment environments. That’s one. Being free from whatever is installed on your laptop, because maybe I don’t feel like upgrading the system Python version and potentially breaking the entire Ubuntu, whereas in a Docker container you can have whatever version of Python, whatever version of NodeJS, or whatever other invasive type of service. I guess these are good reasons. These were the motivations that brought me to start developing directly in Docker instead of using the desktop.

  • Yurii: Especially when you go all the way to production, do you find container isolation useful to you?

  • Daniele: Yeah, I would say so; I think the problem is more to break isolation when you’re developing. So just use your editor on your desktop, reload the code, and have direct feedback in the container. So I guess you have to break one barrier or two to get there. At least from the privileged position of having Linux on the desktop there is a smoother path, because it’s not so radically different being in the container. Maybe for Windows and macOS developers it would be a different experience.

  • Yurii: Yeah, I actually wanted to drill down a little bit on this. In my experience, I build a lot on macOS, where you have to break through the isolation layers with the container itself and obviously the VM. I’ve found there are often subtle problems that make the experience way less straightforward.

  • One example I found is that, in certain cases, you’re trying to map a certain port into the container and you already have something running [on that port] on your host machine. Depending on how you map the port — whether you specify or don’t specify the address to bind on — you might not get Docker to complain that this port is actually overridden.

  • So it can be very frustrating to find the port: I’m trying to connect to it but it’s not connecting to the right port. There are just very small, intricate details like this, and sometimes I’ve experienced problems like files not perfectly synchronizing into the VM — although that has gotten a little better in the past 2–3 years — but there were definitely some issues. That’s particularly important for the workflows that we’re doing at Omnigres, where you’re running this entire system — not just the database but your back end. To be able to connect to what’s running inside of the container is paramount to the experience.

  • Daniele: Can I ask a question about the setup you describe? When you go towards production, are those containers designed to be orchestrated by Kubernetes? Or is there a different environment where you have your Docker containers in a local network, I assume, so different Docker microservices talking to each other? Are you agnostic about what you run it in, or do you run it on Kubernetes or on Docker Compose or some other form of glue that you set up yourself, or your company has set up?

    • Steven Miller in chat: … container on Mac/windows [versus linux]
    • Steven Miller in chat: Have seen with chip specific optimizations like avx512
  • Yurii: Some of our users are using Docker Compose to run everything together. However, I personally don’t use Docker containers. This is part of the reason why the topic of this presentation is about universally buildable extensions. I try to make sure that all the extensions are easily compilable and easily distributable on any given supported platform. But users do use Docker Compose, it’s quite common.

  • Does anyone else here have a preference for how to move Docker containers into production or a CI environment?

  • Nobody? I’ll move on then.

    • Steven Miller in chat: Since in docker will run under emulation, but on linux will run with real hardware, so the environment has different instruction set support even though the docker —platform config is the same

    • Jeremy S in chat: That makes sense

  • Yurii: I wanted to show just a little bit of a proof of concept tool that we’ve been working on, on and off for the last year—

  • David Wheeler (he/him): Yurii, there are a couple comments and questions in chat, I don’t know if you saw them.

  • Yurii: I didn’t see that sorry.

  • Jeremy is saying, “when you say it’s important to test on multiple platforms do you mean in containers on multiple platforms or directly on them?” In that particular instance I meant on multiple platforms, directly.

  • The other message from Jeremy was, “I’m curious if you found issues for example with a container on Mac or Windows that you wouldn’t have found with just container on Linux?” Yeah, I did see some issues depending on the type of memory-related bug. Depending on the system allocator, I was either hitting a problem or not. I was not hitting it on Linux, I believe, and it was hidden on macOS. I don’t remember the details right now, unfortunately, but that difference was indicative of a bug.

  • Steven wrote, trying to connect this… “Have [you] seen chip-specific optimizations for containers?” And, “Docker will run under emulation but on Linux will run with real hardware.” Yeah, that’s an interesting one about avx512. I suppose this relates to the commentary about GPU support, but this is obviously the other part of supporting specific hardware, chip-specific optimizations. That’s an interesting thing to learn; I was not aware of that! Thank you, Steven.

  • Let’s move on. postgres.pm is a proof of concept that I was working on for some time. The idea behind it was both ambitious but also kind of simple: Can we try describing Postgres extensions in such a way that they will be almost magically built on any supported platform?

  • The idea was to build an expert system of how to build things from a higher level definition. Here’s an example for pgvector:

    :- package(vector(Version), imports([git_tagged_revision_package(Version)])).
    git_repo("https://github.com/pgvector/pgvector").
    :- end_package.
    

    It’s really tiny! There are only two important things there: the Git tagged revision package and Git repo. There’s nothing else to describe the package.

  • The way this works is by inferring as much information as possible from what’s available. Because it’s specified as a Git-tagged revision package, it knows that it can download the list of version-shaped revisions — the versions — and it can checkout the code and do further inferences. It infers metadata from META.json if it’s available, so it will know the name of the package, the description, authors, license, and everything else included there.

    • David G. Johnston in chat: PG itself has install-check to verify that an installed instance is functioning. What are the conventions/methods that extension authors are using so that a deployed container can be tested at a low level of operation for the installed extensions prior to releasing the image to production?
  • It automatically infers the build system. For example for C extensions, if it sees that there’s a Makefile and C files, it infers that you need make and a C compiler and it tries to find those on the system: it will try to find cc, gcc, Clang — basically all kinds of things.

    • David Wheeler (he/him) in chat: Feel free to raise hands with questions

  • Here’s a slightly more involved example for pg_curl. Ah, there was a question from David Johnston. David says, “PG has install-check to verify that an installed instance is functioning. What are the conventions/methods that extension authors are using so the deployed container can be tested at a low level of operation for the installed extension prior to releasing the image to production?”

  • I guess the question is about general conventions for how extension authors ensure that the extensions work, but I suppose maybe part of this question is whether that’s also testable in a production environment. David, are you talking about the development environment alone or both?

  • David G. Johnston: Basically, the pre-release to production. You go in there in development and you cut up an extension and source and then you build your image where you compile it — you compile PG, you compile it, or you deploy packages. But now you have an image, but you’ve never actually tested that image. I can run installcheck on an installed instance of Postgres and know that it’s functioning, but it won’t test my extension. So if I install PostGIS, how do I test that it has been properly installed into my database prior to releasing that image into production?

    • Tobias Bussmann in chat: shouldn’t have the extension a make installcheck as well?
  • Yurii: To my knowledge there’s no absolutely universal method. Of course the PGXS methods are the most standard ones — like installcheck — to run the tests. In our [Omnigres’s] case, we replaced pg_regress with pg_yregress, another tool that we’ve developed. It allows for more structural tests and tests that test certain things that pg_regress cannot test because of the way it operates.

  • I can share more about this later if that’s of interest to anybody. So we basically always run pg_yregress on our extensions; it creates a new instance of Postgres — unless told to use a pre-existing instance — and it runs all the tests there as a client. It basically deploys the extension and runs the set of tests on it.

  • David G. Johnston: Okay.

  • Yurii: I guess, you know, it depends on how you ship it. For example, if you look at the pgrx camp, they have their own tooling for that as well. I’ve also seen open-source extensions where they could be written in, say, Rust, but still using pg_regress tests to test their behavior. That would often depend on how their build system is integrated in those tests. I guess the really short answer is there’s probably no absolutely universal method.

  • David, thank you for pasting the link to pg_yregress. If there are any questions about it, feel free to ask me. Any other thoughts or questions before I finish this slide? Alright, I’ll carry on then.

    :- package(pg_curl(Version), imports(git_explicit_revision_package(Version))).
    :- inherit(requires/1).
    git_repo("https://github.com/RekGRpth/pg_curl").
    git_revisions([
            '502217c': '2.1.1',
            % ... older versions omitted for now ...
        ]).
    requires(when(D := external_dependency(libcurl), version::match(D, '^7'))).
    :- end_package.
    
  • The difference between this example and the previous one is that here it specifies that there will be an explicit revision map because that project does not happen to have version tags, so they have to be done manually. You can see that in the Git revision specification. But what’s more interesting about this is that it specifies what kind of dependency it needs. In this particular instance it’s libcurl, and the version has to match version 7 — any version 7.

  • These kinds of requirements, as well as compiler dependencies, make dependencies, and others are always solved by pluggable satisfiers. They look at what’s available depending on the platform — Linux, a particular flavor of Linux, macOS, etc — and picks the right tools to see what’s available. In the future there’s a plan to add features like building these dependencies automatically, but right now it depends on the host system, but in a multi-platform way.

    • David Wheeler (he/him) in chat: How does it detect that libcurl is required?
  • The general idea behind this proof of concept is that we want to specify high-level requirements and not how exactly to satisfy them. If you compare this to a Dockerfile, the Dockerfile generally tells you exactly what to do step by step: let’s install this package and that package, let’s copy files, etc., so it becomes a very specific set of instructions.

    • Jeremy S in chat: And how does it handle something with different names in different places?
  • There was a question: “how does it detect that libcurl is required?” There is this line at the bottom that says “requires external dependency libcurl”, so that was the definition.

  • The other question was “how does it handle something with different names in different places?” I’m not sure I understand this question.

  • Jeremy S: I can be more specific. A dependency like libc is called libc on Debian platforms and it’s called glibc on Enterprise Linux. You talked about available satisfiers like Homebrew, Apt, and package config, but what if it has a different name in Homebrew than in Apt or something like that? Does it handle that or is that just something you haven’t tackled yet?

  • Yurii: It doesn’t tackle this right now, but it’s part of the vision of where it should go. For certain known libraries there’s an easy way to add a mapping that will kick in for a distribution, and otherwise it will be a satisfier for another one. They’re completely pluggable, small satisfiers looking at all the predicates that describe the system underneath.

    • David G. Johnston in chat: How is the upcoming move to meson in core influencing or impacting this?
  • Just as a point of reference, this is built on top of Prolog, so it’s like a knowledge base and rules for how to apply this knowledge to particular requirements.

    • Tobias Bussmann in chat: Prolog 👍

    • Shaun Thomas in chat: What if there are no satisfiers for the install? If something isn’t in your distro’s repo, how do you know where to find the dependency? And how is precedence handled? If two satisfiers will fulfill a requirement, will the highest version win?

  • Jeremy S: I remember Devrim talking about, if you read through the [RPM] spec files, what you find is all this spaghetti code with #ifdefs and logic branches, and in his case it’s just dealing with differences between Red Hat and SUSE. If this is something that we manually put in, we kind of end up in a similar position where it’s on us to create those mappings, it’s on us to maintain those mappings over time — we kind of own it — versus being able to automate some kind of automatic resolution. I don’t know if there is a good automatic way to do it. David had found something that he posted, which I looked at a little bit, but Devrim talked about how much of a maintenance overhead it becomes in the long run to constantly have to maintain this, which seemed less than ideal.

  • Yurii: It is less than ideal. For now, I do think that would have to be manual, which is less than ideal. But it could be addressed at least on a case-by-case basis. Because we don’t really have thousands of extensions yet — in the ecosystem maybe a thousand total — I think David Wheeler would know best from his observations, and I think he mentioned some numbers in his presentation a couple of weeks ago. But basically handling this on a case-by-case basis where we need this dependency and apparently it’s a different one on a different platform, so let’s address that. But if there can be a method that can at least get us to a certain level of unambiguous resolution automatically or semi-automatically, that would be really great.

    • Samay Sharma in chat: +1 on the meson question.
  • Jeremy S: I think there’s a few more questions in the chat.

  • Yurii: I’m just looking at them now. “How is the upcoming move to meson in core influencing or impacting this?” I don’t think it’s influencing this particular part in any way that I can think of right now. David, do you have thoughts on how it can? I would love to learn.

  • David G. Johnston: No, I literally just started up a new machine yesterday and decided to build it with meson instead of make, and the syntax of the meson file seems similar to this. I’m just curious if there are any influences there or if it’s just happenstance.

  • Yurii: Well, from what I can think of right now, there’s just general reliance on either an implicitly found pg_config or an explicitly specified pg_config. That’s just how you discover Postgres itself. There’s no relation to how Postgres itself was built. The packaging system does not handle, say, building Postgres itself or providing it, so it’s external to this proof of concept.

  • David G. Johnston: That’s a good separation of concerns, but there’s also the idea that, if core is doing something, and we’re going to build extensions against PostgreSQL, then if we’re doing things similar to how core is doing them, there’s less of a learning curve and less of everyone doing their own thing and you have 500 different ways of doing testing.

  • Yurii: That’s a good point. That’s something definitely to reflect on.

  • I’ll move on to the next question, from Shaun. “What if there are no satisfiers for the install? If something isn’t in your distro how do you know where to find the dependency?” And “if two satisfiers will fulfill a requirement, will the highest version win?” If there are no satisfiers, right now it will just say it’s not solvable, so we fail to do anything. You would have to go and figure that out. It is a proof of concept; it’s not meant to be absolutely feature complete, but rather an exploration of how we can describe the packages and their requirements.

  • David Wheeler (he/him): I assume the idea is that, as you come upon these you would add more satisfiers.

  • Yurii: Right, you basically just learn. We learn about this particular need in a particular extension and develop a satisfier for it. The same applies to precedence: it’s a question of further evolution. Right now it just finds whatever is available within the specified range.

  • If there are no more pressing questions I’ll move to the next slide. I was just mentioning the problem of highly specific recipes versus high-level requirements. Now I want to shift attention to another topic that has been coming up in different conversations: whether to build and ship your extension against minor versions of Postgres.

  • Different people have different stances in this, and even package managers take different stands on it. Some say, just build against the latest major version of Postgres and others say build extensions against every single minor version. I wanted to research and see what the real answer should be: should we build against minor versions or not?

  • I’ve done a little bit of experimentation and my answer is “perhaps”, and maybe even “test against different minor versions.” In my exploration of version 16 (and also 15, but I didn’t include it) there are multiple changes between minor versions that can potentially be dangerous. One great example is when you have a new field inserted in the middle of a structure that is available through a header file. That definitely changes the layout of the structure.

     typedef struct BTScanOpaqueData
     {
    -    /* these fields are set by _bt_preprocess_keys(): */
    +    /* all fields (except arrayStarted) are set by _bt_preprocess_keys(): */
         bool        qual_ok;        /* false if qual can never be satisfied */
    +    bool        arrayStarted;   /* Started array keys, but have yet to "reach
    +                                 * past the end" of all arrays? */
         int         numberOfKeys;   /* number of preprocessed scan keys */
     }
    
  • In this particular case, for example, you will not get the number of keys you’re intending to. I think that change was from 16.0 to 16.1. If you build against 16.0 and then try to run on 16.1, it might not be great.

    The other concern that I found is that there are new APIs appearing in header files between different versions. Some of them are implemented in header files, either as macros or static inline functions. When you’re building against that particular version, you’ll get that particular implementation embedded.

  • Others are exports of symbols, like in this case try_index_open and contain_mutable_functions_after_planning, if you’re using any of these. But this means that these symbols are not available on some minor versions and they’re available later on, or vice versa: they may theoretically disappear.

  • There are also changes in inline behavior. There was a change between 16.0 and 16.1 or 16.2 where an algorithm was changed. Instead of just > 0 there’s now >= 0, and that means that particular behavior will be completely different between these implementations. This is important because it’s coming from a header file, not a source file, so you’re embedding this into your extension.

  • David Wheeler (he/him) in chat: That looks like a bug fix

  • Yeah, it is a bug fix. But what I’m saying is, if you build your extension against, say, 16.0, which did not have this bug fix, and then you deploy it on 16.1, then you still have the bug because it’s coming from the header file.
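
    A minimal sketch of the mechanics, using a hypothetical have_more_keys helper rather than actual Postgres source: because the body of a static inline function lives in the header, each binary bakes in whichever version it was compiled against, and a later header fix only reaches an extension when that extension is rebuilt.

    /* Hypothetical illustration; not actual Postgres code. */
    #include <stdbool.h>
    #include <stdio.h>

    /* What a 16.0-era header might ship. */
    static inline bool
    have_more_keys_16_0(int numberOfKeys, int consumed)
    {
        return numberOfKeys - consumed > 0;     /* stricter comparison */
    }

    /* What a 16.1-era header might ship after a one-character fix. */
    static inline bool
    have_more_keys_16_1(int numberOfKeys, int consumed)
    {
        return numberOfKeys - consumed >= 0;    /* relaxed comparison */
    }

    int
    main(void)
    {
        /* An extension built against 16.0 keeps the old behavior on a
         * 16.1 server until it is recompiled against the new header. */
        printf("built against 16.0: %d\n", have_more_keys_16_0(3, 3));
        printf("built against 16.1: %d\n", have_more_keys_16_1(3, 3));
        return 0;
    }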

  • David Wheeler (he/him): Presumably they suggest that you build from the latest minor release and that’s backward compatible to the earlier releases.

  • Yurii: Right, and that’s a good middle ground for this particular case. But of course, sometimes when you do a minor upgrade, you have to remember that you have to rebuild your extensions against that minor version; you can’t just easily carry them over, yeah.

    • Jeremy S in chat: The struct change in a minor is very interesting
  • David Wheeler (he/him): Jeremy points out that the struct change is pretty interesting.

  • Yurii: Yeah, it’s interesting because it’s super dangerous! Like if somebody is expecting a different versioned structure, then it can be pretty nasty.

    • Shaun Thomas in chat: Yeah. It’s a huge no-no to insert components into the middle of a struct.
  • Jeremy S: Is that common? I’m really surprised to see that in a minor version. On the other hand, I don’t know that Postgres makes promises about — some of this seems to come down to, when you’re coding in C and you’re coding directly against structures in Postgres, that’s really interesting. That’s — I’m surprised to see that still.

    • Steven Miller in chat: In the case of trunk, we would have built against minor versions in the past then upgrade the minor version of postgres without reinstalling the binary source of the extension, so this is an issue

    • David G. Johnston in chat: Yeah, either that isn’t a public structure and someone is violating visibility (in which case yes, you should be tracking minor builds)

    • Shaun Thomas in chat: I’m extremely shocked that showed up in 16.2.

  • Yurii: Yeah, I didn’t expect that either, because that’s just a great way to have absolutely undefined behavior. Like if somebody forgot to rebuild their extension against a new minor, then this can be pretty terrible.

  • But my general answer to all of this: unless you’re going really deep into the guts of Postgres, unless you’re doing something very deep in terms of query planning or query execution, you’re probably okay? But who knows.

    • Jason Petersen in chat: yeah it feels like there’s no stated ABI guarantee across minor updates

    • Jason Petersen in chat: other than “maybe we assume people know not to do this"

    • David Christensen in chat: yeah ABI break in minor versions seems nasty

  • Jeremy S: But it’s not just remembering to rebuild your extension. Let’s suppose somebody is just downloading their extensions from the PGDG repo, because there’s a bunch of them there. They’re not compiling anything! They’re downloading an RPM, and the extension might be in a different RPM from Postgres, and the extension RPMs — I don’t know that there have been any cases with any of the extensions in PGDG, so far, where a particular extension RPM had to have compatibility information at the level of minors.

    • Shaun Thomas in chat: There was actually a huge uproar about this a couple year ago because they broke the replication ABI by doing this.

    • David G. Johnston in chat: I see many discussions about ABI stability on -hackers so it is a goal.

    • Steven Miller in chat: PGDG is the same binaries for each minor version because the postgres package is only major version, right?

  • Yurii: Yeah, that’s definitely a concern, especially when it comes to the scenario where you don’t rebuild your extensions but just get pre-built packages. It’s starting to leak out of the scope of this presentation, but I thought it was a very interesting topic to bring to everybody’s attention.

    • Jason Petersen in chat: “it’s discussed on hackers” isn’t quite the same as “there’s a COMPATIBILITY file in the repo that states a guarantee”

    • Jason Petersen in chat: (sorry)

  • My last item: going back to how we ship extensions and why we need complex build systems and packaging. Oftentimes you want your extensions to depend on some library, say OpenSSL or SQLite or whatever, and the default is to bring the shared dependency that would come from different packages on different systems.

  • What we have found at Omnigres is that it is increasingly simpler to statically link with your dependencies — and pay the price of larger libraries — but then you have no questions about where it comes from: what package, which version – you know exactly which version it is and how it’s getting built. But of course you also have a problem where, if you want to change the version of the dependency, it’s harder because it’s statically linked. The question is whether you should be doing that or not, depending on the authors of the extension and their promises for compatibility with particular versions of their dependencies. This one is kind of naive and simple, as in: just use static. Sometimes it’s not possible or very difficult to do so; some libraries don’t have build systems amenable to static library production.

  • What we found that works pretty nicely is using rpath in your dynamic libraries. You can use special variables — $ORIGIN or @loader_path on Linux or macOS, respectively — to specify that your dependency is literally in the same folder or directory where your extension is. So you can ship your extension with the dependencies alongside, and it will not try to load them immediately from your system but from the same directory. We find this pretty useful.
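
    A minimal sketch of this rpath approach, with hypothetical names throughout (my_ext, libfoo) and the relevant build commands shown in the comment; it illustrates the technique described above rather than any Omnigres tooling:

    /*
     * Hypothetical extension module that bundles its dependency, libfoo,
     * in the same directory and lets the dynamic loader find it there.
     *
     * Linux:  cc -fPIC -shared  -o my_ext.so    my_ext.c -L. -lfoo -Wl,-rpath,'$ORIGIN'
     * macOS:  cc -dynamiclib    -o my_ext.dylib my_ext.c -L. -lfoo -Wl,-rpath,@loader_path
     *
     * Ship libfoo.so or libfoo.dylib alongside my_ext.*, and the loader
     * resolves the dependency relative to the extension module itself
     * rather than a system-wide library path. (On macOS the bundled
     * library's install name also needs to be @rpath/libfoo.dylib for
     * the rpath lookup to apply.)
     */
    int
    my_ext_version(void)
    {
        return 1;   /* placeholder export so the module has a symbol */
    }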

  • That’s pretty much it. Just to recap: I talked about the multi-platform experience, the pros and cons of containers, inferencing how you build, how you can build extensions with dependencies, static and rpath dependencies, and the problems with PG minor version differences. If anybody has thoughts, questions, or comments, I think that would be great. Thank you.

Discussion

  • David Wheeler (he/him): Thank you, Yurii, already some good discussion. What else do you all have?

  • David G. Johnston: PG doesn’t use semantic versioning. We have a major version and a minor version. The minor versions are new releases; they do change behaviors. There are goals from the hackers to not break things to the extent possible, but they don’t guarantee that this will not change between dot-three and dot-four. When you’re releasing once a year, that’s not practical: if things are broken, you can’t wait nine months to fix something. Some things you need to fix in the next update and back-patch.

    • Steven Miller in chat: Thank you, this is very useful info

    • Jeremy S in chat: Dependency management is hard 🙂 it’s been a topic here for awhile

  • David G. Johnston: So we don’t have a compatibility file, but we do have goals and if they get broken there’s either a reason for it or someone just missed it. From an extension standpoint, if you want to be absolutely safe but absolutely cost intensive, you want to update every minor release: compile, test, etc. Depending on what your extension is, you can trade off some of that risk for cost savings. That’s just going to be a personal call. The systems that we build should make it easy enough to do releases every “dot” and back-patching. Then the real cost is do you spend the time testing and coding against it to make sure that the stuff works. So our tool should assume releasing extensions on every minor release, not every major release, because that’s the ideal.

    • Shaun Thomas in chat: It’s good we’re doing all of this though. It would suck to do so much work and just become another pip spaghetti.
  • Yurii: That’s exactly what I wanted to bring to everybody’s attention, because there’s still a lot of conversations about this and there was not enough clarity. So that helps a lot.

  • Jeremy S: Did you say release or did you say build with every minor? I think I would use the word “build”.

  • David G. Johnston: Every minor release, the ones that go out to the public. I mean, every commit you could update your extension if you wanted, but really the ones that matter are the ones that go public. So, 16.3 or 16.4 comes out; automation would ideally build your extension against it, run your tests, and see if anything broke. And then deploy the new [???] of your extension against version 16.3. Plus that would be your release.

  • Jeremy S: I think there are two things there. There’s rebuilding it — because you can rebuild the same version of the extension, and that would pick up if they added a field in the middle of a struct, which is what happened between 16.0 and 16.1: rebuild the same version. Versus: the extension author … what would they be doing? They could tag a new version, but they’re not actually changing any code; I don’t think it is a new release of the extension, because you’re not even changing anything in the extension, you’re just running a new build. It’s just a rebuild.

    • David Wheeler (he/him) in chat: It’d be a new binary release of the same version. In RPM it goes from v1.0.1-1 to v1.0.1-2

    It reminds me of what Alvaro did in his OCI blog post, where he said you really have to … Many of us don’t understand how tightly coupled the extensions need to be to the database. And these C extensions that we’re building have risks when we separate them and don’t just build everything together.

  • David G. Johnston: The change there would be metadata. Version four of my extension, I know it works on 16.0 to 16.1. 16.2 broke it, so that’s where it ends and my version 4.1 is known to work on 16.2.

  • Jeremy S: But there is no difference between version 4 and version 4.1. There’s a difference in the build artifact that your build farm spit out, but there’s no difference in the extension, right?

    • Keith Fiske in chat: Still confusing if you don’t bump the release version even with only a library change

    • Keith Fiske in chat: How are people supposed to know what library version is running?

  • David G. Johnston: Right. If the extension still works, then your metadata would just say, “not only do I work through version 16.2, I now work through 16.3.”

  • Jeremy S: But it goes back to the question: is the version referring to a build artifact, or is the version referring to a version of the code? I typically think of versions as a user of something: a version is the thing. It would be the code of the extension. Now we’re getting all meta; I guess there are arguments to be made both ways on that.

    • Jason Petersen in chat: (it’s system-specific)

    • Jason Petersen in chat: no one talks in full version numbers, look at an actual debian apt-cache output

  • David Wheeler (he/him): Other questions? Anybody familiar with the rpath stuff? That seems pretty interesting to me as a potential solution for bundling all the parts of an extension in a single directory — as opposed to what we have now, where it’s scattered around four different directories.

  • Jason Petersen: I’ve played around with this. I think I was trying to do fault injection, but it was some dynamically loaded library at a different point on the rpath. I’m kind of familiar with the mechanics of it.

    I just wanted to ask: In a bigger picture, this talks about building extensions that sort of work everywhere. But the problems being solved are just the duplication across the spec files, the Debian files, etc. You still have to build a different artifact for even the same extension on the same version of Postgres on two different versions of Ubuntu, Right? Am I missing something? It is not an extension that runs everywhere.

  • Yurii: No, you still have to build against the set of attributes that constitute your target, whether that’s architecture, operating system, flavor. It’s not yet something you can build and just have one binary. I would love to have that, actually! I’ve been pondering a lot about this. There’s an interesting project, not really related to plugins, but if you’ve seen A.P.E. and Cosmopolitan libc, they do portable executables. It’s a very interesting hack that allows you to run binaries on any operating system.

  • Jason Petersen: I expected that to be kind of “pie in the sky.”

  • Yurii: It’s more of a work of art.

  • Jason Petersen: Do you know of other prior art for the rpath? Someone on Mastodon the other day was talking about Ruby — I can’t remember the library, maybe it was ssh — and they were asking, “Do I still have to install this dynamic library?” And they said, “No, we vendor that now; whenever you install this it gets installed within the Ruby structure.” I’m not sure what they’re doing; maybe it’s just static linking. But I was curious if you were aware of any prior art or other packaging systems where the system manages its own dynamic libraries and uses rpath to override the loading of them, so we don’t use the system ones and don’t have to conflict with them. Because I think that’s a really good idea! I just was wondering if there’s any sort of prior art.

  • Daniele: There is an example: Python wheel binaries use rpath. A wheel is a ZIP file with the C extension and all the dependent libraries, with the paths modified so that they can refer to each other in the environment where they’re bundled. There is a toolchain to obtain this packaging — this vendoring — of the system libraries. There are three, actually: one for Unix, one for macOS, one for Windows. But they all more or less achieve the same goal of having libraries where they can find each other in the same directory or in a known directory. So you could take a look at the wheel specification for Python and the implementation. That could be a guideline.

  • Jason Petersen: Cool.

  • Yurii: That’s an excellent reference, thank you.

  • David Wheeler (he/him): More questions?

  • Jeremy S: Yeah, I have one more. Yurii, the build inferencing was really interesting. A couple things stood out to me. One that you mentioned was that you look for the META.json file. That’s kind of neat, just that it’s acknowledged as a useful thing; and a lot of extensions have it and we want to make use of it. I think everybody knows part of the background of this whole series of meetings is — one of the things we’re asking is, how can we improve what’s the next generation of META.json to make all of this better? Maybe I missed this, but what was your high-level takeaway from that whole experience of trying to infer the stuff that wasn’t there, or infer enough information to build something if there isn’t a META.json at all? Do you feel like it worked, that it was successful? That it was an interesting experiment but not really viable long term? How many different extensions did you try, and did it work for them? Once you put it together, were you ever able to point it at a brand new extension you’d never seen before and actually have it work? Or was it still where you’d try a new extension and have to add a little bit of extra logic to handle that new extension? What’s your takeaway from that experience?

  • Yurii: The building part is largely unrelated to META.json; that was primarily just the metadata itself. I haven’t used it on a lot of extensions because I was looking for different cases — extensions that exhibit slightly different patterns — not a whole ton of them yet. I would say that, so far, this is more of a case-by-case scenario to see, for a particular type or shape of extension, what we need to do. But generally, what I found so far is that it works pretty nicely for C extensions: it just picks up where all the stuff is, downloads all the necessary versions, and lets you discover new versions — for example, you don’t need to update the specification for a package if you have a new release; it will just automatically pick that up from the list of tags. These were the current findings. I think overall the direction is promising; we just need to continue adjusting the results and see how much further it can be taken and how much more benefit it can bring.

  • Jeremy S: Thank you.

  • Yurii: Any other comments or thoughts?

  • David Wheeler (he/him): Any more questions for Yurii?

  • David Wheeler (he/him): I think this is an interesting space for some research, between Devrim’s presentation talking about how much effort it is to manually maintain all the extensions in the Yum repository. I’ve been doing some experiments trying to build everything from PGXN, and the success rate is much lower than I’d like. I think there are some interesting challenges to automatically figuring out how things work versus convincing authors to specify in advance.

  • Jeremy S: Yep. Or taking on that maintenance. Kind of like what a spec file maintainer or a Debian package maintainer is doing.

  • Yurii: Yeah, precisely.

Wrap Up

  • David Wheeler (he/him): Thanks, Yurii, for that. I wanted to remind everyone that we have our final Mini-Summit before PGConf on May 15th. That’s two weeks from today at noon Eastern or 4 pm UTC. We’re going to talk about organizing the topics for the Summit itself. I posted a long list of stuff that I’ve extracted from my own brain and lots more topics that I’ve learned in these presentations in the Slack. Please join the community Slack to participate.

    The idea is to winnow down the list to a reasonable size. We’re already full with about 45 attendees, and we can maybe have a few more with standing room and some hallway track stuff. We’ll figure that out, but it’s a pretty good size, so I think we’ll be able to take on a good six or maybe eight topics. I’m going to go over them all and we’ll talk about them and try to make some decisions in advance, so when we get there we don’t have to spend the first hour figuring out what we want to do; we can just dive in.

    And that’s it. Thank you everybody for coming, I really appreciate it. We’ll see you next time!

    • Tobias Bussmann in chat: Thanks for the insights and discussion!

    • Jeremy S: Thank you!