Static search trees: faster than binary search

skrebbel · 2025-01-01T13:15:00 1735737300

Totally off topic, but I've started to notice that more and more algorithmic low-ish level content assumes Rust by default. My entire life I've been used to anything algorithmic being either written in sciency pseudocode (invariably in latex-generated PDFs), or plain old C(++), the vulgar latin of computer programming. I know C and I don't really know Rust but nevertheless I love that this is changing! I'd never advise a new programmer interested in low-level/fast stuff to begin with C now, so much good content is written around Rust. Like the old C examples, the post doesn't even say "this is Rust". You're expected to expect it to be Rust.

And, well the language choice doesn't really matter for the core subject here. I could follow this post fine even though I don't really know Rust (just like I assume a Rust programmer could follow a well written algorithms deep dive with examples in C with a bit of dedication). EDIT okok that was a lie, I could follow the first half of this post, ish, and then it got way too deep for me.

But anyway, standardize on Rust, I like it. I'd love if a bit more Zig was thrown into the mix here but either of them feel like a better default than C at this point to me. After decades of C being The Standard for this stuff, I love that this is finally changing.

uecker · 2025-01-01T13:25:59 1735737959

This assumes that Rust is actually a better language. IMHO it isn't.

p2detar · 2025-01-01T13:55:34 1735739734

I’m still unsure of how to feel about Rust. But if it has made it into the Linux Kernel, then I assume adoption is ongoing.

Still, Zig seems somewhat more intuitive to me, even though I’ve used it as little as Rust thus far.

A nice write up on the topic Zig and Rust: https://matklad.github.io/2023/03/26/zig-and-rust.html

edit: typos

deaddodo · 2025-01-01T19:53:37 1735761217

Generally Zig is going to be more intuitive for C developers (people who prefer more low-level and procedural style coding), while Rust would be more intuitive/conducive to C++ developers (those who prefer more abstractions and language level features/guarantees).

Which is totally fine, there's a reason C++ never completely subsumed C; both are valid avenues.

jltsiren · 2025-01-01T15:27:08 1735745228

I think it's almost irrelevant how good Rust is as a language. Rust is winning in many contexts, because you can use it to replace C++, but the standard library and tools are substantially better.

nostradumbasp · 2025-01-01T19:52:30 1735761150

The entire user experience is so much nicer that even if Rust was slightly worse I would prefer it. I don't miss header files, makefiles, not having a package manager, maintaining all the crazy code people wrote themselves as pet projects instead of using an existing project in an ecosystem. Thankfully its not worse, so I don't even have to weigh the odds.

amelius · 2025-01-01T23:23:20 1735773800

Without makefiles, how do you do projects with multiple languages or code generators like lex and yacc?

eslaught · 2025-01-02T01:21:35 1735780895

If you're embedding, say, a C library inside of Rust, you can do so via a so-called "build.rs" script [1]. These run at build time so you can do pretty much whatever you want. They're not necessarily pretty, but given that Cargo isn't going to support other languages natively, it's often a less-evil option.

Then there are build systems that natively support Rust in addition to other languages and know how to plug them together [2] [3] [4]. If you're already using one of these existing tools, then you don't need to roll your own, you can plug it into your existing project.

Of course if you really are using Makefiles in C you're already living in the wild west, so it's on you to figure out how to add Rust to your projects in that case (or switch to a better build system).

[1]: https://doc.rust-lang.org/cargo/reference/build-scripts.html

[2]: https://bazelbuild.github.io/rules_rust/

[3]: https://buck.build/rule/rust_library.html

[4]: https://mesonbuild.com/Rust.html

nostradumbasp · 2025-01-02T00:39:30 1735778370

Well, I don't use makefiles to deploy software with Rust. I also have never used lex or yacc, but I bet there are similar tools in the ecosystem, or wrappers for those. That would obviate what I will offer below.

Often a new language in a project would define an application boundary. So those would be different containers or services. I may deploy via container images, or an OS specific installer, etc. If we aren't crossing an application boundary I may use FFI. Sometimes I use https://rust-lang.github.io/rust-bindgen/ to smooth that over for C dependencies. There is also a nice concept called a build.rs file: https://doc.rust-lang.org/cargo/reference/build-script-examp.... There's also tools like: https://github.com/casey/just and https://sagiegurari.github.io/cargo-make/

I rarely use multiple languages with Rust. A lot of interpreted languages have bindings through crates and can go in to a project through Cargo (python, etc). If it involves JS/TS on desktop, I'm usually using Tauri for that. Guess it depends on the system?

Hopefully that helps. You can also still use a Makefile if you want I just haven't dealt with one in a long time.

richrichie · 2025-01-02T07:02:32 1735801352

Rust is great for pet projects. Otherwise, it is a big security risk for large commercial projects, since it lacks a useful standard library and any non trivial program needs to download a thousand crates to work.

alfiedotwtf · 2025-01-02T12:49:45 1735822185

What languages are you thinking of that have all the libraries you could ever possibly need for a commercial project built-in?

richrichie · 2025-01-02T14:08:34 1735826914

Go and C++ have pretty decent standard libraries, for example.

alfiedotwtf · 2025-01-03T14:50:54 1735915854

Apart from `simd`, and if you allow us to embelish by using Tokio and I'll ignore the C++'s reliance on Boost, there's not really much I can see from C++'s standard library that's anyhow wildly different from Rust.

What in particular do you see lacking in Rust that you can do in C++?

uecker · 2025-01-03T21:05:20 1735938320

Also from legal and liability point of view it opens a can of worms.

tomcam · 2025-01-01T22:02:52 1735768972

Wait isn't cargo a package manager? Not a rust expect so I'm probably missing something

Tanjreeve · 2025-01-01T22:47:30 1735771650

I think their point was that Rust has a batteries included package manager (cargo). Another advantage of which being that all rust projects are pretty standard in that particular aspect.

nostradumbasp · 2025-01-02T00:16:14 1735776974

That is correct.

110jawefopiwa · 2025-01-01T15:37:30 1735745850

I think the main problem is that Rust doesn't allow for basic data structures like graphs or doubly-linked lists without extra effort, which is why I've generally always preferred pseudocode.

emilsayahi · 2025-01-01T20:18:58 1735762738

I often hear this and am confused; not only are things like 'object soup'[0] possible and straightforward (putting things in collections and referring to them by indices), I never concretely hear why a graph or doubly-linked list becomes uniquely difficult to implement in Rust (and would genuinely be curious to learn why you feel this way). If you needed such data structures anyway, they're either in the standard library or in the many libraries ('crates' in Rust-lingo) available on Rust's package registry[1]---using dependencies in Rust is very straightforward & easy.

[0]: https://jacko.io/object_soup.html

[1]: https://crates.io/

amelius · 2025-01-01T23:26:36 1735773996

Object soup is like reinventing the heap, badly.

oconnor663 · 2025-01-03T05:37:39 1735882659

I have two different reactions to this:

- From the perspective of e.g. C++, I don't think it's reinventing the heap. We often prefer to use indexes into a std::vector or similar, even when putting everything in std::shared_ptr is an option, because the dense layout of the vector gives us the blazing fast cache performance we crave. It also avoids memory leaks from reference cycles.

- From the perspective of e.g. Python, yeah it's reinventing the heap. Unless you need to serialize the whole collection into JSON or something, it's much less convenient. But (almost) no one uses Rust instead of Python just for the convenience. (There are dozens of us! Dozens!)

MobiusHorizons · 2025-01-02T02:56:13 1735786573

One of the reasons to reach for a low level language is to have control over tradeoffs specific to your use case instead of using some pre-packaged option. Data structures are a very good example. Generic data structures like those in the standard library will likely suffice for many use cases, but a low level language must give the user the control to write their own. It’s just table stakes for this type of language.

rcxdude · 2025-01-02T09:19:19 1735809559

Thankfully, rust does give you that level of control, you can write your own doubly-linked list implementation if you want, and you can pick your safety-overhead tradeoff.

itishappy · 2025-01-02T15:29:31 1735831771

The ownership and model makes working with pointers a lot more complicated, which is the core of a good graph or doubly-linked list datatype.

> Learn Rust With Entirely Too Many Linked Lists

> In this series I will teach you basic and advanced Rust programming entirely by having you implement 6 linked lists. In doing so, you should learn:

* The following pointer types: &, &mut, Box, Rc, Arc, *const, *mut, NonNull(?)

* Ownership, borrowing, inherited mutability, interior mutability, Copy

* All The Keywords: struct, enum, fn, pub, impl, use, ...

* Pattern matching, generics, destructors

* Testing, installing new toolchains, using miri

* Unsafe Rust: raw pointers, aliasing, stacked borrows, UnsafeCell, variance

https://rust-unofficial.github.io/too-many-lists/

papichulo2023 · 2025-01-02T07:32:11 1735803131

Not sure about Rust std. But on cpp I heard a lot low-level companies avoid using its standard library since it is too bloated and optimized for too many cases.

fpoling · 2025-01-01T18:27:01 1735756021

Well, one can always emulate that in Rust with array indexes. And it will be closer how the actual CPU works where a pointer is just an offset into a memory chunk.

Perhaps this is the reason performance oriented articles prefer Rust. In allows to skip a lot of memory allocation while still catching memory-safety bugs, which is a hard problem with C++.

szundi · 2025-01-01T20:35:10 1735763710

How could something to implement a pointer be more straightforward than a pointer? People seem to like Rust in excessive, sometimes botheringly excessive proportions.

oconnor663 · 2025-01-03T04:57:33 1735880253

I don't think the indexing strategy is more straightforward in general. It's a thing you have to learn, and the fact that you have to learn it is part of Rust's famously steep learning curve. That said, there are some common cases where you might wish you'd used indexes instead of pointers in other languages too:

- Working with resizeable collections like std::vector in any non-GC language, where resizing invalidates pointers.

- Serialization. If your objects contain indexes to each other, it's easy to turn them into e.g. JSON. But if they contain pointers to each other, it's tricky.

amelius · 2025-01-01T23:42:55 1735774975

How are you going to garbage collect inside your array? Sounds like you are just evading the problem and reinventing the heap.

burntsushi · 2025-01-02T03:26:00 1735788360

Not easily. But not all use cases require it. The regex crate makes heavy use of this pattern for finite machine states, for example. But this fits nicely with an arena allocation pattern, so everything can just be dropped at once.

Despite this, it's one of the fastest regex engines around: https://github.com/BurntSushi/rebar#summary-of-search-time-b...

Here's a related but more isolated analysis: https://github.com/burntsushi/rsc-regexp

amelius · 2025-01-02T15:27:28 1735831648

You could probably make an even faster regex by JIT-compiling the resulting automaton.

burntsushi · 2025-01-02T16:23:40 1735835020

The benchmarks I linked include multiple regex engines with JITs.

oconnor663 · 2025-01-03T05:47:34 1735883254

I'm not sure whether this is exactly what you mean, but when you need to support deletions you can switch to a "generational" collection like Slab or SlotMap. Your indexes are still integers, but under the covers some of the bits get repurposed to disambiguate new elements that reuse old slots.

thayne · 2025-01-01T22:54:00 1735772040

It doesn't though.

Those are fairly straightforward to implement if you are ok using a reference counted type (along with weak references), although that will hurt performance. That would be similar to using a shared_pointer in c++.

And you can implement code similar to what you would in c or c++ if you use raw pointers and unsafe code.

Where you run into difficulties is if you want something that is fast and safe. And then you need a more complicated design that probably involves using indices into a Vec instead of pointers.

amelius · 2025-01-01T23:25:12 1735773912

Using a Vec for doubly linked lists sounds like overkill. There's got to be more efficient and more elegant way.

thayne · 2025-01-02T09:02:18 1735808538

Using a Vec actually has some benefits beyond the safety guarantees from rust. There are fewer allocations, and you probably are more likely to get nearby nodes on the same cache line.

unrealhoang · 2025-01-02T00:08:17 1735776497

Yes, there is, just use plain old pointers. Who cares about UB anyway.

norman784 · 2025-01-01T13:53:17 1735739597

I would argue that in some sense it is, just because of the enforced safety and correctness.

uecker · 2025-01-01T22:13:00 1735769580

It does neither enforce correctness nor safety. It enforces memory safety in programs that stick to the safe subset. But don't get me wrong, Rust certainly has many good ideas and the focus on memory safety is good. I still do not think it is a great language.

peoplefromibiza · 2025-01-01T14:07:44 1735740464

So it's Ada since 1980, before C++ was even created.

sroussey · 2025-01-01T16:05:25 1735747525

I wrote my first optimizing compiler for Ada in C++ around 1990.

matt3210 · 2025-01-02T05:22:38 1735795358

The problem with rust is that it’s has lots of features but isn’t c like. Python doesn’t have much syntax so it’s fine to be not c like but rust has 100% new syntax and a lot of syntax with respect to c. I can pick up c like languages easy but rust just makes me mad to look at.

curiouscoding · 2025-01-01T15:36:40 1735745800

Anything in particular that threw you off? I'd be happy to add a few words to briefly explain some of the less intuitive rust syntax.

skrebbel · 2025-01-01T15:47:40 1735746460

Nah just the final bits, stuff like `Simd::<u32, 8>::splat(q)`, I'm not sure what splatting is or how Rust's Simd library works or, admittedly, how SIMD works in detail at all. So eg I'm not sure what that 8 is doing there in the type parameters, details like that. Maybe this isn't a Rust thing but a SIMD thing, btw, I don't know much Rust but I also never wrote any SIMD code ever. I don't know how the article could be clearer, IMO once you go into the deep nitty gritty optimization stuff you just gotta assume the reader knows the tools you're using. I'm OK with dropping out halfway cause the core idea is there even before you squeeze out all the perf.

burntsushi · 2025-01-01T16:13:13 1735747993

`Simd::<u32, 8>` is describing a vector with 8 lanes, where the width of each lane is 32-bits. For targets that support it, that should get you a 256-bit register.

The `splat(q)` constructor is just saying, "populate each lane with `q`." That is, the value is "splatted" across every lane in the vector.

The main idea of SIMD is simple: you represent multiple points of data in the same register and use specialized CPU instructions to operate on those points simultaneously. The difficulty of SIMD is coming up with clever algorithms that use the available instructions in an efficient way.

skrebbel · 2025-01-01T16:49:05 1735750145

Ahh right, I've been doing too much TypeScript lately. I thought a type parameter couldn't impact behavior but only typechecking but clearly in Rust (like in C++) it can. Thanks for the explanation!

Ah so splat is like a memset for the entire vector to the same value. OK makes sense, I bet that wasn't actually a Rust-ism at all, just basic SIMD lingo I didn't know. Thanks again!

shawn_w · 2025-01-02T04:30:58 1735792258

It's the Intel intrinsic `_mm256_set1_epi32()` except with an amusing name. Got to give Rust the win here.

skrebbel · 2025-01-02T20:08:09 1735848489

wow yeah, i way prefer Rust's notation there! but i assume you could make a similar zero-cost abstraction with some mildly crazy C++ template programming.

bennythomsson · 2025-01-01T17:03:11 1735750991

Nice to see other real experts (and a bit of celebrities) in here.

(For the uninitiated: Burntsushi of ripgrep fame.)

skrebbel · 2025-01-01T17:54:39 1735754079

yeah tbh i was a bit starstruck that burntsushi would reply to my comment explaining what i assume is utter SIMD basics. that must be how swifties feel when she (or, likelier, her PR drones) click "like" on their insta comments.

freeopinion · 2025-01-01T21:44:50 1735767890

Does burntsushi have PR drones (who can explain SIMD)? :)

User23 · 2025-01-01T16:13:36 1735748016

Fun fact: a lot of that “sciency pseudocode” is actually Algol.

mananaysiempre · 2025-01-01T20:31:46 1735763506

Because that’s the original problem statement for Algol! As in, we’ve been publishing all that pseudocode for quite a bit and it seems like conventions have emerged, let’s formalize them and program in that. ... People were markedly more naïve back then, but then that’s sometimes what it takes.

Narew · 2025-01-01T14:46:05 1735742765

I don't know if it's good or bad, with pseudo code you put no constraints on how it should be implemented. It's known that some kind of algorithm are really hard to implement in Rust (every one use the link list data structure as an exemple). So having the article that use rust is good to see it can fit to rust constraints but at the same time does this constraint limit the algorithm itself ?

npalli · 2025-01-01T16:00:09 1735747209

Quite honestly, doesn't seem like Rust is a win here over C++. In fact, C++ has fewer sigils that make it somewhat easier to read sort of striking the balance between being simple enough (C-like) and abstractions (templates). Readability wise, I would have preferred Julia as that would have been the cleanest explanation of the code through the different topics at sufficient level of detail. Alas, it seems to have stalled somewhat (relatively speaking). It also doesn't help that every Rust article has Rust advocates jumping on you with "WAHT ABOUT SAFETY" which is off-putting since not every program needs to deal with all the complexity of the Rust memory/lifetime model to enable safety (simple bounds checking etc. might be sufficient).

fpoling · 2025-01-01T18:36:53 1735756613

The article is about how to squeeze absolute maximum in performance from the hardware. So you need a language that allows a relatively low-level access to hardware. And big plus of Rust is that it has standard SIMD libraries while C++ will get them only in C++26

EGreg · 2025-01-02T01:15:54 1735780554

And I thought it would be python?

Scientific stuff (scipy) and now all the ML stuff too

orlp · 2025-01-01T02:02:35 1735696955

One dimension that is not explored is partitioning the queries in batches. The primary cost is doing lookups on the out-of-cache table, so if you have a sufficiently large amount of queries you can resolve a couple layers of the tree in one step while those layers are in cache, grouping them based on where they land deeper in the tree, and then resolving all those queries that touch the same deeper part of the tree in one batch as well.

In theory, with an infinitely large amount of input queries you will never have to do an out-of-cache lookup, it can be completely amortized away into linear scans.

That said, you now end up with a bunch of results that need to be put back into the correct output order, which likely makes it not worth it. But if the operation can be fused into a reduction (e.g. find the sum of the binary search results) or the output order does not matter in some other way then all of a sudden it might make sense.

mlochbaum · 2025-01-01T13:41:51 1735738911

I have explored it, see https://mlochbaum.github.io/BQN/implementation/primitive/sor....

I implemented this method in Dyalog 18.0 with BlockQuicksort-like partitioning, using vectorized comparison with bit-boolean output. It's faster than you'd expect, maybe twice as slow as regular binary search when searched values are in cache, and better once they fall out of L2. But Dyalog reverted to 17.1 for later versions so you won't see it in a new download. It'll probably make it into CBQN eventually, perhaps with radix partitioning. Note that both quicksort and radix partitioning can be done and undone in a cache-friendly way.

Unlike quicksort, there's no issue of pivot selection since you always choose the midpoint of the searched values. However, there's a complementary issue of memory if the partitions become unbalanced, because the larger partition can require saved memory of roughly the depth times the number of elements. With a bit per comparison it's bounded by the size of the input.

mlochbaum · 2025-01-01T17:19:31 1735751971

Wasn't the best section link, last paragraph here has more detail: https://mlochbaum.github.io/BQN/implementation/primitive/sor...

curiouscoding · 2025-01-03T01:07:34 1735866454

Nice overview of sorting methods, thanks for sharing!

I also looked a bit into radix and distribution sort at some point over the past year, but in the end high performance sorting is actually too big of a thing to just do quickly on the side, as your post well shows :") In fact I wasn't aware of the associativity issues for radix sort. That's definitely something to keep in mind and investigate.

Will definitely refer back to it once I'm looking at sorting again in more detail at some point!

tmyklebu · 2025-01-01T05:08:59 1735708139

There are theory papers on "buffer trees"---B-trees where each node is augmented with an O(B)-length array of pending updates and queries. I believe there were also some attempts at building SQL databases based on them. It sounds like you're reaching for the same trick.

koverstreet · 2025-01-01T05:30:40 1735709440

that's a hybrid compacting data structure: compacting within the btree node, normal btree topology otherwise.

And it works much better than pure compacting (i.e. the leveldb lineage), because you avoid lock contention at the root on multithreaded update workloads, and the compaction/resort is much lower overhead when it fits in l2.

incidentally, there's a filesystem that uses this technique.

loeg · 2025-01-01T06:10:18 1735711818

> incidentally, there's a filesystem that uses this technique.

BetrFS?

loeg · 2025-01-02T03:49:37 1735789777

Sounds like "fractal trees" and TokuDB.

mikepurvis · 2025-01-01T02:29:16 1735698556

Interesting stuff, definitely the kind of real world optimization that happens when you’re able to look at actual access characteristics rather highly abstracted models.

At one level you could just be saving the results into a hashtable to bubble them out again, but at the extreme end the lookup is actually spread across multiple machines in a cluster, so it’s more like throwing the query to the proper one with the right chunk of the index, along with instructions about where the final RPC is supposed to go with the result.

hinkley · 2025-01-01T20:54:00 1735764840

There was some work a while back on streaming data past queries, but you need a fairly bounded data set for that to work. Having ten years of historical data in a data set would gum that up severely.

natmaka · 2025-01-01T06:01:24 1735711284

It enhances the throughput (on average everyone waits less) at the price of a higher max latency (some, who posted a request mobilizing a very-recently-out-of-cache-index, will wait way, way more...), isn't it? In the real world those worst cases quite often kill such optimization.

The (data size / cache size) ratio and queries local (in time) dispersion are key.

Bulat_Ziganshin · 2025-01-03T14:49:49 1735915789

higher throughput means we can serve more people, not that anyone will be served faster. it's like a multi-lane highway

curiouscoding · 2025-01-01T15:47:58 1735746478

Ah yes, I did consider sorting the queries at some point but I guess I then forgot about it again. If the queries are random and much less than the input size, probably most queries will hit different cache lines in the final layer, and I suspect benefits will be limited at maybe 2x best case since the last layer is where the bottleneck is.

Also (radix) sorting is very memory bound usually, and we probably need to sort in at 16^2=256 buckets or so to get sufficient reusing of the higher layers, but I don't have the numbers of what a round of radix sort takes. (My guess is order 1ns per query? Maybe I'll find time to investigate and add it to the post.)

Bulat_Ziganshin · 2025-01-03T16:04:37 1735920277

high-performance sorting algos do either merging or partitioning. I.e., you merge R input streams into one, or split one input stream into R (for quick, radix and sample sort).

1. For merge sort of N elements, you have to perform log(N)/log(R) passes

2. For sample sort - C*log(N)/log(R), where С depends on the distribution of your data, there are no strict guarantees

3. For radix sort of K-byte elements exactly K passes (indeed, 256 ways is optimal according to my tests), which is usually larger than the previous value

While Mergesort looks preferable since it uses the least and fixed amount of passes, the merging code itself is less efficient - there are not much independent CPU operations, making it slower than samplesort and especially radix sort for small inputs.

So, it seems that the best strategy combines two different levels - for larger blocks you absolutely need to minimize the amount of passes [over memory] and employ multithreading, while for smaller blocks we need to minimize the amount of CPU operations and increase ILP.

Today two well-recognized algos are IPS4O for larger blocks and Google's SIMD QuickSort for smaller ones.

bobmcnamara · 2025-01-01T05:31:27 1735709487

This works really well for subset testing, especially if sets sorted.

Walk the larger tree, using the smaller tree.

koverstreet · 2025-01-01T04:07:56 1735704476

That presumes there's both locality in your queries, and it's not an online system - result latency doesn't matter much.

That's not terribly common.

wolfgangK · 2025-01-01T01:13:54 1735694034

Amazingly thorough ! I love how the author leaves no stone unturned. I had no idea you could do the kind of low level efficiency shaving in Rust. I wonder how a C++ implementation with https://github.com/jfalcou/eve would compare.

curiouscoding · 2025-01-01T01:20:33 1735694433

Thanks! It's somewhat tiring to not have loose ends, but I agree it pays off :)

Doing this stuff in Rust is absolutely possible, and I'd do it again since my C++ days are now past me, but the endless transmuting between portable-simd types, plain rust arrays, and intrinsic types is quite annoying.

Also rust is not made for convenient raw pointer arithmetic. There plain C would be much more to the point.

ant6n · 2025-01-01T09:37:43 1735724263

You kind of lost me towards the end, so I’m not sure whether you attempted it, but I was wondering whether it would be possible for the internal nodes to store only the upper 8/16 bits (that are nonzero for the current subtree). This implies that one 64 byte cache line stores 32 or 64 entries instead of 16 (or better: 31 or 62/63, since u may need some bookkeeping dat).

The next level would be to keep track of how many prefix bits are implied by the parents, so leaf nodes could perhaps also only use 8/12/16 bits, if the the higher bits are implied by the parent. or instead of bit masks, use offsets (i.e. leaves store k bits offset from parent value).

That may screw around with the branch predictor, and may work very good with evenly distributed data vs uneven data.

curiouscoding · 2025-01-01T16:15:48 1735748148

Ohh yes some good ideas there! I've thought about pretty much exactly this at some point but then didn't end up including it. But I do think it's quite promising, and most of the bookkeeping can be made to be branchless.

On the other hand, most of the latency is in the last few layers, and probably there isn't as much to be saved there.

The biggest problem might be that the bottom layer will anyway have to store full-width numbers, since we must be sure to have the low-order bits somewhere in case they weren't covered yet in earlier layers. Or we could have variable width encoding per node maybe (instead of per layer) but that does sound a bit iffy on the branch predictor.

In the end I guess I kinda like the simplicity and hence reliability of the 'just do the full tree and the first layers are cheap anyway' approach. Probably another factor 2 speedup is possible on specific datasets, but anything beyond this may not be reliably good on worst case inputs (like is the issue for the prefix partitioning methods).

ant6n · 2025-01-01T20:46:18 1735764378

The lowest level could be encoded with whatever number of bits is required for the numeric distance between the leafs. With proper book keeping, the full 32 bit values don’t need to be stored. The value for each node should be something like (parent << shift + node), where node has ideally 8 Bits, or maybe 16 bits.

It kind of comes down to how to deal with bad distribution of values, like large differences between adjacent values. One could for example deal with this by deliberately leaving „gaps“ in the tree, like using a 59-ary node on places (make the last values in the array large, so they will never get visited). With dynamic programming, perhaps an optimal distribution of the leaf level could be had - although it requires more bookkeeping bits of one needs to have the index of the found value, not just whether the value is in the tree.

The question could also be whether it could be interesting to store delta valeues, so that 63 8-bit valeues gives a larger numeric range - this means computing some sort of cumulative sum on on the 63 values, not sure whether there is a simd instruction for that.

One more thought: if there is different ways different layers of the tree are stored, but they are always the same for each layer, the code be unrolled, with every layer having different code to deal with it.

Last thought: if the index is not important to know, just whether the value is in the set or not, one could use a bitmap as a data structure. It would require 256MB for the 32bit space, but large spans of zeros imply empty pages.

rurban · 2025-01-01T14:27:52 1735741672

There are stones still unturned. For typical unicode table lookup's you'd need batches for all characters of a word. It would be much faster to lookup eg 16 chars at once, to optimize cache misses.

But when I tested this even Eytzinger was too slow.

secondcoming · 2025-01-01T12:13:49 1735733629

Have you used Eve successfully for anything? It’s very confusing, it took me far too long to uppercase a string with it.

wolfgangK · 2025-01-01T23:42:44 1735774964

Just played a bit with it. Were you working with ASCII ? This example didn't work for you ? https://github.com/jfalcou/eve/blob/a141ba93048bb2916c2157a9...

npalli · 2025-01-01T01:33:29 1735695209

Yeah, Compare to highway as well

https://github.com/google/highway

Bulat_Ziganshin · 2025-01-03T17:39:00 1735925940

I once searched github for simd libraries, sorted by popularity, and added most popular of them to my list: https://github.com/stars/Bulat-Ziganshin/lists/simd

indeed, highway is the popularity leader, it implements lots of SIMD operations and supports lots of hardware including SVE/RVV with runtime SIMD width, although IMHO it has a bit longer learning curve

koverstreet · 2025-01-01T01:51:19 1735696279

Nifty thing about eytzinger trees is that there's a formula for converting from a node index to its position in an inorder traversal.

This is useful if your primary keystore is a normal sorted set of keys - easier to work with - you then don't need to store explicit pointers.

Also, he didn't mention that with eytzinger you can prefetch multiple loop iterations in advance - use 1-based indexing so that sibling nodes line up nicely on cachelines.

curiouscoding · 2025-01-01T16:41:46 1735749706

You're right. While I wasn't very explicit about this (because algorithmica and the linked paper spend plenty of words on it already), the code snippet for Eytzinger does indeed do both the prefetching and the formula.

In fact, I'm pretty sure a similar formula could be made to work for higher branching factors, although it would surely be slower. (Probably it depends on the number of times 17 divides the final index it so, which is not great, but with B=15 it would be the number of factors of 16 which is easy again.) The more annoying issue with not storing everything in the final layer is that we have to keep a running-answer for each query as we go down the tree, which I suspect will add measurable runtime overhead. But maybe that's a tradeoff one is willing to make to avoid the 6.25% overhead.

sujayakar · 2025-01-01T01:53:01 1735696381

this is unbelievably cool. ~27ns overhead for searching for a u32 in a 4GB set in memory is unreal.

it's interesting that the wins for batching start diminishing at 8. I'm curious then how the subsequent optimizations fare with batch size 8 (rather than 128).

smaller batch sizes are nice since it determines how much request throughput we'd need to saturate this system. at batch size 8, we need 1s / ~30ns * 8 = 266M searches per second to fully utilize this algorithm.

the multithreading results are also interesting -- going from 1 to 6 threads only improves overhead by 4x. curious how this fares on a much higher core count machine.

curiouscoding · 2025-01-01T16:29:05 1735748945

Just fyi: the throughput numbers with batching are per _query_, not per _batch_, so I think the *8 is too optimistic ")

I suspect that at higher core counts, we can still saturate the full RAM bandwidth with only 4-5 cores, so that the marginal gains with additional cores will be very small. That's good though, because that gives CPU time to work on the bigger problem to determine the right queries, and to deal with the outputs (as long as that is not too memory bound in itself, although it probably is).

Bulat_Ziganshin · 2025-01-03T17:17:59 1735924679

with m/t, the algorithm is memory-bound, so the performance should be determined strictly by the memory throughput

gorset · 2025-01-01T13:18:10 1735737490

The number of unique int32 values is not that great, and a full bitset would only be 512MB. Hard to bit a dense bitset in performance.

As a general purpose data structure, I would look at roaring bitmaps for this purpose, which has good trade-offs with great performance for most use-cases.

If only simple lookups are needed over a static set, then it's worth looking at minimal perfect hash functions (https://sux.di.unimi.it).

curiouscoding · 2025-01-01T16:00:17 1735747217

Hmm, bitmaps is an interesting idea! If the data is dense enough, then yeah I guess a quick linear scan would work.

There's also the extreme of simply storing the answer for each possible query as a u32 and just index the array, but there the overhead is much larger.

Rank-select is also interesting, but I doubt it comes anywhere close in performance.

I happen to also be working on a minimal perfect hashing project that has way higher throughput than other methods (mostly because batching), see https://curiouscoding.nl/posts/ptrhash-paper/ and the first post linked from there.

gorset · 2025-01-01T16:39:35 1735749575

Nice work! I love the care and discussion around low level optimizations, as such things are often ignored.

There are a lot of interesting variants of rank/select and other succinct data structures which has interesting tradeoffs. Maybe most useful when compression is critical. At least my own data structures can’t compete with the query performance you are showing, but they are great when the memory footprint matters.

shawn_w · 2025-01-01T02:54:25 1735700065

All this nattering about rust and C++, and then there's me thinking about how to do it in Common Lisp, possibly keeping the simd bits.

wk_end · 2025-01-01T06:58:17 1735714697

FWIW, I’d love to read a blog post about that, too.

shawn_w · 2025-01-02T03:59:52 1735790392

I'm not up to that, but I have a working S-Tree builder and basic AVX2-using `lower-bound` for it (as described in the Algorithmica article linked to in the post) up and running in SBCL. Haven't played with any of the fancier optimizations yet, much less done any benchmarking. Should put it up in a gist or paste site to share...

openquery · 2025-01-01T04:12:02 1735704722

This is really interesting and a thorough write up. Thanks to the author for sharing their work.

Whenever I read about super low-level optimisation, my immediate feeling is that of gratitude to the author for spending so much time shaving off nanoseconds which the entire SE community gets to enjoy.

I wonder how much time humanity has collectively saved simply as a result of how all these optimisations stack up.

curiouscoding · 2025-01-01T16:46:33 1735749993

Assuming this is going to save someone 10ns per query in the end and I spent 150h on coding and writing this up, we only(?) need around 10^14 queries to make it worth it!

But yes, as new technologies stack up, we achieve exponential growth in throughput in the end, which usually enables new science :)

DerSaidin · 2025-01-01T06:19:42 1735712382

I think there is an error in 1.7 figure 3: Layer 1 should have a 10 instead of a 12.

Edit: Also, 1.3 figure 1: Should the y-axis label be "Inverse throughput" to match later figures?

curiouscoding · 2025-01-01T16:47:23 1735750043

You're right about the mistake in the figure. Will fix soon, thanks :)

And yeah, maybe I should just label all of then with throughout for consistency.

topbanana · 2025-01-01T12:13:24 1735733604

Interesting. If you need to support occasional writes, you could have a large static search tree and a small writable tree that you check afterwards. The when you have enough changes, you could occasionally ship those changes into a new version of the static tree

westurner · 2025-01-01T00:58:18 1735693098

suffix-array-searching/static-search-tree/src/s_tree.rs: https://github.com/RagnarGrootKoerkamp/suffix-array-searchin... partitioned_s_tree.rs: https://github.com/RagnarGrootKoerkamp/suffix-array-searchin...

estebarb · 2025-01-01T01:49:15 1735696155

Recently I tried to optimize a set implementation and found that interpolation search works quite well, being competitive with HashSet in C# (both single digit ns for my use case), with zero memory overhead. Unfortunately, it only works well with uniform data, so I guess I'll give static search trees a try. This explanation was clear and extremelly well detailed.

curiouscoding · 2025-01-01T16:50:20 1735750220

At some point I was playing around with interpolation search on the human genome dataset, and it was really really terrible. It had some very bad worst case behaviour when the input data has big plateaus and you're searching for something around the edges of the plateau. This can be fixed by only using interpolation for the first few iterations and doing binary search for as long as needed afterwards, but that kills a lot of the predictability of the current approach. So while I really like it in theory, in practice I'm not quite convinced it can be made to work efficiently.

amluto · 2025-01-02T04:40:46 1735792846

Given the fact that the keys are fixed-size integers with a not-too-terrible distribution, I would consider chasing the partitioning idea a bit harder, and possibly even removing all the fancy S-tree parts. Some ideas:

For 32 bits, using the first few or even a lot (16? 20?) bits to index an array and then chasing down the remaining bits somehow (adaptively depending on size of that partition?) seems like it could work.

In this direction, and with really impressive asymptotic performance, there’s the van Emde Boas tree and its variants.

NooneAtAll3 · 2025-01-01T02:11:34 1735697494

I wish graph did not show 2 values with the same blue color

jasonjmcghee · 2025-01-01T02:15:41 1735697741

It wasn't only the one graph - there's another graph that has a whole spectrum of blue colors - nearly impossible to tell which is which without going in with an eye drop tool.

I found this to really detract from an otherwise interesting post.

curiouscoding · 2025-01-01T16:52:22 1735750342

You're absolutely right. At some point I spent so long working on the plotting code that I really couldn't be bothered anymore to make it prettier. Hopefully I eventually find some motivation to fix this.

jules · 2025-01-01T18:50:07 1735757407

Nice post. Would the larger amount of code result in different performance in a scenario where other code is being run as well, or would the instruction cache be large enough to make this a non-issue?

owenthejumper · 2025-01-01T02:15:24 1735697724

Elastic Binary Trees https://wtarreau.blogspot.com/2011/12/elastic-binary-trees-e...

nielsole · 2025-01-01T19:33:17 1735759997

I've never had the use case for this amount of throughput. What are use-cases for this. From the website I presume this is gene-matching related?

curiouscoding · 2025-01-01T21:53:26 1735768406

Jup. I've added to the future work section now that the plan is to use this to speed up suffix array searching. Suffix arrays are pretty regularly used to build indices over say a human genome, that can then be queried.

And since DNA sequencers are still increasing there throughput and accuracy, it's always good to have faster algorithms as well.

marginalia_nu · 2025-01-02T02:20:23 1735784423

I'm using a variant of static search trees in my search engine index though in this scenario the main bottleneck is block reads from disk, but the thinking is relatively similar to cache lines.

Use case is to query hundreds of gigabytes of data in tens of milliseconds.

(Skiplists is also used by some search engines.)

Bulat_Ziganshin · 2025-01-03T17:11:49 1735924309

JOIN SQL operation is another usecase

vlovich123 · 2025-01-01T07:55:34 1735718134

Is binary search on su ch large integer data sets common? I guess maybe time series data. But I would think that this isn’t amenable to representing strings?

throw-qqqqq · 2025-01-01T12:04:27 1735733067

Hash the string to convert to integer -> double hash or truncate to lowest 32 bits and this technique works for strings as well.

You can replace string with any data structure you can hash or index really.

vlovich123 · 2025-01-01T17:31:02 1735752662

Ah but this then absolutely fails for range queries but makes sense if you want to build an index for point lookups.

throw-qqqqq · 2025-01-01T19:02:18 1735758138

Definitely! If the integers are (truncated) hashes, ranges are completely meaningless.

But what would a range over strings even represent?

Perhaps if you e.g. use string indices into an array of sorted strings (by length, by prefix or whatever), you could use it for something. Depends on the application I guess :)

vlovich123 · 2025-01-02T01:03:08 1735779788

Range queries over strings are extremely common. For example, databases like cockroachdb and yuggabytedb use the table name as a prefix which then requires a range query to narrow down queries against a specific table.

throw-qqqqq · 2025-01-02T09:29:33 1735810173

Ah, I misunderstood what you meant by range :)

For such prefix queries, I think a radix-tree/Trie or ordered collections are more useful (than the index-based collection I initially mentioned).

Thanks for following up :)

vlovich123 · 2025-01-03T02:56:08 1735872968

But then why bother with any of this at all instead of building a perfect hash function like BoomPHF

swiftcoder · 2025-01-01T09:42:36 1735724556

Absolutely love the level of detail here! Not often you see the process of optimising something spelled out in so much depth

anamax · 2025-01-02T01:10:19 1735780219

This looks like something that patricia trees would be good for.

Then again, Eytzinger looks like binary heaps.

purple-leafy · 2025-01-01T02:24:56 1735698296

cant understand the comp sci stuff yet, but love the way this site looks

curiouscoding · 2025-01-01T16:56:32 1735750592

Thanks! I did spend some time fiddling with CSS to make the yellow highlights so that the titles stand out a bit more, and to improve the code blocks, but otherwise it's a relatively common theme for Hugo. (I think it's linked in the footer.)

jokoon · 2025-01-01T19:40:00 1735760400

so a tree when the data cannot be updated?

What is the point?

curiouscoding · 2025-01-01T22:03:12 1735768992

I've added a little motivation section on this :)

The final goal is to index DNA, say a human genome, or a bunch of them, and this is static data.

Then as new DNA comes in (eg is read by a DNA sequencer) we can efficiently query against the index built on the reference genome, and this reference is often fixed.

astronautas · 2025-01-01T10:25:22 1735727122

Cool! Thanks for the investigation.

xpe · 2025-01-01T21:23:50 1735766630

I predict that more and more of these kinds of optimized algorithms will be designed / written / assisted by machines.

I wonder when the next such malicious algorithm(s) passes under our noses, and what its hidden purpose might be.

ryao · 2025-01-01T01:04:14 1735693454

This would have been easier for a larger number of people to read if it had been in C or Python.

unrealhoang · 2025-01-01T01:38:41 1735695521

What a weird thing to complain about. The Rust code in TFA are purely arithmetic and imperative function calls. Any C developer that is good enough to care about the algorithm mentioned will have no issue reading such Rust code, heck, they could even throw the code to chatgpt to translate for them no problem.

Such a good and detailed article that packed with techniques that are applicable everywhere yet the complaints are: “but rust”.

Regarding Python, how could such optimizations be implemented in Python? Generating LLVM bytecodes directly aside.

shawn_w · 2025-01-01T04:22:23 1735705343

Fluent in C and C++ among others, know nothing about Rust. It's mostly easy to follow, but I'm having trouble with notation like `&[u32]` and `&[u32; P]`. Arrays that hold unsigned 32 bit integers? But there's `Vec<u32>` which also seems like an array type? Is it like the difference between C++ primitive arrays and std::vector?

loeg · 2025-01-01T06:16:53 1735712213

`&[u32]` is essentially `std::span<const uint32_t>`.

`&[u32; P]` is similarly something like `std::span<const uint32_t, P>`.

`[u32; P]` (no `&`) is essentially `std::array<uint32_t, P>`.

`Vec<u32>` is essentially `std::vector<uint32_t>`, yes.

shawn_w · 2025-01-01T17:54:20 1735754060

Thanks

dwattttt · 2025-01-01T05:03:35 1735707815

For `&[u32]` and `&[u32; P]`, those are both references to an array of unsigned 32 but integers, however the former is to an array of any size (and carries its size at runtime), whereas the latter is to an array with an explicitly known size at compile time. It's unusual in C, I think you can express it as `int[24] *` (or possibly some other permutation, I'm away from a compiler).

Vec<u32> is pretty much like std::vector compared to them, yeah.

ryao · 2025-01-01T01:56:28 1735696588

I have not studied Rust and find its syntax alien. It looks like a cross between a functional language and C++. Trying to read it is like forcing myself to read Italian. I don’t know Italian, but have studied Spanish and Latin. I can sort of figure things out if I put in effort, but there is always uncertainty when I do. That is the problem with reading Rust.

This could be addressed if I just learned Rust (much like my difficulty with Italian could be addressed by studying Italian). However, I already know at least 10 programming languages. I am unwilling to learn the latest fashionable language unless it can promise me that the language will not change and there will never be another fashionable language for me to learn. Given that Rust rejected the standardization process that C and C++ embraced, and Zig is expected to replace Rust one day, I doubt Rust can make such assurances.

As for Python, it is often used for teaching CS and everything except the assembly code is doable in it. Using low level assembly from Python would require using the FFI. Also, Python has nothing to do with LLVM.

unrealhoang · 2025-01-01T02:08:43 1735697323

Suck to be you, I guess? If you don’t care enough then you don’t care. You wouldn’t bitch when there’s an article written in Italian with good knowledge inside: “but Italian”. You either trying to read using translation tool or ignore it entirely.

I didn’t say python has anything to do with LLVM, it’s just a technique, read about it, or not.

quotemstr · 2025-01-01T01:59:32 1735696772

> Zig is expected to replace Rust one day

That's a bold prediction considering that Zig is unsafe and Rust is safe.

If Zig's idioms make it safe, then modern C++ is safe too.

ryao · 2025-01-01T02:10:16 1735697416

Zig replacing Rust is not my prediction. It is what I am hearing from the Zig enthusiast I know. I have also seen others express the sentiment that the next fashionable language after Rust will be Zig. Regardless of whether it is Zig, there will be another. Rust is just one part of a never ending cycle of people trying to reinvent programming.

kllrnohj · 2025-01-01T15:00:30 1735743630

There's approximately a 0% chance of Zig seeing non-trivial adoption anywhere other than maybe the embedded world.

It otherwise solves basically nobodies problems in the real world and is just a love letter to C. It's cute, kinda, but that's about it.

quotemstr · 2025-01-01T17:08:33 1735751313

Zig is like the Vulcan Centaur next to Starship.

nicoburns · 2025-01-01T03:04:15 1735700655

> Rust is just one part of a never ending cycle of people trying to reinvent programming.

This is true of every single programming language.

ryao · 2025-01-01T03:28:30 1735702110

That is my complaint against the next fashionable language. Once you have some level of proficiency in around 10 languages, adding more to the list is a chore. It lacks the enjoyment that came from the first few. At least, that is my experience. That is why I am resistant to learning new languages unless software I want to modify is already written in it. Then, I am forced to learn and time spent on it is time that actually needed to be spent, unlike my brief time spent teaching myself FORTRAN in college, which was largely a waste of time.

Sharlin · 2025-01-01T09:22:12 1735723332

> Zig enthusiasts

Yeah, don’t you think they might be perhaps just slightly biased? I have some news for you about your average technology enthusiast…

Anyone who claims Zig is going to replace Rust knows nothing about either Zig, Rust, or both. Zig is designed to solve none of the problems that Rust solves. Trillion-dollar corporations are rewriting things in Rust because it solves problems that really matter to them.

The US government strongly recommends writing all new mission-critical code in memory-safe languages. Guess which one of Rust and Zig is memory-safe?

I have nothing against Zig, it looks like a very nice better C, and the comptime stuff is both cool and useful. But its priorities are not Rust’s priorities.

nottorp · 2025-01-01T14:56:13 1735743373

Of course, Rust evangelists are also more than slightly biased.

Sharlin · 2025-01-02T08:49:30 1735807770

Of course.

rurban · 2025-01-01T08:03:08 1735718588

Both are unsafe, just zig is usually safer than rust.

And also so more readable, ie. maintainable

Sharlin · 2025-01-01T09:18:45 1735723125

That’s one crazy claim to make.

quotemstr · 2025-01-01T01:46:48 1735696008

> Regarding Python, how could such optimizations be implemented in Python? Generating LLVM bytecodes directly aside.

You don't. You instead use your favorite Python extension system (regular Python modules, Cython, Numba, whatever) to implement this tree algorithm and expose it to Python already shaped into a container.

wolfgangK · 2025-01-01T01:15:57 1735694157

I don't think that Python would be the right language for such low-level performance maxxing endeavor. I would have picked C++ but t was eye opening for me to see how rust enabled such low level optimization, so I'm grateful for the choice.

ryao · 2025-01-01T01:25:53 1735694753

C would be useful to the broadest audience. C++ programmers can read C while C programmers cannot always read C++, especially when the newer language constructs are used. I mentioned Python because of its popularity.

Interesting, the Algorithmica article the author cited is in C:

https://en.algorithmica.org/hpc/data-structures/s-tree/

npalli · 2025-01-01T01:30:35 1735695035

C++ not C. The author explains his choices elsewhere in the site.

ryao · 2025-01-01T01:38:04 1735695484

The code snippets should be accepted by a C compiler and therefore are valid C code.

npalli · 2025-01-01T01:43:34 1735695814

If you want to be pedantic, there is a constexpr in the snippets, wont be accepted by the C compiler. More broadly though it is a C++ tutorial. I insisted on it because the author of algorithmica actually explains he chose C++ and perhaps sometime later C/Rust or something might be better. Should respect his POV.

ryao · 2025-01-01T01:58:24 1735696704

I had not seen the use of constexpr. You are correct that those are in C++.

lubutu · 2025-01-01T07:40:28 1735717228

C23 has constexpr, albeit only for objects, not functions. The code also uses namespaces, as in `std::aligned_alloc(P, T)`.

ryao · 2025-01-01T10:35:40 1735727740

You have to be 1/3 down the page to see any of this in the code examples. By that point, you would already have seen a number of examples that are valid C code and it is reasonable to expect the rest to be. At least, that was my experience.

curiouscoding · 2025-01-01T01:23:26 1735694606

Yeah, I don't think python is the right tool here. C++ definitely would be an option though.

Anyway very happy that this is also showing off what rust can do

ryao · 2025-01-01T03:17:31 1735701451

As far as I can tell, the community is largely divided into three major groups. Those that can read C, those that can read Python and those that can read both. Using either of them would have dodged criticism that your code examples are not accessible to much of the community.

That said, you are right that Python is not the best language to use when you need to use intrinsics, as you would be writing it in another language and using the Python FFI to access it.

oguz-ismail · 2025-01-01T03:40:42 1735702842

> this is also showing off what rust can do

To people who already know Rust, yes. To others, not so much

FridgeSeal · 2025-01-01T01:38:56 1735695536

I have a far easier time reading Rust than C. Probably because it’s been years since I’ve had to do C, and I find some of the syntax and patterns quite C-specific and confusing. Python also does not express some of the important implementation details well enough in its standard syntax IMO: there’s no obvious delineation between references and pass by value in Python, borrowing, etc.

So realistically, this is just the other half of the coin for all the articles where the code examples were written in C and everyone who didn’t really read C just had to put up with it.

ryao · 2025-01-01T01:40:35 1735695635

The TIOBE index suggests that you are in a much smaller group than those who have difficulty reading Rust:

https://www.tiobe.com/tiobe-index/

coliveira · 2025-01-01T04:25:14 1735705514

Yes, he is in a bubble.

quotemstr · 2025-01-01T01:25:45 1735694745

Rust is now common enough in the industry that everyone should be expected to at least read it, just like every programmer should be able to fizzbuzz in Python and JavaScript just as a matter of general technical literacy. There's no advanced Rust here and no concepts specific to the language.

It's a good article.

If Heinlein were in our industry, he might write this:

A programmer should be able to write a shell, parse binary data, design a protocol (and reverse engineer one too), authenticate a password, not fumble multi-code-point graphemes, wrangle semaphores, plug memory leaks, compress data, migrate databases, automate builds, git rebase a tree, profile tail latency, write a simple design doc for a complex system, draw a triangle, multiply tensors, and do fizzbuzz in every popular language on GitHub. Specialization is for insects.

ryao · 2025-01-01T01:34:52 1735695292

Here is a list of languages people can be generally expected to be able to read:

  * C
  * C++
  * Java
  * JavaScript
  * Python

It is the top 6 of the TIOBE index, minus C#:

https://www.tiobe.com/tiobe-index/

Rust is not very high on the TIOBE index. It has rank 14 with a 1.29% rating. That is not much higher than COBOL.

bschwindHN · 2025-01-01T01:50:49 1735696249

You forgot the part where no one cares about TIOBE though.

ryao · 2025-01-01T02:52:06 1735699926

As the TIOBE index says “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors”. Very many people do care about that. Is there any reason I should think your remark is from some legitimate issue rather than you not liking the index results?

bschwindHN · 2025-01-01T03:33:33 1735702413

I would argue most people working software jobs either haven't heard of TIOBE or don't care about it. I only occasionally hear about it from strange internet comments that are far too focused on language particulars.

ryao · 2025-01-01T04:27:02 1735705622

That is surprising. I heard about the TIOBE index often about 15 years ago, but now that I think about it, I have not heard about it lately. I wonder if the rise of Python and ML has caused people to stop asking “what language should I learn”. That is the question the TIOBE index often was cited to answer.

quotemstr · 2025-01-01T01:42:19 1735695739

Rust is where the action is though. It's not just about cumulative SLOC weight, but about mindshare among the "movers and shakers" of the industry. Think of it as a prestige language, an acrolect, you should at least vaguely know so that you can participate in the most challenging conversations in the field.

ryao · 2025-01-01T02:24:17 1735698257

That is an interesting perspective. Here is my perspective. There are no popular projects that use Rust as a primary language. The current fashionable area of computers is machine learning. As far as high level languages go, you will see a mix of Python, C++ and perhaps a tiny bit of C there. Rust is nowhere to be seen.

Your description would be better applied to C, C++ and Python. That is what the majority of popular software uses and people having challenging conversations are often using one of those languages.

tyre · 2025-01-01T03:03:43 1735700623

> There are no popular projects that use Rust as a primary language.

Servo is pretty notable and visible. ripgrep, influxdb, wasmer, deno. uv and ruff in the python ecosystem are written in rust.

AWS, Cloudflare, Discord (iirc), and Microsoft have all adopted Rust extensively.

As of late 2023 there are committed drivers in the Linux kernel written in rust. (Linux is pretty popular.) prior to those, only C and assembly were allowed.

ryao · 2025-01-01T04:18:39 1735705119

None of this contradicts what I said.

quotemstr · 2025-01-01T06:43:38 1735713818

Don't forget Pydantic

dwattttt · 2025-01-01T05:10:12 1735708212

> There are no popular projects that use Rust as a primary language

You may be surprised to learn, Python's `cryptography` project is written (EDIT: maybe not primarily, it's hard to compare given dependencies) in Rust, and saw around 8.1 million downloads from PyPI yesterday. That puts it close to the top 20 (the 20th project comes in at 8.5m).

nostradumbasp · 2025-01-01T19:41:52 1735760512

I would like to also point out that rust is boring as hell. Lots of otherwise ordinary people write super simple non-fancy programs using it too :)

forrestthewoods · 2025-01-01T06:52:59 1735714379

If you know C/C++ you could have learned to read Rust in the time you spent commenting in this thread. The number of new syntaxes you need to learn is quite minimal.

zahlman · 2025-01-01T03:12:49 1735701169

As an aside: why is Java winning out over C# these days? When I first experienced C# I didn't want to go back.

devvvvvvv · 2025-01-02T01:31:05 1735781465

>Rust is now common enough in the industry

Source?

kubb · 2025-01-01T01:12:52 1735693972

Maybe, but should the author really cater to people who can’t overcome a language barrier? They probably won’t understand the concept anyway.

ryao · 2025-01-01T02:02:16 1735696936

The ACM for a time only accepted code examples in ALGOL because it was felt that there should be a standard language that everyone understood. ALGOL eventually gave way to C and others, but there is definite value in ensuring things are accessible to the largest possible audience.

kubb · 2025-01-01T02:25:14 1735698314

You're free to use ALGOL in your posts if you want. It reads like pseudocode used in algorithmic books, and should be familiar to anyone who used the Pascal family.

ALGOL gave way to C, and C is now giving way to Rust, Zig and others. You're playing the role of the people who wanted to keep the example snippets in ALGOL.

ryao · 2025-01-01T02:33:09 1735698789

The point was that examples should be in widely understood languages. The niche languages you list do not qualify as widely understood.

Estne melius scribere exempla latine?

mplanchard · 2025-01-01T14:43:28 1735742608

This is someone’s blog, not an ACM publication. Why hold it to the same standard? They can write in whatever they like. I have a hard time reading Haskell, but I don’t sit and kvetch about it anytime someone wants to use it in a blog post.

dxbydt · 2025-01-01T02:10:15 1735697415

usaco explicitly nudges you towards C++ on the first page itself. problems are graded on running time, even with moderately large datasets, Python simply doesn’t make the cut. So most high schoolers end up using C++ for usaco, and Java for AP Comp Science.

mangamadaiyan · 2025-01-01T02:05:22 1735697122

Erm, the concepts that you're referring to are independent of the language being used. The author is free to use whatever language they want. The GP (I think) was merely trying to make a point about making the article accessible to a wider audience that is capable of appreciating the concepts involved.

kubb · 2025-01-01T02:21:00 1735698060

If you can't understand the code snippets in this post, the concepts will most likely be lost on you, no matter what other language would be used for the code.

mangamadaiyan · 2025-01-01T08:14:08 1735719248

Isn't that precisely the point that the GP was trying to make, about the accessibility of the article?

Edit: Eytzinger layout, Cache lines, SIMD, et cetera are independent of the particular language used in the article. They're just as valid in C, for example. I don't understand your point about them being tied to Rust.

ryao · 2025-01-01T02:42:17 1735699337

You are correct.

Rendello · 2025-01-01T01:13:38 1735694018

I wonder what percentage of Rust users have significant experience with C. The difficulty curve of Rust was too high for me when I was coming from Python, but after doing HolyC and Zig, I find it fairly straightforward to conceptualize, though still difficult.

I've been working with real C today and have been having fun, but it's very different for me to step back into the C world again after working with more modern tools.

barbegal · 2025-01-01T01:15:10 1735694110

Neither of those languages gives you portable SIMD. Rust is rapidly becoming the language of choice for high performance code.

npalli · 2025-01-01T01:28:50 1735694930

google has a mature C++ library for portable SIMD. The original article seems to be a translation of the excellent algorithmica site which had it in C++.

https://github.com/google/highway

ryao · 2025-01-01T02:13:25 1735697605

You can do portable SIMD in GNU C/C++ using the vector extension:

https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

swiftcoder · 2025-01-01T09:45:20 1735724720

I'm not sure that "portable" and "only works on one specific compiler vendor" are very compatible concepts

bigpingo · 2025-01-01T11:46:27 1735731987

gcc is available on and targets more platforms than rust. So if GNU C is not considered portable you can forget about 'portable rust'.

swiftcoder · 2025-01-01T15:22:17 1735744937

Portability in the C and C++ worlds generally means across standard-conforming compilers, not just across platforms

tmtvl · 2025-01-01T09:53:05 1735725185

Oh no, it would've been better if it was written using the most readable language: Scheme.

bawolff · 2025-01-01T03:49:35 1735703375

I've never programmed in rust and i can understand find.

At the end of the day rust is just another imperative programming language. You shouldn't need to know the language at all to understand the very simple examples written in rust.

peutetre · 2025-01-01T04:11:59 1735704719

Well, you can do the C implementation then.

Don't see it as a problem, see it as an opportunity.

plasticeagle · 2025-01-01T01:22:35 1735694555

As a computer science article, it should have mostly been pseudocode. Concrete implementations can be given in an appendix, where they belong.

curiouscoding · 2025-01-01T01:26:37 1735694797

The problem with pseudocode is that it's completely underspecified. And how would I ever write intrinsics in pseudocode? Much easier to do a proper language directly so people can actually Google things and read their official documentation.

sigbottle · 2025-01-01T02:48:55 1735699735

My former roommate works in a similar domain; https://dl.acm.org/doi/pdf/10.1145/3448016.3452841 is an example of a paper implementing intrinsics in pseudocode. They "unroll" the intrinsics with a comment saying that this is implemented with an intrinsic. Of course though, your blog isn't a paper, don't know why people are getting up in arms about it.

Also the other comment saying that "psuedocode is not concerned with intrinsics" is false. You can get "great" theoretical speedups (the shaveoffs are tiny but hey, they're better) with "simple" intrinsic operations - that's my roommate's entire research lol. The external memory model, for example, formalizes caching and allows all these low level optimizations to flourish. I'm not sure how intrinsics tied into it, but he's published so I'm not gonna question it :)

---

Speaking of which, I noticed that you did competitive programming. How does CP compare to research? I loved data structure manipulation problems in CP when they were clever - often because they involved noticing that you can take a "common" model, but then optimize it significantly because you only needed to support a subset of the operations through a clever mathematical proof based on the structure of the problem - but as I got to the higher levels it felt more and more that a lot of them became really obscure "support 413 operations on a tree, and yes, you really need to support all 413 operations on a tree" and that's kind of my opinion of data structure research unfortunately as well :( I guess because solving broad general instances is more important. I'd love to hear your perspective though.