Features:

  • Retrieves the differentiating bit of a key with respect to the previous one and stores only this position in the index.
  • With at most one single disk read, assuming the index is in memory, it determines whether the key exists or not.
  • The index is always sorted and therefore requires no reorganization.
  • Performance is proportional to the length of the keys, not to the size of the index.
  • Insertions require updating at most two or three nodes.

The Differential Directory (DiDi) stores only the difference between each key and the
previous one (already sorted), rather than storing full keys in each node.
Much like when writing a list by hand and using quotation marks (“) to indicate repetition
from the line above —for example:
John Miles
“ Smith
DiDi applies a similar concept, but instead of letters, it identifies the first differing bit
between keys. It stores this position in a node, which also contains references to the left and
right nodes (if they exist), and to the disk record where the full key and its data are stored.
For instance, if the first key inserted is "John Smith", the index is initially empty. The first
differing character (compared to “nothing”) is J at position 1, so the index becomes 1J →
record 1.
If we then insert "John Miles", we read 1J, confirm it matches at position 1, and read record

  1. The index is updated to:
    1J → record 1, right → record 2, and then 6S → record 2.
    Now, if we insert "John Serrote", we find that position 6 in record 1 is not S, so we follow
    the pointer to record 2. If it matches, we insert a new node 6e between records 1 and 2,
    updating the structure to:
    1J → record 1, right → record 3,
    6S → record 3, right → record 2,
    and finally
    7m → record 2 (with no children).
    With this structure —which is a bit more elaborate— the index remains sorted, and only a
    maximum of 3 nodes and a single disk read are required per insertion.

DiDi appears to be some China based UBER service but then again I can't find a question or much else to discuss here.

About DiDi

DiDi (Differential Directory) was originally developed as part of my thesis project in the early 1990s. The name bears no relation to the more recent Chinese ride-sharing company. At the time, DiDi was a novel approach within its academic context, but a change in legislation unfortunately led to the closure of the school and the discontinuation of its degrees.

Despite that setback, I have always believed in the value of the project. Now, decades later, I feel it is the right time to share it publicly. This release is both a technical revival and a tribute to its original vision.

commented: Was it Tr-mp University? +17

For example, as Donald Knuth points out in The Art of Computer Programming, the theoretical lower bound for comparison-based sorting algorithms is K × log₂(N). I developed a very simple method that matches this performance. However, DiDi goes far beyond: its performance is proportional to K × (maximum key length), regardless of the number of elements.

As an illustration, consider the theoretical lower bound for comparison-based sorting, as stated by Donald Knuth in The Art of Computer Programming: K × log₂(N). I developed a simple method that matches this limit. For example, to sort the list {2, 5, 7, 1, 4, 3, 8, 6}:

Sort pairs:
(2, 5) → [1]
(1, 7) → [2]
(3, 4) → [3]
(6, 8) → [4]

Merge into groups of four:
(2, 5) + (1, 7) → max 3 comparisons
(3, 4) + (6, 8) → max 3 comparisons
→ Total so far: 10 comparisons

Merge into a single group of eight:
(1, 2, 5, 7) + (3, 4, 6, 8) → max 7 comparisons
→ Total: 17 comparisons

Extending this to 16 elements would require at most: 2 × 17 + 15 = 49 comparisons

However, DiDi is not just about matching the theoretical lower bound. It takes a fundamentally different approach: its performance is proportional to K × (maximum key length), independent of the number of elements. This opens new perspectives for data indexing and retrieval, beyond traditional comparison-based sorting.

"Just to clarify a previous mistake: the efficiency should be K × N × log₂(N), not K × log₂(N) as I initially wrote."

I've just made an update because some records weren't being added properly. The issue was that the form didn't take into account that the register field (in the call to DiDi) is passed by reference.

This is a brilliant explanation of how Differential Directory (DiDi) indexing works — very efficient and elegant. The idea of storing only the first differing bit between keys reminds me of how we optimize systems for speed and memory, much like how some tech repair services streamline diagnostics.

At FixnVibe, our approach to mobile phone repair in Stirling is similarly smart: we focus on identifying the exact fault quickly, reducing turnaround times, and minimizing disruption to our customers. Whether it’s a software glitch or a deep hardware issue, we use efficient diagnostics and keep your data structure — and your device — running smoothly.

Always impressed to see concepts like DiDi that value performance and simplicity. Great read!

commented: spam -4

Just a quick update for anyone interested:

I have revisited and significantly improved the code, aiming for a more professional structure and better performance.
The updated version avoids freezing the UI during long operations and follows more robust programming practices.

If you’d like to see DiDi in action, there is also a video demonstration available — you can find it by searching for "xrjunque downloads" online.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.