Differential Directory, indexing method

Question

xrjf 230 Posting Whiz

1 Month Ago

Features:

Retrieves the differentiating bit of a key with respect to the previous one and stores only this position in the index.
With at most one single disk read, assuming the index is in memory, it determines whether the key exists or not.
The index is always sorted and therefore requires no reorganization.
Performance is proportional to the length of the keys, not to the size of the index.
Insertions require updating at most two or three nodes.

The Differential Directory (DiDi) stores only the difference between each key and the
previous one (already sorted), rather than storing full keys in each node.
Much like when writing a list by hand and using quotation marks (“) to indicate repetition
from the line above —for example:
John Miles
“ Smith
DiDi applies a similar concept, but instead of letters, it identifies the first differing bit
between keys. It stores this position in a node, which also contains references to the left and
right nodes (if they exist), and to the disk record where the full key and its data are stored.
For instance, if the first key inserted is "John Smith", the index is initially empty. The first
differing character (compared to “nothing”) is J at position 1, so the index becomes 1J →
record 1.
If we then insert "John Miles", we read 1J, confirm it matches at position 1, and read record

The index is updated to:
1J → record 1, right → record 2, and then 6S → record 2.
Now, if we insert "John Serrote", we find that position 6 in record 1 is not S, so we follow
the pointer to record 2. If it matches, we insert a new node 6e between records 1 and 2,
updating the structure to:
1J → record 1, right → record 3,
6S → record 3, right → record 2,
and finally
7m → record 2 (with no children).
With this structure —which is a bit more elaborate— the index remains sorted, and only a
maximum of 3 nodes and a single disk read are required per insertion.

vb.net

Edited 1 Month Ago by Reverend Jim because: Removed promotional link

2 Contributors
7 Replies
137 Views
1 Week Discussion Span
Latest Post 1 Month Ago Latest Post by xrjf

rproffitt 2,701 https://5calls.org

1 Month Ago

DiDi appears to be some China based UBER service but then again I can't find a question or much else to discuss here.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

xrjf 230 Posting Whiz · Answer 1 · 2025-04-16T07:13:13+00:00

About DiDi

DiDi (Differential Directory) was originally developed as part of my thesis project in the early 1990s. The name bears no relation to the more recent Chinese ride-sharing company. At the time, DiDi was a novel approach within its academic context, but a change in legislation unfortunately led to the closure of the school and the discontinuation of its degrees.

Despite that setback, I have always believed in the value of the project. Now, decades later, I feel it is the right time to share it publicly. This release is both a technical revival and a tribute to its original vision.

xrjf 230 Posting Whiz · Answer 2 · 2025-04-16T07:30:54+00:00

For example, as Donald Knuth points out in The Art of Computer Programming, the theoretical lower bound for comparison-based sorting algorithms is K × log₂(N). I developed a very simple method that matches this performance. However, DiDi goes far beyond: its performance is proportional to K × (maximum key length), regardless of the number of elements.

xrjf 230 Posting Whiz · Answer 3 · 2025-04-16T08:00:37+00:00

As an illustration, consider the theoretical lower bound for comparison-based sorting, as stated by Donald Knuth in The Art of Computer Programming: K × log₂(N). I developed a simple method that matches this limit. For example, to sort the list {2, 5, 7, 1, 4, 3, 8, 6}:

Sort pairs:
(2, 5) → [1]
(1, 7) → [2]
(3, 4) → [3]
(6, 8) → [4]

Merge into groups of four:
(2, 5) + (1, 7) → max 3 comparisons
(3, 4) + (6, 8) → max 3 comparisons
→ Total so far: 10 comparisons

Merge into a single group of eight:
(1, 2, 5, 7) + (3, 4, 6, 8) → max 7 comparisons
→ Total: 17 comparisons

Extending this to 16 elements would require at most: 2 × 17 + 15 = 49 comparisons

However, DiDi is not just about matching the theoretical lower bound. It takes a fundamentally different approach: its performance is proportional to K × (maximum key length), independent of the number of elements. This opens new perspectives for data indexing and retrieval, beyond traditional comparison-based sorting.

xrjf 230 Posting Whiz · Answer 4 · 2025-04-16T08:14:21+00:00

"Just to clarify a previous mistake: the efficiency should be K × N × log₂(N), not K × log₂(N) as I initially wrote."

xrjf 230 Posting Whiz · Answer 5 · 2025-04-19T13:49:08+00:00

I've just made an update because some records weren't being added properly. The issue was that the form didn't take into account that the register field (in the call to DiDi) is passed by reference.

xrjf 230 Posting Whiz · Answer 6 · 2025-04-26T14:01:24+00:00

Just a quick update for anyone interested:

I have revisited and significantly improved the code, aiming for a more professional structure and better performance.
The updated version avoids freezing the UI during long operations and follows more robust programming practices.

If you’d like to see DiDi in action, there is also a video demonstration available — you can find it by searching for "xrjunque downloads" online.