Foss Sqlite Internals
build: 0.12.0
Foreword
Introduction
Contributors
The Story Behind
Technical Context
Overview
File & Record Format
Rollback & WAL mode
Bytecodes
Interesting Features
Knowing Internals
How SQLite Is Modified
The Future
References
1. Foreword on SQLite Internals
To all SQLite lovers. This book discusses SQLite
internals in depth.
You can view compileralchemy.com, read online, contribute to the book, download the book or support it by buying on Leanpub.
It is open source! Feel free to contribute a section, propose rewrites, fix typos, etc. If you have comments, mail them to arj.python at gmail dot com.
Main content:
Abdur-Rahmaan Janhangeer,
https://github.com/Abdur-RahmaanJ
Thanks
General improvements
Jaime Terreu,
https://github.com/Confidenceman02
Steps explanation
👉 Tokeniser - Parser: The parser is a push-down automaton parser.
It is reentrant and thread-safe. It is generated by Lemon, an LALR(1) parser generator. Relevant files include parse.y and tool/lemon.c.
Outputs: an AST (see sqliteInt.h).
Important concepts
These are some concepts that occur frequently; it pays to know about them in advance.
Bytes
B-tree
A B-tree is a data structure providing logarithmic operation time for lookups, insertions and deletions.
SQLite keeps the depth as low as possible.
It plays on the breadth of the 2nd and 3rd layers: each page holds many entries, so a few layers are enough to cover a large number of keys.
In this use case it provides key/data storage with unique, ordered keys.
TODO
Types of pages
In this section we go over the different types of pages used by SQLite.
Any page will be one of these types:
👉 The lock-byte page
👉 A freelist page
👉 A b-tree page
👉 A payload overflow page
👉 A pointer map page
The Lock-Byte Page
This page is retained only to preserve backward compatibility.
It was conceived for Windows 95.
When it is present, it covers the bytes at offsets 1073741824 through 1073742335.
If the file doesn’t have that many bytes, the page does not exist.
If it does have the necessary bytes, there is only one such page.
It’s dealt with by the VFS layer rather than by the SQLite core.
Freelist pages
B-Tree pages
About overflowing
If the payload of a cell in a leaf page is bigger than the space available in the page, the excess is stored on an overflow page linked from the cell. If the excess is bigger than one overflow page, it continues on yet another overflow page, forming a linked list.
Large keys on index b-trees are split up into overflow pages so that no single key uses more than one fourth of the available storage space on the page, and hence every internal page is able to store at least 4 keys.
Record format
The data part of a leaf page is stored in binary
format and consists of 3 parts:
👉 The header
👉 The type part
👉 The data part
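For example, a row with the values
id 0
price 3
name shoe
would be encoded as
[ 04 | 01 | 01 | 21 ] [ 00 | 03 | shoe ]
Here 04 is the header: the size in bytes of the header and type parts together. 01 and 01 are the types of id and price (8-bit integers), and 21 is the type of name. Text of length n has type N where (N-13)/2 == n, so for the 4-byte string shoe:
(N-13)/2 == 4
N == 4 * 2 + 13
N == 21
(21 is decimal here; in hex it is 0x15.)
In the data part, [ 00 ] is the value of the id field, [ 03 ] is the value of price and shoe is the text itself.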
TOADD: Freeblock
8. Rollback & WAL mode
In case of a power cut, SQLite ensures that data is not lost and the database file is not corrupted.
The pager layer is responsible for implementing these two modes.
The Write Ahead Log (WAL) mode is better than the Rollback mode for two reasons:
👉 It is faster in most scenarios
👉 It allows reads and writes at the same time: readers do not block the writer, and the writer does not block readers
The Rollback mode remains the default, primarily for these reasons:
👉 Some computers still in use have unusual memory mappings
👉 Several computers accessing the same file (for example over a network filesystem) might cause issues
👉 Backward compatibility is not guaranteed, so the developers are waiting until WAL is even more stable
👉 The hash lookup for pages in WAL mode lives in shared memory
The Rollback mode
When reading occurs, the process acquires a shared
lock.
#define OP_Savepoint 0
#define OP_AutoCommit 1
#define OP_Transaction 2
#define OP_Checkpoint 3
#define OP_JournalMode 4
#define OP_Vacuum 5
...
opcodes.c
/* Automatically generated. Do not edit */
#if !defined(SQLITE_OMIT_EXPLAIN) \
|| defined(VDBE_PROFILE) \
|| defined(SQLITE_DEBUG)
#if defined(SQLITE_ENABLE_EXPLAIN_COMMENTS) \
 || defined(SQLITE_DEBUG)
# define OpHelp(X) "\0" X
#else
# define OpHelp(X)
#endif
const char *sqlite3OpcodeName(int i){
  static const char *const azName[] = {
    /* 0 */ "Savepoint" OpHelp(""),
    /* 1 */ "AutoCommit" OpHelp(""),
    /* 2 */ "Transaction" OpHelp(""),
    /* 3 */ "Checkpoint" OpHelp(""),
    /* 4 */ "JournalMode" OpHelp(""),
    /* 5 */ "Vacuum" OpHelp(""),
    /* 6 */ "VFilter" OpHelp("iplan=r[P3] zplan='P4'"),
    ...
  };
  return azName[i];
}
#endif
Save points
Partial indexes
Developed for Expensify.
Misc
CREATE INDEX Idx1 ON fruitsforsale(fruit);
LumoSQL
LumoSQL is a derivative of SQLite that stays 100% up to date with upstream SQLite. It does not rely on merging the upstream master branch into a fork.
Its database engine and B-tree backend are swappable.
It also has a strong focus on cryptography.
Martina Palmucci’s Master Thesis [11]: Martina wrote a thesis entitled “Securing databases using Attribute Based Encryption and Shamir’s Secret Sharing (SSS)” on the LumoSQL project. The thesis combines SSS with access control based on user attributes, such as the right to run SELECT.
The scheme is abbreviated ABE-SSS: Attribute-Based Encryption with Shamir’s Secret Sharing.
There is an increased need to safeguard data privacy. File-based encryption means that the data is in the clear once the file is decrypted.
Another layer of encryption at the data level, particularly at the field level, protects against internal attacks.
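SSS operates by splitting a secret, such as a key, into shares: combining enough shares (at least a chosen threshold) reproduces the secret, while fewer shares reveal nothing about it.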
Elliptic curves have properties that are interesting for cryptographic uses.
A standard protocol is Elliptic-curve Diffie–Hellman (ECDH).
It allows two parties to create a shared secret across an unsecured channel.
However, many protocols based on ECDH require a prime-order group.
Elliptic curve groups often have composite order: the number of points is a small cofactor times a large prime (the cofactor is 8 for Curve25519). Using the Decaf technique, it is possible to obtain a prime-order group from such an elliptic curve group. Applying Ristretto, an extension of Decaf, to Curve25519 yields the prime-order group Ristretto255.
Elliptic-curve Integrated Encryption Scheme (ECIES) is a hybrid encryption scheme.
The term “hybrid” refers to the use of both symmetric and asymmetric cryptosystems inside it: an elliptic-curve key agreement derives a symmetric key, which then encrypts the data.
Distributed clones
TOADD
Bloomberg
Bloomberg uses the SQLite code generator.
They replaced the layers after it with their own implementation of a scaled, massively concurrent, multi-data-center storage engine. [10]
13. The Future
LibSQL and LumoSQL are great open-source projects.
14. Ending Quotes
On not listening to institutionalized experts