Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Crate vers_vecs

Source
Expand description

This crate provides a collection of data structures supported by fast implementations of rank and select queries. The data structures are static, meaning that they cannot be modified after they have been created.

§Data structures

  • Bit-Vector with no overhead. The only data structure that can be modified after creation.
  • Succinct Bit-Vector supporting fast rank and select queries.
  • Elias-Fano encoding of monotone sequences supporting constant-time predecessor queries.
  • Two Range Minimum Query structures for constant-time range minimum queries.
  • Wavelet Matrix encoding k-bit symbols, supporting rank, select, statistical, and predecessor/successor queries in O(k).
  • Succinct Tree supporting tree navigation in O(log n) time, as well as subtree size, level-order, and ancestor queries, and fast depth-first iteration.

§Performance

Performance was benchmarked against publicly available implementations of the same (or similar) data structures on crates.io. Vers is among the fastest for all benchmarked operations. The benchmark results can be found in the Benchmark repository. Some tradeoffs between average time, worst-case time, and available API features should be taken into consideration when selecting among the fastest libraries (see the GitHub repository for a discussion).

§Intrinsics

This crate uses compiler intrinsics for bit-manipulation. The intrinsics are supported by all modern x86_64 CPUs, but not by other architectures. The crate will compile on other architectures using fallback implementations, but the performance will be significantly worse. It is strongly recommended to enable the BMI2 and popcnt target features when using this crate.

The intrinsics in question are popcnt (supported since SSE4.2 resp. SSE4a on AMD, 2007-2008), pdep (supported with BMI2 since Intel Haswell resp. AMD Excavator, in hardware since AMD Zen 3, 2011-2013), and tzcnt (supported with BMI1 since Intel Haswell resp. AMD Jaguar, ca. 2013).

§Safety

When the simd crate feature is not enabled (default), this crate uses no unsafe code, with the only exception being compiler intrinsics for bit-manipulation, if available. The intrinsics do not operate on addresses, so even if they were to be implemented incorrectly, no memory safety issues would arise.

§Crate Features

  • simd (disabled by default): Enables the use of SIMD instructions in the RsVec implementation, and an additional iterator for the RsVec data structure.
  • serde (disabled by default): Enables serialization and deserialization support for all data structures in this crate using the serde crate.
  • bp_u16_lookup (disabled by default): Uses a 16-bit lookup table for the balanced parenthesis tree data structure. This is faster, but requires 128 KiB instead of 4 KiB.

Re-exports§

pub use bit_vec::fast_rs_vec::RsVec;
pub use bit_vec::sparse::SparseRSVec;
pub use bit_vec::BitVec;
pub use elias_fano::EliasFanoVec;
pub use rmq::binary_rmq::BinaryRmq;
pub use rmq::fast_rmq::FastRmq;
pub use trees::bp::BpBuilder;
pub use trees::bp::BpTree;
pub use trees::IsAncestor;
pub use trees::LevelTree;
pub use trees::SubtreeSize;
pub use trees::Tree;
pub use trees::TreeBuilder;
pub use wavelet::WaveletMatrix;

Modules§

bit_vec
This module contains a simple bit vector implementation with no overhead and a fast succinct bit vector implementation with rank and select queries.
elias_fano
Elias-Fano encoding for sorted vectors of u64 values. It reduces the space required to represent all numbers (compression ratio dependent on data) and allows for constant-time predecessor queries.
rmq
Range minimum query data structures. These data structures allow for the calculation of the index of the minimum element in a range of a static array in constant-time. The implementations are located in the binary_rmq and fast_rmq modules.
trees
Tree data structures. Currently only the BP tree is exposed. The trees are succinct, approaching the information-theoretic lower bound for the space complexity: They need O(n) bits to store a tree with n nodes, and theoretically o(n) extra bits to support queries. However, this is relaxed to O(n) with a factor smaller than 1 in practice.
wavelet
This module contains the implementation of a wavelet matrix. The wavelet matrix is a data structure that encodes a sequence of n k-bit words into a matrix of bit vectors, allowing for fast rank and select queries on the encoded sequence.