merge column: small refactors #2579

PSeitz · 2025-02-18T14:23:12Z

Some refactors and tests written while investigating a crash in merge_dict_column:

Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer thread 'merge_thread_0' panicked at /usr/local/cargo/git/checkouts/tantivy-f70b7ea03dadae9a/71cf198/columnar/src/columnar/merge/merge_dict_column.rs:70:42:
Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer index out of bounds: the len is 1034 but the index is 1876
Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer Rayon: detected unexpected panic; aborting
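
For readers unfamiliar with this kind of panic: it is what Rust emits when a slice or `Vec` is indexed past its length. The sketch below is only an illustration, not the code at `merge_dict_column.rs:70`. It assumes a hypothetical per-segment table `old_to_new` that maps old term ordinals to new ones, and a hypothetical helper `remap_ord` that uses a checked lookup so an out-of-range ordinal surfaces as an error instead of aborting the merge thread.

```rust
/// Hypothetical remap table lookup: old term ordinal -> new term ordinal.
/// `old_to_new` and `remap_ord` are illustrative names, not tantivy APIs.
fn remap_ord(old_to_new: &[u64], old_ord: usize) -> Result<u64, String> {
    // `get` returns None instead of panicking when `old_ord` is out of range.
    old_to_new.get(old_ord).copied().ok_or_else(|| {
        format!(
            "term ordinal {} out of bounds for mapping of len {}",
            old_ord,
            old_to_new.len()
        )
    })
}
```

With a table of length 1034, `remap_ord(&table, 1876)` returns an `Err` carrying both numbers, which mirrors the values in the log above.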

fulmicoton · 2025-02-19T07:29:54Z

columnar/src/column_index/optional_index/mod.rs

    block_data: OwnedBytes,
    block_metas: Arc<[BlockMeta]>,
 }

 impl Iterable<u32> for &OptionalIndex {
    fn boxed_iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
-        Box::new(self.iter_rows())
+        Box::new(self.iter_docs())


@PSeitz The reason columnar originally used "rows" rather than "docs" is that this is a columnar format. The concept of docs is specific to search engines or document DBs.

I am not sure this makes the code any better, but ok.

It's been some time, but we decided to use docs everywhere. (I can make a full PR later.)
The naming is confusing to me, as the meaning of "row" here differs from its meaning in the DB I worked on before. The row/docs naming also caused a bug before (or two? not sure).
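
To make the rename concrete, here is a minimal sketch of the pattern in the diff above. It assumes a toy type, `ToyOptionalIndex`, that simply stores the sorted ids of the docs (rows) that have a value. The trait shape is copied from the diff; the struct and its field are hypothetical and much simpler than the real `OptionalIndex`.

```rust
/// Same shape as the trait used in the diff above.
trait Iterable<T> {
    fn boxed_iter(&self) -> Box<dyn Iterator<Item = T> + '_>;
}

/// Hypothetical stand-in for an optional index: the sorted ids of
/// the docs that carry a value in the column.
struct ToyOptionalIndex {
    non_null_docs: Vec<u32>,
}

impl ToyOptionalIndex {
    /// Iterates over the ids of docs that have a value.
    fn iter_docs(&self) -> impl Iterator<Item = u32> + '_ {
        self.non_null_docs.iter().copied()
    }
}

impl Iterable<u32> for &ToyOptionalIndex {
    fn boxed_iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        Box::new(self.iter_docs())
    }
}
```

Whichever name is used, the iterator yields the same ids; the rename only changes what the caller is told those ids refer to.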

fulmicoton · 2025-02-19T07:30:38Z

columnar/src/columnar/merge/merge_dict_column.rs

        } else {
            term_ord_mapping.add_segment(0);
-            field_term_streams.push(Streamer::empty());
+            field_term_streams.push(TermsWithSegmentOrd {


This is indeed more explicit and easier to proofread.
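
A rough sketch of why carrying the ordinal explicitly helps, under stated assumptions: the `TermsWithSegmentOrd` below is a simplified, hypothetical version holding a sorted `Vec<String>` rather than a real term streamer, and `merge_term_ords` is an illustrative helper, not the PR's merge routine. Each stream names the segment it belongs to, so the resulting ordinal mapping does not depend on where the stream sits in the vector.

```rust
use std::collections::BTreeSet;

/// Simplified, hypothetical version: sorted terms of one segment
/// plus the ordinal of the segment they come from.
struct TermsWithSegmentOrd {
    segment_ord: usize,
    terms: Vec<String>,
}

/// Builds the merged, sorted dictionary and, per segment, a table
/// mapping each old term ordinal to its ordinal in the merged dictionary.
fn merge_term_ords(
    streams: &[TermsWithSegmentOrd],
    num_segments: usize,
) -> (Vec<String>, Vec<Vec<u64>>) {
    // Deduplicate and sort all terms across segments.
    let merged: Vec<String> = streams
        .iter()
        .flat_map(|s| s.terms.iter().cloned())
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect();
    let mut mapping: Vec<Vec<u64>> = vec![Vec::new(); num_segments];
    for stream in streams {
        // The slot is chosen by the explicit ordinal, not by position.
        mapping[stream.segment_ord] = stream
            .terms
            .iter()
            .map(|t| merged.binary_search(t).expect("term is in merged dict") as u64)
            .collect();
    }
    (merged, mapping)
}
```

Passing the streams in any order gives the same mapping, which is the property that makes the explicit form easier to proofread.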

fulmicoton · 2025-02-19T07:31:47Z

columnar/src/columnar/merge/mod.rs

@@ -391,7 +394,6 @@ fn is_empty_after_merge(
 fn group_columns_for_merge<'a>(
    columnar_readers: &'a [&'a ColumnarReader],
    required_columns: &'a [(String, ColumnType)],
-    _merge_row_order: &'a MergeRowOrder,


Merge order is not relevant to grouping columns by (string, type category).
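
As a sketch of that point, assuming hypothetical names (`TypeCategory`, `group_columns`) and a much simpler input shape than the real `ColumnarReader`: grouping only needs each segment's list of (column name, type category) pairs, so no row order argument is required.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced set of type categories for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum TypeCategory {
    Numerical,
    Str,
}

/// Groups columns across segments by (name, type category).
/// For each group, slot `i` holds the column's index within segment `i`,
/// or `None` if that segment has no such column.
fn group_columns(
    segments: &[Vec<(String, TypeCategory)>],
) -> BTreeMap<(String, TypeCategory), Vec<Option<usize>>> {
    let mut groups: BTreeMap<(String, TypeCategory), Vec<Option<usize>>> = BTreeMap::new();
    for (segment_ord, columns) in segments.iter().enumerate() {
        for (col_idx, (name, category)) in columns.iter().enumerate() {
            let slots = groups
                .entry((name.clone(), *category))
                .or_insert_with(|| vec![None; segments.len()]);
            slots[segment_ord] = Some(col_idx);
        }
    }
    groups
}
```

The function signature takes no merge order at all, which is the same observation that justifies dropping the unused `_merge_row_order` parameter.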

merge column: small refactors

d3dcc37

PSeitz requested a review from fulmicoton February 18, 2025 16:37

PSeitz added 2 commits February 18, 2025 19:20

make ord dependency more explicit

907d3a2

add columnar merge crashtest proptest

d0e6c96

fulmicoton reviewed Feb 19, 2025
