Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge column: small refactors #2579

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

merge column: small refactors #2579

wants to merge 3 commits into from

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Feb 18, 2025

Some refactors & tests when investigating a crash in merge_dict_column

Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer thread 'merge_thread_0' panicked at /usr/local/cargo/git/checkouts/tantivy-f70b7ea03dadae9a/71cf198/columnar/src/columnar/merge/merge_dict_column.rs:70:42:
Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer index out of bounds: the len is 1034 but the index is 1876
Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Feb 18 09:57:32 quickwit-indexer-13 quickwit-indexer Rayon: detected unexpected panic; aborting

@PSeitz PSeitz requested a review from fulmicoton February 18, 2025 16:37
block_data: OwnedBytes,
block_metas: Arc<[BlockMeta]>,
}

impl Iterable<u32> for &OptionalIndex {
fn boxed_iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
Box::new(self.iter_rows())
Box::new(self.iter_docs())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PSeitz The reason why columnar was originally not using "docs" but "rows" is because this is a columnar format. The concept of docs is specific to search engine or document DBs.

I am not sure this makes the code any better, but ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been some time, but we decided to use docs everywhere. (I can make a full PR later)
The naming is confusing to me, as the meaning of row is different to the meaning in the DB I worked before. The row docs naming also caused a bug before (or 2? not sure).

} else {
term_ord_mapping.add_segment(0);
field_term_streams.push(Streamer::empty());
field_term_streams.push(TermsWithSegmentOrd {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed more explicit and easier to proofread.

@@ -391,7 +394,6 @@ fn is_empty_after_merge(
fn group_columns_for_merge<'a>(
columnar_readers: &'a [&'a ColumnarReader],
required_columns: &'a [(String, ColumnType)],
_merge_row_order: &'a MergeRowOrder,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wuss

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge order is not relevant to group columns by (string, type category)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants