Increase OpenSearch mapping limit dynamically during indexing of csv/jsonl data #3257
This change dynamically increases the OpenSearch mapping limit during the indexing process to ensure successful data ingestion.
Problem:
Timesketch, when indexing timelines with a large number of unique fields, can encounter OpenSearch's default mapping limit (typically 1000 fields). This results in indexing failures and data loss.
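To make the failure condition concrete, here is a minimal sketch that counts distinct top-level field names across JSONL rows and compares the result with the 1000-field default. The helper name `count_unique_fields` and the sample rows are hypothetical and not taken from the PR. The count is only an approximation, since OpenSearch may also count object mappings and sub-fields toward the limit.

```python
import json

# Default value of index.mapping.total_fields.limit, as stated in the text above.
DEFAULT_FIELD_LIMIT = 1000


def count_unique_fields(jsonl_lines):
    """Return the number of distinct top-level keys across JSONL rows.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    fields = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        fields.update(json.loads(line).keys())
    return len(fields)


# Sample data: 1200 rows, each with its own field plus a shared "message" field.
rows = [json.dumps({f"field_{i}": i, "message": "x"}) for i in range(1200)]
unique = count_unique_fields(rows)
print(unique, unique > DEFAULT_FIELD_LIMIT)  # 1201 True
```

A timeline like this sample would exceed the default limit, which is the situation the PR addresses.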
Solution:
This PR introduces a mechanism to update the `index.mapping.total_fields.limit` setting in OpenSearch to the newly calculated limit if it exceeds the current limit.

Configuration:
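The update step from the Solution section, combined with the two options described in this section, could look roughly like the sketch below. The formula (unique field count plus the buffer, rounded up and capped at the upper limit) is an assumption, since the PR text does not spell it out. The function names `proposed_mapping_limit` and `settings_body` are hypothetical.

```python
import math


def proposed_mapping_limit(unique_fields, current_limit, buffer=0.2, upper_limit=2000):
    """Return the new limit to apply, or None if no update is needed.

    Assumed formula (not confirmed by the PR text):
    ceil(unique_fields * (1 + buffer)), capped at upper_limit.
    The defaults mirror OPENSEARCH_MAPPING_BUFFER and
    OPENSEARCH_MAPPING_UPPER_LIMIT.
    """
    wanted = math.ceil(unique_fields * (1 + buffer))
    wanted = min(wanted, upper_limit)
    # Only raise the limit; never lower an index's existing setting.
    if wanted > current_limit:
        return wanted
    return None


def settings_body(new_limit):
    """Build an index settings body for the new field limit."""
    return {"index.mapping.total_fields.limit": new_limit}


new_limit = proposed_mapping_limit(unique_fields=1200, current_limit=1000)
if new_limit is not None:
    print(settings_body(new_limit))
```

With an OpenSearch Python client, a body like this would typically be sent through the index settings API (for example `client.indices.put_settings(index=..., body=...)`). The sketch stops short of that call so it runs without a cluster.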
Two new configuration options are added to `timesketch.conf`:
- `OPENSEARCH_MAPPING_BUFFER`: A float representing the percentage buffer to add to the calculated mapping limit (default: 0.2 = 20%).
- `OPENSEARCH_MAPPING_UPPER_LIMIT`: An integer representing the maximum allowed mapping limit (default: 2000).

Benefits:
Note:
Increasing the mapping limit can impact OpenSearch cluster performance and storage requirements. Users should carefully consider the `OPENSEARCH_MAPPING_UPPER_LIMIT` setting and monitor their cluster's resource usage.

Alternatives considered