Increase OpenSearch mapping limit dynamically during indexing of csv/jsonl data #3257

jkppr · 2025-01-08T13:27:05Z

This change dynamically increases the OpenSearch mapping limit during the indexing process to ensure successful data ingestion.

Problem:

Timesketch, when indexing timelines with a large number of unique fields, can encounter OpenSearch's default mapping limit (typically 1000 fields). This results in indexing failures and data loss.

Solution:

This PR introduces a mechanism to:

Track unique fields: Monitor the number of unique fields encountered during the indexing of each timeline.
Calculate new limit: Dynamically calculate a new mapping limit based on the number of unique fields, adding a configurable buffer percentage (default 20%) to account for future growth.
Update OpenSearch settings: Update the index.mapping.total_fields.limit setting in OpenSearch to the newly calculated limit if it exceeds the current limit.
Enforce upper limit: Implement an upper mapping limit (configurable, default 2000) to prevent uncontrolled growth and potential performance issues. If the calculated limit exceeds this upper limit, the indexing process will fail with a clear error message.
Improve error reporting: Enhance error messages to provide more specific information about mapping limit issues, including the current number of mapped fields, the calculated limit, and the upper limit.
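The tracking, calculation and upper-limit steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR. The names MappingLimitError, collect_unique_fields and calculate_mapping_limit are hypothetical. The sketch assumes flat JSONL events and only counts top-level keys, although nested objects would add more mapped fields. It also assumes the upper limit is only enforced when the current limit actually needs to grow.

```python
import json
import math
from typing import Iterable, Set


class MappingLimitError(Exception):
    """Hypothetical error raised when the required limit exceeds the upper limit."""


def collect_unique_fields(jsonl_lines: Iterable[str]) -> Set[str]:
    """Returns the set of top-level field names seen across JSONL events.

    Assumes flat events: nested objects would add further mapped fields
    that this sketch does not count.
    """
    fields: Set[str] = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines instead of failing on them.
        fields.update(json.loads(line).keys())
    return fields


def calculate_mapping_limit(
    unique_field_count: int,
    current_limit: int,
    buffer: float = 0.2,
    upper_limit: int = 2000,
) -> int:
    """Returns the mapping limit to use for the index.

    Keeps the current limit if it already covers the fields plus buffer,
    and raises MappingLimitError if the buffered count exceeds upper_limit.
    """
    # Round before ceil so float noise (e.g. 1200.0000000000002) does not
    # bump the limit by one.
    new_limit = math.ceil(round(unique_field_count * (1 + buffer), 6))
    if new_limit <= current_limit:
        return current_limit
    if new_limit > upper_limit:
        raise MappingLimitError(
            f"{unique_field_count} unique fields need a mapping limit of "
            f"{new_limit} (current: {current_limit}), which exceeds the "
            f"upper limit of {upper_limit}."
        )
    return new_limit
```

With the defaults, a timeline with 1500 unique fields on an index at the 1000-field default gets a new limit of 1800. A timeline with 1700 unique fields would need 2040 and fails against the 2000 upper limit.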

Configuration:

Two new configuration options are added to timesketch.conf:

OPENSEARCH_MAPPING_BUFFER: A float giving the buffer, as a fraction of the number of unique fields, to add to the calculated mapping limit (default: 0.2 = 20%).
OPENSEARCH_MAPPING_UPPER_LIMIT: An integer representing the maximum allowed mapping limit (default: 2000).
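Reading these two options and applying a new limit to an index might look roughly like the sketch below. It is illustrative only. It assumes the options end up in a dict-like config, such as Flask's app.config, with the defaults above. The client is duck-typed to match the opensearch-py calls indices.get_settings(index=..., flat_settings=True) and indices.put_settings(index=..., body=...). The helpers read_mapping_config, get_current_limit and apply_mapping_limit are hypothetical names.

```python
from typing import Any, Mapping, Tuple

# Defaults as described in this PR; the dict-based lookup is an assumption.
DEFAULT_MAPPING_BUFFER = 0.2
DEFAULT_MAPPING_UPPER_LIMIT = 2000
LIMIT_SETTING = "index.mapping.total_fields.limit"
OPENSEARCH_DEFAULT_LIMIT = 1000  # Used when the index has no explicit setting.


def read_mapping_config(config: Mapping[str, Any]) -> Tuple[float, int]:
    """Returns (buffer, upper_limit), falling back to the defaults."""
    buffer = float(config.get("OPENSEARCH_MAPPING_BUFFER", DEFAULT_MAPPING_BUFFER))
    upper_limit = int(
        config.get("OPENSEARCH_MAPPING_UPPER_LIMIT", DEFAULT_MAPPING_UPPER_LIMIT)
    )
    return buffer, upper_limit


def get_current_limit(client: Any, index_name: str) -> int:
    """Reads the index's explicit field limit, or the OpenSearch default."""
    response = client.indices.get_settings(index=index_name, flat_settings=True)
    settings = response.get(index_name, {}).get("settings", {})
    return int(settings.get(LIMIT_SETTING, OPENSEARCH_DEFAULT_LIMIT))


def apply_mapping_limit(client: Any, index_name: str, new_limit: int) -> bool:
    """Raises the index limit only if new_limit exceeds the current one."""
    if new_limit <= get_current_limit(client, index_name):
        return False
    client.indices.put_settings(index=index_name, body={LIMIT_SETTING: new_limit})
    return True
```

Updating only the affected index keeps the higher limit from affecting other indices. This matches the reasoning under "Alternatives considered" against raising it cluster-wide.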

Benefits:

Reduced indexing failures: Prevents indexing failures due to exceeding the default mapping limit.
Improved data ingestion: Allows Timesketch to handle timelines with a larger number of unique fields.
Controlled growth: Prevents uncontrolled mapping growth by enforcing an upper limit.
Better error handling: Provides more informative error messages to help users troubleshoot mapping-related issues.

Note:

Increasing the mapping limit can impact OpenSearch cluster performance and storage requirements. Users should carefully consider the OPENSEARCH_MAPPING_UPPER_LIMIT setting and monitor their cluster's resource usage.

Alternatives considered

Hard fail when the default limit of 1000 field mappings is hit and require the user to reduce the data import.
- This would have the least resource impact on the OpenSearch cluster but the most impact on the analyst and the data pipeline.
Increase the mapping limit for the whole OpenSearch cluster as default for each new index.
- Testing has shown that this will also increase the resource consumption of the cluster even for indices that don't make use of the additional mapping limit.


jaegeral

Please add either or both of:

unit tests
e2e tests

Otherwise looks good IMHO. I have already fixed a small typo.

berggren

LGTM - let's make sure we monitor this for unforeseen cluster consequences.

Dynamically increase OpenSearch mapping limit during indexing of csv/jsonl

702255f

jkppr added the Data import label ("All things that are with importing data") Jan 8, 2025

jkppr self-assigned this Jan 8, 2025

jkppr requested a review from jaegeral January 8, 2025 13:28

jkppr and others added 2 commits January 8, 2025 13:30

formatter

62b6059

Update tasks.py

0619493

jaegeral reviewed Jan 8, 2025

berggren self-requested a review January 8, 2025 15:47

berggren approved these changes Jan 8, 2025
