
Estimate length of capped batched background migrations

Batched background migrations can have a ceiling on their batch size, configured by setting max_batch_size on the migration record. When max_batch_size is present, the batch optimizer won't choose a batch size larger than the configured value.

This has been used in a few instances for migrations with uneven data distribution. While it can simplify batching operations for some types of migrations, it can also greatly increase the migration's total duration, and should be used with caution.

We recently merged a migration with a low max_batch_size that would have taken over a month to finish executing. Since the migration processing queue is single-threaded, no other migration can run during this time. We were able to speed it up by increasing the max_batch_size through a production change, but it would be better to catch this earlier in the process.

In a discussion, @iroussos suggested adding a hook to migration testing that estimates the duration of batched background migrations that have their max_batch_size set. If the estimated total duration is unreasonably long, we should add a warning to the testing pipeline comment.
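
As a rough illustration of the estimate such a hook could compute, here is a minimal sketch in Python. It assumes the migration processes one batch per interval and that the optimizer stays at the cap, so the duration is roughly the number of batches times the interval. The function names, the interval_seconds parameter, and the 7-day threshold are hypothetical and chosen for illustration only; they are not the actual migration testing API.

```python
from datetime import timedelta
from typing import Optional


def estimate_capped_duration(total_rows: int, max_batch_size: int,
                             interval_seconds: int) -> timedelta:
    """Estimate total runtime when every batch is capped at max_batch_size.

    Assumption: one batch is scheduled per interval, so the estimate is
    ceil(total_rows / max_batch_size) * interval_seconds.
    """
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    # Integer ceiling division avoids floating-point rounding on large tables.
    batches = -(-total_rows // max_batch_size)
    return timedelta(seconds=batches * interval_seconds)


def duration_warning(total_rows: int, max_batch_size: int, interval_seconds: int,
                     threshold: timedelta = timedelta(days=7)) -> Optional[str]:
    """Return a warning message if the estimate exceeds the threshold, else None.

    The 7-day default threshold is a placeholder, not an agreed value.
    """
    estimate = estimate_capped_duration(total_rows, max_batch_size, interval_seconds)
    if estimate > threshold:
        return (f"Estimated duration with max_batch_size={max_batch_size} "
                f"is about {estimate.days} days, which exceeds {threshold.days} days.")
    return None
```

For example, under these assumptions a table of 100,000,000 rows with max_batch_size of 1,000 and a 120-second interval needs 100,000 batches, which is well over four months of queue time and would trigger the warning.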

Edited by Yannis Roussos