MySQL 5.7 comes with a new backup tool, named mysqlpump, which is almost the same as mysqldump with the ability of extracting data in parallel threads.
I tried a little experiment. Using a server containing 11 databases, with a total of 300 tables and about 20 million rows (roughly ≈ 10GB,) I used both mysqldump and mysqlpump to get a backup.
mysqldump --all-databases > dump.sql
mysqlpump --all-databases \
--add-drop-database --add-drop-table --skip-watch-progress \
--default-parallelism=10 \
--parallel-schemas=db,db1,db2 \
--parallel-schemas=db3,db4,db5 \
--parallel-schemas=db6,db7,db8 \
--parallel-schemas=db9,db10 > pump.sql
The backup with mysqldump took 3 minutes and 33 seconds. The one with mysqlpump took 2 minutes and 55 seconds (saving 38 seconds). This does not seem to be a great gain. I experimented with several values of default-parallelism and different grouping of databases, and also without any parameters at all, but I always get the same time.
If there is a different way of invoking mysqlpump to use parallelism better, I would like to know.
There are four interesting points about mysqlpump that users should know:
- mysqlpump has options to include and exclude objects (databases, tables, routines, users) from the backup. This is a long awaited feature that will be welcome by many DBAs.
- The option --no-data is called --skip-dump-rows. (Just in case you want to use the new tool alternate way of reproducing DDL. But be aware that there is at least one bug)
- A backup created with mysqlpump can only be loaded into a database of the same name. This is due to the parallel work, which requires that the INSERT statements contain both the database and the table names. But it means that, unlike with mysqldump, you can't backup tables from database X and load them to database Y.
- The most serious limitation of mysqlpump, which I have seen both in the manual and in a blog article is that, while the backup is parallelized, the restore is serialized. Both sources say to run "mysqlpump > file.sql" and "mysql < file.sql". What is the advantage of extracting data with N parallel threads if I then need to apply it with a single thread? I would have expected an option to create N files, which I can then load using several background tasks, or even better an option in the mysql client to handle parallel backup files. I may be missing something here. I will appreciate comments by more savvy users.
The idea is good. The tool still has some rough edges, but I am sure it can be improved.