The Data Charmer: September 2015

Monday, September 28, 2015

MySQL 5.7 : Playing with mysqlpump

MySQL 5.7 comes with a new backup tool, named mysqlpump, which is almost the same as mysqldump with the ability of extracting data in parallel threads.

I tried a little experiment. Using a server containing 11 databases, with a total of 300 tables and about 20 million rows (roughly ≈ 10GB,) I used both mysqldump and mysqlpump to get a backup.

mysqldump --all-databases  > dump.sql

mysqlpump --all-databases \
    --add-drop-database --add-drop-table --skip-watch-progress \
    --default-parallelism=10 \
    --parallel-schemas=db,db1,db2 \
    --parallel-schemas=db3,db4,db5 \
    --parallel-schemas=db6,db7,db8 \
    --parallel-schemas=db9,db10 > pump.sql

The backup with mysqldump took 3 minutes and 33 seconds. The one with mysqlpump took 2 minutes and 55 seconds (saving 38 seconds). This does not seem to be a great gain. I experimented with several values of default-parallelism and different grouping of databases, and also without any parameters at all, but I always get the same time.

If there is a different way of invoking mysqlpump to use parallelism better, I would like to know.

There are four interesting points about mysqlpump that users should know:

mysqlpump has options to include and exclude objects (databases, tables, routines, users) from the backup. This is a long awaited feature that will be welcome by many DBAs.
The option --no-data is called --skip-dump-rows. (Just in case you want to use the new tool alternate way of reproducing DDL. But be aware that there is at least one bug)
A backup created with mysqlpump can only be loaded into a database of the same name. This is due to the parallel work, which requires that the INSERT statements contain both the database and the table names. But it means that, unlike with mysqldump, you can't backup tables from database X and load them to database Y.
The most serious limitation of mysqlpump, which I have seen both in the manual and in a blog article is that, while the backup is parallelized, the restore is serialized. Both sources say to run "mysqlpump > file.sql" and "mysql < file.sql". What is the advantage of extracting data with N parallel threads if I then need to apply it with a single thread? I would have expected an option to create N files, which I can then load using several background tasks, or even better an option in the mysql client to handle parallel backup files. I may be missing something here. I will appreciate comments by more savvy users.

The idea is good. The tool still has some rough edges, but I am sure it can be improved.

Tuesday, September 15, 2015

Percona Live Amsterdam - September 21-23, 2015

PL EuropeLogo FullInv CMYK Final Horiz EMAIL

I am attending Percona Live Amsterdam 2015 on September 21-23, 2015.

I will be on stage three times:

September 22nd at 2:10pm Pivot tables: analytics in pure SQL
September 23rd at 2:10pm The future of replication is today: new features in practice
September 23rd at 5:10pm as host and speaker in the Lightning talks

My first talk is a topic that has ben among my favorites for long time: I published an article about it in 2001, and several more in the years to come.

The second one is a summary of what I have written recently about replication technologies.

The lightning talks are a collection of 5-minutes long talks that are presented by different speakers. For the first time, the LT are held in a separate room instead of being attached to one of the community events. It will be fun!

Percona has just released a mobile app for the conference for both iOS and Android. With it, it is possible to set a personalized schedule, follow the show more closely, and get in touch with other attendees. It is a very good addition!

There is much to watch at the conference, and I look forward to seeing the latest innovation in the field. I will miss some very interesting talks because they are at the same time as mine (!!) but I hope I will catch up with the speakers in the conference hall.

Monday, September 14, 2015

Improving Sakila database

The Sakila sample database was created almost 10 years ago, as a sample set of data for MySQL courses and examples.

The database was developed by MySQL employees, with substantial contributions form the community.

Recently, the database was updated to use some of the features in MySQL 5.7. As a result, we had two sets of samples, one to use with MySQL 5.0+, and one that only loads with MySQL 5.7.

I filed a feature request, offering a patch to use conditional schema and data changes, which was incorporated very quickly into the official release.

The current release, available within the MySQL docs, has conditional comments such as this:

/*!50610 ALTER TABLE film_text engine=InnoDB */ ;

Using these comments, we can enable specific features if the version is at least the one indicated in the comment. So, for example, we can use InnoDB tables with full-text indexes starting with version 5.6. The original table is MyISAM, but if the current version is at least 5.6.10 (that's the meaning of !50610) then the engine is changed to InnoDB.

>Similarly, there is a GEOMETRY column and SPATIAL key in the 'address' table, which are only enabled for MySQL 5.7.5+. A similar comment allows the loading of the relevant data only in MySQL 5.7.

Using these new files, you can install the Sakila database using any version of MySQL from 5.0 onwards, and it will always load correctly.

Monday, September 07, 2015

Sample employees database migrated to GitHub

It's migration time. There was another project that I use often and was still in Launchpad. The Sample Employees Database is now on GitHub, under the same license it had before (CC A-SA 3).

Figure 1 - Employees database
This database is interesting because it is not too small (like Sakila) and not too big. It has enough data to allow you to test in a non trivial way.

How MySQL-Sandbox is tested, and tests MySQL in the process

MySQL-Sandbox is a great tool for testing a new release, and in fact this is what I do when a new MySQL tarball becomes available. I don't think many people are aware of the full testing capabilities of the sandbox, though.
When you think about testing, you may just think of creating a sandbox with the new tarball, and then hammering it with your pet procedure. That works, of course, as the main purpose of MySQL-Sandbox is to allow you to do just that. There is, however, a full test suite that can tell you in a short while if your tarball is compatible with the past or not.
This procedure is quite strict. It has happened several times that I caught a bug in a new release of MySQL, or Percona Server, or MariaDB, just by running this suite.

The Data Charmer

Monday, September 28, 2015

MySQL 5.7 : Playing with mysqlpump

Tuesday, September 15, 2015

Percona Live Amsterdam - September 21-23, 2015

Monday, September 14, 2015

Improving Sakila database

Monday, September 07, 2015

Sample employees database migrated to GitHub

Wednesday, September 02, 2015

How MySQL-Sandbox is tested, and tests MySQL in the process

Pages

Giuseppe Maxia

Like my posts? Buy me a book!

Julian's force

Yes, I am one of those

Sapere, saper fare, fare, far sapere

Blog Archive

Planet MySQL