
Wednesday, March 06, 2019

dbdeployer community - Part 1: TiDB

After a conference, when I take stock of what I have learned, I usually realise that the best achievements are the result of interacting with other attendees during the breaks, rather than simply listening to the lectures. It might be because I follow the blogosphere closely, so the lectures hold few surprises for me, or perhaps because many geeks take the conference as an excuse to refresh dormant friendships, catch up on technical gossip, and ask their friends questions that were too sensitive to be discussed over Twitter and have been waiting for an in-person meeting to see the light of day.

I certainly had some such questions, and I took advantage of the conference to ask them. As often happens, I got satisfactory answers, but the latest FOSDEM conference was different from usual, because the best experience came from the questions that others asked me.

As it turned out, others were also waiting for a chance to discuss things over coffee or food, and I discovered that my pet project (dbdeployer) is a lot more popular than I thought, and that it is being used quietly in several environments. This should not be surprising if you read bug reports at bugs.mysql.com, where sandboxes are commonly used to reproduce user issues. Anyway, I got some praise, some requests, a few ideas for improvements, advance notice of an incoming graphical interface, and a few concrete collaboration proposals.

One such proposal came from Morgan Tocker, who suggested enhancing dbdeployer to support TiDB. At first, it seemed uninteresting, as TiDB is designed to be distributed, and installing just one component didn't immediately look useful. However, Morgan pointed out that it could be used as a tool to test compatibility with existing applications, and as such it could gain much more value than I initially thought. We decided to try a quick hackathon to make a proof of concept.

It was a great pleasure to discover, in just over one hour of close interaction, that dbdeployer's design for flexibility was up to the task. We managed to make TiDB work with dbdeployer simply by exporting, editing, and re-loading a few templates.

The exercise showed strengths and limitations in both projects. We agreed that dbdeployer had to lose some assumptions (such as "I am working with a MySQL server") and become able to recognise which flavor of MySQL-lookalike we are dealing with. At the same time, we noted that TiDB is not compatible when it comes to deployment and bootstrap: it is so simple and straightforward that its initialisation doesn't fit into the complex operation that is a MySQL server warm-up.

Pleased with the initial success, we kept in touch and, after dbdeployer acquired the ability to tell one flavor from another, we put together the various pieces to make dbdeployer recognise and install TiDB. We found and fixed several bugs in both projects, and finally released dbdeployer 1.19.0, which can use a TiDB server transparently.

What does transparently mean? It means that tests for TiDB deployment can run alongside tests for other MySQL servers, and the sandbox scripts (such as start, stop, use, status, and test_sb) work as expected and produce compatible output. Thus, a TiDB test runs together with tests for another dozen MySQL versions.

Now, if you want, you can evaluate TiDB on your computer without installing the full stack. It won't be as fast as the real thing: what is installed as a single node is a slower emulation of the real database, but it is enough to give you an idea of what queries you can and cannot run in TiDB, and perhaps to see whether your application could run on TiDB at all.

The collaboration with TiDB was especially useful because the changes needed to smooth the TiDB integration have made dbdeployer better suited to add support for more not-quite-MySQL servers, such as the one that we'll see in the next post.

But before reaching that point, here’s an example of TiDB deployment on Linux:

$ wget https://download.pingcap.org/tidb-master-linux-amd64.tar.gz  
[...]  
2019-02-24 04:46:26 (2.26 MB/s) - 'tidb-master-linux-amd64.tar.gz' saved [16304317/16304317]

$ dbdeployer unpack tidb-master-linux-amd64.tar.gz --unpack-version=3.0.0 --prefix=tidb  
Unpacking tarball tidb-master-linux-amd64.tar.gz to $HOME/opt/mysql/tidb3.0.0  
1
Renaming directory /home/msandbox/opt/mysql/tidb-master-linux-amd64 to /home/msandbox/opt/mysql/tidb3.0.0

TiDB tarballs don't come with a client. We need to use one from MySQL 5.7. Rather than downloading the huge tarball from the MySQL site, we can get a smaller one from a GitHub repository, using dbdeployer itself (NB: this reduced tarball is only available for Linux).


$ dbdeployer remote list  
Files available in https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/available.json  
5.7 -> [mysql-5.7.24 mysql-5.7.25]  
8.0 -> [mysql-8.0.13 mysql-8.0.15]  
4.1 -> [mysql-4.1.22]  
5.0 -> [mysql-5.0.15 mysql-5.0.96]  
5.1 -> [mysql-5.1.72]  
5.5 -> [mysql-5.5.61 mysql-5.5.62]  
5.6 -> [mysql-5.6.41 mysql-5.6.43]

$ dbdeployer remote get mysql-5.7.25  
File /home/msandbox/mysql-5.7.25.tar.xz downloaded

$ dbdeployer unpack mysql-5.7.25.tar.xz  
[...]  
Renaming directory /home/msandbox/opt/mysql/mysql-5.7.25 to /home/msandbox/opt/mysql/5.7.25

Now we are ready to install TiDB:

$ dbdeployer deploy single tidb3.0.0 --client-from=5.7.25  
Creating directory /home/msandbox/sandboxes  
Database installed in $HOME/sandboxes/msb_tidb3_0_0  
run 'dbdeployer usage single' for basic instructions'  
. sandbox server started

Once installed, a TiDB sandbox behaves like a MySQL sandbox.

$ $HOME/sandboxes/msb_tidb3_0_0/use  
Welcome to the MySQL monitor.  Commands end with ; or \g.  
Your MySQL connection id is 2  
Server version: 5.7.10-TiDB-v3.0.0-beta-111-g266ff4b6f MySQL Community Server (Apache License 2.0)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its  
affiliates. Other names may be trademarks of their respective  
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost:3000] {msandbox} ((none)) >
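
The other sandbox scripts mentioned earlier work the same way as they do for a regular MySQL sandbox. Here is a minimal sketch (output omitted), run against the sandbox created above:

$ cd $HOME/sandboxes/msb_tidb3_0_0
$ ./status          # reports whether the TiDB server is running
$ ./test_sb         # runs the standard sandbox self-test
$ ./stop && ./start # restarts the server

Nothing in these scripts is TiDB-specific: they are the same scripts that dbdeployer generates for any single sandbox.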

Sunday, September 25, 2016

Showing the hidden tables in MySQL 8 data dictionary

The freshly released MySQL 8.0 includes a data dictionary, which makes MySQL much more reliable. Thanks to this feature, we don't have any '.frm' files, and querying the information_schema is 30x to 100x faster than in previous versions.

One drawback of the implementation is that the data dictionary tables are hidden by design.

While the reason is fully understandable (they don't want to commit to an interface that may change in the future), many curious users are disappointed, because openness is the basis of good understanding and feedback.

The problem of accessing the dictionary tables can be split into three parts:

  • Finding the list of tables;
  • Finding the description of the tables;
  • Getting the contents of each table.

The first part is easily solved. We know that the data dictionary tables are accessed from some information_schema views (the views are defined during the initialization, at which point the DD tables are readable.)

For example:

 show create view information_schema.tables\G
*************************** 1. row ***************************
                View: TABLES
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost`
SQL SECURITY DEFINER VIEW `TABLES` AS select `cat`.`name` AS
`TABLE_CATALOG`,`sch`.`name` AS `TABLE_SCHEMA`,`tbl`.`name` AS
`TABLE_NAME`,`tbl`.`type` AS `TABLE_TYPE`,if((`tbl`.`type` = 'BASE
TABLE'),`tbl`.`engine`,NULL) AS `ENGINE`,if((`tbl`.`type` = 'VIEW'),NULL,10)
AS `VERSION`,`tbl`.`row_format` AS `ROW_FORMAT`,`stat`.`table_rows` AS
`TABLE_ROWS`,`stat`.`avg_row_length` AS `AVG_ROW_LENGTH`,`stat`.`data_length`
AS `DATA_LENGTH`,`stat`.`max_data_length` AS
`MAX_DATA_LENGTH`,`stat`.`index_length` AS `INDEX_LENGTH`,`stat`.`data_free`
AS `DATA_FREE`,`stat`.`auto_increment` AS `AUTO_INCREMENT`,`tbl`.`created` AS
`CREATE_TIME`,`stat`.`update_time` AS `UPDATE_TIME`,`stat`.`check_time` AS
`CHECK_TIME`,`col`.`name` AS `TABLE_COLLATION`,`stat`.`checksum` AS
`CHECKSUM`,if((`tbl`.`type` =
'VIEW'),NULL,get_dd_create_options(`tbl`.`options`,if((ifnull(`tbl`.`partition_expression`,'NOT_PART_TBL')
= 'NOT_PART_TBL'),0,1))) AS
`CREATE_OPTIONS`,internal_get_comment_or_error(`sch`.`name`,`tbl`.`name`,`tbl`.`type`,`tbl`.`options`,`tbl`.`comment`)
AS `TABLE_COMMENT` from ((((`mysql`.`tables` `tbl` join `mysql`.`schemata`
`sch` on((`tbl`.`schema_id` = `sch`.`id`))) join `mysql`.`catalogs` `cat`
on((`cat`.`id` = `sch`.`catalog_id`))) left join `mysql`.`collations` `col`
on((`tbl`.`collation_id` = `col`.`id`))) left join `mysql`.`table_stats`
`stat` on(((`tbl`.`name` = `stat`.`table_name`) and (`sch`.`name` =
`stat`.`schema_name`)))) where (can_access_table(`sch`.`name`,`tbl`.`name`)
and (not(`tbl`.`hidden`)))
character_set_client: latin1
collation_connection: latin1_swedish_ci
1 row in set (0.00 sec)

Here we see many tables (such as mysql.schemata or mysql.collations) that don't show up when we run SHOW TABLES in the mysql database.
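
A quick check confirms this, using the same sandbox as the rest of this post: the dictionary tables do not appear in SHOW TABLES, not even when named explicitly.

$ ~/sandboxes/msb_full_8_0_0/use mysql -BN -e "show tables like 'schemata'"
$ ~/sandboxes/msb_full_8_0_0/use mysql -BN -e "show tables like 'catalogs'"
# both commands return an empty result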

We can use a script like this to get the list of all tables:

MYSQL=~/sandboxes/msb_full_8_0_0/use
TABLES=$($MYSQL information_schema -BN -e 'show tables')

function show_tables
{
    for T in $TABLES
    do
        # views are the only objects whose definition contains an ALGORITHM clause
        is_view=$($MYSQL information_schema -BN -e "show create table $T\G" | grep ALGORITHM)
        if [ -n "$is_view" ]
        then
            # extract every mysql.* table mentioned in the view definition
            $MYSQL information_schema -e "show create table $T\G" \
               | perl -lne 'print $1 while /mysql.\..(\w+)/g'
        fi
    done
}
show_tables | sort | uniq

This script goes through all information_schema tables, skips the ones that are not views, and then searches each view definition for every table from the mysql database. The result is this:

catalogs
character_sets
collations
columns
foreign_key_column_usage
foreign_keys
index_column_usage
indexes
index_stats
schemata
tables
table_stats

Good. Now we have the list of tables that we can't see. The second operation is getting the description.

So, I looked at the source code, and I found out where the prohibition originates. From there, I saw that the tables are accessible when the variable skip_dd_table_access_check is set. Looking at the variables inside the server, I did not find any skip_dd_table_access_check, which was to be expected, since it would not make sense to provide this information in the open after going through the pains of making all DD tables unreachable.
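
For the record, this is the kind of check I mean (a quick sketch against the same sandbox): no server variable with a similar name is exposed.

$ ~/sandboxes/msb_full_8_0_0/use -e "show global variables like '%dd_table%'"
# (empty set)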

I searched the code for the string skip_dd_table_access_check and I found out how it is used in the test suite. The key is using the debug build of the MySQL server.

Using MySQL Sandbox, with a sandbox made from the full tarball of MySQL 8.0, I run:

 ~/sandboxes/msb_full_8_0_0/restart --mysqld=mysqld-debug

Now I have loaded the debug-enabled server. Let's try:

$ ~/sandboxes/msb_full_8_0_0/use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 91
Server version: 8.0.0-dmr-debug MySQL Community Server - Debug (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} (mysql) > show create table schemata\G
ERROR 3554 (HY000): Access to system table 'mysql.schemata' is rejected.

Having the debug build is not enough. We need to use the magic spell.

mysql [localhost] {msandbox} (mysql) > SET SESSION debug='+d,skip_dd_table_access_check';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (mysql) > show create table schemata\G
*************************** 1. row ***************************
       Table: schemata
Create Table: CREATE TABLE `schemata` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `catalog_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `default_collation_id` bigint(20) unsigned NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `catalog_id` (`catalog_id`,`name`),
  KEY `default_collation_id` (`default_collation_id`),
  CONSTRAINT `schemata_ibfk_1` FOREIGN KEY (`catalog_id`) REFERENCES `catalogs` (`id`),
  CONSTRAINT `schemata_ibfk_2` FOREIGN KEY (`default_collation_id`) REFERENCES  `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
1 row in set (0.00 sec)

Yay! The table is now visible! We can modify the above script as follows:

mysql_tables=$(show_tables | sort | uniq)
for T in $mysql_tables
do
    echo "-- $T "
    $MYSQL -e "SET SESSION debug= '+d,skip_dd_table_access_check'; show create table mysql.$T\G"
done

And we get the description of every table in the data dictionary. Here it goes:

-- catalogs
*************************** 1. row ***************************
       Table: catalogs
Create Table: CREATE TABLE `catalogs` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- character_sets
*************************** 1. row ***************************
       Table: character_sets
Create Table: CREATE TABLE `character_sets` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `default_collation_id` bigint(20) unsigned NOT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `mb_max_length` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `default_collation_id` (`default_collation_id`),
  CONSTRAINT `character_sets_ibfk_1` FOREIGN KEY (`default_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=249 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- collations
*************************** 1. row ***************************
       Table: collations
Create Table: CREATE TABLE `collations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `character_set_id` bigint(20) unsigned NOT NULL,
  `is_compiled` tinyint(1) NOT NULL,
  `sort_length` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `character_set_id` (`character_set_id`),
  CONSTRAINT `collations_ibfk_1` FOREIGN KEY (`character_set_id`) REFERENCES `character_sets` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=278 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- columns
*************************** 1. row ***************************
       Table: columns
Create Table: CREATE TABLE `columns` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `table_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 COLLATE utf8_tolower_ci NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `type` enum(/* removed */) COLLATE utf8_bin NOT NULL,
  `is_nullable` tinyint(1) NOT NULL,
  `is_zerofill` tinyint(1) DEFAULT NULL,
  `is_unsigned` tinyint(1) DEFAULT NULL,
  `char_length` int(10) unsigned DEFAULT NULL,
  `numeric_precision` int(10) unsigned DEFAULT NULL,
  `numeric_scale` int(10) unsigned DEFAULT NULL,
  `datetime_precision` int(10) unsigned DEFAULT NULL,
  `collation_id` bigint(20) unsigned DEFAULT NULL,
  `has_no_default` tinyint(1) DEFAULT NULL,
  `default_value` blob,
  `default_value_utf8` text COLLATE utf8_bin,
  `default_option` blob,
  `update_option` varchar(32) COLLATE utf8_bin DEFAULT NULL,
  `is_auto_increment` tinyint(1) DEFAULT NULL,
  `is_virtual` tinyint(1) DEFAULT NULL,
  `generation_expression` longblob,
  `generation_expression_utf8` longtext COLLATE utf8_bin,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  `options` mediumtext COLLATE utf8_bin,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `column_key` enum('','PRI','UNI','MUL') COLLATE utf8_bin NOT NULL,
  `column_type_utf8` mediumtext COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `table_id` (`table_id`,`name`),
  UNIQUE KEY `table_id_2` (`table_id`,`ordinal_position`),
  KEY `collation_id` (`collation_id`),
  CONSTRAINT `columns_ibfk_1` FOREIGN KEY (`table_id`) REFERENCES `tables` (`id`),
  CONSTRAINT `columns_ibfk_2` FOREIGN KEY (`collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3450 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- foreign_key_column_usage
*************************** 1. row ***************************
       Table: foreign_key_column_usage
Create Table: CREATE TABLE `foreign_key_column_usage` (
  `foreign_key_id` bigint(20) unsigned NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `column_id` bigint(20) unsigned NOT NULL,
  `referenced_column_name` varchar(64) CHARACTER SET utf8 COLLATE utf8_tolower_ci NOT NULL,
  PRIMARY KEY (`foreign_key_id`,`ordinal_position`),
  UNIQUE KEY `foreign_key_id` (`foreign_key_id`,`column_id`,`referenced_column_name`),
  KEY `column_id` (`column_id`),
  CONSTRAINT `foreign_key_column_usage_ibfk_1` FOREIGN KEY (`foreign_key_id`) REFERENCES `foreign_keys` (`id`),
  CONSTRAINT `foreign_key_column_usage_ibfk_2` FOREIGN KEY (`column_id`) REFERENCES `columns` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- foreign_keys
*************************** 1. row ***************************
       Table: foreign_keys
Create Table: CREATE TABLE `foreign_keys` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `table_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `unique_constraint_id` bigint(20) unsigned NOT NULL,
  `match_option` enum('NONE','PARTIAL','FULL') COLLATE utf8_bin NOT NULL,
  `update_rule` enum('NO ACTION','RESTRICT','CASCADE','SET NULL','SET DEFAULT') COLLATE utf8_bin NOT NULL,
  `delete_rule` enum('NO ACTION','RESTRICT','CASCADE','SET NULL','SET DEFAULT') COLLATE utf8_bin NOT NULL,
  `referenced_table_catalog` varchar(64) COLLATE utf8_bin NOT NULL,
  `referenced_table_schema` varchar(64) COLLATE utf8_bin NOT NULL,
  `referenced_table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  UNIQUE KEY `table_id` (`table_id`,`name`),
  KEY `unique_constraint_id` (`unique_constraint_id`),
  CONSTRAINT `foreign_keys_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `foreign_keys_ibfk_2` FOREIGN KEY (`unique_constraint_id`) REFERENCES `indexes` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=46 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- index_column_usage
*************************** 1. row ***************************
       Table: index_column_usage
Create Table: CREATE TABLE `index_column_usage` (
  `index_id` bigint(20) unsigned NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `column_id` bigint(20) unsigned NOT NULL,
  `length` int(10) unsigned DEFAULT NULL,
  `order` enum('UNDEF','ASC','DESC') COLLATE utf8_bin NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  UNIQUE KEY `index_id` (`index_id`,`ordinal_position`),
  UNIQUE KEY `index_id_2` (`index_id`,`column_id`,`hidden`),
  KEY `f2` (`column_id`),
  CONSTRAINT `index_column_usage_ibfk_1` FOREIGN KEY (`index_id`) REFERENCES `indexes` (`id`),
  CONSTRAINT `index_column_usage_ibfk_2` FOREIGN KEY (`column_id`) REFERENCES `columns` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- indexes
*************************** 1. row ***************************
       Table: indexes
Create Table: CREATE TABLE `indexes` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `table_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 COLLATE utf8_tolower_ci NOT NULL,
  `type` enum('PRIMARY','UNIQUE','MULTIPLE','FULLTEXT','SPATIAL') COLLATE utf8_bin NOT NULL,
  `algorithm` enum('SE_SPECIFIC','BTREE','RTREE','HASH','FULLTEXT') COLLATE utf8_bin NOT NULL,
  `is_algorithm_explicit` tinyint(1) NOT NULL,
  `is_visible` tinyint(1) NOT NULL,
  `is_generated` tinyint(1) NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  `ordinal_position` int(10) unsigned NOT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `options` mediumtext COLLATE utf8_bin,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `tablespace_id` bigint(20) unsigned DEFAULT NULL,
  `engine` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `table_id` (`table_id`,`name`),
  KEY `tablespace_id` (`tablespace_id`),
  CONSTRAINT `indexes_ibfk_1` FOREIGN KEY (`table_id`) REFERENCES `tables` (`id`),
  CONSTRAINT `indexes_ibfk_2` FOREIGN KEY (`tablespace_id`) REFERENCES `tablespaces` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=235 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- index_stats
*************************** 1. row ***************************
       Table: index_stats
Create Table: CREATE TABLE `index_stats` (
  `schema_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `index_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `column_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `cardinality` bigint(20) unsigned DEFAULT NULL,
  UNIQUE KEY `schema_name` (`schema_name`,`table_name`,`index_name`,`column_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- schemata
*************************** 1. row ***************************
       Table: schemata
Create Table: CREATE TABLE `schemata` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `catalog_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `default_collation_id` bigint(20) unsigned NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `catalog_id` (`catalog_id`,`name`),
  KEY `default_collation_id` (`default_collation_id`),
  CONSTRAINT `schemata_ibfk_1` FOREIGN KEY (`catalog_id`) REFERENCES `catalogs` (`id`),
  CONSTRAINT `schemata_ibfk_2` FOREIGN KEY (`default_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- tables
*************************** 1. row ***************************
       Table: tables
Create Table: CREATE TABLE `tables` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) COLLATE utf8_bin NOT NULL,
  `type` enum('BASE TABLE','VIEW','SYSTEM VIEW') COLLATE utf8_bin NOT NULL,
  `engine` varchar(64) CHARACTER SET utf8 NOT NULL,
  `mysql_version_id` int(10) unsigned NOT NULL,
  `row_format` enum('Fixed','Dynamic','Compressed','Redundant','Compact','Paged') COLLATE utf8_bin DEFAULT NULL,
  `collation_id` bigint(20) unsigned DEFAULT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `hidden` tinyint(1) NOT NULL,
  `options` mediumblob,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `se_private_id` bigint(20) unsigned DEFAULT NULL,
  `tablespace_id` bigint(20) unsigned DEFAULT NULL,
  `partition_type` enum(/*removed*/) COLLATE utf8_bin DEFAULT NULL,
  `partition_expression` varchar(2048) COLLATE utf8_bin DEFAULT NULL,
  `default_partitioning` enum('NO','YES','NUMBER') COLLATE utf8_bin DEFAULT NULL,
  `subpartition_type` enum('HASH','KEY_51','KEY_55','LINEAR_HASH','LINEAR_KEY_51','LINEAR_KEY_55') COLLATE utf8_bin DEFAULT NULL,
  `subpartition_expression` varchar(2048) COLLATE utf8_bin DEFAULT NULL,
  `default_subpartitioning` enum('NO','YES','NUMBER') COLLATE utf8_bin DEFAULT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `view_definition` longblob,
  `view_definition_utf8` longtext COLLATE utf8_bin,
  `view_check_option` enum('NONE','LOCAL','CASCADED') COLLATE utf8_bin DEFAULT NULL,
  `view_is_updatable` enum('NO','YES') COLLATE utf8_bin DEFAULT NULL,
  `view_algorithm` enum('UNDEFINED','TEMPTABLE','MERGE') COLLATE utf8_bin DEFAULT NULL,
  `view_security_type` enum('DEFAULT','INVOKER','DEFINER') COLLATE utf8_bin DEFAULT NULL,
  `view_definer` varchar(93) COLLATE utf8_bin DEFAULT NULL,
  `view_client_collation_id` bigint(20) unsigned DEFAULT NULL,
  `view_connection_collation_id` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  UNIQUE KEY `engine` (`engine`,`se_private_id`),
  KEY `engine_2` (`engine`),
  KEY `collation_id` (`collation_id`),
  KEY `tablespace_id` (`tablespace_id`),
  CONSTRAINT `tables_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `tables_ibfk_2` FOREIGN KEY (`collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `tables_ibfk_3` FOREIGN KEY (`tablespace_id`) REFERENCES `tablespaces` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=322 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
-- table_stats
*************************** 1. row ***************************
       Table: table_stats
Create Table: CREATE TABLE `table_stats` (
  `schema_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_rows` bigint(20) unsigned DEFAULT NULL,
  `avg_row_length` bigint(20) unsigned DEFAULT NULL,
  `data_length` bigint(20) unsigned DEFAULT NULL,
  `max_data_length` bigint(20) unsigned DEFAULT NULL,
  `index_length` bigint(20) unsigned DEFAULT NULL,
  `data_free` bigint(20) unsigned DEFAULT NULL,
  `auto_increment` bigint(20) unsigned DEFAULT NULL,
  `checksum` bigint(20) unsigned DEFAULT NULL,
  `update_time` timestamp NULL DEFAULT NULL,
  `check_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`schema_name`,`table_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

In addition to the tables referred to in views, there are three that are mentioned in the documentation but not exposed through views in information_schema: triggers, events, and routines.


show create table triggers\G
*************************** 1. row ***************************
       Table: triggers
Create Table: CREATE TABLE `triggers` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `event_type` enum('INSERT','UPDATE','DELETE') COLLATE utf8_bin NOT NULL,
  `table_id` bigint(20) unsigned NOT NULL,
  `action_timing` enum('BEFORE','AFTER') COLLATE utf8_bin NOT NULL,
  `action_order` int(10) unsigned NOT NULL,
  `action_statement` longblob NOT NULL,
  `action_statement_utf8` longtext COLLATE utf8_bin NOT NULL,
  `created` timestamp(2) NOT NULL DEFAULT CURRENT_TIMESTAMP(2) ON UPDATE CURRENT_TIMESTAMP(2),
  `last_altered` timestamp(2) NOT NULL DEFAULT CURRENT_TIMESTAMP(2),
  `sql_mode` set(/*removed*/) COLLATE utf8_bin NOT NULL,
  `definer` varchar(93) COLLATE utf8_bin NOT NULL,
  `client_collation_id` bigint(20) unsigned NOT NULL,
  `connection_collation_id` bigint(20) unsigned NOT NULL,
  `schema_collation_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  UNIQUE KEY `table_id` (`table_id`,`event_type`,`action_timing`,`action_order`),
  KEY `client_collation_id` (`client_collation_id`),
  KEY `connection_collation_id` (`connection_collation_id`),
  KEY `schema_collation_id` (`schema_collation_id`),
  CONSTRAINT `triggers_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `triggers_ibfk_2` FOREIGN KEY (`table_id`) REFERENCES `tables` (`id`),
  CONSTRAINT `triggers_ibfk_3` FOREIGN KEY (`client_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `triggers_ibfk_4` FOREIGN KEY (`connection_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `triggers_ibfk_5` FOREIGN KEY (`schema_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC
show create table events\G
*************************** 1. row ***************************
       Table: events
Create Table: CREATE TABLE `events` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `schema_id` bigint(20) unsigned NOT NULL,
  `name` varchar(64) CHARACTER SET utf8 NOT NULL,
  `definer` varchar(93) COLLATE utf8_bin NOT NULL,
  `time_zone` varchar(64) COLLATE utf8_bin NOT NULL,
  `definition` longblob NOT NULL,
  `definition_utf8` longtext COLLATE utf8_bin NOT NULL,
  `execute_at` datetime DEFAULT NULL,
  `interval_value` int(11) DEFAULT NULL,
  `interval_field` enum('YEAR','QUARTER','MONTH','DAY','HOUR','MINUTE','WEEK','SECOND','MICROSECOND','YEAR_MONTH','DAY_HOUR','DAY_MINUTE','DAY_SECOND','HOUR_MINUTE','HOUR_SECOND','MINUTE_SECOND','DAY_MICROSECOND','HOUR_MICROSECOND','MINUTE_MICROSECOND','SECOND_MICROSECOND') COLLATE utf8_bin DEFAULT NULL,
  `sql_mode` set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','NOT_USED','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH') COLLATE utf8_bin NOT NULL,
  `starts` datetime DEFAULT NULL,
  `ends` datetime DEFAULT NULL,
  `status` enum('ENABLED','DISABLED','SLAVESIDE_DISABLED') COLLATE utf8_bin NOT NULL,
  `on_completion` enum('DROP','PRESERVE') COLLATE utf8_bin NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `last_altered` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `last_executed` datetime DEFAULT NULL,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `originator` int(10) unsigned NOT NULL,
  `client_collation_id` bigint(20) unsigned NOT NULL,
  `connection_collation_id` bigint(20) unsigned NOT NULL,
  `schema_collation_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `schema_id` (`schema_id`,`name`),
  KEY `client_collation_id` (`client_collation_id`),
  KEY `connection_collation_id` (`connection_collation_id`),
  KEY `schema_collation_id` (`schema_collation_id`),
  CONSTRAINT `events_ibfk_1` FOREIGN KEY (`schema_id`) REFERENCES `schemata` (`id`),
  CONSTRAINT `events_ibfk_2` FOREIGN KEY (`client_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `events_ibfk_3` FOREIGN KEY (`connection_collation_id`) REFERENCES `collations` (`id`),
  CONSTRAINT `events_ibfk_4` FOREIGN KEY (`schema_collation_id`) REFERENCES `collations` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

show create table routines\G
*************************** 1. row ***************************
       Table: ROUTINES
Create Table: CREATE TEMPORARY TABLE `ROUTINES` (
  `SPECIFIC_NAME` varchar(64) NOT NULL DEFAULT '',
  `ROUTINE_CATALOG` varchar(512) NOT NULL DEFAULT '',
  `ROUTINE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
  `ROUTINE_NAME` varchar(64) NOT NULL DEFAULT '',
  `ROUTINE_TYPE` varchar(9) NOT NULL DEFAULT '',
  `DATA_TYPE` varchar(64) NOT NULL DEFAULT '',
  `CHARACTER_MAXIMUM_LENGTH` int(21) DEFAULT NULL,
  `CHARACTER_OCTET_LENGTH` int(21) DEFAULT NULL,
  `NUMERIC_PRECISION` bigint(21) unsigned DEFAULT NULL,
  `NUMERIC_SCALE` int(21) DEFAULT NULL,
  `DATETIME_PRECISION` bigint(21) unsigned DEFAULT NULL,
  `CHARACTER_SET_NAME` varchar(64) DEFAULT NULL,
  `COLLATION_NAME` varchar(64) DEFAULT NULL,
  `DTD_IDENTIFIER` longtext,
  `ROUTINE_BODY` varchar(8) NOT NULL DEFAULT '',
  `ROUTINE_DEFINITION` longtext,
  `EXTERNAL_NAME` varchar(64) DEFAULT NULL,
  `EXTERNAL_LANGUAGE` varchar(64) DEFAULT NULL,
  `PARAMETER_STYLE` varchar(8) NOT NULL DEFAULT '',
  `IS_DETERMINISTIC` varchar(3) NOT NULL DEFAULT '',
  `SQL_DATA_ACCESS` varchar(64) NOT NULL DEFAULT '',
  `SQL_PATH` varchar(64) DEFAULT NULL,
  `SECURITY_TYPE` varchar(7) NOT NULL DEFAULT '',
  `CREATED` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `LAST_ALTERED` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `SQL_MODE` varchar(8192) NOT NULL DEFAULT '',
  `ROUTINE_COMMENT` longtext NOT NULL,
  `DEFINER` varchar(93) NOT NULL DEFAULT '',
  `CHARACTER_SET_CLIENT` varchar(32) NOT NULL DEFAULT '',
  `COLLATION_CONNECTION` varchar(32) NOT NULL DEFAULT '',
  `DATABASE_COLLATION` varchar(32) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Looking at the code again, I see that there are also the tables tablespaces and version:

show create table tablespaces\G
*************************** 1. row ***************************
       Table: tablespaces
Create Table: CREATE TABLE `tablespaces` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_bin NOT NULL,
  `options` mediumtext COLLATE utf8_bin,
  `se_private_data` mediumtext COLLATE utf8_bin,
  `comment` varchar(2048) COLLATE utf8_bin NOT NULL,
  `engine` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

show create table version\G
*************************** 1. row ***************************
       Table: version
Create Table: CREATE TABLE `version` (
  `version` int(10) unsigned NOT NULL,
  PRIMARY KEY (`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0 ROW_FORMAT=DYNAMIC

Now we can try the last part of our task, i.e., querying the data dictionary for some specific info.

mysql [localhost] {msandbox} (mysql) > SET SESSION debug='+d,skip_dd_table_access_check';
mysql [localhost] {msandbox} (mysql) > select * from version;
+---------+
| version |
+---------+
|       1 |
+---------+

mysql [localhost] {msandbox} (mysql) > select id, name from schemata;
+----+--------------------+
| id | name               |
+----+--------------------+
|  2 | information_schema |
|  1 | mysql              |
|  3 | performance_schema |
|  4 | sys                |
|  5 | test               |
+----+--------------------+


mysql [localhost] {msandbox} (mysql) > select id, name, type, engine, mysql_version_id, comment from tables where name = 'user' and schema_id=1;
+----+------+------------+--------+------------------+-----------------------------+
| id | name | type       | engine | mysql_version_id | comment                     |
+----+------+------------+--------+------------------+-----------------------------+
| 84 | user | BASE TABLE | InnoDB |            80000 | Users and global privileges |
+----+------+------------+--------+------------------+-----------------------------+

Now the data dictionary is much more readable!
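
As a further example, the hidden column that we saw in the tables definition tells us which tables the dictionary itself considers hidden. A minimal sketch, using the $MYSQL shortcut defined earlier and the same debug setting (the column names come from the definitions shown above):

$MYSQL mysql -e "SET SESSION debug='+d,skip_dd_table_access_check'; select t.name from tables t join schemata s on t.schema_id = s.id where s.name='mysql' and t.hidden = 1"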

DISCLAIMER: there may be a simpler or more elegant solution to this problem. The method shown here is what I found by researching. If there is a better method, short of recompiling the server, I'd like to know.

WARNING: Don't do what I do in the following paragraphs!

To complete the experiment, I am going to do what the MySQL team does not want me to do at all.

First, I create a directory inside the data directory. As shown in data dictionary limitations, this is not supported. But since we can access the data dictionary ...


$ mkdir ~/sandboxes/msb_full_8_0_0/data/db1

Now for the felony part:

mysql [localhost] {msandbox} (mysql) > SET SESSION debug= '+d,skip_dd_table_access_check';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select * from schemata;
+----+------------+--------------------+----------------------+---------------------+---------------------+
| id | catalog_id | name               | default_collation_id | created             | last_altered        |
+----+------------+--------------------+----------------------+---------------------+---------------------+
|  1 |          1 | mysql              |                    8 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
|  2 |          1 | information_schema |                   33 | 2016-09-25 18:06:00 | 2016-09-25 18:06:00 |
|  3 |          1 | performance_schema |                   33 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
|  4 |          1 | sys                |                   33 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
|  5 |          1 | test               |                    8 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
+----+------------+--------------------+----------------------+---------------------+---------------------+
5 rows in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > insert into schemata values (6, 1, 'db1', 8, now(), now());
Query OK, 1 row affected (0.01 sec)

mysql [localhost] {msandbox} (mysql) > show schemas;
+--------------------+
| Database           |
+--------------------+
| db1                | ## TA-DA!
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+

Now, pay attention! This is why the MySQL team doesn't want anyone to mess with the data dictionary tables.

DOUBLE WARNING! Don't do the following!


mysql [localhost] {msandbox} (mysql) > insert into schemata values (7, 1, 'db2', 8, now(), now());
Query OK, 1 row affected (0.01 sec)

mysql [localhost] {msandbox} (mysql) > show schemas;
+--------------------+
| Database           |
+--------------------+
| db1                |
| db2                |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > use db2
ERROR 1049 (42000): Unknown database 'db2'

There! I broke the system. Lesson learned: read, but don't write, the data dictionary tables.

Monday, May 02, 2016

Taking the MySQL document store for a spin

This is not a comprehensive review, nor a user guide. It's a step-by-step account of my initial impressions while trying the new MySQL X-Protocol and the document store capabilities. In fact, I am barely scratching the surface here: more articles will come as time allows.

MySQL 5.7 has been GA for several months, as it was released in October 2015. Among the many features and improvements, I was surprised to see the MySQL team emphasizing the JSON data type. While it is an interesting feature per se, I failed to see the reason why so many articles and conference talks were focused around this single feature. Everything became clear when, with the release of MySQL 5.7.12, the MySQL team announced a new release model.

Overview

In MySQL 5.7.12, we get the usual MySQL server, which shouldn't have new features. However, in an attempt to combine the stability of the server with a more dynamic release cycle, the server ships with a new plugin, unimaginatively named X-Plugin, which supports an alternative communication protocol, named X-Protocol.

In short, the X-Protocol extends and replaces the traditional client/server protocol by allowing asynchronous communication with the server, using different API calls, which are available, as of today, in Javascript, Python, C#, and Java, with more languages to come.

The reason for this decision is easy to see. Many developers struggle with relational tables and SQL, while they understand structures made of arrays and associative arrays (maps.) This is also one of the reasons for the recent popularity of NoSQL databases, where schemas and tables are replaced by collections of documents or similar schema-less structures. With this new release, MySQL wants to offer the best of two worlds, by allowing developers to use the database with the tools they feel most comfortable with.

To use the new plugin, you need two components:

  • The plugin itself, which ships with the server package, but is not enabled by default;
  • The MySQL shell, a new command line tool that you have to download and install separately and will allow you to use Javascript or Python with the database.

As a QA professional, I am a bit concerned about this mix of GA and alpha features (the MySQL shell is defined as alpha software, and the shell itself says 'development preview' in its help). Theoretically, the two worlds should be separated. If you don't install the plugin, the server works as usual. But practice and experience tell me that there are dangers waiting for a chance to destroy our data. If you want a single piece of advice to summarize this article, DON'T USE the new MySQL shell with a production server. That said, let's start a quick tour.

Installation

You need to install the shell, which comes in a package that is separate from the rest of the MySQL products. The manual shows how to install it on OSX or Linux. The only hint that this product could be dangerous to use is a note reminding the user to enable the MySQL Preview Packages when installing from a Linux repository. The procedure, on any operating system, installs libraries and executables globally. Unlike the server package, it is not possible to install it in a user-defined directory, the way you install the server with MySQL Sandbox. In this context, the standard Oracle disclaimer may have a meaning that goes beyond a regular CYA.

Next, you need to enable the plugin. You can do it in three ways:

(1)

$ mysqlsh --classic -u youruser -p --dba enableXProtocol
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating a Classic Session to youruser@localhost:3306
Enter password:
No default schema selected.

enableXProtocol: Installing plugin mysqlx...
enableXProtocol: done

(2)

Start the server with --plugin-load=mysqlx=mysqlx.so. This will enable the plugin, although it does not seem to work the way it should.

(3)

Enable the plugin with a SQL command.

mysql> install plugin mysqlx soname 'mysqlx.so';

I prefer method #3 because it is the only one that does not have side effects or cause misunderstandings. The issue that hit me when I tried method #1 for the first time is that calling mysqlsh --classic uses the client/server protocol on port 3306 (or the port that you defined for the database), while subsequent calls will use the X-Protocol on port 33060.
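
If you want to make sure which port the plugin is listening on, you can ask the server itself. A quick sketch (the user name is just an example, as above):

$ mysql -u youruser -p -e "show plugins" | grep -i mysqlx
$ mysql -u youruser -p -e "show global variables like 'mysqlx_port'"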

Alternatives. Using Docker

If what I said previously has made you cautious and you have decided not to use the shell on your main computer (as you should), there are alternative ways. If you have a data center at your disposal, just fire up a virtual machine and play with it. However, be aware that the MySQL shell does not install on Ubuntu 15.04 and 16.04.

A lightweight method to try the new shell without endangering your production server is to use a Docker image for MySQL, or a combination of MySQL Sandbox and Docker.

In Docker, the MySQL shell does not ship together with the server. It requires a separate image. A quick guide is available in a recent article. I don't like the current approach: having two images is a waste of space. It would be acceptable if the images were based on a slim Linux distribution, such as Alpine. Since they run on Oracle Linux instead, you need to download two beefy images to start testing. With a fast internet connection this should not be a problem, but if you live in a place where 3 Mbps is the norm, or if you are traveling, this could become an annoyance. Once you have pulled the images, you can use them at will, even without an internet connection.

The above mentioned quick guide suggests using docker run --link to connect the two containers. I recommend a different approach, as the link option is now considered legacy.

$ docker network create mynet
edcc36be21e54cdb91fdc91f2c320efabf62d36ab9d31b0142e901da7e3c84e9
$ docker network ls
NETWORK ID          NAME                DRIVER
a64b55fb7c92        bridge              bridge
0b8a52002dfd        none                null
cc775ec7edab        host                host
edcc36be21e5        mynet               bridge

$ docker run --name mybox  -e MYSQL_ROOT_PASSWORD=secret -d --net mynet mysql/mysql-server:5.7.12 \
    --plugin-load=mysqlx=mysqlx.so
ecbfc322bb17ec0b1511ea7321c2b10f9c7b5091baee4240ab51b7bf77c1e424

$ docker run -it --net mynet mysql/shell -u root -h mybox -p
Creating an X Session to root@mybox:33060
Enter password:
No default schema selected.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js>

The first command creates a network (called mynet).

The second command creates the server container, which is launched using the network mynet and with the plugin-load option (which seems to work well with the Docker image). When you use a Docker network, the container name is recognized by the network as a host name, and can be reached by other members of the network. This is much cleaner than using --link.

The third command runs the MySQL shell, using the same network. This allows us to use the container name (mybox) without any other options.

Running the MySQL Javascript shell

My favorite setup for this test is a mix of MySQL Sandbox for the server and Docker for the shell. This way I can use the alpha shell without polluting my Linux host, and use a feature-rich MySQL Sandbox to control the server.

Here is what I do:

$ make_sandbox 5.7.12 -- --no_show -c general_log=1 -c general_log_file=general.log

I start a sandbox with MySQL 5.7.12 (tarball expanded and renamed into /opt/mysql/5.7.12), with the general log enabled. We need this to peek under the hood when we use the document store.

Next, we load the sample world_x database from the MySQL documentation page.

$ ~/sandboxes/msb_5_7_12/use  -e 'source world_x.sql'

Finally, we enable the plugin.

$ ~/sandboxes/msb_5_7_12/use  -e "install plugin mysqlx soname 'mysqlx.so'"

Now we can connect the shell:

$ docker run -it --net host mysql/shell -u msandbox -pmsandbox world_x
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js>

What have we done? We use the network named 'host', which is a standard Docker network that lets a container use the host environment. We don't need to specify a port, since the shell assumes 33060 (enabled by the X-Plugin). The username and password are the usual ones for a sandbox. We end up inside a Javascript shell, where we can communicate with the database server using an alternative syntax. Let's see what we have:

  • We have an "X-Session" using port 33060 and working on database world_x;
  • There is a help, same as in the MySQL client;
  • The database world_x is accessible through the variable db.
  • Note: all the commands used below are the same for Python and Javascript. There are differences only when using the language extensively.

With the above elements, we can try getting data from the database.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

mysql-js> db.tables
{
    "City": <Table:City>,
    "Country": <Table:Country>,
    "CountryLanguage": <Table:CountryLanguage>
}

What does it mean? Let's abandon the Javascript shell and look at the traditional client:

mysql [localhost] {msandbox} (world_x) > show tables;
+-------------------+
| Tables_in_world_x |
+-------------------+
| City              |
| Country           |
| CountryInfo       |
| CountryLanguage   |
+-------------------+
4 rows in set (0.00 sec)

Here we see 4 tables, while the Javascript console lists only 3. However, the fourth table has the same name as the "collection." Let's have a look:

mysql [localhost] {msandbox} (world_x) > desc CountryInfo;
+-------+-------------+------+-----+---------+------------------+
| Field | Type        | Null | Key | Default | Extra            |
+-------+-------------+------+-----+---------+------------------+
| doc   | json        | YES  |     | NULL    |                  |
| _id   | varchar(32) | YES  |     | NULL    | STORED GENERATED |
+-------+-------------+------+-----+---------+------------------+
2 rows in set (0.00 sec)

mysql [localhost] {msandbox} (world_x) > show create table CountryInfo\G
*************************** 1. row ***************************
       Table: CountryInfo
Create Table: CREATE TABLE `CountryInfo` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Look what we got! A JSON column, with an identifier extracted from the document and implemented as a stored generated column. Now we can appreciate why the JSON data type was such an important thing.
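
A side note: since _id is a regular (if generated) column, it can be indexed to make lookups by document id fast. A minimal sketch, run from SQL against the sandbox (idx_id is a name of my choice; the definition shown above has no explicit key on _id):

$ ~/sandboxes/msb_5_7_12/use world_x -e "alter table CountryInfo add unique key idx_id (_id)"
$ ~/sandboxes/msb_5_7_12/use world_x -e "explain select doc from CountryInfo where _id = 'USA'\G"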

Back to the Javascript shell, let's get something from the database. (You can get all the commands I am using, and much more, from the manual.)

mysql-js> db.collections.CountryInfo.find("_id='USA'")
[
    {
        "GNP": 8510700,
        "IndepYear": 1776,
        "Name": "United States",
        "_id": "USA",
        "demographics": {
            "LifeExpectancy": 77.0999984741211,
            "Population": 278357000
        },
        "geography": {
            "Continent": "North America",
            "Region": "North America",
            "SurfaceArea": 9363520
        },
        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "George W. Bush"
        }
    }
]
1 document in set (0.00 sec)

Apart from the feeling of being back in the good old times when MySQL was still playing with IPO dreams (look at the HeadOfState field in the above data), this record is a straightforward JSON document, where data that would belong to different normalized tables is bundled together in this unified view. So, we are really querying a table that contains JSON data associated with an _id. We know this because the general log lists what happens after our simple query:

SELECT doc FROM `world_x`.`CountryInfo` WHERE (`_id` = 'USA')

Let's try a more complex query. We want all countries in Oceania with a population of more than 150,000 people, and whose Head of State is Elisabeth II. The query is a bit intimidating, albeit eerily familiar:

mysql-js> db.collections.CountryInfo.find("government.HeadOfState='Elisabeth II' AND geography.Continent = 'Oceania' AND demographics.Population > 150000").fields(["Name", "demographics.Population","geography.Continent"])
[
    {
        "Name": "Australia",
        "demographics.Population": 18886000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "New Zealand",
        "demographics.Population": 3862000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "Papua New Guinea",
        "demographics.Population": 4807000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "Solomon Islands",
        "demographics.Population": 444000,
        "geography.Continent": "Oceania"
    }
]
4 documents in set (0.00 sec)

Here is the corresponding SQL query recorded in the general log:

SELECT JSON_OBJECT(
    'Name', JSON_EXTRACT(doc,'$.Name'),'demographics.Population', \
    JSON_EXTRACT(doc,'$.demographics.Population'),'geography.Continent', \
    JSON_EXTRACT(doc,'$.geography.Continent')
) AS doc 
FROM `world_x`.`CountryInfo` \
WHERE (
    ((JSON_EXTRACT(doc,'$.government.HeadOfState') = 'Elisabeth II') \
    AND (JSON_EXTRACT(doc,'$.geography.Continent') = 'Oceania')) \
    AND (JSON_EXTRACT(doc,'$.demographics.Population') > 150000)
    )

I am not sure which one I prefer. The SQL looks strange, with all those JSON functions, while the Javascript command seems more readable (I had never thought I would say what I have just said!)

Enough with reading data. I want to manipulate some. I'll start by creating a new collection.

 mysql-js> db.createCollection('somethingNew')
 <Collection:somethingNew>

And the general log shows what should not be a surprise, as we have seen a similar structure for CountryInfo:

CREATE TABLE `world_x`.`somethingNew` (doc JSON, \
_id VARCHAR(32) \
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) \
    STORED NOT NULL UNIQUE
) CHARSET utf8mb4 ENGINE=InnoDB

Now, to the data manipulation:

mysql-js> mynew=db.getCollection('somethingNew')
<Collection:somethingNew>

The variable mynew can access the new collection. It's a shortcut to avoid typing db.collections.somethingNew.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> mynew.find()
Empty set (0.00 sec)

As expected, there is nothing inside the new collection. Now we enter a very minimal record.

mysql-js> mynew.add({Name:'Joe'})
Query OK, 1 item affected (0.01 sec)

mysql-js> mynew.find()
[
    {
        "Name": "Joe",
        "_id": "e09ef177c50fe6110100b8aeed734276"
    }
]
1 document in set (0.00 sec)

The collection contains more than what we have inserted. There is an apparently auto-generated _id field. Looking at the general log, we see that the data includes the new field.

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Joe\",\"_id\":\"e09ef177c50fe6110100b8aeed734276\"}')

As you can see, an _id field was added automatically. We could override that behavior by providing our own value:

mysql-js> mynew.add({_id: "a dummy string", Name:"Frank", country: "UK"})

The data inserted now includes the _id field with our manual value. The general log says:

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Frank\",\"_id\":\"a dummy string\",\"country\":\"UK\"}')

The value of _id, however, must be unique, or the engine will generate an error:

mysql-js> mynew.add({_id: "a dummy string", Name:"Sam", country: "USA"})
MySQL Error (5116): Document contains a field value that is not unique but required to be

If all this gives you a sense of deja-vu, you're right. This feels and smells a lot like MongoDB, and I am sure it isn't a coincidence.

Synchronizing operations

As our last attempt for the day, we will see what happens when we manipulate data in SQL and then retrieve it in Javascript or Python.

We leave the JS console open, and we do something in SQL:

mysql [localhost] {msandbox} (world_x) > drop table somethingNew;
Query OK, 0 rows affected (0.01 sec)

How does it look on the other side?

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> db.getCollections()
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}

Oops! mysqlsh didn't get the memo! It still considers somethingNew to be available.

mysql-js> db.collections.somethingNew.find()
MySQL Error (1146): Table 'world_x.somethingNew' doesn't exist

We need to refresh the connection. Unlike the SQL client, you need to specify the connection parameters.

mysql-js> \connect msandbox:msandbox@localhost:33060/world_x
Closing old connection...
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

We can see the same happening when we create a new table in SQL. The session in mysqlsh keeps showing the cached contents, and we need to refresh the session to see the changes. Looking at the general log, there are no changes when we issue commands asking for metadata, such as db.collections or db.tables. Instead, when we refresh the session, we see this:

SELECT table_name, COUNT(table_name) c FROM information_schema.columns \
   WHERE ((column_name = 'doc' and data_type = 'json') 
   OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) \
   AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2
SHOW FULL TABLES FROM `world_x`

The first query lists all tables that contain a JSON document and a generated _id (these are the collections). The second one lists all tables. Then the shell removes from the table list all the ones that were in the collections list.

Given the way it is done, we can cheat the system easily by creating something that looks like a collection, but has extra fields:

CREATE TABLE strangedoc (doc JSON, \
_id VARCHAR(32) \
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) \
    STORED NOT NULL UNIQUE,
    secret_stash varchar(200),
    more_secret_info mediumtext
) CHARSET utf8mb4 ENGINE=InnoDB;

mysql [localhost] {msandbox} (world_x) > insert into strangedoc (doc,secret_stash,more_secret_info) values \
('{"_id": "abc", "name": "Susan"}', \
'and now for something completely different', \
'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (world_x) > select * from strangedoc\G
*************************** 1. row ***************************
             doc: {"_id": "abc", "name": "Susan"}
             _id: abc
    secret_stash: and now for something completely different
more_secret_info: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
1 row in set (0.00 sec)

And the Javascript console will be unaware of the extra material:

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "strangedoc": <Collection:strangedoc>
}
mysql-js> db.strangedoc.find()
[
    {
        "_id": "abc",
        "name": "Susan"
    }
]
1 document in set (0.00 sec)

We can add contents to the collection in Javascript, and the database server won't protest (provided that the extra fields are nullable or have a default value). Is it a bug or a feature?
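For example, a document added from the Javascript side translates into an INSERT that touches only the doc column, so the extra columns simply fall back to their defaults (NULL, in this case). This is a sketch of what the general log would record for a hypothetical document, not a verbatim capture:

INSERT INTO `world_x`.`strangedoc` (doc) VALUES ('{\"name\":\"Paul\",\"_id\":\"xyz\"}')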

Parting thoughts

As I have said at the beginning, this is a very simple exploration. More work is required to test the full potential of the new model. My impressions are mildly positive. On one hand, it's an exciting environment, which promises to become more useful as more programming languages are supported and the coordination between shell and server software improves. On the other hand, there are many bugs, and the software is still very green. It will require more iterations from the community and the development team before it can be trusted with important data.

Saturday, July 25, 2015

MySQL 5.7 : no more password column!

Maintaining a project like MySQL::Sandbox is sometimes tiring, but it has its advantages. One of them is that everything related to the server setup comes to my attention rather earlier than if I were an average DBA or developer.

I try to keep MySQL Sandbox up to date with every release of MySQL and (to a lesser extent) MariaDB [1]. For this reason, I am used to trying a new release with MySQL Sandbox, and … seeing it fail.

Of the latest changes in MySQL, probably the most disruptive was what happened in MySQL 5.7.6, where the mysql.user table lost the password column.

Yep. No ‘password’ column anymore. And just to make the setup procedure harder, the syntax of SET PASSWORD was changed, and the statement itself deprecated.
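Just to put the two syntaxes side by side, here is a sketch of the change, using the sandbox user as an example (ALTER USER is the form recommended from 5.7.6 onwards):

-- up to 5.6 (now deprecated)
SET PASSWORD FOR 'msandbox'@'localhost' = PASSWORD('msandbox');

-- from 5.7.6 onwards
ALTER USER 'msandbox'@'localhost' IDENTIFIED BY 'msandbox';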


Previously, I could run:


mysql [localhost] {msandbox} (mysql) > select version();  
+-----------+  
| version() |  
+-----------+  
| 5.6.25    |  
+-----------+  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select host,user,password from user;  
+-----------+-------------+-------------------------------------------+  
| host      | user        | password                                  |  
+-----------+-------------+-------------------------------------------+  
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |  
+-----------+-------------+-------------------------------------------+  
8 rows in set (0.00 sec)

In the latest releases, though, this fails.


mysql [localhost] {msandbox} (mysql) > select version();  
+-----------+  
| version() |  
+-----------+  
| 5.7.8-rc  |  
+-----------+  
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (mysql) > select host,user,password from user;  
ERROR 1054 (42S22): Unknown column 'password' in 'field list'

Instead of a password column (which was CHAR(41)), we now have an authentication_string column of type TEXT.
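To get the listing below, the query simply names the new column:

select host, user, authentication_string from user;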


+-----------+-------------+-------------------------------------------+  
| host      | user        | authentication_string                     |  
+-----------+-------------+-------------------------------------------+  
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |  
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |  
+-----------+-------------+-------------------------------------------+

Fixing MySQL Sandbox to handle this issue while remaining compatible with previous releases was quite challenging, but in the end I did it. Recent versions of the sandbox can handle all the releases from Oracle, Percona, and MariaDB without hiccups.
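As an aside, one simple way of coping with both old and new servers (not necessarily what MySQL Sandbox does internally, just a sketch) is to probe the schema instead of hard-coding the column name:

-- returns 1 on servers that still have the old column, 0 on MySQL 5.7.6+
SELECT COUNT(*) AS has_password_column
  FROM information_schema.columns
 WHERE table_schema = 'mysql'
   AND table_name = 'user'
   AND column_name = 'password';

If the count is 1, the old column is still there; if it is 0, the script should read authentication_string instead.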

So, for testing, the issue is solved. Now comes the hard part: thousands of database administration procedures will break down for lack of the password column. To all the DBAs and database developers out there: good luck!




  1. It is my pleasure to disclose that MariaDB 10.1 runs in MySQL Sandbox 3.0.55+, with only minimal changes.  ↩


Friday, October 11, 2013

Tungsten Replicator Filters: A trove of golden secrets unveiled

UPDATE: This post is obsolete, as most of the URLs referenced here have been removed by the content owner.


Since I joined the company in late 2010, I have known that one of the strong points of Tungsten Replicator is its ability to set filters. The amazing capabilities offered by Tungsten filters cannot be fully grasped unless we explain how stage replication works.

There are several default stages in the replication stream. Every stage has an extraction task and an apply task. The extraction task gets data from the previous stage's repository, and the apply task saves the data to the next repository, which can be either temporary storage (a memory queue, a THL file) or the final destination (the slave database server). Consider that the architecture allows developers to add stages, and you will appreciate its full power. For every stage, we can insert one or more filters between the two tasks. This architecture gives us ample freedom to design our pipelines.


Tungsten Replication stages

What’s more interesting, filters in Tungsten can be of two types:

  • built-in filters, written in Java, which require users to compile and build the software before the filter is available;
  • Javascript filters, which can do almost everything the built-in scripts can do, but don’t require recompiling. All you need is deploy the filter in the appropriate folder, and run a tpm command to install or update the replicator.

The Tungsten team has developed a large number of filters, either for general-purpose pipeline control (such as renaming objects, or excluding and including schemas and tables from replication) or for very specific tasks that were needed to meet customer requirements or to help implement heterogeneous pipelines (such as converting ENUM and SET columns to strings, dropping UPDATE and DELETE events, normalizing DDL, etc.). There are 23 built-in and 18 Javascript filters available. They were largely undocumented, or sparsely documented in the code. Now, thanks to my colleague MC Brown, Continuent director of documentation, the treasure is there for the taking.

Tungsten filters are a beautiful feature to explore, but also a powerful tool for users with special needs who want to create their own customized pipeline. A word of warning, though. Filters are powerful and sometimes fun to implement and use, but they don’t come free. There are two main things to keep in mind:

  • Using more than one filter for the same stage may result in performance loss. The amount of performance loss depends on the stage and on your system's resources. For example, if your replication's weakest point is data transfer across the network, adding two or three filters to this stage (remote-to-thl) will make things worse. If you can apply the filter to the previous or next stage, you may achieve your purpose without many side effects. As always, the advice is: “test, benchmark, repeat.”
  • Filters that alter data (such as excluding schemas or tables, or dropping events) will make the slave unsuitable for promotion. You should always ask yourself whether the filter may make the slave a lousy master, and if it does, make sure you have at least one other slave that is replicating without filters.

As a final parting thought, be aware that yours truly and the above-mentioned MC Brown will be speaking at Percona Live London 2013, with a full, take-no-prisoners, six-hour-long Tungsten Replicator tutorial, where we will cover filters and give some explosive and sensational examples.

Sunday, July 07, 2013

On contributing to MySQL


Dave Stokes has just written that MySQL is Looking for External Contributions. The first comments on that were negative, saying that forcing developers to sign a contributor agreement is not friendly, and that MySQL developers don't play well with external contributors.

To be fair, it is not Oracle that has an unfriendly policy about contributions. It was already like this with MySQL AB, and the reason is simply that the company wants to maintain ownership of the code, so that it will be able to sell dual licensing agreements.
I may not like it, but licensing was still a consistent part of the business when I left Oracle, and I assume it still is. Since this “feature” helps paying the developers that create open source software, I believe it is a reasonable trade-off.
Besides, MontyProgram asks for the same thing (https://kb.askmonty.org/en/mca/), and so do Apache (http://www.apache.org/licenses/icla.txt) and Canonical (http://www.canonical.com/contributors).
It is a reasonable request. Not only does it allow the main project to retain ownership of the code, which is often a requirement for generating further revenue, but it also gives the project some legal protection when a contributor submits code that was created by others.

All in all, it’s a good thing that Oracle is encouraging contributions!

About the MySQL developers not playing well with external contributors, this is also nothing new. To my chagrin, when I was working with MySQL I realized early on that not all developers have a mindset that plays well with community participation. That is just to be expected. I know brilliant developers who can deliver exceptional code, but don't understand the concept of coding in an open environment, or don't have the ability to cooperate with developers who are not colleagues. Is that a bad thing? For some, yes, but I don't think so. If the final result is that the community gets great code from the effort of less socially oriented developers, I still take it and say "thank you!"

As a community manager at MySQL, I tried to improve MySQL's attitude towards external contributors, but I had to acknowledge that we could not force the developers to work against their inclination. There were (and still are!) some developers who play well with external contributors, but forcing everyone to work with contributors outside the company might have endangered productivity. The developers at MySQL were hired for their technical skills, and with few exceptions the choice has proven right: they are capable of delivering first-class software, which is used in mission-critical applications by countless businesses.

So, there are proposals that are rejected by the developers. This is a bad sign for the contributor, but a good sign for the users. A rejected proposal means that someone took the time to examine it, and weighed the options: accepting the contribution, writing the same feature from scratch, or rejecting the contribution because it is not worth the effort.

Let me clarify, because this point is not widely known. This description refers to the process as I knew it 3 years ago. It might be changed now, but I doubt it has changed for the worse.
For a feature to get into MySQL code, by any developer, meaning either an employee of MySQL or an external contributor, it must pass three obstacles:
  1. There must be tests for the code provided. A feature that does not include tests is rejected.
  2. The code must be reviewed by one or two senior developers. The first thing the reviewers notice (besides the presence or absence of tests) is whether the code duplicates something that was already done. If it does, they suggest rewriting the code so that it uses existing features rather than creating new ones.
  3. The code must pass the test suites in all platforms supported by MySQL. This is more easily said than done. External contributors test in their preferred operating system, and then they think they are done. But MySQL has a testing farm that submits every proposed commit to the full range of the operating systems that are actively supported. So chances are that a proposed contribution breaks the tests in one or two operating systems. At that point, the employees who have been reviewing the code can either ask the initial developer to change the code so that it is multi-platform friendly, or they can do it themselves. Having supervised many such exchanges between employees and external contributors, I know for a fact that the contribution is a hard task for both sides. Since MySQL has a policy for high quality (mandatory tests, code review, test passing in all platforms), every piece of code submitted to trunk is the result of a long process, which every would-be contributor must be aware of and willing to endure.

One more thing that comes to mind in this context. An additional difficulty of contributing to MySQL is that the MySQL code base is quite intricate. It was initially created when resources were scarce, and thus it includes design decisions that made sense 15 years ago, but could be written quite differently nowadays. The Drizzle project exposed most of these issues, when it stripped down most of the legacy code and implemented the database server in a more agile way, at the same time making it easier for contributors. But it did that at the expense of breaking compatibility with the past. Since MySQL gets most of its popularity from being already widely adopted and available, breaking compatibility is not a path that the project wants to walk now. I know that the developers are refactoring the code to make it more modern and manageable, but they are doing that while trying not to break anything that works well today. It's a slow process, and some may not like it. But it's a sign of continuous progress, for which the community should be grateful. As I am.

Tuesday, March 13, 2012

MySQL Sandbox at the OTN MySQL Developers day in Paris, March 21st

On March 21st I will be in Paris, to attend the OTN MySQL Developers Day. Oracle is organizing these events all over the world, and although the majority are in the US, some of them reach the good old European continent. Previous events were an all-Oracle show. Recently, the MySQL Community team has been asking for cooperation from the community, and in that capacity I am also presenting at the event, on the topic of testing early releases of MySQL in a sandbox. Of course, this is one of my favorite topics, but it is also quite appropriate in this period, when Oracle has released a whole lot of preview features in its MySQL Labs. Which is another favorite topic of mine, since I was the one who insisted on having the Labs when I was working in the community team. It's nice to see that the Labs are still in place, and being put to good use.

MySQL Sandbox

Speaking of sandboxes, I was making some quick tests yesterday, and I installed 15 sandboxes at once (all different versions, from 5.0.91 to 5.6.5). Installing a single sandbox, depending on the version, takes from 5 to 19 seconds. Do you know how long it takes to install 15 sandboxes, completely, from tarball to working condition? It takes 19 seconds. How so? It's because I have been working on a large project where we are dealing with many replicated clusters spread across three continents. Administering these clusters is a problem in itself, and so we are using tools to do our work in parallel. At the same time, using a host with a fast 16-core CPU, I can install many sandboxes at once. It's a real joy to see software behaving efficiently the way it should!

It works so fast, in fact, that I found a race condition bug. If you install more than one sandbox at once, the MySQL bootstrap process may try to open the same temporary file from two different servers. That's because I did not indicate a dedicated temporary directory for the bootstrap (I was using one only for the installed sandbox). When this happens, you may find that instead of 15 sandboxes you have installed only 9 or 11. So I fixed the bug by adding --tmpdir to mysql_install_db, and now you can install more than one sandbox in parallel.

Friday, September 16, 2011

Welcome, MySQL commercial extensions

I saw yesterday that MySQL has finally done the right thing, and announced new commercial extensions.
What this means is that paying customers receive something more than users who get the community edition for free.
Believe it or not, when I was working in the community team at MySQL, I was already an advocate of this solution. You may see a contradiction, but there isn't. I would like to explain how this works.

An open source product needs to be developed. And the developers need to get paid. Ergo, the company needs to make money from that product if it wants to continue developing it. Either that, or the company needs to sell something else to pay the bills. (Let's not get into the argument that a pure open source project with universal participation is better, faster, or more marvelous: MySQL was never that, not with Oracle, not with Sun, and not when it was an independent company. If you want an extra-super-ultra open project, go with Drizzle. With MySQL, we need to continue reasoning with the raw material at hand.)
When MySQL was scaling its business, it realized that many customers were not willing to pay for support and consulting alone. To expand the business beyond the simple offer of services, the company created MySQL Network, which soon evolved into MySQL Enterprise, with the addition of the MySQL Monitoring tools and a fast upgrade plan. This was a good proposal for small to medium customers, but not as good for customers with large installations. When you deploy thousands of MySQL servers, you really don't want to upgrade every month. Anyway, for some time, the value proposition from MySQL was that the community users would get one release twice a year, and the paying customers would get one every month.
As a community manager, I strongly objected to that formula, not only because it hurts the community, but also because it hurts customers. When the release is exposed to millions of free users before it goes to the paying customers, chances are that serious bugs are discovered by the community and fixed in due time, before they hurt a high-profile customer and need to be fixed in a hurry at a higher cost. One of the main values of MySQL is that its large community adoption and feedback increase stability. Fortunately, I was not the only one who believed that larger distribution is valuable for customers, and the decision was reversed at the end of 2008.
In that period, I and other employees recommended a different value proposition for our customers. Instead of selling fast upgrade plans (which become a liability), MySQL could develop some reserved features that would be given only to paying customers.
There are two problems with reserved features, though. The first is that you need to develop them internally: you can't start them in the open, asking the community to test them for bugs, and then give them only to customers when they are ready (there was a faux pas in that direction in early 2008, but it was promptly retracted). These features must be developed as closed source, and tested only internally. The second problem is that MySQL had little internal QA manpower when this discussion arose.
There was another issue, namely that the code base for the next intended version (the fabled 6.0) was brittle. After two years in alpha stage, there was little to show for the effort. In parallel with the Oracle acquisition, two important things happened: version 6 was abandoned, and a new effort was started, using the more stable 5.x series as a code base and a new development model based on milestones and robustness.
This new foundation, combined with the injection of experienced QA personnel from the ranks of Sun and Oracle, made the project ready to offer reserved features to customers, while continuing the development of a lot more features for the community edition.
From a community standpoint, I welcome the commercial extensions. It means that the MySQL project will get more revenues, and be able to sustain the public releases more easily. In my opinion it's the logical evolution of the project, and it's what MySQL should have done years ago, if it had had the manpower.
There are already detractors who see this release as a sign of the apocalypse. They probably choose to see only this one feature going into the commercial arena, dismissing the dozens of new features released to the general public under Oracle stewardship. I refuse to enroll in the chorus of critics who judge Oracle on prejudice. I am so far satisfied with what Oracle has done with MySQL. My opinion is based on facts, and since the facts are in favor of the community, I am pleased to say so. If future facts make me change my opinion, I will say so. But for now, I want to say thank you to the MySQL team at Oracle!

Monday, August 15, 2011

CodeBits - An event of competitive innovation

Codebits 2009 - Pedro and Rupert

It was my pleasure and privilege to attend Codebits in 2009. As Roland Bouman says, its talk selection method is based on public voting, and therefore everyone can contribute to the schedule. But that is not the main reason for attending this extraordinary event. It is not just a conference. It's an innovation fest. For a day and a half, it's a conference, where the speakers are encouraged to bring to their audience the most innovative and inspiring talks. In the afternoon of the second day, the event becomes a competition, where the registered teams have 24 hours to bring a project to completion, and they have to start and finish within the allotted time.

The project can be anything, and I have seen quite a lot of exciting stuff rolling live in the huge pavilion: I could hardly ignore robotics, as these little mechanical smurfs were running all over the place and you had to be careful not to squash them when you walked. There were plenty of occasions for planning great projects, together with attempts at improving social relations, and mixing up with big brother. There were projects based on 3D printing, and narrower projects like all-seasons keyboards. A very popular session, followed by practical workshops, was lock picking. I attended one of them, learned how to pick simple and less simple locks, and brought home some lockpicking tools. On a more technical level, I was there with Lenz Grimmer and Kai Seidler; we spoke about MySQL and other cool things, and we had lots of fun for three days. Besides the teams hacking away at their projects, there were several teams showcasing technology developed by winners of previous years, such as 3D television and intelligent phone networks. In short, this was an inspiring event, which I can warmly recommend.

Saturday, March 05, 2011

A hidden options file trick

I was listening today to the OurSQL Episode 36: It's Not Our (De)fault! Part 1. As usual, Sheeri and Sarah are very informative and entertaining while explaining the innards of MySQL and their best practices.
Being a DBA-oriented show, it had one omission: there was no mention of the custom groups that you can have in your my.cnf. This is mostly useful for developers. If your application requires some specific settings, instead of using a separate configuration file, you can use a different group, and then instruct your client applications to read that group.
By default, all client applications read the "[client]" group.
But you can tell your client to read a group that you can call whatever you like.
For example, with this configuration file,
[client]
user=common_user
password=common_password

[logrotation]
user=log_rotation_daemon
password=specific_password

You can have a Perl script that takes care of your particular log rotation needs. Instead of the normal credentials, it will use the ones listed in the [logrotation] group.
use strict;
use warnings;
use DBI;

# Read the credentials from the [logrotation] group of $HOME/.my.cnf
# (in addition to the usual [client] group).
my $dsn =   "DBI:mysql:test;"
            . "mysql_read_default_group=logrotation;"
            . "mysql_read_default_file=$ENV{HOME}/.my.cnf";
my $dbh = DBI->connect($dsn);
Notice that, for this option to work, the [logrotation] group must come after the [client] group, or the directives in the [client] group will override the ones in [logrotation]. That's why, in the options file, you find the directives for [mysqldump] at the bottom of the file.

So far, so good. This was a trick for developers, and probably many developers know it already. But there is another, related trick, that can be used by non-developers as well.
If you knew about these customized groups, you may have realized that you can't use them with the standard mysql command line client. Or, to say it better, there is no clearly documented way of doing so. There is, in fact, a cute trick that you can use.
Let's say that, from time to time, you want to use a different prompt, but you don't want to edit your $HOME/.my.cnf to change it. You just want your prompt to be there in the option file, and be able to recall it when the situation calls for it.
The mysql internal help does not say anything about groups. However, a careful search of the manual gives this cryptic entry:
  • --defaults-group-suffix=suffix, -g suffix

    In addition to the groups named on the command line, read groups that have the given suffix.
When I found it, I stared at this puzzling statement for a while. I could not understand which groups were supposed to be named on the command line.
Eventually, I figured out why there is a group-suffix and not simply a group. It means that if you add a suffix to a default group name, and you tell mysql to look for this suffix, then you will be able to use the appropriate group on demand.
For example, this options file will not work.
# wrong
[pinocchio]
prompt='I have a long nose  =======> '

[master]
prompt='master [\h] {\u} (\d) > '

[slave]
prompt='slave [\h] {\u} (\d) > '
But this one will work:
[mysqlpinocchio]
prompt='I have a long nose  =======> '

[mysqlmaster]
prompt='master [\h] {\u} (\d) > '

[mysqlslave]
prompt='slave [\h] {\u} (\d) > '

Here is a test run:

$ mysql --defaults-group-suffix=pinocchio
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.1.54 MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

I have a long nose  =======> 

The meaning of the suffix part is that mysql will read the default groups (which are [client], and [mysql]), and it will also read any groups that are named "mysqlSUFFIX" or "clientSUFFIX". I have named the group "mysqlpinocchio" and therefore it has been used. It would have worked the same if I had called it "clientpinocchio".

Monday, November 01, 2010

Book review: MySQL 5.1 plugin development

MySQL 5.1 Plugin Development,
by Sergei Golubchik and Andrew Hutchings.
Packt Publishing, 2010.
Executive summary: Highly recommended. If you want to develop MySQL extensions, buy this book. It's a must, written by two expert professionals who probably know more than anyone else on this matter. The book is full of practical examples explained with the theoretical information necessary to make it stick.

This book fills a gap in the world of MySQL documentation. Although the MySQL docs are extensive and thorough, to the point that sometimes you wish the writers were less verbose and stuck closer to the topic, when it comes to the plugin architecture there are still some unanswered questions. I guess that the ones who have implemented MySQL extensions so far have read the code more than the documentation.
But, back to the point. The void is filled now, and in such a wonderful way! I have to disclose that I was the reviewer of this book, and while this fact puts me in a conflicted position when it comes to a review, it has also given me reasons for praise beyond the experience of a regular reader. The people who have worked with me will know that I am hard to please, and so my review was peppered with suggestions for improvements that, I admit, made the authors' life quite difficult. I mostly complained that some of the examples proposed for each chapter were not useful and interesting enough. In short, there was not enough incentive for the reader to start coding immediately. I am glad to report that the authors were very responsive and, rather than being annoyed by my demands, worked enthusiastically on the examples in the book until they became a true gem in this category. This book will surprise you with its practical approach, as it guides you through the examples until you end up with a working extension.
The first chapter is probably the part that will save many hours of trial and error for everyone who wants to build a plugin. In that chapter, the authors explain how to build the different types of plugins on several operating systems. If you have ever tried to build a plugin, you will appreciate this chapter.
The book's body covers every type of plugin that you can create in MySQL, from the classic UDF to the more recent storage engine. In each chapter, you will find code ready to compile and try on your own, and a line-by-line explanation of what that code does, and why it was written the way it is.
Whenever the code is less than intuitive, the authors take care of it with additional theoretical and practical explanation, making sure that the reader is fully informed on the topic, and can complete the task effectively.
Summing up, this is not a book for the idle reader. It's a practical help for the doers, the ones who want to get the job done and need guidance through the maze of the MySQL plugin protocol.

Update. There is also a review at Slashdot.