It's been just over a year since Sun announced it had agreed to purchase MySQL, the ever popular open source database technology. At the time most people saw the move as a way for Sun to make it's way into the internet space, where MySQL has made a lot of in-roads. There was also a thought that Sun might be able to help MySQL overcome some of the technical hurdles it faced when moving into enterprise level usage.
So after one year, what has this marriage produced? Well, they finally pushed out the 5.1 release which had been stuck in development for a couple years, although it didn't include any major changes from what was on the road-map before the Sun purchase. They also pushed out a fork of the code base, which took the interesting step of removing features, rather than adding the enterprise technology many people were looking for, leaving Sun a bill for $1 Billion dollars but without an "Enterprise" database to call it's own. The irony of all of this is that, even before the MySQL purchase, Sun already had a product containing technologies similar to today's leading commercial database, it's just that the technology lives in a file system, specifically ZFS (Zettabyte Filesystem).
One of the basic tenets of a database system is that you can guarantee that data is safe on disk, and generally that any database will give you a chance to throw away changes if you need to. In the database world, you know this as COMMIT
and ROLLBACK
, common operations to most people, although missing from the MYISAM technology that Sun purchased from MySQL. In the ZFS world, while not implemented the same way as in a database, these ideas are embodied in the commands zfs snapshot
and zfs rollback
. Both of the commands work with active data partitions, and work so well that you can use them as protection in large batch command style operations against MYISAM; simply zfs snapshot
your system before hand, run your large MYISAM command, and then zfs rollback
afterwards if you find you need to go back.
Of course what good is a system, database or any other, if you cannot back it up? The back-up process for MySQL is straightforward, although it's use of LOCK TABLES
makes it a second-rate solution at best. Consider with any sufficiently large system, LOCK TABLES
will keep you from providing five nines uptime almost by definition. ZFS on the other hand gives you the ability to make backups with ease. Once you have a snapshot of your system, the ability to clone, promote, or send a snapshot gives you quite a bit of flexibility for backing up your system, and it can all be done on-line.
But it gets even better really. One of the things that many databases deal with is caching data files from the file system, using some algorithm to determine what should be kept in memory. In MySQL, the database only caches index files (data files themselves are left to the OS to handle), and it does so using a simple LRU (least recently used) cache; a caching mechanism where the least recently used data is purged whenever new data requests are made. Again, ZFS contains something more sophisticated. ZFS uses something known as an ARC (adaptive replacement cache), which improves upon the LRU idea by keeping track of not just how recently something is used, but also how frequently it is used. Again the nature of work being done makes for different specifics in implementation, but other database have looked at and implemented ARC systems and seen significant improvements over the LRU method.
And still there are other examples, take the ZFS intent log. The ZFS intent log is used by ZFS to gather systems calls in memory and log them, both for purposes of performance; system calls can be aggregated together before execution; and crash recovery; in the event of a crash, ZFS can examine the log and replay any system calls that did not finish execution. Of course those familiar with databases will recognize this approach, as it is commonly implemented as a transaction log within database systems, for much the same reasons; commits to the database can be aggregated for performance, and in the event of a system crash, the commit log can be replayed to ensure all committed transaction made it to disk. Unfortunately, MYISAM, the storage engine owned by Sun, does not get these benefits.
Now, we must say that MySQL has been around for some time, so it's users have gone through the trouble of finding workarounds to the lack of functionality we've been seeing in ZFS. Luckily Oracle provides a storage engine for MySQL, known as InnoDB, which implements much of the features discussed above. Also MySQL has simplified replication support built in, which allows for users to set up multiple copies of the database without significant effort. In fact, these techniques are encouraged, as you can use the slave database system for taking backups or for crash recovery in case of loss of the primary node. What we think is often overlooked is that here, the database, which should be a model of data integrity and robustness, gives you workarounds and tools like CHECK
and REPAIR
, while the filesystem, what you typically expect your database to protect you from, is so carefully designed in ZFS to ensure data integrity, that CHECK
/REPAIR
are unnecessary.
Unfortunately the cynic in us has to wonder if we will ever see some of the more sophisticated ideas from ZFS make their way into MySQL. After all, since the current workarounds tend to require running multiple instances, and Sun is in the business of selling hardware (either multiple servers, or servers large enough to house multiple virtual servers, take your pick), keeping things status quo creates a nice relationship between these two divisions of the company. Given that, maybe there is no irony at after all.