Twitter Open Sources its MySQL Fork
In keeping with the open source ideals of sharing knowledge and innovation, Twitter today open sourced its fork of the MySQL database and released it on GitHub under the new BSD license. Twitter uses the fork to store its interest graph, timelines, user data and of course the Tweets themselves (i.e. pretty much all of Twitter). MySQL itself, released under the GNU General Public License and owned by Oracle Corp, is probably the most popular RDBMS in the world, used by millions of web developers on projects large and small.
However, MySQL in its default form is known to have scalability and performance issues under continuous high transaction rates, as experienced and documented by large high-traffic web projects such as those run by Google, Facebook and of course Twitter. That makes this exciting news for developers working on similar MySQL scaling problems: a private, for-profit company like Twitter has released its full internal customized MySQL fork with no vendor licenses, support contracts or lock-in attached. Like Google and Facebook, Twitter runs very high-traffic, continuously growing, heavily database-centric web applications whose transaction rates keep pushing MySQL to higher and higher performance requirements.
Some of the work done, as described by Twitter, includes:
-Add additional status variables, particularly from the internals of InnoDB. This allows us to monitor our systems more effectively and understand their behavior better when handling production workloads.
-Optimize memory allocation on large NUMA systems: Allocate InnoDB’s buffer pool fully on startup, fail fast if memory is not available, ensure performance over time even when server is under memory pressure.
-Reduce unnecessary work through improved server-side statement timeout support. This allows the server to proactively cancel queries that run longer than a millisecond-granularity timeout.
-Export and restore InnoDB buffer pool using a safe and lightweight method. This enables us to build tools to support rolling restarts of our services with minimal pain.
-Optimize MySQL for SSD-based machines, including page-flushing behavior and reduction in writes to disk to improve lifespan.
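To give a feel for why extra status variables matter for monitoring, here is a minimal Python sketch of how a monitoring tool might turn two snapshots of server counters into rates. It is an illustration only, not Twitter's tooling: the function names and sample numbers are made up, and the counters shown (Questions, Innodb_buffer_pool_read_requests, Innodb_buffer_pool_reads) are ones that already exist in stock MySQL rather than the new ones added by the fork. The snapshots are assumed to have been collected elsewhere, for example from SHOW GLOBAL STATUS.

```python
def counter_rates(previous, current, interval_seconds):
    """Return the per-second rate of change for counters present in both snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, new_value in current.items():
        if name not in previous:
            continue
        delta = new_value - previous[name]
        # A negative delta usually means the server restarted and the
        # counter was reset, so skip it rather than report nonsense.
        if delta >= 0:
            rates[name] = delta / interval_seconds
    return rates


def buffer_pool_miss_ratio(previous, current):
    """Fraction of logical reads in the interval that had to go to disk."""
    requests = (current["Innodb_buffer_pool_read_requests"]
                - previous["Innodb_buffer_pool_read_requests"])
    disk_reads = (current["Innodb_buffer_pool_reads"]
                  - previous["Innodb_buffer_pool_reads"])
    if requests <= 0:
        return None  # no read activity (or a counter reset) in this interval
    return disk_reads / requests


# Hypothetical snapshots taken 60 seconds apart (values are invented).
before = {
    "Questions": 20_000,
    "Innodb_buffer_pool_read_requests": 1_000_000,
    "Innodb_buffer_pool_reads": 5_000,
}
after = {
    "Questions": 26_000,
    "Innodb_buffer_pool_read_requests": 1_600_000,
    "Innodb_buffer_pool_reads": 8_000,
}

rates = counter_rates(before, after, 60)
miss_ratio = buffer_pool_miss_ratio(before, after)
```

The more internal counters the server exposes, the more ratios like this a team can track under production load, which is presumably the motivation behind the first item in the list.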
As it so happens, within a few days of this same date last year, Twitter had also open sourced some of its MySQL middleware components. These include Gizzard, which it uses for creating distributed databases that serve tens of thousands of queries per second across distributed data, and of course FlockDB, which it uses to store Twitter users and the relationships between them.
So it would seem Twitter’s underlying MySQL implementation and key internal development tools are slowly coming together and being shared with the open source community as a whole, which is always a great thing to see. One only hopes more companies will take such initiatives. Some wonder whether Twitter is still holding anything crucial back, but so far it appears to be sharing with the altruistic goal of collaborating with others solving MySQL high-transaction and scaling problems, including experienced companies such as Google, Facebook, Percona, MariaDB, Tokutek, ScaleBase etc.
All in all, this is surely great news for high performance MySQL developers. Hopefully it helps the scene head towards a little more transparency and democracy in enterprise-level database options, and drives down the price of enterprise lock-ins like Oracle DB and others. Hopefully it also helps future-proof MySQL and continues to evolve it to new levels, eventually providing vital upstream commits that improve MySQL for everyone.