TokuMX
TokuMX is an open-source distribution of MongoDB[2] which, among other things, replaces the default B-tree data structure found in the basic MongoDB distribution with a Fractal Tree index. It is a drop-in replacement for MongoDB (applications will run "as is") that offers the scalability and performance improvements associated with Fractal Tree indexing. It also adds support for document-level locking, transaction support with ACID and MVCC, and replication optimization; it does not support full-text search.
Developer(s) | Tokutek |
---|---|
Stable release | 2.0.0
/ 30 September 2014 |
Repository | |
Type | Database |
License | GNU Affero General Public License (version 3)[1] |
Website | www |
TokuMX is specifically designed for high performance on write-intensive workloads. It achieves this using a Fractal Tree index,[3] which replaces 40-year-old B-tree indexing and is based on cache-oblivious algorithms. This approach to building memory-efficient systems was originally jointly developed by researchers at the Massachusetts Institute of Technology,[4] Rutgers University,[5] and the State University of New York at Stony Brook (SUNY).[6] TokuMX is a scalable, ACID, and MVCC compliant distribution of MongoDB that provides indexing-based query improvements, offers online schema modifications, and reduces slave lag for both hard disk drives and flash memory. It also adds transactions with MVCC and ACID reliability to any MongoDB application, making MongoDB suitable for a much wider range of solutions.[7]
Most TokuMX source files are made available under the terms of the GNU Affero General Public License (AGPL). The TokuKV Fractal Tree Indexing library is made available under the terms of the GNU General Public License (GPL) version 2, with an additional grant of a patent license.
B-trees
Most relational databases use indexes to increase query performance. Databases can leverage indexes to significantly reduce the amount of data they examine while responding to queries. Indexes are commonly implemented with B-trees, a data structure first described in 1970. The B-tree data structure allows for operations like inserting data and sorted order iteration, the primary operation used by an index. Depending on the workload and implementation, B-tree performance can be limited by the random I/O characteristics of disks. In addition, while freshly loaded databases tend to have good sequential behavior, this behavior becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges.
With the advent of Big Data and the ever-increasing database needs of the 21st century, many niche databases have been created to get around the limitations of 50-year-old B-tree indexing. These include some optimized for reads, some optimized for writes, and a series of other special-purpose databases designed for a narrow problem set.[8]
Fractal tree indexes
Overview
Fractal tree indexing technology is a new approach to indexing that replaces B-trees.
Fractal tree indexes implement the same operations as a B-tree, and thus are a drop-in replacement for B-trees. Fractal tree indexes effectively replaces small, frequent writes with larger, less frequent ones, which enables better compression and insertion performance.[9][10] Fractal Trees also allow for messages to be injected into the tree in such a fashion that schema changes such as adding or dropping a column, or adding an index, can be done online, and in the background.[11] As a result, more indexes can be maintained without a drop in performance. This is because adding data to indexes tends to stress the performance of B-trees, but performs well in Fractal Tree indexes.[12] Fractal tree index modifications do not cause the database files to fragment, so periodic maintenance to compactify files is not needed.[13]
Uses
Fractal tree indexes can be applied to a number of applications characterized by near-real time analysis of streaming data. They can be used as the storage layer of a database or as the storage layer of a file system. When used in a database, they can be used in any setting where a B-tree is used, with improved performance. Examples include: network event management, online advertising networks, web 2.0 and clickstream analytics, and air traffic control management.[14] Other uses include accelerated crawler performance for search engines for social media sites. It can also be used to create indexes and columns online, enabling query flexibility for ecommerce personalization. It is also suited to improving performance and reducing existing loads on transactional websites. In general it performs well in applications that must simultaneously store log file data and execute ad hoc queries.
See also
References
- "TokuMX README". Retrieved 2014-03-19.
- "TokuMX – High-Performance MongoDB Distribution". Tokutek. Retrieved 2014-03-10.
- "How TokuDB Fractal Tree Databases Work". O'Reilly. Retrieved 2011-01-17.
- "Cache-Oblivious Search Trees Project". Massachusetts Institute of Technology. Retrieved 2011-01-17.
- "Cache-Oblivious B-trees" (PDF). Rutgers University. Retrieved 2011-01-17.
- "Cache Oblivious B-trees". State University of New York (SUNY) at Stony Brook. Retrieved 2011-01-17.
- "TokuMX is MongoDB on steroids". Percona. Retrieved 2014-04-30.
- "Cache Oblivious B-trees". State University of New York (SUNY) at Stony Brook. Retrieved 2011-01-17.
- "TokuMX VS MongoDB Bake Off Based on a Primary AOL Use case" (PDF). Meetup / AOL. Retrieved 2014-04-30.
- "Insert benchmark for InnoDB, MongoDB and TokuMX and flash storage". Retrieved 2014-04-30.
- "Covering Indexes: Orders-of-Magnitude Improvements" (PDF). Percona. Retrieved 2011-01-17.
- "Detailed review of Tokutek storage engine". Percona. Retrieved 2012-02-22.
- "NoSQL Battle of the East Coast - Benchmarking MongoDB vs TokuMX Cluster". SeveralNines. Retrieved 2014-04-30.
- "Air traffic queries in MyISAM and Tokutek (TokuDB)". MySQL Performance Blog. Retrieved 2011-01-17.