ToroDB: a new NoSQL database that replaces MongoDB
Document databases store documents, which are essentially hierarchical, nested structures of key-value pairs. Current state-of-the-art approaches to storing them in relational databases are limited to some form of binary serialization of the whole document (such as a blob, or PostgreSQL's hstore or jsonb). Our research produced a set of algorithms that transform a document into a set of document-parts, each of which can be stored in its own relational table, leveraging the power of relational databases. This includes creating tables dynamically, when needed, so that each table's structure matches that of the information to be stored.
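To make the idea of document-parts concrete, here is a minimal Python sketch. It is not ToroDB's actual algorithm: the function names (split_document, table_signature, group_into_tables) are hypothetical, arrays are treated as opaque values, and a "table" is assumed to be identified by the nesting path plus the names and types of its scalar fields.

```python
from collections import defaultdict


def split_document(doc_id, doc, path="", parts=None):
    """Split a nested dict into flat parts, one per nesting level.

    Hypothetical helper: each part keeps only the non-dict values found
    at that level, plus the document id and the path leading to it.
    """
    if parts is None:
        parts = []
    scalars = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            # Nested sub-document: it becomes its own part.
            child_path = f"{path}.{key}" if path else key
            split_document(doc_id, value, child_path, parts)
        else:
            # Assumption: lists and other values are stored as-is.
            scalars[key] = value
    parts.append({"doc_id": doc_id, "path": path, "fields": scalars})
    return parts


def table_signature(part):
    """Parts with the same path, column names and types share a table."""
    columns = sorted((k, type(v).__name__) for k, v in part["fields"].items())
    return (part["path"], tuple(columns))


def group_into_tables(docs):
    """Group every part of every document by its table signature."""
    tables = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for part in split_document(doc_id, doc):
            # A signature seen for the first time stands in for
            # creating a new table on demand.
            tables[table_signature(part)].append(part)
    return dict(tables)
```

In this sketch, three documents that all carry a string "name" at the top level, two of which also carry an "address" sub-document with a string "city", end up in two groups: one for the root level and one for the "address" path. A real implementation would also need to handle arrays, type conflicts and the mapping to actual SQL tables.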
The advantages of this approach are profound. No engineering effort is required to build the storage subsystem, which must handle durability, isolation and concurrency, all of which are hard properties to implement. Even more importantly, there are very significant performance advantages, both in query time and in storage savings.
Query time improves because queries that target subsets of the documents (which is most queries) need to address only a subset of the data, since it is partitioned into tables, rather than reading the whole database. Storage savings come from avoiding repetition of the schema in every document: many documents share the same schema (“structure”), yet each of them has to repeat it. Our benchmarks show that JSON documents in ToroDB require 29% to 68% of the storage required for the same data in a MongoDB database. This means significantly less I/O, significantly lower cost, and greater (vertical) scalability.
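The schema-repetition argument can be illustrated with a toy calculation. The Python sketch below only counts the bytes spent on key names for flat documents, under the assumption that each distinct structure is stored once; the function names are hypothetical, and this is neither ToroDB's storage format nor the benchmark behind the figures above.

```python
def repeated_schema_bytes(docs):
    """Bytes spent on key names when every document carries its own keys."""
    return sum(len(key.encode("utf-8")) for doc in docs for key in doc)


def shared_schema_bytes(docs):
    """Bytes spent on key names when each distinct structure is stored once."""
    # Assumption: a structure is just the sorted set of key names.
    structures = {tuple(sorted(doc)) for doc in docs}
    return sum(len(key.encode("utf-8")) for s in structures for key in s)
```

For 1,000 flat documents that all have the keys "name" and "city", the first function counts 8 bytes of key names per document, while the second counts those 8 bytes a single time. Actual savings depend on value sizes, nesting and the storage engine, so this only shows the direction of the effect.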
This presentation aims to show how the internal algorithms of this open source software, ToroDB, work: how JSON documents are split into tables, and how this is more efficient in terms of both query time and storage savings. It also explains why current document-oriented databases fail to maximize performance for Big Data requirements; ToroDB includes a mechanism for storing parts of the documents in columnar format to improve aggregate-type queries, with impressive performance benefits. Finally, it shows how all of this can be done in a way that is compatible with existing systems: ToroDB includes a layer that natively speaks the MongoDB protocol, making it a drop-in replacement for MongoDB installations while running on top of existing relational databases.