High-Performance JSON Parsing

written by Iftah BenZaken & Karmel Indych

RavenDB v4.0 is coming soon and we at Hibernating Rhinos are very excited about it. Over the past couple of years we have been analyzing the pain points of our software. Let’s examine an optimization we made which boosted our performance significantly.

RavenDB is a NoSQL document database, and as such we keep our entities in JSON format, a "human-readable" key-value representation.

As you can imagine, we do a lot of building and parsing of JSON. In the past, we used Newtonsoft JSON all over the place, mainly because it did the job and was simple to use.

But at what cost? Well, think of the JSON document above. There are two major issues.

The first issue is that when accessing the state property of the address object, we must parse the entire document. Here is a simplified illustration of the object in memory:


The second issue is that for such a small and simple document, we allocate three costly dictionaries. It becomes even worse when we deal with nested objects. We had a case where a document with some nested properties took 10 KB on disk (as JSON). Once parsed, it allocated over 50 KB of memory.
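To make both issues concrete, here is a minimal sketch in Python rather than in RavenDB's own code. The document is a hypothetical stand-in for the one discussed above, and `parse_and_count` is a name invented for this illustration. It uses the standard `json` module's `object_hook` to count how many dictionaries a conventional parser builds just so we can read one nested property.

```python
import json

# Hypothetical stand-in document: a root object with two nested objects.
DOCUMENT = '{"name": "A", "address": {"state": "B"}, "contact": {"phone": "C"}}'

def parse_and_count(text):
    """Parse the whole text and count the dictionaries created along the way."""
    created = 0

    def hook(obj):
        # json calls this once for every JSON object it has finished decoding.
        nonlocal created
        created += 1
        return obj

    parsed = json.loads(text, object_hook=hook)
    return parsed, created

parsed, created = parse_and_count(DOCUMENT)
# Reading a single property still required decoding every object in the text.
state = parsed["address"]["state"]
```

Even though only `state` is needed, the parser has to read the full text and materialize one dictionary per object, nested ones included.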

So we have come up with a new data format to solve these issues – the Blittable format. It is designed for fast access, even to nested properties. It is also much more efficient in terms of memory usage (unmanaged memory).

This design is based on an offset-based header-body structure: for each block we store the offsets of its members, which lets us walk the block using pointer arithmetic rather than loading the data into a hierarchy of dictionaries.
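The idea can be sketched with a toy binary layout in Python. This is not the actual Blittable format: the type tags, the entry layout, and the `encode` and `get` helpers are all assumptions made up for this illustration. Each object is written as its members followed by a header of offsets, and a lookup only follows offsets instead of building dictionaries.

```python
import struct

T_INT, T_STR, T_OBJ = 0, 1, 2          # hypothetical type tags
U32 = struct.Struct("<I")              # little-endian unsigned 32-bit
ENTRY = struct.Struct("<IIB")          # key offset, value offset, type tag

def _write_bytes(buf, data):
    """Append a length-prefixed byte string and return its offset."""
    pos = len(buf)
    buf += U32.pack(len(data)) + data
    return pos

def _write_object(buf, obj):
    """Write the members, then a header of offsets; return the header offset."""
    entries = []
    for key, value in obj.items():
        key_pos = _write_bytes(buf, key.encode("utf-8"))
        if isinstance(value, dict):
            entries.append((key_pos, _write_object(buf, value), T_OBJ))
        elif isinstance(value, int):
            entries.append((key_pos, len(buf), T_INT))
            buf += struct.pack("<q", value)
        elif isinstance(value, str):
            entries.append((key_pos, _write_bytes(buf, value.encode("utf-8")), T_STR))
        else:
            raise TypeError(f"unsupported value type: {type(value).__name__}")
    header_pos = len(buf)
    buf += U32.pack(len(entries))
    for entry in entries:
        buf += ENTRY.pack(*entry)
    return header_pos

def encode(obj):
    """Serialize a dict; the last 4 bytes hold the offset of the root header."""
    buf = bytearray()
    root = _write_object(buf, obj)
    buf += U32.pack(root)
    return bytes(buf)

def _read_bytes(buf, pos):
    (length,) = U32.unpack_from(buf, pos)
    return buf[pos + 4 : pos + 4 + length]

def get(buf, first, *rest):
    """Follow a property path by jumping between offsets, decoding nothing else."""
    path = (first, *rest)
    (block,) = U32.unpack_from(buf, len(buf) - 4)
    for depth, name in enumerate(path):
        wanted = name.encode("utf-8")
        (count,) = U32.unpack_from(buf, block)
        for i in range(count):
            key_pos, val_pos, tag = ENTRY.unpack_from(buf, block + 4 + i * ENTRY.size)
            if _read_bytes(buf, key_pos) == wanted:
                break
        else:
            raise KeyError(name)
        if depth == len(path) - 1:
            if tag == T_INT:
                return struct.unpack_from("<q", buf, val_pos)[0]
            if tag == T_STR:
                return _read_bytes(buf, val_pos).decode("utf-8")
            return val_pos                 # nested object: offset of its header
        if tag != T_OBJ:
            raise KeyError(name)
        block = val_pos                    # descend into the nested block

data = encode({"name": "A", "address": {"state": "B", "zip": 12345}})
state = get(data, "address", "state")
```

Reading `address.state` here touches only the two headers on the path and the bytes of the value itself. A real format would add refinements this sketch leaves out, such as sorted or shared property names and compact offset sizes.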

The biggest performance impact on RavenDB is related to indexing. To find a document, we first need to index it, so the speed of accessing document properties directly affects the speed of indexing.

As you can see, we achieved a 100x speedup when loading and indexing 18,000 documents.

You can read about the full impact on the system, and more implementation details in this blog series.