JSON and BSON are close cousins, as their nearly identical names imply, but you wouldn’t know it by looking at them side-by-side. JSON, or JavaScript Object Notation, is the wildly popular standard for data interchange on the web, on which BSON (Binary JSON) is based. We’ll take a look at each, and hopefully throw some light on the JSON vs BSON mystery: what’s the difference, and why does it matter?

  1. What is JSON?

  2. The MongoDB JSON Connection

  3. MongoDB: JSON vs BSON

What is JSON?

JavaScript Object Notation, more commonly known as JSON, was derived from the object literal syntax of the JavaScript language and popularized in the early 2000s by Douglas Crockford, though it wasn’t until 2013 that the format was formally standardized (as ECMA-404).

JavaScript objects are simple associative containers, wherein a string key is mapped to a value (which can be a number, string, function, or even another object). This simple language trait allowed JavaScript objects to be represented remarkably simply in text:

{
  "_id": 1,
  "name" : { "first" : "John", "last" : "Backus" },
  "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
  "awards" : [
    {
      "award" : "W.W. McDowell Award",
      "year" : 1967,
      "by" : "IEEE Computer Society"
    }, {
      "award" : "Draper Prize",
      "year" : 1993,
      "by" : "National Academy of Engineering"
    }
  ]
}
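The simplicity of this mapping is what makes JSON easy to support in other languages. As a minimal sketch, Python's standard `json` module turns a trimmed-down copy of the document above into native dictionaries and lists, and back into text. The variable names here are illustrative only.

```python
import json

# A shortened copy of the example document, as JSON text.
text = """
{
  "_id": 1,
  "name": {"first": "John", "last": "Backus"},
  "contribs": ["Fortran", "ALGOL", "Backus-Naur Form", "FP"]
}
"""

# Parse the text into native Python objects (dict, list, str, int).
person = json.loads(text)
first_name = person["name"]["first"]
contrib_count = len(person["contribs"])

# Serialize the same structure back to compact JSON text.
compact = json.dumps(person, separators=(",", ":"))
```

Parsing the compact string again yields a structure equal to `person`, which is the round trip that most JSON-based APIs rely on.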

As JavaScript became the default language of client-side web development, JSON began to take on a life of its own. By virtue of being both human- and machine-readable, and comparatively simple to implement support for in other languages, JSON quickly moved beyond the web page, and into software everywhere.

JSON shows up in many different cases:

  • APIs
  • Configuration files
  • Log messages
  • Database storage

JSON quickly overtook XML, which is more difficult for a human to read, significantly more verbose, and less ideally suited to representing the object structures used in modern programming languages.

The MongoDB JSON Connection

MongoDB was designed from its inception to be the ultimate data platform for modern application development. JSON’s ubiquity made it the obvious choice for representing data structures in MongoDB’s innovative document data model.

However, there are several issues that make JSON less than ideal for usage inside of a database.

  1. JSON is a text-based format, and text parsing is very slow

  2. JSON’s readable format is far from space-efficient, another database concern

  3. JSON only supports a limited number of basic data types
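The second and third points can be illustrated with a small sketch using only the Python standard library. It compares the size of a number stored as JSON text with the same number in a fixed-width binary form, and shows that the standard JSON encoder has no representation for a date. The sample value is arbitrary and chosen to fit in 32 bits.

```python
import json
import struct
from datetime import datetime, timezone

value = 1234567890  # arbitrary sample that fits in a signed 32-bit integer

# As JSON text, the number costs one byte per digit.
as_text = json.dumps(value).encode("utf-8")

# As a fixed-width little-endian int32, it always costs four bytes.
as_int32 = struct.pack("<i", value)

# JSON has no date type, so the standard encoder rejects a datetime.
try:
    json.dumps({"created": datetime(2020, 1, 1, tzinfo=timezone.utc)})
    date_supported = True
except TypeError:
    date_supported = False
```

The size advantage is not universal: a single-digit number is smaller as text than as a four-byte integer, so the gain depends on the data.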

In order to make MongoDB JSON-first, but still high-performance and general-purpose, BSON was invented to bridge the gap: a binary representation to store data in JSON format, optimized for speed, space, and flexibility. In terms of approach, it’s not dissimilar from other binary interchange formats such as Protocol Buffers or Thrift.

What is BSON?

BSON simply stands for “Binary JSON,” and that’s exactly what it was invented to be. BSON’s binary structure encodes type and length information, which allows it to be parsed much more quickly.
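To see why length information helps, consider the following sketch of a scanner that lists the top-level fields of a BSON document without decoding any values. It handles only a handful of element types, and the function name `top_level_fields` is hypothetical; a real driver does considerably more.

```python
import struct

# Value sizes for a few fixed-width BSON types:
# double, boolean, UTC datetime, null, int32, int64.
FIXED_SIZES = {0x01: 8, 0x08: 1, 0x09: 8, 0x0A: 0, 0x10: 4, 0x12: 8}

def top_level_fields(data: bytes):
    """Return (name, type_byte) pairs, skipping over each value unparsed."""
    (total,) = struct.unpack_from("<i", data, 0)
    if total != len(data):
        raise ValueError("length prefix does not match buffer size")
    fields = []
    pos = 4
    while data[pos] != 0x00:  # 0x00 marks the end of the document
        type_byte = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if type_byte in FIXED_SIZES:
            pos += FIXED_SIZES[type_byte]
        elif type_byte == 0x02:  # string: int32 length, then that many bytes
            (size,) = struct.unpack_from("<i", data, pos)
            pos += 4 + size
        elif type_byte in (0x03, 0x04):  # document or array: length includes itself
            (size,) = struct.unpack_from("<i", data, pos)
            pos += size
        else:
            raise ValueError(f"type 0x{type_byte:02x} not handled in this sketch")
        fields.append((name, type_byte))
    return fields

# BSON bytes for {"BSON": ["awesome", 5.05, 1986]}
doc = (
    b"\x31\x00\x00\x00\x04BSON\x00\x26\x00\x00\x00"
    b"\x02\x30\x00\x08\x00\x00\x00awesome\x00"
    b"\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40"
    b"\x10\x32\x00\xc2\x07\x00\x00\x00\x00"
)
fields = top_level_fields(doc)
```

The whole 38-byte array is skipped with a single addition to `pos`, whereas a JSON parser would have to read every character to find the closing bracket.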

Since its initial formulation, BSON has been extended to add some optional non-JSON-native data types, like dates and binary data, without which MongoDB would have been missing some valuable support.

Languages that support any kind of complex mathematics typically have different sized integers (ints vs longs) or various levels of decimal precision (float, double, decimal128, etc.).

Not only is it helpful to be able to represent those distinctions in data stored in MongoDB, it also allows for comparisons and calculations to happen directly on data in ways that simplify consuming application code.
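As a rough illustration of these distinctions, Python's `struct` module can pack the same kinds of fixed-width values that BSON distinguishes: a 32-bit integer, a 64-bit integer, and a double. This is a sketch of the underlying byte layouts, not of any driver API.

```python
import struct

int32_bytes = struct.pack("<i", 1986)     # 4-byte signed integer, little-endian
int64_bytes = struct.pack("<q", 2 ** 40)  # 8-byte signed integer ("long")
double_bytes = struct.pack("<d", 5.05)    # 8-byte IEEE 754 double

# A value outside the 32-bit range cannot be packed as an int32,
# which is why a separate 64-bit type is needed.
try:
    struct.pack("<i", 2 ** 40)
    fits_in_int32 = True
except struct.error:
    fits_in_int32 = False
```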

Does MongoDB use BSON, or JSON?

MongoDB stores data in BSON format both internally, and over the network, but that doesn’t mean you can’t think of MongoDB as a JSON database. Anything you can represent in JSON can be natively stored in MongoDB, and retrieved just as easily in JSON.

The following are some example documents (in JavaScript / Python style syntax) and their corresponding BSON representations.

{"hello": "world"} →
\x16\x00\x00\x00           // total document size
\x02                       // 0x02 = type String
hello\x00                  // field name
\x06\x00\x00\x00world\x00  // field value
\x00                       // 0x00 = type EOO ('end of object')

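{"BSON": ["awesome", 5.05, 1986]} →
\x31\x00\x00\x00                              // total document size (49 bytes)
\x04BSON\x00                                  // 0x04 = type Array, then field name
\x26\x00\x00\x00                              // size of the embedded array document (38 bytes)
\x02\x30\x00\x08\x00\x00\x00awesome\x00       // 0x02 = String, key "0", length 8, value
\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40  // 0x01 = Double, key "1", value 5.05
\x10\x32\x00\xc2\x07\x00\x00                  // 0x10 = 32-bit integer, key "2", value 1986
\x00                                          // end of the array document
\x00                                          // end of the outer document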
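The byte layouts above can be reproduced with a short encoder. The following is a minimal sketch covering only a few types (boolean, double, string, embedded document, array, null, and 32/64-bit integers); the function names are hypothetical, and a production application would rely on its driver's BSON library instead.

```python
import struct

def encode_cstring(name: str) -> bytes:
    """Field names are UTF-8 bytes followed by a NUL terminator."""
    return name.encode("utf-8") + b"\x00"

def encode_element(name: str, value) -> bytes:
    """Encode one field as: type byte, field name, value bytes."""
    key = encode_cstring(name)
    if isinstance(value, bool):  # must be checked before int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(raw)) + raw
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    if isinstance(value, list):
        # Arrays are stored as documents keyed by "0", "1", "2", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(as_doc)
    if value is None:
        return b"\x0a" + key
    if isinstance(value, int):
        if -2 ** 31 <= value < 2 ** 31:
            return b"\x10" + key + struct.pack("<i", value)
        return b"\x12" + key + struct.pack("<q", value)
    raise TypeError(f"type {type(value).__name__} not handled in this sketch")

def encode_document(doc: dict) -> bytes:
    """Length prefix (counting itself), elements, then a terminating NUL."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

hello_bytes = encode_document({"hello": "world"})
array_bytes = encode_document({"BSON": ["awesome", 5.05, 1986]})
```

Here `hello_bytes` is 22 bytes long and `array_bytes` is 49 bytes long, matching the `\x16` and `\x31` length prefixes in the listings above.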

Unlike systems that simply store JSON as string-encoded values, or binary-encoded blobs, MongoDB uses BSON to offer the industry’s most powerful indexing and querying features on top of the web’s most usable data format.

For example, MongoDB allows developers to query and manipulate objects by specific keys inside the JSON/BSON document, even in nested documents many layers deep into a record, and create high performance indexes on those same keys and values.
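Nested keys are usually addressed with dot notation, such as `name.last` or `awards.1.year`. The helper below is a plain-Python illustration of what such a path refers to; `get_by_path` is a hypothetical name, and this is not how MongoDB evaluates queries internally.

```python
def get_by_path(document, path):
    """Follow a dotted path through nested dicts and lists."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric parts index into arrays
        else:
            current = current[part]
    return current

person = {
    "_id": 1,
    "name": {"first": "John", "last": "Backus"},
    "awards": [
        {"award": "W.W. McDowell Award", "year": 1967},
        {"award": "Draper Prize", "year": 1993},
    ],
}

last_name = get_by_path(person, "name.last")
second_award_year = get_by_path(person, "awards.1.year")

# Assuming a PyMongo collection handle named `collection` (not defined here),
# a comparable query and index would look roughly like:
#   collection.find({"name.last": "Backus"})
#   collection.create_index("name.last")
```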

When using a MongoDB driver in your language of choice, it’s still important to know that you’re accessing BSON data through the abstractions available in that language.

First, BSON objects may contain Date or Binary objects that are not natively representable in pure JSON. Second, each programming language has its own object semantics. BSON documents preserve the order of their keys, for instance, while the JSON specification treats an object as an unordered collection, and not every language’s native map type guarantees ordering (Python dictionaries, the closest native analogue to JavaScript objects, only guarantee insertion order as of Python 3.7). Differences in numeric and string data types can also come into play. Third, BSON supports a variety of numeric types that are not native to JSON, and each language will represent these differently.
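Dates are a good example of this translation. BSON stores a date as a 64-bit count of milliseconds since the Unix epoch, and the driver maps that to whatever date type the language offers. The following sketch shows one way that conversion could look in Python; the helper names are hypothetical, and real drivers may differ in details such as time zone handling.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_millis(moment: datetime) -> int:
    """Whole milliseconds since the Unix epoch (sub-millisecond detail is dropped)."""
    return (moment - EPOCH) // timedelta(milliseconds=1)

def from_millis(millis: int) -> datetime:
    """Rebuild a timezone-aware UTC datetime from a millisecond count."""
    return EPOCH + timedelta(milliseconds=millis)

moment = datetime(2020, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc)

# The value as it would sit in a BSON date field: a little-endian int64.
payload = struct.pack("<q", to_millis(moment))

restored = from_millis(struct.unpack("<q", payload)[0])
```

Because the stored resolution is milliseconds, a Python `datetime` with microsecond precision would not survive the round trip unchanged.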

Check your driver documentation to make sure you understand how to best access MongoDB BSON-backed data in your language to avoid confusion, and get the most out of your MongoDB experience.

JSON vs BSON

                JSON                              BSON
Encoding        UTF-8 String                      Binary
Data Support    String, Boolean, Number, Array    String, Boolean, Number (Integer, Float, Long, Decimal128...), Array, Date, Raw Binary
Readability     Human and Machine                 Machine Only

JSON and BSON are indeed close cousins by design. BSON is designed as a binary representation of JSON data, with specific extensions for broader applications, and optimized for data storage and retrieval.

One particular way in which BSON differs from JSON is in its support for some more advanced types of data. JavaScript’s Number type does not, for instance, differentiate between integers (whole numbers) and floating-point numbers (which carry a fractional part at varying levels of precision).

Most server-side programming languages have more sophisticated numeric types (commonly 32-bit and 64-bit integers, single-precision floating point aka “float”, and double-precision floating point aka “double”), each with its own optimal usage for efficient mathematical operations.

Schema Flexibility and Data Governance

One of the big attractions for developers using databases with JSON and BSON data models is the dynamic and flexible schema they provide when compared to the rigid, tabular data models used by relational databases.

Firstly, JSON documents are polymorphic: fields can vary from document to document within a single collection (analogous to a table in a relational database). Documents make modeling diverse record attributes easy for developers, elegantly handling data of any structure.

Secondly, there is no need to declare the structure of documents to the database – documents are self-describing. Developers can start writing code and persist objects as they are created.

Thirdly, if a new field needs to be added to a document, it can be created without affecting all other documents in the collection, without updating a central system catalog and without taking the database offline. When you need to make changes to the data model, the document database continues to store the updated objects without the need to perform costly ALTER TABLE operations – or worse, having to redesign the schema from scratch.
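The three points above can be sketched with plain Python dictionaries standing in for documents in a collection. The records and field names are illustrative only, and no database is involved.

```python
# Two documents in the same "collection" with different sets of fields.
people = [
    {"_id": 1, "name": {"first": "John", "last": "Backus"},
     "contribs": ["Fortran", "ALGOL"]},
    {"_id": 2, "name": {"first": "Ada", "last": "Example"}},  # hypothetical record
]

# Adding a new field to one document leaves the others untouched.
people[1]["languages"] = ["Python"]

# Consuming code handles a missing field with a default instead of a schema change.
contrib_counts = [len(person.get("contribs", [])) for person in people]
```

The trade-off is that readers of the data must tolerate absent fields, which is where schema validation can help.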

Through these advantages, the flexibility of the document data model is well suited to the demands of modern application development practices.

While a flexible schema is a powerful feature, there are situations where you might want more control over the data structure and content of your documents. Most document databases push enforcement of these controls back to the developer to implement in application code. However, more advanced document databases provide schema validation, using approaches such as the IETF JSON Schema standard adopted by MongoDB.
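As a sketch of what such a rule might look like, the following Python dictionary follows the shape of a MongoDB `$jsonSchema` validator for the example document used earlier. The `missing_required` helper is a hypothetical, much-simplified stand-in that only checks top-level required fields; actual enforcement happens in the database.

```python
person_schema = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "contribs"],
        "properties": {
            "name": {"bsonType": "object", "required": ["first", "last"]},
            "contribs": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
    }
}

def missing_required(document, schema):
    """Return the top-level required fields that the document lacks."""
    required = schema["$jsonSchema"].get("required", [])
    return [field for field in required if field not in document]

incomplete = {"_id": 3, "name": {"first": "Grace", "last": "Hopper"}}
problems = missing_required(incomplete, person_schema)

# Assuming a PyMongo database handle named `db` (not defined here), the schema
# could be attached when creating a collection, roughly like:
#   db.create_collection("people", validator=person_schema)
```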
