BSON and Golang Interfaces

This weekend I decided to implement BSON.

http://bsonspec.org/

BSON is just a binary representation of JSON with some extra types and traversal speed improvements.

Traversal speed is important for rapidly scanning a group of BSON objects (called Documents) for specific pieces of information.

Let’s imagine you had a list of JSON objects like the following, and you were searching for the information under the key “secret”.

{ "aaaaaaaaaaaaaa…..aaaaaaaaaaaa": "world", "secret": "IMPORTANT INFO" }
{ "aaaaaa": 123, "secret": "more important info" }

A scanner has to read each character one at a time, since it doesn’t know where the ‘a’s end. If that run of ‘a’s were 10 megabytes long, we’d waste time reading through all of it every time we wanted the value of ‘secret’.

BSON makes this simpler by prefixing string values and embedded documents with their length, so you can jump through the Document for much quicker scanning. Each element also starts with a type byte, so BSON leaves little clues about the contents of each element and you can jump around without having to scan the entire contents.
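As a rough sketch of that jumping, the Go below walks the top-level elements of a hand-encoded document and skips every value it isn't interested in. The helper names (valueSize, findValue) are made up for illustration and only a few BSON types are handled. Note that element names are NUL-terminated cstrings in the spec, so names are still read byte by byte; it's the values that get skipped in one step.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// valueSize reports how many bytes an element's value occupies, using only
// the type byte and (where present) the length prefix.
// Hypothetical helper: only a handful of BSON types are covered.
func valueSize(typ byte, rest []byte) (int, bool) {
	switch typ {
	case 0x01, 0x12: // double, int64
		return 8, len(rest) >= 8
	case 0x10: // int32
		return 4, len(rest) >= 4
	case 0x02: // string: int32 length, then that many bytes (incl. trailing NUL)
		if len(rest) < 4 {
			return 0, false
		}
		n := 4 + int(binary.LittleEndian.Uint32(rest))
		return n, n <= len(rest)
	case 0x03: // embedded document: int32 total size, counting the prefix itself
		if len(rest) < 4 {
			return 0, false
		}
		n := int(binary.LittleEndian.Uint32(rest))
		return n, n >= 5 && n <= len(rest)
	}
	return 0, false
}

// findValue returns the raw value bytes stored under key, jumping over
// every other value without looking inside it.
func findValue(doc []byte, key string) ([]byte, bool) {
	if len(doc) < 5 {
		return nil, false
	}
	pos := 4 // skip the document's own int32 size prefix
	for pos < len(doc) && doc[pos] != 0x00 {
		typ := doc[pos]
		pos++
		end := bytes.IndexByte(doc[pos:], 0x00) // names are NUL-terminated
		if end < 0 {
			return nil, false
		}
		name := string(doc[pos : pos+end])
		pos += end + 1
		size, ok := valueSize(typ, doc[pos:])
		if !ok {
			return nil, false
		}
		if name == key {
			return doc[pos : pos+size], true
		}
		pos += size // jump over the value in one step
	}
	return nil, false
}

func main() {
	// {"hello": "world", "secret": "hi"} encoded by hand.
	doc := []byte("\x25\x00\x00\x00" +
		"\x02hello\x00\x06\x00\x00\x00world\x00" +
		"\x02secret\x00\x03\x00\x00\x00hi\x00" +
		"\x00")
	v, ok := findValue(doc, "secret")
	fmt.Printf("%q %v\n", v, ok)
}
```

The value under “hello” is never inspected: its length prefix says how far to jump, and the scanner lands directly on the next element.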

Specification

BSON is binary JSON with some extra types. The top level object in BSON is a Document. A document is a collection of elements. Each element contains three things:

  • The Type Identifier (byte)
  • Element Name (cstring)
  • Contents (string, date, int, subdocument)

And that’s it!
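To make those three parts concrete, here is a minimal sketch that lays out a single string element by hand. The function name stringElement is hypothetical; the byte layout (type 0x02, NUL-terminated name, little-endian int32 length that counts the trailing NUL) follows the BSON spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stringElement lays out one element whose contents are a BSON string:
// type identifier, element name, then the contents.
func stringElement(name, value string) []byte {
	out := []byte{0x02}        // 1. type identifier for a UTF-8 string
	out = append(out, name...) // 2. element name...
	out = append(out, 0x00)    //    ...terminated by a NUL byte

	// 3. contents: int32 length (value bytes plus trailing NUL), value, NUL
	length := make([]byte, 4)
	binary.LittleEndian.PutUint32(length, uint32(len(value)+1))
	out = append(out, length...)
	out = append(out, value...)
	out = append(out, 0x00)
	return out
}

func main() {
	fmt.Printf("%q\n", stringElement("hello", "world"))
}
```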

Implementation

Normally when I implement something like this, I jump right into implementing the Marshal/Unmarshal code (serialize/deserialize).

This time I decided to take a different approach and instead focus on the types.

I started with the base types and how they should be represented in BSON: types like int32, int64, double, cstring, and string.

type Double float64
type Int32 int32
type Int64 int64
type CString string
type String string

And for each newly defined type, I made sure it implemented the BSON interface I created:

type BSON interface {
   ToBSON() []byte
   ToString() string
}
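For illustration, here is one way the base types could satisfy that interface. The method bodies are an assumption on my part rather than a copy of the linked repository; the encodings themselves (little-endian integers, IEEE 754 doubles, NUL-terminated cstrings, length-prefixed strings) come from the BSON spec, and the ToString formats are arbitrary.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
)

type BSON interface {
	ToBSON() []byte
	ToString() string
}

type Double float64
type Int32 int32
type Int64 int64
type CString string
type String string

// Double: 8 bytes, little-endian IEEE 754.
func (d Double) ToBSON() []byte {
	out := make([]byte, 8)
	binary.LittleEndian.PutUint64(out, math.Float64bits(float64(d)))
	return out
}
func (d Double) ToString() string { return strconv.FormatFloat(float64(d), 'g', -1, 64) }

// Int32: 4 bytes, little-endian two's complement.
func (i Int32) ToBSON() []byte {
	out := make([]byte, 4)
	binary.LittleEndian.PutUint32(out, uint32(i))
	return out
}
func (i Int32) ToString() string { return strconv.FormatInt(int64(i), 10) }

// Int64: 8 bytes, little-endian two's complement.
func (i Int64) ToBSON() []byte {
	out := make([]byte, 8)
	binary.LittleEndian.PutUint64(out, uint64(i))
	return out
}
func (i Int64) ToString() string { return strconv.FormatInt(int64(i), 10) }

// CString: the raw bytes followed by a single NUL.
func (c CString) ToBSON() []byte   { return append([]byte(c), 0x00) }
func (c CString) ToString() string { return string(c) }

// String: int32 length (bytes plus trailing NUL), the bytes, then NUL.
func (s String) ToBSON() []byte {
	out := Int32(len(s) + 1).ToBSON()
	out = append(out, string(s)...)
	return append(out, 0x00)
}
func (s String) ToString() string { return strconv.Quote(string(s)) }

// Compile-time checks that every base type implements BSON.
var (
	_ BSON = Double(0)
	_ BSON = Int32(0)
	_ BSON = Int64(0)
	_ BSON = CString("")
	_ BSON = String("")
)

func main() {
	for _, v := range []BSON{Int32(1), String("world"), CString("hello")} {
		fmt.Printf("%s -> %q\n", v.ToString(), v.ToBSON())
	}
}
```

The blank-identifier assignments make the compiler reject the build if any type stops satisfying the interface, which is handy while the type list is still growing.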

Once I had all the base types complete, I started implementing the core of BSON, the elements. I also wanted all the elements to implement the BSON interface. This was super simple thanks to the fact that all of the child types were already implementing the BSON interface:


type Element struct {
   Identifier byte
   EName CString
   Data BSON
}

func (e Element) ToBSON() []byte {
   out := []byte{}
   out = append(out, e.Identifier)
   out = append(out, e.EName.ToBSON()...)
   out = append(out, e.Data.ToBSON()...)
   return out
}

func (e Element) ToString() string {
    return e.EName.ToString() + ": " + e.Data.ToString()
}

After that, all I had left to do was define the types that used the Elements, and make sure they also implemented the BSON interface, which was simple since all their children already knew how to represent themselves too.


type Document ElementList
type ElementList []Element
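A sketch of how those two types might implement the interface is below. The framing (an int32 total size in front, a NUL byte at the end) is what the BSON spec requires for a document; the method bodies, the ToString format, and the trimmed-down String/CString helpers are assumptions for the sake of a self-contained example, not the code from the repository.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

type BSON interface {
	ToBSON() []byte
	ToString() string
}

type CString string
type String string

func (c CString) ToBSON() []byte   { return append([]byte(c), 0x00) }
func (c CString) ToString() string { return string(c) }

func (s String) ToBSON() []byte {
	out := make([]byte, 4, 4+len(s)+1)
	binary.LittleEndian.PutUint32(out, uint32(len(s)+1))
	out = append(out, string(s)...)
	return append(out, 0x00)
}
func (s String) ToString() string { return `"` + string(s) + `"` }

type Element struct {
	Identifier byte
	EName      CString
	Data       BSON
}

func (e Element) ToBSON() []byte {
	out := []byte{e.Identifier}
	out = append(out, e.EName.ToBSON()...)
	return append(out, e.Data.ToBSON()...)
}
func (e Element) ToString() string { return e.EName.ToString() + ": " + e.Data.ToString() }

type Document ElementList
type ElementList []Element

// An element list is just its elements back to back.
func (l ElementList) ToBSON() []byte {
	out := []byte{}
	for _, e := range l {
		out = append(out, e.ToBSON()...)
	}
	return out
}
func (l ElementList) ToString() string {
	parts := make([]string, len(l))
	for i, e := range l {
		parts[i] = e.ToString()
	}
	return strings.Join(parts, ", ")
}

// A document wraps the element list: int32 total size, elements, NUL.
func (d Document) ToBSON() []byte {
	body := ElementList(d).ToBSON()
	out := make([]byte, 4, 4+len(body)+1)
	binary.LittleEndian.PutUint32(out, uint32(4+len(body)+1))
	out = append(out, body...)
	return append(out, 0x00)
}
func (d Document) ToString() string { return "{ " + ElementList(d).ToString() + " }" }

func main() {
	inner := Document{{Identifier: 0x02, EName: "secret", Data: String("IMPORTANT INFO")}}
	doc := Document{
		{Identifier: 0x02, EName: "hello", Data: String("world")},
		{Identifier: 0x03, EName: "nested", Data: inner}, // 0x03 = embedded document
	}
	fmt.Println(doc.ToString())
	fmt.Printf("%q\n", doc.ToBSON())
}
```

Because Document itself satisfies BSON, it can sit in an Element's Data field, so nested documents fall out of the same recursion with no special casing.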

And I was done!

So now I can create new Documents and just call ToBSON(): the document asks all its children to represent themselves in BSON, and they in turn ask their children to do the same.

Closing Remarks

What I really like about this implementation is that now I can use this library (probably for a fuzzer), and have it generate BSON for any type in the entire stack. Every type knows its own BSON representation, which makes the library much simpler than one that only operates on whole documents.

You can check out the implementation here:

https://github.com/c0nrad/bson
