Serialization Format (When JSON Is Not Good Enough)

This is an excerpt from a discussion in the #devkini Telegram group, so if you have something more to say on this topic, head over to the group and share your thoughts.

JSON is the de facto format for transferring structured data over the network. But what if you need more than JSON can offer, such as faster serialization/deserialization, more efficient bandwidth usage, or a backward-compatible data schema? Let’s look at some of the alternatives:


MessagePack

From the website: MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it’s faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.
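To make the "single byte" and "one extra byte" claims concrete, here is a minimal hand-rolled sketch of two MessagePack encodings (positive fixint and fixstr). It is not the real msgpack library, and the helper name `pack_small` is made up for illustration; it only covers integers 0 to 127 and strings up to 31 bytes.

```python
import json


def pack_small(value):
    """Encode a small non-negative int or a short str the MessagePack way.

    Hypothetical helper for illustration only, not part of any library.
    """
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 0x7F:
        # positive fixint: the value itself is the single byte on the wire
        return bytes([value])
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            # fixstr: one header byte (0b101xxxxx, length in the low 5 bits)
            # followed by the raw UTF-8 bytes
            return bytes([0xA0 | len(raw)]) + raw
    raise ValueError("this sketch only handles fixint and fixstr")


for value in (5, 127, "hello"):
    packed = pack_small(value)
    as_json = json.dumps(value).encode("utf-8")
    print(repr(value), "msgpack:", packed.hex(), f"({len(packed)} bytes)",
          "| json:", f"({len(as_json)} bytes)")
```

For real use you would reach for an existing MessagePack library rather than encoding bytes by hand; the point here is only to show where the savings over JSON text come from.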

(google) protobuf

protobuf is a binary format with a predefined schema, whereas msgpack maps 1:1 to JSON. If your structure is fixed in advance, protobuf saves a lot of bandwidth because the keys are not sent with every message, unlike JSON or msgpack. TL;DR: much smaller. For performance, Cap’n Proto (by the main author of protobuf v2) and FlatBuffers (also by Google) are the successors. Key difference: there is no separate serialization or parsing step, because the in-memory layout is the wire format, so you get zero-copy performance.
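A rough way to see the bandwidth saving is to encode one record by hand in the protobuf wire format and compare it with compact JSON. This is a sketch, not the protobuf library: the `User` schema, the field numbers and the helper names below are all assumptions made up for the example. Note that the field names never go on the wire, only small numeric tags.

```python
import json


def varint(n):
    """Encode a non-negative int as a base-128 varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def int_field(field_number, value):
    # tag = (field_number << 3) | wire_type, where wire type 0 means varint
    return varint((field_number << 3) | 0) + varint(value)


def str_field(field_number, text):
    # wire type 2 means length-delimited: tag, byte length, then the bytes
    raw = text.encode("utf-8")
    return varint((field_number << 3) | 2) + varint(len(raw)) + raw


# Hypothetical schema (assumption, not from any real project):
#   message User { uint64 id = 1; string name = 2; }
user = {"id": 150, "name": "alice"}

wire = int_field(1, user["id"]) + str_field(2, user["name"])
as_json = json.dumps(user, separators=(",", ":")).encode("utf-8")

print("protobuf-style:", wire.hex(), f"({len(wire)} bytes)")
print("json:          ", as_json.decode(), f"({len(as_json)} bytes)")
```

The receiver needs the same schema to know that field 1 is `id` and field 2 is `name`, which is the trade-off you accept in exchange for the smaller payload.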

Note: pretty much everyone with large data in production uses protobuf for on-the-wire data (mostly v2; v3 uptake is a lot lower). Examples: Diablo 3’s network protocol, Google’s gtfs-realtime, etcd. It is very similar to Apache Thrift, which came out of Facebook. Twitter shoehorned protobuf into Hadoop.

Note: I do NOT recommend protobuf for new projects; that is premature optimization. Stick with JSON, then switch once you have actually hit CPU or bandwidth bottlenecks. (These are only serialization formats, so you can still send them over HTTPS anyway. Mix and match.) Read all about them here so you’ll be able to impress other folks: :)
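Before switching formats, it helps to measure whether JSON is really the bottleneck. A minimal sketch using only the standard library; the payload below is a made-up stand-in, so substitute a message your service actually sends.

```python
import json
import timeit

# Hypothetical payload (assumption): replace with a real message from your system.
payload = {"id": 150, "name": "alice", "tags": ["a", "b", "c"], "scores": list(range(50))}

encoded = json.dumps(payload)
runs = 2000

# Total seconds for `runs` encodes and decodes of the same message
encode_s = timeit.timeit(lambda: json.dumps(payload), number=runs)
decode_s = timeit.timeit(lambda: json.loads(encoded), number=runs)

print(f"size on the wire: {len(encoded.encode('utf-8'))} bytes")
print(f"encode: {encode_s / runs * 1e6:.1f} us per message")
print(f"decode: {decode_s / runs * 1e6:.1f} us per message")
```

If these numbers are tiny next to your network latency or database time, the serialization format is probably not what needs fixing.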

Another smart-sounding but useless factoid: ZFS, NFS, libvirt and Firebird DB use XDR. Financial services (“millions of messages per second”) like(?)/standardized on(?)/use(?) SBE.
Dota 2 (and Steam) use protobuf as well.

Other reading -

Written on December 13, 2015