Protocol Buffers, commonly known as protobuf, is a binary data-interchange format that provides type safety while remaining language-agnostic and cross-platform. The format is size-efficient and was developed with a focus on fast serialization and deserialization.

Unlike data-interchange formats such as JSON, which can be serialized/deserialized with generic libraries, protobuf relies on pre-compiled schemas. The official compiler (protoc) supports C++, C#, Dart, Go, Java, and Python, among others. Using protobuf with a language the compiler does not support is possible, but quite cumbersome.

Schema Files

The schema is defined in .proto files using the language described in the official language guide. It supports complex structures including nested types, optional fields, repeated fields (arrays), maps, and much more.

syntax = "proto3";

package Example;

message Entity {
  int32 identifier = 1;
  optional string description = 2;
  repeated Coordinate points = 3;
}

message Coordinate {
  optional int32 x = 1;
  optional int32 y = 2;
}

This schema is then compiled into the desired languages using protoc, which generates source files containing a class definition for each message.
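A peek under the hood shows why compiled schemas make messages so compact: field names never appear on the wire. Each field is encoded as a small key (field number plus wire type) followed by a variable-length value. The sketch below is a hand-rolled illustration, not the official library, encoding the identifier field from the schema above:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint:
    7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int32_field(field_number: int, value: int) -> bytes:
    """Encode key + value for a varint-typed field (wire type 0)."""
    key = (field_number << 3) | 0  # field number in the upper bits, wire type 0 in the lower three
    return encode_varint(key) + encode_varint(value)

# An Entity with identifier = 150 serializes this field to just three bytes:
print(encode_int32_field(1, 150).hex())  # 089601
```

The equivalent JSON fragment, "identifier": 150, takes 17 bytes before any structural characters are counted.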

Performance - Message Size

I first discovered protobuf while working on Abathur, a framework for modularized StarCraft II agents. The entire game state has to be synchronized every game step (16 steps per second at normal game speed, 22.4 at faster speed), which creates a high bandwidth requirement – so I decided to run some tests.

Figure: ResponseObservations size formatted as protobuf compared to JSON.

A single game state object (ResponseObservations) varied between 959.55 KiB and 1534.51 KiB on the map Trozinia LE when formatted as JSON, which roughly equals 7.86-12.57 Mbps when playing in real-time. The exact same objects varied between 133.61 KiB and 169.01 KiB when formatted as protobuf – resulting in "only" 1.09-1.38 Mbps.

Opting for protobuf instead of JSON meant the JSON messages were between 618% and 808% larger than their protobuf equivalents in this specific scenario – in other words, protobuf cut message size by roughly 86-89%. This makes a huge difference for the performance of networked applications, and potentially means substantial savings on data-transfer rates for cloud solutions. Abathur, however, runs locally – but the saved I/O operations are appreciated.
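The bandwidth figures follow directly from the message sizes. A quick back-of-the-envelope conversion from KiB per message to megabits reproduces the numbers quoted above (the helper name is mine, for illustration):

```python
def kib_to_mbit(size_kib: float) -> float:
    """Convert a message size in KiB to megabits (1 Mbit = 10^6 bits)."""
    return size_kib * 1024 * 8 / 1e6

# JSON observations: 959.55-1534.51 KiB per message
print(round(kib_to_mbit(959.55), 2), round(kib_to_mbit(1534.51), 2))  # 7.86 12.57
# protobuf observations: 133.61-169.01 KiB per message
print(round(kib_to_mbit(133.61), 2), round(kib_to_mbit(169.01), 2))   # 1.09 1.38
# JSON-to-protobuf size ratio at both extremes
print(round(959.55 / 133.61, 2), round(1534.51 / 169.01, 2))          # 7.18 9.08
```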

Performance - Serialization/Deserialization

Decreasing data transfer might in itself be worth the change to protobuf in some applications, especially cloud solutions – but from a pure performance perspective it is only worth it if the time spent "compressing" the data is made up for by the time saved transmitting it.

The binary representation of protobuf messages is very similar to the internal binary representation of C++ objects. The format is therefore highly efficient in C++, as a message can almost be copied directly into memory and interpreted as an object. Abathur, however, was a C#/Python hybrid – languages with very different internal data representations – so I decided to run some tests...

The test set was generated by running two Elite AIs against each other on Cinder Fortress while continuously requesting observations for 16,860 steps. These observations were saved to disk and subsequently loaded into a small testing application that timed serialization and deserialization. The trimmed mean below is a 25% trimmed mean.
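The harness itself is simple. Below is a minimal sketch (the function names and the json.dumps stand-in are mine, not Abathur's actual code) of a timing loop and the trimmed-mean statistic, assuming 25% is cut from each tail:

```python
import json
import time

def time_calls(fn, payloads):
    """Run fn once per payload and return per-call durations in milliseconds."""
    durations = []
    for payload in payloads:
        start = time.perf_counter()
        fn(payload)
        durations.append((time.perf_counter() - start) * 1000)
    return durations

def trimmed_mean(values, proportion=0.25):
    """Mean after discarding `proportion` of the values from each tail,
    which dampens outliers such as GC pauses."""
    values = sorted(values)
    cut = int(len(values) * proportion)
    kept = values[cut:len(values) - cut]
    return sum(kept) / len(kept)

# Example run with a JSON stand-in; the real test timed the observation objects.
samples = time_calls(json.dumps, [{"step": i, "units": list(range(50))} for i in range(100)])
print(f"25% trimmed mean: {trimmed_mean(samples):.4f} ms")
```

Trimming both tails keeps one-off spikes (first-call JIT warm-up, garbage collection) from dominating the average.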

C# Results              25% Trimmed mean    Max value      Min value
proto serialization     1.00 ms             27.899 ms      0.326 ms
json serialization      9.0916 ms           94.5744 ms     2.722 ms
proto deserialization   0.544 ms            16.609 ms      0.125 ms
json deserialization    18.326 ms           124.808 ms     6.911 ms

These tests were performed on a modest laptop with an i5-5200U CPU @ 2.20 GHz and 8 GB RAM, running Windows 10. JSON serialization/deserialization in C# was done using Google.Protobuf.JsonFormatter and Google.Protobuf.JsonParser.

Python Results          25% Trimmed mean    Max value      Min value
proto serialization     23.788 ms           126.0893 ms    11.007 ms
json serialization      38.021 ms           171.122 ms     19.995 ms
proto deserialization   24.220 ms           124.0892 ms    10.007 ms
json deserialization    46.635 ms           218.156 ms     23.997 ms

JSON serialization/deserialization in Python used google.protobuf.json_format.

Protobuf serialization and deserialization are significantly faster than JSON in C#, which is not surprising, as the internal data representation of a C# object resembles that of C++ – the language the format is optimized for in the first place. Python, however, also gains a substantial performance boost!

Reflection

Is protobuf simply better than JSON? Of course not.
The two formats differ vastly, and comparing them purely on performance is unfair. JSON is human-readable, self-describing, universally supported, and effectively an industry-standard data-interchange format.

Protobuf, on the other hand, can be cumbersome to work with, as the receiver must know the schema for the data to make any sense. Small schema changes can easily break existing integrations if you don't follow best practices carefully – not to mention the awkward schema-compilation workflow. It is also less known and supported by fewer languages, so it should probably not be your first choice for a public API.
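One of those best practices is worth spelling out: never reuse the number (or name) of a deleted field, because old clients and old serialized data still interpret fields by number. Marking removed fields as reserved makes protoc reject accidental reuse. A hypothetical evolution of the Entity message above, after dropping its description:

```proto
message Entity {
  reserved 2;              // field number of the removed description field
  reserved "description";  // prevent the name from being reused as well
  int32 identifier = 1;
  repeated Coordinate points = 3;
}
```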

But if you crave high performance data transfer or your cloud provider is ripping you off on data transfer charges – give it a try.