Protocol Buffers, commonly known as protobuf, is a binary data-interchange format that guarantees type-safety while being language-agnostic and cross-platform. The format is size-efficient and developed with a focus on high serialization/deserialization performance.
Unlike data-interchange formats such as JSON, which can be serialized/deserialized with generic libraries, protobuf relies on pre-compiled schemas. The official compiler (protoc) supports C++, C#, Dart, Go, Java and Python. Utilizing protobuf with a language not supported by the compiler, while possible, is quite cumbersome.
Schema Files
The schema is defined in .proto files using the language defined in the official language guide. It supports complex structures including nested types, optional fields, repeated fields (arrays), mappings, and much more.
syntax = "proto3";
package Example;
message Entity {
int32 identifier = 1;
optional string description = 2;
repeated Coordinate points = 3;
}
message Coordinate {
optional int32 x = 1;
optional int32 y = 2;
}
This schema is then compiled into the desired languages using protoc, which produces generated source files with class definitions for each message type.
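As a minimal sketch of that workflow (assuming the schema above is saved as example.proto and compiled with protoc --python_out=., producing a hypothetical example_pb2 module), the generated classes can be used like this in Python:

```python
# Hypothetical usage of the generated code, assuming the schema above was
# saved as example.proto and compiled with: protoc --python_out=. example.proto
from example_pb2 import Entity

# Build a message using the generated, typed classes.
entity = Entity(identifier=42, description="a small example")
entity.points.add(x=1, y=2)
entity.points.add(x=3, y=4)

# Serialize to the compact binary wire format...
data = entity.SerializeToString()

# ...and parse it back into a typed object on the receiving end.
received = Entity()
received.ParseFromString(data)
assert received.points[1].x == 3
```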
Performance - Message Size
I first discovered protobuf while working on Abathur, a framework for modularized StarCraft II agents. The entire game state has to be synchronized every game step (16Hz at normal game speed, 22.4Hz at faster) – which creates a high bandwidth requirement, so I decided to run some tests.

A single game state object (ResponseObservations) varied between 959.55 KiB and 1534.51 KiB in size on the map Trozinia LE when formatted as JSON, which roughly equals 7.86-12.57 Mbps when playing in real-time. The exact same objects varied between 133.61 KiB and 169.01 KiB when formatted as protobuf – which results in "only" 1.09-1.38 Mbps.
Opting for protobuf instead of JSON made each message roughly 7.2 to 9.1 times smaller in this specific scenario – the JSON payloads were 618.19% to 807.93% larger than their protobuf equivalents. This makes a huge difference for the performance of networked applications and can mean substantial savings in pure data-transfer rates for cloud solutions. Abathur, however, runs locally – but the saved I/O operations are appreciated.
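The size difference itself is easy to reproduce for any message. A minimal sketch, again assuming the hypothetical example_pb2 module generated from the schema above:

```python
# Rough size comparison of the two encodings for the same message, using the
# hypothetical example_pb2 module generated from the schema above.
from google.protobuf.json_format import MessageToJson
from example_pb2 import Entity

entity = Entity(identifier=1, description="size test")
for i in range(1000):
    entity.points.add(x=i, y=-i)

proto_size = len(entity.SerializeToString())             # binary wire format
json_size = len(MessageToJson(entity).encode("utf-8"))   # JSON text

print(f"protobuf: {proto_size} B, JSON: {json_size} B "
      f"({json_size / proto_size:.1f}x larger)")
```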
Performance - Serialization/Deserialization
Decreasing data transfer might in itself be worth the change to protobuf in some applications, especially cloud solutions – but from a pure performance perspective it is only worth it if the time spent "compressing" the data is made up for by the time saved transmitting it.
The binary representation of protobuf messages is very similar to the internal binary representation of C++ objects. The format is therefore highly efficient in C++, as the message can almost be copied directly into memory and interpreted as an object. Abathur, however, was a C#/Python hybrid – languages with very different internal data representations. I therefore decided to run some tests...
The test set was generated by running two Elite AIs against each other on Cinder Fortress and continuously requesting observations for 16860 steps. These observations were then saved to disk – and subsequently loaded into a small testing application that timed serialization/deserialization. The reported trimmed mean is a 25% trimmed mean.
| C# Results | 25% Trimmed mean | Max value | Min value |
|---|---|---|---|
| proto serialization | 1.00 ms | 27.899 ms | 0.326 ms |
| json serialization | 9.0916 ms | 94.5744 ms | 2.722 ms |
| proto deserialization | 0.544 ms | 16.609 ms | 0.125 ms |
| json deserialization | 18.326 ms | 124.808 ms | 6.911 ms |
These tests were performed on a modest laptop with an i5-5200U CPU @ 2.20 GHz and 8 GB RAM, running Windows 10. JSON serialization/deserialization in C# was done using Google.Protobuf.JsonFormatter and Google.Protobuf.JsonParser.
| Python Results | 25% Trimmed mean | Max value | Min value |
|---|---|---|---|
| proto serialization | 23.788 ms | 126.0893 ms | 11.007 ms |
| json serialization | 38.021 ms | 171.122 ms | 19.995 ms |
| proto deserialization | 24.220 ms | 124.0892 ms | 10.007 ms |
| json deserialization | 46.635 ms | 218.156 ms | 23.997 ms |
JSON serialization/deserialization in Python used google.protobuf.json_format.
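The measurement itself needs nothing fancy. A simplified sketch of how such a comparison can be timed in Python, using the hypothetical example_pb2 module from earlier rather than the real observation dumps:

```python
# Simplified sketch of the Python timing comparison, using the hypothetical
# example_pb2 module rather than the real SC2 observation dumps.
import time
from google.protobuf.json_format import MessageToJson, Parse
from example_pb2 import Entity

entity = Entity(identifier=1)
for i in range(10_000):
    entity.points.add(x=i, y=i)

def timed_ms(fn):
    """Run fn once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000

wire, proto_ser = timed_ms(entity.SerializeToString)
text, json_ser = timed_ms(lambda: MessageToJson(entity))
_, proto_de = timed_ms(lambda: Entity().ParseFromString(wire))
_, json_de = timed_ms(lambda: Parse(text, Entity()))

print(f"proto: ser {proto_ser:.3f} ms / de {proto_de:.3f} ms")
print(f"json:  ser {json_ser:.3f} ms / de {json_de:.3f} ms")
```

Absolute numbers will of course depend on the message content and the hardware, so a sketch like this is not directly comparable to the tables above.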
Protobuf serialization and deserialization are significantly faster than JSON in C#, which is not surprising, as the internal data representation of C# objects is similar to that of C++ – the language the format is optimized for in the first place. Python, however, also gains a substantial performance boost!
Reflection
Is protobuf simply better than JSON? Of course not.
The two formats differ vastly, and comparing them purely on performance is unfair. JSON is human-readable, self-describing, universally supported, and effectively an industry standard for data interchange.
Protobuf, on the other hand, can be cumbersome to work with, as the schema has to be known by the receiver for the data to make any sense. Small schema changes can easily break previous integrations if you don't follow best practices carefully – not to mention the awkward schema-compilation workflow. It is also less well known and supported by fewer languages, so it should probably not be your first choice for a public API.
But if you crave high-performance data transfer or your cloud provider is ripping you off on data-transfer charges – give it a try.