My thesis Autonomous and Mobile Ad Hoc Positioning Systems involved simulating large amounts of autonomous agents. Each agent accesses the environment through simulated sensors and acts upon it through their simulated actuators.

Abstract representation of simulation flow for a single agent.

I wrote a simplified custom simulator for this purpose. It simulated monomorphic and perfectly round agents on an infinite plane. Discrete collision detection was implemented as it was deemed an adequate method for the purpose.

Selecting spatial data structure

Each simulated sensor needs to perform a search query in the agent's proximity and the same is required for collision detection. A space-partitioning data structure that allows for efficient searching is therefore required to run simulations with large quantities of entities like these.

Game engines and physics simulations are often based on some variation of the R-Tree (also referred to as Axis-aligned Bounding Box Tree in gaming) - as they allow for storage of diverse geometric objects in three dimensions and offer efficient search.

However, this project assumed monomorphic and round agents, which allowed me to store them as a single point in space. A balanced KD-tree would offer great search efficiency - but the cost of maintaining the structure with thousands of simultaneously moving agents would outweigh the benefit. The simpler regional quad-tree was therefore chosen.

Regional Quad-tree implementation

The regional quad-tree is a fairly simple spatial data structure. Each node contains four children nodes that subdivide the region into four equally sized quadrants.

Upscaling the quad-tree

Dynamic upscaling of quad-tree when an agent moves outside the boundaries.

A regular quad-tree covers a fixed area - but it can be extended (theoretically) infinitely. This implementation of the quad-tree does store the boundaries on node-level and can thus be extended by replacing the root node and constructing new nodes at depth one where needed.

Optimizing memory usage

Limiting the size of individual nodes in the quad-tree will not only save memory but will in most cases also positively affect performance as smaller entities are anticipated to reduce cache misses.

struct QuadNode {
    BoundingBox boundary; // 128 bit
    QuadNode *northWest; // 32 bit
    QuadNode *northEast; // 32 bit
    QuadNode *southWest; // 32 bit
    QuadNode *southEast; // 32 bit
    std::vector<QuadDataPoint> data;
}

struct BoundingBox {
    float x0; float x1;
    float y0; float y1;
}
A common implementation of quad-tree

The boundaries of a node in a regional quad-tree can be calculated based on its parents' boundary and relation – as it always splits into four equal quadrants. We can therefore get away with only storing the boundaries of the root node at a small computational cost when traversing. Saving 128bit per node.

The four references, one to each child node, can be reduced to a single reference if we ensure they are allocated sequentially in memory. Not only does this free up 96bit, but it also (forces us to) improve data locality. It is worth noting this comes at the cost of readability and testability. I would not recommend this in production code unless the code is completely encapsulated.

Improving Performance

This section covers tricks deployed to further improve the performance of this implementation. Consider the consequences thoroughly before using these in your own implementation ⚠️

Avoid floating-point arithmetic

The boundaries are calculated at traversal – resulting in quite a few divisions by 2.
However, calculating boundaries in whole numbers allows for division through bit-shifting which is extremely cost-efficient. It does strictly limit the minimum size of a region to 1 unit x 1 unit. This was acceptable in this case as I also control the size of the agents - but might not be in your use case. As a bare minimum multiply by 0.5f instead of dividing by 2.

Avoid auto-scaling structures (vector)...

The std::vector<T> is essentially an auto-scaling array that will double its capacity when you reach its limit to ensure subsequent inserts can be performed fast. This is often great – but not in this case. The many moving agents will result in these vectors claiming an unnecessarily large amount of memory and invoking memory allocation and deallocation outside of our control.

It is more efficient, counter-intuitively, to store the entities in a custom-made linked list. Traversal of a linked list is slower and will often occur a performance penalty from bad data locality – but allows us to insert and delete from them in constant time. Furthermore, the nodes for these linked lists can be reused as agents that move into a new region must always move out of another. Each entity in the tree will always have one associated linked-list node – we will therefore never need to allocate more memory to perform these operations.

Node allocation using object pooling

The density of the entities in the quad-tree will constantly change as the agents move around – and quad-nodes will frequently be allocated and deallocated as a result. Object-pooling is therefore implemented to prevent frequent memory allocation calls. All nodes are stored sequentially and a reference to the first free node (if any) is kept at all times.

Final minimal node model

The final model looks quite minimalistic. A quad-node will always either be a regular node (with four QuadNode children) or a leaf node with data.  The child reference will either be the head of a linked list containing all the entities or the first of the four children notes (if the count is -1). This hack to limit memory usage and storing secret meanings is not recommended for production code!

You might notice the child reference is not a pointer but an integer – this is an index in an array instead of a pointer, as I ensure all nodes are always stored sequentially in memory. This can in some cases require that they get relocated in memory which would mess around with a regular pointer.

The child-count is cached as linked-list traversal is slow.

struct QuadNode {
	int child;
	int count; // used for calculating splits
	QuadNode();
};

struct Entity {
        int link; // used for linked-list implementation
        Vec2 position; 
        Vec2 velocity; // used for physics simulation
};
Unreadable, but fast, QuadNode implementation

Performance

Benchmarking is performed on an i7-4770K CPU @ 3.50GHz with 8 GB DDR3 ram @ 2400Mhz, a reasonably modest system by today’s standards. Each data point is an average of 100 runs.

Insertion

Here n-entities are inserted into the data structure. The entire operation is measured – and the average cost of inserting is calculated below.

Entities 10^1 10^2 10^3 10^4 10^5 10^6 10^7
Runtime 0.00264ms 0.00976ms 0.07355ms 0.79198ms 13.523ms 189.57ms 2.7425s
Average 0.264μs 0.097μs 0.073μs 0.079μs 0.135μs 0.189μs 0.274μs

The time it takes to perform 100 searches with a radius of 100 units around random agents in the structure.

Entities 10^1 10^2 10^3 10^4 10^5 10^6 10^7
Runtime 0.1444ms 0.12099ms 0.15073ms 0.16988ms 0.15892ms 0.26293ms 0.34042ms

Simulate a tick

The time it takes to simulate a single simulation tick. This includes moving each agent to their new location (based on current velocity) and calculating collisions.

I ran this test in two environments. One sparsely populated (low risk of collision) and one densely populated with each agent having a 20% chance of colliding. Densely populated environments are non-surprisingly slower.

Entities 10^1 10^2 10^3 10^4 10^5 10^6 10^7
Sparse - Total 0.01668ms 0.12747ms 1.50349ms 17.5186ms 165.341ms 3.043s 38.308s
Dense - Total 0.01602ms 0.21486ms 2.47993ms 27.3711ms 299.184ms 4.597s 55.867s
Sparse - Avg 1.668µs 1.2747µs 1.5034µs 1.7518µs 1.6534µs 3.0431µs 3.8308µs
Dense - Avg 1.602µs 2.1486µs 2.4799µs 2.7371µs 2.9918µs 4.5978µs 5.5868µs

Conclusion

I have shown the tricks I deployed to optimize the quad-tree implementation used in my master thesis. It is worth noting some of them seriously hurt the readability and testability of your code. Please think twice before using this in production.

However, for academic code or hobby projects - hack away!

It's also worth noting that the quad-tree is only a viable and efficient solution because of numerous assumptions and conditions that apply to my specific use case. This implementation assumes:

  • All environments are flat planes
  • Agents are monomorphic and round
  • The majority of agents will be moving at all times
  • It project will never have to be extended nor maintained...

Feel free to contact me for the source code or any questions.