Abstract
Problem: How did a team of four programmers optimize a massive open-world RPG (Arcanum) that could have hundreds of thousands of objects and creatures loaded at once?
Approach: Tim Cain describes the specific memory and time optimization techniques used in Arcanum's development, covering object compaction, custom memory management (TIG), profiling with VTune, AI heartbeat scaling, dirty flags, and spatial partitioning.
Findings: Memory was tamed through prototype-based object compaction (storing only deltas from prototypes) and a custom allocator with handle-based locking and defragmentation. Time was optimized by profiling with VTune to target the highest-impact functions, scaling AI heartbeat frequency by distance to the player, using dirty flags to avoid redundant calculations, and using quad trees for collision.
Key insight: Most optimization is about not doing work. Don't store what hasn't changed, don't recalculate what hasn't moved, don't update what the player can't see.
Memory Optimization: Object Compaction
Arcanum used a system called object compaction where each object only stored how it differed from its prototype. A prototype (e.g., "sword" or "zombie") contained all default values: art, max hit points, attack data, animations. Individual instances typically only needed to store their location and current hit points.
This was implemented using bit fields: each object had a bit array indicating which fields were present. To look up the 4th field, you'd check the 4th bit: if it's 1, count how many preceding bits are set to find the field's offset in the object's data; if it's 0, read from the prototype instead.
Programmer Mark Harrison built a system called Big Bits that allowed arbitrarily long bit arrays (not limited to 32 bits).
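The lookup described above can be sketched in C. This is a minimal illustration of the technique, not Arcanum's actual code; field layout, types, and names are all hypothetical, and a fixed 32-bit mask stands in for Big Bits.

```c
#include <stdint.h>

#define NUM_FIELDS 8  /* illustrative; Big Bits allowed arbitrarily many */

typedef struct {
    uint32_t present;            /* bit i set => field i overridden here */
    int32_t  deltas[NUM_FIELDS]; /* packed: only overridden fields, in bit order */
} ObjInstance;

typedef struct {
    int32_t defaults[NUM_FIELDS];
} Prototype;

/* Count set bits below position `field` to find the packed index. */
static int packed_index(uint32_t present, int field)
{
    uint32_t below = present & ((1u << field) - 1u);
    int count = 0;
    while (below) { count += below & 1u; below >>= 1; }
    return count;
}

int32_t get_field(const ObjInstance *obj, const Prototype *proto, int field)
{
    if (obj->present & (1u << field))
        return obj->deltas[packed_index(obj->present, field)];
    return proto->defaults[field];  /* not overridden: fall back to prototype */
}
```

An unmodified object stores only its `present` mask and an empty delta array; every read falls through to the shared prototype.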
The Prototype Trade-off
For common variants (like a "+1 sword"), you had two options:
- Store the delta from the base prototype (good for rare variants)
- Create a new prototype (good for frequently-used variants, since all instances reference it with zero overhead)
Even with the bit field overhead per object, the savings were enormous because the vast majority of objects (walls, trees, rocks, standard items, common monsters) barely differed from their prototypes.
Memory Optimization: TIG Memory Manager
Arcanum used TIG, a static library that managed all memory allocation. On startup, TIG allocated one giant contiguous block of memory and managed everything within it.
Handle-Based Allocation
Instead of returning raw pointers, TIG returned handles. To use memory, you had to lock the handle (which returned an actual pointer), then unlock it when done. Unlocked blocks could be freely moved within the giant block.
This solved memory fragmentation in real time: if you allocated blocks A, B, C and then freed B, TIG could slide C down to fill the gap, as long as C wasn't locked. When you next locked C's handle, you'd get the new pointer.
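The lock/unlock and compaction mechanics can be sketched as follows. All names and bookkeeping are illustrative; TIG's real API and internals differed, and this toy version assumes handles are allocated in address order.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE   1024
#define MAX_HANDLES 16

static unsigned char pool[POOL_SIZE];  /* the one giant block */

typedef struct {
    size_t offset;   /* where the block currently lives in the pool */
    size_t size;
    int    locked;   /* locked blocks must not be moved */
    int    in_use;
} Block;

static Block table[MAX_HANDLES];
static size_t pool_top = 0;

/* Allocate from the top of the pool; return a handle (index), or -1. */
int h_alloc(size_t size)
{
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!table[h].in_use) {
            if (pool_top + size > POOL_SIZE) return -1;
            table[h] = (Block){ pool_top, size, 0, 1 };
            pool_top += size;
            return h;
        }
    }
    return -1;
}

void *h_lock(int h)   { table[h].locked = 1; return pool + table[h].offset; }
void  h_unlock(int h) { table[h].locked = 0; }
void  h_free(int h)   { table[h].in_use = 0; }

/* Slide every unlocked, live block down over freed gaps.
   Simplification: assumes handle order matches address order. */
void h_compact(void)
{
    size_t write = 0;
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!table[h].in_use) continue;
        if (table[h].locked) {            /* can't move a locked block */
            write = table[h].offset + table[h].size;
            continue;
        }
        if (table[h].offset != write) {
            memmove(pool + write, pool + table[h].offset, table[h].size);
            table[h].offset = write;      /* next lock returns the new pointer */
        }
        write += table[h].size;
    }
    pool_top = write;
}
```

The key property is visible in `h_lock`: callers never hold a raw pointer across frames, so the manager is free to shuffle unlocked blocks at any time.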
Debug Features
- `malloc` vs `calloc`: TIG's `malloc` left memory uninitialized (faster); `calloc` zeroed it out. This avoided a common C bug where debug builds zero-initialize but release builds don't.
- Guard bytes: Every allocation got 4 extra guard bytes at each end. On lock/free, TIG checked whether the guard pattern had changed, catching memory stomps immediately.
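A minimal sketch of the guard-byte idea, assuming a plain `malloc` backend; the pattern value, layout, and function names are illustrative, not TIG's.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 4
static const unsigned char GUARD[GUARD_SIZE] = { 0xFD, 0xFD, 0xFD, 0xFD };

/* Allocate size bytes with a guard region at each end; return the user pointer. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * GUARD_SIZE);
    if (!raw) return NULL;
    memcpy(raw, GUARD, GUARD_SIZE);                    /* leading guard */
    memcpy(raw + GUARD_SIZE + size, GUARD, GUARD_SIZE); /* trailing guard */
    return raw + GUARD_SIZE;
}

/* Return 1 if both guard regions are intact, 0 if something stomped them. */
int guards_intact(void *user, size_t size)
{
    unsigned char *raw = (unsigned char *)user - GUARD_SIZE;
    return memcmp(raw, GUARD, GUARD_SIZE) == 0 &&
           memcmp(raw + GUARD_SIZE + size, GUARD, GUARD_SIZE) == 0;
}

void guarded_free(void *user)
{
    free((unsigned char *)user - GUARD_SIZE);
}
```

Checking the guards on every lock and free, as described above, localizes a stomp to roughly the frame in which it happened instead of letting it surface as a mystery crash later.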
Time Optimization: Profiling with VTune
The team used Intel VTune to profile where time was being spent. VTune produced a hierarchical map: if function A spent 100ms total but called B (50ms) and C (30ms), then A itself only used 20ms.
The critical insight: optimize the expensive leaf functions first. Cutting B's 50ms in half saves 25ms (25% of total), while cutting A's own 20ms in half only saves 10ms (10%).
But context matters: if B is cheap per-call but looped thousands of times inside A, the real optimization target is the loop: unroll it, reduce iterations, or exit early.
Time Optimization: Script Time Limits
Functions and scripts were given time budgets. If a script exceeded its allocation, an error was thrown. This caught infinite loops and overly ambitious scripts, forcing the author to simplify.
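A minimal sketch of a per-script time budget, polled inside an interpreter loop. This is an assumed shape, not Arcanum's implementation, and `clock()` is far coarser than what a real engine would use.

```c
#include <time.h>

typedef struct {
    clock_t start;
    double  budget_secs;
} ScriptTimer;

void timer_start(ScriptTimer *t, double budget_secs)
{
    t->start = clock();
    t->budget_secs = budget_secs;
}

/* Return nonzero once the script has exceeded its budget. */
int timer_expired(const ScriptTimer *t)
{
    double elapsed = (double)(clock() - t->start) / CLOCKS_PER_SEC;
    return elapsed > t->budget_secs;
}
```

An interpreter would call `timer_expired` every N instructions and raise the script error described above when it fires, which is what catches infinite loops.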
Time Optimization: AI Heartbeat Scaling
Creature AI used a heartbeat system where update frequency was proportional to distance from the player:
- Nearby creatures: Fast heartbeats, giving responsive, reactive behavior
- Distant creatures: Slow heartbeats (every 2–10 seconds); they could even teleport between positions, since the player couldn't see them
- Very far creatures: No heartbeat at all
This dramatically reduced AI computation without any visible impact on gameplay.
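The tiers above can be sketched as a simple distance-to-interval mapping. The thresholds and intervals here are illustrative, not Arcanum's actual values (the talk only gives the 2–10 second range for distant creatures).

```c
#define NO_HEARTBEAT -1

/* Map distance from the player to a heartbeat interval in milliseconds. */
int heartbeat_interval_ms(float dist_to_player)
{
    if (dist_to_player < 20.0f)  return 250;    /* nearby: fast, reactive */
    if (dist_to_player < 100.0f) return 2000;   /* distant: 2s updates */
    if (dist_to_player < 300.0f) return 10000;  /* far: 10s, can "teleport" */
    return NO_HEARTBEAT;                        /* very far: no AI at all */
}
```

Since the cost scales with how many creatures sit in each tier, and most of a big open world is far from the player, most creatures end up nearly free.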
Time Optimization: Dirty Flags
Instead of recalculating values every frame, Arcanum used dirty flags. For example, the distance between a player and a monster was calculated once and cached. If either moved, the value was marked dirty and only recalculated on next access.
A single bit was enough, often stored by stealing the top bit of a field that didn't need its full range. During a single heartbeat, a creature might check its distance to the player multiple times but never recalculate it, since nothing could move mid-heartbeat.
Tim notes this pattern works equally well in Unity's Update() cycle.
Time Optimization: Spatial Partitioning
For collision detection, checking every object against every other object is prohibitively expensive. Arcanum used quad trees to partition world space, so collision checks only considered nearby objects (leaves sharing the same parent node).
The quad tree must be maintained as objects move, but each update enables many fast lookups: a classic time-memory trade-off.
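A minimal point quadtree illustrating the idea: nodes hold a few objects and split into four children when full, so a collision check only needs to consider the handful of objects sharing a leaf. This is a generic sketch, not Arcanum's structure, and it leaks nodes for brevity.

```c
#include <stdlib.h>

#define NODE_CAP 4

typedef struct { float x, y; } Point;

typedef struct QNode {
    float cx, cy, half;           /* square region: center + half-extent */
    Point pts[NODE_CAP];
    int   count;                  /* -1 once split into children */
    struct QNode *child[4];
} QNode;

QNode *qnode_new(float cx, float cy, float half)
{
    QNode *n = calloc(1, sizeof *n);
    n->cx = cx; n->cy = cy; n->half = half;
    return n;
}

/* Which child quadrant does p fall in? (bit 0 = east, bit 1 = north) */
static int quadrant(const QNode *n, Point p)
{
    return (p.x >= n->cx ? 1 : 0) | (p.y >= n->cy ? 2 : 0);
}

void qt_insert(QNode *n, Point p)
{
    if (n->count >= 0 && n->count < NODE_CAP) {   /* leaf with room */
        n->pts[n->count++] = p;
        return;
    }
    if (n->count == NODE_CAP) {                   /* full leaf: split */
        float h = n->half / 2;
        n->child[0] = qnode_new(n->cx - h, n->cy - h, h);
        n->child[1] = qnode_new(n->cx + h, n->cy - h, h);
        n->child[2] = qnode_new(n->cx - h, n->cy + h, h);
        n->child[3] = qnode_new(n->cx + h, n->cy + h, h);
        for (int i = 0; i < NODE_CAP; i++)        /* push points down */
            qt_insert(n->child[quadrant(n, n->pts[i])], n->pts[i]);
        n->count = -1;
    }
    qt_insert(n->child[quadrant(n, p)], p);
}

/* How many candidates share p's leaf? These are the only objects a
   collision check against p needs to look at. */
int qt_candidates(const QNode *n, Point p)
{
    if (n->count >= 0) return n->count;
    return qt_candidates(n->child[quadrant(n, p)], p);
}
```

Against N objects, the naive all-pairs check is O(N²); with the tree, each query descends to one small leaf instead of scanning the whole world.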
The Fundamental Trade-off
Tim emphasizes that time and memory optimizations are often in tension: saving time usually costs memory (caches, spatial data structures, dirty flags) and saving memory usually costs time (compaction lookups, handle locking).
He also notes that in modern engines like Unity, you don't control memory allocation directly. His workaround in several games: skip Unity's per-object Update() and instead use a manager class that receives the update and decides which children to update and when β reclaiming control over the update loop.
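The manager pattern can be sketched language-agnostically (Tim's actual versions live in C#/Unity; this skeleton and its names are illustrative). One manager receives the tick and decides which children run this frame; here each child's work is stood in for by a run counter.

```c
#define MAX_CHILDREN 64

typedef struct {
    int interval;   /* run this child every N frames */
    int runs;       /* stand-in for calling the child's update() */
} Child;

typedef struct {
    Child children[MAX_CHILDREN];
    int   count;
    int   frame;
} Manager;

/* The only per-frame entry point: children never get ticked directly. */
void manager_update(Manager *m)
{
    m->frame++;
    for (int i = 0; i < m->count; i++)
        if (m->frame % m->children[i].interval == 0)
            m->children[i].runs++;
}
```

Because the manager owns the schedule, tricks like the heartbeat scaling above become one-line policy changes instead of per-object rewrites.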
Why Custom Engines
Tim closes by noting that third-party engines and libraries limit what you can optimize, since you can't see or modify their internals (Unity) or must maintain custom forks (Unreal). This is one reason he prefers building his own game engines.
References
- Tim Cain. YouTube video. https://www.youtube.com/watch?v=nJdyR0nO17E