Abstract
Problem: How should a game with tens of thousands of data files store and distribute them efficiently?
Approach: Tim Cain explains the DAT file system used in Arcanum — packing all game files into numbered monolithic archives with an internal lookup table, and the engineering reasoning behind this design.
Findings: The DAT file approach saved disk space (eliminating sector wastage), reduced memory usage (one file pointer instead of thousands), improved load times (seeking within an open file vs. opening new files), enabled elegant patching (higher-numbered DATs override earlier ones), and made modding straightforward.
Key insight: Packing thousands of small files into a single monolithic archive with an offset table is one of those rare engineering tradeoffs that saves both time and space simultaneously — and the numbered-override system elegantly solves patching and modding in one design.
How DAT Files Work
All of Arcanum's game data is stored in large monolithic files with the .dat extension. The process for creating them:
- All individual files are written sequentially into one big file
- As each file is written, its filename and byte offset (position within the DAT) are recorded in a table
- When all files are written, the table is appended to the end of the DAT
- The offset to the table is written at the very beginning of the file
When the game loads, it reads the table offset from the first bytes of the DAT, jumps to that position, and reads the full file listing. From then on, whenever it needs a file, it seeks to that file's offset within the already-open DAT.
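The build and load steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Arcanum's actual binary layout: the 8-byte little-endian header, the per-entry size field (the description above records only a name and an offset), and the function names pack_dat, read_table, and read_file are all hypothetical.

```python
import io
import struct

def pack_dat(files, out):
    """Write {name: bytes} into `out`, a seekable binary stream."""
    out.write(struct.pack("<Q", 0))            # reserve room for the table offset
    table = []
    for name, data in files.items():
        table.append((name, out.tell(), len(data)))  # record where this file starts
        out.write(data)
    table_offset = out.tell()
    out.write(struct.pack("<I", len(table)))   # assumed table layout: count, then entries
    for name, offset, size in table:
        encoded = name.encode("utf-8")
        out.write(struct.pack("<H", len(encoded)))
        out.write(encoded)
        out.write(struct.pack("<QQ", offset, size))
    out.seek(0)
    out.write(struct.pack("<Q", table_offset)) # fill in the header now that it is known
    out.seek(0, io.SEEK_END)

def read_table(dat):
    """Read the header, jump to the table, and return {name: (offset, size)}."""
    dat.seek(0)
    (table_offset,) = struct.unpack("<Q", dat.read(8))
    dat.seek(table_offset)
    (count,) = struct.unpack("<I", dat.read(4))
    table = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<H", dat.read(2))
        name = dat.read(name_len).decode("utf-8")
        table[name] = struct.unpack("<QQ", dat.read(16))
    return table

def read_file(dat, table, name):
    """Seek within the already-open archive instead of opening a new file."""
    offset, size = table[name]
    dat.seek(offset)
    return dat.read(size)

# Usage with an in-memory stream; the file names are made up for illustration.
archive = io.BytesIO()
pack_dat({"art/item.art": b"\x01\x02\x03", "sound/gun.wav": b"RIFF"}, archive)
table = read_table(archive)
print(read_file(archive, table, "sound/gun.wav"))
```

The same functions work on a real file opened in binary mode; the point is that every lookup after read_table is a seek and a read on one handle.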
Why Not Just Use Regular Files?
During development, Troika did use regular directories and individual files. But for a shipping product, there were compelling reasons to switch to DAT archives:
Single File Pointer
However many files the game has (thousands, potentially hundreds of thousands), it needs only one open file handle. Every file open creates OS-level memory structures, so one open DAT instead of, say, fifty individually opened files means dramatically less memory overhead.
Faster Loading
Seeking to an offset within an already-opened file is significantly faster than calling the OS file-open routine. This is one of those rare tradeoffs where you save both time and memory.
Disk Space and Sector Wastage
Hard drives store data in fixed-size sectors, the minimum allocation unit, typically 512 bytes to 4KB. A 100-byte file still occupies an entire 512-byte (or 4KB) sector, wasting the remainder. Tim gives the example: 1,000 files of 100 bytes each should use 100KB, but because each file is rounded up to a full sector they would consume roughly 512KB to 4MB on disk. Packed into a single DAT, they use about 100KB plus the lookup table. In the 1990s, when hard drives were 128–256MB, this mattered enormously.
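The arithmetic behind that example can be checked directly. The sketch below assumes only that space is handed out in whole allocation units; the helper name on_disk_size is hypothetical, and the packed figure ignores the lookup table.

```python
def on_disk_size(file_size, unit):
    """Bytes consumed when space is allocated in whole units of `unit` bytes."""
    units = -(-file_size // unit)   # ceiling division
    return units * unit

file_count, file_size = 1000, 100
payload = file_count * file_size    # 100,000 bytes of actual data

for unit in (512, 4096):
    loose = file_count * on_disk_size(file_size, unit)  # every file rounded up separately
    packed = on_disk_size(payload, unit)                 # one archive rounded up once
    print(unit, loose, packed)
# 512 512000 100352
# 4096 4096000 102400
```

With loose files the rounding cost is paid 1,000 times; with one archive it is paid once.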
Patching via Numbered DATs
Arcanum shipped with numbered DAT files: arcanum1.dat, arcanum2.dat, arcanum3.dat, etc. The game loads them in numerical order. If a later DAT contains a file with the same name as an earlier one, the later version overrides.
This made patching elegant: if Arcanum shipped with DATs 1–4, a patch was simply arcanum5.dat. Any files inside it automatically replaced their counterparts in earlier DATs. No need to modify or rebuild existing archives.
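One way to picture the override rule is as a single index built by walking the archives in numerical order and letting later entries overwrite earlier ones. This is a sketch of the idea rather than Troika's code: the functions numeric_order and build_index and the file names inside the archives are hypothetical.

```python
import re

def numeric_order(dat_names):
    """Sort by the trailing number, so arcanum10.dat comes after arcanum9.dat."""
    def key(name):
        match = re.search(r"(\d+)\.dat$", name)
        return int(match.group(1)) if match else 0
    return sorted(dat_names, key=key)

def build_index(archives):
    """archives: {dat name: [file names inside it]} -> {file name: winning dat}."""
    index = {}
    for dat_name in numeric_order(archives):
        for file_name in archives[dat_name]:
            index[file_name] = dat_name   # a later archive replaces the earlier entry
    return index

archives = {
    "arcanum1.dat": ["rules/item.mes", "sound/door.wav"],
    "arcanum5.dat": ["rules/item.mes"],   # hypothetical patch contents
}
index = build_index(archives)
print(index["rules/item.mes"])   # arcanum5.dat
print(index["sound/door.wav"])   # arcanum1.dat
```

Files the patch does not mention keep resolving to the original archive, which is why nothing has to be rebuilt.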
Modding Support
The same override system powered modding. Troika shipped a tool called DB Maker with the game. Modders could create their own DAT files that loaded after the base game DATs. Want to change how an item works? Write a new version of that item's file. Want to replace a gun sound? Make your own gunshoot.wav — yours takes precedence over the game's version. No need to dig into tables or modify base game files.
Compression Options
With the DAT system, there were three approaches to compression:
- No compression — already saving space from eliminated sector wastage (the baseline)
- Compress for distribution only — compress DATs with something like PKZIP for download/install, then decompress to raw DATs on the hard drive. Faster downloads, same disk footprint as baseline
- Compress both in distribution and on disk — smallest footprint, faster downloads, but slower file loading since each file needs decompression into memory at runtime
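The cost of the third option shows up at load time: every read of a compressed entry includes a decompression step. The sketch below uses Python's zlib module as a stand-in compressor (an assumption; the text above names only "something like pkzip"), and the helper names and sample data are hypothetical.

```python
import zlib

def pack_entry(data, compress_on_disk):
    """Return the bytes that would be stored in the archive for one file."""
    return zlib.compress(data) if compress_on_disk else data

def load_entry(stored, compress_on_disk):
    """Return the original bytes; compressed entries pay for decompression here."""
    return zlib.decompress(stored) if compress_on_disk else stored

sample = b"You see a rusty dagger.\n" * 200   # repetitive text, so it compresses well
raw = pack_entry(sample, compress_on_disk=False)
small = pack_entry(sample, compress_on_disk=True)

print(len(raw), len(small))                   # stored sizes for the two options
assert load_entry(small, compress_on_disk=True) == sample
```

How much space is saved depends heavily on the data; already-compressed assets such as some audio or image formats may shrink very little.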
The choice depended on DAT file sizes and the state of average consumer hardware at the time.
Modern Relevance
Tim notes that many of the original motivations — small hard drives, limited memory, slow file operations — matter less today. However, the DAT approach still provides value for easy distribution and modding support, making monolithic archive files a technique that remains relevant in modern game development.
References
- Tim Cain. YouTube video. https://www.youtube.com/watch?v=VYw4ln0jxUY