Creating Playtest Plans

Abstract

Problem: How should RPG developers structure playtest plans to ensure comprehensive QA coverage across the enormous combinatorial space of character builds, behaviors, and game systems?

Approach: Tim Cain draws on his experience creating playtest plans for Fallout, Arcanum, Temple of Elemental Evil, Vampire: The Masquerade – Bloodlines, and The Outer Worlds, outlining the key components every RPG playtest plan should include.

Findings: Effective playtest plans should cover five areas: player build, player behavior, player leveling, end slide reverse-engineering, and play options. The specificity of plans varies from broad directives ("kill everyone") to precise instructions ("get this skill to 20, then gamble at Smoky Pete's Casino until you make $2 million"). Automated plan generators, like the one Taylor Swope built for The Outer Worlds, can help QA cover more ground.

Key insight: You can never test every combination — Arcanum alone had a million starting combinations — but structured playtest plans that systematically vary builds, behaviors, and constraints will catch bugs that freeform testing misses.

Source: https://www.youtube.com/watch?v=AGdJCVYasOk

Context

Tim Cain explains that he creates formal playtest plans primarily when QA is done out-of-house. Fallout had in-house QA, and he met with the QA lead every Friday. Arcanum's QA was split between in-house and the publisher, so the team provided a general plan. Temple of Elemental Evil and Vampire: The Masquerade – Bloodlines had very specific plans. The Outer Worlds had in-house QA at Obsidian, with both detailed plans and an automated plan generator.

The Five Components of a Playtest Plan

Player Build

The plan should tell testers exactly what kind of character to create. For class-based games (like D&D), this means specifying class, race, and minimum attributes. For skill-based games (like Arcanum or Bloodlines), it means specifying attributes, skills, traits, and backgrounds.

For Arcanum, Tim would specify builds like: "make a pure tech gunslinger," "make a pure magic wizard," or "make a neutral swordsman at zero on the tech-magic meter." The goal is to delineate all the interesting combinations you want tested, knowing you can never cover them all — Arcanum had roughly a million starting combinations before factoring in play style.
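
A rough count shows why exhaustive coverage is out of reach. The sketch below multiplies option counts per character-creation dimension, assuming the dimensions are independent. The dimension names and numbers are illustrative placeholders, not Arcanum's actual figures.

```python
from math import prod

# Hypothetical option counts per character-creation dimension.
# These are illustrative assumptions, not figures from any shipped game.
BUILD_DIMENSIONS = {
    "race": 8,
    "gender": 2,
    "background": 50,
    "starting_attribute_spreads": 100,
    "starting_skill_picks": 16,
}

def count_starting_builds(dimensions: dict[str, int]) -> int:
    """Return the number of distinct builds if every dimension is independent."""
    return prod(dimensions.values())

total = count_starting_builds(BUILD_DIMENSIONS)
print(f"{total:,} starting builds")  # prints "1,280,000 starting builds"
```

Even with modest counts per dimension the product passes a million, which is why the plan names a handful of interesting builds instead of trying to enumerate them.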

Player Behavior

Beyond the build, the plan dictates how to play. Examples include:

Kill every NPC — The "psychopathic murder hobo" playthrough tests whether the main story arc remains completable when quest givers die. This is why games put quest givers behind glass, have them call you, or send couriers with notes (which you get even if you kill the courier).
Take every quest offered — Complete each quest before moving to the next hub or act.
Resolve every conflict through dialogue, bribery, or intimidation — Tests non-combat paths.
Try to steal everything — May or may not pair with a high pickpocket/lockpicking build. Tim sometimes deliberately combines a "steal everything" behavior with a build that doesn't have thief skills to see what happens when players get caught constantly.
Try to get as rich as possible — Tests economy-related systems and edge cases.

Player Leveling

Separate from the initial build, leveling instructions tell testers how to spend points as they progress:

Multiclass early and often — Tests class-switching systems.
Pick three skills and only invest in those — Creates specialized builds that may expose balance issues.
Only spend points in specific skill categories (e.g., combat and thieving) — Tests category coverage.
Always take flaws when offered — Don't think about it, just take it.

Tim notes these leveling schemes have led to "some really interesting bugs" being found. It's not enough to specify the starting build — you must also specify the growth path.
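
One way to keep build, behavior, and leveling as separate axes is to record each in its own field. The dataclass below is a minimal sketch of that idea; the class and field names are assumptions made for illustration, not a format described in the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlaytestPlan:
    """One tester's assignment; every name here is a hypothetical choice."""
    build: str      # what character to create
    behavior: str   # how to play
    leveling: str   # how to spend points while progressing

    def describe(self) -> str:
        """Render the plan as a short instruction a tester can follow."""
        return (
            f"Build: {self.build}. "
            f"Behavior: {self.behavior}. "
            f"Leveling: {self.leveling}."
        )

plan = PlaytestPlan(
    build="pure tech gunslinger",
    behavior="try to steal everything",
    leveling="pick three skills and only invest in those",
)
print(plan.describe())
```

Keeping the three fields independent makes deliberate mismatches easy to write down, such as a "steal everything" behavior on a build with no thief skills.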

End Slide Reverse-Engineering

End slides (the narrated conclusion slides showing consequences of player choices) need to be reverse-engineered into test plans. The designer knows what slides exist and what conditions trigger them. The process is:

List all end slides
Examine their trigger conditions
Create test plans with specific player behaviors designed to trigger each slide

Examples: Take or don't take specific companions. Finish or abandon their quest lines. Let companions die. Never visit a particular town. Kill everyone in a town. Complete all quest lines in just one town. Each of these behaviors should produce the correct end slide.
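
The three steps above can be sketched as a small table of slides and trigger conditions that is turned into one plan per slide. The slide names and conditions here are hypothetical examples modeled on the behaviors listed above, not slides from a specific game.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EndSlide:
    name: str
    triggers: tuple[str, ...]  # behaviors the tester must follow to reach this slide

# Hypothetical slides; a real list would come from the designer who wrote them.
END_SLIDES = [
    EndSlide("companion_survived", ("recruit the companion", "finish the companion's quest line")),
    EndSlide("companion_died", ("recruit the companion", "let the companion die")),
    EndSlide("town_never_visited", ("never enter the town",)),
    EndSlide("town_wiped_out", ("kill everyone in the town",)),
]

def plans_for_slides(slides: list[EndSlide]) -> list[str]:
    """Return one test-plan string per end slide, built from its trigger conditions."""
    plans = []
    for slide in slides:
        steps = "; ".join(slide.triggers)
        plans.append(f"Plan for '{slide.name}': {steps}. Verify that this slide plays at the end.")
    return plans

for plan_text in plans_for_slides(END_SLIDES):
    print(plan_text)
```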

Play Options

These multiply every previous category and apply to all games, not just RPGs:

Difficulty modes — Story, Easy, Normal, Hard, Super Hard. Every plan element should ideally be tested across multiple difficulties.
Performance vs. quality mode — Play through both; things that run fine in performance mode may crawl in quality mode.
PC hardware variety — Different video cards, CPUs, memory amounts. Assign someone to a minspec machine (Tim usually gives them both a minspec and a good machine to alternate). Test across a wide variety of configurations.
Controller vs. keyboard — Many things work flawlessly on one input method but are broken or awkward on the other.
Windowed vs. fullscreen vs. borderless fullscreen — Test all display modes.
Resolutions and monitors — Different resolutions, big curved monitors, multi-monitor setups.
Console variants — Xbox S vs. X, with or without hard drive.
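
Because these options multiply, the full matrix grows quickly. The sketch below counts the full matrix for four of the options above and then builds a much smaller set in which every option value appears at least once. This "each-choice" reduction is a standard test-design technique brought in here for illustration; it is not a method attributed to Tim, and the dictionary keys are assumed names.

```python
from math import prod

# Option values taken from the list above; the keys are hypothetical labels.
PLAY_OPTIONS = {
    "difficulty": ["Story", "Easy", "Normal", "Hard", "Super Hard"],
    "graphics_mode": ["performance", "quality"],
    "input": ["controller", "keyboard"],
    "display": ["windowed", "fullscreen", "borderless fullscreen"],
}

def full_matrix_size(options: dict[str, list[str]]) -> int:
    """Number of configurations if every combination were tested."""
    return prod(len(values) for values in options.values())

def each_choice_configs(options: dict[str, list[str]]) -> list[dict[str, str]]:
    """Build configurations so that every value of every option appears at least once.

    Row i takes value i (wrapping around) from each option, so the number of
    rows equals the length of the longest option list.
    """
    rows = max(len(values) for values in options.values())
    return [
        {name: values[i % len(values)] for name, values in options.items()}
        for i in range(rows)
    ]

print(full_matrix_size(PLAY_OPTIONS))  # 60 combinations in the full matrix
for config in each_choice_configs(PLAY_OPTIONS):
    print(config)                      # 5 configurations
```

The reduced set does not test interactions between specific option values, so it complements rather than replaces running important plans across several difficulties and input methods.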

The Overnight Test

Tim learned this one from hard experience and has included it in every plan since: have someone pause the game at the end of the day and leave it running overnight. Don't close it; just let it sit for 12 to 15 hours. Even better, don't pause at all: find a "safe" spot in-game and leave the character standing there with the game running.

This catches two major bug categories:

Crash bugs from timed events triggering while the player is idle
Memory leaks too small to notice during normal play but catastrophic over extended periods
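
A slow leak is easiest to see as a trend across the whole night. The sketch below assumes memory usage was logged at intervals during the overnight run and fits a least-squares line to estimate growth per hour. The sample values and the 1 MB per hour threshold are made-up assumptions, and collecting the samples is left to whatever profiling tool the team already uses.

```python
def leak_rate_mb_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of memory (MB) against elapsed time (hours)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    return covariance / variance

# Hypothetical overnight log: (hours since start, memory in MB).
OVERNIGHT_SAMPLES = [(0, 2000.0), (3, 2006.0), (6, 2012.0), (9, 2018.0), (12, 2024.0)]

LEAK_THRESHOLD_MB_PER_HOUR = 1.0  # assumed cutoff; tune per project

rate = leak_rate_mb_per_hour(OVERNIGHT_SAMPLES)
if rate > LEAK_THRESHOLD_MB_PER_HOUR:
    print(f"Possible leak: memory grew about {rate:.1f} MB per hour")
```

Growth of 2 MB per hour would be hard to notice in a short session, but it adds up to 24 MB over these 12 hours, and a long idle run makes that kind of trend visible.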

Plan Element Granularity

Individual plan elements range from extremely brief to highly detailed:

Broad: "Make a Dex character and kill everyone"
Specific: "Get this skill to 20, this attribute to minimum 10, then go gamble at Smoky Pete's Casino until you make $2 million"

Both types are valuable. Some QA people prefer precise instructions; others prefer broad directives. Tim sometimes tells QA why a specific test exists (e.g., there's an end slide triggered by reaching $2 million) but sometimes doesn't.

The Outer Worlds Test Plan Generator

For The Outer Worlds, QA lead Taylor Swope built an automated test plan generator. When a QA tester sat down to play, they'd run the generator and receive a randomized plan: what kind of character to build and how to play. Play instructions included things like "Kill Everyone" or "whenever you pick up a drug, take it immediately." Tim himself once drew the drug instruction and says it "led to a very interesting playthrough."
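
The source does not describe how the generator was implemented. As a minimal sketch of the idea, the code below picks one build and one play instruction at random. The build names are hypothetical; the behaviors come from the instructions mentioned in this article.

```python
import random

# Hypothetical build descriptions, for illustration only.
BUILDS = ["pure combat", "pure dialogue", "stealth thief", "science specialist"]

# Play instructions mentioned in this article.
BEHAVIORS = [
    "Kill everyone",
    "Whenever you pick up a drug, take it immediately",
    "Take every quest offered",
    "Try to steal everything",
]

def generate_plan(rng: random.Random) -> dict[str, str]:
    """Return a randomized plan: what to build and how to play."""
    return {"build": rng.choice(BUILDS), "behavior": rng.choice(BEHAVIORS)}

# Passing a seeded Random makes a plan reproducible, which helps when a
# tester needs to report exactly which plan exposed a bug.
plan = generate_plan(random.Random(42))
print(f"Build: {plan['build']}. Behavior: {plan['behavior']}.")
```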

References

Tim Cain. YouTube video. https://www.youtube.com/watch?v=AGdJCVYasOk