FinalRun test specs are plain YAML files stored under .finalrun/tests/. Each file defines a single test scenario using natural-language steps that the AI agent executes on a real device or emulator. You describe what a user would do; FinalRun taps, swipes, types, and verifies on your behalf.

Test fields

Every test file follows a fixed schema. The name and steps fields are required; all others are optional.
name (string, required)
A stable, unique identifier for the test scenario. Use snake_case. This value appears in run reports and suite manifests, so keep it descriptive and leave it unchanged when the file is renamed.
description (string)
A short, human-readable summary of what the test validates. One or two sentences is enough.
setup (list of strings)
Actions the agent runs before the main steps to prepare a clean starting state. Every setup block must be idempotent — see Setup and idempotent cleanup below.
steps (list of strings, required)
An ordered list of natural-language steps the agent executes. Each step must use an action from the allowed action vocabulary.
expected_state (list of strings)
The expected UI state after all steps are complete. These are boolean conditions the agent checks against the final screen — not actions to perform. If every condition is met, the test passes; if any fail, the test fails.
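
Because only name and steps are required, the smallest valid spec can be sketched as follows. The test name and the Settings label are hypothetical and stand in for whatever your app actually shows.

```yaml
# Minimal sketch: only the two required fields are present.
# "open_settings" and the "Settings" label are illustrative assumptions.
name: open_settings

steps:
  - Launch the app.
  - Tap the Settings button.
```

A spec like this has no setup and no expected_state, so it neither prepares a clean starting state nor checks the final screen.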

Three-phase execution model

At runtime, the agent executes every test in three sequential phases:
1. Setup: The agent runs any setup steps to guarantee a clean starting state, regardless of what a previous run may have left behind.
2. Steps: The agent performs each steps entry in order, tapping, typing, swiping, and verifying as instructed.
3. Expected state: The agent checks each expected_state condition against the final screen. The test succeeds only when all conditions pass.

Example: login smoke test

name: login_smoke
description: Verify that a user can log in and reach the home screen.

setup:
  - Clear app data.

steps:
  - Launch the app.
  - Enter ${secrets.email} on the login screen.
  - Enter ${secrets.password} on the password screen.
  - Tap the login button.

expected_state:
  - The home screen is visible.
  - The user's name appears in the header.

The ${secrets.email} and ${secrets.password} placeholders are resolved at runtime from environment variables or .env files. See Placeholders for details.

Allowed action vocabulary

Every step in setup or steps must use one of the following verbs. Do not write steps that require actions outside this list.
| Verb to use in steps | What the agent does | Needs a UI target? |
| --- | --- | --- |
| Tap / Click | Taps the specified element | Yes |
| Long press | Long-presses the specified element | Yes |
| Type / Enter text | Inputs text into the specified field | Yes |
| Swipe / Scroll | Swipes in a direction over the specified area | Yes |
| Navigate back | Presses the device back button | No |
| Go to home screen | Returns to the device home screen | No |
| Rotate device | Rotates the device orientation | No |
| Hide keyboard | Dismisses the on-screen keyboard | No |
| Open URL / deeplink | Opens a URL or deeplink | No |
| Set location | Sets the device GPS location | Yes (coordinates) |
| Wait | Pauses execution | No |
| Verify / Check | Visually inspects the screen for a condition | Yes (what to verify) |

Verify is the one step type that is not a device action. Use it in setup to confirm cleanup succeeded, and in steps to confirm intermediate states before critical actions.
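
To illustrate how several of these verbs might combine in one flow, here is a rough sketch for an imagined search screen. The deeplink, coordinates, and UI labels are all hypothetical, and the exact phrasing the agent accepts for each verb may differ from what is shown.

```yaml
# Sketch only: the deeplink, coordinates, and labels below are
# illustrative assumptions, not values from a real app.
steps:
  - Open the deeplink myapp://search.
  - Set location to 37.7749, -122.4194.
  - Tap the Search field at the top of the screen.
  - Type "coffee" into the Search field.
  - Hide keyboard.
  - Scroll down in the results list.
  - Long press the first result.
  - Verify the Save to favorites option is visible.
  - Navigate back.
```

Each entry starts with a verb from the table, and every step that needs a UI target names one by its visible label.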

Writing good steps

Good steps are specific and reference actual UI labels — the text or label visible on screen, not internal component names.
  • Reference the exact label: Tap the Login button, not Tap the button.
  • Name the screen when it matters: Enter the password on the Password screen.
  • Add inline Verify steps before critical actions so failures are caught with a clear message rather than a confusing grounding error:
steps:
  - Verify the hamburger menu icon is visible in the top-left corner of the toolbar.
  - Tap the hamburger menu icon in the top-left corner of the toolbar.
  • Use Verify steps in steps to confirm intermediate states during multi-step flows.
  • Reserve expected_state for the final screen only. Do not put navigation or interaction instructions there.
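
The last two points can be sketched together as follows, assuming a hypothetical checkout flow. Intermediate Verify steps confirm each screen transition, and expected_state describes only the final screen. All screen and button labels are illustrative.

```yaml
# Sketch of a multi-step flow; labels such as "Checkout" and
# "Pay now" are hypothetical.
steps:
  - Tap the Cart icon in the toolbar.
  - Verify the Cart screen is visible.
  - Tap the Checkout button.
  - Verify the Payment screen is visible.
  - Tap the Pay now button.

expected_state:
  - The Order confirmed screen is visible.
```

If the Payment screen never appears, the failure surfaces at the second Verify step rather than at the later tap on Pay now.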

Avoid verifying ephemeral UI

Do not assert on toasts, snackbars, or transient banners in steps or expected_state. These short-lived messages disappear on their own timer and can race against the agent’s verification step. Verify the persistent consequence instead — the updated list, the changed badge count, the screen that appeared.
# Good — verifies a persistent outcome
expected_state:
  - The item appears in the shopping cart.

# Bad — toast may have already dismissed
expected_state:
  - The "Added to cart" toast is visible.

Positional strictness

When a step specifies the position of a UI element — top-left corner, in the header, first item — the agent treats that position as a strict assertion. If the element is not found at the described location, the test fails; the agent will not search elsewhere. Use positional context when the element’s location is part of what you are testing. Omit it when you only need to confirm the element exists, so the agent can scroll to find it.
# Position matters — include it
expected_state:
  - The navigation drawer is open and visible on the left side of the screen.
  - The profile avatar is visible at the top of the drawer.

# Position doesn't matter — keep it generic
expected_state:
  - The navigation drawer is open.
  - The profile avatar is visible.

The first expected_state block is spatially precise: it passes only if the drawer and the avatar appear where described. The second block is deliberately generic: it passes wherever those elements appear, so use it only when position is not part of what you are testing.

Setup and idempotent cleanup

Every test must be idempotent: assume it has already run and failed. If a previous run added data, enabled a toggle, or navigated to a new screen, your setup must reverse that state before the test begins.
| If the test validates… | Setup must… |
| --- | --- |
| Adding an item | Check if the item exists and delete it first. |
| Deleting an item | Check if the item exists and add it first if missing. |
| Enabling a toggle | Disable the toggle first if it is already on. |
| Moving or reordering | Reset the list to a known default order first. |

Always add a Verify step after each cleanup action to confirm the app is in the expected starting state. If cleanup fails, the test will fail early in setup rather than produce a misleading failure in the main steps.
setup:
  - Navigate to the Shopping List screen.
  - If the item 'Milk' is visible, swipe left on it and tap Delete.
  - Verify that 'Milk' is no longer visible on the Shopping List screen.
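
The same pattern applies to the other rows of the table. As a sketch of the toggle case, assuming a hypothetical Dark mode toggle on a Settings screen:

```yaml
# Sketch: idempotent setup for a test that validates enabling a toggle.
# The "Settings" screen and "Dark mode" label are illustrative assumptions.
setup:
  - Navigate to the Settings screen.
  - If the Dark mode toggle is on, tap it to turn it off.
  - Verify the Dark mode toggle is off.
```

The conditional step leaves the toggle alone when it is already off, so the setup produces the same starting state whether or not a previous run left it on.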

File organization

Group tests by feature under .finalrun/tests/<feature>/. For example, authentication tests belong in .finalrun/tests/auth/, and onboarding tests in .finalrun/tests/onboarding/. This mirrors the suite structure and makes it easy to run all tests for a given feature at once.