.finalrun/tests/. Each file defines a single test scenario using natural-language steps that the AI agent executes on a real device or emulator. You describe what a user would do; FinalRun taps, swipes, types, and verifies on your behalf.
Test fields
Every test file follows a fixed schema. The name and steps fields are required; all others are optional.
name
A stable, unique identifier for the test scenario. Use snake_case. This value appears in run reports and suite manifests, so keep it descriptive and consistent across renames.

description
A short, human-readable summary of what the test validates. One or two sentences is enough.

setup
Actions the agent runs before the main steps to prepare a clean starting state. Every setup block must be idempotent — see Setup and idempotent cleanup below.

steps
An ordered list of natural-language steps the agent executes. Each step must use an action from the allowed action vocabulary.

expected_state
The expected UI state after all steps are complete. These are boolean conditions the agent checks against the final screen — not actions to perform. If every condition is met, the test passes; if any fail, the test fails.
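A minimal sketch of how these fields fit together, assuming the files are YAML (the description label and all field values below are illustrative, not confirmed by this page):

```yaml
name: enable_dark_mode                # required; snake_case identifier
description: Dark mode can be enabled from the Settings screen.
setup:                                # optional; must be idempotent
  - Go to home screen
steps:                                # required; natural-language actions
  - Tap the Settings icon
  - Tap the Dark mode toggle
expected_state:                       # boolean conditions, not actions
  - The Dark mode toggle is on
```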
Three-phase execution model
At runtime, the agent executes every test in three sequential phases:

Setup
The agent runs any setup steps to guarantee a clean starting state, regardless of what a previous run may have left behind.

Steps
The agent performs each steps entry in order — tapping, typing, swiping, and verifying as instructed.

Verification
The agent checks every expected_state condition against the final screen. The test passes only if all conditions hold.

Example: login smoke test
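A sketch of what such a test file might look like, assuming YAML syntax (the screen names and field labels are illustrative):

```yaml
name: login_smoke_test
description: A registered user can sign in with valid credentials.
setup:
  - Go to home screen
  - Tap the Login button
steps:
  - Type ${secrets.email} into the Email field
  - Type ${secrets.password} into the Password field
  - Tap the Sign in button
expected_state:
  - The Home screen is visible
  - The user's avatar appears in the top-right corner
```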
The ${secrets.email} and ${secrets.password} placeholders are resolved at run time from environment variables or .env files. See Placeholders for details.

Allowed action vocabulary
Every step in setup or steps must use one of the following verbs. Do not write steps that require actions outside this list.
| Verb to use in steps | What the agent does | Needs a UI target? |
|---|---|---|
| Tap / Click | Taps the specified element | Yes |
| Long press | Long-presses the specified element | Yes |
| Type / Enter text | Inputs text into the specified field | Yes |
| Swipe / Scroll | Swipes in a direction over the specified area | Yes |
| Navigate back | Presses the device back button | No |
| Go to home screen | Returns to the device home screen | No |
| Rotate device | Rotates the device orientation | No |
| Hide keyboard | Dismisses the on-screen keyboard | No |
| Open URL / deeplink | Opens a URL or deeplink | No |
| Set location | Sets the device GPS location | Yes (coordinates) |
| Wait | Pauses execution | No |
| Verify / Check | Visually inspects the screen for a condition | Yes (what to verify) |
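For instance, a steps list drawn only from this vocabulary might look like the following sketch (the deeplink, labels, and screen names are hypothetical):

```yaml
steps:
  - Open URL myapp://profile          # deeplink; no UI target needed
  - Wait 2 seconds
  - Swipe up over the settings list
  - Tap the Edit profile button
  - Type Alex into the Display name field
  - Hide keyboard
  - Navigate back
  - Verify the Profile screen shows the name Alex
```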
Writing good steps
Good steps are specific and reference actual UI labels — the text or label visible on screen, not internal component names.

- Reference the exact label: Tap the Login button, not Tap the button.
- Name the screen when it matters: Enter the password on the Password screen.
- Add inline Verify steps before critical actions so failures are caught with a clear message rather than a confusing grounding error.
- Use Verify steps in steps to confirm intermediate states during multi-step flows.
- Reserve expected_state for the final screen only. Do not put navigation or interaction instructions there.
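An inline Verify placed immediately before a critical tap, as a sketch (assuming YAML syntax; labels are illustrative):

```yaml
steps:
  - Tap the Cart icon
  - Verify the Checkout button is visible   # fails with a clear message if not
  - Tap the Checkout button
```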
Avoid verifying ephemeral UI
Do not assert on toasts, snackbars, or transient banners in steps or expected_state. These short-lived messages disappear on their own timer and can race against the agent’s verification step. Verify the persistent consequence instead — the updated list, the changed badge count, the screen that appeared.
Positional strictness
When a step specifies the position of a UI element — top-left corner, in the header, first item — the agent treats that position as a strict assertion. If the element is not found at the described location, the test fails; the agent will not search elsewhere.
Use positional context when the element’s location is part of what you are testing. Omit it when you only need to confirm the element exists, so the agent can scroll to find it.
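As a sketch of the difference (assuming YAML syntax; the drawer example is illustrative), compare a spatially precise block with a vague one:

```yaml
# Precise: passes only if the layout matches exactly
expected_state:
  - The navigation drawer is open on the left side of the screen
  - The Settings entry is the first item in the drawer

# Vague: could match an unintended element
expected_state:
  - The navigation drawer is open
```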
The second expected_state block above is too vague — The navigation drawer is open could match an unintended element. The first block is spatially precise and will only pass if the layout matches exactly.
Setup and idempotent cleanup
Every test must be idempotent: assume it has already run and failed. If a previous run added data, enabled a toggle, or navigated to a new screen, your setup must reverse that state before the test begins.
| If the test validates… | Setup must… |
|---|---|
| Adding an item | Check if the item exists and delete it first. |
| Deleting an item | Check if the item exists and add it first if missing. |
| Enabling a toggle | Disable the toggle first if it is already on. |
| Moving or reordering | Reset the list to a known default order first. |
Add a Verify step after each cleanup action to confirm the app is in the expected starting state. If cleanup fails, the test will fail early in setup rather than produce a misleading failure in the main steps.
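A sketch of an idempotent setup for an add-item test, following the pattern in the table above (assuming YAML syntax; the Milk item and all labels are illustrative):

```yaml
name: add_item_to_shopping_list
description: A new item can be added to the shopping list.
setup:
  - Go to home screen
  - Tap the Shopping list tab
  - Check if an item named Milk exists and delete it     # reverse a prior run
  - Verify no item named Milk appears in the list        # confirm clean state
steps:
  - Tap the Add item button
  - Type Milk into the Item name field
  - Tap the Save button
expected_state:
  - An item named Milk appears in the shopping list
```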
File organization
Group tests by feature under
.finalrun/tests/<feature>/. For example, authentication tests belong in .finalrun/tests/auth/, and onboarding tests in .finalrun/tests/onboarding/. This mirrors the suite structure and makes it easy to run all tests for a given feature at once.