How verification works
You're probably wondering what "the AI tries it out for you" actually means. Here's how a real browser replays your scenario, how pass and fail are decided, and how the results are saved.
A real browser tries it for you
Specnote opens a real web browser and replays your scenario from start to finish the way a person would: opening the address, typing into fields, pressing buttons, and moving to the next screen. This is called a full replay.
The important part: you don't write a single line of test code. Building automated tests usually means a developer has to write extra test code, but Specnote simply follows the steps in the scenario you've already organized, so that work isn't needed. The scenario itself is the test script.
The browser shows each page the same way it would appear on a person's screen, so the result is as close as possible to checking it with your own eyes.
How pass and fail are decided
Not every step in a scenario carries the same weight. The steps that count toward the verdict are the ones a person directly types or presses — entering an email, filling in a password, pressing the login button: the actions a user really takes. If all of those succeed, the scenario passes.
The following steps, on the other hand, are left out of the pass/fail tally.
- Page transitions — flow steps like moving to the next page.
- System processing — work that runs automatically behind the screen.
- Cleanup — steps that tidy things up at the end.
- External dependencies — steps that rely on a service outside Specnote.
A user can't directly control these parts, so getting stuck on one of them doesn't drag the whole scenario down to a fail. For example, if an outside payment provider is slow to respond at the checkout step, the delay comes from a service outside your screen, not from the screen itself, so the scenario isn't counted as a fail.
The verdict is always either pass or fail; there's no ambiguous middle result. And if the automatic check missed something you've confirmed with your own eyes, you can mark the scenario as passed by hand.
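If you're curious what this rule looks like written out, here's a rough sketch in Python. It is not Specnote's code. The step labels (`input`, `click`, `external`, and so on), the `StepResult` type, and the `verdict` function are made up for illustration. The sketch shows three behaviors: only direct user actions are counted, the result is always one of two values, and a manual pass overrides the automatic check.

```python
from dataclasses import dataclass

# Hypothetical labels for direct user actions: only these count toward the verdict.
COUNTED_KINDS = {"input", "click"}
# Any other label (e.g. "transition", "system", "cleanup", "external") is left out of the tally.


@dataclass
class StepResult:
    name: str
    kind: str        # hypothetical step label, e.g. "input" or "external"
    succeeded: bool


def verdict(results: list[StepResult], manually_passed: bool = False) -> str:
    """Return "pass" or "fail"; there is no in-between value."""
    if manually_passed:
        # Someone confirmed the scenario by eye and marked it as passed by hand.
        return "pass"
    counted = [r for r in results if r.kind in COUNTED_KINDS]
    # all() over an empty list is True, so a run with no counted steps passes.
    return "pass" if all(r.succeeded for r in counted) else "fail"


run = [
    StepResult("type the email", "input", True),
    StepResult("type the password", "input", True),
    StepResult("press the login button", "click", True),
    StepResult("payment provider responds", "external", False),  # slow outside service
]
print(verdict(run))  # pass
```

The slow payment provider step failed, but it sits outside the tally, so the scenario still passes.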
When it doesn't go smoothly — auto-recovery
Sometimes it doesn't work cleanly on the first try: a screen loads a touch late, or a button can't be found right away. When that happens, Specnote doesn't call it a failure immediately. Instead, it tries again on its own, making up to three attempts and changing its approach each time. We call this three-stage auto-recovery.
Think of it like a person going "huh, that didn't click — let me wait a second and try once more." This sorts out whether it's a real problem or just a momentary hiccup. If it still won't work after that, then it's recorded as a fail.
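Here's a minimal sketch of the retry idea in Python, assuming each "approach" is simply a function that reports whether the step worked. The three approaches below (clicking as written, waiting and then clicking, finding the button by its label) are hypothetical stand-ins, because this page doesn't say which approaches Specnote actually uses. The simulated page is also made up: its button only appears after one wait.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # "up to three attempts"


def run_step_with_recovery(approaches: list[Callable[[], bool]]) -> tuple[str, int]:
    """Try up to MAX_ATTEMPTS approaches in order; return ("pass" or "fail", attempts used)."""
    used = 0
    for approach in approaches[:MAX_ATTEMPTS]:
        used += 1
        if approach():
            return "pass", used
    # Every approach failed: only now is the step recorded as a fail.
    return "fail", used


# Simulated slow page (hypothetical): the login button shows up only after one wait.
page = {"button_visible": False}


def click_as_written() -> bool:
    return page["button_visible"]


def wait_then_click() -> bool:
    page["button_visible"] = True  # pretend the slow screen finished loading
    return page["button_visible"]


def find_by_label_then_click() -> bool:
    return page["button_visible"]


result = run_step_with_recovery([click_as_written, wait_then_click, find_by_label_then_click])
print(result)  # ('pass', 2)
```

The first attempt hits the momentary hiccup and the second one succeeds, so the step is not recorded as a fail.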
It checks that the steps are right first
Before running a full verification, Specnote checks one thing first: whether the scenario's steps actually match the current screen. This is called step confirmation.
For example, if there's a "type the email" step, it first looks at whether that input field really exists on screen. If it matches, the step is marked "confirmed"; if the spot has moved or can't be found, it's flagged separately as a step that "needs a look."
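As a rough illustration, step confirmation can be pictured as a lookup that never runs the step. This Python sketch assumes a hypothetical model where the current screen is just a set of visible element labels and each step names the element it needs. Specnote's real matching is surely more involved than this.

```python
def confirm_steps(steps: list[tuple[str, str]], screen: set[str]) -> dict[str, str]:
    """Mark each step "confirmed" or "needs a look" without running it (step names are keys)."""
    status = {}
    for step, target in steps:
        status[step] = "confirmed" if target in screen else "needs a look"
    return status


# Hypothetical screen: the labels of the elements currently visible.
current_screen = {"email field", "password field", "login button"}
steps = [
    ("type the email", "email field"),
    ("type the password", "password field"),
    ("press sign up", "sign-up button"),  # moved or missing on this screen
]
print(confirm_steps(steps, current_screen))
# {'type the email': 'confirmed', 'type the password': 'confirmed', 'press sign up': 'needs a look'}
```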
The good news: step confirmation doesn't cost any credits. It's a safety check for tidying up your scenario before the real verification, so you can run it as often as you like. Lining the steps up first means fewer wasted runs that trip over the wrong spot once the real verification starts.
It's saved as video and screenshots
Every time verification runs, two things are saved alongside it.
- A screen recording: a video (in .webm format) of the browser replaying the scenario from start to finish. You can play it back exactly as it happened.
- Step-by-step screenshots: a picture of what the screen looked like at each step.
So it doesn't end at a single "pass/fail" line. If something failed, you can open the video and screenshots and see with your own eyes which moment went wrong.
A failed scenario is summarized into a fix report. It gathers up what went wrong so you can hand it straight to the AI that edits the code. Specnote doesn't touch the code itself; its job is to produce the "here's what to fix" summary.
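To give a feel for what "gathering up what went wrong" could mean, here's a small Python sketch that turns failed steps into plain text you could paste to a code-editing AI. The fields (`expected`, `observed`, `screenshot`) and the report layout are hypothetical, since this page doesn't describe the real report's format. Like Specnote, the sketch only summarizes; it never reads or changes any code.

```python
from dataclasses import dataclass


@dataclass
class FailedStep:
    scenario: str
    step: str
    expected: str
    observed: str
    screenshot: str  # hypothetical file name of the step's screenshot


def build_fix_report(failures: list[FailedStep]) -> str:
    """Collect failures into plain text to hand to a code-editing AI."""
    lines = ["Fix report", ""]
    for f in failures:
        lines.append(f"- Scenario: {f.scenario}")
        lines.append(f"  Step: {f.step}")
        lines.append(f"  Expected: {f.expected}")
        lines.append(f"  Observed: {f.observed}")
        lines.append(f"  Screenshot: {f.screenshot}")
    return "\n".join(lines)


report = build_fix_report([
    FailedStep(
        scenario="Log in",
        step="press the login button",
        expected="the dashboard opens",
        observed="an error banner stays on the login page",
        screenshot="step-03.png",
    ),
])
print(report)
```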
For how to read and use verification results, see Running and results. If a term is fuzzy, head back to Key terms at a glance.