AI Test Case Generation — How It Works and How to Get the Best Results
What the AI analyzes, what types of test cases it produces, and how to write inputs that unlock precise, comprehensive test suites — automatically.
For QA Engineers and Tech Leads using Evaficy Smart Test to generate test cases from acceptance criteria.
What Is AI Test Case Generation?
AI test case generation is the process of automatically producing structured, executable test cases from a set of inputs — test type, affected page or component, custom fields, and acceptance criteria — without the need to write cases manually from scratch.
In Evaficy Smart Test, the AI analyzes the inputs you provide and generates complete test cases: each with a title, preconditions, step-by-step instructions, and expected results. It covers the scenarios a QA engineer would write manually — plus the ones they would typically miss: edge cases, negative paths, and state-dependent scenarios that are easy to overlook when writing by hand.
The output is not a replacement for human judgment. It is a comprehensive first draft that a QA engineer reviews, refines, and extends — reducing the time spent on mechanical case-writing so that expertise can be applied to design, review, and execution instead.
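Conceptually, each generated case is a small structured record. The sketch below shows one way to represent it; the field names and the Python representation are illustrative, not Evaficy's actual data model or export format.

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """Illustrative shape of a generated test case; field names are assumptions."""
    title: str
    preconditions: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected_results: list[str] = field(default_factory=list)

# A happy-path case in this shape:
case = TestCase(
    title="Login with valid credentials redirects to dashboard",
    preconditions=["A registered, verified user account exists"],
    steps=[
        "Open the login page",
        "Enter the registered email and correct password",
        "Click 'Log in'",
    ],
    expected_results=["The user is redirected to the dashboard"],
)
```

Thinking of a case as a record like this makes the later review checks concrete: every field should be filled, specific, and verifiable.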
Where generation happens in Evaficy Smart Test
The Generate button in the Test Case Selector is available on Advanced and Enterprise plans. Once your scenario is configured with a test type, affected page, and context about the feature, clicking Generate produces a set of test cases in seconds. A warning appears when you approach 80% of your monthly AI generation allowance; the button is disabled once the limit is reached.
What the AI Analyzes Before It Generates
The quality and precision of AI-generated test cases are determined almost entirely by the inputs provided. The AI does not have access to your codebase, design documents, or sprint backlog. It works from what you give it — and the more precise that input, the more targeted the output.
There are four inputs the AI uses. Each is described below, with guidance on how to use it effectively.
The Affected Page field tells the AI where in the application to focus. A broad value like "Checkout" produces general checkout-flow tests. A specific value like "Checkout → Payment step → Card entry form" produces tests focused on that component, including input-format validation, card-type switching, and CVV field behaviour.
Specificity here directly affects output quality. The more precise the scope, the less interpretive work the AI has to do, and the more immediately executable the cases are. When a feature spans multiple pages, generate separate scenarios per page rather than combining everything into a single broad scenario.
- "Checkout" → broad cases covering the full checkout flow end to end
- "Checkout → Payment step" → cases focused on the payment step: card entry, error handling, order confirmation
- "Checkout → Payment step → Card number field" → targeted cases for card-number formatting, Luhn validation, and declined-card handling
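For reference, the Luhn check mentioned above is a standard published checksum, so a targeted card-number case has a precise pass/fail rule to verify against. A minimal sketch of the algorithm (not Evaficy code):

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in card_number if d.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk digits from the right; double every second one,
    # subtracting 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A generated case for the card-number field would assert, for example, that a Luhn-valid number is accepted and a number with one digit altered is rejected before submission.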
Four optional custom fields let you provide additional context that the AI uses to focus its output.
- Feature narrows scope to a specific user-facing capability within the page. "Password reset" within a "Login" page restricts generation to that flow and avoids cases for unrelated login functionality.
- Browser or Environment causes the AI to include platform-specific edge cases — Safari autofill behaviour, iOS keyboard interactions, mobile viewport constraints.
- Component targets a specific UI or API layer, useful for isolated component or API endpoint testing.
- Requirement is the most powerful field: paste in acceptance criteria, a user story, or a functional specification, and the AI treats it as a direct source of truth.
- Feature: "Two-factor authentication" → AI generates cases for 2FA flows only, not the broader authentication system
- Browser: "Safari on iOS" → AI adds cases for touch keyboard interactions, autofill conflicts, and Safari-specific rendering edge cases
- Requirement: "Users must not log in with an unverified email" → AI explicitly generates the unverified-email rejection case
When you paste acceptance criteria into the Requirement field, you give the AI a direct source of truth. Each criterion becomes a testable assertion. "Must" statements become positive cases. "Must not" statements become negative cases. Conditional logic ("if the user is on a mobile device…") becomes branching scenarios.
Format affects quality. Short, structured criteria — one criterion per line, or Given/When/Then format — produce more targeted output than long free-text paragraphs. Include both success and failure conditions: criteria that only describe the happy outcome produce only happy-path cases. Criteria that also specify what must not happen generate the corresponding negative cases.
- "User must see an error if the entered email is already registered" → AI generates: attempt registration with existing email, verify error message and field highlighting
- "The reset link expires in 24 hours" → AI generates: click link within 24h succeeds; same link after 24h shows expiry message
- "System must not allow checkout if the cart total exceeds £10,000" → AI generates: cart at £9,999 proceeds; cart at £10,001 is blocked
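The must/must-not mapping is mechanical enough to sketch. The classifier below is a toy illustration of the idea, not how the generation model actually parses criteria:

```python
def classify_criterion(criterion: str) -> str:
    """Toy heuristic: label a criterion as the seed of a positive or negative case."""
    text = criterion.lower()
    # "Must not" / "should not" statements seed negative cases;
    # everything else seeds a positive case.
    if "must not" in text or "should not" in text:
        return "negative"
    return "positive"

criteria = [
    "User must see an error if the entered email is already registered",
    "System must not allow checkout if the cart total exceeds £10,000",
]
labels = [classify_criterion(c) for c in criteria]
# → ["positive", "negative"]
```

A real model does far more than keyword matching, but the principle stands: explicit "must" and "must not" phrasing gives it unambiguous seeds for both sides of each rule.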
Types of Test Cases the AI Produces
Given well-constructed inputs, the AI produces four distinct categories of test case — each addressing a different kind of risk. Understanding these categories helps you recognise what is already covered in a generated set and what may need to be supplemented manually.
Happy Path
Positive flows — what should happen when everything is done correctly
Happy path cases verify the intended user journey: the sequence of correct inputs and actions that produces the expected successful outcome. They confirm that the feature works as designed for the ideal user in ideal conditions.
Key characteristics
- Cover each distinct success outcome the feature can produce
- Follow the complete flow from precondition through action to verified final state
- Use representative valid inputs — not just any passing value
- Include post-conditions: confirm what persists or changes after the action succeeds
Example test cases
- User logs in with a valid registered email and correct password and is redirected to the dashboard
- User completes checkout with a valid address and card; an order confirmation screen appears and a confirmation email is sent
- User creates a new project with a unique name, saves it, and it appears in the project list with the correct name and a draft status
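The third example above could be automated roughly as follows. The ProjectStore class is a hypothetical stand-in for the application under test, named here purely for illustration:

```python
class ProjectStore:
    """Minimal stand-in for the application under test (hypothetical)."""

    def __init__(self):
        self.projects = {}

    def create(self, name: str) -> dict:
        if name in self.projects:
            raise ValueError(f"Project name already in use: {name}")
        project = {"name": name, "status": "draft"}
        self.projects[name] = project
        return project


def test_create_project_happy_path():
    # Precondition: no project with this name exists yet.
    store = ProjectStore()
    # Action: create a project with a unique name and save it.
    created = store.create("Q3 Regression Suite")
    # Expected results: the project appears in the list with the
    # correct name and a draft status.
    assert "Q3 Regression Suite" in store.projects
    assert created["name"] == "Q3 Regression Suite"
    assert created["status"] == "draft"


test_create_project_happy_path()
```

Note the shape: precondition, action, verified final state, exactly as the characteristics above require.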
Using this case type in Evaficy Smart Test
Happy path cases are always generated first and establish the baseline. If they fail, deeper testing is premature. They are also the most reusable cases — your regression suite is largely built from the happy paths written during initial functional testing.
How to Write Inputs That Produce Useful Test Cases
The AI is only as useful as what you give it. Weak inputs — a broad page name and no acceptance criteria — produce generic cases that could apply to any feature. Precise inputs — a specific component, a targeted feature name, and structured acceptance criteria — produce cases that are immediately executable, traceable to requirements, and genuinely useful for finding bugs.
1. Be specific about scope
Use hierarchical page paths ("Checkout → Payment step") rather than top-level pages ("Checkout"). The AI generates to the level of specificity you provide.
2. Use Feature to narrow further
When a page hosts multiple capabilities, name the exact one you are testing. This prevents the AI from generating cases for adjacent functionality on the same page.
3. Include failure conditions in criteria
Acceptance criteria that only describe success produce only happy path cases. Add "must not" and error conditions to unlock negative and edge case generation.
4. Structure criteria, not prose
One criterion per line, or Given/When/Then format, produces more targeted cases than a paragraph description. The AI parses structured input more reliably than narrative.
5. Specify environment when it matters
If the feature behaves differently on mobile, on a specific browser, or in a particular timezone, use the Browser/Environment field to generate relevant platform-specific edge cases.
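Tip 4 holds because one-line Given/When/Then criteria are mechanically parseable in a way free prose is not. A toy parser illustrates this; the regex and clause names are assumptions for the sketch, not Evaficy's parsing logic:

```python
import re


def parse_gwt(criterion: str) -> dict:
    """Split a single-line Given/When/Then criterion into its three clauses."""
    pattern = r"Given (.+?), [Ww]hen (.+?), [Tt]hen (.+)"
    m = re.match(pattern, criterion.strip())
    if not m:
        raise ValueError("Criterion is not in Given/When/Then form")
    return {"given": m.group(1), "when": m.group(2), "then": m.group(3)}


parsed = parse_gwt(
    "Given a registered user, when they request a reset link, then an email is sent"
)
# The "given" clause maps to a precondition, "when" to the steps,
# and "then" to the expected result.
```

A paragraph of narrative offers no such seams to cut along, which is why structured criteria produce more targeted cases.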
Weak input vs. strong input — a side-by-side comparison
Weak input
Affected page: "Checkout"; no Feature; no Requirement criteria.
Result: broad cases spread across the full checkout flow. Generic expected results. Card-specific validation coverage is entirely absent.
Strong input
Affected page: "Checkout → Payment step → Card entry form"; Requirement: structured acceptance criteria covering card validation, expiry, and payment errors.
Result: 12+ targeted cases covering Luhn validation, inline error messages, expiry rejection, and successful payment completion with order confirmation.
Reviewing and Editing AI-Generated Cases
AI-generated test cases are a high-quality starting point, not a finished product. Before adding cases to a scenario for execution or expert review, spend a few minutes checking the output. The following five checks catch the most common issues.
Remove duplicates
The AI may generate multiple cases testing the same condition from slightly different angles. Keep the most specific version of each; remove weaker or redundant variants.
Make expected results verifiable
Replace vague expected results ("an error is shown") with specific, verifiable ones: "a red validation message reading 'Email is required' appears below the email field, and the form does not submit."
Complete missing preconditions
Every case that depends on prior state — a logged-in user, a populated cart, a pending approval — must state that state explicitly in the precondition. A test that cannot be reproduced from a defined starting point is not a reliable test.
Make steps unambiguous
Read each step as if you are seeing the application for the first time. If a step requires interpretation ("navigate to settings"), make it specific ("click the gear icon in the top-right header to open the Settings panel").
Add domain-specific cases manually
AI generation covers what can be inferred from your inputs. Cases requiring deep domain knowledge — unusual business rules, legacy data quirks, regional compliance constraints — must be added manually after the generated set is reviewed.
Edit individual cases, or regenerate the whole set
In Evaficy Smart Test, you can edit any individual AI-generated test case directly in the selector. If the generated set needs significant changes — because your inputs were too broad, or you want to try a different test type — it is usually faster to regenerate with improved inputs than to manually correct a large number of cases.
When to Supplement AI Output With Manual Cases
Not everything can be inferred from your inputs, however well constructed. Several categories of test case consistently require human expertise to write well:
- Domain-specific edge cases the AI cannot infer from a description alone — unusual business logic, regulatory constraints, or legacy system quirks your team has accumulated over time
- Usability and UX observations that a structured test case cannot capture — things that are technically correct but feel wrong, confusing, or inaccessible to real users
- Cases derived from production bug history — known failure modes that have affected users before and must never regress
- Performance and load scenarios that require specialised tooling and cannot be executed manually step by step
- Cases that depend on real-world integration knowledge — knowing that a specific third-party API returns malformed data under particular conditions that no specification documents
AI generation complements human expertise — it does not replace it
The best test suites combine AI-generated breadth with manually written depth. Use generation to ensure systematic coverage of cases that can be derived from requirements; use human expertise to add cases that require contextual knowledge the AI cannot access.
Common Mistakes That Lead to Weak Generated Cases
Most teams encounter the same generation pitfalls. Recognising them early saves significant time in both generation and downstream execution.
No acceptance criteria provided
Generating without criteria produces generic cases that test what the feature might do rather than what it must do. The AI guesses requirements from the feature description alone — and often guesses wrong for non-obvious business rules. Always paste the relevant acceptance criteria into the Requirement field before generating.
Wrong test type for the goal
Selecting "Functional" when you need to verify post-change stability produces the wrong cases. Functional cases confirm requirements; regression cases confirm that prior behaviour has not changed. Choose the type that matches the question you are actually trying to answer before you generate.
Scope too broad
"Dashboard" or "User profile" as the affected page produces cases spread across the full page rather than focused on the specific feature under test. Narrow the scope: "Dashboard → Activity feed" or "User profile → Change password form." The more specific the scope, the more targeted and immediately usable the output.
Accepting the first generation without review
No generated set is ready to execute as-is. Review for duplicates, imprecise expected results, missing preconditions, and steps that would be ambiguous to a tester executing them cold. Five minutes of review after generation prevents hours of confusion during execution.
One test type per feature, every time
Generating only functional cases for every feature leaves regression, edge case, and exploratory coverage gaps. A complete scenario library for any significant feature includes at minimum: functional cases for initial development, regression cases for future releases, and edge cases for boundary conditions — each as a separate scenario.
Related guides
How to Use AI QA Testing
Platform overview — how generation, validation, and execution work together.
Software Testing Types Explained
A complete guide to the seven core testing types and when to use each one.
How to Set Up a QA Project from Scratch
Step-by-step guide to creating a project, structuring scenarios, and running your first test execution.
Generate your first test suite in minutes
Create a project, configure a scenario, and let the AI produce a comprehensive test suite from your acceptance criteria — then refine and execute it with your team.