Evaficy Smart Test

AI Test Case Generation — How It Works and How to Get the Best Results

What the AI analyzes, what types of test cases it produces, and how to write inputs that unlock precise, comprehensive test suites — automatically.

For QA Engineers and Tech Leads using Evaficy Smart Test to generate test cases from acceptance criteria.


What Is AI Test Case Generation?

AI test case generation is the process of automatically producing structured, executable test cases from a set of inputs — test type, affected page or component, custom fields, and acceptance criteria — without the need to write cases manually from scratch.

In Evaficy Smart Test, the AI analyzes the inputs you provide and generates complete test cases: each with a title, preconditions, step-by-step instructions, and expected results. It covers the scenarios a QA engineer would write manually — plus the ones they would typically miss: edge cases, negative paths, and state-dependent scenarios that are easy to overlook when writing by hand.

The output is not a replacement for human judgment. It is a comprehensive first draft that a QA engineer reviews, refines, and extends — reducing the time spent on mechanical case-writing so that expertise can be applied to design, review, and execution instead.

Where generation happens in Evaficy Smart Test

The Generate button in the Test Case Selector is available on Advanced and Enterprise plans. Once your scenario is configured with a test type, affected page, and context about the feature, clicking Generate produces a set of test cases in seconds. A warning appears when you approach 80% of your monthly AI generation allowance; the button is disabled once the limit is reached.


What the AI Analyzes Before It Generates

The quality and precision of AI-generated test cases are determined almost entirely by the inputs provided. The AI does not have access to your codebase, design documents, or sprint backlog. It works from what you give it — and the more precise that input, the more targeted the output.

There are four inputs the AI uses: the test type, the affected page, optional custom fields, and acceptance criteria. Each is described below, with guidance on what it does and how to use it effectively.

The test type is the most influential input in the generation form. It tells the AI what question it is trying to answer — and the answer changes fundamentally depending on which question that is.

Selecting "Functional" tells the AI to focus on requirement compliance: does the feature do what the specification says? Selecting "Regression" produces cases designed for re-execution after future changes — stable, repeatable, focused on detecting whether prior behaviour has changed. Selecting "Exploratory" shifts generation toward unusual paths, unexpected inputs, and interaction sequences that structured scripts typically miss.

The same feature described with identical acceptance criteria will produce distinctly different test case sets depending on the type selected. This is by design: each type answers a different question, and the AI adjusts its focus accordingly.

Examples
  • "Functional" + a login form → cases for correct credential handling, validation errors, session creation, and remember-me behaviour
  • "Regression" + a login form → a stable, re-runnable suite covering the critical login paths that must continue working after future changes
  • "Smoke" + a login form → a single case confirming basic login completes without error — nothing more

The Affected Page field tells the AI where in the application to focus. A broad value like "Checkout" produces general checkout-flow tests. A specific value like "Checkout → Payment step → Card entry form" produces tests focused on that component, including input-format validation, card-type switching, and CVV field behaviour.

Specificity here directly affects output quality. The more precise the scope, the less interpretive work the AI has to do, and the more immediately executable the cases are. When a feature spans multiple pages, generate separate scenarios per page rather than combining everything into a single broad scenario.

Examples
  • "Checkout" → broad cases covering the full checkout flow end to end
  • "Checkout → Payment step" → cases focused on the payment step: card entry, error handling, order confirmation
  • "Checkout → Payment step → Card number field" → targeted cases for card-number formatting, Luhn validation, and declined-card handling
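For readers unfamiliar with the Luhn check referenced above, it is the standard checksum used to catch mistyped card numbers, and it can be sketched in a few lines of Python. The numbers below are standard industry test numbers, not real cards:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second digit,
    # subtracting 9 when the doubled value has two digits.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  — well-known valid test number
print(luhn_valid("4111111111111112"))  # False — last digit altered
```

A test case for this field would typically pair one valid test number with several invalid ones (wrong checksum, too short, non-numeric characters) to cover both sides of the validation.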

Four optional custom fields let you provide additional context that the AI uses to focus its output.

  • Feature narrows scope to a specific user-facing capability within the page. "Password reset" within a "Login" page restricts generation to that flow and avoids cases for unrelated login functionality.
  • Browser or Environment causes the AI to include platform-specific edge cases: Safari autofill behaviour, iOS keyboard interactions, mobile viewport constraints.
  • Component targets a specific UI or API layer, useful for isolated component or API endpoint testing.
  • Requirement is the most powerful field: paste in acceptance criteria, a user story, or a functional specification, and the AI treats it as a direct source of truth.

Examples
  • Feature: "Two-factor authentication" → AI generates cases for 2FA flows only, not the broader authentication system
  • Browser: "Safari on iOS" → AI adds cases for touch keyboard interactions, autofill conflicts, and Safari-specific rendering edge cases
  • Requirement: "Users must not log in with an unverified email" → AI explicitly generates the unverified-email rejection case

When you paste acceptance criteria into the Requirement field, you give the AI a direct source of truth. Each criterion becomes a testable assertion. "Must" statements become positive cases. "Must not" statements become negative cases. Conditional logic ("if the user is on a mobile device…") becomes branching scenarios.

Format affects quality. Short, structured criteria — one criterion per line, or Given/When/Then format — produce more targeted output than long free-text paragraphs. Include both success and failure conditions: criteria that only describe the happy outcome produce only happy-path cases. Criteria that also specify what must not happen generate the corresponding negative cases.
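As a rough illustration of this mapping — a simplified sketch, not the actual generator logic — the polarity of a criterion can be read off its wording. The `case_polarity` helper and the criteria below are hypothetical examples:

```python
# Illustrative only: how structured criteria map to generated case types.
criteria = [
    "User must see a confirmation email after registering",      # "must"     -> positive case
    "User must not log in with an unverified email",             # "must not" -> negative case
    "If the user is on a mobile device, show the compact form",  # conditional -> branching cases
]

def case_polarity(criterion: str) -> str:
    """Classify a criterion the way the text above describes (simplified)."""
    lowered = criterion.lower()
    if "must not" in lowered:
        return "negative"
    if lowered.startswith("if "):
        return "branching"
    return "positive"

for c in criteria:
    print(case_polarity(c))  # positive, negative, branching
```

Writing one criterion per line, as in the list above, is exactly the structure that parses cleanly — both for a simple classifier like this and for the generator.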

Examples
  • "User must see an error if the entered email is already registered" → AI generates: attempt registration with existing email, verify error message and field highlighting
  • "The reset link expires in 24 hours" → AI generates: click link within 24h succeeds; same link after 24h shows expiry message
  • "System must not allow checkout if the cart total exceeds £10,000" → AI generates: cart at £9,999 proceeds; cart at £10,001 is blocked
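The £10,000 criterion above is a classic boundary-value case. A minimal sketch of the rule, assuming a hypothetical `checkout_allowed` helper — note that "exceeds" implies the limit itself is still allowed, which is exactly the kind of boundary a generated case should pin down:

```python
CART_LIMIT = 10_000  # £ threshold from the acceptance criterion

def checkout_allowed(cart_total: int) -> bool:
    """Checkout is blocked only when the total strictly exceeds the limit."""
    return cart_total <= CART_LIMIT

# Boundary values around the £10,000 limit:
print(checkout_allowed(9_999))   # True  — below the limit
print(checkout_allowed(10_000))  # True  — at the limit ("exceeds" means strictly over)
print(checkout_allowed(10_001))  # False — blocked
```

Testing all three values, not just 9,999 and 10,001, is what catches off-by-one implementations that use `<` where the requirement implies `<=`.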

Types of Test Cases the AI Produces

Given well-constructed inputs, the AI produces four distinct categories of test case — each addressing a different kind of risk. Understanding these categories helps you recognise what is already covered in a generated set and what may need to be supplemented manually.

Happy Path

Positive flows — what should happen when everything is done correctly

Happy path cases verify the intended user journey: the sequence of correct inputs and actions that produces the expected successful outcome. They confirm that the feature works as designed for the ideal user in ideal conditions.

Key characteristics
  • Cover each distinct success outcome the feature can produce
  • Follow the complete flow from precondition through action to verified final state
  • Use representative valid inputs — not just any passing value
  • Include post-conditions: confirm what persists or changes after the action succeeds
Example test cases
  • User logs in with a valid registered email and correct password and is redirected to the dashboard
  • User completes checkout with a valid address and card; an order confirmation screen appears and a confirmation email is sent
  • User creates a new project with a unique name, saves it, and it appears in the project list with the correct name and a draft status
Using this case type in Evaficy Smart Test

Happy path cases are always generated first and establish the baseline. If they fail, deeper testing is premature. They are also the most reusable cases — your regression suite is largely built from the happy paths written during initial functional testing.
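A happy-path case like the login example above maps naturally onto an executable check. The `login` function below is a hypothetical stand-in for the application under test; in practice this would be a Selenium/Playwright driver or an API client:

```python
# Hypothetical stand-in for the application under test.
def login(email: str, password: str) -> str:
    """Return the redirect destination after a login attempt."""
    registered_users = {"alice@example.com": "s3cret"}
    if registered_users.get(email) == password:
        return "/dashboard"
    return "/login?error=invalid_credentials"

def test_login_happy_path():
    # Precondition: alice@example.com is a registered, verified user.
    destination = login("alice@example.com", "s3cret")
    # Expected result: user is redirected to the dashboard.
    assert destination == "/dashboard"

test_login_happy_path()
print("happy path passed")
```

Note how the structure mirrors a written case: the precondition and expected result appear as comments and assertions, which is what makes happy-path cases so reusable in regression suites.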


How to Write Inputs That Produce Useful Test Cases

The AI is only as useful as what you give it. Weak inputs — a broad page name and no acceptance criteria — produce generic cases that could apply to any feature. Precise inputs — a specific component, a targeted feature name, and structured acceptance criteria — produce cases that are immediately executable, traceable to requirements, and genuinely useful for finding bugs.

1. Be specific about scope

Use hierarchical page paths ("Checkout → Payment step") rather than top-level pages ("Checkout"). The AI generates to the level of specificity you provide.

2. Use Feature to narrow further

When a page hosts multiple capabilities, name the exact one you are testing. This prevents the AI from generating cases for adjacent functionality on the same page.

3. Include failure conditions in criteria

Acceptance criteria that only describe success produce only happy path cases. Add "must not" and error conditions to unlock negative and edge case generation.

4. Structure criteria, not prose

One criterion per line, or Given/When/Then format, produces more targeted cases than a paragraph description. The AI parses structured input more reliably than narrative.

5. Specify environment when it matters

If the feature behaves differently on mobile, on a specific browser, or in a particular timezone, use the Browser/Environment field to generate relevant platform-specific edge cases.

Weak input vs. strong input — a side-by-side comparison
Weak input
Test Type: Functional
Affected Page: Checkout
Feature: (empty)
Requirement: (empty)

Result: broad cases spread across the full checkout flow. Generic expected results. Card-specific validation coverage is entirely absent.
Strong input
Test Type: Functional
Affected Page: Checkout → Payment step
Feature: Card payment validation
Requirement: Card number must pass Luhn check before submission. Invalid numbers show an inline error. Expired cards are rejected before submission.

Result: 12+ targeted cases covering Luhn validation, inline error messages, expiry rejection, and successful payment completion with order confirmation.

Reviewing and Editing AI-Generated Cases

AI-generated test cases are a high-quality starting point, not a finished product. Before adding cases to a scenario for execution or expert review, spend a few minutes checking the output. The following five checks catch the most common issues.

1. Remove duplicates

The AI may generate multiple cases testing the same condition from slightly different angles. Keep the most specific version of each; remove weaker or redundant variants.

2. Make expected results verifiable

Replace vague expected results ("an error is shown") with specific, verifiable ones: "a red validation message reading 'Email is required' appears below the email field, and the form does not submit."

3. Complete missing preconditions

Every case that depends on prior state — a logged-in user, a populated cart, a pending approval — must state that state explicitly in the precondition. A test that cannot be reproduced from a defined starting point is not a reliable test.

4. Make steps unambiguous

Read each step as if you are seeing the application for the first time. If a step requires interpretation ("navigate to settings"), make it specific ("click the gear icon in the top-right header to open the Settings panel").

5. Add domain-specific cases manually

AI generation covers what can be inferred from your inputs. Cases requiring deep domain knowledge — unusual business rules, legacy data quirks, regional compliance constraints — must be added manually after the generated set is reviewed.

Edit individual cases, or regenerate the whole set

In Evaficy Smart Test, you can edit any individual AI-generated test case directly in the selector. If the generated set needs significant changes — because your inputs were too broad, or you want to try a different test type — it is usually faster to regenerate with improved inputs than to manually correct a large number of cases.


When to Supplement AI Output With Manual Cases

AI generation covers what can be inferred from your inputs. There are categories of test case that consistently require human expertise to write well:

  • Domain-specific edge cases the AI cannot infer from a description alone — unusual business logic, regulatory constraints, or legacy system quirks your team has accumulated over time
  • Usability and UX observations that a structured test case cannot capture — things that are technically correct but feel wrong, confusing, or inaccessible to real users
  • Cases derived from production bug history — known failure modes that have affected users before and must never regress
  • Performance and load scenarios that require specialised tooling and cannot be executed manually step by step
  • Cases that depend on real-world integration knowledge — knowing that a specific third-party API returns malformed data under particular conditions that no specification documents
AI generation complements human expertise — it does not replace it

The best test suites combine AI-generated breadth with manually written depth. Use generation to ensure systematic coverage of cases that can be derived from requirements; use human expertise to add cases that require contextual knowledge the AI cannot access.


Common Mistakes That Lead to Weak Generated Cases

Most teams encounter the same generation pitfalls. Recognising them early saves significant time in both generation and downstream execution.

No acceptance criteria provided

Generating without criteria produces generic cases that test what the feature might do rather than what it must do. The AI guesses requirements from the feature description alone — and often guesses wrong for non-obvious business rules. Always paste the relevant acceptance criteria into the Requirement field before generating.

Wrong test type for the goal

Selecting "Functional" when you need to verify post-change stability produces the wrong cases. Functional cases confirm requirements; regression cases confirm that prior behaviour has not changed. Choose the type that matches the question you are actually trying to answer before you generate.

Scope too broad

"Dashboard" or "User profile" as the affected page produces cases spread across the full page rather than focused on the specific feature under test. Narrow the scope: "Dashboard → Activity feed" or "User profile → Change password form." The more specific the scope, the more targeted and immediately usable the output.

Accepting the first generation without review

AI-generated cases are a high-quality starting point, not a finished product. Review for duplicates, imprecise expected results, missing preconditions, and steps that would be ambiguous to a tester executing them cold. Five minutes of review after generation prevents hours of confusion during execution.

One test type per feature, every time

Generating only functional cases for every feature leaves regression, edge case, and exploratory coverage gaps. A complete scenario library for any significant feature includes at minimum: functional cases for initial development, regression cases for future releases, and edge cases for boundary conditions — each as a separate scenario.


Related guides
How to Use AI QA Testing
Platform overview — how generation, validation, and execution work together.
Software Testing Types Explained
A complete guide to the seven core testing types and when to use each one.
How to Set Up a QA Project from Scratch
Step-by-step guide to creating a project, structuring scenarios, and running your first test execution.
Generate your first test suite in minutes

Create a project, configure a scenario, and let the AI produce a comprehensive test suite from your acceptance criteria — then refine and execute it with your team.

Start your trial