analyst analysis skill risk: low

A/B Test Statistical Results Evaluator

The prompt instructs the model to analyze A/B test results by validating sample size and setup, calculating conversion rates, lifts, p-values, and confidence intervals, checking gu…

SKILL 1 file

SKILL.md

---
name: ab-test-analysis
description: "Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant."
---
## A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

### Context

You are analyzing A/B test results for **$ARGUMENTS**.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.

### Instructions

1. **Understand the experiment**:
   - What was the hypothesis?
   - What was changed (the variant)?
   - What is the primary metric? Any guardrail metrics?
   - How long did the test run?
   - What is the traffic split?

2. **Validate the test setup**:
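   - **Sample size**: Is the sample large enough to detect the expected effect size?
     - Use the per-group formula: n = ((Zα/2 + Zβ)² × 2 × p × (1-p)) / MDE², where p is the baseline conversion rate, MDE is the absolute minimum detectable effect, Zα/2 ≈ 1.96 for a two-sided α of 0.05, and Zβ ≈ 0.84 for 80% power
     - Flag if the test is underpowered (<80% power)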
   - **Duration**: Did the test run for at least 1-2 full business cycles?
   - **Randomization**: Any evidence of sample ratio mismatch (SRM)?
   - **Novelty/primacy effects**: Was there enough time to wash out initial behavior changes?

3. **Calculate statistical significance**:
   - **Conversion rate** for control and variant
   - **Relative lift**: (variant - control) / control × 100
   - **p-value**: Using a two-tailed z-test or chi-squared test
   - **Confidence interval**: 95% CI for the difference
   - **Statistical significance**: Is p < 0.05?
   - **Practical significance**: Is the lift meaningful for the business?

   If the user provides raw data, generate and run a Python script to calculate these.

4. **Check guardrail metrics**:
   - Did any guardrail metrics (revenue, engagement, page load time) degrade?
   - A winning primary metric with degraded guardrails may not be a true win

5. **Interpret results**:

   | Outcome | Recommendation |
   |---|---|
   | Significant positive lift, no guardrail issues | **Ship it** — roll out to 100% |
   | Significant positive lift, guardrail concerns | **Investigate** — understand trade-offs before shipping |
   | Not significant, positive trend | **Extend the test** — need more data or larger effect |
   | Not significant, flat | **Stop the test** — no meaningful difference detected |
   | Significant negative lift | **Don't ship** — revert to control, analyze why |

6. **Provide the analysis summary**:
   ```
   ## A/B Test Results: [Test Name]

   **Hypothesis**: [What we expected]
   **Duration**: [X days] | **Sample**: [N control / M variant]

   | Metric | Control | Variant | Lift | p-value | Significant? |
   |---|---|---|---|---|---|
   | [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
   | [Guardrail] | ... | ... | ... | ... | ... |

   **Recommendation**: [Ship / Extend / Stop / Investigate]
   **Reasoning**: [Why]
   **Next steps**: [What to do]
   ```
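
As a rough illustration of the setup checks in step 2, the sketch below estimates the per-group sample size and tests for sample ratio mismatch. It is a minimal example under stated assumptions, not a prescribed implementation: the function names `required_sample_size` and `srm_p_value` are hypothetical, the sample size uses the common two-proportion approximation with a power term (z_beta) alongside z_alpha/2, the MDE is treated as an absolute difference, and the SRM check is a chi-squared goodness-of-fit test with one degree of freedom.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided test of two proportions.

    baseline_rate: control conversion rate, e.g. 0.10
    mde_abs: minimum detectable effect as an absolute difference, e.g. 0.02
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    # Assumption: both groups have roughly the baseline variance p * (1 - p).
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """p-value of a chi-squared goodness-of-fit test on the observed traffic split."""
    total = n_control + n_variant
    exp_control = total * expected_control_share
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # For 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, `required_sample_size(0.10, 0.02)` estimates the users needed per group to detect a 2-percentage-point change from a 10% baseline, and a very small value from `srm_p_value(n_control, n_variant)` suggests the observed split deviates from the planned one. The cutoff for "very small" is a judgment call that the skill itself does not specify.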

Think step by step. Save the analysis as a markdown file. Generate Python scripts for calculations if raw data is provided.
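
The calculations in step 3 could be scripted along the following lines. This is a sketch rather than a definitive implementation: `analyze_conversion` and its result keys are hypothetical names, the p-value comes from a two-tailed two-proportion z-test with a pooled standard error, and the confidence interval for the absolute difference uses the unpooled standard error (one common convention among several). It assumes both groups have nonzero traffic and at least one conversion in control.

```python
import math
from statistics import NormalDist


def analyze_conversion(conv_control, n_control, conv_variant, n_variant, alpha=0.05):
    """Conversion rates, relative lift, two-tailed z-test p-value, and CI for the difference."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    diff = p_v - p_c

    # Pooled standard error: used for the hypothesis test (H0: both rates are equal).
    p_pool = (conv_control + conv_variant) / (n_control + n_variant)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_variant))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval of the difference.
    se_unpooled = math.sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_lift_pct": diff / p_c * 100,
        "p_value": p_value,
        "ci_low": diff - z_crit * se_unpooled,
        "ci_high": diff + z_crit * se_unpooled,
        "significant": p_value < alpha,
    }
```

Calling `analyze_conversion(500, 5000, 570, 5000)` returns the values needed to fill the primary-metric row of the summary table. Whether a significant lift is also practically meaningful remains a business judgment that the script does not make.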

---

### Further Reading

- [A/B Testing 101 + Examples](https://www.productcompass.pm/p/ab-testing-101-for-pms)
- [Testing Product Ideas: The Ultimate Validation Experiments Library](https://www.productcompass.pm/p/the-ultimate-experiments-library)
- [Are You Tracking the Right Metrics?](https://www.productcompass.pm/p/are-you-tracking-the-right-metrics)

INPUTS

$ARGUMENTS (required): description of the A/B test to analyze

REQUIRED CONTEXT

A/B test results or $ARGUMENTS describing the experiment

OPTIONAL CONTEXT

data files (CSV, Excel, analytics exports)
raw data for statistical calculations

TOOLS REQUIRED

code_execution

ROLES & RULES

Role assignments

You are analyzing A/B test results for **$ARGUMENTS**.

EXPECTED OUTPUT

Format

markdown

Schema

markdown_sections · Hypothesis, Duration, Sample, Metric table, Recommendation, Reasoning, Next steps

Constraints

use provided analysis summary template
include hypothesis, duration, sample, metrics table, recommendation, reasoning and next steps
generate Python scripts for calculations if raw data provided

SUCCESS CRITERIA

Understand the experiment
Validate the test setup
Calculate statistical significance
Check guardrail metrics
Interpret results
Provide the analysis summary

EXAMPLES

Includes one outcome-to-recommendation table and one full markdown analysis summary template.

CAVEATS

Dependencies

Requires data files (CSV, Excel, or analytics exports) or raw data

Missing context

Exact format or schema expected for raw data files

QUALITY

OVERALL: 0.85
CLARITY: 0.90
SPECIFICITY: 0.85
REUSABILITY: 0.80
COMPLETENESS: 0.85

IMPROVEMENT SUGGESTIONS

Replace the placeholder $ARGUMENTS with an explicit input variable such as {{experiment_description}}
