analyst analysis skill risk: low
A/B Test Statistical Results Evaluator
The prompt instructs the model to analyze A/B test results by validating sample size and setup, calculating conversion rates, lifts, p-values, and confidence intervals, checking gu…
SKILL.md
---
name: ab-test-analysis
description: "Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant."
---
## A/B Test Analysis
Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
### Context
You are analyzing A/B test results for **$ARGUMENTS**.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
### Instructions
1. **Understand the experiment**:
- What was the hypothesis?
- What was changed (the variant)?
- What is the primary metric? Any guardrail metrics?
- How long did the test run?
- What is the traffic split?
2. **Validate the test setup**:
- **Sample size**: Is the sample large enough for the expected effect size?
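- Use the per-variant formula: n = (Z_(1−α/2) + Z_(1−β))² × 2 × p × (1 − p) / MDE², where p is the baseline conversion rate, MDE is the minimum detectable effect as an absolute difference, Z_(1−α/2) ≈ 1.96 for α = 0.05, and Z_(1−β) ≈ 0.84 for 80% power
- Flag the test as underpowered if the actual sample per variant is below n at 80% power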
- **Duration**: Did the test run for at least 1-2 full business cycles?
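- **Randomization**: Any evidence of sample ratio mismatch (SRM)? Compare the observed group sizes with the planned traffic split using a chi-squared goodness-of-fit test.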
- **Novelty/primacy effects**: Was there enough time to wash out initial behavior changes?
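A minimal sketch of the sample-size and SRM checks in Python, assuming a conversion-rate primary metric, a two-sided test, and the normal approximation. It uses only the standard library (`statistics.NormalDist`, Python 3.8+); the function names `required_sample_size` and `srm_p_value` are hypothetical helpers, not part of any library, and the example inputs are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(baseline_rate: float, mde_abs: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant n for a two-sided two-proportion test (normal approximation).

    n = (z_{1-alpha/2} + z_{power})^2 * 2 * p * (1 - p) / MDE^2,
    with p the baseline rate and MDE an absolute difference (0.01 = 1 point).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control: int, n_variant: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value (1 df) for sample ratio mismatch."""
    total = n_control + n_variant
    exp_c = total * expected_control_share
    exp_v = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_variant - exp_v) ** 2 / exp_v
    # With 1 df, chi2 is a squared standard normal: P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))


if __name__ == "__main__":
    # Illustrative inputs: 10% baseline, detect an absolute +1 point change.
    print("Users needed per variant:", required_sample_size(0.10, 0.01))
    # Illustrative counts for a planned 50/50 split.
    print(f"SRM p-value: {srm_p_value(50_500, 49_500):.5f}")
```

Many teams treat an SRM p-value below a strict threshold (for example 0.001) as a sign of a randomization or logging problem and investigate before interpreting any metric.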
3. **Calculate statistical significance**:
- **Conversion rate** for control and variant
- **Relative lift**: (variant - control) / control × 100
- **p-value**: From a two-tailed two-proportion z-test or a chi-squared test of independence
- **Confidence interval**: 95% CI for the difference
- **Statistical significance**: Is p < 0.05?
- **Practical significance**: Is the lift meaningful for the business?
If the user provides raw data, generate and run a Python script to calculate these.
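The calculations above can be sketched with the Python standard library, assuming binary conversion data already summarized as conversions and users per arm. The p-value uses the pooled two-proportion z-test (equivalent to a 2×2 chi-squared test without continuity correction), and the CI uses the unpooled Wald interval for the absolute difference (variant minus control). `analyze_conversions` and `ABResult` are hypothetical names, the counts in the example are illustrative, and the sketch assumes each arm has both converters and non-converters; for small samples or rates near 0 or 1, an exact or Wilson-style interval may be more appropriate.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class ABResult:
    rate_control: float
    rate_variant: float
    relative_lift_pct: float
    p_value: float
    ci_low: float   # CI bounds for the absolute difference (variant - control)
    ci_high: float


def analyze_conversions(conv_c: int, n_c: int, conv_v: int, n_v: int,
                        alpha: float = 0.05) -> ABResult:
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c
    # Pooled standard error under H0 (no difference) for the two-tailed z-test.
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled (Wald) standard error for the confidence interval.
    se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return ABResult(
        rate_control=p_c,
        rate_variant=p_v,
        relative_lift_pct=diff / p_c * 100,
        p_value=p_value,
        ci_low=diff - z_crit * se_unpooled,
        ci_high=diff + z_crit * se_unpooled,
    )


if __name__ == "__main__":
    # Illustrative counts only, not real results.
    r = analyze_conversions(conv_c=1_000, n_c=10_000, conv_v=1_100, n_v=10_000)
    print(f"Control {r.rate_control:.2%} vs variant {r.rate_variant:.2%}")
    print(f"Relative lift: {r.relative_lift_pct:+.1f}%")
    print(f"p-value: {r.p_value:.4f}")
    print(f"95% CI for difference: [{r.ci_low:+.4f}, {r.ci_high:+.4f}]")
```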
4. **Check guardrail metrics**:
- Did any guardrail metrics (revenue, engagement, page load time) degrade?
- A winning primary metric with degraded guardrails may not be a true win
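One way to operationalize this check for rate-type guardrails (for example an error rate or a checkout-completion rate) is to test each guardrail and flag only statistically significant movement in the harmful direction. This is a sketch under stated assumptions: it handles binary event counts per arm only, so continuous guardrails such as revenue or page load time would need a different test (for example Welch's t-test or a bootstrap). The function name and example metrics are hypothetical. A non-significant result does not prove a guardrail is unharmed, since an underpowered test can miss real degradation.

```python
from math import sqrt
from statistics import NormalDist


def guardrail_degraded(events_c: int, n_c: int, events_v: int, n_v: int,
                       higher_is_better: bool, alpha: float = 0.05) -> bool:
    """Return True if the variant is significantly worse on a rate-type guardrail."""
    rate_c = events_c / n_c
    rate_v = events_v / n_v
    # Pooled two-proportion z-test, two-sided.
    pooled = (events_c + events_v) / (n_c + n_v)
    se = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    z = (rate_v - rate_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # "Worse" depends on the metric's direction.
    moved_badly = (rate_v < rate_c) if higher_is_better else (rate_v > rate_c)
    return moved_badly and p_value < alpha


if __name__ == "__main__":
    # Hypothetical guardrails: (events_c, n_c, events_v, n_v, higher_is_better)
    guardrails = {
        "error_rate": (200, 10_000, 300, 10_000, False),
        "checkout_completion": (4_000, 10_000, 3_980, 10_000, True),
    }
    for name, (ec, nc, ev, nv, hib) in guardrails.items():
        status = "DEGRADED" if guardrail_degraded(ec, nc, ev, nv, hib) else "ok"
        print(f"{name}: {status}")
```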
5. **Interpret results**:
| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | **Ship it** — roll out to 100% |
| Significant positive lift, guardrail concerns | **Investigate** — understand trade-offs before shipping |
| Not significant, positive trend | **Extend the test** — need more data or larger effect |
| Not significant, flat | **Stop the test** — no meaningful difference detected |
| Significant negative lift | **Don't ship** — revert to control, analyze why |
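The decision table can be encoded as a small function, sketched here in Python. The table does not define where a "positive trend" ends and "flat" begins, so `trend_threshold_pct` is a hypothetical cutoff on the observed relative lift that should be set for your own business; mapping non-significant negative results to "Stop the test" is an additional assumption. The function name `recommend` is hypothetical.

```python
def recommend(p_value: float, relative_lift_pct: float, guardrails_ok: bool,
              alpha: float = 0.05, trend_threshold_pct: float = 1.0) -> str:
    """Map a test outcome to the decision table's recommendation.

    trend_threshold_pct is a hypothetical cutoff separating a "positive trend"
    from a "flat" result when the lift is not statistically significant.
    """
    significant = p_value < alpha
    if significant and relative_lift_pct < 0:
        return "Don't ship"  # Significant negative lift: revert to control.
    if significant and relative_lift_pct > 0:
        return "Ship it" if guardrails_ok else "Investigate"
    if relative_lift_pct >= trend_threshold_pct:
        return "Extend the test"  # Not significant, positive trend.
    # Not significant and flat (or negative, by assumption).
    return "Stop the test"


if __name__ == "__main__":
    print(recommend(p_value=0.02, relative_lift_pct=4.0, guardrails_ok=True))
```

Extending a test and re-checking significance repeatedly inflates the false-positive rate unless the analysis plan accounts for it (for example with a sequential testing method), so an extension is safest when it runs to a precomputed sample size.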
6. **Provide the analysis summary**:
```
## A/B Test Results: [Test Name]
**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]
| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |
**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]
```
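If the metrics are already computed, the summary template can be filled programmatically. This sketch assumes the rates, lift, and p-value come from an earlier calculation; `format_metric_row` and `render_summary` are hypothetical helper names, the example values are illustrative, and blank lines are added between sections so the table renders as a Markdown table.

```python
from __future__ import annotations


def format_metric_row(name: str, control_rate: float, variant_rate: float,
                      lift_pct: float, p_value: float, alpha: float = 0.05) -> str:
    """Format one row of the metrics table; rates are fractions (0.10 = 10%)."""
    significant = "Yes" if p_value < alpha else "No"
    return (f"| {name} | {control_rate:.2%} | {variant_rate:.2%} | "
            f"{lift_pct:+.1f}% | {p_value:.3f} | {significant} |")


def render_summary(test_name: str, hypothesis: str, days: int,
                   n_control: int, n_variant: int, rows: list[str],
                   recommendation: str, reasoning: str, next_steps: str) -> str:
    """Fill the analysis summary template with computed values."""
    lines = [
        f"## A/B Test Results: {test_name}",
        "",
        f"**Hypothesis**: {hypothesis}",
        "",
        f"**Duration**: {days} days | **Sample**: {n_control:,} control / {n_variant:,} variant",
        "",
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
        *rows,
        "",
        f"**Recommendation**: {recommendation}",
        "",
        f"**Reasoning**: {reasoning}",
        "",
        f"**Next steps**: {next_steps}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only.
    row = format_metric_row("Checkout conversion", 0.10, 0.11, 10.0, 0.021)
    report = render_summary("New checkout button", "A larger button increases checkout conversion",
                            14, 10_000, 10_000, [row], "Ship",
                            "Significant lift, no guardrail degradation", "Roll out to 100%")
    print(report)
    # To save as markdown: pathlib.Path("ab_test_results.md").write_text(report)
```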
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
---
### Further Reading
- [A/B Testing 101 + Examples](https://www.productcompass.pm/p/ab-testing-101-for-pms)
- [Testing Product Ideas: The Ultimate Validation Experiments Library](https://www.productcompass.pm/p/the-ultimate-experiments-library)
- [Are You Tracking the Right Metrics?](https://www.productcompass.pm/p/are-you-tracking-the-right-metrics)
INPUTS
- $ARGUMENTS (required): description of the A/B test to analyze
REQUIRED CONTEXT
- A/B test results or $ARGUMENTS describing the experiment
OPTIONAL CONTEXT
- data files (CSV, Excel, analytics exports)
- raw data for statistical calculations
TOOLS REQUIRED
- code_execution
ROLES & RULES
Role assignments
- You are analyzing A/B test results for **$ARGUMENTS**.
EXPECTED OUTPUT
- Format: markdown
- Schema: markdown_sections · Hypothesis, Duration, Sample, Metric table, Recommendation, Reasoning, Next steps
- Constraints:
  - use provided analysis summary template
  - include hypothesis, duration, sample, metrics table, recommendation, reasoning and next steps
  - generate Python scripts for calculations if raw data provided
SUCCESS CRITERIA
- Understand the experiment
- Validate the test setup
- Calculate statistical significance
- Check guardrail metrics
- Interpret results
- Provide the analysis summary
EXAMPLES
Includes one outcome-to-recommendation table and one full markdown analysis summary template.
CAVEATS
- Dependencies: requires data files (CSV, Excel, or analytics exports) or raw data
- Missing context: exact format or schema expected for raw data files
QUALITY
- OVERALL: 0.85
- CLARITY: 0.90
- SPECIFICITY: 0.85
- REUSABILITY: 0.80
- COMPLETENESS: 0.85
IMPROVEMENT SUGGESTIONS
- Replace the placeholder $ARGUMENTS with an explicit input variable such as {{experiment_description}}
USAGE
Copy the prompt above and paste it into your AI of choice — Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections with your own context, then ask for the output.