ML Ablation Study Planner
Designs ablation studies from a reviewer's perspective to isolate novel components, test hyperparameters, and answer expected questions, then parses results into tables and has CC…
- External action: low
SKILL.md
---
name: ablation-planner
description: "Use when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs ablations from a reviewer's perspective, CC reviews feasibility and implements."
---
# Ablation Planner
Systematically design ablation studies that answer the questions reviewers will ask. Codex leads the design (reviewer perspective), CC reviews feasibility and implements.
## Context: $ARGUMENTS
## When to Use
- Main results pass `/result-to-claim` with claim_supported = yes or partial
- User explicitly requests ablation planning
- `/auto-review-loop` reviewer identifies missing ablations
## Workflow
### Step 1: Prepare Context
CC reads available project files to build the full picture:
- Method description and components (from docs/research_contract.md or project CLAUDE.md)
- Current experiment results (from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&B)
- Confirmed and intended claims (from result-to-claim output or project notes)
- Available compute resources (from CLAUDE.md server config, if present)
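The lookup above could be sketched as follows. This is only an illustration: the `gather_context` helper and the `CONTEXT_SOURCES` mapping are hypothetical names, the paths are assumed to sit under a single project root, and claims are left out because the skill does not name a file for them.

```python
from pathlib import Path

# Hypothetical mapping from context type to candidate files, in lookup order.
# The file names come from the list above; the mapping itself is an assumption.
CONTEXT_SOURCES = {
    "method": ["docs/research_contract.md", "CLAUDE.md"],
    "results": ["EXPERIMENT_LOG.md", "EXPERIMENT_TRACKER.md"],
    "compute": ["CLAUDE.md"],
}


def gather_context(root: str) -> dict[str, str]:
    """Return the text of the first existing file for each context type."""
    found = {}
    for kind, candidates in CONTEXT_SOURCES.items():
        for rel in candidates:
            path = Path(root) / rel
            if path.is_file():
                found[kind] = path.read_text(encoding="utf-8")
                break  # stop at the first candidate that exists
    return found
```

Context types with no matching file are simply absent from the result, so the caller can tell which inputs still need to be supplied by hand.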
### Step 2: Codex Designs Ablations
```
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are a rigorous ML reviewer planning ablation studies.
    Given this method and results, design ablations that:
    1. Isolate the contribution of each novel component
    2. Answer questions reviewers will definitely ask
    3. Test sensitivity to key hyperparameters
    4. Compare against natural alternative design choices

    Method: [description from project files]
    Components: [list of removable/replaceable components]
    Current results: [key metrics from experiments]
    Claims: [what we claim and current evidence]

    For each ablation, specify:
    - name: what to change (e.g., "remove module X", "replace Y with Z")
    - what_it_tests: the specific question this answers
    - expected_if_component_matters: what we predict if the component is important
    - priority: 1 (must-run) to 5 (nice-to-have)

    Also provide:
    - coverage_assessment: what reviewer questions these ablations answer
    - unnecessary_ablations: experiments that seem useful but won't add insight
    - suggested_order: run order optimized for maximum early information
    - estimated_compute: total GPU-hours estimate
```
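One way to fill the four bracketed context fields of the prompt above without manual editing is a small template. This is a sketch under assumptions: `PROMPT_CONTEXT` and `fill_context` are hypothetical names, and the `$`-style variables are a substitute for the bracketed placeholders, not something the skill defines.

```python
from string import Template

# Hypothetical template: the bracketed placeholders from the prompt,
# rewritten as $-style variables.
PROMPT_CONTEXT = Template(
    "Method: $method\n"
    "Components: $components\n"
    "Current results: $results\n"
    "Claims: $claims"
)


def fill_context(method, components, results, claims):
    """Render the four context lines of the prompt."""
    return PROMPT_CONTEXT.substitute(
        method=method,
        components=", ".join(components),  # list of component names
        results=results,
        claims=claims,
    )
```

The rendered text would then replace the `Method:` through `Claims:` lines before the prompt is sent.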
### Step 3: Parse Ablation Plan
Normalize the Codex response into a structured format:
```markdown
## Ablation Plan
### Component Ablations (highest priority)
| # | Name | What It Tests | Expected If Matters | Priority |
|---|------|---------------|---------------------|----------|
| 1 | remove module X | contribution of X | performance drops on metric Y | 1 |
| 2 | replace X with simpler Z | value of learned vs fixed | drops, especially on dataset A | 2 |
### Hyperparameter Sensitivity
| # | Parameter | Values to Test | What It Tests | Priority |
|---|-----------|---------------|---------------|----------|
| 3 | lambda | [0.01, 0.1, 1.0] | sensitivity to regularization | 3 |
### Design Choice Comparisons
| # | Name | What It Tests | Priority |
|---|------|---------------|----------|
| 4 | joint vs separate matching | whether joint adds value | 4 |
### Coverage Assessment
[What reviewer questions these ablations answer]
### Unnecessary Ablations
[Experiments that seem useful but won't add insight — skip these]
### Run Order
[Optimized for maximum early information]
### Estimated Compute
[Total GPU-hours]
```
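As an illustration of the normalization, the sketch below renders the Component Ablations table from a list of dicts. It assumes the Codex response has already been parsed into dicts keyed by the field names from Step 2; `render_component_table` is a hypothetical helper, and the parsing itself is not shown.

```python
def render_component_table(ablations):
    """Render ablation dicts as the Component Ablations markdown table."""
    lines = [
        "| # | Name | What It Tests | Expected If Matters | Priority |",
        "|---|------|---------------|---------------------|----------|",
    ]
    # Lowest priority number first, since 1 means must-run.
    ordered = sorted(ablations, key=lambda a: a["priority"])
    for i, a in enumerate(ordered, start=1):
        lines.append(
            f"| {i} | {a['name']} | {a['what_it_tests']} "
            f"| {a['expected_if_component_matters']} | {a['priority']} |"
        )
    return "\n".join(lines)
```

The other two tables follow the same pattern with their own column sets.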
### Step 4: CC Reviews Feasibility
Before running anything, CC checks:
- Compute budget: can we afford all ablations with available GPUs?
- Code changes: which ablations need code modifications vs config-only changes?
- Dependencies: which ablations can run in parallel?
- Cuts: if budget is tight, propose removing lower-priority ablations and ask Codex to confirm
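The budget check can be sketched as a greedy pass over the ablations in priority order. This is an assumption-laden illustration: the prompt only asks Codex for a total compute estimate, so the per-ablation `gpu_hours` field and the `propose_cuts` helper are hypothetical.

```python
def propose_cuts(ablations, budget_gpu_hours):
    """Split ablations into (keep, cut) name lists under a GPU-hour budget.

    Each ablation is a dict with 'name', 'priority' (1 = must-run) and a
    hypothetical per-ablation 'gpu_hours' estimate.
    """
    keep, cut, used = [], [], 0.0
    for a in sorted(ablations, key=lambda a: a["priority"]):
        if used + a["gpu_hours"] <= budget_gpu_hours:
            keep.append(a["name"])
            used += a["gpu_hours"]
        else:
            cut.append(a["name"])  # proposed cut, still needs Codex to confirm
    return keep, cut
```

The `cut` list is only a proposal to send back to Codex; nothing in it should be dropped without confirmation.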
### Step 5: Implement and Run
1. Create configs/scripts for each ablation (config-only changes first)
2. Smoke test each ablation before full run
3. Run in suggested order, using descriptive names (e.g., `ablation-no-module-X`)
4. Track results in EXPERIMENT_LOG.md
5. After all ablations complete, update findings.md with insights
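A small sketch of the naming and logging steps above. The `run_name` and `log_line` helpers are hypothetical, and the bullet layout for EXPERIMENT_LOG.md is an assumption, since the skill does not specify the log format.

```python
import re


def run_name(ablation_name: str) -> str:
    """Turn e.g. 'no module X' into 'ablation-no-module-X'."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ablation_name).strip("-")
    return f"ablation-{slug}"


def log_line(ablation_name: str, metric: str, value: float) -> str:
    """Format one markdown bullet for EXPERIMENT_LOG.md (layout is assumed)."""
    return f"- `{run_name(ablation_name)}`: {metric} = {value:.4f}"
```

Deriving the run name from the ablation name keeps the plan, the run, and the log entry traceable to one another.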
## Rules
- **Codex leads the design.** CC does not pre-filter or bias the ablation list before Codex sees it. Codex thinks like a reviewer; CC thinks like an engineer.
- Every ablation must have a clear `what_it_tests` and `expected_if_component_matters`. No "just try it" experiments.
- Config-only ablations take priority over those needing code changes (faster, less error-prone).
- If total compute exceeds budget, CC proposes cuts and asks Codex to re-prioritize; CC must not silently drop ablations.
- Component ablations (remove/replace) take priority over hyperparameter sweeps.
- Do not generate ablations for components identical to the baseline (no-op ablations).
- Record all ablation results in EXPERIMENT_LOG.md, including negative results (a component removal that has no effect is itself an important finding).
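Some of these rules can be checked mechanically before anything runs. The sketch below is one possible reading, not the skill's own implementation: `validate` and `run_order_key` are hypothetical helpers, the `kind` and `needs_code_change` fields are assumed additions to each ablation dict, and the way the three ordering rules are combined is an assumption. Codex's `suggested_order` remains the authoritative run order.

```python
REQUIRED_FIELDS = ("name", "what_it_tests", "expected_if_component_matters")


def validate(ablation: dict) -> list[str]:
    """Return a list of rule violations; empty means the entry is acceptable."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not ablation.get(f)]
    if ablation.get("priority") not in range(1, 6):
        problems.append("priority must be from 1 to 5")
    return problems


def run_order_key(ablation: dict) -> tuple:
    """Sort key: component ablations first, then config-only, then priority."""
    return (
        ablation.get("kind") != "component",      # hypothetical 'kind' field
        bool(ablation.get("needs_code_change")),  # hypothetical flag
        ablation.get("priority", 5),
    )
```

Entries that fail `validate` would go back to Codex for a clearer `what_it_tests` or prediction rather than being run as "just try it" experiments.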
INPUTS
- $ARGUMENTS (required): context for the ablation planning task
REQUIRED CONTEXT
- method description and components
- current experiment results
- confirmed and intended claims
OPTIONAL CONTEXT
- available compute resources
TOOLS REQUIRED
- codex
ROLES & RULES
Role assignments
- You are a rigorous ML reviewer planning ablation studies.
- Codex leads the design. CC does not pre-filter or bias the ablation list before Codex sees it.
Rules
- Every ablation must have a clear `what_it_tests` and `expected_if_component_matters`.
- Config-only ablations take priority over those needing code changes.
- If total compute exceeds budget, CC proposes cuts and asks Codex to re-prioritize.
- Component ablations take priority over hyperparameter sweeps.
- Do not generate ablations for components identical to the baseline.
- Record all ablation results in EXPERIMENT_LOG.md, including negative results.
EXPECTED OUTPUT
- Format: markdown
- Schema: markdown_sections (Ablation Plan, Component Ablations, Hyperparameter Sensitivity, Design Choice Comparisons, Coverage Assessment, Unnecessary Ablations, Run Order, Estimated Compute)
- Constraints
  - use the specified table structure for component ablations, hyperparameter sensitivity, and design comparisons
  - include coverage_assessment, unnecessary_ablations, run_order, and estimated_compute sections
SUCCESS CRITERIA
- Isolate the contribution of each novel component
- Answer questions reviewers will definitely ask
- Test sensitivity to key hyperparameters
- Compare against natural alternative design choices
EXAMPLES
Includes one detailed example of the final normalized ablation plan in markdown with multiple tables and sections.
CAVEATS
- Dependencies
  - Requires available project files (research_contract.md, CLAUDE.md, EXPERIMENT_LOG.md)
  - Requires result-to-claim output or project notes
  - Requires current experiment results and compute resources info
- Missing context
  - Exact schema or example content for $ARGUMENTS
  - How to access or format the Codex tool call in different environments
- Ambiguities
  - The `Context: $ARGUMENTS` placeholder has no specified format or content requirements.
  - References to specific files (docs/research_contract.md, EXPERIMENT_LOG.md, CLAUDE.md) assume a fixed project structure without defining alternatives.
QUALITY
- Overall: 0.85
- Clarity: 0.85
- Specificity: 0.90
- Reusability: 0.80
- Completeness: 0.85
IMPROVEMENT SUGGESTIONS
- Replace the inline Codex prompt placeholders with explicit variables (e.g., {{method_description}}, {{components_list}}) so the template can be instantiated without manual editing.
- Add a short 'Input contract' section listing the minimum files or data that must exist before the workflow runs.