agent analysis skill risk: low

ML Experiment Results Analyzer

Analyzes ML experiment results from JSON/CSV files by locating outputs, building comparison tables with independent/dependent variables and deltas, performing statistical analysis,…

---
name: auto-claude-code-research-in-sleep-analyze-results
description: "Analyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says \"analyze results\", \"compare\", or needs to interpret experimental data."
---
# Analyze Experiment Results

Analyze: $ARGUMENTS

## Workflow

### Step 1: Locate Results
Find all relevant JSON/CSV result files:
- Check `figures/`, `results/`, or project-specific output directories
- Parse the JSON and CSV results into structured data
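
A minimal Python sketch of this step, using only the standard library. The function names `find_result_files` and `load_records` and the `DEFAULT_DIRS` tuple are hypothetical, and treating a top-level JSON object as a single record is an assumption; adapt both to the project's actual layout.

```python
import csv
import json
from pathlib import Path

# Assumed default directories; the skill names figures/ and results/ as examples.
DEFAULT_DIRS = ("results", "figures")


def find_result_files(root=".", dirs=DEFAULT_DIRS):
    """Return sorted JSON/CSV paths found recursively under the given directories."""
    found = []
    for d in dirs:
        base = Path(root) / d
        if not base.is_dir():
            continue  # skip directories that do not exist in this project
        for path in base.rglob("*"):
            if path.is_file() and path.suffix.lower() in {".json", ".csv"}:
                found.append(path)
    return sorted(found)


def load_records(path):
    """Parse one result file into a list of dict records."""
    path = Path(path)
    if path.suffix.lower() == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: a top-level object is one record, a top-level list is many.
        return data if isinstance(data, list) else [data]
    with path.open(newline="", encoding="utf-8") as f:
        # csv.DictReader yields strings; convert metrics to float before analysis.
        return list(csv.DictReader(f))
```

Project-specific output directories can be passed through `dirs`, for example `find_result_files(".", dirs=("results", "outputs"))`.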

### Step 2: Build Comparison Table
Organize results by:
- **Independent variables**: model type, hyperparameters, data config
- **Dependent variables**: primary metric (e.g., perplexity, accuracy, loss), secondary metrics
- **Delta vs baseline**: always compute the relative improvement over the baseline, and state which run is treated as the baseline
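
One way to compute the delta columns, sketched in Python. The skill does not say how the baseline is identified, so this sketch assumes the caller names it explicitly; `add_deltas`, `name_key`, and `higher_is_better` are hypothetical names introduced for illustration.

```python
def add_deltas(rows, metric, baseline_name, name_key="model", higher_is_better=True):
    """Return copies of rows with absolute and relative deltas vs the baseline row.

    rows: list of dicts, one per run (or per configuration, already averaged).
    higher_is_better: set to False for metrics such as loss or perplexity, so
    that a positive relative improvement always means "better than baseline".
    """
    baseline = next((r for r in rows if r[name_key] == baseline_name), None)
    if baseline is None:
        raise ValueError(f"baseline {baseline_name!r} not found")
    base = float(baseline[metric])

    out = []
    for r in rows:
        value = float(r[metric])
        delta = value - base
        # Relative change in percent; undefined when the baseline metric is 0.
        rel = delta / abs(base) * 100 if base != 0 else float("nan")
        if not higher_is_better:
            rel = -rel
        out.append({**r, "delta": delta, "rel_improvement_pct": rel})
    return out
```

Passing `higher_is_better=False` for loss-like metrics keeps the sign convention consistent across the table: a run with lower loss than the baseline gets a negative `delta` and a positive `rel_improvement_pct`.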

### Step 3: Statistical Analysis
- If multiple seeds: report mean +/- std, check reproducibility
- If sweeping a parameter: identify trends (monotonic, U-shaped, plateau)
- Flag outliers or suspicious results
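
The checks above could be sketched as follows with Python's `statistics` module. The thresholds are assumptions, not part of the skill: `k=2.0` standard deviations for outliers and `tol` for treating small differences as flat. The function names are hypothetical, and `classify_trend` only labels a U-shape with a minimum in the interior of the sweep.

```python
import statistics


def summarize_seeds(values):
    """Return (mean, sample std) for one configuration run with several seeds."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


def flag_outliers(values, k=2.0):
    """Return indices of values more than k sample stds away from the mean."""
    if len(values) < 3:
        return []  # too few points to judge
    mean, std = summarize_seeds(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def classify_trend(ys, tol=0.0):
    """Label a sweep whose metric values ys are ordered by increasing parameter."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "plateau"
    if all(d >= -tol for d in diffs):
        return "monotonic increasing"
    if all(d <= tol for d in diffs):
        return "monotonic decreasing"
    lo = ys.index(min(ys))
    # U-shaped: non-increasing down to an interior minimum, non-decreasing after.
    if 0 < lo < len(ys) - 1 and all(d <= tol for d in diffs[:lo]) and all(
        d >= -tol for d in diffs[lo:]
    ):
        return "U-shaped"
    return "mixed"
```

With only a handful of seeds, a standard-deviation rule is a coarse filter; treat flagged runs as candidates for manual inspection rather than as confirmed errors.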

### Step 4: Generate Insights
For each finding, structure as:
1. **Observation**: what the data shows (with numbers)
2. **Interpretation**: why this might be happening
3. **Implication**: what this means for the research question
4. **Next step**: what experiment would test the interpretation

### Step 5: Update Documentation
If findings are significant:
- Propose updates to project notes or experiment reports
- Draft a concise finding statement (1-2 sentences)

## Output Format
Always include:
1. Raw data table
2. Key findings (numbered, concise)
3. Suggested next experiments (if any)
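
A minimal sketch of assembling these three parts into one report. Markdown output is an assumption, since the skill does not fix an output format, and `render_report` is a hypothetical helper. It expects every row to share the keys of the first row.

```python
def render_report(rows, findings, next_experiments=()):
    """Render a raw data table, numbered findings, and optional next experiments."""
    headers = list(rows[0].keys())
    lines = [
        "## Raw data",
        "",
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for r in rows:
        lines.append("| " + " | ".join(str(r[h]) for h in headers) + " |")

    lines += ["", "## Key findings", ""]
    lines += [f"{i}. {text}" for i, text in enumerate(findings, 1)]

    # The last section is optional, matching "(if any)" in the list above.
    if next_experiments:
        lines += ["", "## Suggested next experiments", ""]
        lines += [f"- {e}" for e in next_experiments]
    return "\n".join(lines)
```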

INPUTS

$ARGUMENTS (required): description of the experiment results to analyze

REQUIRED CONTEXT

$ARGUMENTS

TOOLS REQUIRED

file_search
code_execution

ROLES & RULES

Find all relevant JSON/CSV result files
Organize results by independent variables, dependent variables and delta vs baseline
Always compute relative improvement
Report mean +/- std and check reproducibility if multiple seeds
Identify trends (monotonic, U-shaped, plateau) if sweeping a parameter
Flag outliers or suspicious results
Structure each finding as Observation, Interpretation, Implication, Next step
Propose updates to documentation if findings are significant
Always include raw data table, key findings and suggested next experiments

EXPECTED OUTPUT

Format

structured_report

Schema

numbered_list · Raw data table, Key findings, Suggested next experiments

Constraints

always include raw data table
always include numbered key findings
always include suggested next experiments if any

SUCCESS CRITERIA

Locate and parse result files
Build comparison tables with deltas
Perform statistical analysis
Generate structured insights
Update documentation when appropriate

CAVEATS

Dependencies

$ARGUMENTS
result files in figures/, results/ or project-specific directories

Missing context

Target project root or experiment naming conventions
Preferred statistical libraries or output formats (e.g., markdown vs LaTeX)

Ambiguities

Does not specify exact project-specific output directories or how they are discovered.
"Delta vs baseline" does not define how the baseline is identified.

QUALITY

OVERALL: 0.79
CLARITY: 0.85
SPECIFICITY: 0.75
REUSABILITY: 0.80
COMPLETENESS: 0.78

IMPROVEMENT SUGGESTIONS

Add a configurable list of result directories as a prompt variable instead of hard-coded examples.
Specify how the baseline run is selected (e.g., first entry, lowest loss, or user-provided name).
