
agent analysis skill risk: low

ML Experiment Results Analyzer

Analyzes ML experiment results from JSON/CSV files by locating outputs, building comparison tables with independent/dependent variables and deltas, performing statistical analysis,…

SKILL 1 file

SKILL.md
---
name: auto-claude-code-research-in-sleep-analyze-results
description: "Analyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says \"analyze results\", \"compare\", or needs to interpret experimental data."
---
# Analyze Experiment Results

Analyze: $ARGUMENTS

## Workflow

### Step 1: Locate Results
Find all relevant JSON/CSV result files:
- Check `figures/`, `results/`, or project-specific output directories
- Parse JSON and CSV results into structured data
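
A minimal sketch of this step, assuming results live under `results/` or `figures/` relative to the project root. The helper names `find_result_files` and `load_records` and the `DEFAULT_DIRS` tuple are hypothetical, not part of the skill; adjust the directory list to the project's own layout.

```python
import csv
import json
from pathlib import Path

# Assumed search roots; the skill names figures/ and results/ only as examples.
DEFAULT_DIRS = ("results", "figures")


def find_result_files(root=".", dirs=DEFAULT_DIRS):
    """Return sorted JSON/CSV paths found recursively under the given directories."""
    found = []
    for d in dirs:
        base = Path(root) / d
        if not base.is_dir():
            continue  # skip directories that do not exist in this project
        for p in base.rglob("*"):
            if p.is_file() and p.suffix.lower() in {".json", ".csv"}:
                found.append(p)
    return sorted(found)


def load_records(path):
    """Parse one result file into a list of dict records."""
    path = Path(path)
    if path.suffix.lower() == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # A JSON file may hold one run (object) or many runs (array).
        return data if isinstance(data, list) else [data]
    with path.open(newline="", encoding="utf-8") as f:
        # CSV values come back as strings; convert numeric columns downstream.
        return list(csv.DictReader(f))
```

Taking the directory list as a parameter keeps the search configurable rather than tied to the two example folders.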

### Step 2: Build Comparison Table
Organize results by:
- **Independent variables**: model type, hyperparameters, data config
- **Dependent variables**: primary metric (e.g., perplexity, accuracy, loss), secondary metrics
- **Delta vs baseline**: always compute relative improvement over the baseline, and state which run is the baseline

### Step 3: Statistical Analysis
- If multiple seeds: report mean +/- std, check reproducibility
- If sweeping a parameter: identify trends (monotonic, U-shaped, plateau)
- Flag outliers or suspicious results
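
The checks above could be sketched with the standard library as follows. The skill does not prescribe any particular method, so everything here is an assumption: sample standard deviation for seeds, a sign-of-differences heuristic for trend shape, and a modified z-score based on the median absolute deviation (with the commonly used 3.5 cutoff) for outliers. All function names are hypothetical.

```python
import statistics


def summarize_seeds(values):
    """Mean and sample standard deviation across seeds."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


def classify_trend(ys, tol=0.0):
    """Label a sweep by shape; ys must be ordered by the swept parameter."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    # Treat changes within tol as flat steps.
    signs = [0 if abs(d) <= tol else (1 if d > 0 else -1) for d in diffs]
    moves = [s for s in signs if s != 0]
    if not moves:
        return "flat"
    changes = sum(1 for a, b in zip(moves, moves[1:]) if a != b)
    if changes == 0:
        if signs[-1] == 0:
            return "plateau"  # moved in one direction, then levelled off
        return "monotonic increasing" if moves[0] > 0 else "monotonic decreasing"
    if changes == 1:
        return "U-shaped" if moves[0] < 0 else "inverted U"
    return "non-monotonic"


def flag_outliers(values, threshold=3.5):
    """Indices whose modified z-score (MAD-based) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, nothing to flag
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]
```

A flagged index is a prompt to inspect that run's logs and config, not proof that the result is wrong.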

### Step 4: Generate Insights
For each finding, structure as:
1. **Observation**: what the data shows (with numbers)
2. **Interpretation**: why this might be happening
3. **Implication**: what this means for the research question
4. **Next step**: what experiment would test the interpretation
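
The four-part structure can be captured in a small record type so that no field is left out. This is a sketch only; the `Finding` class and its `to_markdown` method are hypothetical names, and the placeholder strings stand in for real content.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    observation: str     # what the data shows, with numbers
    interpretation: str  # why this might be happening
    implication: str     # what it means for the research question
    next_step: str       # experiment that would test the interpretation

    def to_markdown(self, index):
        """Render the finding as one numbered Markdown item with sub-bullets."""
        return "\n".join([
            f"{index}. **Observation**: {self.observation}",
            f"   - **Interpretation**: {self.interpretation}",
            f"   - **Implication**: {self.implication}",
            f"   - **Next step**: {self.next_step}",
        ])


finding = Finding(
    observation="<metric values and deltas>",
    interpretation="<candidate explanation>",
    implication="<consequence for the research question>",
    next_step="<experiment to run next>",
)
print(finding.to_markdown(1))
```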

### Step 5: Update Documentation
If findings are significant:
- Propose updates to project notes or experiment reports
- Draft a concise finding statement (1-2 sentences)

## Output Format
Always include:
1. Raw data table
2. Key findings (numbered, concise)
3. Suggested next experiments (if any)

INPUTS

$ARGUMENTS REQUIRED

description of the experiment results to analyze

REQUIRED CONTEXT

  • $ARGUMENTS

TOOLS REQUIRED

  • file_search
  • code_execution

ROLES & RULES

  1. Find all relevant JSON/CSV result files
  2. Organize results by independent variables, dependent variables and delta vs baseline
  3. Always compute relative improvement
  4. Report mean +/- std and check reproducibility if multiple seeds
  5. Identify trends if sweeping a parameter, and flag outliers or suspicious results
  6. Structure each finding as Observation, Interpretation, Implication, Next step
  7. Propose updates to documentation if findings are significant
  8. Always include raw data table, key findings and suggested next experiments

EXPECTED OUTPUT

Format: structured_report
Schema: numbered_list · Raw data table, Key findings, Suggested next experiments
Constraints
  • always include raw data table
  • always include numbered key findings
  • include suggested next experiments, if any

SUCCESS CRITERIA

  • Locate and parse result files
  • Build comparison tables with deltas
  • Perform statistical analysis
  • Generate structured insights
  • Update documentation when appropriate

CAVEATS

Dependencies
  • $ARGUMENTS
  • result files in figures/, results/ or project-specific directories
Missing context
  • Target project root or experiment naming conventions
  • Preferred statistical libraries or output formats (e.g., markdown vs LaTeX)
Ambiguities
  • Does not specify exact project-specific output directories or how they are discovered.
  • "Delta vs baseline" does not define how the baseline is identified.

QUALITY

OVERALL: 0.79
CLARITY: 0.85
SPECIFICITY: 0.75
REUSABILITY: 0.80
COMPLETENESS: 0.78

IMPROVEMENT SUGGESTIONS

  • Add a configurable list of result directories as a prompt variable instead of hard-coded examples.
  • Specify how the baseline run is selected (e.g., first entry, lowest loss, or user-provided name).
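
A possible baseline-selection rule, sketched under the assumption that runs are dicts with a `name` key: prefer a user-provided name, fall back to the best value of a given metric, and otherwise take the first entry. The function `select_baseline` and its parameters are hypothetical and simply mirror the three options listed above.

```python
def select_baseline(runs, name=None, metric=None, lower_is_better=True):
    """Pick the baseline run: by name, else by best metric, else first entry."""
    if not runs:
        raise ValueError("no runs to choose a baseline from")
    if name is not None:
        for r in runs:
            if r.get("name") == name:
                return r
        raise KeyError(f"no run named {name!r}")
    if metric is not None:
        pick = min if lower_is_better else max
        return pick(runs, key=lambda r: r[metric])
    return runs[0]  # default: first entry in the list
```

Raising an error when a named baseline is missing is a deliberate choice here, since silently falling back would change every delta in the report.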

USAGE

Copy the prompt above and paste it into your AI of choice: Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections (here, $ARGUMENTS) with your own context, then ask for the output.
