
agent research skill risk: low

Codex ML Research Review Workflow

The prompt directs an agent to compile research context then conduct a multi-round critical review of ML work via a secondary Codex agent using spawn_agent and send_input calls wit…

  • External action: medium

SKILL 1 file

SKILL.md
---
name: auto-claude-code-research-in-sleep-research-review-36a6dbd2
description: "Get a deep critical review of research from GPT using a secondary Codex agent. Use when user says \"review my research\", \"help me review\", \"get external review\", or wants critical feedback on research ideas, papers, or experimental results."
---
# Research Review via a secondary Codex agent (xhigh reasoning)

Get a multi-round critical review of research work from an external LLM with maximum reasoning depth.

## Constants

- REVIEWER_MODEL = `gpt-5.5` — Model used via a secondary Codex agent. Must be an OpenAI model (e.g., `gpt-5.5`, `o3`, `gpt-4o`)
- **REVIEWER_BACKEND = `codex`** — Default: Codex xhigh reviewer. Use `--reviewer: oracle-pro` only when explicitly requested; if Oracle is unavailable, warn and fall back to Codex xhigh.
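
The fallback rule for `REVIEWER_BACKEND` can be sketched in Python as follows. This is only an illustration under assumptions: the function name `choose_reviewer_backend` and the `oracle_available` flag are hypothetical, and how availability is actually detected is not specified here.

```python
import warnings
from typing import Optional

DEFAULT_BACKEND = "codex"  # default taken from the Constants list above


def choose_reviewer_backend(requested: Optional[str], oracle_available: bool) -> str:
    """Pick the reviewer backend (hypothetical helper, not part of any real API)."""
    if requested == "oracle-pro":
        if oracle_available:
            return "oracle-pro"
        # Oracle was explicitly requested but cannot be used: warn, then fall back.
        warnings.warn("oracle-pro is unavailable; falling back to Codex xhigh.")
    return DEFAULT_BACKEND
```

Keeping the choice in one place means the warning is emitted exactly once per request, and every other path resolves to the Codex default.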

## Context: $ARGUMENTS

## Prerequisites

- Use `spawn_agent` and `send_input` when the user has explicitly allowed delegation or subagents.
- If delegation is not allowed, run the same review loop locally and preserve the same deliverable structure.

## Workflow

### Step 1: Gather Research Context
Before calling the external reviewer, compile a comprehensive briefing:
1. Read project narrative documents (e.g., STORY.md, README.md, paper drafts)
2. Read any memory/notes files for key findings and experiment history
3. Identify: core claims, methodology, key results, known weaknesses
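
A minimal Python sketch of how the briefing might be assembled is shown below. It rests on assumptions: `compile_briefing` is a hypothetical helper, `NOTES.md` is an assumed name for a memory/notes file, and the claims and weaknesses are supplied by the caller rather than extracted automatically.

```python
from pathlib import Path
from typing import List

# STORY.md and README.md come from the list above; NOTES.md is an assumed name.
CANDIDATE_FILES = ["STORY.md", "README.md", "NOTES.md"]


def compile_briefing(project_root: str, claims: List[str], weaknesses: List[str]) -> str:
    """Concatenate the files that exist, then append claims and known weaknesses."""
    root = Path(project_root)
    parts = ["# Research briefing"]
    for name in CANDIDATE_FILES:
        path = root / name
        if path.is_file():  # silently skip files the project does not have
            text = path.read_text(encoding="utf-8").strip()
            parts.append(f"## Source: {name}\n\n{text}")
    parts.append("## Core claims\n\n" + "\n".join(f"- {c}" for c in claims))
    parts.append("## Known weaknesses\n\n" + "\n".join(f"- {w}" for w in weaknesses))
    return "\n\n".join(parts) + "\n"
```

Because the external model cannot read project files, the briefing inlines file contents instead of listing paths.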

### Step 2: Initial Review (Round 1)
Send a detailed prompt with xhigh reasoning:

```text
spawn_agent:
  reasoning_effort: xhigh
  message: |
    [Full research context + specific questions]
    Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:
    1. Logical gaps or unjustified claims
    2. Missing experiments that would strengthen the story
    3. Narrative weaknesses
    4. Whether the contribution is sufficient for a top venue
    Please be brutally honest.
```
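
The Round 1 message could be built programmatically, as in the Python sketch below. It only builds the payload: `build_round1_request` is a hypothetical helper, and the dictionary keys simply mirror the `reasoning_effort` and `message` fields shown above, so the real `spawn_agent` call shape may differ.

```python
from typing import Dict

# The four review questions, copied from the prompt above.
REVIEW_QUESTIONS = [
    "Logical gaps or unjustified claims",
    "Missing experiments that would strengthen the story",
    "Narrative weaknesses",
    "Whether the contribution is sufficient for a top venue",
]


def build_round1_request(briefing: str) -> Dict[str, str]:
    """Return the fields for the first reviewer call (assumed payload shape)."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(REVIEW_QUESTIONS, start=1))
    message = (
        f"{briefing.strip()}\n\n"
        "Please act as a senior ML reviewer (NeurIPS/ICML level). Identify:\n"
        f"{numbered}\n"
        "Please be brutally honest."
    )
    # xhigh is required for every review call by the Key Rules.
    return {"reasoning_effort": "xhigh", "message": message}
```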

### Step 3: Iterative Dialogue (Rounds 2-N)
Use `send_input` with the returned agent id to continue the conversation:

```text
send_input:
  target: [saved reviewer id from Step 2]
  message: |
    Please continue the review using the revised materials below.

    Revised files:
    - /absolute/path/to/file1
    - /absolute/path/to/file2

    Focus on unresolved weaknesses and whether the revision actually fixed them.
```

For each round:
1. **Respond** to criticisms with evidence/counterarguments
2. **Ask targeted follow-ups** on the most actionable points
3. **Request specific deliverables**: experiment designs, paper outlines, claims matrices

Key follow-up patterns:
- "If we reframe X as Y, does that change your assessment?"
- "What's the minimum experiment to satisfy concern Z?"
- "Please design the minimal additional experiment package (highest acceptance lift per GPU week)"
- "Please write a mock NeurIPS/ICML review with scores"
- "Give me a results-to-claims matrix for possible experimental outcomes"
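
A follow-up round might be prepared as in the Python sketch below. `build_followup` is a hypothetical helper, and the `target` and `message` keys mirror the `send_input` fields shown above without claiming to be the tool's exact schema. It rejects relative paths because the template lists absolute ones.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def build_followup(reviewer_id: str, revised_files: List[str], focus: str) -> Dict[str, str]:
    """Build the fields for a follow-up round (assumed payload shape)."""
    if not reviewer_id:
        raise ValueError("a saved reviewer id from Step 2 is required")
    for path in revised_files:
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"revised file path must be absolute: {path}")
    file_lines = "\n".join(f"- {p}" for p in revised_files)
    message = (
        "Please continue the review using the revised materials below.\n\n"
        f"Revised files:\n{file_lines}\n\n"
        f"{focus.strip()}"
    )
    return {"target": reviewer_id, "message": message}
```

Any of the follow-up patterns above can be passed as `focus`, for example "What's the minimum experiment to satisfy concern Z?".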

### Step 4: Convergence
Stop iterating when:
- Both sides agree on the core claims and their evidence requirements
- A concrete experiment plan is established
- The narrative structure is settled
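
The stopping rule can be expressed as a small check, sketched below in Python. The `ReviewState` class and its field names are hypothetical; deciding whether each criterion is actually met remains a judgment call made during the dialogue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewState:
    """Tracks the three convergence criteria listed above (hypothetical structure)."""
    claims_agreed: bool = False
    experiment_plan_ready: bool = False
    narrative_settled: bool = False


def unmet_criteria(state: ReviewState) -> List[str]:
    """Return a label for each criterion that is still open."""
    checks = [
        (state.claims_agreed, "agreement on core claims and evidence requirements"),
        (state.experiment_plan_ready, "concrete experiment plan"),
        (state.narrative_settled, "settled narrative structure"),
    ]
    return [label for done, label in checks if not done]


def has_converged(state: ReviewState) -> bool:
    """Iteration stops only when all three criteria are met."""
    return not unmet_criteria(state)
```

The list of unmet criteria doubles as the agenda for the next round.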

### Step 5: Document Everything
Save the full interaction and conclusions to a review document in the project root:
- Round-by-round summary of criticisms and responses
- Final consensus on claims, narrative, and experiments
- Claims matrix (what claims are allowed under each possible outcome)
- Prioritized TODO list with estimated compute costs
- Paper outline if discussed

Update project memory/notes with key review conclusions.
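
One way to keep the review document complete is to render it from named sections, as in the Python sketch below. The helper name, the section titles, and the `# Research Review` heading are assumptions, since the exact format of the review document is not prescribed here.

```python
from typing import Dict, Optional

# Shortened titles for the required items listed above (assumed wording).
REQUIRED_SECTIONS = [
    "Round-by-round summary",
    "Final consensus",
    "Claims matrix",
    "Prioritized TODO list",
]


def render_review_document(sections: Dict[str, str], paper_outline: Optional[str] = None) -> str:
    """Render a self-contained markdown review; fail loudly if a section is empty."""
    missing = [name for name in REQUIRED_SECTIONS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    parts = ["# Research Review"]
    for name in REQUIRED_SECTIONS:
        parts.append(f"## {name}\n\n{sections[name].strip()}")
    if paper_outline:  # the outline is included only if it was discussed
        parts.append(f"## Paper outline\n\n{paper_outline.strip()}")
    return "\n\n".join(parts) + "\n"
```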

### Step 6: Review Tracing

Save a trace for every `spawn_agent`, `send_input`, or `oracle-pro` review call following `../shared-references/review-tracing.md`. Record the reviewer route, saved agent id, prompt summary, raw response path, decisions, and action items. This preserves the Claude mainline Review Tracing semantics while using Codex-native reviewer calls.
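
A trace record holding the fields named above might look like the Python sketch below. The authoritative format lives in `../shared-references/review-tracing.md`, which is not reproduced here, so the JSON Lines layout, the `ReviewTrace` class, and the field names are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ReviewTrace:
    """One record per reviewer call (assumed layout, not the official schema)."""
    reviewer_route: str      # e.g. "spawn_agent", "send_input", or "oracle-pro"
    agent_id: str            # saved agent id, kept for later resumption
    prompt_summary: str
    raw_response_path: str
    decisions: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)


def append_trace(trace: ReviewTrace, trace_file: str) -> None:
    """Append the record as one JSON line so earlier traces are never rewritten."""
    with open(trace_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(trace), ensure_ascii=False) + "\n")
```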

## Key Rules

- ALWAYS use `reasoning_effort: xhigh` for reviews
- Send comprehensive context in Round 1 — the external model cannot read your files
- Be honest about weaknesses — hiding them leads to worse feedback
- Push back on criticisms you disagree with, but accept valid ones
- Focus on ACTIONABLE feedback — "what experiment would fix this?"
- Document the agent id for potential future resumption
- The review document should be self-contained (readable without the conversation)

## Prompt Templates

### For initial review:
"I'm going to present a complete ML research project for your critical review. Please act as a senior ML reviewer (NeurIPS/ICML level)..."

### For experiment design:
"Please design the minimal additional experiment package that gives the highest acceptance lift per GPU week. Our compute: [describe]. Be very specific about configurations."

### For paper structure:
"Please turn this into a concrete paper outline with section-by-section claims and figure plan."

### For claims matrix:
"Please give me a results-to-claims matrix: what claim is allowed under each possible outcome of experiments X and Y?"

### For mock review:
"Please write a mock NeurIPS review with: Summary, Strengths, Weaknesses, Questions for Authors, Score, Confidence, and What Would Move Toward Accept."

INPUTS

$ARGUMENTS REQUIRED

User-provided research context passed to the reviewer

REQUIRED CONTEXT

  • project narrative documents (STORY.md, README.md, paper drafts)
  • memory/notes files for key findings and experiment history

TOOLS REQUIRED

  • spawn_agent
  • send_input

ROLES & RULES

Role assignments

  • Please act as a senior ML reviewer (NeurIPS/ICML level).

Rules

  1. ALWAYS use `reasoning_effort: xhigh` for reviews
  2. Send comprehensive context in Round 1 — the external model cannot read your files
  3. Be honest about weaknesses — hiding them leads to worse feedback
  4. Push back on criticisms you disagree with, but accept valid ones
  5. Focus on ACTIONABLE feedback — "what experiment would fix this?"
  6. Document the agent id for potential future resumption
  7. The review document should be self-contained (readable without the conversation)

EXPECTED OUTPUT

Format
structured_report
Schema
markdown_sections · Round-by-round summary of criticisms and responses; Final consensus on claims, narrative, and experiments; Claims matrix; Prioritized TODO list with estimated compute costs; Paper outline if discussed
Constraints
  • include round-by-round summary of criticisms and responses
  • include final consensus on claims/narrative/experiments
  • include claims matrix
  • include prioritized TODO list with compute costs
  • include paper outline if discussed
  • use reasoning_effort: xhigh for all reviewer calls

SUCCESS CRITERIA

  • Both sides agree on the core claims and their evidence requirements
  • A concrete experiment plan is established
  • The narrative structure is settled
  • Save the full interaction and conclusions to a review document

EXAMPLES

Includes five prompt templates for initial review, experiment design, paper structure, claims matrix, and mock review.

CAVEATS

Dependencies
  • Requires project narrative documents (e.g., STORY.md, README.md, paper drafts)
  • Requires memory/notes files
  • Requires spawn_agent and send_input tools
  • Context: $ARGUMENTS
Missing context
  • Definition or example of $ARGUMENTS placeholder
  • Exact format expected for the final review document
Ambiguities
  • Context: $ARGUMENTS is referenced but not defined or explained.
  • Model name `gpt-5.5` is specified but does not correspond to any known OpenAI model.
  • Name contains 'sleep-research' while description and workflow are domain-agnostic.

QUALITY

  • Overall: 0.78
  • Clarity: 0.75
  • Specificity: 0.90
  • Reusability: 0.65
  • Completeness: 0.85

IMPROVEMENT SUGGESTIONS

  • Replace fictional model `gpt-5.5` with a parameter or list of supported models.
  • Define or remove the placeholder `$ARGUMENTS` with an explicit description of expected input.
  • Add a short example of the compiled briefing produced in Step 1.

USAGE

Copy the prompt above and paste it into your AI of choice — Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections with your own context, then ask for the output.
