Autonomous LLM Research Review Loop
Autonomously iterates review → implement fixes → re-review of research work using any OpenAI-compatible LLM API until a positive assessment or MAX_ROUNDS is reached.
SKILL.md
---
name: auto-claude-code-research-in-sleep-auto-review-loop-llm
description: "Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with \"auto review loop llm\" or \"llm review\"."
---
# Auto Review Loop (Generic LLM): Autonomous Research Improvement
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
## Context: $ARGUMENTS
## Constants
- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
- REVIEW_DOC: `review-stage/AUTO_REVIEW.md` (cumulative log) *(fall back to `./AUTO_REVIEW.md` for legacy projects)*
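A minimal sketch of how POSITIVE_THRESHOLD might be checked, assuming the score and verdict have already been extracted from the reviewer's response. The names `is_positive`, `POSITIVE_PATTERN`, and `NEGATION_PATTERN` are hypothetical. The word-boundary and negation guards are an added assumption: plain substring matching would treat "insufficient" or "not ready for submission" as positive.

```python
import re

MAX_ROUNDS = 4
SCORE_THRESHOLD = 6.0

# Word boundaries keep "insufficient" and "unacceptable" from matching.
POSITIVE_PATTERN = re.compile(
    r"\b(accept(ed|able)?|sufficient|ready for submission)\b", re.IGNORECASE
)
# Assumption: a positive phrase directly preceded by a negation does not count.
NEGATION_PATTERN = re.compile(r"\b(not|no|never)\s+$", re.IGNORECASE)


def is_positive(score, verdict):
    """Return True when the score OR the verdict meets POSITIVE_THRESHOLD."""
    if score is not None and score >= SCORE_THRESHOLD:
        return True
    verdict = verdict or ""
    for match in POSITIVE_PATTERN.finditer(verdict):
        prefix = verdict[: match.start()]
        if not NEGATION_PATTERN.search(prefix):
            return True
    return False
```

Because the constant uses "or", a score of 6 or higher is positive on its own, whatever the verdict text says.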
## LLM Configuration
This skill uses **any OpenAI-compatible API** for external review via the `llm-chat` MCP server.
### Configuration via MCP Server (Recommended)
Add to `~/.claude/settings.json`:
```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}
```
### Supported Providers
| Provider | LLM_BASE_URL | LLM_MODEL |
|----------|--------------|-----------|
| **OpenAI** | `https://api.openai.com/v1` | `gpt-4o`, `o3` |
| **DeepSeek** | `https://api.deepseek.com/v1` | `deepseek-chat`, `deepseek-reasoner` |
| **MiniMax** | `https://api.minimax.io/v1` | `MiniMax-M2.7` |
| **Kimi (Moonshot)** | `https://api.moonshot.cn/v1` | `moonshot-v1-8k`, `moonshot-v1-32k` |
| **ZhiPu (GLM)** | `https://open.bigmodel.cn/api/paas/v4` | `glm-4`, `glm-4-plus` |
| **SiliconFlow** | `https://api.siliconflow.cn/v1` | `Qwen/Qwen2.5-72B-Instruct` |
| **Alibaba Cloud Bailian (阿里云百炼)** | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen-max` |
| **01.AI (零一万物)** | `https://api.lingyiwanwu.com/v1` | `yi-large` |
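The table can also be kept as a small lookup that produces the `env` values for the MCP server entry. This is a sketch only: `PROVIDER_PRESETS`, `provider_env`, and `chat_completions_url` are hypothetical names, only a few rows are reproduced, and taking the first model listed in each row as the default is an assumption.

```python
# A few rows of the provider table; extend with the remaining rows as needed.
PROVIDER_PRESETS = {
    "openai": {"LLM_BASE_URL": "https://api.openai.com/v1", "LLM_MODEL": "gpt-4o"},
    "deepseek": {"LLM_BASE_URL": "https://api.deepseek.com/v1", "LLM_MODEL": "deepseek-chat"},
    "kimi": {"LLM_BASE_URL": "https://api.moonshot.cn/v1", "LLM_MODEL": "moonshot-v1-8k"},
    "zhipu": {"LLM_BASE_URL": "https://open.bigmodel.cn/api/paas/v4", "LLM_MODEL": "glm-4"},
}


def provider_env(provider, api_key):
    """Build the three environment variables the skill expects."""
    preset = PROVIDER_PRESETS[provider.lower()]
    return {"LLM_API_KEY": api_key, **preset}


def chat_completions_url(base_url):
    """Join a base URL with the chat completions path, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/chat/completions"
```

Note that the ZhiPu base URL ends in `/v4` rather than `/v1`, so the path is appended to whatever base URL the provider documents instead of being hard-coded.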
## API Call Method
**Primary: MCP Tool**
```
mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."
```
**Fallback: curl**
```bash
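# The body is an unquoted heredoc so the shell expands ${LLM_MODEL};
# inside single quotes the literal string "${LLM_MODEL}" would be sent.
# The prompt text must be JSON-escaped (quotes, backslashes, newlines).
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d @- <<EOF
{
  "model": "${LLM_MODEL}",
  "messages": [
    {"role": "system", "content": "You are a senior ML reviewer..."},
    {"role": "user", "content": "[review prompt]"}
  ],
  "max_tokens": 4096
}
EOF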
```
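Where Python is available, the same request can be built with the standard library, which sidesteps shell quoting and JSON escaping of the prompt. This is a sketch under assumptions: `build_review_request` and `send_review_request` are hypothetical helpers, and the response is read from `choices[0].message.content`, the shape used by OpenAI-compatible chat completion endpoints.

```python
import json
import os
import urllib.request


def build_review_request(prompt, system, env=os.environ):
    """Build (but do not send) a chat completions request from LLM_* variables."""
    payload = {
        "model": env["LLM_MODEL"],
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        env["LLM_BASE_URL"].rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),  # json.dumps handles escaping
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {env['LLM_API_KEY']}",
        },
        method="POST",
    )


def send_review_request(request, timeout=120):
    """Send the request and return the reviewer's text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Splitting construction from sending keeps the payload inspectable before any network call is made.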
## State Persistence (Compact Recovery)
Persist state to `review-stage/REVIEW_STATE.json` after each round:
```json
{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}
```
**Write this file at the end of every Phase E** (after documenting the round).
**On completion**, set `"status": "completed"`.
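A minimal sketch of writing this state file, assuming a local-time ISO timestamp like the one in the example. `save_state` is a hypothetical helper; writing to a temporary file and renaming it is an added precaution so that an interrupted write does not leave a half-written state file.

```python
import json
from datetime import datetime
from pathlib import Path

STATE_PATH = Path("review-stage/REVIEW_STATE.json")


def save_state(round_number, last_score, last_verdict,
               pending_experiments=(), status="in_progress", path=STATE_PATH):
    """Persist the loop state using the fields shown in the JSON example."""
    state = {
        "round": round_number,
        "status": status,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp_path.replace(path)  # rename over the old file in one step
    return state
```

On completion the same helper would be called with `status="completed"`.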
## Workflow
### Initialization
1. **Check `review-stage/REVIEW_STATE.json`** for recovery *(fall back to `./REVIEW_STATE.json` if not found — legacy path)*
2. Read project context and prior reviews
3. Initialize round counter
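The recovery check in step 1 might look like the sketch below. `load_state` and `starting_round` are hypothetical names. Two points are assumptions rather than rules stated by the skill: resuming at `round + 1` (because the state file is written after a round has been documented), and starting fresh when the file is missing, unreadable, or marked completed.

```python
import json
from pathlib import Path

STATE_CANDIDATES = (
    Path("review-stage/REVIEW_STATE.json"),  # current location
    Path("REVIEW_STATE.json"),               # legacy fallback
)


def load_state(candidates=STATE_CANDIDATES):
    """Return the first readable state dict, or None if there is none."""
    for path in candidates:
        if not path.is_file():
            continue
        try:
            return json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # treat a corrupt file as absent and try the next path
    return None


def starting_round(state):
    """Pick the round to run next from a recovered state."""
    if state is None or state.get("status") == "completed":
        return 1
    return int(state.get("round", 0)) + 1
```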
### Loop (up to MAX_ROUNDS)
#### Phase A: Review
**If MCP available:**
```
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```
**If MCP NOT available:**
```bash
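# The body is an unquoted heredoc so the shell expands ${LLM_MODEL};
# inside single quotes the literal string "${LLM_MODEL}" would be sent.
# The prompt text must be JSON-escaped (quotes, backslashes, newlines).
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d @- <<EOF
{
  "model": "${LLM_MODEL}",
  "messages": [
    {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
    {"role": "user", "content": "[Full review prompt]"}
  ],
  "max_tokens": 4096
}
EOF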
```
#### Phase B: Parse Assessment
**CRITICAL: Save the FULL raw response** verbatim. Then extract:
- **Score** (numeric 1-10)
- **Verdict** ("ready" / "almost" / "not ready")
- **Action items** (ranked list of fixes)
**STOP**: If score >= 6 AND the verdict is "ready" or "almost". A verdict of "not ready" does not qualify, even though it contains the word "ready".
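One way to sketch the extraction and the stop rule, assuming the reviewer writes the score after the word "score" and uses one of the three verdict phrases. `parse_assessment` and `should_stop` are hypothetical names. The skill does not say how to treat malformed responses; returning `None` for a missing field (and therefore not stopping) is an assumption.

```python
import re

# "Score: 6.5/10", "Overall score 4/10", "score = 7"
SCORE_RE = re.compile(r"\bscore\b[^0-9\n]{0,30}(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_assessment(raw):
    """Extract (score, verdict) from the raw reviewer text; None if absent."""
    match = SCORE_RE.search(raw)
    score = float(match.group(1)) if match else None
    if score is not None and not 1 <= score <= 10:
        score = None  # out-of-range numbers are treated as unparsed

    lowered = raw.lower()
    # Check "not ready" first, because it also contains "ready".
    if "not ready" in lowered:
        verdict = "not ready"
    elif "almost" in lowered:
        verdict = "almost"
    elif re.search(r"\bready\b", lowered):
        verdict = "ready"
    else:
        verdict = None
    return score, verdict


def should_stop(score, verdict):
    """Phase B stop rule: score >= 6 AND verdict is ready or almost."""
    return score is not None and score >= 6 and verdict in ("ready", "almost")
```

A reviewer that answers the readiness question with a bare "Yes" or "No" would not be recognized by this sketch, so the prompt or the parser would need adjusting for such models.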
#### Phase C: Implement Fixes
Priority: metric additions > reframing > new experiments
#### Phase D: Wait for Results
Monitor remote experiments
#### Phase E: Document Round
Append to `review-stage/AUTO_REVIEW.md`:
```markdown
## Round N (timestamp)
### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]
### Reviewer Raw Response
<details>
<summary>Click to expand full reviewer response</summary>
[Paste the COMPLETE raw response here — verbatim, unedited.]
</details>
### Actions Taken
- [what was implemented/changed]
### Results
- [experiment outcomes, if any]
### Status
- [continuing to round N+1 / stopping]
```
**Write `review-stage/REVIEW_STATE.json`** with current state.
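Rendering and appending a round could be sketched as follows, using the template above. `format_round` and `append_round` are hypothetical helpers; the raw response is inserted unchanged so that the verbatim requirement holds.

```python
from pathlib import Path


def format_round(round_number, timestamp, score, verdict, criticisms,
                 raw_response, actions, results, status):
    """Render one round in the AUTO_REVIEW.md layout."""
    lines = [
        f"## Round {round_number} ({timestamp})",
        "### Assessment (Summary)",
        f"- Score: {score}/10",
        f"- Verdict: {verdict}",
        "- Key criticisms:",
        *[f"  - {item}" for item in criticisms],
        "### Reviewer Raw Response",
        "<details>",
        "<summary>Click to expand full reviewer response</summary>",
        "",
        raw_response,  # verbatim, unedited
        "",
        "</details>",
        "### Actions Taken",
        *[f"- {item}" for item in actions],
        "### Results",
        *[f"- {item}" for item in results],
        "### Status",
        f"- {status}",
    ]
    return "\n".join(lines) + "\n\n"


def append_round(entry, path=Path("review-stage/AUTO_REVIEW.md")):
    """Append to the cumulative log without rewriting earlier rounds."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Opening the file in append mode means earlier rounds are never rewritten, which suits a cumulative log.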
### Termination
1. Set `review-stage/REVIEW_STATE.json` status to "completed"
2. Write final summary
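Putting the phases together, the control flow might be sketched like this. `run_review_loop` is hypothetical, and the four callables stand in for the real review call, parser, fix implementation, and documentation step. Documenting the final round before stopping is an assumption; the skill only says to stop after Phase B when the stop condition holds.

```python
def run_review_loop(review, parse, implement_fixes, document,
                    max_rounds=4, start_round=1):
    """Run review rounds until the stop rule fires or max_rounds is reached."""
    history = []
    for round_number in range(start_round, max_rounds + 1):
        raw = review(round_number, history)             # Phase A: review
        score, verdict = parse(raw)                     # Phase B: parse
        stop = (score is not None and score >= 6
                and verdict in ("ready", "almost"))
        # Phases C and D: implement fixes and wait for results, unless stopping
        actions = [] if stop else implement_fixes(raw)
        finished = stop or round_number == max_rounds
        record = {
            "round": round_number,
            "score": score,
            "verdict": verdict,
            "raw_response": raw,
            "actions": actions,
            "status": "completed" if finished else "in_progress",
        }
        document(record)                                # Phase E: document
        history.append(record)
        if stop:
            break
    return history
```

Passing the phases in as functions keeps the loop testable without a live reviewer, for example by supplying a stub that returns canned responses.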
## Key Rules
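- **Large file handling**: If the Write tool fails due to file size, immediately retry using Bash and write the file in chunks: `cat << 'EOF' > file` for the first chunk, then `cat << 'EOF' >> file` for each later chunk so that it appends rather than overwrites. Do NOT ask the user for permission; just do it silently.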
- **Anti-hallucination citations**: When adding references, NEVER fabricate BibTeX. Use DBLP → CrossRef → `[VERIFY]` chain. Do NOT generate BibTeX from memory.
- Be honest about weaknesses
- Implement fixes BEFORE re-reviewing
- Document everything
- Include previous context in round 2+ prompts
- Prefer MCP tool over curl when available
## Prompt Template for Round 2+
```
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```
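Filling this template programmatically could look like the sketch below. `build_followup_prompt` is a hypothetical helper, and joining the previous weaknesses with semicolons is an arbitrary formatting choice.

```python
def build_followup_prompt(round_number, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results):
    """Fill the round 2+ template; `changes` is a list of (action, result) pairs."""
    change_lines = [
        f"{index}. {action}: {result}"
        for index, (action, result) in enumerate(changes, start=1)
    ]
    lines = [
        f"[Round {round_number}/{max_rounds} of autonomous review loop]",
        "",
        f"## Previous Review Summary (Round {round_number - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        f"- Previous Key Weaknesses: {'; '.join(prev_weaknesses)}",
        "",
        "## Changes Since Last Review",
        *change_lines,
        "",
        "## Updated Results",
        updated_results,
        "",
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "",
        "Be brutally honest. If the work is ready, say so clearly.",
    ]
    return "\n".join(lines)
```

The returned string is what would be passed as `prompt` to the MCP tool, or as the user message in the curl fallback.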
## Output Protocols
> Follow these shared protocols for all output files:
> - **[Output Versioning Protocol](../shared-references/output-versioning.md)** — write timestamped file first, then copy to fixed name
> - **[Output Manifest Protocol](../shared-references/output-manifest.md)** — log every output to MANIFEST.md
> - **[Output Language Protocol](../shared-references/output-language.md)** — respect the project's language setting
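The first two protocols might be sketched as below. The referenced protocol documents are not reproduced here, so the details are assumptions: the `_YYYYMMDD_HHMMSS` file-name suffix, the manifest line format, and the helper name `write_versioned` are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def write_versioned(content, fixed_path, manifest_path, now=None):
    """Write a timestamped copy first, then copy it to the fixed name."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed naming scheme
    fixed_path = Path(fixed_path)
    versioned = fixed_path.with_name(
        f"{fixed_path.stem}_{stamp}{fixed_path.suffix}"
    )
    versioned.parent.mkdir(parents=True, exist_ok=True)
    versioned.write_text(content, encoding="utf-8")   # 1. timestamped file
    shutil.copyfile(versioned, fixed_path)            # 2. copy to fixed name
    # 3. log the output to the manifest (assumed line format)
    with Path(manifest_path).open("a", encoding="utf-8") as manifest:
        manifest.write(
            f"- {now.isoformat(timespec='seconds')} "
            f"{versioned.name} -> {fixed_path.name}\n"
        )
    return versioned
```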
INPUTS
- $ARGUMENTS (REQUIRED): initial project context passed at trigger time
REQUIRED CONTEXT
- project research context and claims
- review-stage/AUTO_REVIEW.md or ./AUTO_REVIEW.md
- review-stage/REVIEW_STATE.json or ./REVIEW_STATE.json
OPTIONAL CONTEXT
- $ARGUMENTS
- prior round results and changes
TOOLS REQUIRED
- mcp__llm-chat__chat
- curl
- file_write
- bash
ROLES & RULES
Role assignments
- You are a senior ML reviewer (NeurIPS/ICML level).
EXPECTED OUTPUT
- Format: structured_report
- Schema: markdown_sections · Assessment (Summary), Reviewer Raw Response, Actions Taken, Results, Status
- Constraints:
  - append round summary to AUTO_REVIEW.md after every round
  - persist REVIEW_STATE.json after every Phase E
  - save full raw LLM reviewer response verbatim
  - set status to completed on termination
SUCCESS CRITERIA
- Score >= 6/10 or verdict contains accept/sufficient/ready for submission
- External reviewer gives positive assessment
- MAX_ROUNDS reached
EXAMPLES
Includes JSON config examples, curl and MCP API call examples, REVIEW_STATE.json schema, AUTO_REVIEW.md markdown template, and a full Round 2+ prompt template.
CAVEATS
- Dependencies:
  - review-stage/AUTO_REVIEW.md
  - review-stage/REVIEW_STATE.json
  - llm-chat MCP server
  - LLM_API_KEY / LLM_BASE_URL / LLM_MODEL environment variables
- Missing context:
  - Exact definition of the research project context expected in $ARGUMENTS
  - How the MCP server is invoked inside the agent runtime
- Ambiguities:
  - Does not specify how to handle partial or malformed LLM responses during Phase B parsing.
  - $ARGUMENTS placeholder usage is mentioned but not defined in the prompt body.
QUALITY
- OVERALL: 0.62
- CLARITY: 0.72
- SPECIFICITY: 0.88
- REUSABILITY: 0.35
- COMPLETENESS: 0.78
IMPROVEMENT SUGGESTIONS
- Extract the provider table and API call templates into reusable sub-prompts or variables.
- Add explicit input/output contracts for each phase to improve reusability across different agent frameworks.