Model evaluation workflow · Risk: low
Multi-Agent Fact-Checking System
The prompt directs the model to execute four internal agents in order (Extractor, Reliability, Entailment Judge, and Adversarial Auditor) to analyze a claim against a source excerpt.
PROMPT
ROLE: Multi-Agent Fact-Checking System

You will execute FOUR internal agents IN ORDER. Agents must not share prohibited information. Do not revise earlier outputs after moving to the next agent.

AGENT ⊕ EXTRACTOR
- Input: Claim + Source excerpt
- Task: List ONLY literal statements from source
- No inference, no judgment, no paraphrase
- Output bullets only

AGENT ⊗ RELIABILITY
- Input: Source type description ONLY
- Task: Rate source reliability: HIGH / MEDIUM / LOW
- Reliability reflects rigor, not truth
- Do NOT assess the claim

AGENT ⊖ ENTAILMENT JUDGE
- Input: Claim + Extracted statements
- Task: Decide SUPPORTED / CONTRADICTED / NOT ENOUGH INFO
- SUPPORTED only if explicitly stated or unavoidably implied
- CONTRADICTED only if explicitly denied or countered
- If multiple interpretations exist → NOT ENOUGH INFO
- No appeal to authority

AGENT ⌘ ADVERSARIAL AUDITOR
- Input: Claim + Source excerpt + Judge verdict
- Task: Find plausible alternative interpretations
- If ambiguity exists, veto to NOT ENOUGH INFO
- Auditor may only downgrade certainty, never upgrade

FINAL RULES
- Reliability NEVER determines verdict
- Any unresolved ambiguity → NOT ENOUGH INFO
- Output final verdict + 1–2 bullet justification
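The agent flow above can be sketched as a thin orchestration layer. This is an illustrative sketch only: `call_llm` is a hypothetical stand-in for whatever model API you use, and the input isolation (e.g., the Reliability agent sees only the source type description, never the claim) mirrors the prompt's no-sharing rule.

```python
from typing import Callable, Optional


def fact_check(
    claim: str,
    source_excerpt: str,
    call_llm: Callable[[str, str], str],  # (system_prompt, user_input) -> reply
    source_type: Optional[str] = None,
) -> dict:
    """Run the four agents in order, giving each only its permitted inputs."""
    # EXTRACTOR: sees claim + excerpt, outputs literal statements only.
    extracted = call_llm(
        "EXTRACTOR: list ONLY literal statements from the source as bullets.",
        f"Claim: {claim}\nSource excerpt: {source_excerpt}",
    )
    # RELIABILITY: sees ONLY the source type description, never the claim.
    reliability = None
    if source_type is not None:
        reliability = call_llm(
            "RELIABILITY: rate the source HIGH / MEDIUM / LOW by rigor. "
            "Do NOT assess the claim.",
            f"Source type: {source_type}",
        )
    # ENTAILMENT JUDGE: sees claim + extracted statements (not reliability).
    judge = call_llm(
        "ENTAILMENT JUDGE: decide SUPPORTED / CONTRADICTED / NOT ENOUGH INFO.",
        f"Claim: {claim}\nExtracted statements:\n{extracted}",
    )
    # ADVERSARIAL AUDITOR: may veto the judge's verdict to NOT ENOUGH INFO.
    final = call_llm(
        "ADVERSARIAL AUDITOR: veto to NOT ENOUGH INFO if ambiguity exists.",
        f"Claim: {claim}\nSource excerpt: {source_excerpt}\nJudge verdict: {judge}",
    )
    return {
        "extracted": extracted,
        "reliability": reliability,
        "judge": judge,
        "final": final,
    }
```

Keeping the isolation in code rather than in prose makes the "agents must not share prohibited information" rule mechanically enforceable: an agent simply never receives the text it is forbidden to see.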
REQUIRED CONTEXT
- claim
- source excerpt
OPTIONAL CONTEXT
- source type description
ROLES & RULES
Role assignments
- Multi-Agent Fact-Checking System
- Agents must not share prohibited information.
- Do not revise earlier outputs after moving to the next agent.
- List ONLY literal statements from source
- No inference, no judgment, no paraphrase
- Output bullets only
- Rate source reliability: HIGH / MEDIUM / LOW
- Reliability reflects rigor, not truth
- Do NOT assess the claim
- Decide SUPPORTED / CONTRADICTED / NOT ENOUGH INFO
- SUPPORTED only if explicitly stated or unavoidably implied
- CONTRADICTED only if explicitly denied or countered
- If multiple interpretations exist → NOT ENOUGH INFO
- No appeal to authority
- Find plausible alternative interpretations
- If ambiguity exists, veto to NOT ENOUGH INFO
- Auditor may only downgrade certainty, never upgrade
- Reliability NEVER determines verdict
- Any unresolved ambiguity → NOT ENOUGH INFO
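The auditor's veto and downgrade-only rules above can be captured in a tiny helper. This is an illustrative sketch, not part of the prompt itself: the auditor may replace the judge's verdict with NOT ENOUGH INFO when ambiguity is found, but can never upgrade to a more certain verdict.

```python
# The three verdicts the Entailment Judge may emit.
VERDICTS = {"SUPPORTED", "CONTRADICTED", "NOT ENOUGH INFO"}


def apply_audit(judge_verdict: str, ambiguity_found: bool) -> str:
    """Auditor step: may only downgrade to NOT ENOUGH INFO, never upgrade."""
    if judge_verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {judge_verdict!r}")
    return "NOT ENOUGH INFO" if ambiguity_found else judge_verdict
```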
EXPECTED OUTPUT
- Format: bullet_list
- Schema: markdown_sections · AGENT ⊕ EXTRACTOR, AGENT ⊗ RELIABILITY, AGENT ⊖ ENTAILMENT JUDGE, AGENT ⌘ ADVERSARIAL AUDITOR, Final verdict
- Constraints
  - Final verdict + 1–2 bullet justification
  - Bullets only for extractor
  - Verdict is one of SUPPORTED / CONTRADICTED / NOT ENOUGH INFO
  - Reliability is one of HIGH / MEDIUM / LOW
SUCCESS CRITERIA
- Rate source reliability based on rigor
- Determine if claim is SUPPORTED, CONTRADICTED, or NOT ENOUGH INFO
- Justify final verdict with 1–2 bullets
FAILURE MODES
- Sharing prohibited information between agents
- Revising earlier agent outputs
- Performing inference or judgment in extraction
- Assessing claim in reliability agent
- Using reliability to determine verdict
- Ignoring ambiguities or alternative interpretations
- Upgrading certainty in adversarial auditor
CAVEATS
- Dependencies
  - Claim
  - Source excerpt
  - Source type description
- Missing context
  - User input format (e.g., structure for providing the claim and source excerpt)
  - Definition or examples of "source type description" (e.g., "news article", "blog post")
  - Full response structure (e.g., labeled sections for each agent)
- Ambiguities
  - Unclear how the "source type description" is obtained for the RELIABILITY agent
  - Unspecified which agent or step produces the final output of verdict + justification
QUALITY
- OVERALL: 0.90
- CLARITY: 0.90
- SPECIFICITY: 0.95
- REUSABILITY: 0.90
- COMPLETENESS: 0.85
IMPROVEMENT SUGGESTIONS
- Add an initial step or agent to classify 'Source type' from the source excerpt.
- Specify complete output format with sections for each agent and final verdict.
- Include 1–2 full examples of input and multi-agent execution.
- Clarify that reliability rating is output separately but not used in verdict.
USAGE
Copy the prompt above and paste it into your AI of choice — Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections with your own context, then ask for the output.
MORE FOR MODEL EVALUATION
- AI Process Feasibility Interviewer
- Web UI QA Audit Specialist
- Entropy MDPI Journal Peer Reviewer
- Question Quality Lab Game Evaluator
- Prompt Analysis Optimization Validator
- Prompt Quality Audit Engineer
- Prompt Quality Audit Compliance Checker
- Repository Performance Audit Engineer
- Strict Yes/No Question Answerer
- Software QA Tester for Login Functionality