agent research skill risk: low
Semantic Scholar Published Paper Search
The prompt defines a complete workflow for searching published academic papers via the Semantic Scholar API, including argument parsing, filtered searches, single-paper fetches by…
- External action: medium
SKILL 1 file
SKILL.md
---
name: semantic-scholar
description: "Search published venue papers (IEEE, ACM, Springer, etc.) via Semantic Scholar API. Complements /arxiv (preprints) with citation counts, venue metadata, and TLDR. Use when user says \"search semantic scholar\", \"find IEEE papers\", \"find journal papers\", \"venue papers\", \"citation search\", or wants publ"
---
# Semantic Scholar Paper Search
Search topic or paper ID: $ARGUMENTS
## Role & Positioning
This skill is the **published venue** counterpart to `/arxiv`:
| Skill | Source | Best for |
|-------|--------|----------|
| `/arxiv` | arXiv API | Latest preprints, cutting-edge unrefereed work |
| `/semantic-scholar` | Semantic Scholar API | **Published** journal/conference papers (IEEE, ACM, Springer, etc.) with citation counts, venue info, TLDR |
**Do NOT duplicate arXiv's job.** If results contain an `externalIds.ArXiv` field, the paper is also on arXiv — note this but do not re-fetch from arXiv.
## Constants
- **MAX_RESULTS = 10** — Default number of search results.
- **S2_FETCHER** — canonical name `semantic_scholar_fetch.py`, resolved per
[`shared-references/integration-contract.md`](../shared-references/integration-contract.md) §2
(Codex-side chain: `$ARIS_REPO/tools/` → `tools/` → `~/.codex/skills/semantic-scholar/`).
Policy D1 — if unresolved (canonical chain exhausted), fall back to inline Python.
- **DEFAULT_FILTERS** — For general research queries, apply these by default to reduce noise:
  - `--fields-of-study "Computer Science,Engineering"`
  - `--publication-types JournalArticle,Conference`
> Overrides (append to arguments):
> - `/semantic-scholar "topic" - max: 20` — return up to 20 results
> - `/semantic-scholar "topic" - type: journal` — only journal articles
> - `/semantic-scholar "topic" - type: conference` — only conference papers
> - `/semantic-scholar "topic" - min-citations: 50` — only highly-cited papers
> - `/semantic-scholar "topic" - year: 2022-` — papers from 2022 onward
> - `/semantic-scholar "topic" - fields: all` — remove default field-of-study filter
> - `/semantic-scholar "topic" - sort: citations` — bulk search sorted by citation count
> - `/semantic-scholar "DOI:10.1109/..."` — fetch a single paper by DOI
## Workflow
### Step 1: Parse Arguments
Parse `$ARGUMENTS` for directives:
- **Query or ID**: main search term, or a paper identifier:
  - DOI: `10.1109/TWC.2024.1234567`
  - Semantic Scholar ID: `f9314fd99be5f2b1b3efcfab87197d578160d553`
  - ArXiv: `ARXIV:2006.10685`
  - Corpus: `CorpusId:219792180`
- **`- max: N`**: override MAX_RESULTS
- **`- type: journal|conference|review|all`**: map to `--publication-types`
- **`- min-citations: N`**: map to `--min-citations`
- **`- year: RANGE`**: map to `--year` (e.g. `2022-`, `2020-2024`)
- **`- fields: FIELDS`**: override `--fields-of-study` (use `all` to remove filter)
- **`- sort: citations|date`**: use `search-bulk` with `--sort citationCount:desc` or `publicationDate:desc`
If the argument matches a DOI pattern (`10.XXXX/...`, optionally prefixed with `DOI:` as in the override example), a Semantic Scholar ID (40-character hex string), or a prefixed ID (`ARXIV:...`, `CorpusId:...`), skip the search and go directly to Step 3.
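A minimal Python sketch of this parsing step, assuming every directive is written as ` - key: value` after the query (as in the override examples) and that surrounding quotes on the query should be stripped. `parse_arguments`, `is_paper_id` and the regexes are illustrative names, not part of `semantic_scholar_fetch.py`.

```python
import re
from typing import Dict, Tuple

# Directive keys from the override list; the regex splits only on " - <key>:",
# so values with spaces or a trailing "-" (e.g. "2022-") survive intact.
DIRECTIVE_KEYS = ("max", "type", "min-citations", "year", "fields", "sort")
_DIRECTIVE_SPLIT = re.compile(
    r"\s+-\s+(?=(?:" + "|".join(re.escape(k) for k in DIRECTIVE_KEYS) + r"):)"
)

# Identifier patterns from Step 1 (the optional "DOI:" prefix follows the override example).
_DOI = re.compile(r"^(?:DOI:)?10\.\d{4,9}/\S+$", re.IGNORECASE)
_S2_ID = re.compile(r"^[0-9a-fA-F]{40}$")
_PREFIXED = re.compile(r"^(?:ARXIV|CorpusId):\S+$", re.IGNORECASE)


def parse_arguments(arguments: str) -> Tuple[str, Dict[str, str]]:
    """Split $ARGUMENTS into (query_or_id, {directive: value})."""
    parts = _DIRECTIVE_SPLIT.split(arguments.strip())
    query = parts[0].strip().strip('"')
    overrides: Dict[str, str] = {}
    for part in parts[1:]:
        key, _, value = part.partition(":")
        overrides[key.strip()] = value.strip()
    return query, overrides


def is_paper_id(query: str) -> bool:
    """True when Step 2 should be skipped in favour of a direct fetch (Step 3)."""
    return bool(_DOI.match(query) or _S2_ID.match(query) or _PREFIXED.match(query))
```

For example, `parse_arguments('"graph neural networks" - max: 20 - year: 2022-')` yields the query `graph neural networks` and the overrides `{"max": "20", "year": "2022-"}`.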
### Step 2: Search Papers
Resolve `$S2_FETCHER` via the canonical strict-safe Codex chain
(see [`shared-references/integration-contract.md`](../shared-references/integration-contract.md) §2):
```bash
if [ -z "${ARIS_REPO:-}" ] && [ -f .aris/installed-skills-codex.txt ]; then
  ARIS_REPO=$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills-codex.txt 2>/dev/null) || true
fi
S2_FETCHER=""
[ -n "${ARIS_REPO:-}" ] && [ -f "$ARIS_REPO/tools/semantic_scholar_fetch.py" ] && S2_FETCHER="$ARIS_REPO/tools/semantic_scholar_fetch.py"
[ -z "$S2_FETCHER" ] && [ -f tools/semantic_scholar_fetch.py ] && S2_FETCHER="tools/semantic_scholar_fetch.py"
[ -z "$S2_FETCHER" ] && [ -f ~/.codex/skills/semantic-scholar/semantic_scholar_fetch.py ] && S2_FETCHER="$HOME/.codex/skills/semantic-scholar/semantic_scholar_fetch.py"
```
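The same three-step chain can be packaged as one reusable Python helper, which also fits the inline-Python fallback path. This is a hypothetical sketch: `resolve_helper` and `read_repo_root` are illustrative names, and the manifest is read the way the `awk` one-liner reads it (the first tab-separated line keyed `repo_root`). Passing `("research_wiki.py", "research-wiki")` would cover the lookup in Step 7 as well.

```python
import os
from pathlib import Path
from typing import List, Optional


def read_repo_root(manifest: Path) -> Optional[str]:
    """Like the awk one-liner: value of the first tab-separated line keyed 'repo_root'."""
    if not manifest.is_file():
        return None
    for line in manifest.read_text(encoding="utf-8").splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == "repo_root":
            return fields[1]
    return None


def resolve_helper(filename: str, skill_dir: str, cwd: Optional[Path] = None) -> Optional[str]:
    """Return the first existing candidate, or None (Policy D1: use inline Python)."""
    cwd = cwd or Path.cwd()
    repo = os.environ.get("ARIS_REPO") or read_repo_root(cwd / ".aris" / "installed-skills-codex.txt")
    candidates: List[Path] = []
    if repo:
        candidates.append(Path(repo) / "tools" / filename)  # 1. $ARIS_REPO/tools/
    candidates.append(cwd / "tools" / filename)  # 2. ./tools/
    candidates.append(Path.home() / ".codex" / "skills" / skill_dir / filename)  # 3. ~/.codex/skills/<skill>/
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    return None
```

A `None` result from `resolve_helper("semantic_scholar_fetch.py", "semantic-scholar")` corresponds to the exhausted chain, at which point the skill falls back to inline Python.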
**Standard search** (default — relevance-ranked):
```bash
[ -n "$S2_FETCHER" ] && python3 "$S2_FETCHER" search "QUERY" --max MAX_RESULTS \
--fields-of-study "Computer Science,Engineering" \
--publication-types JournalArticle,Conference
```
**Bulk search** (when `- sort:` is specified, or MAX_RESULTS > 100):
```bash
[ -n "$S2_FETCHER" ] && python3 "$S2_FETCHER" search-bulk "QUERY" --max MAX_RESULTS \
--sort citationCount:desc \
--fields-of-study "Computer Science" \
--year "2020-"
```
If `semantic_scholar_fetch.py` is not found, fall back to inline Python using `urllib` against `https://api.semanticscholar.org/graph/v1/paper/search`.
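A minimal sketch of that fallback using only the standard library. The query-parameter names (`fieldsOfStudy`, `publicationTypes`, `minCitationCount`, `year`), the 100-result cap per request and the `x-api-key` header are assumed from the public S2 Graph API documentation and should be checked there; the `FIELDS` list, the backoff schedule and the helper names `build_search_url` / `fetch_json` are illustrative.

```python
import json
import os
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

S2_API = "https://api.semanticscholar.org/graph/v1"
# Assumed field list covering what Steps 5-6 display; trim or extend as needed.
FIELDS = ("title,venue,publicationVenue,year,citationCount,authors,externalIds,"
          "openAccessPdf,fieldsOfStudy,publicationTypes,abstract")


def build_search_url(query: str, limit: int = 10,
                     fields_of_study: Optional[str] = "Computer Science,Engineering",
                     publication_types: Optional[str] = "JournalArticle,Conference",
                     min_citations: Optional[int] = None,
                     year: Optional[str] = None) -> str:
    """Relevance-search URL with the skill's default filters (None drops a filter)."""
    params: Dict[str, Any] = {"query": query, "limit": min(limit, 100), "fields": FIELDS}
    if fields_of_study:
        params["fieldsOfStudy"] = fields_of_study
    if publication_types:
        params["publicationTypes"] = publication_types
    if min_citations is not None:
        params["minCitationCount"] = min_citations
    if year:
        params["year"] = year
    return f"{S2_API}/paper/search?{urllib.parse.urlencode(params)}"


def fetch_json(url: str, retries: int = 3, backoff: float = 2.0) -> Dict[str, Any]:
    """GET a Graph API URL, retrying HTTP 429 with exponential backoff."""
    headers = {}
    api_key = os.environ.get("SEMANTIC_SCHOLAR_API_KEY")
    if api_key:
        headers["x-api-key"] = api_key  # optional key for higher limits
    for attempt in range(retries + 1):
        request = urllib.request.Request(url, headers=headers)
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < retries:
                time.sleep(backoff * 2 ** attempt)  # 2 s, 4 s, 8 s with the defaults
                continue
            raise
    raise RuntimeError("unreachable")
```

The result list is expected under the response's `data` key, e.g. `fetch_json(build_search_url("semantic communication")).get("data", [])`.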
**Recommended filter combos** (from testing):
| Goal | Flags |
|------|-------|
| High-quality journal papers | `--publication-types JournalArticle --min-citations 10` |
| CS/EE papers, recent | `--fields-of-study "Computer Science,Engineering" --year "2022-"` |
| Foundational / high-impact | `search-bulk --sort citationCount:desc --fields-of-study "Computer Science"` |
| Conference papers only | `--publication-types Conference` |
> **Note**: `--venue` requires exact venue names (e.g. "IEEE Transactions on Signal Processing"), not partial matches like "IEEE". Avoid using `--venue` in automated flows — prefer `--publication-types` + `--fields-of-study`.
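The directive-to-flag mapping from Step 1, together with the `search` vs `search-bulk` choice, can be sketched as a function that builds the fetcher's argument list. Assumptions: `- type: review` maps to the S2 publication type `Review`; `- type: all` and `- fields: all` each drop only their own filter (reading the override list literally); and the flag names are exactly those used in the commands above. `build_fetcher_args` is a hypothetical name.

```python
from typing import Dict, List, Optional

# Assumed mapping of "- type:" values to S2 publication types ("review" -> "Review" is a guess).
TYPE_MAP: Dict[str, Optional[str]] = {
    "journal": "JournalArticle",
    "conference": "Conference",
    "review": "Review",
    "all": None,  # drop the publication-type filter
}
SORT_MAP = {"citations": "citationCount:desc", "date": "publicationDate:desc"}
DEFAULT_FIELDS = "Computer Science,Engineering"
DEFAULT_TYPES = "JournalArticle,Conference"


def build_fetcher_args(query: str, overrides: Dict[str, str], max_results: int = 10) -> List[str]:
    """Arguments to pass after `python3 "$S2_FETCHER"` (unknown values raise KeyError)."""
    limit = int(overrides.get("max", max_results))
    bulk = "sort" in overrides or limit > 100  # bulk when sorting or > 100 results
    args = ["search-bulk" if bulk else "search", query, "--max", str(limit)]
    if "sort" in overrides:
        args += ["--sort", SORT_MAP[overrides["sort"]]]
    fields = overrides.get("fields", DEFAULT_FIELDS)
    if fields != "all":
        args += ["--fields-of-study", fields]
    types = TYPE_MAP[overrides["type"]] if "type" in overrides else DEFAULT_TYPES
    if types:
        args += ["--publication-types", types]
    if "min-citations" in overrides:
        args += ["--min-citations", overrides["min-citations"]]
    if "year" in overrides:
        args += ["--year", overrides["year"]]
    return args
```

Passing the list to `subprocess.run(["python3", fetcher_path, *args], check=True)` (with `fetcher_path` being the resolved `$S2_FETCHER`) avoids shell-quoting problems with multi-word queries and field names.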
### Step 3: Fetch Details for a Specific Paper
When a single paper ID is requested:
```bash
[ -n "$S2_FETCHER" ] && python3 "$S2_FETCHER" paper "PAPER_ID"
```
Where PAPER_ID can be:
- DOI: `10.1109/TSP.2021.3071210`
- ArXiv: `ARXIV:2006.10685`
- CorpusId: `CorpusId:219792180`
- S2 ID: `f9314fd99be5f2b1b3efcfab87197d578160d553`
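When the fetcher is unavailable, a single-paper lookup could go to the Graph API's paper-details endpoint directly. This sketch assumes the API expects DOIs with a `DOI:` prefix (as in `/semantic-scholar "DOI:10.1109/..."`), so a bare DOI gets one added while S2 hex IDs and `ARXIV:` / `CorpusId:` IDs pass through unchanged. The default field list and the helper names are assumptions.

```python
import urllib.parse

S2_PAPER_ENDPOINT = "https://api.semanticscholar.org/graph/v1/paper/"


def normalize_paper_id(paper_id: str) -> str:
    """Prefix bare DOIs with 'DOI:'; S2 hex IDs and ARXIV:/CorpusId:/DOI: IDs pass through."""
    paper_id = paper_id.strip()
    return "DOI:" + paper_id if paper_id.startswith("10.") else paper_id


def build_paper_url(paper_id: str,
                    fields: str = "title,venue,year,citationCount,authors,externalIds,tldr,abstract") -> str:
    # Keep ':' and '/' literal so IDs like DOI:10.1109/TSP.2021.3071210 stay intact in the path.
    path = urllib.parse.quote(normalize_paper_id(paper_id), safe=":/")
    return S2_PAPER_ENDPOINT + path + "?" + urllib.parse.urlencode({"fields": fields})
```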
### Step 4: De-duplicate Against arXiv
For each result, check `externalIds.ArXiv`:
- If present → paper is also on arXiv. Note this in output but do NOT re-fetch via `/arxiv`.
- If absent → paper is **venue-only** (e.g. IEEE without preprint). This is the unique value of this skill.
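The check amounts to partitioning the result list on `externalIds.ArXiv`. A minimal sketch, assuming each result is the JSON object returned by the API; `split_by_arxiv` is an illustrative name, and the `or {}` guard covers results where `externalIds` is absent or null.

```python
from typing import Any, Dict, List, Tuple

Paper = Dict[str, Any]


def split_by_arxiv(papers: List[Paper]) -> Tuple[List[Paper], List[Paper]]:
    """Partition S2 results into (also_on_arxiv, venue_only) using externalIds.ArXiv."""
    on_arxiv: List[Paper] = []
    venue_only: List[Paper] = []
    for paper in papers:
        arxiv_id = (paper.get("externalIds") or {}).get("ArXiv")
        (on_arxiv if arxiv_id else venue_only).append(paper)
    return on_arxiv, venue_only
```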
### Step 5: Present Results
Present results as a table:
```text
| # | Title | Venue | Year | Citations | Authors | Type |
|---|-------|-------|------|-----------|---------|------|
| 1 | Deep Learning Enabled... | IEEE Trans. Signal Process. | 2021 | 1364 | Xie et al. | Journal |
```
For each paper, also show:
- **DOI link**: `https://doi.org/<DOI>` (for IEEE/ACM papers, this is the canonical link)
- **Open Access PDF**: if `openAccessPdf.url` is non-empty, show it
- **TLDR**: if available, show the one-line summary
- **Also on arXiv**: if `externalIds.ArXiv` exists, note the arXiv ID
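A sketch of the table rendering, ranked by `citationCount` as the Key Rules ask. It assumes `publicationVenue.type` holds lowercase values such as `journal` or `conference` and shortens authors to the `Xie et al.` style of the example row; the helper names are illustrative.

```python
from typing import Any, Dict, List

Paper = Dict[str, Any]


def short_authors(authors: List[Paper]) -> str:
    """'Xie et al.' style: first author's last name, plus 'et al.' when there are more."""
    parts = (authors[0].get("name") or "").split() if authors else []
    if not parts:
        return "Unknown"
    return f"{parts[-1]} et al." if len(authors) > 1 else parts[-1]


def paper_type(paper: Paper) -> str:
    kind = (paper.get("publicationVenue") or {}).get("type")  # e.g. "journal" / "conference"
    return kind.capitalize() if kind else "Unknown"


def results_table(papers: List[Paper]) -> str:
    """Markdown table in the Step 5 layout, ranked by citationCount (highest first)."""
    rows = ["| # | Title | Venue | Year | Citations | Authors | Type |",
            "|---|-------|-------|------|-----------|---------|------|"]
    ranked = sorted(papers, key=lambda p: p.get("citationCount") or 0, reverse=True)
    for i, p in enumerate(ranked, start=1):
        title = (p.get("title") or "").replace("|", "\\|")  # keep the table intact
        rows.append(f"| {i} | {title} | {p.get('venue') or '-'} | {p.get('year') or '-'} | "
                    f"{p.get('citationCount') or 0} | {short_authors(p.get('authors') or [])} | "
                    f"{paper_type(p)} |")
    return "\n".join(rows)
```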
### Step 6: Detailed Summary
For each paper (or top 5 if many results):
```markdown
## [Title]
- **Venue**: [venue name] ([publicationVenue.type]: journal/conference)
- **Year**: [year] | **Citations**: [citationCount]
- **Authors**: [full author list]
- **DOI**: [doi link]
- **Fields**: [fieldsOfStudy]
- **TLDR**: [tldr.text if available]
- **Abstract**: [abstract]
- **Open Access**: [openAccessPdf.url or "Not available"]
- **Also on arXiv**: [ArXiv ID if exists, else "No"]
```
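The template's fallbacks (first abstract sentence when `tldr` is null, DOI link when `openAccessPdf` is empty) can be sketched as one rendering function. The sentence splitter is deliberately crude (abbreviations such as "e.g." will cut it short) and `render_detail` is an illustrative name.

```python
import re
from typing import Any, Dict


def first_sentence(text: str) -> str:
    """Crude fallback: text up to the first '.', '!' or '?' followed by whitespace or the end."""
    match = re.search(r"(.+?[.!?])(?:\s|$)", text.strip(), re.DOTALL)
    return match.group(1) if match else text.strip()


def render_detail(paper: Dict[str, Any]) -> str:
    """One Step 6 section, applying the TLDR and open-access fallbacks from Key Rules."""
    ids = paper.get("externalIds") or {}
    venue_type = (paper.get("publicationVenue") or {}).get("type") or "unknown type"
    tldr = (paper.get("tldr") or {}).get("text") or first_sentence(paper.get("abstract") or "")
    doi_url = f"https://doi.org/{ids['DOI']}" if ids.get("DOI") else None
    pdf = (paper.get("openAccessPdf") or {}).get("url")
    if pdf:
        open_access = pdf
    elif doi_url:
        open_access = f"Not available; see {doi_url}"
    else:
        open_access = "Not available"
    authors = ", ".join(a.get("name", "") for a in paper.get("authors") or [])
    return "\n".join([
        f"## {paper.get('title') or 'Untitled'}",
        f"- **Venue**: {paper.get('venue') or 'Unknown'} ({venue_type})",
        f"- **Year**: {paper.get('year') or '-'} | **Citations**: {paper.get('citationCount') or 0}",
        f"- **Authors**: {authors}",
        f"- **DOI**: {doi_url or 'Not available'}",
        f"- **Fields**: {', '.join(paper.get('fieldsOfStudy') or [])}",
        f"- **TLDR**: {tldr or 'Not available'}",
        f"- **Abstract**: {paper.get('abstract') or 'Not available'}",
        f"- **Open Access**: {open_access}",
        f"- **Also on arXiv**: {ids.get('ArXiv') or 'No'}",
    ])
```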
### Step 7: Update Research Wiki (if active)
**Required when `research-wiki/` exists in the project**; skip silently
otherwise. Ingest the papers presented to the user. For results with an
`externalIds.ArXiv` field, use `--arxiv-id`; for venue-only papers (no
arXiv mirror — common for IEEE/ACM), fall back to manual metadata:
```bash
if [ -d research-wiki/ ]; then
  WIKI_SCRIPT=""
  [ -n "${ARIS_REPO:-}" ] && [ -f "$ARIS_REPO/tools/research_wiki.py" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"
  [ -z "$WIKI_SCRIPT" ] && [ -f tools/research_wiki.py ] && WIKI_SCRIPT="tools/research_wiki.py"
  [ -z "$WIKI_SCRIPT" ] && [ -f ~/.codex/skills/research-wiki/research_wiki.py ] && WIKI_SCRIPT="$HOME/.codex/skills/research-wiki/research_wiki.py"

  # Run once per presented paper. ARXIV_ID, TITLE, AUTHORS (names joined by ", "),
  # YEAR, VENUE and DOI are placeholders filled from that paper's S2 record
  # (externalIds.ArXiv, title, authors, year, venue, externalIds.DOI).
  if [ -n "$WIKI_SCRIPT" ]; then
    if [ -n "${ARXIV_ID:-}" ]; then
      python3 "$WIKI_SCRIPT" ingest_paper research-wiki/ --arxiv-id "$ARXIV_ID"
    else
      # Venue-only paper: pass metadata manually; the DOI flag is added only if a DOI exists.
      python3 "$WIKI_SCRIPT" ingest_paper research-wiki/ \
        --title "$TITLE" --authors "$AUTHORS" \
        --year "$YEAR" --venue "$VENUE" \
        ${DOI:+--external-id-doi "$DOI"}
    fi
  fi
fi
```
The helper handles slug / dedup / page / index / log — **do not
handwrite `papers/<slug>.md`**. See
[`shared-references/integration-contract.md`](../shared-references/integration-contract.md).
Backfill with `/research-wiki sync --arxiv-ids <id1>,<id2>,...` for
arXiv-available papers.
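Because the per-paper loop needs values from the JSON results, it can help to build each `research_wiki.py` argument list in Python. A minimal sketch mirroring the two branches of Step 7; joining author names with `", "` is an assumption, and `wiki_ingest_args` is a hypothetical name.

```python
from typing import Any, Dict, List


def wiki_ingest_args(paper: Dict[str, Any], wiki_dir: str = "research-wiki/") -> List[str]:
    """Arguments after `python3 "$WIKI_SCRIPT"` for one paper, mirroring Step 7's two branches."""
    ids = paper.get("externalIds") or {}
    if ids.get("ArXiv"):
        # arXiv mirror exists: the helper is given only the arXiv ID.
        return ["ingest_paper", wiki_dir, "--arxiv-id", ids["ArXiv"]]
    # Venue-only paper: pass metadata manually (", "-joined authors is an assumption).
    args = [
        "ingest_paper", wiki_dir,
        "--title", paper.get("title") or "",
        "--authors", ", ".join(a.get("name", "") for a in paper.get("authors") or []),
        "--year", str(paper.get("year") or ""),
        "--venue", paper.get("venue") or "",
    ]
    if ids.get("DOI"):
        args += ["--external-id-doi", ids["DOI"]]
    return args
```

Each list can then be run with `subprocess.run([sys.executable, wiki_script, *args], check=True)`, where `wiki_script` (hypothetical) holds the resolved `$WIKI_SCRIPT` path.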
### Step 8: Final Output
Summarize what was done:
- `Found N published papers for "query"`
- `Filters applied: [publication types, fields, year range, etc.]`
- `N papers are venue-only (not on arXiv)`
- `Wiki-ingested N papers` (if `research-wiki/` was present)
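The summary lines above can be produced mechanically from the result list. A small sketch, assuming `filters` is the list of filter descriptions actually applied and `wiki_count` is `None` when `research-wiki/` was absent; `final_summary` is an illustrative name.

```python
from typing import Any, Dict, List, Optional


def final_summary(query: str, papers: List[Dict[str, Any]], filters: List[str],
                  wiki_count: Optional[int] = None) -> List[str]:
    """Step 8 summary lines; venue-only means no externalIds.ArXiv."""
    venue_only = sum(1 for p in papers if not (p.get("externalIds") or {}).get("ArXiv"))
    lines = [
        f'Found {len(papers)} published papers for "{query}"',
        f"Filters applied: {', '.join(filters) if filters else 'none'}",
        f"{venue_only} papers are venue-only (not on arXiv)",
    ]
    if wiki_count is not None:  # only when research-wiki/ was present
        lines.append(f"Wiki-ingested {wiki_count} papers")
    return lines
```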
Suggest follow-up skills:
```text
/arxiv "topic" - search arXiv preprints (complements this search)
/research-lit "topic" - multi-source review: Zotero + local PDFs + arXiv + S2
/novelty-check "idea" - verify novelty against literature
```
## Key Rules
- **Default to filtered search**: Always apply `--fields-of-study` and `--publication-types` unless user says `- fields: all`. Without filters, S2 returns cross-discipline noise (linguistics, psychology, etc.).
- **Citation count is gold**: S2's citation data is its main advantage over arXiv. Always show `citationCount` prominently and use it to rank/prioritize results.
- **Venue metadata matters**: Show `venue` and `publicationVenue.type` (journal vs conference) — this helps users assess paper quality.
- **DOI is the canonical ID for published papers**: Always show DOI links for IEEE/ACM/Springer papers.
- **Rate limiting**: S2 API without key is heavily rate-limited (~1 req/s, strict cooldown). If HTTP 429 occurs, wait and retry. Recommend users set `SEMANTIC_SCHOLAR_API_KEY` env var for higher limits (free at https://www.semanticscholar.org/product/api#api-key-form).
- **TLDR may be null**: Some publishers (notably IEEE) elide the TLDR field. Fall back to showing the first sentence of the abstract.
- **openAccessPdf may be empty**: Many IEEE papers are closed access. Always provide the DOI link as fallback.
- If the S2 API is unreachable, suggest using `/arxiv` or `/research-lit "topic" - sources: web` as fallback.
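Retrying after a 429 handles the failure case; spacing requests out avoids most of them. A minimal client-side throttle sketch, assuming the ~1 req/s figure quoted above (tune `min_interval` if your key grants more). The class name is illustrative, and the clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between S2 requests (1.0 s mirrors the ~1 req/s figure above)."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay  # time at which this request is considered sent
        return delay
```

Call `throttle.wait()` immediately before each API request; the first call never sleeps.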
INPUTS
- $ARGUMENTS (required): main search term or paper identifier (DOI, S2 ID, ARXIV:, CorpusId:), e.g. a topic or DOI:10.1109/...
REQUIRED CONTEXT
- $ARGUMENTS (search topic, DOI, or paper ID)
OPTIONAL CONTEXT
- override flags like - max:, - type:, - min-citations:, - year:, - fields:, - sort:
TOOLS REQUIRED
- semantic_scholar_fetch.py
- research_wiki.py
- Semantic Scholar API
ROLES & RULES
- Do NOT duplicate arXiv's job.
- If results contain an externalIds.ArXiv field, note this but do not re-fetch from arXiv.
- Always apply --fields-of-study and --publication-types unless user says - fields: all.
- Always show citationCount prominently and use it to rank/prioritize results.
- Show venue and publicationVenue.type.
- Always show DOI links for IEEE/ACM/Springer papers.
- If HTTP 429 occurs, wait and retry.
- Fall back to showing the first sentence of the abstract if TLDR is null.
- Always provide the DOI link as fallback if openAccessPdf is empty.
- If the S2 API is unreachable, suggest using /arxiv or /research-lit as fallback.
EXPECTED OUTPUT
- Format: markdown
- Schema: markdown_sections · #, Title, Venue, Year, Citations, Authors, Type, ## [Title], Venue, Year, Citations, Authors, DOI, Fields, TLDR, Abstract, Open Access, Also on arXiv
- Constraints:
  - present results as table with #,Title,Venue,Year,Citations,Authors,Type
  - include DOI links, openAccessPdf, TLDR, arXiv note per paper
  - provide detailed markdown summary for each paper or top 5
  - end with summary of actions taken and suggested follow-up skills
SUCCESS CRITERIA
- Parse arguments for query/ID and overrides
- Resolve S2_FETCHER via canonical chain or fall back to inline Python
- Perform standard or bulk search with appropriate filters
- Fetch details for specific paper IDs
- De-duplicate against arXiv using externalIds.ArXiv
- Present results as table plus per-paper markdown details
- Ingest papers into research-wiki/ when present
- Summarize actions taken and suggest follow-up skills
FAILURE MODES
- May return cross-discipline noise without default filters
- TLDR may be null for some publishers
- openAccessPdf may be empty for closed-access papers
- S2 API rate-limited without API key
EXAMPLES
Includes multiple command-line override examples, recommended filter combos table, search code snippets, result table format, and detailed markdown summary template.
CAVEATS
- Dependencies:
  - S2_FETCHER (semantic_scholar_fetch.py)
  - shared-references/integration-contract.md
  - research-wiki/ directory
  - research_wiki.py
  - SEMANTIC_SCHOLAR_API_KEY (optional)
- Missing context:
  - Implementation details or source code of semantic_scholar_fetch.py
  - Exact format and location of research-wiki/ structure
- Ambiguities:
  - References external files (integration-contract.md) whose content is not provided in the prompt.
QUALITY
- OVERALL: 0.72
- CLARITY: 0.82
- SPECIFICITY: 0.92
- REUSABILITY: 0.35
- COMPLETENESS: 0.93
IMPROVEMENT SUGGESTIONS
- Extract the long path-resolution logic into a short reusable sub-prompt or function.
- Add 2-3 concrete example $ARGUMENTS inputs with expected outputs to improve testability.