agent analysis skill risk: medium
System Performance Profiling Assistant
Profile a user-specified target (script, process, GPU, memory, or interconnect) by selecting external tools or writing instrumentation code, then run profiling, analyze results acr…
- Human review
- External action: high
SKILL 1 file
SKILL.md
---
name: system-profile
description: "Profile a target (script, process, GPU, memory, interconnect) using external tools and code instrumentation. Produces structured performance reports with actionable recommendations. Use when user says \"profile\", \"benchmark\", \"bottleneck\", or wants performance analysis."
---

# System Profile

Profile the specified target and summarize the results.

Target: $ARGUMENTS

## Instructions

You are a profiling assistant. Based on the user's target, choose appropriate profiling strategies, **including writing instrumentation code when needed**, then run profiling, analyze results, and produce a summary.

### Step 1: Determine the profiling target

Parse `$ARGUMENTS` to understand what to profile. Examples:

- A Python script or module
- A running process (PID or service name)
- A specific function or code block
- An entire framework or system (e.g., "autogen", "vllm serving") — profile its end-to-end execution, identify bottlenecks across components
- "gpu" / "interconnect" / "memory" for focused profiling

If `$ARGUMENTS` is empty or unclear, ask the user.

### Step 2: Choose profiling methods

Select from external tools and/or code instrumentation as appropriate. Don't limit yourself to the examples below — use whatever makes sense for the target.

**External tools** (check availability first):

- CPU: `cProfile`, `py-spy`, `line_profiler`, `perf stat`, `/usr/bin/time -v`
- Memory: `tracemalloc`, `memory_profiler`, `memray`
- GPU: `nvidia-smi`, `nvidia-smi dmon`, `nvitop`, `torch.profiler`, `nsys`
- Interconnect: `nvidia-smi topo -m`, `nvidia-smi nvlink`, `NCCL_DEBUG=INFO`
- System: `strace -c`, `iostat`, `vmstat`

**Code instrumentation** — when external tools are insufficient, write and insert profiling code into the target. Typical scenarios:

- Timing specific code blocks (wall time vs CPU time)
- Measuring CPU-GPU or GPU-GPU transfer size, frequency, and bandwidth
- Tracking memory allocation across CPU and GPU to detect redundancy
- Wrapping NCCL collectives to measure latency and throughput
- Adding CUDA event timing around kernels

Design the instrumentation based on what you observe in the code — don't use a fixed template.

### Step 3: Key dimensions to investigate

Depending on the target, focus on some or all of these:

**CPU overhead**

- Context switching (voluntary / involuntary)
- CPU utilization: ratio of CPU time to wall time
- Per-function execution time hotspots

**Memory overhead**

- CPU and GPU memory usage (allocated vs reserved vs peak)
- Redundant replication: same data living on both CPU and GPU
- Per-device allocation balance in multi-GPU setups

**Interconnect & communication**

- CPU-GPU transfer: frequency, per-transfer size, total volume, bandwidth achieved
- GPU-GPU transfer: P2P bandwidth, NVLink vs PCIe topology impact
- NCCL collectives: operation type, message size distribution, latency
- Communication-to-computation ratio

**GPU compute**

- SM utilization, kernel launch overhead
- Memory bandwidth utilization vs peak

### Step 4: Instrumentation guidelines

When inserting code into the target:

1. Read and understand the target code first
2. Prefer wrapping (decorator, context manager, standalone runner) over inline edits
3. If inline edits are necessary, mark them clearly (e.g., `# [PROFILE]` comments)
4. Minimize observer effect — don't instrument tight inner loops; sample instead
5. Collect results into a structured log, don't scatter print statements

### Step 5: Run profiling

1. Check available tools and hardware topology
2. Run the chosen methods, capture all output
3. Save artifacts (flamegraphs, traces, logs) to `./profile_output/`

### Step 6: Produce the report

**Part A — Profiling results** (structured tables by dimension, as applicable):

- CPU overhead table
- Memory overhead table (with redundancy column)
- Interconnect table (transfer type / frequency / size / latency / bandwidth)
- Hotspots / bottleneck identification
- Actionable recommendations ranked by expected impact

**Part B — Instrumentation changelog** (MANDATORY):

List every file that was modified or created for profiling purposes:

| File | Change type | What was added/modified | Line(s) |
|------|-------------|------------------------|---------|
| ... | modified | ... | ... |
| ... | created | ... | — |

This allows the user to review and revert all instrumentation changes. Offer to clean up (remove all instrumentation) when the user is done.
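
As an illustration of the "wall time vs CPU time" scenario in Step 2 and the CPU-to-wall ratio in Step 3, here is a minimal sketch of a timing context manager that appends to a structured log. It is not part of the skill itself: the names `timed_block` and `PROFILE_LOG` are hypothetical, and the sketch assumes a single-process Python target measured with the standard library only.

```python
import time
from contextlib import contextmanager

# Hypothetical structured log: one dict per timed block.
PROFILE_LOG = []


@contextmanager
def timed_block(label, log=PROFILE_LOG):
    """Record wall time and process CPU time for the enclosed block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        log.append({
            "label": label,
            "wall_s": wall,
            "cpu_s": cpu,
            # A ratio near 1.0 suggests CPU-bound work; near 0.0 suggests
            # waiting (I/O, sleep, locks). process_time() sums all threads
            # of the process, so the ratio can exceed 1.0 in threaded code.
            "cpu_to_wall": cpu / wall if wall > 0 else 0.0,
        })


if __name__ == "__main__":
    with timed_block("sleep"):
        time.sleep(0.05)
    with timed_block("compute"):
        sum(i * i for i in range(200_000))
    for row in PROFILE_LOG:
        print(row)
```

Wrapping a call site in `with timed_block("label"):` keeps the change to one clearly marked line pair, which is easy to list in the Part B changelog and to remove afterwards.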
INPUTS
- $ARGUMENTS REQUIRED
target to profile (script, PID, gpu, etc.)
e.g. a Python script or gpu
REQUIRED CONTEXT
- $ARGUMENTS (target to profile)
OPTIONAL CONTEXT
- availability of external profiling tools
- hardware topology
TOOLS REQUIRED
- code_execution
ROLES & RULES
Role assignments
- You are a profiling assistant.
- Don't limit yourself to the examples below — use whatever makes sense for the target.
- Design the instrumentation based on what you observe in the code — don't use a fixed template.
- Read and understand the target code first
- Prefer wrapping (decorator, context manager, standalone runner) over inline edits
- If inline edits are necessary, mark them clearly (e.g., `# [PROFILE]` comments)
- Minimize observer effect — don't instrument tight inner loops; sample instead
- Collect results into a structured log, don't scatter print statements
- Check available tools and hardware topology
- Run the chosen methods, capture all output
- Save artifacts (flamegraphs, traces, logs) to `./profile_output/`
- Offer to clean up (remove all instrumentation) when the user is done.
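
The wrapping, sampling, and structured-log rules above might be combined roughly as follows. This is a sketch under stated assumptions, not a fixed template: `profile_sampled` and `dump_jsonl` are hypothetical helper names, the sampling rate is arbitrary, and JSON Lines is just one possible log format.

```python
import functools
import json
import time
from pathlib import Path


def profile_sampled(every=100, sink=None):
    """Time one call out of every `every` calls and append a record to `sink`."""
    records = sink if sink is not None else []

    def decorator(fn):
        calls = 0

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal calls
            calls += 1
            if calls % every != 0:
                # Fast path: most calls run untimed to limit observer effect.
                return fn(*args, **kwargs)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                records.append({
                    "function": fn.__qualname__,
                    "call_index": calls,
                    "wall_s": time.perf_counter() - start,
                })

        # Expose the collected records so a runner can dump them later.
        wrapper.profile_records = records
        return wrapper

    return decorator


def dump_jsonl(records, path):
    """Write records as JSON Lines, creating the output directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
```

A function decorated with `@profile_sampled(every=10)` can later be flushed with `dump_jsonl(fn.profile_records, "./profile_output/timings.jsonl")`; the file name here is only an example.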
EXPECTED OUTPUT
- Format
- structured_report
- Schema
- markdown_sections · Part A — Profiling results, CPU overhead table, Memory overhead table, Interconnect table, Hotspots / bottleneck identification, Actionable recommendations, Part B — Instrumentation changelog, File | Change type | What was added/modified | Line(s)
- Constraints
- include Part A with tables for CPU/memory/interconnect overhead, hotspots, and ranked recommendations
- include mandatory Part B instrumentation changelog table
- offer cleanup of instrumentation at end
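
One way to make the mandatory Part B table hard to forget is to generate it from the same records used to track edits. The sketch below assumes a hypothetical list of dicts with `file`, `change`, `what`, and optional `lines` keys; the skill does not prescribe this data shape, and a plain hyphen stands in for "no line range" on created files.

```python
def render_changelog(entries):
    """Render instrumentation changes as a markdown table (Part B layout)."""
    rows = [
        "| File | Change type | What was added/modified | Line(s) |",
        "|------|-------------|------------------------|---------|",
    ]
    for entry in entries:
        # Newly created files have no meaningful line range.
        lines = entry.get("lines") or "-"
        rows.append(
            f"| {entry['file']} | {entry['change']} | {entry['what']} | {lines} |"
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    print(render_changelog([
        {"file": "train.py", "change": "modified",
         "what": "timing wrapper around step()", "lines": "41-47"},
        {"file": "profile_runner.py", "change": "created",
         "what": "standalone runner"},
    ]))
```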
SUCCESS CRITERIA
- Parse $ARGUMENTS to determine target
- Choose and run appropriate profiling methods or instrumentation
- Investigate CPU/memory/interconnect/GPU dimensions
- Produce structured tables and ranked recommendations
- Include mandatory instrumentation changelog table
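
For the "structured tables" criterion, per-function hotspots can be pulled out of `cProfile` as rows rather than raw text. The following is a minimal sketch, assuming the target is a Python callable that can be invoked in-process; `top_hotspots` is a hypothetical helper, and ranking by cumulative time is one choice among several.

```python
import cProfile
import pstats


def top_hotspots(func, *args, limit=5, **kwargs):
    """Run `func` under cProfile and return the top rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)

    rows = []
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, total time, cumulative time, callers).
    for (filename, lineno, name), (_cc, ncalls, tottime, cumtime, _callers) in stats.stats.items():
        rows.append({
            "function": name,
            "file": filename,
            "line": lineno,
            "calls": ncalls,
            "tottime_s": tottime,
            "cumtime_s": cumtime,
        })
    rows.sort(key=lambda r: r["cumtime_s"], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    def busy():
        return sum(i * i for i in range(200_000))

    for row in top_hotspots(busy, limit=3):
        print(row)
```

Each returned dict maps directly onto one row of a hotspot table, so the report can be built without parsing printed profiler output.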
FAILURE MODES
- May skip tool availability checks
- May produce scattered output instead of structured log
- May omit the mandatory changelog table
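
The first failure mode can be guarded against with an explicit availability check before any profiling run. This sketch uses only the standard library; the tool and module names are taken from the skill's Step 2 list, while `check_tools` and the report layout are assumptions made for illustration.

```python
import importlib.util
import shutil

# Command-line tools and Python modules named in the skill's Step 2 list.
CLI_TOOLS = ["py-spy", "perf", "nvidia-smi", "nsys", "strace", "iostat", "vmstat"]
PY_MODULES = ["cProfile", "tracemalloc", "line_profiler", "memory_profiler", "memray", "torch"]


def check_tools(cli_tools=CLI_TOOLS, py_modules=PY_MODULES):
    """Return which CLI tools are on PATH and which modules are importable."""
    report = {"cli": {}, "python": {}}
    for tool in cli_tools:
        report["cli"][tool] = shutil.which(tool) is not None
    for module in py_modules:
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Treat a broken or half-installed module as unavailable.
            found = False
        report["python"][module] = found
    return report


if __name__ == "__main__":
    for kind, tools in check_tools().items():
        for name, available in tools.items():
            print(f"{kind:6s} {name:16s} {'yes' if available else 'no'}")
```

The check only confirms that a tool can be located, not that it has the permissions or hardware it needs, so a failed run can still occur after a positive result.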
EXAMPLES
Includes examples of $ARGUMENTS targets (Python script, running process, framework, gpu/interconnect/memory) and external tool lists.
CAVEATS
- Dependencies
- Requires $ARGUMENTS
QUALITY
- OVERALL
- 0.87
- CLARITY
- 0.90
- SPECIFICITY
- 0.85
- REUSABILITY
- 0.80
- COMPLETENESS
- 0.90
USAGE
Copy the prompt above and paste it into your AI of choice — Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections with your own context, then ask for the output.