developer operations template risk: medium
W&B Kubernetes ML Training Pod Setup
The prompt instructs the model to act as a DevOps Engineer specializing in machine learning infrastructure to set up Weights & Biases for logging experiments including metrics, hyp…
- Policy sensitive
- Human review
- External action: high
PROMPT
Act as a DevOps Engineer specializing in machine learning infrastructure. You are tasked with setting up Weights & Biases (W&B) for experiment tracking and running a Kubernetes pod during model training.
Your task is to:
- Set up Weights & Biases for logging experiments, including metrics, hyperparameters, and outputs.
- Configure Kubernetes to run a pod specifically for model training.
- Ensure secure SSH access to the environment for monitoring and updates.
- Integrate W&B with the training script to automatically log relevant data.
- Verify that the pod is running efficiently and troubleshoot any issues that arise.
Rules:
- Only proceed with the setup when SSH access is provided.
- Ensure all configurations follow best practices for security and performance.
- Use variables for flexible configuration: ${projectName}, ${namespace}, ${trainingScript}, ${sshKey}.
Example:
- Project Name: ${projectName:MLProject}
- Namespace: ${namespace:default}
- Training Script Path: ${trainingScript:/path/to/script}
- SSH Key: ${sshKey:/path/to/ssh.key}
INPUTS
- projectName: Project name for the setup (e.g. MLProject)
- namespace: Kubernetes namespace (e.g. default)
- trainingScript (REQUIRED): Path to the training script (e.g. /path/to/script)
- sshKey (REQUIRED): Path to SSH key for secure access (e.g. /path/to/ssh.key)
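The inputs above use a ${name:default} placeholder syntax. As a rough illustration of how such a template could be filled before it is sent to a model, here is a minimal Python sketch. The render function, the PLACEHOLDER pattern, and the REQUIRED set are hypothetical names introduced for this example; the page does not specify how its own substitution works.

```python
import re

# Matches ${name} or ${name:default}, the placeholder syntax used in the prompt.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")

# Inputs the page marks as REQUIRED; their inline defaults are only examples.
REQUIRED = frozenset({"trainingScript", "sshKey"})


def render(template: str, values: dict, required: frozenset = REQUIRED) -> str:
    """Substitute placeholders, falling back to inline defaults for optional inputs."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in values:
            return values[name]
        # Required inputs must be supplied explicitly, even if a default is shown.
        if name in required or default is None:
            raise KeyError(f"missing value for placeholder: {name}")
        return default

    return PLACEHOLDER.sub(substitute, template)


example = "Namespace: ${namespace:default}, script: ${trainingScript:/path/to/script}"
print(render(example, {"trainingScript": "train.py"}))
# prints "Namespace: default, script: train.py"
```

Treating the required inputs as errors when they are absent, rather than silently using the sample paths, keeps a placeholder such as /path/to/ssh.key from reaching the model as if it were real.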
REQUIRED CONTEXT
- SSH access
ROLES & RULES
Role assignments
- Act as a DevOps Engineer specializing in machine learning infrastructure.
Rules
- Only proceed with the setup when SSH access is provided.
- Ensure all configurations follow best practices for security and performance.
- Use variables for flexible configuration: ${projectName}, ${namespace}, ${trainingScript}, ${sshKey}.
EXPECTED OUTPUT
- Format: markdown
SUCCESS CRITERIA
- Set up Weights & Biases for logging experiments, including metrics, hyperparameters, and outputs.
- Configure Kubernetes to run a pod specifically for model training.
- Ensure secure SSH access to the environment for monitoring and updates.
- Integrate W&B with the training script to automatically log relevant data.
- Verify that the pod is running efficiently and troubleshoot any issues that arise.
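The W&B integration criterion above could look roughly like the following in a training script. This is a minimal sketch, not the integration the prompt prescribes: build_run_config and train are hypothetical helpers, the loss values are placeholders rather than real training output, and the requirement that WANDB_API_KEY be set is an assumption of this example (W&B can also read credentials stored by wandb login).

```python
import os
import random


def build_run_config(learning_rate: float, epochs: int, batch_size: int) -> dict:
    """Collect hyperparameters in one dict so W&B records them with the run."""
    return {
        "learning_rate": learning_rate,
        "epochs": epochs,
        "batch_size": batch_size,
    }


def train(project: str, config: dict) -> None:
    """Log one placeholder metric per epoch to a W&B run."""
    # Imported lazily so build_run_config works on machines without wandb installed.
    import wandb

    # Assumption of this sketch: the API key is supplied through the environment.
    if not os.environ.get("WANDB_API_KEY"):
        raise RuntimeError("WANDB_API_KEY is not set")

    run = wandb.init(project=project, config=config)
    for epoch in range(config["epochs"]):
        # Placeholder loss; a real script would log its own training metrics here.
        loss = 1.0 / (epoch + 1) + random.random() * 0.01
        run.log({"epoch": epoch, "loss": loss})
    run.finish()
```

Inside the training pod, the script would call something like train("MLProject", build_run_config(0.001, 3, 32)), with the project name taken from ${projectName}.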
FAILURE MODES
- May proceed with setup without SSH access.
- May neglect security or performance best practices.
- May fail to use specified variables for configuration.
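One way to guard against the first failure mode is a pre-flight check that refuses to continue when the SSH key is missing or loosely protected. The sketch below is an assumption about how such a check might be written on a POSIX system; check_ssh_key is a hypothetical helper, and it only inspects the local key file rather than testing an actual connection.

```python
import os
import stat


def check_ssh_key(path: str) -> list:
    """Return a list of problems with the key file; an empty list means it looks usable."""
    if not os.path.isfile(path):
        return [f"SSH key not found: {path}"]

    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # OpenSSH refuses private keys that group or other users can access.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            f"SSH key permissions are too open ({oct(mode)}); expected 0o600 or stricter"
        )
    return problems
```

A setup script could call check_ssh_key on the ${sshKey} value first and stop if the returned list is not empty.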
EXAMPLES
Includes examples of variable values for project name, namespace, training script path, and SSH key.
CAVEATS
Dependencies
- Requires SSH access.
- Requires values for variables ${projectName}, ${namespace}, ${trainingScript}, ${sshKey}.
Missing context
- W&B API key and project ID
- Kubernetes cluster details (e.g., kubeconfig, context)
- Docker image for the training pod
- SSH connection details (host, user, port)
- Training script modifications for W&B logging
Ambiguities
- Unclear how SSH access is provided or verified during the interaction: the rule requires it, but the prompt defines no way to supply it.
- Vague on the exact method for integrating W&B with the training script (e.g., no code snippets).
- No details on pod resource specs or on the criteria for verifying efficiency.
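To make the missing pieces concrete, the sketch below builds a training pod manifest as a Python dict and prints it as JSON, which kubectl apply -f accepts alongside YAML. Everything specific here is an assumption for illustration: the build_training_pod helper, the image name, the Secret name wandb-api-key with key api-key, and the CPU and memory figures. None of these come from the prompt.

```python
import json


def build_training_pod(project_name: str, namespace: str, image: str,
                       training_script: str,
                       wandb_secret: str = "wandb-api-key") -> dict:
    """Build a Pod manifest that runs the training script once."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # Pod names must be lowercase; this sketch only lowercases the project name.
            "name": f"{project_name.lower()}-training",
            "namespace": namespace,
        },
        "spec": {
            # A training job should not be restarted automatically after it exits.
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                "command": ["python", training_script],
                "env": [
                    {"name": "WANDB_PROJECT", "value": project_name},
                    # The API key comes from a Secret rather than being written inline.
                    {"name": "WANDB_API_KEY",
                     "valueFrom": {"secretKeyRef": {"name": wandb_secret,
                                                    "key": "api-key"}}},
                ],
                # Example figures only; size these for the actual workload.
                "resources": {
                    "requests": {"cpu": "2", "memory": "8Gi"},
                    "limits": {"cpu": "4", "memory": "16Gi"},
                },
            }],
        },
    }


pod = build_training_pod("MLProject", "default",
                         "example.com/ml/trainer:latest", "/path/to/script")
print(json.dumps(pod, indent=2))
```

Saving the printed JSON to a file such as pod.json and running kubectl apply -f pod.json would create the pod, provided the referenced Secret already exists in the namespace.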
QUALITY
- Overall: 0.80
- Clarity: 0.85
- Specificity: 0.75
- Reusability: 0.90
- Completeness: 0.65
IMPROVEMENT SUGGESTIONS
- Add placeholders like ${wandbApiKey}, ${kubeconfig}, ${dockerImage}, ${sshHost}, ${sshUser}.
- Provide granular steps or command templates, e.g., 'kubectl apply -f pod.yaml' with sample YAML.
- Include code snippet for W&B init in training script: 'import wandb; wandb.init(project="${projectName}")'.
- List common troubleshooting issues and checks for pod efficiency (e.g., logs, metrics).
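As an illustration of the last suggestion, the following sketch inspects the JSON that kubectl get pod NAME -n NAMESPACE -o json returns and reports a few common problems: a pod that is not running, a container stuck waiting (for example on ImagePullBackOff or CrashLoopBackOff), an earlier out-of-memory kill, and restarts. The diagnose_pod helper is hypothetical and covers only these cases; it is a starting point, not a complete health check.

```python
def diagnose_pod(pod: dict) -> list:
    """Return human-readable findings from a pod's status; empty means nothing flagged."""
    status = pod.get("status", {})
    findings = []

    phase = status.get("phase", "Unknown")
    if phase not in ("Running", "Succeeded"):
        findings.append(f"pod phase is {phase}")

    for container in status.get("containerStatuses", []):
        name = container.get("name", "?")

        # A waiting container carries a reason such as ImagePullBackOff.
        waiting = container.get("state", {}).get("waiting")
        if waiting:
            findings.append(f"{name}: waiting ({waiting.get('reason', 'unknown')})")

        # lastState records why the previous container instance ended.
        terminated = container.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            findings.append(f"{name}: previous run was OOMKilled; consider a higher memory limit")

        restarts = container.get("restartCount", 0)
        if restarts > 0:
            findings.append(f"{name}: restarted {restarts} time(s)")

    return findings


sample = {
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {"name": "trainer",
             "state": {"waiting": {"reason": "ImagePullBackOff"}},
             "restartCount": 0},
        ],
    }
}
print(diagnose_pod(sample))
# prints ['pod phase is Pending', 'trainer: waiting (ImagePullBackOff)']
```

For anything this check flags, kubectl describe pod and kubectl logs on the same pod are the usual next steps.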
USAGE
Copy the prompt above and paste it into your AI of choice — Claude, ChatGPT, Gemini, or anywhere else you're working. Replace any placeholder sections with your own context, then ask for the output.