analyst security skill risk: medium

NetFlow Pandas Traffic Baselining

The prompt provides an overview, prerequisites, and seven steps for ingesting NetFlow/IPFIX data, computing hourly/daily distributions and per-host profiles with pandas, and detect…

SKILL 4 files · 2 folders

SKILL.md
---
name: implementing-network-traffic-baselining
description: "Build network traffic baselines from NetFlow/IPFIX data using Python pandas for statistical analysis, z-score"
---
# Implementing Network Traffic Baselining

## Overview

Network traffic baselining establishes normal communication patterns by analyzing historical NetFlow/IPFIX data to create statistical profiles of expected behavior. This skill uses Python pandas to compute hourly and daily traffic distributions, per-host byte/packet counts, protocol ratios, and top-N talker profiles. Anomalies are detected using z-score thresholds and IQR (interquartile range) outlier methods, enabling SOC analysts to identify deviations such as data exfiltration spikes, beaconing patterns, and unusual port usage.
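The two outlier tests named above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function names `zscore_flags` and `iqr_flags` are hypothetical, and the thresholds (3 standard deviations, 1.5 × IQR) are common conventions assumed here because the skill does not specify values.

```python
import pandas as pd

def zscore_flags(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where a value lies more than `threshold` standard deviations from the mean."""
    std = values.std()
    if pd.isna(std) or std == 0:
        # Constant or single-value series: no meaningful z-score
        return pd.Series(False, index=values.index)
    return ((values - values.mean()) / std).abs() > threshold

def iqr_flags(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up hourly byte counts: 23 ordinary hours and one spike
hourly_bytes = pd.Series([100 + (i % 5) for i in range(23)] + [5000])
z_hits = zscore_flags(hourly_bytes)
iqr_hits = iqr_flags(hourly_bytes)
print(hourly_bytes[z_hits | iqr_hits])
```

Note that the z-score depends on a mean and standard deviation that the outlier itself inflates, so on very short series a single spike may not reach the threshold; the IQR test is less sensitive to that effect.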


## When to Use

- When deploying or configuring network traffic baselining capabilities in your environment
- When establishing security controls aligned to compliance requirements
- When building or improving network monitoring and detection architecture
- When conducting security assessments that require a documented traffic baseline

## Prerequisites

- NetFlow v5/v9 or IPFIX flow data exported as CSV or JSON
- Python 3.8+ with pandas and numpy libraries
- Historical flow data (minimum 7 days recommended for baseline)
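
A quick way to check these prerequisites is to load an export and measure how much history it covers. The sketch below assumes a CSV with the hypothetical columns `ts`, `src_ip`, `dst_ip`, `dst_port`, `protocol`, `bytes`, and `packets`; the skill does not define a schema, so real exporter output will likely need renaming first. The helper names `load_flows` and `history_days` are also illustrative.

```python
import io
import pandas as pd

# Assumed column names; adjust to match the actual exporter output
REQUIRED_COLUMNS = ["ts", "src_ip", "dst_ip", "dst_port", "protocol", "bytes", "packets"]

def load_flows(source) -> pd.DataFrame:
    """Read a flow CSV, verify the assumed columns exist, and parse timestamps."""
    df = pd.read_csv(source)
    missing = sorted(set(REQUIRED_COLUMNS) - set(df.columns))
    if missing:
        raise ValueError(f"missing columns: {missing}")
    df["ts"] = pd.to_datetime(df["ts"])
    return df.sort_values("ts").reset_index(drop=True)

def history_days(df: pd.DataFrame) -> float:
    """Span between the first and last flow, in days."""
    return (df["ts"].max() - df["ts"].min()) / pd.Timedelta(days=1)

# Two made-up records standing in for a real export file
sample_csv = io.StringIO(
    "ts,src_ip,dst_ip,dst_port,protocol,bytes,packets\n"
    "2024-01-01 00:05:00,10.0.0.5,93.184.216.34,443,TCP,1200,10\n"
    "2024-01-08 12:00:00,10.0.0.6,8.8.8.8,53,UDP,80,1\n"
)
flows = load_flows(sample_csv)
if history_days(flows) < 7:
    print("warning: less than 7 days of history for the baseline")
```

For JSON exports, `pd.read_json` could take the place of `pd.read_csv`, with the same column check applied afterwards.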

## Steps

1. Ingest NetFlow/IPFIX records from CSV or JSON exports
2. Compute hourly and daily traffic volume distributions (bytes, packets, flows)
3. Build per-source-IP baseline profiles with mean, median, standard deviation
4. Calculate protocol and port distribution baselines
5. Apply z-score anomaly detection to identify statistical outliers
6. Flag flows exceeding IQR-based thresholds as potential anomalies
7. Generate baseline report with anomaly alerts
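
Steps 2 through 5 can be sketched with pandas group-by operations. The example below is one possible shape under stated assumptions: the column names (`ts`, `src_ip`, `protocol`, `bytes`, `packets`), the helper names, the synthetic data, and the |z| > 3 alert threshold are all illustrative choices, not values given by the skill.

```python
import numpy as np
import pandas as pd

HOUR = pd.Timedelta(hours=1)

def hourly_volume(flows):
    """Step 2: total bytes, packets, and flow count per clock hour."""
    return (flows.assign(hour=flows["ts"].dt.floor(HOUR))
                 .groupby("hour")
                 .agg(bytes=("bytes", "sum"),
                      packets=("packets", "sum"),
                      flows=("bytes", "size")))

def hour_of_day_profile(hourly):
    """Step 2: mean and std of hourly bytes for each hour of day (0-23)."""
    return hourly.groupby(hourly.index.hour)["bytes"].agg(["mean", "std"])

def host_profiles(flows):
    """Step 3: per-source-IP statistics over per-flow byte counts."""
    return flows.groupby("src_ip")["bytes"].agg(["mean", "median", "std", "sum"])

def protocol_ratios(flows):
    """Step 4: each protocol's share of total bytes."""
    by_proto = flows.groupby("protocol")["bytes"].sum()
    return by_proto / by_proto.sum()

def host_hourly_zscores(flows):
    """Step 5: z-score of each host's hourly bytes against that host's own history."""
    per_hour = (flows.assign(hour=flows["ts"].dt.floor(HOUR))
                     .groupby(["src_ip", "hour"])["bytes"].sum()
                     .reset_index())
    grouped = per_hour.groupby("src_ip")["bytes"]
    std = grouped.transform("std").replace(0, np.nan)  # avoid division by zero
    per_hour["z"] = (per_hour["bytes"] - grouped.transform("mean")) / std
    return per_hour

# Synthetic data: two hosts, one flow per host per hour for 7 days
rng = np.random.default_rng(0)
hours = pd.date_range("2024-01-01", periods=7 * 24, freq=HOUR)
n = 2 * len(hours)
flows = pd.DataFrame({
    "ts": list(hours) * 2,
    "src_ip": np.repeat(["10.0.0.5", "10.0.0.6"], len(hours)),
    "protocol": np.where(np.arange(n) % 4 == 0, "UDP", "TCP"),
    "bytes": rng.integers(900, 1100, size=n),
    "packets": 10,
})
flows.loc[100, "bytes"] = 500_000  # injected spike for host 10.0.0.5

hourly = hourly_volume(flows)
profile = hour_of_day_profile(hourly)
profiles = host_profiles(flows)
ratios = protocol_ratios(flows)
zscores = host_hourly_zscores(flows)
alerts = zscores[zscores["z"].abs() > 3]  # assumed threshold
print(alerts)
```

Scoring each host against its own history, rather than against the whole network, keeps a naturally busy server from masking a quiet workstation that suddenly sends far more than usual. A day-of-week profile could be built the same way by grouping on `hourly.index.dayofweek`.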

## Expected Output

JSON report containing traffic baselines (hourly/daily profiles), per-host statistics, detected anomalies with z-scores, and top talker rankings with deviation indicators.
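
One possible report layout is sketched below. The field names (`hourly_baseline`, `per_host`, `anomalies`, `top_talkers`, `ratio_to_median`) are assumptions made for illustration, since the skill does not fix a schema, and the input numbers are made up. The deviation indicator used here, each host's total relative to the median host total, is likewise just one plausible choice.

```python
import json
import pandas as pd

def _jsonable(value):
    """Fallback for values the json module cannot encode directly."""
    if hasattr(value, "isoformat"):   # timestamps become ISO 8601 strings
        return value.isoformat()
    if hasattr(value, "item"):        # numpy scalars become Python scalars
        return value.item()
    return str(value)

def build_report(hourly_baseline, host_stats, anomalies, top_n=10):
    """Assemble baselines, per-host stats, anomalies, and top talkers into JSON."""
    median_total = host_stats["total_bytes"].median()
    top = host_stats.nlargest(top_n, "total_bytes")
    report = {
        "hourly_baseline": hourly_baseline.to_dict(orient="records"),
        "per_host": host_stats.to_dict(orient="records"),
        "anomalies": anomalies.to_dict(orient="records"),
        "top_talkers": [
            {
                "rank": rank,
                "src_ip": row.src_ip,
                "total_bytes": row.total_bytes,
                "ratio_to_median": round(row.total_bytes / median_total, 2),
            }
            for rank, row in enumerate(top.itertuples(index=False), start=1)
        ],
    }
    return json.dumps(report, indent=2, default=_jsonable)

# Made-up inputs standing in for the outputs of the earlier steps
hourly_baseline = pd.DataFrame(
    {"hour": [0, 1], "mean_bytes": [1000.0, 1200.0], "std_bytes": [50.0, 60.0]}
)
host_stats = pd.DataFrame({
    "src_ip": ["10.0.0.5", "10.0.0.6", "10.0.0.7"],
    "mean_bytes": [3970.0, 1000.0, 890.0],
    "total_bytes": [667000, 168000, 150000],
})
anomalies = pd.DataFrame({
    "src_ip": ["10.0.0.5"],
    "hour": [pd.Timestamp("2024-01-05 04:00")],
    "bytes": [500000],
    "z": [12.9],
})

report_json = build_report(hourly_baseline, host_stats, anomalies, top_n=2)
print(report_json)
```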

REQUIRED CONTEXT

  • NetFlow v5/v9 or IPFIX flow data exported as CSV or JSON
  • Python 3.8+ with pandas and numpy
  • Historical flow data (minimum 7 days recommended)

EXPECTED OUTPUT

Format
json
Schema
json · traffic baselines (hourly/daily profiles), per-host statistics, detected anomalies with z-scores, top talker rankings with deviation indicators
Constraints
  • include traffic baselines (hourly/daily profiles)
  • include per-host statistics
  • include detected anomalies with z-scores
  • include top talker rankings with deviation indicators

SUCCESS CRITERIA

  • Ingest NetFlow/IPFIX records from CSV or JSON exports
  • Compute hourly and daily traffic volume distributions
  • Build per-source-IP baseline profiles with mean, median, standard deviation
  • Calculate protocol and port distribution baselines
  • Apply z-score anomaly detection
  • Flag flows exceeding IQR-based thresholds
  • Generate baseline report with anomaly alerts

CAVEATS

Missing context
  • Exact input CSV/JSON column schema
  • Concrete z-score or IQR threshold values
  • Sample input data or code examples

QUALITY

OVERALL
0.75
CLARITY
0.90
SPECIFICITY
0.80
REUSABILITY
0.65
COMPLETENESS
0.70

IMPROVEMENT SUGGESTIONS

  • Add a minimal example CSV schema and sample pandas code for steps 2-5
  • Specify desired output JSON structure with field names
