analyst security skill risk: medium

NetFlow Pandas Traffic Baselining

The prompt provides an overview, prerequisites, and seven steps for ingesting NetFlow/IPFIX data, computing hourly/daily distributions and per-host profiles with pandas, and detect…

SKILL 4 files · 2 folders

SKILL.md

---
name: implementing-network-traffic-baselining
description: "Build network traffic baselines from NetFlow/IPFIX data using Python pandas for statistical analysis, z-score and IQR-based anomaly detection"
---
# Implementing Network Traffic Baselining

## Overview

Network traffic baselining establishes normal communication patterns by analyzing historical NetFlow/IPFIX data to create statistical profiles of expected behavior. This skill uses Python pandas to compute hourly and daily traffic distributions, per-host byte/packet counts, protocol ratios, and top-N talker profiles. Anomalies are detected using z-score thresholds and IQR (interquartile range) outlier methods, enabling SOC analysts to identify deviations such as data exfiltration spikes, beaconing patterns, and unusual port usage.
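
The two outlier tests named above can be written as follows. Here $x$ is an observed value, for example one host's bytes in one hour. $\mu$ and $\sigma$ are that metric's baseline mean and standard deviation, and $Q_1$ and $Q_3$ are its 25th and 75th percentiles. The skill does not specify thresholds. $k_z = 3$ and Tukey's $k_{\mathrm{IQR}} = 1.5$ are common conventions and are assumptions here.

$$
z = \frac{x - \mu}{\sigma}, \qquad \text{anomalous if } |z| > k_z
$$

$$
\mathrm{IQR} = Q_3 - Q_1, \qquad \text{anomalous if } x < Q_1 - k_{\mathrm{IQR}}\,\mathrm{IQR} \;\text{ or }\; x > Q_3 + k_{\mathrm{IQR}}\,\mathrm{IQR}
$$

$z$ is undefined when $\sigma = 0$, which happens with a perfectly constant baseline. In that case the IQR fences collapse onto the constant value, so the IQR test still flags any change. This is one reason to run both tests.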

## When to Use

- When deploying or configuring network traffic baselining in your environment
- When establishing security monitoring controls aligned to compliance requirements
- When building or improving network security monitoring architecture
- When conducting security assessments that require a documented baseline of normal network behavior

## Prerequisites

- NetFlow v5/v9 or IPFIX flow data exported as CSV or JSON
- Python 3.8+ with pandas and numpy libraries
- Historical flow data (minimum 7 days recommended for baseline)
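
NetFlow and IPFIX exporters name their fields differently, and this skill does not define a column schema. The loader below is a minimal sketch that assumes a hypothetical normalized schema: `start_time`, `src_ip`, `dst_ip`, `dst_port`, `protocol`, `bytes` and `packets`. Rename your exporter's fields to match before loading. JSON input is assumed to be an array of flow objects. The sample records are synthetic.

```python
import io

import pandas as pd

# Hypothetical normalized schema (an assumption); map your exporter's fields onto it.
REQUIRED_COLUMNS = ["start_time", "src_ip", "dst_ip", "dst_port", "protocol", "bytes", "packets"]


def load_flows(source, fmt="csv"):
    """Load flow records from a CSV or JSON export into a typed DataFrame.

    `source` may be a path or a file-like object. JSON input is assumed to be
    an array of flow objects, one object per flow record.
    """
    if fmt == "csv":
        df = pd.read_csv(source)
    elif fmt == "json":
        df = pd.read_json(source, orient="records")
    else:
        raise ValueError(f"unsupported format: {fmt}")

    missing = set(REQUIRED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    df = df[REQUIRED_COLUMNS].copy()
    # Normalize timestamps to UTC so hourly buckets are consistent across exporters.
    df["start_time"] = pd.to_datetime(df["start_time"], utc=True)
    for col in ("dst_port", "bytes", "packets"):
        df[col] = df[col].astype("int64")
    # Keep protocol as text so both names ("TCP") and IP protocol numbers ("6") work.
    df["protocol"] = df["protocol"].astype(str)
    return df.sort_values("start_time").reset_index(drop=True)


# Synthetic two-record export with example addresses, for illustration only.
sample_csv = io.StringIO(
    "start_time,src_ip,dst_ip,dst_port,protocol,bytes,packets\n"
    "2024-01-01T00:05:00Z,10.0.0.5,203.0.113.10,443,TCP,5200,12\n"
    "2024-01-01T00:01:00Z,10.0.0.7,10.0.0.53,53,UDP,180,2\n"
)
flows = load_flows(sample_csv)
print(flows[["start_time", "src_ip", "bytes"]])
```

Validating the columns up front makes a schema mismatch fail loudly at ingestion. Without that check, the mismatch would surface later as an empty or misleading baseline.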

## Steps

1. Ingest NetFlow/IPFIX records from CSV or JSON exports
2. Compute hourly and daily traffic volume distributions (bytes, packets, flows)
3. Build per-source-IP baseline profiles with mean, median, standard deviation
4. Calculate protocol and port distribution baselines
5. Apply z-score anomaly detection to identify statistical outliers
6. Flag flows exceeding IQR-based thresholds as potential anomalies
7. Generate baseline report with anomaly alerts
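
Step 2 can be sketched in pandas as follows. The function assumes a hypothetical normalized schema with a timezone-aware `start_time` plus `src_ip`, `bytes` and `packets` columns. It buckets flows into calendar hours and then profiles each hour of day and day of week. Offsets are passed as `pd.offsets.Hour()` and `pd.offsets.Day()` objects rather than alias strings. The sample data is synthetic.

```python
import numpy as np
import pandas as pd


def volume_distributions(flows):
    """Hourly/daily traffic volumes plus hour-of-day and day-of-week profiles."""
    # Bucket flows into calendar hours; empty hours become 0 rather than missing,
    # so quiet periods are part of the baseline.
    hourly = (
        flows.set_index("start_time")
        .resample(pd.offsets.Hour())
        .agg({"bytes": "sum", "packets": "sum", "src_ip": "count"})
        .rename(columns={"src_ip": "flows"})  # number of flow records per hour
    )
    daily = hourly.resample(pd.offsets.Day()).sum()

    # Distribution of each hour of day (0-23) across all days in the window.
    hour_of_day = hourly.groupby(hourly.index.hour)["bytes"].agg(["mean", "median", "std"])
    # Day-of-week profile (0 = Monday); only meaningful with 7+ days of data.
    day_of_week = daily.groupby(daily.index.dayofweek)["bytes"].agg(["mean", "median", "std"])
    return {"hourly": hourly, "daily": daily, "hour_of_day": hour_of_day, "day_of_week": day_of_week}


# Synthetic example: one host, 48 hours, busier between 09:00 and 17:59 UTC.
times = pd.date_range("2024-01-01", periods=48, freq=pd.offsets.Hour(), tz="UTC")
sample = pd.DataFrame({
    "start_time": times,
    "src_ip": "10.0.0.5",
    "bytes": np.where(times.hour.isin(list(range(9, 18))), 1000, 100),
    "packets": 10,
})
dist = volume_distributions(sample)
print(dist["hour_of_day"].head(12))
print(dist["daily"]["bytes"].tolist())
```

Profiling by hour of day rather than over the whole window keeps normal business-hours peaks from being flagged against an overnight average.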
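
For step 3, the skill does not say which quantity the per-host mean, median and standard deviation describe. One reasonable sketch, and an assumption here, is to profile each source IP's bytes per hour, counting idle hours as zero. The function assumes hypothetical `start_time`, `src_ip`, `bytes` and `packets` columns. In the synthetic sample, both hosts average 1,000 bytes per hour. The bursty host, however, has a median of 0 and a much larger standard deviation.

```python
import pandas as pd


def host_profiles(flows):
    """Per-source-IP baseline of hourly bytes sent (mean, median, std) plus totals."""
    hour = flows["start_time"].dt.floor(pd.offsets.Hour())
    # Rows = hours, columns = source IPs, values = bytes sent in that hour (0 if idle).
    matrix = flows.assign(hour=hour).pivot_table(
        index="hour", columns="src_ip", values="bytes", aggfunc="sum", fill_value=0
    )
    # Add hours with no traffic from any host so they are not silently dropped.
    full_range = pd.date_range(matrix.index.min(), matrix.index.max(), freq=pd.offsets.Hour())
    matrix = matrix.reindex(full_range, fill_value=0)

    stats = matrix.agg(["mean", "median", "std"]).T  # one row per src_ip
    by_host = flows.groupby("src_ip")
    stats["total_bytes"] = by_host["bytes"].sum()
    stats["total_packets"] = by_host["packets"].sum()
    stats["flow_count"] = by_host.size()
    return stats.sort_values("total_bytes", ascending=False), matrix


# Synthetic sample: 10.0.0.5 sends steadily, 10.0.0.9 sends one burst.
sample = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2024-01-01T00:10:00Z", "2024-01-01T01:10:00Z",
        "2024-01-01T02:10:00Z", "2024-01-01T03:10:00Z",
        "2024-01-01T00:20:00Z",
    ], utc=True),
    "src_ip": ["10.0.0.5"] * 4 + ["10.0.0.9"],
    "bytes": [1000, 1000, 1000, 1000, 4000],
    "packets": [10, 10, 10, 10, 30],
})
stats, matrix = host_profiles(sample)
print(stats[["mean", "median", "std", "flow_count"]])
```

The hour-by-host `matrix` is also returned. It can then serve directly as the baseline input for per-host anomaly scoring.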
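
Step 4 can be sketched as per-protocol byte and flow shares plus a top-N destination-port table. The function assumes hypothetical `protocol`, `dst_port` and `bytes` columns. `unseen_ports` is a hypothetical helper that surfaces the unusual port usage mentioned in the overview: it lists ports present in the window under review but absent from the baseline. The sample flows are synthetic.

```python
import pandas as pd


def protocol_port_baseline(flows, top_n=10):
    """Share of bytes and flows per protocol, plus the top-N destination ports."""
    protocols = pd.DataFrame({
        "byte_share": flows.groupby("protocol")["bytes"].sum() / flows["bytes"].sum(),
        "flow_share": flows.groupby("protocol").size() / len(flows),
    })
    ports = (
        flows.groupby("dst_port")
        .agg(flow_count=("bytes", "size"), total_bytes=("bytes", "sum"))
        .sort_values("flow_count", ascending=False)
        .head(top_n)
        .assign(flow_share=lambda d: d["flow_count"] / len(flows))
    )
    return protocols, ports


def unseen_ports(baseline_flows, current_flows):
    """Destination ports in the current window that never appeared in the baseline."""
    return sorted(int(p) for p in set(current_flows["dst_port"]) - set(baseline_flows["dst_port"]))


# Synthetic baseline and review windows.
baseline = pd.DataFrame({
    "protocol": ["TCP", "TCP", "TCP", "UDP"],
    "dst_port": [443, 443, 22, 53],
    "bytes": [400, 400, 100, 100],
})
current = pd.DataFrame({"protocol": ["TCP", "TCP"], "dst_port": [443, 4444], "bytes": [300, 9000]})
protocols, ports = protocol_port_baseline(baseline)
print(protocols)
print(ports)
print(unseen_ports(baseline, current))
```

Byte shares and flow shares are kept separately. A protocol whose byte share rises while its flow share stays flat points to larger transfers rather than more connections.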
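
Steps 5 and 6 can be sketched as one scoring function applied per entity and metric. The input could be one host's hourly byte totals, for example one column of an hour-by-host matrix, or individual flow sizes. The default thresholds are assumptions, since the skill gives no values: |z| > 3 and Tukey's 1.5 × IQR fences. Both tests are computed from the baseline window only, so a spike in the window under review cannot inflate its own threshold. The sample values are synthetic.

```python
import pandas as pd


def score_against_baseline(baseline, observed, z_threshold=3.0, iqr_k=1.5):
    """Score observed values against a baseline sample with z-score and IQR tests."""
    mu = baseline.mean()
    sigma = baseline.std()  # sample standard deviation (ddof=1)
    q1, q3 = baseline.quantile(0.25), baseline.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - iqr_k * iqr, q3 + iqr_k * iqr

    out = pd.DataFrame({"value": observed})
    # z is undefined for a constant baseline (sigma == 0) or a single sample (NaN sigma).
    out["z_score"] = (observed - mu) / sigma if sigma > 0 else float("nan")
    out["z_flag"] = out["z_score"].abs() > z_threshold  # NaN compares as False
    out["iqr_flag"] = (observed < lower) | (observed > upper)
    out["anomaly"] = out["z_flag"] | out["iqr_flag"]
    return out


# Synthetic example: 8 baseline hours for one host, then two hours under review.
baseline_hours = pd.Series([1000, 1100, 900, 1000, 1050, 950, 1000, 1000])
new_hours = pd.Series([1020, 50000])
result = score_against_baseline(baseline_hours, new_hours)
print(result)
```

The z-score is reported even when only the IQR test fires, so the JSON report can carry a z-score for every anomaly wherever one is defined.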

## Expected Output

JSON report containing traffic baselines (hourly/daily profiles), per-host statistics, detected anomalies with z-scores, and top talker rankings with deviation indicators.
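
The skill asks for a JSON report but leaves its field names open. The sketch below assumes a hypothetical structure with `baseline`, `hosts`, `anomalies` and `top_talkers` keys. Its deviation indicator, also an assumption, is the review window's bytes expressed in baseline standard deviations. NaN and infinite values are mapped to `null`, because `json.dumps` would otherwise emit the non-standard token `NaN`. The anomaly dicts should contain plain Python types, since NumPy scalars are not JSON-serializable. All inputs are synthetic.

```python
import json
import math

import pandas as pd


def _num(value, ndigits=2):
    """Round a numeric value for JSON, mapping NaN/inf to None (JSON null)."""
    value = float(value)
    return round(value, ndigits) if math.isfinite(value) else None


def build_report(hour_of_day_mean, host_stats, current_bytes, anomalies, top_n=5):
    """Assemble a baseline report as a JSON string (all field names are hypothetical)."""
    talkers = host_stats.copy()
    talkers["current_bytes"] = current_bytes.reindex(talkers.index).fillna(0)
    # Deviation indicator: distance from the host's baseline mean in std units.
    # A zero std yields NaN or inf here, which _num turns into null.
    talkers["deviation_z"] = (talkers["current_bytes"] - talkers["mean"]) / talkers["std"]
    top = talkers.sort_values("current_bytes", ascending=False).head(top_n)

    report = {
        "baseline": {
            "hour_of_day_mean_bytes": {str(h): _num(v) for h, v in hour_of_day_mean.items()},
        },
        "hosts": {
            ip: {"mean_bytes": _num(r["mean"]), "median_bytes": _num(r["median"]), "std_bytes": _num(r["std"])}
            for ip, r in host_stats.iterrows()
        },
        "anomalies": anomalies,
        "top_talkers": [
            {"rank": rank, "src_ip": ip, "current_bytes": int(r["current_bytes"]),
             "deviation_z": _num(r["deviation_z"])}
            for rank, (ip, r) in enumerate(top.iterrows(), start=1)
        ],
    }
    return json.dumps(report, indent=2)


# Synthetic inputs for illustration only.
hour_profile = pd.Series({0: 120.0, 9: 5400.0})
hosts = pd.DataFrame(
    {"mean": [1000.0, 200.0], "median": [1000.0, 180.0], "std": [100.0, 0.0]},
    index=["10.0.0.5", "10.0.0.9"],
)
current = pd.Series({"10.0.0.5": 1300, "10.0.0.9": 200})
alerts = [{"src_ip": "10.0.0.5", "hour": "2024-01-08T03:00:00Z", "bytes": 50000, "z_score": 12.4}]
print(build_report(hour_profile, hosts, current, alerts))
```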

CAVEATS

Missing context

Exact input CSV/JSON column schema
Concrete z-score or IQR threshold values
Sample input data or code examples

IMPROVEMENT SUGGESTIONS

Add a minimal example CSV schema and sample pandas code for steps 2-5
Specify desired output JSON structure with field names
