security analyst security skill risk: medium

Insider Data Exfiltration DLP Analyzer

Analyze endpoint activity logs, cloud storage access, and email DLP events to detect data exfiltration patterns using behavioral baselines and statistical anomaly detection, includ…

Policy sensitive
Human review

SKILL 4 files · 2 folders

SKILL.md


---
name: detecting-insider-data-exfiltration-via-dlp
description: "Detects insider data exfiltration by analyzing DLP policy violations, file access patterns, upload volume anomalies,"
---
# Detecting Insider Data Exfiltration via DLP


## When to Use

- When investigating security incidents that may involve insider data exfiltration detectable through DLP telemetry
- When building detection rules or threat-hunting queries for insider exfiltration
- When SOC analysts need a structured procedure for this type of analysis
- When validating security monitoring coverage for exfiltration techniques

## Prerequisites

- Familiarity with security operations concepts and tools
- Access to a test or lab environment for safe execution
- Python 3.8+ with pandas installed
- Appropriate authorization for any testing activities

## Instructions

Analyze endpoint activity logs, cloud storage access, and email DLP events to detect
data exfiltration patterns using behavioral baselines and statistical anomaly detection.
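
The three sources rarely share a schema, so they usually need to be normalized into one event table before baselining. The sketch below assumes each export carries a user, a timestamp, and a byte count; the per-source column names in `COLUMN_MAPS` (`actor`, `event_time`, `size_bytes`, `sender`, `sent_at`, `attachment_bytes`) and the `normalize` helper are hypothetical placeholders for whatever your endpoint, cloud-storage, and email DLP tools actually emit.

```python
import pandas as pd

# Hypothetical per-source column names; replace with the fields your
# endpoint, cloud-storage, and email DLP exports actually contain.
COLUMN_MAPS = {
    "endpoint": {"user": "user", "timestamp": "timestamp", "bytes_transferred": "bytes_transferred"},
    "cloud": {"user": "actor", "timestamp": "event_time", "bytes_transferred": "size_bytes"},
    "email": {"user": "sender", "timestamp": "sent_at", "bytes_transferred": "attachment_bytes"},
}


def normalize(frames):
    """Rename each source's columns to a common schema and stack them into
    one time-ordered event table with a `source` column."""
    parts = []
    for source, frame in frames.items():
        mapping = COLUMN_MAPS[source]
        # Invert the map: source column name -> common column name
        renamed = frame.rename(columns={src: common for common, src in mapping.items()})
        parts.append(renamed[list(mapping)].assign(source=source))
    events = pd.concat(parts, ignore_index=True)
    events["timestamp"] = pd.to_datetime(events["timestamp"])
    return events.sort_values("timestamp", ignore_index=True)
```

The result has `user`, `timestamp`, and `bytes_transferred` columns plus `source`, so per-channel baselines can be computed by adding `source` to the groupby keys.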

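```python
import pandas as pd

# Assumes columns: user, timestamp, bytes_transferred
df = pd.read_csv("file_activity.csv", parse_dates=["timestamp"])
df["date"] = df["timestamp"].dt.date
today_date = pd.Timestamp.today().date()

# Baseline: average daily upload volume per user, from prior days only so that
# today's activity does not inflate its own baseline (days with no activity
# are absent rather than counted as zero)
history = df[df["date"] < today_date]
daily = history.groupby(["user", "date"])["bytes_transferred"].sum()
user_avg = daily.groupby(level="user").mean()

# Alert on users exceeding 3x their baseline; reindex aligns the two Series by
# user (comparing differently-indexed Series raises), and users with no history
# get NaN, which never compares greater
today = df[df["date"] == today_date]
today_totals = today.groupby("user")["bytes_transferred"].sum()
anomalies = today_totals[today_totals > user_avg.reindex(today_totals.index) * 3]
```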
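
A fixed 3x multiplier treats a user with erratic daily volume the same as one with very steady volume. A per-user z-score is one common statistical alternative; the sketch below assumes the same `user` and `timestamp` columns, and the function name `zscore_anomalies` and its `threshold` and `min_days` defaults are illustrative choices, not tuned values.

```python
import pandas as pd


def zscore_anomalies(df, day, value_col="bytes_transferred", threshold=3.0, min_days=5):
    """Flag users whose total `value_col` on `day` is more than `threshold`
    standard deviations above their own mean daily total on earlier days.

    Assumes columns: user, timestamp (datetime64), and `value_col`.
    `day` is a datetime.date. Only days with activity enter the baseline.
    """
    df = df.assign(date=df["timestamp"].dt.date)
    daily = df.groupby(["user", "date"])[value_col].sum().reset_index()

    history = daily[daily["date"] < day]
    stats = history.groupby("user")[value_col].agg(["mean", "std", "count"])
    # Require a minimum history so a few days do not define "normal"
    stats = stats[stats["count"] >= min_days]

    current = daily[daily["date"] == day].set_index("user")[value_col].rename("today")
    joined = stats.join(current, how="inner")
    # A perfectly flat history has std 0; mask it to NaN so it is never flagged
    std = joined["std"].where(joined["std"] > 0)
    joined["zscore"] = (joined["today"] - joined["mean"]) / std
    return joined[joined["zscore"] > threshold].sort_values("zscore", ascending=False)
```

Because the value column is a parameter, the same function could score other per-user daily totals, such as removable-media write volume for the USB indicator, provided the logs carry such a column.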

Key indicators:
1. Daily upload volume exceeding 3x the user's own baseline
2. Access to files outside the user's normal scope
3. Bulk downloads shortly before a resignation
4. Off-hours file access patterns
5. USB/external device usage spikes
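
Indicator 2 needs a working definition of "normal scope". One simple proxy, sketched below, counts the distinct resources a user touches on a given day that they never touched on any earlier day. The `resource` column (a file path, share name, or cloud folder ID), the `new_resource_counts` helper, and the `min_new` threshold are hypothetical.

```python
import pandas as pd


def new_resource_counts(df, day, min_new=20):
    """Count, per user, the distinct resources accessed on `day` that the user
    never accessed on any earlier day; return users at or above `min_new`.

    Assumes columns: user, timestamp (datetime64), and a hypothetical
    `resource` column identifying the file, share, or folder accessed.
    """
    dates = df["timestamp"].dt.date
    history = df.loc[dates < day, ["user", "resource"]].drop_duplicates()
    current = df.loc[dates == day, ["user", "resource"]].drop_duplicates()

    # Left anti-join: keep (user, resource) pairs with no match in history
    merged = current.merge(history, on=["user", "resource"], how="left", indicator=True)
    unseen = merged[merged["_merge"] == "left_only"]

    counts = unseen.groupby("user").size()
    return counts[counts >= min_new].sort_values(ascending=False)
```

Users with no history, such as new hires, will have every resource counted as new, so they usually need separate handling.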

## Examples

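```python
import pandas as pd

df = pd.read_csv("file_activity.csv", parse_dates=["timestamp"])

# Detect off-hours activity: events before 06:00 or from 23:00 onward
# (hours come from the timestamps as logged; convert time zones first if needed)
df["hour"] = df["timestamp"].dt.hour
off_hours = df[(df["hour"] < 6) | (df["hour"] > 22)]

# Rank users by number of off-hours events, highest first
suspicious = off_hours.groupby("user").size().sort_values(ascending=False)
print(suspicious.head(10))
```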
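
Indicator 3 (bulk downloads before resignation) requires joining activity logs with HR data. The sketch below assumes a hypothetical `departures` DataFrame with `user` and `resignation_date` columns; the helper name `pre_departure_ratio`, the 30-day window, and the 90-day baseline are illustrative defaults. Rates are divided by calendar days, so days without activity count as zero volume.

```python
import pandas as pd


def pre_departure_ratio(df, departures, window_days=30, baseline_days=90):
    """For each departing user, compare mean daily volume in the `window_days`
    before the resignation date with the `baseline_days` preceding that window.

    Assumes df has user, timestamp (datetime64), bytes_transferred, and that
    `departures` is a hypothetical HR extract with user and resignation_date.
    """
    results = []
    for row in departures.itertuples(index=False):
        end = pd.Timestamp(row.resignation_date)
        window_start = end - pd.Timedelta(days=window_days)
        base_start = window_start - pd.Timedelta(days=baseline_days)
        ts = df.loc[df["user"] == row.user, ["timestamp", "bytes_transferred"]]

        in_window = ts[(ts["timestamp"] >= window_start) & (ts["timestamp"] < end)]
        in_base = ts[(ts["timestamp"] >= base_start) & (ts["timestamp"] < window_start)]
        # Divide by calendar days so that inactive days count as zero volume
        window_rate = in_window["bytes_transferred"].sum() / window_days
        base_rate = in_base["bytes_transferred"].sum() / baseline_days
        # No baseline activity leaves the ratio undefined (NaN, sorted last)
        ratio = window_rate / base_rate if base_rate > 0 else float("nan")
        results.append({"user": row.user, "window_rate": window_rate,
                        "baseline_rate": base_rate, "ratio": ratio})
    columns = ["user", "window_rate", "baseline_rate", "ratio"]
    return pd.DataFrame(results, columns=columns).sort_values(
        "ratio", ascending=False, ignore_index=True)
```

A ratio well above 1 means the user moved noticeably more data in the run-up to their resignation than in the preceding months.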

REQUIRED CONTEXT

endpoint activity logs
file_activity.csv

EXPECTED OUTPUT

Format

markdown

Constraints

include Python code examples
list key indicators

EXAMPLES

Includes two Python code snippets for baseline volume calculation and off-hours activity detection.

CAVEATS

Missing context

Desired output format or report structure for the analysis results.
Exact schema or column names expected in input CSV files.

Ambiguities

Description field is truncated mid-sentence.
Final code block is incomplete (ends abruptly after suspicious assignment).

QUALITY

OVERALL: 0.65
CLARITY: 0.70
SPECIFICITY: 0.65
REUSABILITY: 0.70
COMPLETENESS: 0.55

IMPROVEMENT SUGGESTIONS

Complete the truncated description and code blocks so the prompt is self-contained.
Add an explicit 'Output Format' section specifying how results should be returned (e.g., list of users, JSON schema).
