Threat Hunting with Claude Code and MCP
How AI agents compress days of manual correlation into hours of guided analysis.
Welcome to Detection at Scale, a weekly newsletter on AI-first security operations, detection engineering, and the infrastructure that makes modern SOCs work. I’m Jack, founder & CTO at Panther. If you find this valuable, please share it with your team!
This post continues our series on using AI agents and Model Context Protocol (MCP) servers to build and operationalize threat models for security operations.
Threat hunting has historically been a tedious, manual process: you get an indicator or set of suspicious behaviors from threat intelligence, then spend hours (or days) searching through logs to determine if there’s evidence of activity in your environment. Security analysts manually parse reports for IP addresses, domains, or attack patterns, then craft queries against their security data lake, iterating through different log sources until they either find something or exhaust their patience. It’s analytically intensive work that requires deep expertise in both attacker tradecraft and your organization’s specific infrastructure.
This is exactly the kind of work AI agents should excel at. The SOC analyst’s job is fundamentally analytical—synthesizing context from multiple sources, forming hypotheses about adversary behavior, and validating those hypotheses against evidence in your data. The limiting factors for AI-driven security operations aren’t model capabilities anymore; they’re access to the right data and tools (can the agent query your identity provider, logs, and HR systems?), human comprehension speed (can your team review findings fast enough?), and organizational alignment (is everyone clear on what threats matter most?). AI agents like Claude Code provide an intelligent harness that combines data access through MCP servers, tool execution, and iterative workflows to compress what used to take days of manual work into hours of guided analysis.
In our previous post, we demonstrated how to build threat models using AI agents and MCP, resulting in a comprehensive, prioritized set of threats. But a threat model sitting in a document doesn’t improve your security posture until you operationalize it. The right sequence is to prioritize threats with stakeholders, hunt for historical evidence of those threats in your environment, and then formalize successful hunts into detection-as-code rules. Every alert is a claim on your team’s time and attention, so you need to validate that a threat is relevant before committing to long-term monitoring.
This post will dive into using Claude Code and MCP to hunt for the threats you’ve already prioritized, gathering evidence that will directly inform which detections to build and how to scope them. We’ll walk through the stakeholder alignment process that turns threat models into hunt priorities, show concrete examples of AI-assisted hunting workflows, and explain how hunt findings translate into detection logic.
Stakeholder Alignment on the Threat Models
Before diving into threat-hunting workflows, you need organizational alignment on which threats matter most to your business right now. Many security teams either skip stakeholder buy-in entirely, leading to hunts that don’t align with leadership priorities, or they schedule a three-hour meeting where everyone argues about hypotheticals without clear outcomes. The threat model you’ve built makes this conversation substantially easier because you’re not debating what could theoretically happen; you’re reviewing documented threats with context.
The stakeholder alignment meeting should include security leadership, detection engineers, threat intelligence analysts, and incident response leads: essentially, anyone who will either execute the hunts or respond to these specific threats. The agenda is to review your identified crown jewels (what systems are existential if compromised versus merely operational), discuss which threat actors are actually relevant to your industry and geography, and align on the top five priority threats from your model that warrant immediate hunting. Before this meeting, carefully validate the AI's findings for accuracy.
The expected outcome is a ranked shortlist of three to five threat scenarios to hunt for with explicit business justification documented for each. The common pitfall is trying to hunt for everything at once, resulting in 20 different threat scenarios across 10 log sources with no clear hypothesis about what success looks like. Focused hunts with clear hypotheses win. For example, if you’ve aligned on contractor privilege abuse, compromised third-party integrations, and SSH lateral movement from developer workstations, you now have a concrete plan for the technical work ahead. The rest of your threat model doesn’t disappear; it becomes your backlog for future hunting sprints as you systematically work through priorities.
Hunting for Signals with MCP and Claude Code
You’ve aligned on three to five priority threats with your stakeholders. Now comes the technical work: hunting through your security data lake to determine if there’s historical evidence of these threats in your environment. This is where AI agents with MCP access compress weeks of manual query iteration into hours of guided analysis.
The workflow is straightforward. Start with a specific threat ID from your model, translate it into testable hypotheses, query your data lake through MCP servers, and iterate based on what you find. Throughout this process, the agent will query real log data, review past alerts, examine existing detection rules, and produce structured findings on where and how to create actionable detections.
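Connecting the agent to your data lake happens through MCP server configuration. As one example, Claude Code can load project-scoped servers from a .mcp.json file at the repository root. The entry below is a sketch for Panther's mcp-panther server; the exact launch command and environment variable names are assumptions here, so confirm them against the mcp-panther README for your deployment:

```json
{
  "mcpServers": {
    "panther": {
      "command": "uvx",
      "args": ["mcp-panther"],
      "env": {
        "PANTHER_INSTANCE_URL": "https://YOUR-INSTANCE.runpanther.net",
        "PANTHER_API_TOKEN": "${PANTHER_API_TOKEN}"
      }
    }
  }
}
```

Keeping the token in an environment variable rather than the file itself means the config can be committed and shared across the team safely.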
Teaching the Agent to Hunt
Claude Code supports a feature called Skills, which are reusable instruction sets that teach the agent how to approach specific types of work.
A Skill is just a folder containing a SKILL.md file: YAML frontmatter with a name and description, followed by instructions. Claude automatically discovers and loads Skills when their description matches the current task, applying specialized knowledge without you having to repeat the same prompting patterns in every conversation.
For threat hunting, this builds a repeatable thought process for the agent to orient on the task and understand effective hunting techniques.
Here’s how to create one:
1. Create the skill directory structure:
mkdir -p ~/.claude/skills/threat-hunter  # Or in your local directory
2. Create the SKILL.md file at ~/.claude/skills/threat-hunter/SKILL.md:
---
name: threat-hunter
description: Hunt for cyber threats and investigate security incidents. Use when investigating alerts, hunting for IOCs, analyzing suspicious activity, or performing incident response.
---
# Cyber Threat Hunter
You are assisting a security analyst with threat hunting and incident investigation.
## Threat Hunting Methodology
### 1. Hypothesis-Driven Hunting
Start with a hypothesis about attacker behavior:
- "An attacker with initial access would enumerate cloud resources"
- "Compromised credentials would show unusual login patterns"
- "Data exfiltration would involve large outbound transfers"
### 2. The Pivot Loop
Effective hunting follows a continuous pivot pattern:
1. **Start** with a known indicator (IP, user, hash, domain)
2. **Query** for all activity involving that indicator
3. **Identify** new related indicators from the results
4. **Pivot** to those new indicators
5. **Repeat** until you've mapped the full scope
### 3. Correlation Priorities
When investigating, correlate across these dimensions:
- **Time**: What happened before/after the suspicious event?
- **Identity**: What else did this user/service account do?
- **Network**: What other systems communicated with this IP?
- **Host**: What other processes ran on this machine?
## Investigation Patterns
### Alert Triage
1. Review the alert and triggering events
2. Assess: Is this expected behavior for this user/system?
3. Pivot: What else happened in the same timeframe?
4. Scope: Are there similar alerts across other entities?
5. Conclude: True positive, false positive, or needs escalation?
### User Compromise Assessment
1. Establish baseline: What's normal for this user?
2. Check authentication: Unusual locations, times, or devices?
3. Review access: Did they touch sensitive resources?
4. Look for persistence: New MFA devices, API keys, OAuth grants?
5. Check lateral movement: Access to other accounts or systems?
### IOC Hunting
When given an indicator (IP, domain, hash):
1. Search for any historical activity involving the IOC
2. Identify all affected users and systems
3. Determine first and last seen timestamps
4. Map the attack timeline
5. Look for related IOCs to expand the hunt
## Key Questions to Answer
- **Who**: Which identities are involved?
- **What**: What actions were taken?
- **When**: What's the timeline of events?
- **Where**: Which systems/regions/networks?
- **How**: What techniques were used (map to MITRE ATT&CK)?
- **Why**: What was the likely objective?
Once saved, Claude Code will discover this Skill and apply it whenever you’re working on threat hunting or incident investigation tasks. We recommend refining this guidance based on your own preferred best practices, organizational context, and internal hunting methodologies.
After you create this file, you’ll need to quit and reopen Claude Code. Then, you can invoke the skill explicitly using /threat-hunter <prompt>.
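The pivot loop from the Skill's methodology can also be sketched as code. The example below is a minimal illustration, not a real integration: `search_logs` is a hypothetical stand-in for an MCP-backed log query, stubbed with canned data so the traversal is visible.

```python
from collections import deque

def search_logs(indicator):
    """Hypothetical stand-in for an MCP-backed log search.

    Returns events mentioning the indicator. Stubbed with canned
    sample data here purely to illustrate the traversal.
    """
    sample = {
        "203.0.113.7": [{"user": "contractor-01", "src_ip": "203.0.113.7"}],
        "contractor-01": [{"user": "contractor-01", "src_ip": "198.51.100.9"}],
        "198.51.100.9": [],
    }
    return sample.get(indicator, [])

def pivot_hunt(seed, max_depth=3):
    """Breadth-first pivot: query an indicator, extract new indicators
    from the results, and repeat until the scope is mapped or the
    depth limit is reached."""
    seen = {seed}
    queue = deque([(seed, 0)])
    timeline = []
    while queue:
        indicator, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for event in search_logs(indicator):
            timeline.append(event)
            # Every field value in a matching event is a candidate
            # indicator to pivot on next
            for value in event.values():
                if value not in seen:
                    seen.add(value)
                    queue.append((value, depth + 1))
    return seen, timeline

scope, events = pivot_hunt("203.0.113.7")
```

Starting from a single suspicious IP, the loop discovers the contractor account that used it and a second source IP that account logged in from, which is exactly the "repeat until you've mapped the full scope" behavior the Skill describes.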
Executing the Hunt for Past Activity
With your threat-hunter Skill in place and an MCP server like mcp-panther connecting the agent to your SIEM, you're ready to start hunting. The prompt structure is straightforward: point the agent at your threat model, specify which threat(s) to hunt for, and set guardrails on scope and output:
Using the threat-model.md file, hunt for evidence of threat T-INSIDER-002 (contractor privilege abuse) across the last 90 days.
Check for:
1. Related existing detection rules and subsequent alerts
2. Authentication patterns from contractor accounts
3. Access to sensitive resources or data stores
4. Privilege escalation attempts or unusual permission changes
For each finding, assess: Is this expected behavior or anomalous?
Document evidence with timestamps, actors, and affected resources.
When identifying gaps, be specific and avoid ambiguity.
Store your findings in markdown format in a local hunts/ directory,
recommendations in a recommendations/ directory, and track progress
in a tracker.yml file.
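The tracker.yml referenced in the prompt has no required schema; its structure is entirely up to you. One hypothetical layout, as a sketch:

```yaml
hunts:
  - id: T-INSIDER-002
    name: Contractor privilege abuse
    status: in_progress        # pending | in_progress | complete
    time_range_days: 90
    findings_file: hunts/T-INSIDER-002.md
    recommendations_file: recommendations/T-INSIDER-002.md
    confidence: medium         # low | medium | high
    follow_ups:
      - Verify the admin role grant to contractor-01 with IT
```

Keeping this file in the repo alongside the hunt reports lets the agent (and your team) resume or audit a hunting sprint across sessions.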
A few notes on effective hunt prompts:
- Start with the CRITICAL or HIGH-severity threats from your model (the ones your stakeholders prioritized).
- Hunt entire threat categories (such as supply chain or insider threat) or individual threats, depending on how deep you want to go.
- Specify the time range explicitly (30, 60, or 90 days) to balance recency against the query cost of scanning multiple months of data at once.
- Request external research to ground the hunt in reality: “Include research on common detection methods and past attacks for contractor privilege abuse.”
Interpreting Hunt Results
Threat hunting rarely produces simple yes/no answers. The output from a successful hunt is a structured assessment that synthesizes evidence across multiple dimensions, including what activity occurred, the confidence level assigned to the conclusions, and the context gaps that prevented stronger findings.
The hunt report structure should communicate any uncertainty appropriately while still providing actionable intelligence. A complete hunt produces several outputs:
Executive summary with findings and confidence: States what you found (or didn’t find) and assigns a confidence level based on data completeness. When you see “No evidence of compromise” paired with MEDIUM confidence, it signals that the hunt found no anomalies, but there are detection gaps that could hide sophisticated activity. The confidence assessment tells you how much weight to give the conclusion.
Evidence analysis by activity type: Breaks findings into logical categories, like authentication patterns, resource access, privilege changes, or workflow modifications. Each piece of evidence gets assessed as EXPECTED (clearly legitimate business activity), NEEDS REVIEW (requires stakeholder verification to determine if it’s authorized), or ANOMALOUS (suspicious enough to escalate immediately).
Follow-up items prioritized by severity: Separates findings that need immediate investigation from those requiring routine verification. When the hunt identifies admin role assignments, secret deletions, or workflow modifications, it’s flagging them for validation.
Detection coverage gaps mapped to attack techniques: Documents where you lack visibility. If the hunt revealed you’re collecting CloudTrail but missing GitHub audit logs, or you have authentication events but no privileged access management telemetry, those gaps represent blind spots in your detection posture.
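Putting those four sections together, a hunt report skeleton might look like the following. The threat ID, entities, and findings are illustrative placeholders, not output from a real hunt:

```markdown
# Hunt Report: T-INSIDER-002 (Contractor Privilege Abuse)

## Executive Summary
No evidence of compromise. Confidence: MEDIUM (GitHub audit logs not collected).

## Evidence Analysis
| Activity          | Example                                 | Assessment   |
|-------------------|-----------------------------------------|--------------|
| Authentication    | contractor-01 login from new ASN, 03:12 | NEEDS REVIEW |
| Resource access   | Read-only access to billing S3 bucket   | EXPECTED     |
| Privilege changes | AdminRole attached to contractor-01     | ANOMALOUS    |

## Follow-Ups (by severity)
1. HIGH: Validate the AdminRole attachment with the contractor's manager.
2. LOW: Confirm the new-ASN login matches approved travel.

## Detection Coverage Gaps
- No GitHub audit log ingestion (persistence via repo tokens not visible)
```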
The hunt findings directly inform what happens next. Genuine anomalies or confirmed policy violations become high-priority detection rules, using the hunt queries you’ve already developed as the starting point for the detection logic. Coverage gaps become your instrumentation backlog, prioritized by the severity of threats you can’t currently see.
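As one sketch of that handoff, a hunt query that flagged privilege changes on contractor accounts could seed a Python rule in the Panther detection-as-code style (a module exposing `rule` and `title` functions). The event field names follow AWS CloudTrail, and the contractor naming convention is a hypothetical org policy, not a real rule:

```python
# Sketch of a Panther-style detection rule derived from a hunt finding.
# Field names (eventName, userIdentity) follow AWS CloudTrail; the
# 'contractor-' prefix is an assumed organizational naming convention.
PRIVILEGE_EVENTS = {
    "AttachUserPolicy",
    "AttachRolePolicy",
    "PutUserPolicy",
    "AddUserToGroup",
}

def is_contractor(user_name):
    # Assumed convention: contractor accounts carry a 'contractor-' prefix
    return user_name.startswith("contractor-")

def rule(event):
    # Fire only on privilege-granting API calls made by contractor accounts
    if event.get("eventName") not in PRIVILEGE_EVENTS:
        return False
    actor = event.get("userIdentity", {}).get("userName", "")
    return is_contractor(actor)

def title(event):
    actor = event.get("userIdentity", {}).get("userName", "unknown")
    return f"Privilege change by contractor account {actor}"
```

The allowlist of event names and the actor filter come straight out of the hunt's correlation logic, which is why validated hunt queries make such a strong starting point for detection rules.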
Not everything discovered during hunting needs to become an alert. Some threats are better monitored through scheduled hunting procedures (quarterly supply chain threat hunts, monthly privileged access reviews) than through real-time alerts. The question then becomes: what's the right operational cadence for monitoring this threat, given its likelihood and your triage automation?
The Compounding Value of Agent-Assisted Hunting
At the beginning of this post, we outlined the traditional threat hunting workflow: analysts manually parsing intelligence reports, crafting queries, iterating through log sources, and hoping to find evidence before exhausting their patience. The limiting factor was never whether your team was smart enough to find the threats; it was whether they had enough time to synthesize context from dozens of sources, enough stamina to iterate through query variations, and enough organizational alignment to prioritize the right hunts in the first place. AI agents fundamentally change this calculus by compressing days of manual correlation work into hours of guided analysis, freeing your team to focus on the analytical work that actually requires human judgment: validating hypotheses, assessing business impact, and deciding what deserves sustained monitoring.
What we’ve demonstrated in this post is the full cycle of operationalizing threat intelligence with AI agents and MCP. You start with structured threat models that document business context and detection gaps, align stakeholders on priority threats through focused conversations rather than endless debates, and then deploy agents with direct access to your security data lake to hunt for evidence across multiple log sources simultaneously. The agent synthesizes findings and produces structured assessments with appropriate confidence levels and gaps. This augmentation lets your team operate at a fundamentally different pace, turning quarterly threat-hunting exercises into weekly sprints where you continuously validate your threat model against real evidence in your environment.
The productivity gains compound over time because agents help you maintain detection quality, not just create initial rules. Hunt findings that reveal detection gaps become top priorities for instrumentation. Queries that successfully identify suspicious activity serve as the foundation for new detection-as-code rules. As your threat landscape evolves and new attack techniques emerge, you can re-run hunts against updated intelligence to validate whether your existing detections still provide adequate coverage or whether new blind spots have developed. This is the compounding advantage of AI-first security operations. Your team’s analytical capacity grows with every hunt rather than resetting to zero each time you need to investigate a new threat. In our next post, we’ll show how to formalize these hunt findings into production detection-as-code rules—translating the queries and correlation logic you’ve validated during hunting into sustainable, version-controlled detections that run continuously across your data lake.
Operationalizing AI-Assisted Hunting at Scale
If you’re building this workflow yourself with Claude Code and MCP servers, you now have a blueprint for agent-assisted threat hunting that integrates directly with your existing security data infrastructure. But most security teams don’t have the engineering capacity to maintain custom MCP integrations, develop hunting Skills, and orchestrate these workflows across dozens of analysts. Panther AI brings embedded AI agents directly into your security operations platform, making these capabilities available to your entire team without requiring everyone to become prompt engineers or MCP developers.
Panther’s AI agents have native access to your normalized security data lake and existing detection-as-code rules, enabling them to run ad hoc hunts like the ones we’ve demonstrated here, suggest alert tuning based on historical false-positive patterns, and recommend detection priorities based on threat model alignment. Instead of every analyst learning how to write the perfect hunting prompt, your team gets a shared intelligence layer that operationalizes these workflows consistently across your SOC.
If you’re ready to move from manual threat hunting to AI-assisted security operations, visit panther.com to see how Panther AI accelerates detection engineering and threat hunting workflows for modern security teams.
Cover Photo by Luke Jones on Unsplash