Evolving from Atomic Alerts to Behavioral Signals
Reduce SIEM costs and improve detection accuracy by organizing security data into efficient, searchable signals
Welcome to Detection at Scale—a weekly newsletter covering security monitoring, cloud infrastructure, the latest breaches, and more. Enjoy!
SIEM is entering a new era that transcends traditional data storage. While security teams have historically segmented their data between hot and cold storage tiers, they can now leverage advanced storage formats like Apache Iceberg and modern query engines to build security data lakes capable of handling petabytes of data across multi-year retention periods. This data lake approach, combined with data labeling techniques similar to fact and dimension tables in traditional data warehousing, enables effective risk-based correlation, dramatically improving search performance across the entire security dataset and driving down compute costs. In this post, we'll explore how this emerging signaling pattern reshapes security monitoring and represents a practical evolution for security log data management.
What is a Signal?
A "signal" refers to the meaningful information to detect, while "noise" represents the unwanted distractions that interfere with that information [1]. Audit logging captures all activity in a given system, from read events to system changes to device check-ins. Distinguishing signal from noise in security monitoring can be a fine line, discernible only through the surrounding behavioral context or the business logic under which a team operates.
To refine the definition, a signal is any security-relevant log that helps gauge risk, security posture, or compliance. For example:
Authz/Authn: SSH logins to production, IdP, SSO
Privileged Commands: Infrastructure creation/teardown, sudo commands
Data Movement: Uploads, downloads, copying, and deleting in corp/prod
In the universe of available audit logs, <1% are typically security-relevant. However, it's hard to know in advance which 1% that will be, because we can't anticipate every action an attacker might take during an incident. As a result, the signal pattern suggests amassing a large corpus of data and labeling the "known knowns," which is why highly scalable data layers have become essential.
The signal is the important data and the noise is everything that clutters it.
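As a rough illustration of labeling the "known knowns," here is a minimal Python sketch. The event fields (`actor`, `action`, `env`, `command`, `bytes`), the rule names, and the 100 MB download threshold are all assumptions made for this example and do not come from any particular log source or product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SignalRule:
    """A hypothetical labeling rule: a name, a category, and a predicate."""
    name: str
    category: str
    matches: Callable[[dict], bool]


# Illustrative rules, one per signal category listed above.
RULES = [
    SignalRule(
        "ssh_login_prod", "authn",
        lambda e: e.get("action") == "ssh_login" and e.get("env") == "prod",
    ),
    SignalRule(
        "sudo_command", "privileged_command",
        lambda e: e.get("action") == "exec"
        and str(e.get("command", "")).startswith("sudo "),
    ),
    SignalRule(
        "bulk_download", "data_movement",
        lambda e: e.get("action") == "download"
        and e.get("bytes", 0) > 100_000_000,
    ),
]


def label(event: dict) -> List[str]:
    """Return the names of every rule the event matches (empty list = noise)."""
    return [rule.name for rule in RULES if rule.matches(event)]


events = [
    {"actor": "alice", "action": "ssh_login", "env": "prod"},
    {"actor": "bob", "action": "read", "resource": "/wiki"},
    {"actor": "alice", "action": "exec", "command": "sudo systemctl restart nginx"},
    {"actor": "carol", "action": "device_checkin"},
]

# Keep every raw event in the corpus, but tag the security-relevant ones.
signals = [dict(e, signals=label(e)) for e in events if label(e)]
```

The unlabeled events are not discarded. They stay in the larger corpus so they can still be searched when an incident reveals behavior that no rule anticipated.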
The Building Blocks of Effective Correlation
SIEM's most overstated problem statement is that "security teams are overwhelmed with alerts, and we need to reduce the noise." However, the basic approach has stayed the same: we send an alert whenever any atomic malicious behavior is suspected. We must evolve our approach to collect evidence that proves a given behavior is bad rather than making generic assumptions. This applies to any analysis technique, whether it's behavioral profiling, signature-based detection, or thresholding.
Atomic alerts can still be beneficial if the confidence and impact of a rule are high enough. Otherwise, consider using "security signals" that combine data points to build up a picture of a particular entity's risk. Vertical security solutions are best suited to deliver value on atomic alerts, while a SIEM is better suited to a risk-oriented, evidence-based system. The reason is that vertical solutions are designed to be stateful. For example, a system like Wiz tracks:
A cloud resource's attribute history
Where this resource sits in the infrastructure graph
Whether resource changes cause a lapse in confidentiality, integrity, or availability
The signals approach suggests a new operational framework passing through various phases:
Label logs based on their security-relevance
Group multiple signals of varying impact and confidence by entity
Determine collective confidence and impact and take action
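The three phases above can be sketched in a few lines of Python. This is only one possible scoring scheme, not a prescribed method: the `Signal` fields, the choice to score each signal as confidence times impact and sum per entity, and the alert threshold of 10.0 are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Signal:
    """A labeled, security-relevant event tied to an entity (hypothetical shape)."""
    entity: str        # user, host, or resource the behavior belongs to
    name: str          # label assigned in the first phase
    confidence: float  # 0.0 to 1.0: how likely the behavior is truly bad
    impact: int        # 1 to 10: how damaging it would be if it is bad


ALERT_THRESHOLD = 10.0  # assumed cutoff; tune for your environment


def risk_by_entity(signals: Iterable[Signal]) -> Dict[str, float]:
    """Phase 2: group signals by entity and sum confidence * impact."""
    totals: Dict[str, float] = defaultdict(float)
    for s in signals:
        totals[s.entity] += s.confidence * s.impact
    return dict(totals)


def entities_to_alert(signals: Iterable[Signal],
                      threshold: float = ALERT_THRESHOLD) -> List[str]:
    """Phase 3: act only when the collective score crosses the threshold."""
    scores = risk_by_entity(signals)
    return sorted(e for e, score in scores.items() if score >= threshold)


signals = [
    Signal("alice", "ssh_login_prod", 0.5, 4),
    Signal("alice", "sudo_command", 0.5, 8),
    Signal("alice", "bulk_download", 0.75, 8),
    Signal("bob", "ssh_login_prod", 0.5, 4),
]

print(risk_by_entity(signals))     # {'alice': 12.0, 'bob': 2.0}
print(entities_to_alert(signals))  # ['alice']
```

None of alice's signals would justify an atomic alert on its own in this sketch, but together they cross the threshold, while bob's single production login does not.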
Decreasing Costs and Time-to-Resolve
The signal pattern makes it cost-efficient to store a history of behaviors nearly indefinitely. This pattern parallels the "fact" and "dimension" tables used in business intelligence, where raw business data is transformed into actionable insight. In the security context, "facts" represent security-relevant behaviors, while "dimensions" encompass our entire log corpus.
The key advantages of this approach are:
Query Performance: By tagging and segmenting data into enriched tables, signals create a dramatically smaller dataset compared to raw logs. This reduction enables rapid information retrieval across extended periods.
Cost Efficiency: The reduced dataset size means fewer compute resources are needed for queries and analysis, significantly lowering operational costs.
Investigation Workflow: Signals should be the first stop during an investigation, with source data queried only when additional context is needed. This creates a natural investigation hierarchy that speeds up incident response.
In addition, the recovery phase of incident response can inform new signal creation, improving detection capabilities for similar future incidents.
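To make the signals-first investigation workflow concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a data lake query engine. The table names (`raw_logs`, `signals`), their columns, and the sample rows are hypothetical; a real deployment would use its own schema and engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The full log corpus: large, and expensive to scan.
    CREATE TABLE raw_logs (
        id INTEGER PRIMARY KEY, ts TEXT, actor TEXT, action TEXT, detail TEXT
    );
    -- The much smaller signals table, pointing back at its source rows.
    CREATE TABLE signals (
        id INTEGER PRIMARY KEY,
        log_id INTEGER REFERENCES raw_logs(id),
        ts TEXT, entity TEXT, label TEXT
    );
""")

conn.executemany("INSERT INTO raw_logs VALUES (?, ?, ?, ?, ?)", [
    (1, "2024-01-01T10:00:00Z", "alice", "read", "GET /dashboard"),
    (2, "2024-01-01T10:01:00Z", "alice", "ssh_login", "host=prod-db-1"),
    (3, "2024-01-01T10:02:00Z", "bob", "read", "GET /wiki"),
    (4, "2024-01-01T10:03:00Z", "alice", "exec", "sudo cat /etc/shadow"),
])
conn.executemany(
    "INSERT INTO signals (log_id, ts, entity, label) VALUES (?, ?, ?, ?)", [
        (2, "2024-01-01T10:01:00Z", "alice", "ssh_login_prod"),
        (4, "2024-01-01T10:03:00Z", "alice", "sudo_command"),
    ])


def signals_for(entity: str) -> list:
    """First stop: query only the small signals table."""
    return conn.execute(
        "SELECT label, log_id FROM signals WHERE entity = ? ORDER BY ts",
        (entity,),
    ).fetchall()


def raw_context(log_id: int) -> str:
    """Second stop: fetch the source log only when more context is needed."""
    row = conn.execute(
        "SELECT detail FROM raw_logs WHERE id = ?", (log_id,)
    ).fetchone()
    return row[0]


timeline = signals_for("alice")
detail = raw_context(timeline[-1][1])
```

Keeping a `log_id` pointer on each signal is one way to preserve the path back to the source data, so the expensive lookup against the full corpus happens only for the handful of rows an investigator actually needs.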
Powering the Transition to Data Lakes
The evolution from traditional SIEM to security data lakes represents more than a storage transformation – it's a fundamental shift in how we approach security monitoring and analysis. The signal pattern serves as the bridge between these two worlds, enabling teams to maintain the rapid detection capabilities of SIEM while leveraging the scale and cost benefits of data lakes.
Modern security teams face dual challenges: they must detect threats quickly while maintaining years of searchable history for investigations and compliance. The signal pattern addresses both needs by creating a focused layer of security-relevant data that's fast to query and economical to maintain. Teams can optimize their storage and compute resources based on security relevance instead of treating all data equally or relying on rigid hot/warm/cold tiers.
As organizations generate more security telemetry, efficiently storing, processing, and analyzing this data becomes crucial. Technologies like Apache Iceberg and the signal pattern provide the foundation for scaling security operations without scaling costs proportionally. This pragmatic approach helps security teams maintain the historical context for thorough investigations and compliance requirements.
[1] https://conceptually.org/concepts/signal-and-noise