One important concept I will cover in this post is how organizations take action on the data they receive. Many new security professionals misunderstand this concept and believe that the more data you have, the better off you are. I’ve seen organizations subscribe their tools to random threat intelligence feeds on the theory that “more data makes them better” (see my threat intelligence post to learn why this is a bad idea). I’ve had large SOCs show me many dashboards full of data from various flashy tools, but the truth is that none of these tools or data sources matter on their own. The true value of the data you collect is based on how it is used.
With that being said, I believe there is more value for a student looking to get into security in understanding what type of data they could obtain from a security tool than in understanding what the tool does. The same concept applies to a security operations center (SOC). If an analyst can’t benefit from a data source, that source is not providing value. A prime example is the famous Target breach, in which the SOC had the tools and the alerts but wasn’t using the data effectively, leading to a failure in security.
Beginning of Security Alerts
Let’s kick off this topic by going back in time to the beginning of cybersecurity, when tools and alerts were very limited. I remember hearing Jeff Moss (founder of Black Hat and DEF CON) speak about working for an IPS vendor that would sell you appliances that notified Jeff’s beeper when an event was seen (this occurred sometime in the 1980s). This allowed Jeff to remotely log in and see the alert. Fast forward a few years, and vendors could no longer manually monitor events because of their volume and velocity, so they opened up data logging to their customers. As organizations developed security operations centers to accommodate this requirement of managing their own event data, they realized they needed a centralized tool to collect all of the events. This need for a single tool gave rise to the Security Information and Event Management (SIEM) market, and many organizations have some form of SIEM today.
A SIEM generally performs a few steps on the data it receives. The following is a generalized summary.
- First: Data is parsed into fields with their associated values. An example is identifying the time field and its value (time=09:43).
- Second: Data is normalized and categorized to reduce the collected data into a more useful and organized format.
- Third: Data is enriched by applying additional data and adjustments to give it more value.
- Fourth: Data is indexed so it can be quickly identified during queries.
- Fifth: Data is stored and referenced.
The following diagram represents a summary of what a SIEM will do with the data it receives. The goal is to reduce hundreds of thousands of logs into manageable alerts that a SOC analyst can investigate. Duplicate alerts can be consolidated into one alert with a counter, and indexing allows searching without having to scan all of the data to find the content of interest. All of this can be converted into reports, dashboards, and other tools that allow the SOC to take useful action.
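To make the five steps more concrete, here is a minimal Python sketch of the same flow. This is not how any particular SIEM is implemented. The key=value log format, the `ASSET_OWNERS` lookup table, and every function name are assumptions I made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw events in a simple key=value format (not a real vendor format).
RAW_LOGS = [
    "time=09:43 src=10.0.0.5 dst=203.0.113.9 action=DENY",
    "time=09:44 src=10.0.0.5 dst=203.0.113.9 action=deny",
    "time=09:45 src=10.0.0.8 dst=198.51.100.7 action=Allow",
]

# Hypothetical enrichment table: which asset owns each source IP.
ASSET_OWNERS = {"10.0.0.5": "finance-laptop", "10.0.0.8": "web-server"}


def parse(line):
    """Step 1: split a raw line into field/value pairs."""
    return dict(pair.split("=", 1) for pair in line.split())


def normalize(event):
    """Step 2: bring values into one consistent form."""
    event = dict(event)
    event["action"] = event["action"].lower()
    return event


def enrich(event):
    """Step 3: add context the raw log did not carry."""
    event = dict(event)
    event["asset"] = ASSET_OWNERS.get(event["src"], "unknown")
    return event


def run_pipeline(lines):
    store = []                 # Step 5: stored events
    index = defaultdict(list)  # Step 4: (field, value) -> positions in store
    alerts = {}                # duplicates consolidated into one alert with a counter
    for line in lines:
        event = enrich(normalize(parse(line)))
        position = len(store)
        store.append(event)
        for field in ("src", "dst", "action"):
            index[(field, event[field])].append(position)
        key = (event["src"], event["dst"], event["action"])
        if key not in alerts:
            alerts[key] = {"first_seen": event["time"], "asset": event["asset"], "count": 0}
        alerts[key]["count"] += 1
    return store, index, alerts


store, index, alerts = run_pipeline(RAW_LOGS)
for key, alert in alerts.items():
    print(key, alert)
```

In this sketch, three raw logs become two alerts because the first two events normalize to the same source, destination, and action. A lookup such as `index[("action", "deny")]` returns the positions of matching events without scanning the whole store, which is the idea behind indexing.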
A SIEM is extremely useful but does not solve all of the challenges faced by analysts on today’s networks. For starters, SIEMs do not take any action on events. As a result, organizations have to develop their own processes for how they manually respond to events. These processes can be converted into repeatable responses, also known as playbooks, which have the potential to be automated with the right technology. As the volume and velocity of events increase, analysts are once again experiencing the data overload problem, even with a SIEM. This new version of the problem is not caused by too much raw data but by too many consolidated alerts that need to be manually investigated. Essentially, organizations with SIEMs are now experiencing case management overload, which leads to spending 100% of their time responding to a never-ending river of events. On top of that, the upkeep of the SIEM continues to increase in complexity, causing useful dashboards to turn into useless displays of data.
Another challenge with a SIEM is its inability to deal with all of the potential and available data sources. Consider big data as an example, which represents data sets that are too big for a SIEM to collect. Another example is unstructured data that can’t be properly parsed by a SIEM, such as images or data scraped from social media sources. This opens up the need for additional tools to accommodate the data the SIEM can’t handle, such as Hadoop for big data, which means more infrastructure and complexity.
Even data that can be collected by a SIEM may not be processed properly without custom parsing or tuning, which requires lots of manual labor. One situation I dealt with for a customer was importing Syslog data from Cisco Stealthwatch (a NetFlow security tool) into Splunk (a SIEM). Syslog allows for an open data format, which is bad for SIEM vendors that expect data in a specific format so the SIEM knows how to parse it. Using the default Syslog template within Stealthwatch, I found Splunk was not able to properly parse the Stealthwatch data. The results looked like the following, which has the data I want, but I can’t take any useful actions with it. The arrows represent key data points, but Splunk didn’t index them properly.
I had to research the issue and found a better template (on the IBM QRadar website of all places) before I was able to get the Stealthwatch data to be properly parsed within Splunk. By doing so, I was able to develop useful widgets and reports within minutes, as shown.
The following two images show the default template compared against my custom template for sending data to Splunk. The point of this example is that tuning and troubleshooting SIEM data parsing can be extremely tedious and complex. Imagine the skillset and time required to manage 50 or more tools sending data to a SIEM!
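As a rough illustration of why the message template matters so much, the Python sketch below runs a generic key=value extractor over two messages. Both messages are invented for this example and are not real Stealthwatch output, and `extract_fields` is a hypothetical helper, not a Splunk function. The point is only that the same information becomes easy to parse once the sender labels each value.

```python
import re

# Both messages are invented for illustration; neither is real Stealthwatch output.
FREE_TEXT_STYLE = "Alarm High Concern Index raised for host 10.1.1.20 talking to 203.0.113.50"
LABELED_STYLE = 'alarm_type="High Concern Index" src=10.1.1.20 dst=203.0.113.50 severity=major'

# Matches key=value pairs, where the value is either "quoted text" or a bare token.
KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def extract_fields(message):
    """Return every key=value pair found in the message as a dict."""
    fields = {}
    for key, quoted, bare in KV_PATTERN.findall(message):
        fields[key] = quoted if quoted else bare
    return fields


print(extract_fields(FREE_TEXT_STYLE))  # no labeled fields to find
print(extract_fields(LABELED_STYLE))    # alarm_type, src, dst, and severity are recovered
```

With the free-text message, the extractor finds nothing to index, even though a human can read the host and destination right away. With the labeled message, each value lands in a named field that a report or dashboard can use.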
The SOAR Market
There are key features SIEM technology lacks, which SOCs are throwing money at to obtain. First, organizations want automation of actions based on playbooks so their analysts can focus on more complex tasks. SOCs also need a way to manage the lifecycle of an event, more commonly known as case management, ensuring that the SOC and non-SOC teams involved are tracked and held accountable for their actions. Finally, SOCs need to accommodate all data sources, beyond what a SIEM is able to process, to ensure they are aware of the current threat landscape. All of these requirements were heard by the industry and led to the creation of the Security Orchestration, Automation, and Response (SOAR) market. The next figure shows a comparison between SIEM and SOAR.
There are a few general concepts I have found regarding most SOAR offerings in today’s market.
- Most SOAR offerings require a SIEM for their data.
- SOAR vendors will instruct customers to only automate simple tasks.
- Default playbooks rarely match the actions needed in the real world. It is best to modify what exists and/or build your own, which takes time.
- Responses can include many things, such as alerting people, sending notifications, and instructing systems to do something.
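To show what a playbook could look like once it is written down as code, here is a minimal Python sketch. It does not represent any SOAR vendor's playbook format. The alert fields, the `PLAYBOOKS` table, and the step functions are all hypothetical, and each step only records what it would do instead of calling a real system.

```python
def open_case(alert, log):
    log.append(f"case: opened for {alert['host']}")


def block_ip(alert, log):
    # A real integration would call a firewall API here; this stub only records intent.
    log.append(f"block: {alert['remote_ip']} would be blocked")


def notify_analyst(alert, log):
    log.append(f"notify: analyst paged about {alert['type']} on {alert['host']}")


# Hypothetical playbooks: an ordered list of simple steps per alert type.
PLAYBOOKS = {
    "known_bad_ip": [open_case, block_ip, notify_analyst],
    "suspicious_login": [open_case, notify_analyst],
}


def run_playbook(alert):
    """Run the matching playbook, or hand the alert to a human if none exists."""
    log = []
    steps = PLAYBOOKS.get(alert["type"])
    if steps is None:
        log.append(f"manual: no playbook for {alert['type']}, escalate to a human")
        return log
    for step in steps:
        step(alert, log)
    return log


alert = {"type": "known_bad_ip", "host": "pc-17", "remote_ip": "203.0.113.50"}
for entry in run_playbook(alert):
    print(entry)
```

Only simple, well-understood alert types get an automated path here, and anything without a playbook falls back to an analyst. That matches the advice above to automate simple tasks first. The returned log also doubles as a basic case record of who or what acted on the event.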
Many popular SIEM vendors have either acquired a third-party SOAR or built their own SOAR offering to fill the void in capabilities left by their SIEM platform. Splunk acquired Phantom. IBM, whose SIEM is QRadar, has Resilient as its SOAR. Which vendor is best for an organization? That depends on which tool is capable of delivering the best results. In general, these are factors an organization must consider regarding its SIEM / SOAR investment:
- Many SIEM / SOAR vendors bill on data usage, which can be measured as events per second. The more data you send to the system, the higher your periodic bill will be or the larger the license you will need to acquire. Vendors sometimes give away their hardware/software knowing an organization will become dependent on them as it sends data, which lets the vendor slowly increase billing as more data is seen.
- Reducing the data means a smaller bill from many SIEM/SOAR vendors. This has led to middleware and other tools that clean up data before it is processed by a SIEM, reducing an organization's overall SIEM/SOAR expenses.
- Default SOAR automation and playbooks are not great. You must be ready to develop or modify templates to meet your custom needs to see the true value.
- The value of a SIEM / SOAR solution is based on the quality of the data received. If data is not received regarding the security posture of a certain part of the network, you will be blind to events within that part of the organization. If the data is not properly processed and managed by the SIEM / SOAR, you will also be blind to the source of that data based on the poor results you will receive.
- Somebody has to be assigned to the solution and business outcomes must be developed in order to measure the value received. Not having measurable value puts the SIEM/SOAR project at risk of being viewed as a failure.
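The billing points above come down to simple arithmetic, which the Python sketch below walks through. The event mix, the one-minute window, and the `reduce_events` helper are invented for illustration. Real middleware would usually keep a duplicate counter instead of discarding repeats outright, and real pricing models vary by vendor.

```python
def events_per_second(event_count, window_seconds):
    """Average event rate over a time window."""
    return event_count / window_seconds


def reduce_events(events, drop_types):
    """Drop noisy event types and collapse exact repeats before forwarding."""
    seen = set()
    kept = []
    for event in events:
        if event["type"] in drop_types:
            continue
        key = (event["type"], event["src"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(event)
    return kept


# Invented sample: one minute of traffic from three sources.
events = (
    [{"type": "heartbeat", "src": "10.0.0.1"}] * 300
    + [{"type": "deny", "src": "10.0.0.5"}] * 240
    + [{"type": "malware", "src": "10.0.0.8"}] * 60
)

kept = reduce_events(events, drop_types={"heartbeat"})
before = events_per_second(len(events), 60)
after = events_per_second(len(kept), 60)
print(f"before: {before:.2f} EPS, after: {after:.2f} EPS")
```

The trade-off is the one described in the list: every event type dropped to lower the bill is also data the SIEM / SOAR can no longer see, so the filter rules deserve the same scrutiny as the tools themselves.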
XDR Comes to Market
One final data management concept that has recently hit the security market is XDR. This concept stems from endpoint detection and response (EDR), which is an endpoint threat detection solution that combines real-time monitoring and data collected from endpoints to automate response and analysis. If you read this closely, you should find it sounds pretty similar to what the SIEM/SOAR market is trying to do; however, EDR is just for endpoints. EDR vendors also saw this connection and decided to open up their offerings to devices beyond endpoints. As a result, they changed the E representing endpoint to an X, which represents extended, as in extended detection and response (XDR). The X effectively means everything else, which makes XDR similar to what the SIEM/SOAR market is offering and causes confusion about the value of either approach. The following is an example from LogRhythm, a market leader for SIEM, now claiming it is a next-generation SIEM or, as they put it … XDR!
I’ll close this post with the same statement I opened with: the value of data is based on how it is used. What students need to understand is that organizations have specific business outcomes they want to obtain from tools such as SIEM, SOAR, and XDR. Organizations are throwing tons of money at achieving these outcomes, opening the door for students to learn skills such as programming, leveraging APIs, and understanding data workflows, all of which will make them very valuable for solving this data processing requirement.
At Cisco, where I work, we have the new DevNet certification program targeting these skillsets. The focus is on how to achieve business outcomes using existing tools, APIs, and expected data rather than traditional security education about what a security tool does and how to configure it. I highly recommend that people looking to get into security, or those already in security who want to modernize their skills, check out https://developer.cisco.com/. Also make learning more about leveraging SIEM, SOAR, and XDR for business outcomes part of your security education, alongside API and other programming learning tracks. These skills are critical and highly desired by SOCs around the world.