Incident Management Within the SOC: Playbooks and DevOps Explained

I posted about current trends in security, including heavy investments in Secure Access Service Edge (SASE) and Security Orchestration, Automation, and Response (SOAR). Both of these topics are increasing the need for DevOps skills, so I thought I would write a post about how DevOps fits into a security operations center (SOC). This post will explain the difference between DevOps and programming, show how workflow management can be automated, and provide a better understanding of where security orchestration, automation, and response technologies are used within an organization.


First, let’s look at the concept of a playbook.

Playbooks

Playbooks represent the steps taken when a specific type of incident occurs. For example, a SOC that is responsible for protecting endpoints will need a plan for how to respond when a virus outbreak occurs. Without a plan, people within the SOC won’t know what to do and the result will be a poor response. A playbook works through each step of the incident management process, from building a team through detecting and responding to eventually returning the system to a normal state post-incident. The Incident Response Consortium, found at www.incidentresponse.com, offers free playbooks broken into different stages of responding to common security events. Those stages are the following:

  • Prepare – How to set up the program that will be responsible for performing the steps within the playbook.
  • Detect – How to determine when something has occurred that triggers the response to the event.
  • Analyze – How to confirm or deny that the incident is real and determine what it is.
  • Contain – How to prevent the threat from spreading.
  • Eradicate – How to prevent further damage and resolve the problem.
  • Recover – How to return the impacted systems to an operational state.
  • Post-Incident handling – Lessons learned and a review of how to improve future response.
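The ordering of these stages can be sketched in a few lines of Python. This is only an illustration of the sequence listed above; the names `Stage`, `next_stage`, and `run_playbook` are hypothetical and do not come from the Incident Response Consortium playbooks.

```python
from enum import Enum


class Stage(Enum):
    """The seven playbook stages listed above, in order."""
    PREPARE = "Prepare"
    DETECT = "Detect"
    ANALYZE = "Analyze"
    CONTAIN = "Contain"
    ERADICATE = "Eradicate"
    RECOVER = "Recover"
    POST_INCIDENT = "Post-Incident handling"


def next_stage(current):
    """Return the stage that follows `current`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None


def run_playbook(incident_name):
    """Walk the stages in order and return a simple log of what was visited."""
    log = []
    stage = Stage.PREPARE
    while stage is not None:
        log.append(f"{incident_name}: {stage.value}")
        stage = next_stage(stage)
    return log
```

Calling `run_playbook("virus outbreak")` returns one log entry per stage, starting with Prepare and ending with Post-Incident handling.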

The following image comes from the playbook for the eradicate stage of responding to a virus outbreak found at www.incidentresponse.com.

The first thing to look at is the format being used. Playbooks have a starting point and an endpoint and move forward from the starting point to one or more ending points. An example of having more than one ending point would be a playbook for troubleshooting a problem: if an answer can be found, the playbook ends along that path, while if an answer can’t be found, a “call customer support” option can be another possible ending. Steps are highlighted but the details are not displayed. It is common to attach detailed documentation explaining what needs to occur for each step within a playbook’s notes.
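One way to picture a playbook with more than one ending point is as a small directed graph. The sketch below models the troubleshooting example; the step names and the helpers `end_points` and `all_paths` are made up for illustration, and the path listing assumes the playbook has no loops.

```python
# Hypothetical troubleshooting playbook: each step maps to the steps that may follow it.
PLAYBOOK = {
    "start": ["search knowledge base"],
    "search knowledge base": ["apply fix", "call customer support"],
    "apply fix": ["end: resolved"],
    "call customer support": ["end: escalated"],
    "end: resolved": [],
    "end: escalated": [],
}


def end_points(playbook):
    """Steps with no outgoing path are the playbook's ending points."""
    return sorted(step for step, following in playbook.items() if not following)


def all_paths(playbook, step="start"):
    """List every forward path from `step` to an ending point (no loops assumed)."""
    following = playbook[step]
    if not following:
        return [[step]]
    return [[step] + rest for nxt in following for rest in all_paths(playbook, nxt)]
```

Here `end_points(PLAYBOOK)` finds the two endings and `all_paths(PLAYBOOK)` lists the two routes an analyst could follow to reach them.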

This example playbook does not have a decision point, which is commonly represented as a diamond and asks a question such as “does this do something?”, where “Yes” leads to one workstream and “No” leads to another. Either workstream could return to an earlier step, creating a loop. An example of a loop is a troubleshooting step that asks the analyst to wait five minutes and see if something comes up. If Yes, the workstream continues; if No, it loops back to wait another five minutes, essentially asking the analyst to wait until a response comes up. The next image shows common symbols used within playbooks.
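The wait-and-check loop described above could be sketched as follows. The function name `wait_for_response` and its parameters are hypothetical, and the `max_checks` limit is my own assumption so the loop cannot run forever; the text itself describes waiting until a response comes up.

```python
import time


def wait_for_response(check, wait_seconds=300, max_checks=12, sleep=time.sleep):
    """Decision point: 'did a response come up?'

    Yes -> continue to the next step (return True).
    No  -> wait and loop back, up to max_checks times (then return False).
    """
    for _ in range(max_checks):
        if check():
            return True
        sleep(wait_seconds)
    return False


# Simulated run: the response shows up on the third check.
# The sleep is replaced with a no-op so the example finishes immediately.
answers = iter([False, False, True])
continued = wait_for_response(lambda: next(answers), sleep=lambda seconds: None)
```

In a real playbook, `check` would be whatever test the step calls for, such as looking for a new log entry or a reply to a ticket.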

Automation

Developing playbooks is the first step to organizing how a SOC responds to incidents. Another useful step is applying automation where applicable. Automation is ideal for simple tasks such as sending an email, triggering a tool to apply some change, or copying something for future evaluation. Complex tasks that involve human behavior, such as detecting social engineering attacks or predicting if a user will like something, are not ideal for automation. Technologies such as artificial intelligence that leverage big data are improving automation capabilities, but humans are and always will be needed. Automation is best seen as a way to reduce the SOC’s tedious and mundane tasks so analysts can use their time for more complex work, rather than as a way to remove the need for SOC analysts.

I covered security orchestration, automation, and response (SOAR) in a previous post. The next image shows Cisco SecureX, which allows automation to be configured within a playbook. This example playbook explains how to send a URL to a cloud-based threat intelligence tool so it can be evaluated for risk. Steps include going to a website, logging into the website to validate that you have a valid license to use the service, entering a URL, and pulling down the results. This isn’t a complex task, but it would take a SOC analyst 5-10 minutes to do this entire process manually, leading to lots of wasted time if 50 or more websites need to be evaluated each week. The next image shows that the playbook has captured each step, and an analyst can click each step and define the automated actions to be taken. First, the URL being evaluated is checked to ensure it is indeed a valid URL. The login process is then broken down into steps and automated, with the final step pulling the results and pasting them into a Cisco Webex Teams collaboration space. By applying automation to this playbook, the SOC can simply drop URLs into the playbook and the results will automatically end up within the SOC’s collaboration space, removing the manual steps otherwise needed to get this data.
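To make the flow of that playbook concrete, here is a minimal Python sketch of the same idea: validate the URL, look it up, and post the result. It is not how SecureX implements the workflow. The `lookup` and `post_to_space` callables are hypothetical stand-ins for the threat intelligence service and the collaboration space, and the validity check (an http or https scheme plus a host) is my own simple assumption.

```python
from urllib.parse import urlparse


def is_valid_url(candidate):
    """Step 1: accept only http(s) URLs that include a host name."""
    try:
        parsed = urlparse(candidate.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def evaluate_urls(urls, lookup, post_to_space):
    """Run each URL through the playbook: validate, look up, post the result.

    Returns the URLs that failed validation so an analyst can review them.
    """
    skipped = []
    for url in urls:
        if not is_valid_url(url):
            skipped.append(url)
            continue
        verdict = lookup(url)
        post_to_space(f"{url} -> {verdict}")
    return skipped


# Stand-ins so the sketch runs without any real service.
posted = []
skipped = evaluate_urls(
    ["https://example.com/login", "not a url"],
    lookup=lambda url: "unknown",
    post_to_space=posted.append,
)
```

Passing the lookup and posting steps in as functions keeps the playbook logic separate from any one vendor’s tools, which is the same separation a SOAR platform provides through its integrations.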

Playbooks can be much more complex, such as the example shown next, taken from Splunk’s SOAR, known as Phantom. As playbooks increase in complexity, case management concepts can be applied, alerting people to review outcomes and provide input as the playbook is executed.

The goal for many SOARs is saving time, which in turn saves the SOC money. The next image shows how the main dashboard for Phantom is focused on the time it takes to resolve events and the dollars saved as a result. The more automation and orchestration that can be applied to a playbook, the more time can be saved, leading to a more effective SOC.
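As a rough worked example of time turning into dollars, the sketch below uses the figures from the URL lookup example earlier (50 URLs a week at 5 to 10 minutes each). The hourly cost of 40.0 is purely a hypothetical number chosen for illustration, not a figure from Phantom or from this post.

```python
def weekly_savings(urls_per_week, minutes_per_url, hourly_cost):
    """Return (hours saved per week, dollars saved per week)."""
    hours = urls_per_week * minutes_per_url / 60
    return hours, hours * hourly_cost


HOURLY_COST = 40.0  # hypothetical analyst cost per hour

low_hours, low_dollars = weekly_savings(50, 5, HOURLY_COST)
high_hours, high_dollars = weekly_savings(50, 10, HOURLY_COST)

print(f"Low estimate:  {low_hours:.1f} hours, ${low_dollars:.2f} per week")
print(f"High estimate: {high_hours:.1f} hours, ${high_dollars:.2f} per week")
```

Under those assumptions, automating that single playbook frees up roughly four to eight analyst hours every week.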

The next image shows an example of clicking into a step within a Phantom playbook and configuring automation. For those who have programming backgrounds, this environment will look very familiar. This leads us to the last topic, which is DevOps.

DevOps

Programming means creating or modifying a program using a programming language. There are reasons to write your own programs; however, there are also reasons to speed up the time to value by either using an open-source tool, which requires less work than building a tool from scratch, or just purchasing a fully operational tool from a vendor. Programming allows you to build the exact tool you want; however, you are also responsible for all updates, support, and so on. Open source is free and allows you to start with a working tool; however, open source does not have vendor support and tends to require a lot of customization work before it can provide value. A purchased tool comes with support and is typically much easier to use than an open-source tool; however, those benefits come at a cost.

Looking back at the SOC, it will acquire tools to accomplish different goals. Regarding automation, the goal is not to build new tools, meaning a programmer is not needed to write new software. The focus of automation from an IT operations viewpoint is allowing tools to work together by leveraging APIs and other capabilities. An API is essentially a way for two or more systems to speak to each other, just as a user interface is how humans interact with computers. This style of programming, focused on making things work together, has its own discipline known as DevOps.
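The idea of two systems speaking to each other through an API can be sketched without any network at all. In the example below, a detection tool produces an alert as JSON and a ticketing tool accepts it. Both functions, the field names, and the host and signature values are hypothetical; real products define their own formats.

```python
import json


def build_alert(host, signature):
    """Detection tool side: serialize an alert into the agreed JSON format."""
    return json.dumps({"host": host, "signature": signature, "severity": "high"})


def create_ticket(request_body):
    """Ticketing tool side: a hypothetical API handler that accepts the alert."""
    alert = json.loads(request_body)
    missing = [field for field in ("host", "signature") if field not in alert]
    if missing:
        return json.dumps({"status": "rejected", "missing": missing})
    title = f"{alert['signature']} on {alert['host']}"
    return json.dumps({"status": "created", "title": title})


# The "glue" code: take one tool's output and hand it to the other tool's API.
response = json.loads(create_ticket(build_alert("laptop-042", "Trojan.Generic")))
```

Neither tool was rewritten here. The only new code is the last line that connects them, which is the kind of work the automation described above is about.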

DevOps Certifications

Cisco is one of the many vendors that have changed the type of training provided for future IT professionals. The biggest change to the Cisco certification program is a huge focus on DevOps. In the Cisco world, this is known as DevNet, which refers to a network professional who can program tools to work with each other. Programming languages such as Python and C++ are great, but in the DevOps world, concepts such as YAML and YANG and automation tools such as Ansible target IT operations rather than what is common in programming courses. The next image shows a comparison of the traditional network engineer learning path versus that of a software-focused engineer looking to learn DevOps.

For those interested in gaining some hands-on experience with DevOps labs, Cisco offers free online labs you can access right now. Go to https://developer.cisco.com/site/sandbox and create a free account. The next image shows examples of different DevOps labs available there.

An example of one lab is sending commands to a Cisco router using RESTCONF, which is a much better option for sending commands than older protocols such as SNMP. One lab has you use Postman, a tool that gives you a GUI to send RESTCONF commands as well as save those commands within collections to simplify DevOps steps. Postman is a free tool found at https://www.postman.com/downloads/.
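For readers who prefer code to a GUI, the kind of RESTCONF request Postman sends can also be built with Python’s standard library. The sketch below only constructs the request and does not send it. The host name and credentials are placeholders, the `/restconf` root is a common default that can differ per device, and `ietf-interfaces:interfaces` is used here as an example of a standard YANG resource rather than something taken from the Cisco lab.

```python
import base64
import urllib.request


def build_restconf_get(host, resource, username, password):
    """Build (but do not send) a RESTCONF GET request for a YANG resource."""
    url = f"https://{host}/restconf/data/{resource}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "application/yang-data+json",  # RESTCONF JSON media type
            "Authorization": f"Basic {token}",       # HTTP basic authentication
        },
    )


# Placeholder host and credentials; replace with the values from your lab.
request = build_restconf_get(
    "router.example.com", "ietf-interfaces:interfaces", "admin", "changeme"
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)` against a reachable device, which is the same step as pressing Send in Postman.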

Wrap Up

I’ll conclude this lesson by bringing all of the concepts together. Every organization has some team responsible for security, which makes up the security operations center (SOC). SOCs continue to have challenges responding to threats and are using automation as a way to reduce the number of tedious and mundane tasks, allowing analysts to spend time on more complex assignments. Technologies such as security orchestration, automation, and response (SOAR) require a unique style of programming skills commonly referred to as DevOps. DevOps skills are in demand and will continue to increase in importance, offering a rich career path for future IT professionals. Cisco offers free labs for those interested in learning more about DevOps. I hope you enjoyed this lesson and now have a better understanding of why DevOps is such an important topic to include in your cybersecurity courses. Be safe and stay secure!
