Establishing a full-fledged Security Operations Center (SOC): "How to build a SOC"

Note: This document is still in progress, but you may (hopefully) find the finished parts useful.

Motivation

Unlike big enterprises, many SMBs (even IT companies) generally don't have much budget for security. When you ask why, the usual answer is that they 'won't be targeted by an APT actor', but in many cases that is not true. For example, if an SMB provides third-party services to an enterprise, it may be low-hanging fruit for an APT actor looking to pivot towards the real enterprise target. At the end of the day, the SMB may even go bankrupt because it failed to defend its network.

Definitions

A SOC doesn't consist only of technologies but also of people and processes. So if you only install your tools, you won't have a SOC in the real sense but a SOC environment. To fully operate a SOC, you need to write down all your processes, hire people, train them, and practice these processes with them. We will also cover these steps.

Visibility and knowing your normal

The most critical part of a SOC is knowing your environment (your normal); the second is getting visibility into all ongoing activities in this environment.

Sample questions are as follows:

1-) Should Server_Y (IP: 10.5.33.34) communicate with a UK IP over port 8443 using HTTPS? Is there a business justification and approval for this?

2-) Should WorkstationA and WorkstationB communicate with each other and transfer a ~1MB file over port 445 at 9:00 every morning?

3-) Will you know the SHA-2 hash of a file downloaded by UserX on WorkstationA? If this file launches powershell.exe and runs the Get-Process cmdlet after the download, will you see the executable name (powershell.exe) and its command-line parameters (Get-Process)?

4-) If there's beaconing ping (ICMP) traffic to a Chinese IP, will you see it and know whether it's normal? Maybe it's malware? Maybe it's CDN traffic for a critical business app? Are you aware of your policies for cloud CDN usage?
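To make the beaconing question concrete, here is a minimal sketch, assuming you already have the connection timestamps for one source/destination pair (for example, exported from your network logs). The function name, the thresholds, and the idea of using the coefficient of variation as the regularity measure are illustrative assumptions, not a tuned detection.

```python
from statistics import mean, pstdev

def looks_like_beacon(timestamps, min_events=6, max_jitter=0.1):
    """Return True when connections recur at near-constant intervals.

    timestamps: epoch seconds for one source/destination pair.
    min_events and max_jitter are illustrative thresholds, not tuned values.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return False
    # Coefficient of variation: spread of the intervals relative to their mean.
    return pstdev(intervals) / avg <= max_jitter
```

A regular pattern is only a hint. Whether it is malware or a legitimate business application still depends on knowing your normal, so a result like this should feed an investigation rather than close one.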

Two sides of visibility: "Network vs. Endpoint" (and filling the gaps between these)

There are some machines we can install an agent on and some we can't (printers, cameras, some OT devices, other IoT devices, etc.). When you install a kernel-mode agent on a system, in most cases you can see almost all ongoing activity on that machine: DLL loads, registry changes, process creations and terminations, network connections, etc. Sysmon can be used for these purposes.

For the machines where we cannot install a kernel-level agent, there is still a chance for network visibility: for example, a brute-force attempt against a network-aware smart printer, an attacker trying to penetrate an ICS device, or lateral movement over SMB from a workstation (patient zero) to a server holding crown-jewel data. Zeek or Suricata can be used for these purposes. Besides this, some attacks are easier to detect using network data, since they consist of a group of network anomalies rather than individual system events.

Our challenge is generally filling the gaps between these two data types: network and endpoint data. When a file is copied from one endpoint to another, you sometimes need to track the file inside the network data, especially if one side of the transfer doesn't have an agent installed. The Community ID project can be used to correlate the fingerprint of the same activity from both the endpoint and the network perspective. It's like saying: "(...) the file we've just seen in the wire data belongs to a data transfer triggered by WorkstationA, to a destination named WorkstationB, which doesn't have any endpoint data recorder installed."
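To show why a shared flow identifier helps with this correlation, here is a minimal sketch of a Community ID v1 style hash for TCP/UDP flows, written from my understanding of the published specification. It skips ICMP and other protocols, the function name is my own, and the sample addresses are made up; for production use, prefer a maintained implementation over this sketch.

```python
import base64
import hashlib
import ipaddress
import struct

def community_id_v1(src_ip, src_port, dst_ip, dst_port, proto=6, seed=0):
    """Community ID v1 style hash for a TCP (6) or UDP (17) flow."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints so both directions of a flow hash identically.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port
    data = (
        struct.pack("!H", seed)
        + src
        + dst
        + struct.pack("!BB", proto, 0)  # protocol number plus one padding byte
        + struct.pack("!HH", src_port, dst_port)
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")

# The same value is produced whichever side reports the connection.
from_endpoint = community_id_v1("10.5.33.34", 49152, "10.5.33.60", 445)
from_network = community_id_v1("10.5.33.60", 445, "10.5.33.34", 49152)
```

Because the endpoint log and the network sensor can each derive the same string from the 5-tuple, a single field is enough to join the two data sets, even when only one side of the transfer has an agent.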

Threat Intelligence and Detection Engineering

Do you know which threat actors are likely to target your company/region/sector, and how they are likely to behave (their TTPs) once they've penetrated your perimeter? Are you ready to detect them? MITRE created a knowledge base of this attacker behaviour, ATT&CK. You can select any threat group and learn its tools and tactics in order to detect related activity.

Detection engineering is the other side of the story. When you know what to detect, you should also know how to detect it, and test your detections periodically. You should know the capabilities of your tools, i.e. the logs they can generate versus the logs they currently generate with their active configuration, so that you know whether any change is needed to get the right logs into your central log manager / SIEM. You can refer to MITRE's Cyber Analytics Repository (CAR) for guidance on analytics. Another critical part of detection engineering is testing. You should build test mechanisms so that you know your analytics work properly every time. An automated validation and test pipeline is very effective for this, and keeping your detection logic under version control (Git etc.) also helps you track changes to it.
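The idea of testing analytics can be sketched as a tiny regression check for a single detection. Everything here is hypothetical: the rule, the field names (loosely modeled on process-creation events), and the sample events are only there to show the shape of such a test, not a recommended analytic.

```python
def detect_encoded_powershell(event):
    """Hypothetical analytic: PowerShell launched with an encoded command."""
    image = event.get("Image", "").lower()
    cmdline = event.get("CommandLine", "").lower()
    return image.endswith("powershell.exe") and " -enc" in cmdline

# Each case pairs a sample event with the verdict the analytic must return.
TEST_CASES = [
    ({"Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
      "CommandLine": "powershell.exe -enc SQBFAFgA"}, True),
    ({"Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
      "CommandLine": "powershell.exe Get-Process"}, False),
    ({"Image": r"C:\Windows\System32\cmd.exe",
      "CommandLine": "cmd.exe /c dir"}, False),
]

def run_detection_tests(rule, cases):
    """Return the cases where the rule's verdict differs from the expectation."""
    return [(event, expected) for event, expected in cases
            if rule(event) != expected]

failures = run_detection_tests(detect_encoded_powershell, TEST_CASES)
```

Running a check like this on every change to the detection logic, with an empty failure list as the pass condition, is one simple way to notice when an edit silently breaks an analytic.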

There is also an open-source analytics project called Sigma, which provides free and relatively good analytics for threat detection. You can also use free YARA signatures for file/mail scanning and free Zeek and Snort/Suricata signatures for your network detections.

To track the latest malware and attacker activity, you can use MISP.

Metrics, measurements and maturity

As in threat detection and response processes, situational awareness/visibility is also important for your SOC and its metrics. How well do you handle alerts? What are your average triage and IR times? Are your processes mature? Is there any analyst burnout? Can your analysts do proper investigations before and after an incident? Measuring these kinds of metrics and knowing your maturity is also very important for future budget planning and possible improvement plans.
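As a minimal sketch of how averages such as triage time could be computed, assume each alert record exported from your case management tool carries ISO-8601 timestamps. The field names (created, triaged, closed) and the two sample alerts are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime
from statistics import mean

def mean_minutes(alerts, start_field, end_field):
    """Average elapsed minutes between two ISO-8601 timestamps per alert."""
    durations = []
    for alert in alerts:
        start = datetime.fromisoformat(alert[start_field])
        end = datetime.fromisoformat(alert[end_field])
        durations.append((end - start).total_seconds() / 60)
    return mean(durations)

# Hypothetical sample data: two alerts with made-up timestamps.
alerts = [
    {"created": "2024-01-08T09:00:00", "triaged": "2024-01-08T09:12:00",
     "closed": "2024-01-08T10:00:00"},
    {"created": "2024-01-08T11:30:00", "triaged": "2024-01-08T11:48:00",
     "closed": "2024-01-08T13:30:00"},
]
mean_time_to_triage = mean_minutes(alerts, "created", "triaged")  # 15.0 minutes
mean_time_to_close = mean_minutes(alerts, "created", "closed")    # 90.0 minutes
```

Tracking these numbers over time, rather than as one-off values, is what makes them useful for budget and improvement discussions.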

Tooling and architecture

When it comes to tooling and architecting the SOC, there are many different approaches and strategies, but it really comes down to the quality of your people and your budget. Do your people know scripting? Are they comfortable with open-source tools and integrations? Take the SIEM/logging solution as an example: if you want to go with Elastic, you should carefully plan your hardware and maintenance costs (even simple authentication for Kibana will take extra effort), but there won't be any per-EPS or per-data-volume licensing.

Here are some other fundamental tools for your SOC, with free and commercial alternatives:

  • Antivirus: Free: ClamAV | Commercial: Kaspersky
  • EDR: Free: Sysmon, osquery | Commercial: Carbon Black, FireEye HX
  • SIEM: Free: Elastic | Commercial: Splunk, QRadar
  • NDR/NTA: Free: Zeek, Suricata, Security Onion | Commercial: Corelight, Fidelis, Vectra
  • Case management/SOAR: Free: TheHive | Commercial: Resilient, ATAR
  • TIP: Free: MISP | Commercial: ThreatQ, ThreatConnect
  • Attack simulation: Free: Caldera | Commercial: Picus, Verodin
  • Vulnerability scanner: Free: OpenVAS | Commercial: Security Center/Nessus
  • Live response/forensics: Free: FastIR | Commercial: KAPE, IREC

Organization and governance

The organization of your SOC is also very important. How many people are there? Are there any dedicated IR people? Will your alert monitoring team also create your detections? How will escalations be done? Is there a specific triage team, or will every alert go to the IR team (which will overload your IR people)?

I recommend at least these five functions, even for an SMB. (There may be only one person doing security, but management should be aware of this need for specialization so that they can plan future hiring, training, and performance evaluation for this lone hero, or outsource some parts of the SOC to a third party, which is sometimes a brilliant idea, especially if they can't hire and retain good people for their team.)

  • Engineering function: Installations, automations, integrations, scripting etc.
  • Intel/Detection function: Creating & testing analytics, tracking intelligence & TTPs, notifying relevant teams (monitoring, vulnerability management etc.)
  • Monitoring/Response: Monitoring alerts generated by detections and responding to incidents
  • Vulnerability / Threat Management: Following vulnerabilities, scanning systems, managing threats identified through threat intelligence
  • Hunting team: Highlighting & searching for anomalies within IT/OT systems before an actual incident happens. Consuming intelligence and notifying the monitoring team when something suspicious is found.

Processes, escalations, playbooks and runbooks

TBD

Feedback-loop and knowledge sharing

TBD

Self-healing, optimization and automation

TBD

Final words

TBD

Furkan ÇALIŞKAN
