Another fundamental aspect of cybersecurity is collecting logs. Whether you're using Windows, Linux, or Mac, Sophos XG firewalls or Fortinet, Cisco switches and routers or Brocade, nearly every system has a way of logging, and often of exporting those logs to an external system (which we'll discuss later). So with all that in mind, the following image is clearly the way forward, right?
Wrong! A big part of logging is knowing what to log. When you turn on every possible logging option and log every single event on every single device, you're almost certainly going to overwhelm your security operations center (SOC). Siemplify has a great article on SOC burnout. If you have time to read it (it's rather long), I'd highly recommend checking it out. Burnout in this industry is extremely common, with 71% of SOC analysts saying they felt burned out on the job (see this short article on Dark Reading). Knowing this as you go into the industry, and knowing how to handle burnout, is essential as you grow in this field. However, this post is not about burnout, so I'm going to stop here.
Back to the topic at hand: logging. This post takes a very generic view of logging; later posts will focus on specific areas such as Windows logging, Linux logging, and firewall logging. But let's move on to what logging even is in the context of cybersecurity.
A log is a collection of events within a system. Each event details a particular action that was taken or a particular task that was conducted on a system. For example, in a properly configured corporate system, Windows should be logging failed logins. These are reported in the Windows Event Viewer within the Windows Logs > Security section as an "Audit Failure" with an Event ID of 4625.
These events are recorded locally in Windows, although you should be offloading them to a centralized logging facility or, preferably, to a Security Information and Event Management (SIEM) solution. We'll talk about that a little later.
In the previous example, the "log" is the Windows Security log, and the "event" is one single Audit Failure entry in it. If we haven't been failing to log in to our own systems but we are seeing hundreds of failed login attempts, we need to investigate whether someone is trying to break into that system. We don't immediately assume so, because it could just be a vulnerability scanner like Nessus repeatedly failing to authenticate, which generates similar failure events.
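To make the "hundreds of failed logins" idea concrete, here is a minimal Python sketch that counts Event ID 4625 records per source address and flags the noisy ones. The `Event` shape, the `noisy_sources` name, the threshold of 100, and the `ignore` set are all assumptions made up for illustration; real Windows events carry many more fields, and a sensible threshold depends on your environment.

```python
from collections import Counter
from dataclasses import dataclass

FAILED_LOGON_EVENT_ID = 4625  # Windows Security log: failed logon (Audit Failure)


@dataclass(frozen=True)
class Event:
    """One record from a log. The field names are simplified, illustrative assumptions."""
    event_id: int
    source_ip: str
    account: str


def noisy_sources(events, threshold=100, ignore=frozenset()):
    """Return {source_ip: failure_count} for sources with at least `threshold` failed logons.

    `ignore` is a hypothetical allowlist, e.g. the address of a known
    vulnerability scanner that is expected to fail authentication.
    """
    failures = Counter(
        e.source_ip
        for e in events
        if e.event_id == FAILED_LOGON_EVENT_ID and e.source_ip not in ignore
    )
    return {ip: count for ip, count in failures.items() if count >= threshold}
```

A result like `{"10.0.0.50": 150}` is a reason to start investigating, not proof of an attack. If that address turns out to be your scanner, passing it in `ignore` keeps it from drowning out everything else.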
While this is one example of the kinds of events we should be logging, there are many more. Not only should we log failed logins, we should also log successful ones. We should be logging all traffic to our SSH servers, including failed and successful logins. In future articles on Windows, Linux, firewall, and other logging, I'll talk more about the individual logs we should be looking at as a bare minimum.
In case the above callout didn't make it clear, let me say it again: logging is not optional. Setting aside the compliance requirements that exist in almost every industry (and that almost always include logging), what happens if you're breached and you aren't logging? How do you know where the breach originated? How do you know whether the attacker established persistence in your infrastructure?
In a breach, you need as many logs as possible, from as many systems as possible, that are relevant to what you're investigating. Again, this doesn't mean log all the things; just log what is relevant to a breach (which I'll discuss in a future post). This way, your incident response team (or the company you hire to conduct incident response, like Talos Intelligence) can determine how the breach occurred and what the attacker did. Without logs, or without enough logs, they won't know what happened. On the other hand, too much logging can slow down the forensics, because investigators get overwhelmed by the mass of data.
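As a rough illustration of what failed and successful SSH logins look like once they are logged, the sketch below pulls the result, user, and source address out of OpenSSH-style log lines. The regular expression and the `parse_ssh_login` name are my own assumptions. The exact message wording can vary with the OpenSSH version and configuration, so check it against your own logs before relying on it.

```python
import re

# Matches lines such as:
#   "... sshd[2211]: Failed password for invalid user admin from 203.0.113.5 port 52144 ssh2"
#   "... sshd[2230]: Accepted password for alice from 198.51.100.7 port 50022 ssh2"
SSH_LOGIN = re.compile(
    r"(?P<result>Accepted|Failed) (?P<method>\S+) for "
    r"(?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port (?P<port>\d+)"
)


def parse_ssh_login(line):
    """Return a dict describing the login attempt, or None if the line is not one."""
    match = SSH_LOGIN.search(line)
    if match is None:
        return None
    return {
        "success": match["result"] == "Accepted",
        "method": match["method"],
        "user": match["user"],
        "ip": match["ip"],
        "port": int(match["port"]),
    }
```

Lines that are not login attempts (connection closed, session opened, and so on) simply return `None`, so the function can be run over a whole log file and the results filtered.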
So now you know what logging is, and you know why it is important. Now we need to move on to the next portion of this post: What are the vulnerabilities/weaknesses with logging?
If your logs are kept locally, you can be in for a world of hurt. When each individual device stores its logs locally with no centralization, a few problems can come into play. First, corruption or other loss of logs. Second, and perhaps more damaging, an attacker removing logs during their attack.
Corruption or other loss
When you store your logs in a single location, you are risking the availability of that data (remember my last post about the CIA triad? If not, read it here). If that device's hard drive crashes, or the sector holding the logs gets corrupted, that data is gone, and almost certainly in an irrecoverable fashion. "Big deal" you might say. "I wasn't breached" you might say. I would argue it is a very big deal, primarily because I am an advocate for believing you are breached every day. Treat every device like it is already breached, treat every network as if it isn't trusted. This is the foundation of Zero Trust Networking, but we'll talk about that another day.
Back to logging: if you are currently breached, that device may have held critical data in its logs about how you were breached and what the attacker did. But... now it's gone.
Attacker erasing logs
Now what if you are currently breached and the attacker is currently using a device for activities? One of their top priorities is for you to not figure out that they are in your network. The easiest way to do that is to delete the logs. Any logs that might be tracking their attack. It's trivial to erase logs in Windows and just as trivial to delete logs in Linux (just delete the log file).
If these logs are deleted, you may never know the attacker is there. A savvy administrator might realize their logs aren't going back as far as they should and see that as a possible indicator of attack. But if your logs aren't centralized, how often are you looking through the logs, really? Be honest.
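The "logs aren't going back as far as they should" check can be automated instead of left to a savvy administrator. Here is a minimal sketch, assuming you can already extract a list of event timestamps from a log. The function names, the one-hour silence window, and the idea of an expected retention start are assumptions for illustration. A quiet system can have legitimate gaps, so treat a hit as a prompt to look closer.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_silence=timedelta(hours=1)):
    """Return (earlier, later) pairs where consecutive events are further apart than max_silence."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_silence
    ]


def starts_too_late(timestamps, expected_start):
    """True if the log is empty or its oldest event is newer than expected_start."""
    return not timestamps or min(timestamps) > expected_start


if __name__ == "__main__":
    seen = [
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 20),
        datetime(2024, 1, 1, 13, 0),  # nothing logged for 3 hours 40 minutes before this
        datetime(2024, 1, 1, 13, 5),
    ]
    for earlier, later in find_gaps(seen):
        print(f"No events between {earlier} and {later}")
```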
The solution to the previous problem is centralized logging. This can be done in a few ways. You can use a technology like syslog or syslog-ng to centralize logs on a single log server, but this only partly resolves the issue. Yes, it offloads logs (preferably in real time!) to a separate system, helping maintain the availability of the logs and keeping them out of the attacker's reach. However, it doesn't make them readable, which means you probably still aren't reviewing them.
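For a feel of what offloading looks like from an application's side, here is a small sketch using `SysLogHandler` from Python's standard library to send each log record to a remote syslog server as it is written. The function name, logger name, and message format are assumptions for illustration. The handler defaults to UDP, which gives no delivery guarantee and no encryption, so a real deployment would likely want something sturdier.

```python
import logging
import logging.handlers


def build_forwarding_logger(host, port=514, name="app.security"):
    """Return a logger whose records are also sent to a syslog server at (host, port) over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # SysLogHandler uses UDP by default; each record is sent as it is logged.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would look something like `log = build_forwarding_logger("127.0.0.1")` followed by `log.warning("failed login for user=%s", "jdoe")`, with the address swapped for your own log server. The point is that the record leaves the machine immediately, so deleting the local file afterwards doesn't erase it.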
This is where Security Information and Event Management (SIEM) solutions come into play. There are many vendors in this space. To name a few, you have Splunk, Elastic, ArcSight, and Graylog. They each have a unique way of handling the problem, but the underlying solution is the same: they take in a mass of logs and make it readable and presentable. This is often done with machine learning or other advanced algorithms meant to filter out the noise and surface the highest-risk activity. They can all take in logs from Windows, Linux, firewalls, switches, routers, cloud services like Azure, applications, web servers, and so on.
SIEMs solve the problem of centralizing your logs (and thus preserving their availability) and making them readable and actionable. A SIEM is a fundamental part of any SOC (Security Operations Center, if you don't remember), with many incorporating technologies like Security Orchestration, Automation, and Response (SOAR) and eXtended Detection and Response (XDR), both topics I'll need to cover another time.
So SIEMs are perfect and there are no problems with them, yes? No. I'm only going to give a single example, one that was rather recent and threw the cyber world into chaos for quite some time. Log4j is a very common and popular logging library for Java-based applications. In the vulnerability known as Log4Shell (CVE-2021-44228), an attacker could place a specially crafted string in data that would end up being logged, for example in network traffic forwarded to a centralized logging system. If that system used a vulnerable version of Log4j, processing the string could cause it to load and execute attacker-supplied code, giving the attacker access to the SIEM system. Yes, it's rare to have a problem that crazy, but it still happened.
Does that mean you shouldn't use a SIEM? Absolutely not. You should have a SIEM, because the benefits of the technology far outweigh the chance that something like this will happen again. And it will, because the insane exploits always happen in the real world. However, you can never remove 100% of the risk in your organization, so it is essential to know when it is okay to accept risk. The risk here is minimal, so while you should think about it, it absolutely is not big enough to remove SIEMs from your planning.
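To give a sense of the "take in a mass of logs and make it readable" step, here is a toy sketch of normalization: records from two different sources are mapped onto one common shape so they can be counted together. This is not how any particular vendor does it. The `NormalizedEvent` schema, the converter functions, and the input dictionaries are all assumptions for illustration, and a real SIEM does far more than this.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedEvent:
    """A common shape for events from any source (hypothetical schema)."""
    source: str
    action: str
    outcome: str
    actor: str


def from_windows(record):
    """Map a Windows Security record (assumed dict with EventID and TargetUserName)."""
    outcome = {4624: "success", 4625: "failure"}.get(record["EventID"], "unknown")
    return NormalizedEvent("windows", "login", outcome, record["TargetUserName"])


def from_sshd(record):
    """Map an already-parsed sshd login (assumed dict with 'success' and 'user')."""
    outcome = "success" if record["success"] else "failure"
    return NormalizedEvent("linux-sshd", "login", outcome, record["user"])


def failures_by_actor(events):
    """Count failed logins per account across every source."""
    return dict(Counter(e.actor for e in events if e.outcome == "failure"))
```

Once everything shares a shape, a question like "which account is failing to log in across Windows and Linux?" becomes a single call to `failures_by_actor`, instead of one search per device.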