Silly human – logs are for machines (too)

Here is an anecdote from a recent interaction with an enterprise application in the electric power industry:

1. Dave the developer logs all kinds of events. Since he is the primary consumer of the log, the format is optimized for human-readability. For example:

02-APR-2012 01:34:03 USER49 CMD MOD0053: ERROR RETURN FROM MOD0052 RETCODE 59

Apparently this makes perfect sense to Dave: each line includes a timestamp and some text.

2. Sam from the Security team needs to determine the number of daily unique users. Dave quickly writes a parser script for the log and schedules it. He also builds a little Web interface so that Sam can query the parsed data on his own. Peace reigns.

3. A few weeks later, Sam complains that the web interface is broken. Dave takes a look at the logs, only to realize that someone else has added an extra field to each line, breaking his custom parser. He pushes a fix and tells Sam that everything is okay again. Instead of writing a new feature, Dave has to go back and fill in the missing data.

4. Every 3 weeks or so, repeat Step 3 as others add logs.
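
The underlying problem is the log format, not the coworkers. Here is a minimal sketch, in Python, of what a sturdier approach could look like; the logger name, field names and the "ctx" convention are my own illustration, not anything from the anecdote. Each line is a single self-describing JSON object, so a field added by someone else never shifts the positions a downstream parser depends on.

    import json
    import logging
    from datetime import datetime, timezone

    class JsonLineFormatter(logging.Formatter):
        """Emit each record as one self-describing JSON object per line."""
        def format(self, record):
            payload = {
                "ts": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
                "level": record.levelname,
                "msg": record.getMessage(),
            }
            payload.update(getattr(record, "ctx", {}))  # named fields, not positions
            return json.dumps(payload)

    logger = logging.getLogger("app")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # A hypothetical line mirroring Dave's example above:
    logger.error("error return from MOD0052",
                 extra={"ctx": {"user": "USER49", "module": "MOD0053", "retcode": 59}})

    # Sam's "daily unique users" script can now key on field names, not positions:
    #   users = {json.loads(line).get("user") for line in open("app.log")}

The line stays perfectly readable to Dave, and Sam's parser survives the next extra field.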

The EPS Myth

Often when I engage with a prospect, their first question is “How many events per second (EPS) can EventTracker handle?” People tend to confuse EPS with scalability, so simply quoting back an enormous-enough number (usually larger than the previous vendor they spoke with) convinces them your product is, indeed, scalable. The fact is that scalability and events per second are not the same thing, and many vendors sidestep the real scalability issue by intentionally using the two interchangeably. A high EPS rating does not guarantee a scalable solution. If the only measure of scalability available is an EPS rating, you as a prospect should be asking yourself a simple question: what is the vendor’s definition of EPS? You will generally find that the answer is different with each vendor.

  • Is it the number of events scanned per second?
  • Is it the number of events received per second?
  • Is it the number of events processed per second?
  • Is it the number of events inserted into the event store per second?
  • Is it a real-time count or a batch transfer count?
  • What is the size of these events? Is it some small, non-representative size, for instance 100 bytes per event, or is it a real event like a Windows event, which may vary from 1,000 to 6,000 bytes?
  • Are the events received in UDP mode or TCP mode?
  • Does the measurement include running correlation rules against the event stream? If so, how many rules are being run?
  • And let’s not even talk about how fast the reporting function runs; EPS does not measure that at all.

At the end of the day, an EPS rating is generally a measure of how fast a small, non-representative, normalized event can be received. It says nothing about actually doing something useful with the event, which makes it pretty much useless.

With no agreed-upon definition of what an event actually is, EPS is also a terrible comparative measure. You cannot assume that one vendor claiming 12,000 EPS is faster than another claiming 10,000 EPS, as they are often measuring very different things. A good analogy would be asking someone how far away an object is and getting the answer “100.” For all the usefulness of the EPS measure, the unit could be inches or miles.

EPS is even worse for ascertaining true solution capability. Some vendors market appliances that promise 2,000 EPS and 150 GB of disk space for log storage. They also promise to archive security events for multiple years to meet compliance. For the sake of argument, let’s assume the system is receiving, processing and storing 1,000 Windows events per second with an average 1 KB event size (a common size for a Windows event). In 24 hours you will receive over 86 million events. Compressed at 90%, this consumes about 8.6 GB, or nearly 6% of your storage, in a single day. Even with heavy compression, the appliance can handle only a few weeks of data under this kind of load. Think of buying a car with an engine that can race to 200 MPH and a set of tires and suspension that cannot go faster than 75 MPH. The car can’t go 200; the engine can, but the car can’t. A SIEM solution is the car in this example, not the engine. Having the engine does not do you any good at all.
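
If you want to sanity-check that arithmetic, here it is as a few lines of Python; all of the inputs (1,000 events per second, roughly 1 KB per event, 90% compression, 150 GB of storage) come straight from the paragraph above.

    EVENTS_PER_SEC = 1_000        # assumed steady Windows event rate
    AVG_EVENT_BYTES = 1_000       # roughly 1 KB per event, as above
    COMPRESSION = 0.90            # 90% reduction in stored size
    STORE_GB = 150                # advertised appliance storage

    events_per_day = EVENTS_PER_SEC * 86_400                 # ~86.4 million events
    raw_gb_per_day = events_per_day * AVG_EVENT_BYTES / 1e9  # ~86.4 GB before compression
    stored_gb_per_day = raw_gb_per_day * (1 - COMPRESSION)   # ~8.6 GB actually written
    retention_days = STORE_GB / stored_gb_per_day            # ~17 days

    print(f"{events_per_day:,} events/day, {stored_gb_per_day:.1f} GB/day, "
          f"about {retention_days:.0f} days of retention")

Roughly seventeen days of retention is a long way from “multiple years” of compliance archiving.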

So when asked about EPS, I sigh, say “it depends,” and try to explain all this. Sometimes it sinks in, sometimes not. All in all, don’t pay a lot of attention to EPS – it is largely an empty measure until the unit of measure is standardized, and even then it will only be a small part of overall system capability.

Steve Lafferty

Difference between IT Search and Log Management

Came across an interesting blog entry by Raffy at Splunk. As a marketing guy, I am jealous; they are generating a lot of buzz about “IT Search”. Splunk has led a lot of knowledgeable people to wonder how this is different from what all the log management vendors have been providing.

Still, while Raffy touched on one of the real differences between IT Search and Log Management, he left a few of the salient points out of the discussion of a “connector”: namely, how a connector puts you at the mercy of the vendor to produce it, and what happens when the log data format changes.

Let’s step back — at the most basic level, log management (or IT Search, for that matter) has to do two fundamental things: 1) help people collect logs from a mess of different sources, and 2) help them do interesting things with those logs. The “do interesting things” includes the usual stuff like correlation, reporting, analytics, secure storage, etc.

You can debate fiercely the relative robustness of collection architectures – and there are a number of differences you should look at if you are evaluating vendors. For the sake of this discussion, however, most any log management system worth its salt will have a collection mechanism for all the basic methods – if you handle (in no particular order) ODBC, syslog, the Windows event format, maybe SNMP, and throw in a file reader for custom applications, you have collection pretty much covered.
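
For a sense of how low the bar for basic collection is, here is a minimal sketch of the simplest of those methods, a UDP syslog listener built with Python’s standard library. This is not EventTracker code, and the port is arbitrary; a real collector would queue and store what it receives rather than print it.

    import socketserver

    class SyslogUDPHandler(socketserver.BaseRequestHandler):
        def handle(self):
            data, _sock = self.request          # raw datagram and the sending socket
            line = data.decode("utf-8", errors="replace").strip()
            print(f"{self.client_address[0]}: {line}")  # in practice: queue/store it

    if __name__ == "__main__":
        # Port 514 requires elevated privileges; 5514 is used purely for illustration.
        with socketserver.UDPServer(("0.0.0.0", 5514), SyslogUDPHandler) as server:
            server.serve_forever()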

The reality is, as Raffy points out, that there are a few totally proprietary access methods for getting logs, Check Point being one example. But it is far easier for a system or application vendor to support one of the standard methods, so getting access to the raw logs in some way, shape or form is straightforward.

So here is where the real difference between IT search and Log Management begins.

Raffy mentions a small change in the syslog format causing a connector to break. Well, syslog is a standard, so if the change would not break any standard syslog receiver, what it actually means is that the syslog message format has not changed but the content inside it has.
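
To make that concrete, here is a small illustration; the regex and sample line are my own, not from Raffy’s post. The RFC 3164 envelope of a syslog message (priority, timestamp, host) is the standardized part that any receiver can parse; the MSG portion after it is free-form vendor content, and that is the part that actually changes.

    import re

    SYSLOG_3164 = re.compile(
        r"^<(?P<pri>\d{1,3})>"                      # priority
        r"(?P<ts>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) " # e.g. "Apr  2 01:34:03"
        r"(?P<host>\S+) "
        r"(?P<msg>.*)$"                             # vendor-specific content
    )

    line = "<34>Apr  2 01:34:03 host49 MOD0053: ERROR RETURN FROM MOD0052 RETCODE 59"
    m = SYSLOG_3164.match(line)
    if m:
        # The envelope still parses even if the MSG gains new fields; only code
        # that interprets m.group("msg") needs vendor-specific knowledge.
        print(m.group("host"), "->", m.group("msg"))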

Log Management vendors provide “knowledge” about the logs beyond simple collection.

Let’s make an analogy – IT Search is like the NSA collecting all of the radio transmissions in all of the languages in the entire world. Pretty useful. However, if you want to make sense of the Russian ones, you hire your Russian expert; for Swahili, your Swahili expert; and so on. You get the picture.

Logs are like languages — the fact of the matter is that the only thing the same about logs is that the content is all different. If you happen to be an uber-log weenie and you understand the format of 20 different logs, simple IT Search is really powerful. If you are only concerned about a single log format like Windows (although Windows by itself is pretty darn arcane), IT Search can be a powerful tool. If you are like the rest of us, whose entire lives are not spent understanding multiple log formats, or who get rusty because we don’t get exposed to certain formats all the time, well, it gets a little harder. What Log Management vendors do is help you, the user, out with that knowledge: rules that separate important event logs from unimportant ones, alerts, and reports configured to look for key words in the different log streams. How this is done differs from vendor to vendor – some normalize, i.e. translate logs into a standard canonical format; others don’t. And this knowledge is what can conceivably get out of date.
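
To illustrate what that vendor-supplied knowledge looks like in practice, here is a deliberately tiny, made-up sketch that normalizes a login failure from two very different sources into one canonical shape. The field names and patterns are assumptions for illustration, not any vendor’s actual schema, and this layer is exactly what can go out of date when a source changes its format.

    import re

    def normalize_windows(event):
        # Windows security events arrive pre-split into named fields (hypothetical shape).
        return {"source": "windows",
                "user": event["TargetUserName"],
                "action": "logon_failure" if event["EventID"] == 4625 else "other"}

    def normalize_sshd(line):
        # Linux sshd failures arrive as free-form syslog text (hypothetical pattern).
        m = re.search(r"Failed password for (?:invalid user )?(\S+)", line)
        return {"source": "sshd", "user": m.group(1), "action": "logon_failure"} if m else None

    print(normalize_windows({"EventID": 4625, "TargetUserName": "USER49"}))
    print(normalize_sshd("Apr  2 01:34:03 host sshd[999]: Failed password for root from 10.0.0.5"))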

In IT Search, there is no possibility for anything to get out of date mainly because there is no knowledge, only the ability to search the log in its native format. Finally, if a Log Management vendor is storing the original log and you can search on it, your Log Management application gives you all the capability of IT Search.

Seems to me IT Search is much ado about nothing…

– Steve Lafferty

Defining SIM/SEM Requirements

The rational approach to pretty much any IT project is the same…define the requirements, identify solutions, do a pilot project, implement/refine and operationalize.

Often you win or lose early at requirements gathering time.

So what should you keep in mind while defining requirements for a Security Information and Event Management (SIEM) project?

Look at it in two ways:

  1. What are the trends that you (and your peers) have seen and experienced?
  2. What are the experts saying?

Well, for ourselves, we see a clear increase in attacks from the outside. These are increasingly sophisticated (which is expected, I guess, since it’s an arms race) and disturbingly indiscriminate. Attacks seem to be launched merely because we exist on the Internet and have connectivity, and disconnecting from the Internet is not an option.

We see attacks that we recognize immediately (100 login failures between 2 and 3 AM). We see attacks that are not so obvious (HTTP traffic from a server that should not have any). And we see the almost unrecognizable zero-day attacks, which appear to work their way through our defenses and manifest as subtle configuration changes.
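
The “recognize immediately” case is easy to express as a rule. Here is a toy sketch with an illustrative threshold and one-hour window (numbers chosen to echo the example, not a recommendation); a real SIEM evaluates rules like this continuously against the incoming event stream.

    from collections import Counter
    from datetime import datetime

    def hourly_failure_alerts(events, threshold=100):
        """events: iterable of (timestamp, event_type) pairs already parsed from logs."""
        buckets = Counter()
        for ts, kind in events:
            if kind == "login_failure":
                buckets[ts.replace(minute=0, second=0, microsecond=0)] += 1
        return [(hour, count) for hour, count in sorted(buckets.items()) if count >= threshold]

    # Hypothetical usage: 120 failures between 2 and 3 AM trips the rule.
    sample = [(datetime(2012, 4, 2, 2, i % 60, 0), "login_failure") for i in range(120)]
    print(hourly_failure_alerts(sample))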

As for the expert prognosticators, we (like many others) find that the PCI-DSS standard is a good middle ground between loosely defined guidelines (HIPAA, anyone?) and vendor “Best Practices”.

The interesting thing is that the PCI-DSS requirements seem to match what we see. Section 10 speaks to weaponry that can detect (and ideally remediate) the attacks, and Section 11.5 speaks to the ability to detect configuration changes.

It’s all SIEM, in the end.

So what are the requirements for SIEM?

  1. The ability to gather logs from a variety of sources in real time
  2. The ability to detect (and ideally remediate) well-recognized attacks in real time
  3. The ability (and, more importantly, the habit) to extract value from raw logs for the non-obvious attacks
  4. The ability to detect configuration changes down to the file and registry level for those zero-day attacks (a minimal sketch of file-level change detection follows this list)
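
As promised in item 4, here is a minimal sketch of file-level change detection by hashing: take a baseline, re-hash later (on a schedule or on a filesystem event), and report differences. The watched paths are purely illustrative; real SIEM and file-integrity tools also watch the Windows registry and report changes in real time.

    import hashlib
    import pathlib

    WATCHED = [pathlib.Path("/etc/ssh/sshd_config"), pathlib.Path("/etc/passwd")]  # example paths

    def snapshot(paths):
        """Map each existing path to the SHA-256 digest of its current contents."""
        return {p: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths if p.exists()}

    baseline = snapshot(WATCHED)
    # ... later, on a schedule or triggered by a filesystem event ...
    current = snapshot(WATCHED)

    for path, digest in current.items():
        if baseline.get(path) != digest:
            print(f"ALERT: {path} changed")        # feed this into the SIEM as an event
    for path in baseline.keys() - current.keys():
        print(f"ALERT: {path} was removed")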

As the saying goes — well begun is half done. Get your requirements correct and improve your odds of success.

Ananth