Log Review for Incident Response: Part 2
Of all the uses for log data across security, compliance and operations (see, for example, LogTalk: 100 Uses for Log Management #67: Secure Auditing – Solaris), incident response is the truly universal one: you can be forced to use logs for incident response at any moment, whether you are prepared or not. As we discussed in a previous newsletter edition, having as much log data as possible in an incident response (“IR” or “IH” for incident handling) situation is critical. You might not use it all, and you might have to work hard to find the proverbial needle in the haystack, but having reliable log data from both affected and unaffected systems is indispensable in a hectic post-incident environment.
In the previous edition, we focused on how to “incident-response-proof” your logging, that is, how to prepare your logging infrastructure for incident response. In this article, we address how to start reviewing logs to discover incidents, and how to review logs during an incident response.
Logs play a role at all stages of incident response. They are reviewed under two very different circumstances during the incident response process:
- Routine periodic log review – this is how an incident may be discovered;
- Post-incident review – this may happen when initial suspicious activity signs are available, or during a full-blown incident investigation.
If you are looking for a quick answer to log review, the “Simple Incident Log Review Checklist” is available in various formats.
Periodic Log Review: Discover Incidents
The basic principle of periodic log review (referred to as “daily log review” even if it might not be performed daily) is to accomplish the following:
- Detect ongoing intrusions and incidents (monitoring)
- Look for suspicious signs that indicate an impending incident (proactive)
- Find traces of past, unresolved incidents (reactive)
The daily log review is built around the concept of establishing a “baseline”, that is, learning and documenting the normal set of messages appearing in logs. Once a baseline exists, the reviewer looks for “exceptions” from the normal routine and investigates them to ensure that no breach of data has occurred or is imminent.
Build a baseline for each log source type (and sometimes even individual log sources) because it is critical to become familiar with normal activities logged on each of the applications. Initial baselines can be quickly built using the process described below.
In addition to this “event type” review, it makes sense to perform a quick assessment of the overall log entry volume for the past day (past 24-hour period). Significant differences in log volume should also be investigated using the procedures defined below. In particular, loss of logging (often recognized from a dramatic decrease in log entry volume) needs to be investigated and escalated as a security incident.
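The volume check described above can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any particular log management product: the function name `volume_exception` and the two ratio thresholds are assumptions chosen for the example, and a real deployment would tune them per log source.

```python
from statistics import mean


def volume_exception(daily_counts, today_count, drop_ratio=0.5, spike_ratio=2.0):
    """Compare today's log entry volume with the average of earlier days.

    daily_counts: log entry totals for previous 24-hour periods.
    Returns "loss" for a dramatic decrease, "spike" for a dramatic
    increase, or None when the volume looks routine.
    The ratios are illustrative assumptions, not recommended values.
    """
    if not daily_counts:
        raise ValueError("at least one earlier day is needed for comparison")
    typical = mean(daily_counts)
    if today_count < typical * drop_ratio:
        return "loss"   # possible loss of logging: investigate and escalate
    if today_count > typical * spike_ratio:
        return "spike"
    return None
```

A result of "loss" corresponds to the loss-of-logging case, which the text above says should be escalated as a security incident.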
Building an Initial Baseline
To build a baseline using a log management tool, do the following:
- Make sure that relevant logs are aggregated by the log management tool or a SIEM tool – also make sure that the tool can “understand” the logs
- Select a time period for an initial baseline ranging from one week at the low end to, ideally, 90 days
- Run a report that shows counts for each message type. This report indicates all the log types that are encountered over the baseline period of system operation
- Assuming that no breaches of data have been discovered, we can accept the above report as a baseline for “routine operation”
An additional step should be performed while creating a simple baseline: even though we assume that no compromise has taken place, there is a chance that some of the log messages recorded triggered some kind of action or remediation. Such messages are referred to as “known bad” and should be marked as such and not counted as normal baseline. System crashes, intrusion events, and unplanned maintenance are examples of such events.
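As a rough sketch of the baseline steps above, assume the log management tool can export one message-type identifier per log entry for the baseline period. The function name `build_baseline` and the example message types are hypothetical; the point is only to show the count-per-type report with “known bad” messages removed.

```python
from collections import Counter


def build_baseline(message_types, known_bad=frozenset()):
    """Count each message type seen during the baseline period.

    message_types: iterable with one message-type identifier per log entry.
    known_bad: types that triggered action or remediation (crashes,
    intrusion events, unplanned maintenance); they are excluded so they
    are never treated as routine operation.
    """
    counts = Counter(message_types)
    for msg_type in known_bad:
        counts.pop(msg_type, None)
    return counts
```

For example, `build_baseline(["login_ok", "login_ok", "logout", "system_crash"], known_bad={"system_crash"})` keeps only the counts for `login_ok` and `logout`.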
How does one actually compare today’s batch of logs to a baseline? Two methods are widely used for log review, and the selection can be made based on the available resources and tools used.
The first method only considers log types not observed before and can be done manually as well as with tools. Despite its simplicity, it is extremely effective with many types of logs: simply noticing that a new log message type is produced is typically very insightful for security, compliance and operations.
For example, if log messages with IDs 1, 2, 3, 4, 5, 6 and 7 are produced every day in large numbers, but a log message with ID 8 has never been seen, each occurrence of message ID 8 is a reason for an investigation. If it is confirmed that the message is benign and no action is triggered, it can later be added to the baseline.
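The message ID example can be expressed as a simple set difference. The sketch below is illustrative only; the function name `new_message_types` is an assumption, and the IDs are the ones used in the example above.

```python
def new_message_types(baseline_types, todays_types):
    """Return message types seen today that never appear in the baseline."""
    return sorted(set(todays_types) - set(baseline_types))


# Baseline contains IDs 1 through 7; today's logs also contain ID 8.
exceptions = new_message_types(range(1, 8), [1, 2, 8, 3, 8, 7])
print(exceptions)  # [8]
```

Each type returned this way becomes an exception to investigate and, if confirmed benign, a candidate for adding to the baseline.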
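The second method builds on the first: besides new log types, it also compares how often each log type appears, and in which context (user, application module, time of the week), against the baseline.
So, the summary of comparison methods for daily log review is: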
- Basic method:
  - Log type not seen before (NEW log message type)
- Advanced methods:
  - Log type not seen before (NEW log message type)
  - Log type seen more frequently than in baseline
  - Log type seen less frequently than in baseline
  - Log type not seen before (for particular user)
  - Log type not seen before (for particular application module)
  - Log type not seen before (on the weekend)
  - Log type not seen before (during work day)
  - New user activity noted (any log from a user not seen before on the system)
Log management tools that follow the advanced method can apply other comparison algorithms as well. Once a message is flagged as an exception, we move to a different stage in our daily workflow: from daily review to investigation and analysis.
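A minimal sketch of two of the advanced comparisons follows. It assumes the baseline is available as an average daily count per log type and, for the contextual checks, as a set of (context, log type) pairs such as (user, type). The function names and the factor of 3 are assumptions made for illustration; actual tools use their own algorithms and thresholds.

```python
def frequency_exceptions(baseline_daily_avg, today_counts, factor=3.0):
    """Flag log types that are new, or much more or less frequent than usual.

    baseline_daily_avg: dict mapping log type to its average daily count.
    today_counts: dict mapping log type to today's count.
    factor: how far from the average counts as an exception (assumed value).
    """
    findings = {}
    for log_type in set(baseline_daily_avg) | set(today_counts):
        expected = baseline_daily_avg.get(log_type, 0)
        observed = today_counts.get(log_type, 0)
        if expected == 0:
            if observed > 0:
                findings[log_type] = "new"
        elif observed > expected * factor:
            findings[log_type] = "more frequent"
        elif observed < expected / factor:
            findings[log_type] = "less frequent"
    return findings


def new_in_context(baseline_pairs, today_pairs):
    """Return (context, log type) pairs never seen in the baseline.

    The context can be a user, an application module, or a label such as
    "weekend" or "workday".
    """
    return sorted(set(today_pairs) - set(baseline_pairs))
```

Calling `new_in_context` with (user, type) pairs covers the “not seen before for particular user” case; calling it with (module, type) or (day kind, type) pairs covers the module and weekend/work day variants.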
Exception Investigation and Analysis: Incident Response
A message that does not fit the profile of a normal log is flagged as an “exception.” It is important to note that an exception is not the same as a security incident, but it might be an early indication that one is taking place. Incident response might start at this stage, but the investigation may also show the exception to be benign, in which case the workflow returns to routine review.
At this stage, we have an individual log message that is outside of routine/normal operation. The following high-level investigative process is used on each “exception” entry:
- Look at log entries that occurred at the same time: this technique involves looking at an increasing range of time periods around the log message that is being investigated. Most log management products allow you to review logs or to search all logs within a specific time frame. For example:
  - Look at other log messages triggered 1 minute before and 1 minute after the “suspicious” log message
  - Now look at other log messages triggered 10 minutes before and 10 minutes after the “suspicious” log message
  - Finally look at other log messages triggered 1 hour before and 1 hour after the “suspicious” log message (if needed – the volume of log messages can be significant)
- Look at other entries from the same user: this technique includes looking for other log entries produced by the activities of the same user. It often happens that a particular logged event of a user activity can only be interpreted in the context of other activities of the same user. Most log management products allow you to “drill down into” or search for a specific user within a specific time frame.
- Look at the same type of entry on other systems: this method covers looking for other log messages of the same type, but on different systems, in order to determine its impact. Learning when the same message was produced on other systems may hold clues to understanding the impact of this log message.
- Look at entries from the same source (if applicable): this method involves reviewing all other log messages from the same network source address (where relevant).
- Look at entries from the same app module (if applicable): this method involves reviewing all other log messages from the same application module or components. While other messages in the same time frame (see the first item above) may be significant, reviewing all recent logs from the same components typically helps to reveal what is going on.
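The investigative steps above amount to a handful of filters over parsed log entries. The sketch below assumes each entry has already been parsed into a dictionary with a `time` value (a `datetime`) and optional fields such as `user`, `host`, `source` and `module`; these field names, and the function names, are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta

# Widening time windows from the process above: 1 minute, 10 minutes, 1 hour.
WINDOWS = (timedelta(minutes=1), timedelta(minutes=10), timedelta(hours=1))


def entries_around(entries, suspicious_time, window):
    """Entries logged within `window` before or after the suspicious message."""
    return [e for e in entries if abs(e["time"] - suspicious_time) <= window]


def entries_matching(entries, field, value):
    """Entries sharing a field value, e.g. the same user, source or module."""
    return [e for e in entries if e.get(field) == value]
```

An investigator would call `entries_around` with each window in turn, stopping once the context is clear, and use `entries_matching` for the same-user, same-source, and same-module steps. Looking for the same entry type on other systems is the same filter applied to a type field across entries collected from several hosts.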
After following this process, the impact of the logged event on the organization should become clearer, and further incident response steps can be taken. A detailed discussion of incident response practices is outside the scope of this newsletter.
Even though compliance might compel organizations to enable logging, deploy log management, and even start reviewing logs, an incident response scenario allows the value of logs to truly manifest itself. However, in order to use logs for incident response, the organization needs to be prepared: follow the above guidelines and “IR-proof” your logging infrastructure.
Log review, however, needs to happen on an ongoing basis. Build the baseline and then compare the events to the baseline in order to detect exceptions. Investigate those exceptions in order to qualify them as incidents, as well as to assess their impact on the organization.
Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the field of log management and PCI DSS compliance. He is an author of the books “Security Warrior” and “PCI Compliance” and a contributor to “Know Your Enemy II” and “Information Security Management Handbook”; he is now working on a book about computer logs. Anton has published dozens of papers on log management, correlation, data analysis, PCI DSS, and security management (see the list at www.info-secure.org). His blog http://www.securitywarrior.org is one of the most popular in the industry.
In addition, Anton teaches classes (including his own SANS class on log management) and presents at many security conferences across the world; he recently addressed audiences in the United States, UK, Singapore, Spain, Russia and other countries. He works on emerging security standards and serves on the advisory boards of several security start-ups.
Currently, Anton is building his security consulting practice www.securitywarriorconsulting.com, focusing on logging and PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the world about the importance of logging for security, compliance and operations. Before LogLogic, Anton was employed by a security vendor in a strategic product management role. Anton earned his Ph.D. degree from Stony Brook University.
EVENTTRACKER EXCELS IN UNIX CHALLENGE
In this fifth installment of the Honeynet Challenge of 2010, EventTracker 7.0 was put to the test and was used by 4 of the top 6 contestants to perform the forensic analysis for the competition. The challenge required participants to discern what had transpired on a virtual server, using all of the logs from this potentially compromised UNIX server.
This challenge was created by the Honeynet Project, an international non-profit organization dedicated to raising awareness of the vulnerabilities and threats that exist on the vast expanse of the worldwide web. This section, called “Log Mysteries”, was opened for competition on September 1, 2010, with finalists announced October 26, 2010. Contestants were provided the complete logs from a virtual server, and asked to determine the following:
- Was the system compromised and when? How do you know that for sure?
- If the (server) was compromised, what was the method used?
- Can you locate how many attackers failed? If some succeeded, how many were they? How many stopped attacking after the first success?
- What happened after the brute force attack?
- Locate the authentication logs. Was a brute force attack performed? If yes, how many?
- What is the timeline of significant events? How certain are you of the timing?
- Anything else that looks suspicious in the logs? Any misconfigurations? Other issues?
- Was an automatic tool used to perform the attack? If yes which one?
- What can you say about the attacker’s goals and methods?
- Bonus. What would you have done to avoid this attack? (Source www.honeynet.org)
Working independently, 4 of the top 6 came to the same correct answers by importing the logs into EventTracker and using its standard functionality to quickly and accurately unravel this mystery. EventTracker accurately discovered invalid login attempts from existing users and, most importantly, exposed a brute-force attack on the server.
“Security challenges abound regardless of the network architecture. While the vulnerabilities and attack techniques are platform dependent, EventTracker excels as a platform for log analysis. The ability to have user defined output and indexed search are especially useful features in such situations,” said A.N. Ananth, CEO, Prism Microsystems.
In an EventTracker protected environment, administrators would have been notified of this attack, or the product could have been programmed by the organization to take remedial action and protect the IT infrastructure from harm.
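For readers who want to see what this kind of analysis involves, here is a minimal sketch that counts failed password attempts per source address in SSH authentication log lines. It does not represent how EventTracker works internally or how the contestants proceeded. It assumes the common OpenSSH message wording (“Failed password for ... from ...” and “Accepted password for ... from ...”); the function name `summarize_auth` and the threshold of 10 failures are assumptions made for the example.

```python
import re
from collections import Counter

# Patterns assume the usual OpenSSH sshd wording; adjust for other sources.
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
ACCEPTED = re.compile(r"Accepted password for (\S+) from (\S+)")


def summarize_auth(lines, threshold=10):
    """Return (suspected brute-force sources, suspects that later logged in).

    threshold: number of failed attempts from one source that counts as a
    brute-force attempt (an illustrative value, not a recommendation).
    """
    failures = Counter()
    successes = set()
    for line in lines:
        failed = FAILED.search(line)
        if failed:
            failures[failed.group(2)] += 1
            continue
        accepted = ACCEPTED.search(line)
        if accepted:
            successes.add(accepted.group(2))
    suspects = {src for src, count in failures.items() if count >= threshold}
    return suspects, suspects & successes
```

The second value returned, sources with many failures that also have a successful login, is the set worth investigating first when asking whether a brute-force attack succeeded.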
In the answer to the final question, one contestant utilizing EventTracker 7.0 for their submission provided the following nine steps for companies to avoid brute force attacks:
- Hide systems running services such as SSH behind a firewall
- Use strong passwords or public-key authentication
- Configure SSH servers to use a non-standard port
- Restrict access to SSH servers
- Utilize Intrusion Detection/Intrusion Prevention (in conjunction with EventTracker 7.0)
- Disable Root Access
- Use ‘iptables’ to block attacks
- Use tcp_wrappers to block attacks
- Use EventTracker 7.0 Reports