Logging for Incident Response: Part 1 – Preparing the Infrastructure
From all the uses for log data across the spectrum of security, compliance, and operations, using logs for incident response presents a truly universal scenario –you can be forced to use logs for incident response at any moment, whether you’re prepared or not. An incident response (IR) situation is one where having as much log data as possible is critical. You might not use it all, and you might have to work hard to find the proverbial needle in the haystack of logs – still, having reliable log data from all – affected and unaffected – systems is indispensable in a hectic post-incident environment.
The security mantra “prevention-detection-response” still defines most of the activities of today’s security professionals. Each of these three components is known to be of crucial importance to the organization’s security posture. However, unlike detection and prevention, the response is impossible to avoid. While it is not uncommon for an organization to have weak prevention and nearly non-existent detection capabilities, they will often be forced into response mode by attackers or their evil creations – malware. Even in cases where ignoring the incident that happened might be the chosen option, the organization will implicitly follow a response plan, even if it is as ineffective as to do nothing.
In this paper, we will focus on how to “incident-response-proof” your logging – how to prepare your logging infrastructure for incident response. The previous six articles focused on specific regulatory issues, and it is not surprising that many organizations are doing log management just to satisfy compliance mandates. Still, technology and processes implemented for PCI DSS or other external mandates are incredibly useful for other uses such as incident response. On top of this, many of the same regulations prescribe solid incident response practices (for additional discussion see “Incident management in the age of compliance”)
Even though a majority of incidents are still discovered by third parties (seeVerizon Breach Report 2010 and other recent research), it is clear that organizations should still strive to detect incidents in order to limit the damage stemming from extensive, long-term compromises. On the other hand, even for incidents detected by the third parties, the burden of investigation – and thus using logs for figuring out what happened –falls on the organization itself.
We have therefore identified two focal points for use of logs in incident response:
- Detecting incidents
- Investigating incidents
Sometimes the latter use-case is called “forensics” but we will stay away from such definitions since we would rather reserve the term “forensics” for legal processes.
Incident Response Model and Logs
While incidents and incident response will happen whether you want it to or not, a structured incident response process is an effective way to reduce the damage suffered by the organization. The industry-standard SANS incident response model organizes incident response in six distinct stages (see (http://www.sans.org/rr/whitepapers/incident/Incident Management 101 Preparation & Initial Response (aka Identification) By: Robin Dickerson (posted on January 17, 2005)
Preparation includes tasks that need to be done before the incident: from assembling the team, training people, collecting, and building tools, to deploying additional monitoring and creating processes and incident procedures
- Identification starts when the signs of an incident are seen and then confirmed, so that incident is declared
- Containment is important for documenting what is going on, quarantining the affected systems, as well as possibly taking systems offline
- Eradication is preparing to return to normal by evaluating the available backups, and preparing for either restoration or rebuilding of the systems
- Recovery is where everything returns to normal operation
- Follow-Up includes documenting and discussing lessons learned, and reporting the incident to management
Logs are extremely useful, not just for identification and containment as we mention above, but for all phases of incident response process. Specifically, here is how logs are used at each stage of the IR process:
- Preparation: incident response logs help us verify controls (for example, review login success and failure histories), collect normal usage data (learn what log messages show up during routine system activity), and perform a baseline (create log-based metrics that describe such normal activity), etc.
- Identification: logs containing attack traces, other evidence of a successful attack, or insider abuse are pin-pointed, or alerts might be sent to notify about an emerging incident; also, a quick search and review of logs helps to confirm an incident, etc.
- Containment: logs help us scope the damage (for example, firewall logs show which other machines display the same scanning behavior in case of a worm or spyware infestation), and learn what else is lost by looking at logs from other systems that might contain traces similar to the one that is known to be compromised, etc.
- Eradication: while restoring from backups, we need to also make a backup of logs and other evidence: preserving logs for the future is required, especially if there is risk of a lawsuit (even if you don’t plan to sue, the other side might)
- Recovery: logs are used for confirming the restoration and then measures are put in place to increase logging so that we have more data in case it happens again; incident response will be much easier next time
- Follow-Up: apart from summarizing logs for a final report, we might use the incident logs for peaceful purposes: training the new team members, etc.
As a result, the IT infrastructure has to be prepared for incident response logging way before the first signs of an incident are spotted.
Preparing the Infrastructure
In light of predominantly 3rd party incident discovery, the incident response process might need to be activated at any moment when notification of a possible incident arrives. From this point onward, the security team will try to contain the damage and investigate the reason for the attack or abuse based on initial clues. Having logs will allow an organization to respond better and faster!
What logs needs to be collected for effective IR? This is very simple: any and all logs from networks, hosts, applications, and other information systems can be useful during response to an incident. The same applies to context data – information about users, assets, and vulnerabilities will come in handy during the panic of incident response. As we say above, having as much log data as possible will allow your organization to effectively investigate what happened, and have a chance of preventing its recurrence in the future.
Specifically, make sure that the following log sources have logs enabled and centrally collected:
- Network Devices – routers and switches
- Firewalls – including firewall modules in other Network Devices
- IDS, IPS and UTM devices – while firewalls are ubiquitous and can create useful logs, IDS/IPS alerts add to a useful dimension to IR process
- Servers running Windows, Unix and other common operating systems; logging should include all server components such as web servers, email servers, DNS servers, and other server components
- VPN logs are often key to IR since they can reveal who was accessing your systems from remote locations
- Web proxies – these logs are extremely useful for tracking “drive-by downloads” and other web malware attacks
- Database – logs from RDBMS systems contain records indicating access to data as well as changes to database systems
- Applications ranging from large enterprise applications such as SAP to custom and vertical applications specific to a company
Detailed discussion of logging settings on all those systems goes beyond the scope of this paper and might justify not just reading a document, but engaging specialty consultants focused on logging and log management.
Tuning Log Settings for Incident Response
What logs should be enabled on the systems covered above? While “log everything” makes for a good slogan, it also makes log analysis a nightmare by mixing together more relevant log messages with debugging logs which are used much less often, if at all. Still, many logging defaults should be changed as described below.
A typical Unix (Solaris, AIX, etc.) or Linux system will log the following into syslog: various system status and error messages, local and remote login/logout, some program failures, and system start/stop/restart messages. Logs that will not be found will be all logs tracking access to files, running processes, and configuration changes. For example, to log file access on Linux, one needs to use a kernel audit facility, and not simply default syslog.
Similarly, on Windows systems the Event Log will contain a plethora of system status and error messages, login/logout records, account changes, as well as system and component failures. To have more useful data for incident response , one needs to modify the audit policy to start logging access to files and other objects.
Most web servers (such as Apache and Microsoft IIS) will record access to web resources located on a server, as well as access errors. Unlike the OS platforms, there is not a pressing need for more logging, but one can modify the /etc/http/httpd.conf to add logging of additional details, such as referrer and browser type.
Databases such as Oracle and MS SQL Server log painfully little by default, even though the situation is improving in recent database versions such as Oracle 11g. With older databases, you have to assume to have no database logs if you have not enabled them during the incident preparation stage. A typical database will log only major errors, restarts, and some administrator access, but will not log access, or changes to data or database structures.
Firewalls typically log denied or blocked connections, but not the allowed connections by default: as our case study showed, connection allowed logs are one of the most indispensable for incident response. Follow the directions for your firewall to enable such logging.
VPN servers will log connections, user login/logouts, errors; default logging will be generally sufficient. Making sure that successful logins – not just failures- are logged is one of the important preparation tasks for VPN concentrators.
Network IDS and IPS will usually log their alerts, various failures, user access to the sensor itself; the only additional type of “logging” is recording full packet payload.
Implementing Log Management
Log management tools that can collect massive volumes of diverse log data without issues are hugely valuable for incident response. Having a single repository for all activity records, audit logs, alerts, and other log types allows incident responders to quickly assess what was going on during an incident, and what led to a compromise or insider abuse.
After logging is enabled and configured for additional details and additional logged events, the logs have to be collected and managed to be useful for incident response. Even if a periodic log review process is not occurring, the logs have to be available for investigations. Following the maturity curve (see http://chuvakin.blogspot.com/2010/02/logging-log-management-and-log-review.html), even simply having logs is a huge step forward for many organizations.
When organizations start collecting and retaining logs, the question of retention policy comes to the forefront. Some regulations give specific answers: PCI DSS for example, mandates storing logs for one year. However, determining proper log storage for incident response can be more difficult. One year might still be a good rule of thumb for many organizations, since it is likely that investigating incidents more than one year after they happened will be relatively uncommon,but certainly possible – so longer retention periods such as three years may be useful).
In the next paper, we will address how to start reviewing logs for discovering incidents, and also how to review logs during incident response. At this point, we have made a huge step forward by making sure that logs will be around when we really need them!
Even though compliance might compel organizations to enable logging, deploy log management, and even start reviewing logs, incident response scenarios allow the value of logs to truly manifest itself.
However, in order to use logs for incident response, the IT environment has to be prepared – follow the guidance and tips from this paper in order to “IR-proof” your logging infrastructure. A useful resource to jumpstart your incident response log review is “Critical Log Review Checklist for Security Incidents” which can be obtained at http://chuvakin.blogspot.com/2010/03/simple-log-review-checklist-released.html in various formats.
Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the field of log management and PCI DSS compliance. He is an author of books “Security Warrior” and “PCI Compliance” and a contributor to “Know Your Enemy II”, “Information Security Management Handbook”; he is now working on a book about computer logs. Anton has published dozens of papers on log management, correlation, data analysis, PCI DSS, security management (see list www.info-secure.org) . His blog http://www.securitywarrior.org is one of the most popular in the industry.
In addition, Anton teaches classes (including his own SANS class on log management) and presents at many security conferences across the world; he recently addressed audiences in United States, UK, Singapore, Spain, Russia and other countries. He works on emerging security standards and serves on the advisory boards of several security start-ups.
Currently, Anton is building his security consulting practice www.securitywarriorconsulting.com, focusing on logging and PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the world about the importance of logging for security, compliance and operations. Before LogLogic, Anton was employed by a security vendor in a strategic product management role. Anton earned his Ph.D. degree from Stony Brook University.
Previously on EventSource: Logging for FISMA Part 1 and Part 2