Data, data everywhere but not a drop of value

The sailor in The Rime of the Ancient Mariner relates his experiences after a long sea voyage when his ship is blown off course:

“Water, water, every where,

And all the boards did shrink;

Water, water, every where,

Nor any drop to drink.”

An albatross appears and leads the ship out of danger, but is shot by the Mariner, and the ship winds up in unknown waters. His shipmates blame the Mariner and force him to wear the dead albatross around his neck.

Replace water with data, boards with disk space, and drink with value, and the lament applies to the modern IT infrastructure. We are all drowning in data, but not so much in value. “Big data” describes datasets that have grown so large that managing them with on-hand tools is awkward. Such datasets are seen as the next frontier in innovation, competition, and productivity.

Log management is not immune to this trend. As the basic log collection problem (different sources, different protocols and different formats) has been resolved, we’re now collecting even larger datasets of logs. Many years ago we rejected the argument that log data belonged in an RDBMS, precisely because we saw the side problem of efficient data archival begin to overwhelm the true problem of extracting value from the data. As log data volumes continue to explode, that decision continues to be validated.

However, while storing raw logs in a database was not sensible, the power of relational databases for extracting patterns and value from data is well established. Recognizing this, EventVault Explorer was released in 2011. Users can extract selected datasets to their choice of external RDBMS (a datamart) for fuzzy searching, pivot tables and so on. As was noted here, the key to managing big data is to personalize the results for maximum impact.
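To make the datamart idea concrete, here is a minimal sketch in Python, assuming the extracted events have landed in a SQLite table named events with event_time, source and event_type columns (hypothetical names; the actual export schema will differ). It runs a pivot-style summary of event types per source:

    import sqlite3

    # Hypothetical datamart: a table of log events extracted from the log archive.
    conn = sqlite3.connect("log_datamart.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS events (
        event_time TEXT, source TEXT, event_type TEXT)""")

    # Pivot-style summary: per source, count two event types of interest plus a total.
    rows = conn.execute("""
        SELECT source,
               SUM(CASE WHEN event_type = 'logon_failure' THEN 1 ELSE 0 END) AS logon_failures,
               SUM(CASE WHEN event_type = 'policy_change' THEN 1 ELSE 0 END) AS policy_changes,
               COUNT(*) AS total
        FROM events
        GROUP BY source
        ORDER BY total DESC
    """).fetchall()

    for source, failures, changes, total in rows:
        print(f"{source}: {failures} logon failures, {changes} policy changes, {total} events total")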

As you look under the covers of SIEM technology, pay attention to that albatross called log archives. It can lead you out of trouble, but you don’t want it around your neck.

Top 5 Compliance Mistakes

5. Overdoing compensating controls

When a legitimate technological or documented business constraint prevents you from satisfying a requirement, a compensating control can be the answer after a risk analysis is performed. Compensating controls are not specifically defined inside PCI; they are instead defined by you (as a self-certifying merchant) or your QSA. They are specifically not an excuse to push PCI compliance initiatives through to completion at minimal cost to your company. In reality, most compensating controls are harder to implement and cost more money in the long run than fixing or addressing the original issue or vulnerability. See this article for a clear picture on the topic.

4. Separation of duties

Separation of duties is a key concept of internal controls. Increased protection from fraud and errors must be balanced with the increased cost and effort required. Both PCI DSS Requirements 3.4.1 and 3.5 mention separation of duties as an obligation for organizations, and yet many still do not do it right, usually because they lack staff.

3. Principle of least privilege

PCI DSS Requirement 2.2.3 says organizations should “configure system security parameters to prevent misuse.” This requires organizations to drill down into user roles to ensure they’re following the principle of least privilege wherever PCI regulations apply. This is easier said than done; more often it’s “easier” to grant all possible privileges than to determine and assign just the correct set. Convenience is the enemy of security.
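As an illustration, here is a minimal sketch, with made-up role names and privilege strings, of the kind of check this drill-down implies: compare what each account has actually been granted against the baseline for its role and flag the excess.

    # Minimal least-privilege check. Role baselines, accounts and privilege
    # names are hypothetical; a real review would pull these from the
    # directory or database in use.
    ROLE_BASELINE = {
        "cashier": {"read_transactions"},
        "dba":     {"read_transactions", "manage_schema", "backup_db"},
    }

    accounts = [
        {"user": "jsmith", "role": "cashier", "granted": {"read_transactions", "manage_schema"}},
        {"user": "adb01",  "role": "dba",     "granted": {"read_transactions", "backup_db"}},
    ]

    for acct in accounts:
        excess = acct["granted"] - ROLE_BASELINE[acct["role"]]
        if excess:
            print(f"{acct['user']} ({acct['role']}): excess privileges {sorted(excess)}")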

2. Fixating on excluding systems from scope

When you make the process of getting things out of scope a higher priority than addressing real risk, you get in trouble. Risk mitigation must come first and foremost. In far too many cases, out-of-scope becomes out-of-mind. This may make your CFO happy, but a hacker will get past weak security and not care if the system is in scope or not.

And drum roll …

1. Ignoring virtualization

Many organizations have embraced virtualization wholeheartedly, given its efficiency gains. In some cases, virtualized machines are now off-premises and co-located at a service provider like Rackspace; this is a trend at federal government facilities. However, “off-premises” does not mean “off-your-list”. Regardless of the location of the cardholder data, such systems are within scope, as is the hypervisor. In fact, PCI DSS 2.0 says that if cardholder data is present on even one VM, the entire virtual infrastructure is “in scope.”

IT Operations and SIEM Management Drive Business Success

While there are still some who question the ‘relevance’ of IT to the enterprise, and others who question the ‘future’ of IT, those involved in day-to-day business activities recognize and acknowledge that IT operations is integral to business success, and this is unlikely to change in the immediate future. Today’s IT staffer with security information and event management (SIEM) responsibility must be able not only to detect, identify and respond to anomalies in infrastructure performance and operations, but also to build processes, make decisions and take action based on the business impact of the incidents and events recorded in ubiquitous logs.

Since the earliest incarnations of IT infrastructure management, a lot of ingenuity and effort has been applied to detecting, identifying and notifying a responsible party to take action when something occurs that signals a potential problem. Competition, combined with creativity, led to a proliferation of tools able to monitor and alert to problematic events.

Consider how far we’ve come in the process from the old days of manual tracking and analysis. Isn’t the whole process, from detection through analysis to notification and even resolution, now fully automated for nearly all installations? Aren’t we long past the days when we worried about (and avoided) automated management solutions because they likely introduced more problems than they solved? Now, even compliance-related monitoring has been automated. And with the advent of Cloud computing, along with SaaS, PaaS and IaaS, hasn’t the user been isolated from the infrastructure underlying service delivery? Doesn’t IT have bigger, more pressing problems to concentrate on than SIEM?

Simply, “no.” While it is true that SIEM has evolved considerably over time, the fact remains that even with more sophisticated, intelligent and automated solutions, there remains the need for IT staff to mine data logs for more information and insight into infrastructure operations, and to understand the impact of that interaction on the delivery of business services and the experiences of the user. IT must be able to identify and inform business staff of risks to SLA and performance commitments. IT must also be able to contribute to defining and taking the actions needed to eliminate or reduce those risks.

The increasing complexity of IT operations, driven by the expanding diversity and distribution of infrastructure and by the dynamic integration and interaction involved in delivering business services, makes service disruptions more likely, so avoiding them is of increasing importance. The link between the service user and the performance of the infrastructure has become more critical. The need for SIEM, and for the IT staff to obtain actionable information from these management solutions, is what drives convergence of the management disciplines associated with Application Performance Management (APM), Business Process Management (BPM) and Business Service Management (BSM). Once considered and treated as separate areas of expertise, their overlapping interests and interdependencies have become apparent. They cannot succeed if treated as organizational silos. Integrated SIEM solutions, with their detailed end-to-end data collection and analysis, help to end siloed operations.

For example, BPM assures that processes execute precisely and consistently, with all intervening steps, to complete a specific task. BSM tracks the proper functioning of the involved infrastructure at all stages of service delivery and the business process to assure a satisfactory end-user experience. APM optimizes infrastructure utilization and performance. IT needs to understand the involvement and impact of infrastructure on service delivery, hence it needs data from all three functions. This involves monitoring, analyzing and reporting on a staggering number of incidents and events to identify what is significant and to initiate appropriate action. This is the environment in which today’s best SIEM solutions demonstrate their value.

Free, entry-level SIEM solutions (such as EventTracker Pulse) provide basic functionality to begin data gathering, analysis and reporting from multiple sources. Such solutions eliminate error-prone and tedious manual effort. They can also provide basic application and service management. More feature-rich products, such as EventTracker Enterprise, offer sophisticated functionality like complex analysis and custom reporting of potentially problematic behavior.

High-functionality SIEM solutions provide a significant opportunity for IT to exercise and demonstrate its ability to contribute to business success. The ability to document that contribution becomes even more necessary as virtualized infrastructures and the Cloud proliferate as de facto operating models.

The idea that IT operates to support the overall success of the business, not simply to manage infrastructure, is no longer a matter of contention. Today, IT is also under increasing pressure to document and demonstrate its contributions to business success. The difficulty, in an environment filled with competing solutions, comes in deciding just how to do this most effectively without breaking the budget. The answer is found in leveraging cost-effective SIEM solutions.

The 5 Most Annoying Terms of 2011

Since every cause needs “Awareness,” here are my picks for management speak to camouflage the bloody obvious:

  5. Events per second

Log Management vendors are still trying to “differentiate” with this tired and meaningless metric as we pointed out in The EPS Myth.

  4. Thought leadership

Mitch McCrimmon describes it best.

  3. Cloud

Now here is a term that means all things to all people.

  2. Does that make sense?

The new “to be honest.” Jerry Weismann discusses it in the Harvard Business Review.

  1. Nerd

During the recent SOPA debate, so many self-described “country boys” wanted to get the “nerds” to explain the issue to them; as Jon Stewart pointed out, the word they were looking for was “expert.”

SIEM and the Appalachian Trail

The Appalachian Trail is a marked hiking trail in the eastern United States extending between Georgia and Maine. It is approximately 2,181 miles long and takes about six months to complete. It is not a particularly difficult journey from start to finish; yet even so, completing the trail requires more from the hiker than just enthusiasm, endurance and will.

Likewise, SIEM implementation can take from one to six months to complete (depending on the level of customization) and, like the Trail, appears deceptively simple. It, too, can be filled with challenges that reduce even the most experienced IT manager to despair, and there is no shortage of implementations that have been abandoned or left uncompleted. As with the Trail, SIEM implementation requires thoughtful consideration.

1) The Reasons Why

It doesn’t take too many nights scurrying to find shelter in a lightning storm, or days walking in adverse conditions before a hiker wonders: Why am I doing this again? Similarly, when implementing any IT project, SIEM included, it doesn’t take too many inter-departmental meetings, technical gotchas, or budget discussions before this same question presents itself: Why are we doing this again?

All too often, we don’t have a compelling answer, or we have forgotten it. If you are considering a half-year-long backpacking trip through the woods, you presumably have a really good reason for it. In the same way, one embarks on a SIEM project with specific goals, such as regulatory compliance, IT security improvement or controlling operating costs. Define the answer to this question before you begin the project and refer to it when the implementation appears to be derailing. This is the compass that should guide your way. Make adjustments as necessary.

2) The Virginia Blues

On the Appalachian Trail, the “Virginia Blues” set in about four to eight weeks into the journey, within the state lines of Virginia; daily trials can include anything from broken bones to homesickness. Getting through requires not just perseverance but also an ability to adapt.

For a SIEM project, staff turnover, false positives, misconfigurations or unplanned explosions of data can potentially derail the project. But pushing harder in the face of distress is a recipe for failure. Step back, remind yourself of the reasons why this project is underway, and look at the problems from a fresh perspective. Can you be flexible? Can you find new avenues around the problems?

  3) A Fresh Perspective

In the beginning, every day is chock full of excitement; every summit view or wild animal encounter is thrilling. But life in the woods eventually becomes routine, and exhilaration fades into frustration.

In much the same way, after the initial thrill of installation and its challenges, the SIEM project settles into a routine of discipline and daily observation across the infrastructure for signs of something amiss.

This is where boredom can set in, but the best defense against the lull that follows the end of the implementation is to expect it. The installation is going to end, but completing it is not the finish line. Rather, when the installation is done, the real journey and the hard work begin.

Humans in the loop – failsafe or liability?

Among InfoSec and IT staff, there is a lot of behind-the-scenes hand-wringing that users are the weakest link. But are InfoSec staff that much stronger?

While automation does have a place, Dan Geer of the CIA-backed venture fund In-Q-Tel properly notes that humans “can build structures more complex than they can operate,” and asks: “Are humans in the loop a failsafe or a liability? Is fully automated security to be desired or to be feared?”

We’ve considered this question before at Prism, when “automated remediation” was being heavily touted as a solution for mid-market enterprises, where IT staff is not abundant. We’ve found that human intervention is not just a fail-safe, but a necessity; the interdependencies, even in medium-sized networks, are far too complex to automate fully. We introduced the feature a couple of years back and, in reviewing its usage, concluded that such “automated remediation” does have a role to play in the modern enterprise. Use cases include changes to group membership in Active Directory, unrecognized processes, account creation where the naming convention is not followed, or honeypot access. In other words, when the condition can be well defined and narrowly focused, humans in the loop will only slow things down. However, for every such “rule” there are hundreds more conditions that will be obvious to a human but missed by a narrow rule.
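As a sketch of what “well defined and narrowly focused” means in practice, consider the naming-convention use case above. The rule below is illustrative only (the convention, account names and actions are hypothetical), but it shows how narrow such an automated response has to be to stay safe:

    import re

    # Hypothetical convention: two lowercase initials followed by four digits, e.g. "jd0421".
    NAMING_CONVENTION = re.compile(r"^[a-z]{2}\d{4}$")

    def on_account_created(account_name: str) -> str:
        """Decide an automated action for a new-account event.
        Only the narrowly defined violation is auto-remediated; everything
        else stays with a human reviewer."""
        if NAMING_CONVENTION.match(account_name):
            return "allow"
        return "disable_and_alert"

    for name in ["jd0421", "tempadmin", "xx9999"]:
        print(name, "->", on_account_created(name))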

So are humans in the loop a failsafe or a liability? It depends on the scenario.

What’s your thought?

Will the cloud take my job?

Nearly every analyst has made aggressive predictions that outsourcing to the cloud will continue to grow rapidly. It’s clear that servers and applications are migrating to the cloud as fast as possible, but according to an article in The Economist, the tradeoff is efficiency vs. sovereignty.   The White House announced that the federal government will shut down 178 duplicative data centers in 2012, adding to the 195 that will be closed by the end of this year.

Businesses need motivation and capability to recognize business problems, solutions that can improve the enterprise, and ways to implement those solutions.   There is clearly a role for outsourced solutions and it is one that enterprises are embracing.

For an engineer, however, the response to outsourcing can be one of frustration, and concern about short-sighted decisions by management that focus on short-term gains at the risk of long-term security. But there is also an argument that in-sourcing isn’t necessarily the better business decision: a recent Gartner report noted that IT departments often center too much of their attention on technology and not enough on business needs, resulting in a “veritable Tower of Babel, where the language between the IT organization and the business has been confounded, and they no longer understand each other.”

Despite increased migration to cloud services, it does not appear that there is an immediate impact on InfoSec-related jobs. Among the 12 computer-related job classifications tracked by the Department of Labor’s Bureau of Labor Statistics (BLS), information security analysts, along with computer and information research scientists, were among those that reported no unemployment during the first two quarters of 2011.

John Reed, executive director at IT staffing firm Robert Half Technology, attributes the high growth to increasing organizational awareness of the need for security, and for hands-on IT security teams that ensure appropriate controls are in place to safeguard digital files and vital electronic infrastructure, as well as respond to computer security breaches and viruses.

Simply put: the convenience of using cloud services does not replace the skills needed to analyze and interpret the data to protect the enterprise. Outsourcing to a cloud may provide immediate efficiencies, but it is the IT security staff delivering business value who ensure long-term security.

Threatscape 2012 – Prevent, Detect, Correct

The past year was a hair-raising series of IT security breakdowns and headline events, reaching as high as RSA itself falling victim to a phishing attack. And as the sun set on 2011, the hacker group Anonymous remained busy, providing a sobering reminder that IT security can never rest.

In the RSA case, it turned out that attackers sent two different targeted phishing e-mails to four workers at its parent company, EMC. The e-mails contained a malicious attachment, identified in the subject line as “2011 Recruitment plan.xls”, which was the point of attack.

Back to Basics:

Prevent:

Use administrative controls such as security awareness training, and technical controls such as firewalls, anti-virus and IPS, to stop attacks from penetrating the network. Most industry and government experts agree that security configuration management, along with automated patch management and up-to-date anti-virus software, is probably the best way to ensure the strongest security configuration allowable.

Detect:

Employ a blend of technical controls such as anti-virus, IPS, intrusion detection systems (IDS), system monitoring, file integrity monitoring, change control, log management and incident alerting to track how and when system intrusions are being attempted.
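As one small example of the detection layer, here is a minimal file integrity monitoring sketch: hash a handful of watched files and compare against a stored baseline. The paths and baseline file name are hypothetical, and a production FIM tool would do far more (scheduling, tamper-proof baselines, alert routing):

    import hashlib, json, os

    WATCHED = ["/etc/passwd", "/etc/ssh/sshd_config"]   # illustrative paths
    BASELINE_FILE = "fim_baseline.json"

    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def check():
        baseline = json.load(open(BASELINE_FILE)) if os.path.exists(BASELINE_FILE) else {}
        current = {p: digest(p) for p in WATCHED if os.path.exists(p)}
        for path, h in current.items():
            if path in baseline and baseline[path] != h:
                print(f"ALERT: {path} changed since last baseline")
        with open(BASELINE_FILE, "w") as f:
            json.dump(current, f)    # refresh the baseline after reporting

    check()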

Correct:

Apply operating system upgrades, backup and restore procedures, vulnerability mitigation and other controls to make sure systems are configured correctly and to prevent the irretrievable loss of data.

IT Trends and Comments for 2012

The beginning of a new year marks a time of reflection on the past and anticipation of the future. The result for analysts, pundits and authors is a near irresistible urge to identify important trends in their areas of expertise (real or imagined). I am no exception, so here are my thoughts on what we’ll see in the next year in the areas of application and evolution of Information Technology.

The past few years have been marked by significant maturity in the understanding of the capabilities, demands and expectations of educated consumers in the application of IT to their business and personal lives. The evolution in capability, ease of use, and ubiquity of availability and access accelerated dramatically. This resulted from the combination of past trends, industry economics and general IT maturation driving IT’s application into new areas while speeding and facilitating benefit realization.

These effects will continue into 2012 as a result of the following trends:

  1. Customers buy solutions, not technologies. IT solution providers, regardless of size or product form (hardware, software, services), have become more sensitive and responsive to the needs of their target markets. Business buyers want immediate solutions to their problem with minimal complexity in its application. They do not want ‘tool kits’ or 75-percent-complete products. The best and most successful vendors recognize and respond to this demand for comprehensive solutions to their customers’ expectations and demands. The emergence of affordable, fully integrated, modular and comprehensive solutions that address identifiable business and operational problems out-of-the-box will continue and become more competitive as more intelligence and power are embedded in IT solutions. The Prism Microsystems EventTracker product family provides a good example of how vendors are creating solutions in this model. It is true that some solutions will stand and operate on their own. However, an increasingly complex and evolving environment requires that solutions be able to co-exist and interoperate with data, products and services from many sources.
  2. Private, public and hybrid Clouds continue to grow in number and application, spreading across all market segments. Service providers and vendors are in a race to make Clouds more accessible, secure and functional. Consumers of Cloud services will continue to be even more selective and careful as they choose their providers/suppliers/partners. High on their list will be concerns for stability, security and interoperability. The issue of stability tips the preference toward private and hybrid solutions. (We have already seen very public and dramatic failures from big-vendor Cloud suppliers; there will be more.) However, a combination of improved architectures and customer interest in achieving very real Cloud/IaaS/PaaS/SaaS financial, operational and competitive benefits will maintain adoption rates. These also drive the following trend.
  3. Standards and reference architectures will become more important as Clouds (public, private, and hybrid) proliferate. As business and IT consumers pursue the potential benefits of Cloud/IaaS/PaaS/SaaS, etc., it is becoming increasingly obvious that the link between applications/services and the underlying infrastructure must be broken. The big advantage, as well as the fundamental challenge, is how to assure easy portability and access to any and all Cloud services. But this must be done in a way that allows Cloud solution systems to interoperate and co-exist with traditional structures. You must provide a structure that allows for the creation, publication, access, use and release of assets in all environments. Vendors must cooperate to create multi-vendor standards and architectures to meet these expectations. This is a natural evolution of the pursuit of standards and techniques that disconnect the implementation of a service from its operational underpinnings. The effort goes back to the earliest days of creating machine-independent languages (COBOL, Fortran, etc.) and all Open Systems and architectures (e.g. Unix). This new degree of structural independence is just implemented at a higher level of abstraction. The Cloud Standards Customer Council acts as an advocacy group for end-users interested in accelerating successful Cloud adoption. They are addressing the standards, security and interoperability issues surrounding the transition to a Cloud operating environment. One example of a service implementation architecture we see as being particularly worthy of note is the OASIS-sponsored Topology and Orchestration Specification for Cloud Applications (TOSCA).
  4. Use of sophisticated analytics as a business and competitive tool spreads far and wide. The application of analytics to data to solve tough business and operational problems will accelerate as vendors compete to make sophisticated analytics engines easier to access and use, more flexible in application, and the results easier to understand and implement. IT has provided the rest of the enterprise with mountains of data; the challenge has been in getting useful information and insight. Operations Research, simulation and analytics have been around and in use for decades (even centuries), yet their use has been limited to very large companies. Today’s more powerful computers and the ability to collect and process big streams of live data, combined with concentrated efforts by vendors to wrap accessible user interfaces around the analytics, will provide these tools to a wider audience. The power of modern servers shields the user from the underlying complexities, and will do so even more over time.
  5. Increasingly integrated, intelligent, real-time end-to-end management solutions enable high-end, high-value services. Think of Cisco Prime™ Collaboration Manager, which provides proactive monitoring and corrective action based on the potential impact on the end-user. Predictive analysis is applied down to the event level (data logs provide significant insight), and analytics identify problem correlation and/or causation. The primary goal is prediction, to avoid problems. Identifying correlated events can be as effective as, or even more effective than, recognizing cause in providing an early warning (a short sketch of this idea appears after this list). The fact is that while knowledge of causation is necessary for repair, both correlation and causation work for predictive problem avoidance.
  6. APM (Application Performance Management) converges on BPM (Business Process Management). The definition of APM is expanding to include a focus on end-user-to-infrastructure performance optimization as a prime motivator for corrective action. Business managers care about infrastructure performance only to the extent it negatively impacts the service experience; they want high-quality services guaranteed. BPM focuses on getting processes right so things are done correctly and efficiently. IT cares about infrastructure, so traditionally this is where APM has focused. The emphasis will continue shifting toward the consumer, blurring the lines between APM and BPM. BMC provides one example of the impact by adding analytic and functional capabilities to Application Performance Management to speed root-cause as well as impact analysis. Enhanced real-time predictive analytics are specifically used to improve the user’s interactive experience by more quickly alerting IT staff to infrastructure behaviors that can disrupt service delivery.
  7. The impact of the consumerization of IT will continue to become more significant. Consumers of services are increasingly intolerant of making any concessions to the idiosyncrasies of their access devices (iPad, iPod, Smartphone, etc.). They expect a consistent experience regardless of what is used to access data. Such expectations increase the pressure for service, software and platform standards, as well as drive the evolution of device capabilities and design. Previous efforts generally focused on the ‘mechanics’ and ‘ergonomics’ of the interface. Today the focus is increasingly on consistency of access, ‘look-and-feel’ and performance.  One example is the growing interest in and ability to deliver what AppSense is calling ‘user-centric IT’, where the user has consistent access to all of their desktop resources wherever they are and on whatever device or platform they use. Technology will increasingly and automatically detect, adapt to and serve the user. This goes beyond the existing concept of ‘application aware’ devices to one that associates and binds the user with a consistent, cross-platform experience.
  8. Virtualization acts as a ‘gateway’ step to the Cloud and a fully ‘as-a-service’ infrastructure. Virtualization will continue to be subsumed by Cloud. It is now recognized as an enabling technology and a necessary building block for Cloud implementations. It is the first step toward achieving a truly adaptive infrastructure that operates with the flexibility, reliability and robustness to respond to the evolving and changing needs of the business and of the consumer of IT services. Storage, servers and networks have been virtualized; the focus is shifting to providing applications and services as fully virtualized resources. The increasingly complex nature of ever more sophisticated services acts to accelerate and reinforce this trend.
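On the correlation point in trend 5, here is a minimal sketch of how correlation alone can serve as an early warning: count how often a candidate precursor event is followed by an outage within a fixed time window. The event names, timestamps and 30-minute window are illustrative.

    from datetime import datetime, timedelta

    events = [
        ("2012-01-10 09:05", "fan_speed_warning"),
        ("2012-01-10 09:25", "service_outage"),
        ("2012-01-11 14:00", "fan_speed_warning"),
        ("2012-01-11 14:20", "service_outage"),
        ("2012-01-12 08:00", "fan_speed_warning"),
    ]

    def parse(ts):
        return datetime.strptime(ts, "%Y-%m-%d %H:%M")

    warnings = [parse(t) for t, e in events if e == "fan_speed_warning"]
    outages  = [parse(t) for t, e in events if e == "service_outage"]

    # A warning "predicts" an outage if one follows within the window.
    window = timedelta(minutes=30)
    followed = sum(1 for w in warnings if any(w < o <= w + window for o in outages))
    print(f"{followed}/{len(warnings)} warnings were followed by an outage within 30 minutes")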

There you have it: eight trends and influences IT will have to deal with in 2012. I expect to be commenting more on these efforts this year. Your comments, questions and discussion around any of these are welcome. I can be reached at rlptak@ptaknoel.com.

Echo Chamber

In the InfoSec industry, there is an abundance of familiar flaws and copycat theories and approaches. We repeat ourselves and recommend the same approaches. But what has really changed in the last year?

The emergence of hacking groups like Anonymous, LulzSec, and TeaMp0isoN.

In 2011, these groups brought the fight to corporate America, crippling firms both small (HBGary Federal) and large (Stratfor, Sony). As the year drew to a close, these groups shifted from prank-oriented hacks for laughs (or “lulz”) to aligning themselves with political movements like Occupy Wall Street, and to hacking firms like Stratfor, an Austin, Texas-based security “think tank” that releases a daily newsletter on security and intelligence matters all over the world. After HBGary Federal CEO Aaron Barr publicly bragged that he was going to identify some members of the group during a talk in San Francisco during RSA Conference week, Anonymous members responded by dumping a huge cache of his personal emails and those of other HBGary Federal executives online, eventually leading to Barr’s resignation. Anonymous and LulzSec then spent several months targeting various retailers, public figures and members of the security community. Their Operation AntiSec aimed to expose alleged hypocrisies and sins by members of the security community. They targeted a number of federal contractors, including IRC Federal and Booz Allen Hamilton, exposing personal data in the process. Congress got involved in July when Sen. John McCain urged Senate leaders to form a select committee to address the threat posed by Anonymous/LulzSec/Wikileaks.

The attack on RSA SecurId was another watershed event. The first public news of the compromise came from RSA itself, when it published a blog post explaining that an attacker had been able to gain access to the company’s network through a “sophisticated” attack. Officials said the attacker had compromised some resources related to the RSA SecurID product, which set off major alarm bells throughout the industry. SecurID is used for two-factor authentication by a huge number of large enterprises, including banks, financial services companies, government agencies and defense contractors. Within months of the RSA attack, there were attacks on SecurID customers, including Lockheed Martin, and the current working theory espoused by experts is that the still-unidentified attackers were interested in LM and other RSA customers all along and, having run into trouble compromising them directly, went after the SecurID technology to loop back to the customers.

The specifics of the attack were depressingly mundane (targeted phishing email with a malicious Excel file attached).

Then too, several certificate authorities were compromised throughout the year. Comodo was the first to fall when it was revealed in March that an attacker (apparently an Iranian national) had been able to compromise the CA infrastructure and issue himself a pile of valid certificates for domains belonging to Google, Yahoo, Skype and others. The attacker bragged about his accomplishments in Pastebin posts and later posted evidence of his forged certificate for Mozilla. Later in the year, the same person targeted the Dutch CA DigiNotar. The details of the attack were slightly different, but the end result was the same: he was able to issue himself several hundred valid certificates and this time went after domains owned by, among others, the Central Intelligence Agency. In the end, all of the major browser manufacturers had to revoke trust in the DigiNotar root CA.   The damage to the company was so bad that the Dutch government eventually took it over and later declared it bankrupt. Staggering, isn’t it? A lone attacker not only forced Microsoft, Apple and Mozilla to yank a root CA from their list of trusted roots, but he was also responsible for forcing a certificate authority out of business.

What has changed in our industry? Nothing really. It’s not a question of “if” but “when” the attack will arrive on your assets.

Plus ça change, plus c’est la même chose (the more things change, the more they stay the same), I suppose.

Events, Analytics, and End-Users: Changing Performance Management

Changes in end-user behavior and the resulting “consumerization” of IT have contributed to the changing and expanding definition of Application Performance Management (“APM”). APM can no longer focus just on the application or the optimization of infrastructure against abstract limits; APM must now view performance from the end-user’s access point back across all infrastructure involved in the delivery of the service. Increasingly fickle end-users are forcing application business-owners into intimate engagement and collaboration with IT to monitor, assess and manage the service to assure an optimal end-user experience. This involves aggressive event monitoring at multiple levels and the application of sophisticated analytics that will either predict and avoid service disruptions, or quickly determine root causes and appropriate corrective actions.

It also requires that IT operations staff be aware of the quality of the end-users’ experience in real-time.

The End-User

Consumerization has changed end-user expectations and demands in terms of the services they want, the functionality they require, and the interfaces they demand to access applications and services.

The spread of mobile computing has raised the dynamics of service delivery, monitoring, and management to a whole new level of complexity. Services are no longer tied to single platforms. The end-to-end pathway of service implementation and delivery generally moves across multiple boundaries. Instead of encompassing just the enterprise data center, it can now include a mix of private, hybrid and public Cloud operations. This impacts every level of monitoring, management and control.

With mobile computing, end-users expect ubiquity of access, ease of use and rapid response through consistent, secure interfaces. They are intolerant of poor performance and have no problems making their dissatisfaction known by changing suppliers.

Dissatisfied customers who switch suppliers damage both enterprise revenue streams and the careers of application/service owners. When the underlying performance problems are linked to and identified with IT infrastructure operations, the resulting conflict benefits neither the business managers nor IT.

End-users have not been completely ignored. There exist multiple products and services that endeavor to monitor and inform business managers, as well as IT operations, about the end-user experience. The experience to date remains underwhelming. We hear repeatedly how IT only becomes aware of a serious service disruption when distraught end-users overwhelm the Service Desk with complaints.

This is because APM has traditionally focused on monitoring, analyzing and managing the IT infrastructure. The most time and effort has been invested in keeping the infrastructure operating at ‘optimal’ levels of performance. Unfortunately, the performance metrics frequently focused on the technology and inner workings of the infrastructure, not the business.

Efforts to change this have been made, Business Service Management (BSM) being one wide-ranging area focused on linking infrastructure operations to business results. But the results have been less than satisfactory, for a number of reasons.

One reason was the complexity in keeping track of the dependencies between infrastructure, applications and operations in service delivery. Another was that the infrastructure had to achieve a level of automated self-management that allowed it to automatically adapt to resolve problems in a timely manner.

The Rise of Analytics

Although the use of analytics for business advantage dates back to the 1800s, its full potential has not been widely exploited. The mathematics required to understand how to apply analytics slowed adoption, as did the amount of computational power and time needed to get results. The complexity involved in structuring the analysis and computation limited the areas where analytics were being effectively and economically applied. Data mining has been used successfully, but the effective application of analytics lagged except in specialized areas like financial analysis.

Things are changing rapidly, due both to the increase in available compute power and to vendors beginning to provide multiple, easily accessible ways to apply analytics. Analytics can be accessed as ‘Analytics as a Service’ (like Google Analytics). Vendors are beginning to offer embedded analytics to evaluate data as it is collected, and are embedding and automating predictive analytics in their monitoring solutions. The challenges of working across multiple, different types of data sets from different sources are being addressed. This will make the work of analyzing data easier and improve the quality of the results.

The Importance of ‘Events’

Monitoring and tracking begin and end with significant ‘events’. One of the most important results of more processing power and more sophisticated analytics is that more of the collected data can be exhaustively and productively processed to yield important information in less time.

Effective APM requires identifying the changes in data that either correlate with or will cause a significant disruption in service delivery, and then making sure that the appropriate action, whether a ‘break/fix/repair’ remedy or a work-around that avoids the problem device or element, is identified and implemented. It all starts with capturing and working with the data.
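A minimal sketch of what identifying such changes can look like in its simplest form: flag an interval whose event count deviates sharply from the recent average. The hourly counts and the three-sigma threshold are illustrative.

    import statistics

    hourly_counts = [120, 118, 130, 125, 122, 119, 127, 121, 480, 124]  # events per hour (made up)

    WINDOW = 6   # hours of history to compare against
    for i in range(WINDOW, len(hourly_counts)):
        history = hourly_counts[i - WINDOW:i]
        mean, stdev = statistics.mean(history), statistics.pstdev(history)
        if stdev and abs(hourly_counts[i] - mean) > 3 * stdev:
            print(f"hour {i}: count {hourly_counts[i]} deviates sharply from recent mean {mean:.0f}")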

Cloud: Observations and Recommendations

The commercialization of Cloud-based IT services, along with market and economic challenges are changing the way business services are conceived, created, delivered and consumed. This change is reflected in the growing interest in alternative delivery models and solutions.

Both providers and consumers of IT products and services demand more flexibility and choice in how they access, use and pay for technology. Cloud implementations are growing, whether private (in-house), public (outsourced services à la Amazon or Google) or hybrid combinations of the two models. Initially, private models are dominating, but the expectation is that flexible, option-rich hybrid models for services, capacity and features will become the new norm. Whichever model dominates, Cloud implementations directly impact the design and delivery of new products, solutions and services. The growing popularity of demand-based services and usage-based pricing models results from efforts by customers and businesses to improve operational efficiency through service consolidation and cost optimization. Customer Relationship Management (CRM) and Payroll/Financial services are two long-running examples of beneficial Cloud-based services. Microsoft’s Office 365, IBM’s SmartCloud Foundation family and HP’s Cloud service plans, along with many others targeting mid-range as well as high-end enterprises, are testaments to the potential revenue that vendors anticipate from Cloud-building and service delivery. The benefits already realized by businesses of all sizes using Cloud-based services, as well as by the enterprises providing them, are driving IT and business staff to apply the model in their own environments.

What does this mean for the mid-market?

Just as in large enterprises, mid-market IT and business staff want integrated solutions with clear paths to additional capabilities and increased capacities to meet their changing needs. For many, on-premises solutions and in-house services are preferred. Well-publicized Cloud service problems underscore existing challenges in security, stability and availability. But increasingly, on-demand and Cloud-based solutions are becoming attractive alternatives at appealing price points for the mid-market.

Prudence dictates that on-demand solutions (SaaS[1], IaaS[2], PaaS[3], etc.) be evaluated as business cases, with due attention paid to their ability to expand to meet long-term needs. Existing trends in IT service consumption favor consolidation and integration of IT and business services. This provides a significant opportunity for both creating and consuming on-demand solutions. Increasingly powerful tools allow accurate cost-benefit analysis to help buyers select the proper delivery and pricing models to meet their needs.
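As a sketch of the kind of cost-benefit comparison such tools automate, the following compares a fixed on-premises cost against a usage-based subscription over several horizons. Every figure is a made-up placeholder, not vendor pricing:

    # Hypothetical break-even comparison between delivery/pricing models.
    onprem_upfront = 60_000.0       # hardware + licences (placeholder)
    onprem_monthly = 1_500.0        # power, space, admin share (placeholder)
    cloud_monthly_per_user = 25.0   # usage-based subscription (placeholder)
    users = 80

    for months in (12, 24, 36):
        onprem = onprem_upfront + onprem_monthly * months
        cloud = cloud_monthly_per_user * users * months
        cheaper = "cloud" if cloud < onprem else "on-premises"
        print(f"{months} months: on-premises ${onprem:,.0f} vs cloud ${cloud:,.0f} -> {cheaper}")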

What’s the issue?

Whether evaluating the purchase of public Cloud services or implementing a private Cloud, IT and business staff should leverage the capabilities of experienced partners. Today, vendors including the Big 4, as well as many other firms, offer prescriptive consulting as part of the decision-making process. There are many partners to choose from, but even the largest service providers can suffer from delivery problems.

What do I want in a partner?

There are important criteria to consider in selecting a Cloud partner. More and more vendors are offering products and services tailored to the mid-market. These include pre-packaged combinations of hardware, software and services to implement a Cloud environment; service and platform offerings can sound remarkably similar and you need to have some metrics to evaluate the providers and their services. These should include:

  1. Does their experience fit what you are trying to do? — there are plenty of stories about success in implementing and delivering Cloud environments — make sure that the potential partner has experience that prepares them for what you want to achieve.
  2. Cloud is an evolutionary path, not “rip and replace” — most firms can’t afford the risk and expense of a total infrastructure rebuild. Transforming IT operations into a service oriented Cloud requires learning and change. For most, the far more reasonable approach is to treat transformation as an evolutionary project that takes into account the existing IT environment.
  3. Cloud-related skills and expertise in IT are still a scarce commodity — today’s typical IT staff are lacking in Cloud expertise — the partner should be able to tailor the pace, scale and scope of implementation to your needs and ability to absorb change. This requires them to have the insight, as well as flexibility to recognize the maturity level of your business and IT organizations.
  4. Cloud-usage patterns — as with any technology, there are common ways and reasons why Cloud implementations succeed, e.g. optimizing the utilization and cost of the IT infrastructure supporting the business. A usage pattern will pose consistent challenges in implementation and execution that can be minimized or avoided with experience. A partner that understands and has experience with your usage pattern will help minimize implementation risks.

The other traditional financial, legal and experiential rules and guidelines for choosing partners still apply. There is no absolute guarantee that all risk can be eliminated, but these will help to minimize it.

[1] Software as a Service — on-demand access to specific types of services, e.g. Salesforce.com

[2] Infrastructure as a Service — on-demand scale-up and scale-down in response to capacity demands

[3] Platform as a Service — on-demand access to a complete environment consisting of a defined hardware platform and software stack, e.g. a complete development-through-deployment environment for building and delivering web applications

Taxonomy of a Cyber Attack


New Bill Promises to Penalize Companies for Security Breaches

On September 22, the Senate Judiciary Committee approved and passed Sen. Richard Blumenthal’s (D-Conn.) bill, the “Personal Data Protection and Breach Accountability Act of 2011,” sending it to the Senate floor. The bill would penalize companies for online data breaches and was introduced on the heels of several high-profile security breaches and hacks that affected millions of consumers. These included the Sony breach, which compromised the data of 77 million customers, and the DigiNotar breach, which resulted in 300,000 Google Gmail account holders having their mail hacked and read. The measure addresses companies that hold the personal information of more than 10,000 customers and requires them to put privacy and security programs in place to protect the information, and to respond quickly in the event of a security failure.

The bill proposes that companies be fined $5,000 per day per violation, with a maximum of $20 million per infringement. Additionally, companies that fail to comply with the data protection law (if it is passed) may be required to pay for credit monitoring services and be subject to civil litigation by affected consumers. The bill also aims to increase criminal penalties for identity theft, as well as for crimes such as installing a data-collection program on someone’s computer or concealing a security breach in which personal data is compromised.
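A quick worked example of that penalty structure, reading the bill as $5,000 per day per violation capped at $20 million per infringement (the interpretation of “per infringement” is an assumption here):

    DAILY_FINE = 5_000
    CAP = 20_000_000   # per infringement, as described in the bill

    def penalty(days_out_of_compliance: int, violations: int) -> int:
        return min(DAILY_FINE * days_out_of_compliance * violations, CAP)

    print(penalty(days_out_of_compliance=90, violations=3))    # 1,350,000
    print(penalty(days_out_of_compliance=2000, violations=5))  # 50,000,000 capped to 20,000,000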

Key provisions in the bill include a process to help companies establish appropriate minimum security standards, notification requirements, information sharing after a breach, and company accountability.

While the intent of the bill is admirable, the problem is not a lack of laws to deter breaches, but the insufficient enforcement of these laws. Many of the requirements espoused in this new legislation already exist in many different forms.

SANS is the largest source for information security training and security certification, and their position is that we don’t need an extension to the Federal Information Security Management Act of 2002 (FISMA) or other compliance regulations, which have essentially encouraged a checkbox mentality: “I checked it off, so we are good.” This is the wrong approach to security, but companies get rewarded for checking off criteria lists. Compliance regulations do not drive improvement. Organizations need to focus on the actual costs they can incur by not being compliant:

  • Loss of consumer confidence: Consumers will think twice before they hand over their personal data to an organization perceived to be careless with that information, which can lead to a direct hit to sales.
  • Increased costs of doing business: PCI-DSS is one example where enforcement is prevalent and the penalties can be stringent. Merchants who do not maintain compliance are subject to higher rates charged by Visa, MasterCard, etc.
  • Negative press: One need only look at the recent data breaches to consider the continuing negative impact on the compromised company’s brand and reputation. In one case (DigiNotar), the company folded.

The gap does not exist in the laws but, rather, in the enforcement of those laws. Until there is enforcement, any legislation or requirement is a hollow threat.

New in EventTracker 7.2

Getting from ‘Log Data’ to ‘Actionable Information’

Those in IT responsible for service delivery or infrastructure operations know what it’s like: we collect and store a growing amount of the data necessary to do our jobs, but at a rate that drives up cost.

However, the problem with infinite detail is not much different from trying to organize and analyze noise: there is plenty of it, but finding the signal underneath is the difficult, critical part. Take event and audit logs, for example: such data is important and valuable only to the extent it helps to simplify problem resolution, avoid service mishaps and keep operations running smoothly. All too often, IT staff get reports consisting of siloed information that requires significant further interpretation to glean anything actionable. Effectively, we are drowning in data and searching for intelligence.

What’s the issue?

Most IT departments have logs filled with data about events that are intended to help identify, alert to, avoid or remediate a problem. Typically, log data is unstructured developer shorthand that follows no specific or consistent standard of style, format, delineation or content. The generated reports typically sort or group according to specific data fields or event classification, with some summarization. The net result is that the task of teasing out information for decision making can be a painful, tedious and drawn-out process whose results may not justify the effort.
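To show what teasing out that information involves, here is a minimal sketch that imposes structure on free-form log lines with a regular expression and then summarizes by severity and component. The log format and sample lines are hypothetical; real sources each need their own parsing rules:

    import re
    from collections import Counter

    LINE = re.compile(
        r"^(?P<ts>\S+ \S+)\s+(?P<severity>\w+)\s+\[(?P<component>[^\]]+)\]\s+(?P<message>.*)$"
    )

    sample = [
        "2011-10-18 09:14:02 ERROR [auth] login failed for user jsmith",
        "2011-10-18 09:14:05 INFO  [auth] login succeeded for user adb01",
        "2011-10-18 09:15:11 WARN  [disk] volume /var 91% full",
    ]

    counts = Counter()
    for line in sample:
        m = LINE.match(line)
        if m:
            counts[(m["severity"], m["component"])] += 1

    for (severity, component), n in counts.most_common():
        print(f"{severity:<5} {component:<6} {n}")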

All of this is to support three goals: inform, describe and predict. First, inform IT about what is happening, in a timely manner, with the intent of detecting and avoiding service disruptions and performance problems. An example is an alert that inquiry response times are trending toward a length that, if continued, will cause a violation of a Service Level Agreement (SLA). Second, describe what has happened: analyze the data to uncover what went wrong to cause the event and to gain insight into what can be done to avoid the problem in the future. An example is identifying that a fan failure led a component to overheat, function erratically and disrupt a service. Finally, identify anomalies that predict a failure, so action can be taken to avoid the problem; for example, flagging a device that has not had a recent patch update applied and is therefore likely to fail and cause an operational problem.
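As a sketch of the first (inform) goal, the snippet below fits a simple linear trend to recent response-time samples and estimates when the SLA threshold would be crossed if the trend continues. The samples and the 2.0-second SLA are illustrative:

    samples = [(0, 1.10), (5, 1.22), (10, 1.31), (15, 1.45), (20, 1.58)]  # (minute, response seconds)
    SLA_SECONDS = 2.0

    # Ordinary least-squares fit of response time against minutes elapsed.
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sum((x - mean_x) ** 2 for x, _ in samples)
    intercept = mean_y - slope * mean_x

    if slope > 0 and intercept < SLA_SECONDS:
        minutes_to_breach = (SLA_SECONDS - intercept) / slope
        print(f"At the current trend, the SLA is breached around minute {minutes_to_breach:.0f}")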

The goal is to get actionable information in enough time to remedy the anomaly and assure reliable, efficient and effective operations. Too often, the reports from a packaged log analysis are barely comprehensible, only partially analyzed, and include no actionable information.

What to do?

What can a mid-market IT person do to resolve this? It is possible to get more integrated and comprehensive analysis and reporting of log data without purchasing an expensive, high-end data analysis tool targeted at large enterprises. Make sure you understand the data available in your logs and the analysis that is done, then identify what is missing from reports, what additional analysis you want and how you want it presented. Finally, prioritize your requirements list.

Look for an integrated log management solution with analysis as well as pre-packaged and custom reporting capabilities built in; this allows reports to be modified and created to yield actionable insights. You want a solution that integrates multiple functions and takes a comprehensive approach to management: automated relationship discovery, log review and analysis, behavior analysis, event correlation, and anomaly and change detection, supported by a strong analytics engine.

Understanding the results is critical, so the solution should include customizable, pre-packaged reports that are understandable and suggest ‘next steps’, using built-in best practices. All this must be easy to use and manage by IT staff, who are not always analytics experts, but who understand what information is useful to them.

There are lots of solutions that perform some of these functions. There are also expensive, sophisticated solutions that provide all of them but require a large, skilled operations staff. The tough part is finding a solution that meets these requirements but is designed and priced for the mid-market.

The SIEM and log management solutions available on the market range widely in the breadth of their capabilities: from sophisticated solutions that require a large, skilled staff able to write scripts to get the desired results, to basic solutions that simply collect logs and let the user search through the data for the information they want, and everything in between. By carefully defining the exact requirements and results your organization needs, you can be sure you have a solution that satisfies those requirements and is in the appropriate price range. The common outcome of a SIEM and log management solution that is too complicated to use, or that doesn’t do what the organization needs, is an expensive, idle device.

Publication Date: October 18, 2011

This document is subject to copyright.   No part of this publication may be reproduced by any method whatsoever without the prior written consent of Ptak Noel & Associates LLC.

To obtain reprint rights contact associates@ptaknoel.com

All trademarks are the property of their respective owners.

While every care has been taken during the preparation of this document to ensure accurate information, the publishers cannot accept responsibility for any errors or omissions.   Hyperlinks included in this paper were available at publication time.

About Ptak, Noel & Associates LLC

We help IT organizations become “solution initiators” in applying IT management technology to business problems. We do that by translating vendor strategy and deliverables into a business context that is communicable and actionable by the IT manager, and by helping our clients understand how other IT organizations are effectively implementing solutions with their business counterparts. Our customers recognize the meaningful breadth and objectivity of our research in IT management technology and process.

Top 10 Pitfalls of Implementing IT Projects

It’s a dirty secret: many IT projects fail, maybe even as many as 30% of all IT projects.

Amazing, given the time, money and mojo spent on them, and the seriously smart people working in IT.

As a vendor, it is painful to see this. We see it from time to time (often helplessly, from the sidelines), we think about it a lot, and we’d like to see it eliminated along with malaria, cancer and other “nasties.”

They fail for a lot of reasons, many of them unrelated to software.

At EventTracker we’ve helped save a number of nearly-failed implementations, and we have noticed some consistency in why they fail.

From the home office in Columbia MD, here are the top 10 reasons IT projects fail:

10. “It has to be perfect”

This is the “if you don’t do it right, don’t do it at all” belief system. With this viewpoint, the project lead person believes that the solution must perfectly fit existing or new business processes. The result is a massive, overly complicated implementation that is extremely expensive. By the time it’s all done, the business environment has changed and an enormous investment is wasted.

Lesson: Value does not mean perfection. Make sure the solution delivers value early and often, and let perfection happen as it may.

9. Doesn’t integrate with other systems

In almost every IT shop, “seamless integration with everything” is the mantra. Vendors tout it, management believes it, and users demand it. In other words, to be all things to all people, an IT project cannot exist in isolation. Integration has become a key component of many IT projects; a solution can’t exist alone anymore.

Lesson: Examine your needs for integration before you start the project. Find out if there are pre-built tools to accomplish this. Plan accordingly if they aren’t.

8. No one is in charge, everyone is in charge

This is the classic “committee” problem. The CIO or IT Manager decides the company needs an IT solution, so they assign the task of getting it done to a group. No one is accountable, no one is in charge. So they deliberate and discuss forever. Nothing gets done, and when something finally does, no one makes sure it gets driven into the organization. Failure is imminent.

Lesson: Make sure someone is accountable in the organization for success. If you are using a contractor, give that contractor enough power to make it happen.

7. The person who championed the IT solution quits, goes on vacation, or loses interest

This is a tough problem to foresee because employees don’t usually broadcast their departure or disinterest before bailing. The bottom line is that if the project lead leaves, the project will suffer. It might kill the project if no one else is up to speed. It’s a risk that should be taken seriously.

Lesson: Make sure that more than one person is involved, and keep an interim project manager shadowing the lead and up to date.

6. Drive-by management

IT projects are often as much about people and processes as they are about technology. If the project doesn’t have consistent management support, it will fail. After all, if no one knows how or why to use the solution, no one will.

Lesson: Make sure you and your team have allocated time to define, test, and use your new solution as it is rolled out.

5. No one thought it through

One day someone realizes, “hey, we need a good solution to address the compliance regulations and these security gaps.” The next day someone starts looking at packages, and a month later you buy one. Then you realize that this solution affects a lot of things, including core systems, routers, applications and operations processes. But you’re way too far down the road on a package and have spent too much money to switch to something else. So you keep investing until you realize you are dumping money down a hole. It’s a bad place to be.

Lesson: Make sure you think it all through before you buy. Get support. Get input. Then take the plunge. You’ll be glad you did.

4. Requirements are not defined

In this all-too-common example, halfway through a complex project someone says “we actually want to rework our processes to fit X.” The project guys look at what they have done, realize it won’t work, and completely redesign the system. It takes 3 months. The project goes over budget. The key stakeholder says “hey, this project is expensive, and we’ve seen nothing of value.” The budget vanishes. The project ends.

Lesson: Make sure you know what you want before you start building it. If you don’t know, build the pieces you do, then build the rest later. Don’t build what you don’t understand.

3. Processes are not defined

This relates to #4 above. Sometimes requirements are defined, but they don’t match good processes, because those processes don’t exist. Or no one follows them. Or they are outdated. Or not well understood. The point is that the solution is computer software: it does exactly what you tell it, the same way every time, and it’s expensive to change. Sloppy processes are impossible to capture cleanly in software, making the solution more of a hindrance than a help.

Lesson: Only implement and automate processes that are well understood and followed. If they are not well understood, implement them in a minimal way and do not automate until they are well understood and followed.

2. People don’t buy in

Any solution with no users is a very lonely piece of software. It’s also a very expensive use of 500 MB on your server. Most IT projects fail simply because no one uses them. They become a giant database of old information and spotty data. That’s a failure.

Lesson: Focus on end user adoption. Buy training. Talk about the value that it brings your customers, your employees, and your shareholders. Make usage a part of your employee review process. Incentivize usage. Make it make sense to use it.

1. Key value is not defined

This is by far the most prevalent problem in implementing IT solutions: Businesses don’t take time to define what they want out of their implementation, so it doesn’t do what they want. This goes further than just defining requirements. It’s about defining what value the new software will deliver for the business. By focusing on the nuts and bolts, the business doesn’t figure out what they want from the system as a whole.

Lesson: Instead of starting with “hey, I need something to accomplish X,” the organization should be asking “how can this software bring value to our security posture, our internal costs and our compliance requirements?”

This list is not exhaustive – there are many more ways to kill your implementation. However, if your organization is aware of the pitfalls listed above, you have a much better chance of success.

A.N. Ananth

Always Enable Auditing – Even for Logs and Systems You Don’t Actively Review

I have two rules of thumb when it comes to audit logging:   first, if it has a log, enable it. Second, if you can collect the log and archive it with your log management/SIEM solution, do it – even if you don’t set up any alert rules or reports.

There is value in these rules; you really never know when audit log data will be valuable in a forensics situation, and there is no way to reconstruct log data after the fact. The most obscure or seemingly unimportant systems and applications may end up being crucial to determining the extent or root vector of an intrusion, and can be critical to building a case during insider threat investigations. Security events that aren’t necessarily monitored may also be valuable in providing documentation that certain actions were performed as part of a compliance-related matter. Standard audit recommendations for Active Directory should include auditing and archiving events related to the disabling and deletion of accounts and group membership removals. Generally these events do not need to be reviewed, but when an audit is underway they can be used to document that access was adjusted in response to changes in employee status and role.

Finally, even if audit logs are not reviewed, they can serve as a powerful deterrent against malicious actions and policy violations by end users and administrators. If people know that their actions are being recorded, they are less likely to commit an illegal action. Audit logs need to be collected and archived to a system outside the control of the people being monitored, to prevent deletion or tampering by administrators with access to the logs.

Common logs should be enabled and added to the log management process for collection and archiving. Obvious candidates include domain controller security logs, but don’t ignore appropriate auditing on member servers; archive their logs securely alongside the DC logs as well. Many of the most critical security events on the network are only caught by the member servers, including attempts to break into member servers with local accounts, security configuration changes, programs executed and files accessed. There is a wealth of security information in workstation security logs too; collecting and archiving that many logs and that much data can be a challenge, but setting the maximum log size to 300MB helps ensure the audit data will be there when it is needed, even if you aren’t centrally managing those logs.
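
For example, here is a minimal Python sketch of that log-size setting, assuming it runs with administrative rights on each workstation (for instance from a startup script or remote management tool); the 300 MB figure simply mirrors the suggestion above:

    import subprocess

    # Raise the Security log's maximum size to roughly 300 MB so audit data
    # survives locally until it is collected (or needed forensically).
    MAX_BYTES = 300 * 1024 * 1024

    # wevtutil sl <logname> /ms:<bytes> sets the maximum log size on Vista and later.
    subprocess.run(["wevtutil", "sl", "Security", f"/ms:{MAX_BYTES}"], check=True)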

Other than the Windows security logs, enable DHCP server logging, because many other security logs list only the IP address and no other information to identify the client. If DHCP logs are enabled, you can look up the IP address, date and time against the lease events to determine the MAC address and computer name of the system that held that particular IP address at the time of the event. If RRAS (Routing and Remote Access Server) is used, be aware that it has client authentication and connection logging capabilities which record important events not sent to the Windows security log. This is also true for IIS (Internet Information Services): IIS can log every incoming request to any hosted web-based application, including URL, verb, result code, client IP address and more. This is critical for tracking down attacks against the website and other web-based applications. More and more companies are beginning to store and process security- and compliance-critical information in SharePoint, and it has an audit log capability. Exchange 2010 has a new non-owner mailbox activity log and a new administrator audit log. VMWare has an audit capability that is critical to ensuring accountability over the virtualization infrastructure. SQL Server 2008 has a new audit log too. Even Microsoft Dynamics CRM 2010 has an audit capability.
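
As an illustration of that DHCP lookup, here is a minimal Python sketch that searches the default DHCP audit log location for lease events involving a given IP address; the directory path and the comma-separated field order (ID, Date, Time, Description, IP Address, Host Name, MAC Address) are the common defaults and should be verified against your own DhcpSrvLog files:

    import csv
    from pathlib import Path

    DHCP_LOG_DIR = Path(r"C:\Windows\System32\dhcp")

    def find_lease_events(ip_address):
        """Return DHCP audit log rows that mention the given IP address."""
        matches = []
        for log_file in DHCP_LOG_DIR.glob("DhcpSrvLog-*.log"):
            with open(log_file, newline="", errors="ignore") as f:
                for row in csv.reader(f):
                    # Header and banner lines won't have the IP in column 5, so they are skipped.
                    if len(row) >= 7 and row[4].strip() == ip_address:
                        matches.append({"date": row[1], "time": row[2], "event": row[3],
                                        "host": row[5], "mac": row[6]})
        return matches

    # Example: which machine held 10.1.2.34, and when?
    for m in find_lease_events("10.1.2.34"):
        print(m)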

It seems like every software vendor is seeing the need to provide audit trail capability, which is good news for information security, but it also means that these new audit features need to be turned on. There is good reason to enable auditing even if the logs aren’t being regularly reviewed or monitored. If at all possible, get these logs out of the systems and applications where they are generated and into a separate, protected log management/SIEM solution so that the integrity of these audit trails is ensured.

For Immediate Release

Prism Microsystems Unveils EventTracker DriveShield for Preventing “WikiLeaks” and Enhancing Monitoring of USB and Writable Media

EventTracker DriveShield monitors USB, CD/DVD-W

Columbia, MD, August 30, 2011 — Prism Microsystems, a leading provider of comprehensive security and compliance software for the US Department of Defense (DoD) and US Federal Government agencies, today announced the release of EventTracker DriveShield, an easy-to-deploy solution designed to provide visibility to files copied to USB devices or burned to CD/DVD-W drives.

Designed in response to “WikiLeaks” and similar data breach incidents and the processes implemented afterwards, EventTracker DriveShield protects important information by monitoring users with access to sensitive data, and by alerting on, preventing and reporting improper transfers to writable media.

EventTracker DriveShield monitors USB devices and writable media that are inserted or removed, as well as change activities for these devices, including any file adds, modifications, deletions or copies that are made, recording the time and date, location, and user name.  This information is then available in real-time alerts, and in dashboards and summary or detailed reporting from the console.  EventTracker DriveShield can also disable USB devices based on white-listing their serial numbers.

This solution enables the DoD and Federal civilian agencies to utilize the convenience, portability and storage capacity of removable media, while complying with reporting, security and compliance requirements.

“What good is technology if you can’t use it safely and efficiently?  Many government organizations have banned the use of these types of portable storage technologies and are going back to paper to print and distribute manuals and training materials. This is unfortunate, costly and inefficient. EventTracker DriveShield allows the safe use of writable media such as USB and CD/DVD while actually improving security,” said Prism Microsystems CEO A.N. Ananth.

EventTracker DriveShield, available through GSA resellers, holds the following certifications:

Common Criteria EAL-2

US Army Certificate of Networthiness

FDCC SCAP

FIPS 140-2

About Prism Microsystems

Prism Microsystems delivers business critical solutions that transform high-volume cryptic log data into actionable, prioritized intelligence that will fundamentally change your perception of the utility, value and organizational potential inherent in log files. Prism’s leading solutions offer Security Information and Event Management (SIEM), real-time Log Management, and powerful Change and Configuration Management to optimize IT operations, detect and deter costly security breaches, and comply with multiple regulatory mandates. Visit www.eventtracker.com for more information.

Press Inquiries:
Joanne Hogue
(410) 658-8246
joannePR@prismmicrosys.com

Why are Workstation Security Logs so Important?

No one needs to be convinced that monitoring Domain Controller security logs is important, and member servers are equally important: most people understand that member servers are where “our data” is located. But I often face an uphill battle helping people understand why workstation security logs are so critical.

Frequently I hear IT administrators tell me they have policies that forbid storing confidential information locally. But the truth is, workstations and laptops always have sensitive information on them – there’s no way to prevent it. Besides applications like Outlook, Offline Files and SharePoint Workspace that cache server information locally, there’s also the page file, which can contain content from any document or other information at any time.

But even if there were no confidential information on workstations, their security logs would still be very important for an enterprise audit trail, for forensics and for detecting advanced persistent threats. There’s a wealth of audit trail information logged by workstations that can’t be found anywhere else, and consequently a host of questions that can only be answered by workstation logs.

First of all, if you care about when a user logged off, this information can only be found in the workstation’s security log. Domain controllers audit the initial authentication during logon to a workstation but essentially forget about the user thereafter. Logoff times cannot be determined based on shared folder connections because Windows does not keep network logons open to file servers between file accesses. So, during logout from Windows, the only computer on the network that logs this is the workstation.

And what about logon failures? Yes, domain controllers do log authentication failures, but the events logged are tied to Kerberos and the error codes are based on RFC 1510’s failure codes. Kerberos failure codes are not as granular and do not map directly to all the reasons a logon can fail in Windows. Therefore some authentication failure codes provided in event IDs 4768 and 4771 can mean any one of several possible reasons. For instance, failure code 0x12, which Kerberos defines as “Client’s credentials have been revoked,” can mean that the logon failed due to the account being disabled, locked out or outside of authorized logon hours.
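
When reviewing 4768/4771 events, a small lookup table of the most common Kerberos failure codes saves a trip to the RFC. A minimal Python sketch follows; the plain-language descriptions are my own paraphrases of the standard code meanings:

    # Common Kerberos failure codes seen in Windows events 4768/4771.
    KERBEROS_FAILURE_CODES = {
        "0x6":  "KDC_ERR_C_PRINCIPAL_UNKNOWN - the user name does not exist",
        "0x12": "KDC_ERR_CLIENT_REVOKED - account disabled, locked out, or outside logon hours",
        "0x17": "KDC_ERR_KEY_EXPIRED - the password has expired",
        "0x18": "KDC_ERR_PREAUTH_FAILED - bad password",
        "0x25": "KRB_AP_ERR_SKEW - clock skew between client and KDC is too great",
    }

    def describe_failure(code):
        """Translate a Kerberos failure code string (e.g. '0x12') into a readable hint."""
        return KERBEROS_FAILURE_CODES.get(code.lower(), f"unmapped failure code {code}")

    print(describe_failure("0x18"))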

With today’s focus by bad guys on the endpoint, it’s also important to know if someone is trying to break into a workstation. If the attacker is breaking into the workstation with a domain account, the evidence can be found in the domain controller security logs by looking for the same two events mentioned above. However, workstations also have local accounts, and these are big targets for attackers since local accounts are often poorly secured and tend to fly under the radar in terms of security monitoring. When someone attempts to log on to a workstation with a local account, the activity is not logged on the domain controller. Since the authentication is handled locally, the event is logged locally in the form of event ID 4776. The event description can be confusing; it reads: “The domain controller failed to validate the credentials for an account.” What it should actually say is “the system” instead of “domain controller,” because this event is logged on all types of computers.

Files that a user accesses on file servers can be tracked with file system auditing on those servers, but the audit trail of which programs the user executed is logged only in workstation security logs. During forensic investigations I’ve found that knowing what programs a user ran, and for how long, can be crucial to documenting what actually occurred. Furthermore, many of the advanced persistent threat (APT) attacks being waged depend on malicious executables run on end-user workstations. You can’t afford not to have a record of what’s running on your endpoints. Unfortunately, the Process Tracking (aka Detailed Tracking) category only logs programs run on the local computer, so no information about end-user desktop program usage is available in domain controller or member server security logs. The only process events logged on servers are the actual server programs that execute there.

There’s a lot of critical audit trail information available only if auditing is enabled on the workstations. Of course, enabling auditing on workstations is one thing; collecting logs from thousands of additional computers is another. There are also some very important workstation security events which Windows auditing does not record. For instance, Windows does not audit when devices or removable storage like flash drives are connected or disconnected, it does not record what files are transferred to or from the removable storage, and it does not audit the installation of software.

Workstations are really just as important as any other component of a secure network. If an attacker can compromise the workstation of a user with access to critical information, the attacker can impersonate that user and access any information or applications that user can reach on the network. Even workstations of users without access to sensitive resources are important, because attackers, especially in APT scenarios, are happy to start with any endpoint as a beachhead and attack other systems from there. Moreover, workstations are arguably the most vulnerable components of your network: they process so much content from the Internet through web browsing and email, they come into contact with potentially infected files on removable storage, and they connect to other insecure networks like Wi-Fi hotspots.

In a webinar I will present later this year in cooperation with Prism Microsystems I’ll delve more deeply into these issues and how to address them. It’s important that we educate decision makers about why endpoint security and audit logs from endpoints are so important. We have to get beyond the mainframe inspired mindset that security only matters on the centralized systems where critical data resides. Be sure to register for this event and invite your manager.

How do retailers follow PCI DSS Compliance?

Security and Compliance At Talbots

Talbots is a leading multi-channel retailer and direct marketer of women’s apparel, shoes and accessories, based in Tampa, Florida. Talbots is well known for its stellar reputation in classic fashion. Everyone knows to look to Talbots when it is time to buy the perfect jacket or a timeless skirt.

Talbots customers are women in the 35+ demographic who shop at its 568 stores in 47 states, through its catalogs and online at www.talbots.com. Approximate sales for Talbots in 2010 were $991 million.

With its multi-pronged approach to reaching its customers, Talbots must be constantly vigilant in maintaining its records, having access to reports at any time and making sure it is up to date with its Payment Card Industry Data Security Standard (PCI DSS) compliance. PCI DSS is a set of requirements designed to ensure that companies that process, store or transmit credit card information maintain a secure environment for that information throughout the transaction process. It was the pursuit of the highest level of PCI DSS compliance that brought EventTracker to Talbots’ attention.

For the full Talbots case study, click here.

The Key Difference between “Account Logon” and “Logon/Logoff” Events in the Windows Security Log

An area of audit logging that is often confusing is the difference between two categories in the Windows security log: Account Logon events and Logon/Logoff events.  These two categories are related but distinct, and the similarity in the naming convention contributes to the confusion. So what is the difference between authentication and logon?  In Windows, when you access the computer in front of you or any other Windows computer on the network, you must first authenticate and obtain a logon session on that computer. A logon session has a beginning and an end. An Account Logon event is simply an authentication event, and it is a point-in-time event.  Are authentication events a duplicate of logon events?  No, because authentication may take place on a different computer than the one into which you are logging on.

Workstation Logons

Let’s start with the simplest case.  You are logging on at the console (aka “interactive logon”) of a standalone workstation (meaning it is not a member of any domain).  The only type of account you can log on with in this case is a local user account defined in Computer Management \ Local Users and Groups.  You don’t hear the term much anymore, but local accounts and SAM accounts are the same thing.  In this case both the authentication and the logon occur on the very same computer because you logged on to the local computer using a local account.  Therefore you will see both an Account Logon event (680/4776 [1]) and a Logon/Logoff event (528/4624) in its security log.

If the workstation is a member of a domain, it’s possible to authenticate to this computer using a local account, a domain account, or a domain account from any domain that this domain trusts. When the user logs on with a domain account, the local workstation can’t perform the authentication because the account and its password hash aren’t stored locally.  So the workstation must request authentication from a domain controller via Kerberos.  An authentication event (672/4768) is logged on whichever domain controller handles the authentication request from the workstation.  Once the domain controller tells the workstation that the user is authenticated, the workstation proceeds with creating the logon session and records a logon event (528/4624) in its security log.

What if we logon to the workstation with an account from a trusted domain?  In that case one of the domain controllers in the trusted domain will handle the authentication and log 672/4768 there, with the workstation logging 528/4624 the same as above.

In all such “interactive logons”, during logoff the workstation will record a “logoff initiated” event (551/4647) followed by the actual logoff event (538/4634).  You can correlate logon and logoff events by Logon ID, a hexadecimal code that identifies that particular logon session.
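
As an illustration, here is a minimal Python sketch of that correlation, assuming the relevant events have already been exported from the security log and flattened into dictionaries (the field names follow the Windows 2008 event schema; the export step itself is left out):

    def pair_logon_sessions(events):
        """Pair 4624 logon events with their 4647/4634 logoff events by Logon ID."""
        sessions = {}
        for ev in sorted(events, key=lambda e: e["TimeCreated"]):
            logon_id = ev.get("TargetLogonId")
            if ev["EventID"] == 4624:
                sessions[logon_id] = {"user": ev.get("TargetUserName"),
                                      "logon_type": ev.get("LogonType"),
                                      "start": ev["TimeCreated"],
                                      "end": None}
            elif ev["EventID"] in (4647, 4634) and logon_id in sessions:
                sessions[logon_id]["end"] = ev["TimeCreated"]
        return sessions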

Accessing Member Servers

After logging on to a workstation you can typically re-connect to shared folders on a file server.  What gets logged in this case?  Remember, whenever you access a Windows computer you must obtain a logon session – in this case a “network logon” session.  You might assume that the logon session begins when you connect to the share and ends when you disconnect from it – usually when logging off your local workstation.  Unfortunately this is not the case: Windows servers only keep network logon sessions alive for as long as you have a file open on the server.  This accounts for the repeated logon/logoff events on Windows file servers by the same user throughout the course of the day.  With network logons, Windows 2003 logs 540 instead of 528, while Windows 2008 logs 4624 for all types of logons.

When you logon at the console of the server the events logged are the same as those with interactive logons at the workstation as described above.  More often though, you logon to a member server via Remote Desktop.  In this case the same 528/4624 event is logged but the logon type indicates a “remote interactive” (aka Remote Desktop) logon.  I’ll explain logon types next.

When looking at logon events we need to consider what type of logon we are dealing with: is this an interactive logon at the console of the server, indicating the user was physically present, or is it a remote desktop logon?  For that matter, the logon could be associated with a service starting or a scheduled task kicking off.  In all such cases you will need to look at the Logon Type specified in the logon event 528/540/4624.  A full list of Logon Types is provided at the links for those events, but in short:

Logon Type 2 – Interactive (logon at the keyboard and screen of the system)

Logon Type 3 – Network (i.e. connection to a shared folder on this computer from elsewhere on the network)

Logon Type 4 – Batch (i.e. scheduled task)

Logon Type 5 – Service (service startup)

Logon Type 10 – RemoteInteractive (Terminal Services, Remote Desktop or Remote Assistance)

Events at the Domain Controller

When you log on to your workstation or access a shared folder on a file server, you are not “logging onto the domain”. Each Windows computer is responsible for maintaining its own set of active logon sessions, and there is no central entity aware of everyone who is logged on somewhere in the domain.  After servicing an authentication request, the domain controller doesn’t retain information about how you logged on (console, remote desktop, network, etc.) or when you logged off.

On domain controllers you often see one or more logon/logoff pairs immediately following authentication events for the same user.  These logon/logoff events are generated by the group policy client on the local computer retrieving the applicable group policy objects from the domain controller so that policy can be applied for that user.  Then, approximately every 90 minutes, Windows refreshes group policy and you see a network logon and logoff on the domain controller again.  These network logon/logoff events are little more than noise.  In forensic situations they provide an estimate of how long the user was logged on (as long as the user remains logged on, group policy will refresh about every 90 minutes), and they can help you infer that the preceding authentication events for the same user were in conjunction with an interactive or remote desktop logon, as opposed to a service or scheduled task logon.

What about the other service ticket related events seen on the domain controller? Basically, after your initial authentication to the domain controller, which logs 672/4768, you also obtain a service ticket (673/4769) for every computer you log on to, including your workstation, the domain controller itself (for the purpose of group policy) and any member servers, such as in connection with shared folder access.  Then, as computers remain up and running and users remain logged on, tickets expire and have to be renewed, all of which generates further Account Logon events on the domain controller.

The Facts: Good, Bad and Ugly

Both the Account Logon and Logon/Logoff categories provide needed information and are not fungible:  both are distinct and necessary.  Here are some important facts to understand, and accept about authentication and logon/logoff events.

  1. To determine definitively how a user logged on, you have to find the logon event on the computer where the account logged on.  You can only make some tenuous inferences about logon type by looking at the domain controller, and that requires analyzing multiple events.
  2. To determine when a user logged off you have to go to the workstation and find the “user initiated logoff” event (551/4647).
  3. To correlate authentication events on a domain controller with the corresponding logon events on a workstation or member server, there is no “hard” correlation code shared between the events.  Folks at Microsoft have suggested the Logon GUID field in these events would provide that, but my research and experiments indicate that, unfortunately, the GUIDs are either not supplied or do not match.  So to make that correlation you basically have to dead-reckon based on time, computer names and user account names.
  4. Account Logon events on domain controllers are great because they allow you to see all authentication activity (successful or failed) for all domain accounts.  Remember that you need to analyze the security logs of all your domain controllers – security logs are not replicated between DCs.
  5. Account Logon events on workstations and member servers are great because they allow you to easily pick out use of, or attacks against, local accounts on those computers.  You should be interested in that because using local accounts is bad practice, and bad guys know they tend to be more vulnerable than domain accounts.  But you don’t have to use Account Logon to detect logon attempts on local accounts; you can use Logon/Logoff events if you know what you are doing.  When viewing a Logon/Logoff event, compare the domain name in the event details to the computer name that generated the event; if they match, you are looking at a local account logon attempt – otherwise the domain name field will reflect some domain (see the sketch after this list).  So can you survive with only enabling Logon/Logoff events on member servers and workstations?  I suppose so.
  6. Logon/Logoff events are a huge source of noise on domain controllers because every computer and every user must frequently refresh group policy.  If you disable this category on domain controllers, what will you lose?  You will lose some visibility into logons at the domain controller itself, such as when an admin logs on at the console or via remote desktop, or when a service or scheduled task starts up.  In all cases Account Logon events will still be logged, but see points 1 and 2 above.
  7. Successful network logon and logoff events are little more than “noise” on domain controllers and member servers because of the amount of information logged and tracked.  Unfortunately you can’t just disable successful network logon/logoff events without also losing the other logon/logoff events for interactive, remote desktop and so on.  Noise can’t be configured out of the Windows security log; that’s the job of your log management / SIEM solution.
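
As a rough illustration of point 5, here is a minimal Python sketch, again assuming 4624/4625 events have been flattened into dictionaries that carry the standard event fields plus the name of the computer that logged them:

    def is_local_account_logon(event):
        """True when the account domain matches the logging computer, indicating a local (SAM) account."""
        domain = (event.get("TargetDomainName") or "").upper()
        computer = (event.get("Computer") or "").split(".")[0].upper()
        return bool(domain) and domain == computer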

All things considered, I’d ideally like to see both categories – Account Logon (i.e. authentication) and Logon/Logoff – enabled on all computers.  I haven’t seen these events create a noticeable impact on the server, but the amount of log data might exceed your log management / SIEM solution’s current capacity.  If you can’t afford to collect workstation logs, I still suggest enabling these two categories on workstations and letting the log automatically wrap after reaching 100MB or so.  Chances are the data will be there if you need it for forensic purposes.

[1] In this article, the three digit Windows 2003 event is followed by the four digit Windows 2008 event.

The View from the Trenches

Noticed the raft of headlines about break-ins at companies? If you did, that is the proverbial tip of the iceberg.

Why?

Think about the hammering that Sony took on the Playstation hack or how RSA will never live down the loss of golden keys and the subsequent attack at Lockheed.

Victims overwhelmingly prefer to keep quiet. If there is disclosure, it’s because there was a loss of consumer information, which is subject to breach notification laws. If corporate information is stolen, disclosure is often not required.

A survey by the Ponemon Institute, sponsored by Juniper, of 581 security professionals at large companies in the United States, Britain, France and Germany found that 90 percent of them had at least one breach in the last year and 59 percent had two or more. And the costs are mounting: 41 percent of break-ins cost more than half a million dollars.

What is interesting, though, is the variation in perception between those in the trenches, who think the organization is under-equipped to cope with the onslaught, and senior executives, who think that adequate resources are in place.

This study describes the situation at federal agencies such as DHS, DOD and HHS. Whereas 64% of the rank-and-file recognized the importance of log management, only 45% of senior executives shared this view.

These are important findings because they show differences between the people who are determining the priorities and direction for their organization and those who are in the trenches and seeing the risks first-hand.

The magnitude of the security threat is much greater than many realize.

Virtualization Security: What are the Real-World Risks?

There’s been a lot of recent hype about security risks that come with the rise of virtualization, but much of it is vague and short on specifics.  There is also an assumption that all the security available on a physical server simply disappears when it migrates to being a virtual machine.  This is not true.  A virtual server is the same server it was before it was P2V’d from a physical server.  Its authentication, access control, audit and network controls remain as active as before.  The virtual server sits on hosts and SANs in the same datacenter as the physical server did.  So what has changed?  What are the new risks?

The risks are in the virtualization infrastructure layer…

Some questions to consider: how secure are the host, the storage and the host control server? How secure are the ESX/ESXi hosts, the SANs and the vCenter servers?  Those are the real concerns.  Now that a new layer has been inserted between the guest operating system and the hardware, that layer’s immediate components and the other components upon which it depends (such as the Active Directory forest to which the vCenter servers belong) need the same security controls. Virtualization security is even more critical in some respects, because low-level access is equivalent to physical access to every guest server and its data, and may compromise the entire system.

Recent audits show that some areas of security and control over virtualization components can be immature, which is a concern given how critical the virtualization layer is to security.

Much is made of the network security risks associated with virtualization, but this concern may be unfounded.  Most servers are not behind internal firewalls or on heavily restricted network segments in the first place, so moving them to an ESX/i host on a virtual switch doesn’t expose the server to new network risks. Physical servers that do have such controls can be set up exactly the same way with virtual switches and firewalls. There are more advanced attack scenarios in which an attacker on a compromised VM breaks out of that VM, into the host and possibly back up into other VMs, but at this point most security teams are vigilant enough to deter such an attack.

The other area of network security is the protection of the virtualization infrastructure itself

Unless IT shops totally ignore virtualization best practices, they will implement multiple network cards on ESX/i hosts that allow for completely separate guest, live migration (aka vMotion), storage and management traffic (the connections from VMWare vCenter and clients to the host).  In audits I’ve performed, the SANs and the management interfaces on hosts are isolated from the rest of the organization’s internal LAN.

There are areas of virtualization where serious risks do exist: privileged access, Active Directory and auditing.  Allowing administrators (especially multiple administrators!) to use the built-in root or admin account on an operating system is bad practice and risky; everyone needs his or her own account.  The principle is emphatically true for virtualization hosts.  However, prior audits reveal that many hosts are managed with the built-in root account, shared among multiple administrators.  With a virtualization host, the risk of shared and insecure root accounts is multiplied by the number and criticality of all the guest VMs on that host.  The best practice is to lock down ESX/i hosts so that even admins don’t directly access them and are required to go through the central management server (called vCenter in the case of the VMware environment).  vCenter doesn’t share the same prevalence of insecure root access because it integrates with Active Directory and allows an organization to leverage the AD accounts admins already possess.

Another prevalent risk arises from the virtualization infrastructure’s dependence on Active Directory

Directory integration and unified authentication are definitely the way to go, but there are risk factors to consider as well.  First, virtualization management servers like vCenter tend to be members of the main AD.  A Windows server belonging to a domain is exposed to any and all risks in Active Directory and on all domain controllers within that forest (remember, the security boundary in AD is the forest, not the domain).  A vCenter server is the “boss” of all the ESX/i hosts connected to it.  Thus, everyone with domain admin authority anywhere in the AD forest, and anyone who compromises a domain controller in the AD forest, can take over the vCenter server and compromise the virtualization infrastructure (and ultimately any guest VM and its data).  Any outstanding risks from previous AD audits must be carried forward to the virtualization infrastructure audit too.

For example, at one financial institution an excessive number of IT folks had domain admin authority in the main AD forest.  That was problem enough as a risk to AD and the Windows systems within that forest.  But with the virtualization management server as a member of that forest, every virtual machine – even those running Linux, and Windows servers in other forests – became accessible to that same excessively large group of AD admins.  Worse, in this organization remote access was widely available with no strong authentication, so the entire virtualization infrastructure and countless servers were vulnerable to compromise by a successful password-based attack against any one of many AD admins.

The solution?  First, think about the AD forest(s) that hold either your virtualization management servers (e.g. vCenter) or the user accounts with privileged access to the virtualization infrastructure (i.e. users with the Administrator role in vCenter).  Those forests, including each domain and domain controller within them, must be locked down and secured to a level appropriate for the virtualization infrastructure itself. Organizations with outstanding AD security issues and no resolution in sight should really look at implementing a small, separate AD forest to provide directory and authentication services to their infrastructure, including virtualization, storage and network components.  This small AD forest would be much more locked down and protected, and careful thought should be given before implementing synchronization or trust relationships between it and other forests.  This may come at the price of maintaining additional user accounts for infrastructure admins, but that is the price of security in this instance.  If trust is implemented, it should be such that the infrastructure forest is trusted by the other forests, not vice versa.  If synchronization is implemented, password changes or other authentication data should not flow from other forests or directories into the infrastructure forest.

The final risk area in newly virtualized organizations is a lack of auditing and log management for virtualization infrastructure components

Virtualization management servers (e.g. vCenter) and hosts (e.g. ESX/i) can generate audit logs.  It is crucial to enable this feature and subsequently collect, archive, alert and report on this log data, the same as is necessary with any other security-critical component on your network.  Virtualization hosts like ESX/i are simple to accommodate since they can send events via syslog, but management servers like vCenter are more problematic.  vCenter creates a number of text log files named vpxd-1 through 9, but my research has shown that they omit very important data and fail to resolve other key ID codes.  This is not to say the audit trails aren’t there; they are, but they are trapped inside database tables.  In the case of vCenter, the audit trail is stored in the VPX_EVENT and VPX_EVENT_ARG tables within the vCenter SQL database.  Incomprehensibly, command line interfaces like Get-VIEvent that pull data from these tables leave out critical event arguments as well as the name or ID of the event itself.  So the final option seems to be direct query of the SQL tables themselves, with the necessary resolution of foreign keys to rows in related tables.  This presents an opportunity for log management and SIEM vendors to distinguish themselves with enhanced support for collecting enriched audit trails from virtualization infrastructures.

Are there risks in the typical virtualized data center?

Absolutely.  But it’s important to identify the real risks.  In most environments the risk is less among the virtual machines and more with the basic security controls of the infrastructure itself, as well as the risks resulting from poorly understood security dependencies between the virtualization infrastructure and the directory used for identity and authentication.  Make sure virtualization infrastructure components are properly isolated.  Follow the same best practices for securing root access on hosts as we’ve had to apply to normal servers for decades.  And include audit trails from the virtualization infrastructure in all log management efforts.

Automating Review and Response to Security Events

The next significant horizon in audit log management will be the automation of the review and response tasks associated with security events.  Currently, log management and SIEM solutions are expected to scour logs, identify high-impact changes or other suspicious activity, and simply send out an alert.  It then requires the intervention of a person to assess the information, make inquiries, research and review data, and ultimately resolve the matter.

In this article I’ll present several automation opportunities.  In addition to some simple integration with Active Directory or other LDAP services, these scenarios will require a degree of scripting and workflow capabilities such as those provided by SharePoint and similar technologies.

Verifying New User Accounts

The appearance of a new user account is an important event for the security of a system because it signifies a means of gaining entry into that system’s resources.  New accounts need to be reviewed to prevent the unnoticed creation of backdoor accounts by intruders, and to reduce the creation of insecure accounts outside the change management and security controls of the organization.  However, wading through new user account notifications can be a laborious process for the information security team; it may be difficult to differentiate valid accounts from inappropriate accounts – especially at large organizations where employee turnover is high.

In this instance, automation can be an invaluable solution. When confronted with a new user account (e.g. event ID 4720 in Windows Server 2008 Active Directory), the log management solution can compare the new account’s name with the organization’s naming convention standards.  At most organizations, naming conventions distinguish between human user accounts and system accounts created for services, applications and batch processes.  Often there are additional naming standards to distinguish between employees, contractors and privileged admin accounts.

For example, a company may prefix employee accounts with an “e”, contractors with a “c”, admins with a “p”, system accounts with an “s”, and so on.  When the automated response rule encounters a new user account that does not match this convention, it recognizes the non-compliant account and opens a trouble ticket for enforcing security standards. Conversely, an account with the proper naming convention is sent to the employee’s manager for confirmation. There is great value in matching a user account to an employment record in the Human Resources system, so that the automated response system can look up and verify the employee.  Lacking that verification, an exception ticket is generated and investigated.  By verifying all accounts, it becomes much more difficult to purposefully sidestep security controls.
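
A minimal Python sketch of that classification step, using the example prefixes above; the regular expressions and the ticket/confirmation helpers are illustrative assumptions rather than features of any particular product:

    import re

    # Hypothetical prefixes taken from the example above:
    # "e" = employee, "c" = contractor, "p" = privileged admin, "s" = system/service
    NAMING_RULES = {
        "employee":   re.compile(r"^e[a-z0-9.\-]+$"),
        "contractor": re.compile(r"^c[a-z0-9.\-]+$"),
        "admin":      re.compile(r"^p[a-z0-9.\-]+$"),
        "system":     re.compile(r"^s[a-z0-9.\-]+$"),
    }

    def classify_account(sam_account_name):
        """Return the category implied by the naming convention, or None if non-compliant."""
        name = sam_account_name.lower()
        for category, pattern in NAMING_RULES.items():
            if pattern.match(name):
                return category
        return None

    def route_new_account(sam_account_name, open_ticket, request_manager_confirmation):
        """Open a ticket for non-compliant names; otherwise ask the manager to confirm the account."""
        category = classify_account(sam_account_name)
        if category is None:
            open_ticket(f"Non-compliant account name: {sam_account_name}")
        else:
            request_manager_confirmation(sam_account_name, category)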

Verifying Group Membership Additions

When a member is added to a group, new access is granted, and such occurrences are excellent opportunities for automated response.  The crucial element in automating the response to new group members is for the system to be capable of identifying the data “owner” responsible for approving entitlements to the data to which that group grants access.  Assuming the repository of groups is Active Directory, the obvious attribute is the Managed By property, which allows a user account to be specified.  By populating each group’s Managed By attribute with the appropriate data owner, you give the automated response system a way to determine whom to contact when new members are added.  Data owners confirm and approve the new entitlement.  A positive confirmation closes the event and documents that re-verification has been performed.  A negative response opens a ticket with the information security group to investigate all relevant information (including the user listed in the audit log who executed the group member addition).  These are the only circumstances that require manual intervention by the information security team, yet every group member addition is confirmed.
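
As a sketch of the lookup half of that workflow, here is how the Managed By attribute might be read from Active Directory using the open-source ldap3 library; the server name, service credentials, group name and base DN below are placeholders for illustration only:

    from ldap3 import Server, Connection, NTLM

    server = Server("dc01.example.com")
    conn = Connection(server, user="EXAMPLE\\svc-logmgmt", password="********",
                      authentication=NTLM, auto_bind=True)

    def data_owner_for_group(group_name, base_dn="dc=example,dc=com"):
        """Return the DN stored in the group's Managed By attribute, or None if no owner is recorded."""
        conn.search(base_dn, f"(&(objectClass=group)(cn={group_name}))",
                    attributes=["managedBy"])
        if not conn.entries:
            return None
        owner = conn.entries[0].entry_attributes_as_dict.get("managedBy", [])
        return owner[0] if owner else None

    # No owner recorded? Route the addition straight to the information security queue.
    print(data_owner_for_group("Finance-ReadOnly"))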

Following Up On Suspicious Logon Activity

Resolving failed logons caused by a bad password can be one of the most difficult tasks to execute, since they are often caused by simple user error.  These events also present an opportunity for automated response.  When an information security analyst has reason to suspect an attack on a given account, the typical response is to contact the user to determine whether they inadvertently created the audit events.  An automated response system, upon observing failed logons, can perform actions similar to the process described above for group member additions.  Thresholds can be set up to prevent users from being targeted by incessant inquiries each time they mis-enter their password. Since email access may be controlled by the very account under attack, it may be better to use an “out of band” communication method such as sending a text message to the user’s phone (again illustrating the importance of being able to leverage information from the organization’s directory when responding to events).
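
A minimal sketch of that thresholding idea; the threshold, time window and out-of-band notification helper are all illustrative assumptions:

    from collections import defaultdict, deque
    from datetime import timedelta

    THRESHOLD = 5                   # this many failed logons ...
    WINDOW = timedelta(minutes=10)  # ... within this window trigger an inquiry
    recent_failures = defaultdict(deque)

    def notify_out_of_band(account):
        # Placeholder: send an SMS (or similar) via whatever gateway the organization uses.
        print(f"Texting owner of {account}: did you just mistype your password several times?")

    def record_failed_logon(account, timestamp):
        """Track failed logons per account and notify only when the threshold is crossed."""
        q = recent_failures[account]
        q.append(timestamp)
        while q and timestamp - q[0] > WINDOW:
            q.popleft()
        if len(q) >= THRESHOLD:
            notify_out_of_band(account)
            q.clear()  # avoid repeated inquiries for the same burst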

IP geolocation can also be leveraged to identify suspicious logon attempts.  When the system observes a logon attempt (successful or failed) from a country or region outside the defined normal area for the organization, or for an individual user (if physical location data is stored with the user’s directory information), an automated message can be sent to the user requesting confirmation of the activity.
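
A sketch of such a geolocation check, assuming a local copy of MaxMind’s GeoLite2 country database accessed through the geoip2 package; the database path and the allowed-country list are assumptions for illustration:

    import geoip2.database
    import geoip2.errors

    ALLOWED_COUNTRIES = {"US", "CA"}
    reader = geoip2.database.Reader("/var/lib/geoip/GeoLite2-Country.mmdb")

    def is_suspicious_logon(source_ip):
        """True when the source IP resolves outside the organization's normal countries (or not at all)."""
        try:
            country = reader.country(source_ip).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            return True  # unknown source: treat as suspicious
        return country not in ALLOWED_COUNTRIES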

There are multiple advantages to response automation for security events.  The ability to script such automated responses, plus access to the directory service, HR data, IT ticketing and some type of workflow system, are important requirements, but with careful analysis organizations can identify the operations where automation matters most.  Through such integration and automation, vigilance can be increased while eliminating much of the manual effort required to follow up on the numerous security events generated daily.

Five Reasons for Log Apathy – and the Antidote

How many times have you heard people just don’t care about logs? That IT guys are selfish, stupid or lazy? That they would rather play with new toys than do serious work?
I argue that IT guys are amazing, smart and do care about the systems they curate, but native tools make log management feel like running into a brick wall; they encourage disengagement.

Here are five reasons for this perception and what can be done about them.

#1 Obscure descriptions: Ever see a raw log? A Cisco intrusion alert, a Windows failed object access attempt or a Solaris BSM record for mounting a volume? Blech… it’s a description even the author would find hard to love. It isn’t written to be easy to understand; rather, its purpose is either debugging by the developer or satisfying certification requirements. This is not apathy, it’s intentional exclusion.

To make this relevant, you need a description which highlights the elements of value, enriches the information (e.g., looks up an IP address or event id) and, rather than just spewing events in time sequence, presents information in priority order of risk.

#2 Lack of access: What easier way to spur disengagement than by hiding the logs away in an obscure part of the file system, out of sight to all but the most determined? If they cannot see it, they won’t care about it.

The antidote is to centralize logging and throw up an easy-to-understand display which presents relevant information – preferably risk ordered.

#3 Unsexiness:  All the security stories are about WikiLeaks and credit card theft. Log review is considered dull and boring; it’s a rare occurrence for it to make the plot line of Hawaii Five-O.

Compare it to working out at the gym: it can be boring and there are ten reasons why other things are more “fun,” but it’s good for you and pays handsomely in the long run.

#4 Unsung Heroes: Who is the Big Man on your Campus? Odds are, it’s the guys who make money for the enterprise (think sales guys or CEOs).

Rarely is it the folks who keep the railroad running or, god forbid, reduce cost or prevent incidents.

However, they are the wind beneath the wings of the enterprise. The organization that recognizes and values the people who show up for work every day and do their job without fuss or drama is much more likely to succeed. Heroes are the ones who make a voluntary effort over a long period of time to accomplish serious goals, not chosen ones with marks on their foreheads, destined from birth to save the day.

#5 Forced Compliance: As long as management looks at regulatory compliance as unwarranted interference, it will be resented, and IT is forced into a checkbox mentality that benefits nobody.

It’s the old question: “What comes first, compliance (the chicken) or security (the egg)?” We see compliance as a result of secure practices. By making it easy to crunch the data and present meaningful scores and alerts, there is less need to force the issue.

I’ll say it again: I know many IT guys and gals who are amazing, smart and care deeply about the systems they manage. To combat log apathy, make the logs easier to deal with.

Tip of the hat to Dave Meslin, whose recent TEDx talk in Toronto spurred this blog entry.

A.N. Ananth

Security Logging as a Detective and Deterrent Control Against Rogue Admins

Intrusion detection and compliance are the focus of log management, SIEM and security logging.  But security logs, when managed correctly, are also the only control over rogue admins.  Once root or admin authority has been given to, or acquired by, a user, there is little they cannot do:  with admin authority, they can circumvent access or authorization controls by changing settings or by using tools that leverage their root access to tamper with the internals of the operating system.

Audit logs, when properly managed, can serve as a control and deterrent against the privileged super user’s authority. Simply enabling auditing and deploying a log management solution may not suffice; to really be a deterrent, the audit log must be protected from deletion or tampering by rogue admins.

First and foremost, log data must be moved as frequently as possible from the system where it is generated to a separate, secure log repository.  Today’s enterprise log management solutions do a great job of frequent log collection and long-term archiving.  However, who has privileged access to the log management solution and the systems on which it runs?

A log management process is not an effective control if administrators have privileged access to the log management components.  Though administrators should not be denied access to run reports, configure alerts or research logs, privileged access to the log management solution that allows someone to disable, erase or otherwise compromise the integrity of the log collection and archival process should be carefully managed.

A log management solution cannot serve as a deterrent to administrators who have privileged access at the application level or to any of the infrastructure components on which it runs.  This includes:

  • the operating system of the log management solution
  • any database servers it uses
  • the Active Directory forest in which the log management server resides, if it is a Windows server
  • the NAS or SAN where it stores log data

And if the log management application or any of the above components run inside a virtual machine this also includes:

  • the virtualization host, such as VMWare ESX(i)
  • the virtualization manager, such as VMWare vCenter
  • any of the components listed earlier which are used by or host the virtualization manager

Physical access to any of these components could potentially allow administrators to compromise the integrity of the audit trail.  To the extent possible, the log management solution should run on a completely separate infrastructure.

Remember such separation is a protection against not just internal rogue admins but outsiders who succeed in obtaining privileged access.  Typically the larger the organization, the more important and practical it is to achieve maximum separation between the log management solution and the environment it monitors.

Beyond hardware and software separation, the log management application, database servers, storage, OS and other components also need careful management.  Larger organizations generally have dedicated information security teams, and usually within that group someone is responsible for the audit log management process.  For full accountability and separation of duty, that team should have no privileged access to the production business systems monitored by the log management process.  Ideally that group would provide the oversight necessary for all components in the log management solution and supervise any action that touches the audit log, to ensure its integrity and prevent the introduction of backdoors into the system.

There are a host of reasons why even “supervised access” can be compromised:  staff in smaller IT shops aren’t always able to specialize, so the possibility for separation of skills and duties may not exist. When an in-house log management system can’t be physically and logically separated, log management as a service may be an alternative to consider.  With cloud-based log management, the entire system is controlled by a professional service team at a separate site.  Services can be set up with role-based access control so that the ability to erase audit logs is controlled.  If organizations can overcome the frequent pushback against sending audit logs to the cloud, full isolation and integrity of log data can be achieved without building a separate log management system, and without requiring in-house expertise for audit log management.

Whether an organization goes with in-house audit log management or turns to a cloud-based service, it should carefully assess its choices in architecture and administrative responsibility. When the worst happens, audit logs may be the only deterrent and detective control over rogue admins.  Are yours secure?

Personalization wins the day

Despite tough times for the corporate world in the past year, spending on IT security was a bright spot in an otherwise gloomy picture.

However, if you’ve tried to convince a CFO to sign off on tools and software, you know just how difficult this can be. In fact, the most common way to get approval is to tie the request to an unrelenting compliance mandate. Sadly, a security incident can also help focus attention and trigger the approval of budget.

Vendors have tried hard to showcase their value by appealing to the preventive nature of their products. ROI calculations are usually provided to demonstrate quick payback, but these are often dismissed by the CFO as self-serving. Recognizing the difficulty of measuring ROI, an alternate model called ROSI (return on security investment) has been proposed, but it has met with limited success.

So what is an effective way to educate and persuade the gnomes? Try an approach from a parallel field: the presentation of medical data. Your medical chart is hard to access, impossible to read, and full of information that could make you healthier if you just knew how to use it – pretty much like security information inside the enterprise. And if you have seen lab results, you know that even motivated persons find them hard to decipher and act on, much less the disinclined.

In a recent TED talk, Thomas Goetz, the executive editor of Wired magazine, addressed this issue and proposed some simple ideas for making this data meaningful and actionable: the use of color, graphics and, most important, personalization of the information to drive action. We know from experience that posting the speed limit is less effective at getting motorists to comply than a radar gun display that shows the speed limit framed by “Your speed is __”. It’s all about personalization.

To make security information meaningful to the CFO, a similar approach can be much more effective than bland “best practice” prescriptions or questionable ROI numbers. Gather data from your enterprise and present it with color and graphs tailored to the “patient”.

Personalize your presentation; you’ll get a more patient ear and much less resistance to your budget request.

A. N. Ananth

Best Practice v/s FUD

Have you observed how “best practice” recommendations are widely known but not nearly as widely followed? While it seems more pronounced in IT security, the same holds true in every other sphere as well. For example, dentists repeatedly recommend brushing and flossing after each meal as best practice, but how many follow this advice? And then there is the clearly posted speed limit on the road; more often than not, motorists are speeding.

Now the downside of non-compliance is well known to all and for the most part well accepted; no real argument there. In the dentist example, the consequences include social hardships ranging from bad teeth and breath to health issues and the resulting expense. In the speeding example, there is potential physical harm and, of course, monetary fines. However, it would appear that neither the fear of bad outcomes nor the monetary fine spurs widespread compliance. Indeed, the people who do comply appear to do so because they wish to; fear and fines don’t play a major role for them.

In a recent experiment, people visiting the dentist were divided into two groups. Before the start, each patient was asked to indicate whether they classified themselves as someone who “generally listens to the doctor’s advice”. After the checkup, people from one group were given the advice to brush and floss regularly, followed by a “fear” message on the consequences of non-compliance: bad teeth, social ostracism, the high cost of dental procedures and so on. People from the other group got the same checkup and advice but were given a “positive” message on the benefits of compliance: a nice smile, social popularity, lower cost and so on. A follow-up was conducted to determine which of the two approaches was more effective in getting patients to comply.

Those of us in IT security battling for budget from unresponsive upper management have been conditioned to think that the “fear” message would be more effective … but … surprise: neither approach was more effective than the other in getting patients to comply with “best practice.”  Instead, those who classified themselves as people who “generally listen to the doctor’s advice” were the ones who did comply. The rest were equally impervious to either the negative or the positive consequences, while not disputing them.

You could also point to the great reduction in the incidence of smoking, but that best practice has required more than three decades of education to achieve, and smoking still hasn’t been stamped out.

Lesson for IT security: education takes time, and behavior modification even more so.

Come on Feel the Noise

It’s a line from a song from the ’70s, but quite apt when it comes to describing the Windows Security log.  There’s no getting around the fact that there are a lot of useless and inexplicable events in the Security log, and the sooner you get comfortable with that, the sooner you’ll save your sanity and get on with work.  In this article we’ll look at some common examples of noise events in the Security log and discuss strategies for dealing with them.

It is important to recognize noise.  First, the earlier in the log management process that noise can be discarded or ignored, the better for performance, bandwidth and storage.  Second, a lot of time can be wasted investigating what appear to be suspicious events but are, in fact, meaningless.

One example of high-volume noise is the Kerberos service ticket renewal attempts generated on domain controllers.  Kerberos has two kinds of tickets: authentication tickets (also known as ticket granting tickets) and service tickets.  Authentication tickets (see event IDs 4768, 4771 and 4772 on Windows 2008 and 672, 675 and 676 on Windows 2003) are connected with the actual authentication of the user (or computer) to the domain controller.  Service tickets vouch for a user’s identity to the member computers that the user subsequently accesses on the network.

When a user remains logged on (or a computer remains up) long enough, the service tickets expire and Windows needs to renew them with the domain controller.  A successful service ticket event (event ID 673 on Windows 2003 and 4769 on Windows 2008) can be useful by providing a record of the workstations and servers accessed by a user.  But a successful ticket renewal (event ID 674 on Windows 2003 and 4770 on Windows 2008) denotes nothing of value other than the fact that the user or computer remained logged on, or was powered up, for a long time.  If a user remains logged on (or a computer remains up without being rebooted) for too long, the service ticket reaches its renewal lifetime limit and the domain controller finally rejects the renewal request, which generates a renewal failure event.  There are other scenarios, but at the end of the day any kind of service ticket renewal (success or failure) and any kind of service ticket request failure is essentially noise.  Theoretically, a malicious situation could conceivably generate service ticket events, but in practice this is very unlikely, and there are no criteria that can be used to distinguish those events from the background noise of all the other service ticket events.
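As a minimal sketch, this noise rule can be captured in a few lines of Python. The event IDs come from the discussion above; the function name and return labels are illustrative only, not part of any product.

```python
# Kerberos service-ticket events, per the discussion above.
SERVICE_TICKET_GRANTED = {673, 4769}   # Windows 2003 / Windows 2008: potentially useful
SERVICE_TICKET_RENEWAL = {674, 4770}   # Windows 2003 / Windows 2008: noise

def classify_service_ticket(event_id: int, succeeded: bool) -> str:
    """Label a Kerberos service-ticket event as 'useful', 'noise' or 'unclassified'."""
    if event_id in SERVICE_TICKET_RENEWAL:
        return "noise"                             # renewals only show a long-lived logon
    if event_id in SERVICE_TICKET_GRANTED:
        return "useful" if succeeded else "noise"  # request failures are noise too
    return "unclassified"

print(classify_service_ticket(4770, True))   # noise
print(classify_service_ticket(4769, True))   # useful
```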

Another example of noise that appears to be a security event is the sometimes frequent occurrence of event ID 537, “Logon failure – The logon attempt failed for other reasons”, where the user name is blank. Concerned admins may worry that an attack is underway, but looking at the sub-status code in the event description should confirm or allay those fears: if the code is 0xC0000133, “The time at the primary domain controller is different from the time at the backup domain controller or member server by too large an amount” (and it usually is), there is no security issue; this is simply a time synchronization problem on the initiating computer or the domain controller.
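A minimal sketch of that check, assuming the event has already been parsed into a dictionary; the field names here are illustrative, not a fixed schema.

```python
# Sub-status 0xC0000133 indicates clock skew between the computer and the
# domain controller, so a blank-user 537 with this code is noise, not an attack.
TIME_SKEW_SUBSTATUS = 0xC0000133

def is_time_sync_noise(event: dict) -> bool:
    """True for event ID 537 'logon failure - other reasons' caused by time skew."""
    return (
        event.get("event_id") == 537
        and not event.get("user_name")          # user name is blank
        and event.get("sub_status") == TIME_SKEW_SUBSTATUS
    )

sample = {"event_id": 537, "user_name": "", "sub_status": 0xC0000133}
print(is_time_sync_noise(sample))  # True: fix time sync, don't raise an alert
```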

Dealing with the quantity and variety of events can be overwhelming, but several strategies can help prevent losing too much time on noise events.

First, identify known noise events, such as those described above.  Configure the log management / SIEM solution to suppress or filter these events from any alerts that are received or reports that are reviewed on a regular basis.  Document the justification for filtering those events so they are properly identified as noise or unimportant.
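One way to keep that documentation honest is to store the justification alongside each rule. The following is a minimal sketch under that assumption; the rule structure is illustrative, not any particular SIEM’s syntax.

```python
# Each suppression rule carries the reason it was added, so reviewers can see
# why an event never appears in alerts or daily reports.
SUPPRESSION_RULES = [
    {"event_ids": {674, 4770}, "match": {},
     "reason": "Kerberos service-ticket renewals; show only that a session stayed up"},
    {"event_ids": {537}, "match": {"sub_status": 0xC0000133},
     "reason": "Logon failure caused by clock skew, not an attack"},
]

def is_suppressed(event: dict) -> bool:
    """True if the event matches a documented noise rule."""
    for rule in SUPPRESSION_RULES:
        if event.get("event_id") in rule["event_ids"] and all(
            event.get(k) == v for k, v in rule["match"].items()
        ):
            return True
    return False

print(is_suppressed({"event_id": 4770}))  # True
```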

Next, set up alerts and reports for events that are known to exist and are considered important enough to generate an alert or appear in the daily report.  There may be other events that don’t deserve a response but should be reviewed for compliance purposes.  If entire logs aren’t already being archived (including noise events), make sure that the log management process records and archives these events.

Finally, the remaining events are those that are unknown, or unclassified.  Whatever the log management processes and technology, there should be a way to view any such “unclassified” events.  Periodically reviewing these events should prompt revisions to the criteria for the classification types described above, with the eventual goal of no unclassified events.  This ensures unknown but important events aren’t missed, and provides a systematic way of managing noise.
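Putting the three steps together, here is a minimal triage sketch. The alert and report ID sets are placeholders to be filled in from your own environment, and in practice the noise set would reuse the suppression rules sketched earlier.

```python
# Illustrative placeholders; populate from your own alerting and compliance needs.
NOISE_IDS = {674, 4770}          # known noise (see the suppression sketch above)
ALERT_IDS = {4740}               # e.g. account lockout: respond immediately
REPORT_IDS = {4769}              # reviewed in daily / compliance reports

def triage(event_id: int) -> str:
    """Route an event to one of the buckets described above."""
    if event_id in NOISE_IDS:
        return "archive_only"    # still archived, never alerted on
    if event_id in ALERT_IDS:
        return "alert"
    if event_id in REPORT_IDS:
        return "report"
    return "unclassified"        # review periodically and refine the sets above

print(triage(4624))  # unclassified -> candidate for the periodic review
```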

While the Windows Security log records all events from a finite set of IDs documented at www.ultimatewindowssecurity.com/securitylog/encyclopedia.aspx, many other logs, such as those from Linux and Unix, have no well-defined and bounded event schema.  Even with the new subcategory audit policy structure released in Windows Server 2008, Windows audit policy is not granular or flexible enough.  The same is true for other log sources.

Security logs aren’t like a financial audit trail where every transaction and penny can and should be justifiable.  Get comfortable with noise events in your logs; through audit policy refinements, useless events can be reduced and real threats can be more readily identified.

About the Author

Randy Franklin Smith is an internationally recognized expert on the security and control of Windows and Active Directory. He performs security reviews for clients ranging from small, privately held firms to Fortune 500 companies and national and international organizations.

Randy Franklin Smith began his career in information technology in the 1980s, developing software for a variety of companies. During the early 1990s, he led a business process re-engineering effort for a multi-national organization and designed several mission-critical, object-oriented, client/server systems. As the Internet and Windows NT took off, Randy focused on security and led his employer’s information security planning team. In 1997, he formed Monterey Technology Group, Inc., where he serves as President.

You can contact Randy at rsmith@ultimatewindowssecurity.com