Resistance is futile


The Borg are a fictional alien race and one of the most terrifying antagonists in the Star Trek franchise. The phrase "Resistance is futile" is best delivered by Patrick Stewart in the episode "The Best of Both Worlds."

When IBM demonstrated the power of Watson in 2011 by defeating two of the best humans to ever play Jeopardy, Ken Jennings, who had won 74 games in a row, admitted in defeat, "I, for one, welcome our new computer overlords."

As the Edward Snowden revelations about the collection of metadata for phone calls became known, the first reaction was that it would be technically impossible to store data for every single phone call; the cost would be prohibitive. Then Brewster Kahle, one of the engineers behind the Internet Archive, made this spreadsheet to calculate the cost to record and store one year's worth of all U.S. calls. He works the cost out to about $30M, which is non-trivial but by no means out of reach for a large U.S. government agency.
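To see why the numbers land in that range, here is a back-of-the-envelope sketch in the spirit of Kahle's calculation. It is not his actual spreadsheet; the call volume, average duration, codec bitrate and storage price below are illustrative assumptions.

```python
# Back-of-the-envelope estimate, not Kahle's actual spreadsheet.
# All constants below are illustrative assumptions, not sourced figures.

CALLS_PER_DAY = 3_000_000_000     # assumed daily U.S. call volume
AVG_CALL_MINUTES = 2              # assumed average call duration
BITRATE_KBPS = 8                  # assumed compressed voice codec (~8 kbps)
COST_PER_TB_YEAR = 100.0          # assumed storage cost in $/TB/year

bytes_per_call = AVG_CALL_MINUTES * 60 * BITRATE_KBPS * 1000 / 8
tb_per_year = CALLS_PER_DAY * 365 * bytes_per_call / 1e12

print(f"Storage needed: {tb_per_year:,.0f} TB per year")
print(f"Annual storage cost: ${tb_per_year * COST_PER_TB_YEAR:,.0f}")
```

Plugging in different assumptions moves the answer around, but it stays in the same order of magnitude as Kahle's $30M figure.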

The next thought was: OK, so maybe it's technically feasible to record every phone call, but how could anyone possibly listen to every call? Obviously this is not possible, but can search terms be applied to locate "interesting" calls? Again, we didn't think so, until another N.S.A. document, cited by The Guardian, showed a "global heat map" that appeared to represent how much data the N.S.A. sweeps up around the world. If it is possible to efficiently mine metadata, the data about who is calling or e-mailing whom, then the pressure for wiretapping and eavesdropping on communications becomes secondary.
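A toy example makes the point. The sketch below builds a contact graph from nothing but call metadata and flags numbers that are one hop away from a watch-listed number; the record format and the watch list are hypothetical, and no call content is examined at any point.

```python
# A minimal sketch of mining call metadata: build a contact graph from
# call detail records and flag numbers one hop from a watch-listed number.
# The records and the watch list are hypothetical examples.
from collections import defaultdict

cdrs = [  # (caller, callee, timestamp)
    ("555-0100", "555-0199", "2013-06-01T10:02"),
    ("555-0199", "555-0142", "2013-06-01T10:30"),
    ("555-0142", "555-0100", "2013-06-02T09:15"),
]
watchlist = {"555-0199"}

contacts = defaultdict(set)
for caller, callee, _ in cdrs:
    contacts[caller].add(callee)
    contacts[callee].add(caller)

# Anyone in direct contact with a watch-listed number becomes "interesting"
interesting = {n for w in watchlist for n in contacts[w]}
print(interesting)
```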

This study in Nature shows that just four data points about the location and time of a mobile phone call make it possible to identify the caller 95 percent of the time.

IBM estimates that thanks to smartphones, tablets, social media sites, e-mail and other forms of digital communication, the world creates 2.5 quintillion bytes of new data daily. Searching through this archive of information is humanly impossible, but it is precisely what a Watson-like artificial intelligence is designed to do. Isn't that exactly what was demonstrated in 2011 to win Jeopardy?

Interpreting logs: the Tesla story


Did you see the New York Times review by John Broder that was critical of the Tesla Model S? Tesla CEO Elon Musk was not pleased. The two are not arguing over interpretations or anecdotal recollections of experiences; they are arguing over basic facts, things that are supposed to be indisputable in an environment of cameras, sensors and instantly searchable logs.

The conflicting accounts — both described in detail — carry a lesson for those of us involved in log interpretation. Data is supposed to be the authoritative alternative to memory, which is selective in its recollection. As Bianca Bosker said, “In Tesla-gate, Big Data hasn’t made good on its promise to deliver a Big Truth. It’s only fueled a Big Fight.”

This is a familiar scenario if you have ever picked through logs as a forensic exercise. We can (within limitations) try to answer four of the five W questions: Who, What, When and Where. But the fifth one, Why, is elusive and brings the analyst into the realm of guesswork.
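As a simple illustration, the sketch below pulls Who, What, When and Where out of a single log line; the log format is a generic, made-up example rather than any particular product's, and the point is that the fifth W never appears in the record.

```python
# A minimal sketch of extracting Who/What/When/Where from a log line.
# The log format here is a generic, invented example.
import re

line = "2013-02-15T08:42:10 host=nyc-dc01 user=jsmith action=logon_failure"

pattern = r"(?P<when>\S+)\s+host=(?P<where>\S+)\s+user=(?P<who>\S+)\s+action=(?P<what>\S+)"
match = re.search(pattern, line)
if match:
    print(match.groupdict())
# The fifth W, Why, is simply not in the record; that takes context the log cannot supply.
```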

The Tesla story is interesting because interested observers are trying to deduce why the reporter was driving around the parking lot: to find the charging receptacle, or to deliberately drain the battery and set up a bad review? Alas, the data alone cannot answer this question.

In other words, relying on data alone, big data included, to plumb human intention is fraught with difficulty. An analyst needs context.

Big Data and Information Inequality


Mike Wu, writing in TechCrunch, observed that in all realistic data sets (especially big data), the amount of information one can extract from the data is always much less than the data volume (see figure below).

[Figure: information extracted versus data volume]

In his view, given the above, the value of big data is hugely exaggerated. He then goes on to argue that this is actually a strong case for why we need even bigger data: because the amount of valuable insight we can derive from big data is so very tiny, we need to collect even more data and use more powerful analytics to increase our chance of finding it.

Now machine data (aka log data) is certainly big data, and it is certainly true that obtaining insights from such datasets is a painstaking (and often thankless) job, but I wonder if this means we need even more data. Methinks we need to be able to better interpret the big data set and its relevance to "events."

Over the past two years, we have been deeply involved in "eating our own dog food," as it were. At multiple EventTracker installations that are nationwide in scope and span thousands of log sources, we have been working to extract insights for presentation to the network owners. In some cases this is done with a lot of cooperation from the network owner, and we have a good understanding of the IT assets and the actors who use or abuse them. We find that with such involvement we are better able to risk-prioritize what we observe in the data set and map it to business concerns. In other cases, where there is less interaction with the network owner and we know less about the actors or the relative criticality of assets, we fall back on past experience and/or vendor-provided information about what constitutes an incident. It is the same dataset in both cases, but there is more value in one case than the other.

To say it another way: to get more information from the same data, we need other types of context to extract signal from noise. Enabling more granular logging on the same devices, thereby generating an ever bigger dataset, won't increase the signal level. EventTracker can merge change audit data, netflow information and vulnerability scan data to achieve a greater signal-to-noise ratio. That is a big deal.
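Here is a minimal sketch of what that context-driven prioritization looks like in practice. The field names, criticality rankings and scoring formula are hypothetical illustrations, not EventTracker's actual schema or algorithm.

```python
# A minimal sketch of context-driven prioritization. Field names and
# weights are hypothetical, not EventTracker's actual schema.

alerts = [
    {"host": "web01", "event": "failed_logon_burst"},
    {"host": "lab07", "event": "failed_logon_burst"},
]

# Context merged in from other sources
asset_criticality = {"web01": 9, "lab07": 2}   # business ranking, 1-10
open_vulns = {"web01": 14, "lab07": 0}         # counts from a vulnerability scanner

def risk_score(alert):
    host = alert["host"]
    return asset_criticality.get(host, 5) * (1 + open_vulns.get(host, 0))

for alert in sorted(alerts, key=risk_score, reverse=True):
    print(alert["host"], alert["event"], "score:", risk_score(alert))
```

The same two events produce very different priorities once asset criticality and vulnerability data are merged in; the extra context, not extra volume, is what raises the signal.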

Big Data, Old News. Got Humans?


Did you know that big data is old news in the area of financial derivatives? O'Connor & Associates was founded in 1977 by mathematician Michael Greenbaum, who had run risk management for Ed & Bill O'Connor's options trading firm. What made O'Connor & Associates successful was the understanding that expertise is far more important than any tool or algorithm. After all, absent expertise, any tool can only generate gibberish; perfectly processed and completely logical, of course, but still gibberish.

Which brings us back to the critical role played by the driver of today's enterprise tools. These tools are full-featured and automate the work of crushing an entire hillside of dirt to locate a few grams of gold, but "got human"? It comes back to the skilled operator who knows how and when to push all those fancy buttons. Of course, deciding which hillside to crush is another problem altogether.

This is a particularly difficult challenge for midsize enterprises, which struggle with SIEM data: billions of logs, plus change and configuration data, all now available thanks to that shiny SIEM you just installed. What does it mean? What are you supposed to do next? Large enterprises can afford a small army of experts to extract value, and a small business can ignore the problem completely, but for the midsize enterprise it's the worst of all worlds: compliance regulations, tight budgets, lean staff and the demand for results.

This is why our SIEM Simplified offering was created: to allow customers to outsource the heavy-lifting part of the problem while maintaining control over the critical and sensitive decision-making parts. At the EventTracker Control Center (ECC), our expert staff watches your incidents, reviews log reports daily, and alerts you to those few truly critical conditions that warrant your attention. This frees up your staff to take care of things that cannot be outsourced. In addition, since the ECC enjoys economies of scale, this can be done at a lower cost than doing it yourself. This has the advantage of inserting the critical human component back into the equation, but at a price point that is affordable.

As Grady Booch observed, "A fool with a tool is still a fool."


Does Big Data = Better Results? It depends…


If you could offer your IT security team 100 times more data than they currently collect, every last log, every configuration, every single change made to every device in the entire enterprise, at zero cost, would they be better off? Would your enterprise be more secure? Completely compliant? You already know the answer: not really, no. In fact, some compliance-focused customers tell us they would be worse off because of liability concerns (you had the data all along but neglected to use it to safeguard my privacy), and some security-focused customers say it would actually make things worse because they have no processes to effectively manage such archives.

As Michael Schrage noted, big data doesn't inherently lead to better results. Organizations must grasp that "being big data-driven requires more qualified human judgment than cloud-based machine learning." For big data to be meaningful, it has to be linked to a desirable business outcome, or else executives are just being impressed or intimidated by the bigness of the data set. For example, IBM's DeepQA project stores petabytes of data and was demonstrated by Watson, the successful Jeopardy-playing machine: that is big data clearly linked to a desirable outcome.

In our corner of the world, the desirable business outcomes are well understood. We want to keep the bad guys out (malware, hackers), learn about the guys inside who have gone bad (insider threats), demonstrate continuous compliance, and of course do all this on a leaner, meaner budget.

Big data can be an embarrassment of riches if linked to such outcomes. But note the emphasis on "qualified human judgment." Absent this, big data may be just an embarrassment. This point underlines the core problem with SIEM: we can collect everything, but who has the time or rule-set to make the valuable stuff jump out? If you agree, consider a managed service. It's a cost-effective way to put big data to work in your enterprise today, clearly linked to a set of desirable outcomes.

Are you a Data Scientist?


The advent of the big data era means that analyzing large, messy, unstructured data will increasingly form part of everyone’s work. Managers and business analysts will often be called upon to conduct data-driven experiments, to interpret data, and to create innovative data-based products and services. To thrive in this world, many will require additional skills. In a new Avanade survey, more than 60 percent of respondents said their employees need to develop new skills to translate big data into insights and business value.

Are you:

Ready and willing to experiment with your log and SIEM data? Managers and security analysts must be able to apply the principles of scientific experimentation to their log and SIEM data. They must know how to construct intelligent hypotheses. They also need to understand the principles of experimental testing and design, including population selection and sampling, in order to evaluate the validity of data analyses. As randomized testing and experimentation become more commonplace, a background in scientific experimental design will be particularly valued.

Adept at mathematical reasoning? How many of your IT staff today are really "numerate," that is, competent in the interpretation and use of numeric data? It's a skill that's going to become increasingly critical. IT staff members don't need to be statisticians, but they need to understand the proper usage of statistical methods; they should know how to interpret data, metrics and the results of statistical models (see the sketch after this list for a simple example).

Able to see the big (data) picture? You might call this “data literacy,” or competence in finding, manipulating, managing, and interpreting data, including not just numbers but also text and images. Data literacy skills should be widespread within the IT function, and become an integral aspect of every function and activity.
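As a concrete example of the statistical reasoning mentioned above, the sketch below compares today's failed-logon count against a historical baseline instead of an arbitrary fixed threshold. The daily counts are invented sample data.

```python
# A minimal sketch of baseline-based reasoning over a log metric.
# The daily counts are invented sample data.
import statistics

daily_failed_logons = [112, 98, 130, 121, 105, 117, 109]   # past week
today = 310

mean = statistics.mean(daily_failed_logons)
stdev = statistics.stdev(daily_failed_logons)
z = (today - mean) / stdev

print(f"baseline mean={mean:.0f}, stdev={stdev:.1f}, today's z-score={z:.1f}")
if z > 3:
    print("Today is a statistical outlier and worth an analyst's attention.")
```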

Jeanne Harris, blogging in the Harvard Business Review, writes, "Tomorrow's leaders need to ensure that their people have these skills, along with the culture, support and accountability to go with it. In addition, they must be comfortable leading organizations in which many employees, not just a handful of IT professionals and PhDs in statistics, are up to their necks in the complexities of analyzing large, unstructured and messy data.

“Ensuring that big data creates big value calls for a reskilling effort that is at least as much about fostering a data-driven mindset and analytical culture as it is about adopting new technology. Companies leading the revolution already have an experiment-focused, numerate, data-literate workforce.”

If this presents a challenge, then co-sourcing the function may be an option. The EventTracker Control Center here at Prism offers SIEM Simplified, a service where trained, expert IT staff perform the heavy lifting associated with big data analysis as it relates to SIEM data. By removing the outliers and bringing patterns to your attention, with the greater efficiency that comes from scale, focus and expertise, the service lets you concentrate on interpretation and the associated actions.

Big Data Gotchas


Jill Dyche, writing in the Harvard Business Review, suggests that "the question on many business leaders' minds is this: Does the potential for accelerating existing business processes warrant the enormous cost associated with technology adoption, project ramp-up, and staff hiring and training that accompany Big Data efforts?"

A typical log management implementation, even in a medium-sized enterprise, is usually a big data endeavor. Surprised? You should not be. A relatively small network of a dozen log sources easily generates a million log messages per day, with volumes of 50-100 million per day being commonplace. With compliance and security guidelines requiring that logs be retained for 12 months or more, pretty soon you have big data.
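Some quick arithmetic shows how fast this adds up. The 500-byte average event size below is an assumption for illustration, not a measured figure.

```python
# Rough sizing arithmetic for the scenario described above.
# The average event size is an assumption, not a measured figure.

EVENTS_PER_DAY = 100_000_000    # upper end of the "commonplace" volumes cited above
AVG_EVENT_BYTES = 500           # assumed average size of a raw log record
RETENTION_DAYS = 365            # 12-month retention requirement

total_events = EVENTS_PER_DAY * RETENTION_DAYS
raw_tb = total_events * AVG_EVENT_BYTES / 1e12

print(f"{total_events:,} events retained")            # 36.5 billion events
print(f"~{raw_tb:,.0f} TB of raw log data per year")  # ~18 TB before indexes and compression
```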

So let's answer the questions raised in the article:

Q1: What can’t we do today that Big Data could help us do?   If you can’t define the goal of a Big Data effort, don’t pursue it.

A1: Comply with regulations such as PCI-DSS, SOX 404 and HIPAA; be alerted to security problems in the enterprise; control data leakage via insecure endpoints; improve operational efficiency.

Q2: What skills, technologies, and existing data development practices do we have in place that could help kick-start a Big Data effort? If your company doesn’t have an effective data management organization in place, adoption of Big Data technology will be a huge challenge.

A2: Absent a trained and motivated user of the power tool that is the modern SIEM, an organization that acquires such technology is consigning it to shelfware. Recognizing this as a significant adoption challenge in our industry, we offer Monitored SIEM as a service; the best way to describe this is SIEM simplified! We do the heavy lifting so you can focus on leveraging the value.

Q3: What would a proof-of-concept look like, and what are some reasonable boundaries to ensure its quick deployment? As with many other proofs-of-concept, the "don't boil the ocean" rule applies to Big Data.

A3: The advantage of a software-only solution like EventTracker is that an on-premises trial is easy to set up. A virtual appliance with everything you need is provided; set it up as a VMware or Hyper-V virtual machine within minutes. Want something even faster? See it live online.

Q4: What determines whether we green light Big Data investment? Know what success looks like, and put the measures in place.

A4: Excellent point; success may mean continuous compliance, a 75% reduction in the cost of compliance, one security incident averted per quarter, or delegation of log review to a junior admin.

Q5: Can we manage the changes brought by Big Data? With the regular communication of tangible results, the payoff of Big Data can be very big indeed.

A5: EventTracker includes more than 2,000 pre-built reports designed to deliver value to every interested stakeholder in the enterprise, ranging from dashboards for management to alerts for help desk staff, risk-prioritized incident reports for the security team, system uptime and performance results for the operations folks, and detailed cost-savings reports for the CFO.

The old adage “If you fail to prepare, then prepare to fail” applies. Armed with these questions and answers, you are closer to gaining real value with Big Data.

Learning from JPMorgan


The single most revealing moment in the coverage of JPMorgan's multibillion-dollar debacle can be found in this take-your-breath-away passage from The Wall Street Journal: On April 30, associates who were gathered in a conference room handed Mr. Dimon summaries and analyses of the losses. But there were no details about the trades themselves. "I want to see the positions!" he barked, throwing down the papers, according to attendees. "Now! I want to see everything!"

When Mr. Dimon saw the numbers, these people say, he couldn’t breathe.

Only when he saw the actual trades — the raw data — did Mr. Dimon realize the full magnitude of his company’s situation. The horrible irony: The very detail-oriented systems (and people) Dimon had put in place had obscured rather than surfaced his bank’s horrible hedge.

This underscores the new trust-versus-due-diligence dilemma outlined by Michael Schrage. Raw data can have an enormous impact on executive perceptions that pre-chewed analytics lack. This is not to minimize or marginalize the importance of analysis and interpretation, but nothing creates situational awareness faster than seeing with your own eyes what your experts are trying to synthesize and summarize.

There's a reason why great chefs visit the farms and markets that source their restaurants: the raw ingredients are critical to success, or failure.

We have spent a lot of energy building dashboards for critical log data and recognize the value of these summaries, but while we should trust our data, we also need to do our due diligence.