Big Data – Does insight equal decision?

In information technology, big data consists of data sets that grow so large that they become awkward to work with using whatever database management tools are on-hand. For that matter, how big is big? It depends on when you need to reconsider data management options – in some cases it may be 100Gb, in others, it may be 100Tb. So, following up on our earlier post about big data and insight, there is one more important consideration:

Does insight equal decision?

The foregone conclusion from big data proponents is that each nugget of “insight” uncovered by data mining will somehow be implicitly actionable and the end user (or management) will gush with excitement and praise.

The first problem is how can you assume that “insight” is actionable? It very well may not be, so what do you do then? The next problem is how can you convince the decision maker that the evidence constitutes an imperative to act? Absent action, the “insight” remains simply a nugget of information.

Note that management typically responds to “insight” with skepticism, seeing the message bearer as yet another purveyor of information (“insight”) and insisting that this new method is the silver bullet, thereby adding to workload.

Being in management myself, my team often comes to me with their little nuggets … some are gold, but some are chicken.   Rather than purvey insight, think about a recommendation backed up by evidence.

Big Data, does more data mean more insight?

In information technology, big data consists of data sets that grow so large they become unwieldy to work with using available database management tools. How big is big? It depends on when you need to reconsider data management options – in some cases it may be 100 Gigabytes, in others, as great as 100 Terabytes.

Does more data necessarily mean more insight?

The pro-argument is that larger data sets allow for greater incidences of patterns, facts, and insights. Moreover, with enough data, you can discover trends using simple counting that are otherwise undiscoverable in small data using sophisticated statistical methods.

On the other hand, while this is perfectly valid in theory, for many businesses the key barrier is not the ability to draw insights from large volumes of data; it is asking the right questions for which insight is needed.

The ability to provide answers does depend on the question being asked and the relevance of the big-data set to that question. How can one generalize to an assumption that more data will always mean more insight?   It isn’t always the answer that’s important, but the questions that are key.

Silly human – logs are for machines (too)

Here is an anecdote from a recent interaction with an enterprise application in the electric power industry:

1. Dave the developer logs all kinds of events. Since he is the primary consumer of the log, the format is optimized for human-readability. For example:

02-APR-2012 01:34:03 USER49 CMD MOD0053: ERROR RETURN FROM MOD0052 RETCODE 59

Apparently this makes perfect sense to Dave:   each line includes a timestamp and some text.

2. Sam from the Security team needs to determine the number of daily unique users. Dave quickly writes a parser script for the log and schedules it. He also builds a little Web interface so that Sam can query the parsed data on his own. Peace reigns.

3. A few weeks later, Sam complains that the web interface is broken. Dave takes a look at the logs, only to realize that someone else has added an extra field in each line, breaking his custom parser. He pushes the change and tells Sam that everything is okay again. Instead of writing a new feature, Dave has to go back and fill in the missing data.

4. Every 3 weeks or so, repeat Step 3 as others add logs.