Two Stereotypes that Limit Analytic Insights
By Christian Chabot
Recent advances in data visualization remain trapped in the hands of a few users, often professional analysts. Why? A lack of well-designed products, certainly. Music compression technology was held back from its world-changing potential until the iPod design breakthrough fully unleashed it. Product design matters. Unfortunately, most products employing data visualization advances are really hard to use.
Yet visualization hasn’t been held back by product design alone. Tableau solves that one. The other problem is that visualization suffers from persistent stereotypes. Tableau has perspective on this because our products are used by thousands of businesses of all sizes, from Google, Microsoft and NIH, to arts groups, hospitals and churches. One thing we’ve learned in the field is that two stereotypes hold people back from adopting visualization more broadly. Leaders in the field of information visualization are trying to dispel these.
Stereotype 1: Visualization is for Large Data
The common stereotype of data visualization is that it’s designed for analyzing enormous data volumes. In fact it’s hard to find a discussion of visualization technology that doesn’t immediately emphasize its special applicability to massive data. This includes conferences (“keeping pace with the torrents of data”), research papers, marketing brochures, even basic definitions of the field (“potentially huge quantities of laboratory or simulation data”).
Visualization is indeed a fantastic tool for analyzing massive data. It isn’t a myth. It’s just an unfortunate stereotype. It’s unfortunate for several reasons:
- Visualization is equally useful with small and large data.
- People spend substantially more time working with small datasets than massive ones.
- Applying visualization to large data is easier if you start small.
A result of the stereotype is that fewer people are benefiting from visualization technology than should. It’s not uncommon for people to say, “We’re going to adopt visualization once our master data warehousing project is complete, because it’s going to be huge and it’s going to need visualization.” They then go back to their desks and continue working with horrible interfaces to small data (e.g., Pivot Tables, charting wizards, Crystal Reports and statistics packages). They also realize after month 18 of their data warehouse initiative that much less data is going to be rolled into it than they originally hoped. That’s because such projects often break under the sheer complexity of baking too many changing requirements into one project.
One reason that visualization is equally useful for small data is that it’s often the number of columns, not the number of rows, that determines how difficult data is to understand. Said differently, the complexity of answering questions typically rises faster with the dimensionality of data than with the number of observations. And the number of dimensions that can cause data to be really difficult to reason about is very small.
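To make the column-count argument concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the number of possible column comparisons is a rough proxy for how hard a table is to reason about. The function names are hypothetical and are not part of any product mentioned in this article.

```python
from math import comb


def pairwise_relationships(num_columns: int) -> int:
    """Count the distinct pairs of columns a reader could compare."""
    return comb(num_columns, 2)


def column_subsets(num_columns: int) -> int:
    """Count the non-empty combinations of columns of any size."""
    return 2 ** num_columns - 1


if __name__ == "__main__":
    # Illustrative column counts only; the row count plays no role here.
    for cols in (2, 5, 9, 20):
        print(cols, pairwise_relationships(cols), column_subsets(cols))
```

Neither function takes a row count. Under this assumption, adding rows leaves the number of relationships to inspect unchanged, while each added column increases it.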
Let’s look at an example.
Would you consider the following data set to be large?
Most people wouldn’t. After all it’s only 30 rows and 9 columns. If I claimed to any Tableau customer that this is a large dataset, it would evoke laughter. But if small means easy to understand without visualization, then I ask: What is interesting about this data? Name two important insights, for example. Likely you’ll find this task difficult. You can feel your cerebral CPU churning as you try to notice something. It’s hard work. One reason is that even 9 columns, regardless of how many rows, make relationships and outliers and associations and patterns difficult to detect.
In contrast, in less than 1 minute using Tableau, I discovered the following:
Hard to believe that I found both of these insights in 60 seconds? Try it using Tableau. Here’s the spreadsheet. And here’s the Tableau workbook.
After taking a look at these files, there’s a good chance you’ll agree that there is a better way of characterizing the applicability of visualization: Visualization is useful for all but the most trivial data sets.
That’s because it’s the question you have, not the size of the data, that drives the need. Large data or small data, a few rows or a few columns: they can all benefit from better visualization.
Stereotype 2: Visualization is for Complex Questions
The second stereotype is that visualization technology is best suited to analyzing extremely complex problems. A somewhat humorous side effect of this is that marketers of data visualization technology aggressively showcase the extremely complicated visualizations their systems produce. They may be under the impression that showing something complicated favors a high price. In some cases, complex displays containing fractals and heat maps and rotatable 3-D cubes are the only visualizations that can be produced by the systems. Here are some real examples:
How often do you need to answer a question using such a complex display? The unfortunate but common theme in the marketplace for visualization is “Use complex visualizations to solve complex problems.” As a result, many people leave their first encounters with visualization software thinking “This is for scientists and PhDs.” This is ironic because even scientists work with simple information displays most of the time.
Visualization is indeed a fantastic tool for analyzing complex questions. And indeed sometimes a complex question is best answered with an elaborate display. Once again, it isn’t a myth. It’s just a stereotype. But the stereotype is holding back information visualization from its full potential for several reasons:
- Many questions are simple.
- Simple questions are usually answered better and faster using visualization.
- Even complex questions are often best answered using simple visualizations.
Let’s look at an example.
Would you consider the following dataset to be complex?
Assume it has a few thousand rows, but just these two columns. Here’s the spreadsheet. Most companies wouldn’t consider this one of their complex databases. After all, it contains two columns of readily understandable values: Dates and dollars.
So it should be easy to answer a simple question without needing visualization, right? How about this question: A consultant claims that Saturday was the most important day of the week for sales during the fourth quarter of 2005 and that this effect was unique to that quarter and year. Is he correct?
Try answering the question by staring at the spreadsheet. Or try answering it using a Pivot Table or a chart wizard. You’ll spend a lot of time trying to coax out the answer. My guess is it will take you 15 minutes at least.
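For readers curious about what the question demands computationally, the sketch below shows one way to total sales by year, quarter and day of the week, and then pick the top day in each quarter. The sample rows are hypothetical and are not taken from the spreadsheet. The function name and the list-of-tuples layout are likewise assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical (date, sales in dollars) rows; NOT the article's data.
SAMPLE_ROWS = [
    (date(2005, 10, 1), 900.0),
    (date(2005, 10, 3), 400.0),
    (date(2005, 11, 12), 850.0),
    (date(2005, 12, 6), 300.0),
    (date(2005, 7, 8), 700.0),
    (date(2005, 7, 9), 200.0),
    (date(2004, 10, 6), 650.0),
    (date(2004, 10, 9), 500.0),
]


def top_day_by_quarter(rows):
    """Return {(year, quarter): name of the day with the highest total sales}."""
    totals = defaultdict(float)
    for day, amount in rows:
        quarter = (day.month - 1) // 3 + 1
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        totals[(day.year, quarter, DAY_NAMES[day.weekday()])] += amount

    best = {}
    for (year, quarter, name), total in totals.items():
        key = (year, quarter)
        if key not in best or total > best[key][1]:
            best[key] = (name, total)
    return {key: name for key, (name, _) in best.items()}


if __name__ == "__main__":
    for key, name in sorted(top_day_by_quarter(SAMPLE_ROWS).items()):
        print(key, name)
```

Checking the consultant’s claim then means comparing the entry for the fourth quarter of 2005 against every other quarter. That is the same comparison a single chart lays out at a glance.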
In contrast, you can answer the question in about 60 seconds using Tableau:
Was this view really unearthed in 1 minute? Try it using Tableau. Again, here’s the spreadsheet. And here’s the Tableau workbook.
A similar visualization won the B-Eye Network’s Best Visualization Contest held by visualization expert Stephen Few. The winning entry, submitted by Tableau, was based on a simple question: How does sales performance vary over time? A simple visualization produced answers to a simple question that would have been extremely time-consuming to discover without effective visualization.
The essence of useful visualization is producing simple displays for simple problems, and when necessary, simple displays for complex problems. Analytical excellence consists of answering questions with clarity, precision, and efficiency.