Manky Sankeys and Data Dilemmas: Data Doctor Download Returns

Hello vizzers and chart makers! You got data problems? You’ve come to the right place. Let’s get vizzy.

Doc,

Hello from the UK! You know how it goes…my boss found the latest “hot chart” on Instagram and is pestering me to make it. It’s a Sankey and he thinks it’s the answer to every problem. I tried explaining what this chart was for and NOPE, it’s Sankey or bust.

Oh, and he wants this on a massively dense data set, because why not?

Help!

– Sankeys are manky

Manky Sankey,

I feel your pain. The flows and the curves of a Sankey are ever so delightful. If anything reminded me of a car from the 1960s, it might just be a Camaro…er, Sankey. Curves aside, Insta-chart shopping becomes a dangerous endeavor when the chart’s purpose isn’t understood. Sure, the data looks great in that case, because the chart met the goal of the task.

Goals are hard. It’s why January sucks so much. We make these lofty plans and by February, most of us have flopped back on the couch. Chart goals require breaking down the task, understanding the data we have, and what we’re hoping to represent. It’s a lot more fun to find something pretty on Instagram than deal with ambiguity. Beyond appeal, what does your boss think he’ll find with this chart? Ask what he’s hoping to understand. If possible, get a sense of how he’s getting this information and build out some (non-Sankey) options. And yes, you might still have to make the manky Sankey, but you can have him give you answers based on charts. My money says a different chart will prove effective, especially if all the densification leaves you dead on the track.

– Doc “all curves” data

Data Doctor,

I work at a place where we’ve amassed a mountain of data. Honestly, it’s probably its own galaxy at this point. I often get sent into this mess to make sense of it. You’d think this would be blissful – literally, I think we could answer the question of life, the universe, and everything, but frankly, I’d be hard-pressed to find anything more than overwhelm.

We don’t delete anything. We can go back decades to longer than I’ve been alive. Granted, it doesn’t mean the data is great or complete, but the point is we have it. And people seriously want to look back 40 years. I’d love a way to make this manageable and – dare I say it – useful. Any ideas?

– No towel or guide for this data adventure

Arthur Dent,

The Hitchhiker’s Guide to Data Management recommends the following:

  1. Don’t panic. This is so important that it’s on the cover.
  2. Bring your towel. In this case, I’ll spot you, but hold on to it.
  3. The answer is 42 – it’s the question that’s the problem.
  4. Ignore Vogon poetry at all costs.

You’re stuck deep on the Vogon ship. We need to get you out of there quick. For starters, you want data governance. Not just so they say no or force people to put definitions on the data, but so that your organization starts talking about what data to keep, what data to aggregate, and what data to archive or delete. Data governance teams are great for that.

From there, you want to build a space for analytical work where this data is ideally organized or curated. Call it a data lake, warehouse, or whatever paradigm you want – the goal is packaging, organizing, and grouping. Oh, and yes, culling. You want to clear some space without wrecking inhabited planets. Yes, this is expensive. Trying to do what you’re doing without organization is even more expensive long term, which you feel. Hopefully, this helps you to get others to feel it too.
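
If it helps to make the keep/aggregate/archive conversation concrete, here is a minimal sketch of the idea in Python. Everything in it is an assumption for illustration: the `triage` function, the two cutoffs, and the sample records are hypothetical, and your governance team would set the real rules.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cutoffs; a governance group would decide the real ones.
KEEP_DETAIL_YEARS = 5     # newer than this: keep row-level detail
KEEP_SUMMARY_YEARS = 40   # newer than this: keep yearly totals only


def triage(records, today):
    """Split (date, value) records into detail rows, yearly totals,
    and candidates for archiving or deletion."""
    detail, yearly, archive = [], defaultdict(float), []
    for day, value in records:
        age = today.year - day.year
        if age <= KEEP_DETAIL_YEARS:
            detail.append((day, value))
        elif age <= KEEP_SUMMARY_YEARS:
            yearly[day.year] += value
        else:
            archive.append((day, value))
    return detail, dict(yearly), archive


# Made-up sample data, purely for illustration.
records = [
    (date(2024, 3, 1), 10.0),
    (date(2001, 6, 15), 4.0),
    (date(2001, 9, 2), 6.0),
    (date(1975, 1, 20), 1.0),
]
detail, yearly, archive = triage(records, today=date(2025, 1, 1))
```

The point is less the code than the three buckets: once people agree on where the lines sit, the culling becomes a routine job rather than a debate.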

Doc “Ford Prefect” Data

Dear Data Doctor,

I’m a newer analyst, so maybe this is just a “time and experience” problem, but I really struggle with putting together dashboards. I have several books, but often it seems like several charts could work and I don’t know which one to choose and why. I put them together and I don’t know how to make it make sense to others, let alone myself. I’m constantly struggling, reworking my dashboards, while it seems effortless to others. Thoughts? Maybe I’m not cut out for this.

– New and Blue

Blue,

I promise it gets better. Some of this definitely goes back to time and experience. It’s why I’ve argued for ages that this is a practice profession, because it takes doing the work for some time to feel confident. The reality with dashboards and data viz is several charts do work, just as words often end up being interchangeable in writing. The art is in the aggregation of the chosen terms to make a memorable sentence, or in the charts working together to make a piece. One way to practice this is by starting with very basic charts (bars, lines, and scatterplots) to see how they work together. That removes some complexity with the chart selection and focuses more on the whole piece.

There’s a plethora of community projects that can be a powerful way to dissect meaningful compositions. Look at the work of others. What do you like about what’s been done? What happens if you alter a chart or the order of how they’ve presented information? These types of experiments help explicitly define what makes a viz “good” by intentionally finding out what breaks the effect.

Another exercise is playing with all the ways you can answer the question. It takes perfecting one viz out of the equation and focuses on building a vocabulary of expressions that all accomplish the same goal. From there, you may find certain things resonate.

Lastly, one of the fastest – but initially frustrating – ways to get better viz is to drop color completely out of the equation. Color complicates a ton in the beginning. Without color, you’re really forced to look at what shape is telling you about your data. You’d be surprised at what this teaches you.
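
A quick way to see why color can muddy things is to check what your palette looks like once the color is gone. The sketch below is one rough way to do that in Python, assuming the common Rec. 601 luma weights for converting RGB to gray. The `to_gray` and `close_pairs` helpers, the sample palette, and the `min_gap` threshold are all hypothetical choices for illustration.

```python
from itertools import combinations


def to_gray(hex_color):
    """Return a 0-255 gray level for a '#rrggbb' color,
    using Rec. 601 luma weights (an assumption; other weightings exist)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def close_pairs(palette, min_gap=30):
    """List color pairs whose gray levels differ by less than min_gap.
    min_gap is an arbitrary illustrative threshold, not a standard."""
    grays = {name: to_gray(code) for name, code in palette.items()}
    return [
        (a, b)
        for a, b in combinations(grays, 2)
        if abs(grays[a] - grays[b]) < min_gap
    ]


# Hypothetical three-color palette.
palette = {"red": "#d62728", "green": "#2ca02c", "blue": "#1f77b4"}
pairs = close_pairs(palette)
```

Pairs that come back from `close_pairs` are colors doing work that shape and position are not, which is a hint to look at the chart’s structure first.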

Doc “Old and Grey(scale)” Data

Dear Doc,

I work at a place where we have “messy numbers” for everything. Everything is an index, score, ratio, or composite of several different numbers. There’s a small share of people who demand these numbers be used in reporting and they have a lot of influence over what gets produced. Unfortunately, no one else understands these messy numbers, and we field a lot of emails, tickets and other requests clarifying the metrics. We already have a data governance site, we have tooltips explaining the metrics, but frankly, these numbers are that convoluted. Is there any way we can move the needle on this benchmark?

– Over-indexed and Scoring in the Red

Over-indexed,

This is a surprisingly common problem. Somewhere, a metric was made, someone declared it was meaningful, and then it had kittens. Lots of them. I blame Excel, mostly because that’s what I do.

Escaping these numbers is hard. They become encultured and ingrained, particularly to those of us who have lived with them too long. It’s jargon, but with numbers, and jargon draws the line between “us” (insiders) and “them” (outsiders to the business). Flipping this paradigm requires the right people to feel the pain of it: those who want the number should defend it. That won’t be enough, but it will be a start.

Bad behavior also goes a long way. Try a “plain language” version of the analysis. Put it with the complicated scorecard and let users toggle between the versions. It acts as a translation tool, but also helps plant the seed that maybe, just maybe, composite numbers aren’t as effective. Users will help fight this battle because they’ll show the visual version in meetings and find patterns with it. They will fight the battle better than you can and then you can move towards the green on this metric.

Doc “Clean Viz” Data

That’s all for this round!