Relational-ish: JSON, Tableau, and the Power of Sets
JSON is incredible if you want a lightweight way to share the data. It allows you to track loads of data, often without worrying about relationships. Wanna download it? Sure!
Wanna analyse it? Ouch.
If you’ve ever pulled native JSON into Tableau, you learn very quickly about the perils of bringing a relational paradigm to a highly non-relational platform. No joins or unions here, love – just a whole lot of hierarchy and schemas.
Yes, Tableau can read JSON natively. It looks like this.
In the cute indents and brackets formation, it doesn’t look so bad. But in Tableau, we quickly realize there’s 226 schemas to level with. Not fun. Ideally, we give Tableau about 10 schemas to play with. So, you can spend your time optimizing JSON or….
Find a Friend
Tools like Snowflake and Alteryx can parse JSON into a semi-relational format. Unless you hit a true wonder, semi-relational will likely be the best you can achieve.
You see, the paradigm that makes JSON (freeflow!) makes it hard to relate and tidy it back up without either massive data loss, or a whole lot of nulls. All collection and no relations makes JSON a null boy.
If you roll this out in Snowflake, you can then connect Tableau directly to that database. Alteryx has a few fun tricks where you can schedule it to pop in the extract or roll it to a database platform of your choice. Since I did this for a customer, I’m going to continue to pretend you just connected to Snowflake. If you want to see the Snowflake piece demonstrated by the delightful Patrick McCormick, here’s the video.
I’m going to Julia Child past this part of baking. The details are in the video. For now, we’ll assume we’ve made a semi-relational source that gives me identifiers and organizes my data with some of the schemas called out in a column (ontology in this example, with category being the individual interesting item) all in one nice tidy data source.
Spicing it up in Tableau
If you’ve followed the Tableau-sphere, you know set and parameter actions are magic. These new feature let you push the boundaries of making Tableau act as the foundation for world-building. (If not, go visit Lindsey Poulter, IronViz contestant and set/parameter action wizard. Oh, just go there anyway for bliss.)
You’re going to need at least version 2018.3 for this to work. Here’s what you do in Tableau (note, this also works in non-JSON derived data as well).
All too often, we get handed the following riddle: did the problem get resolved? Oh, we analysts LOVE this question! Of course, we’ll find out.
Except when the data is in the exact same column…oh yes, we’ve all faced this.
In this case, my data is created from categories. In short, I get a bunch of flags or timestamps showing events in a line. This is fantastic to make a timeline, but challenging if I want to track results. We can see due to shape (duplicate end actions) that to make relationships risks either data loss (I pick one) or Cartesian joins (I choose everything and make data gremlins).
So how do we fix this problem? We’re going to dance and get hacky! Grab your paper and pens, here’s what you’ll need:
- A single, semi-tidy data source. Sets work within a single data source while parameters are technically their own data source (dig into an TWB which is really XML and you’ll see this).
- Some type of unique identifier for your group of items. In my case, it’ll be calls (mediaID). If you’re using Trello data, it might be a card or for IMDB, a movie.
- Willingness to accept this is fuzzy at best. You’re answering “for this identifier, is this true?”
- A dashboard where you use charts to explore. No, you’re not going to throw this complicated mess into 1 chart and get answers, but you can in a nice exploratory dashboard. Single complex charts will still require Cartesians. Sorry!
Stairstep
This is a 4-part dance, and the first thing we’re going to do is step it out.
Put each desired part into its own column. In my data above, I have Ontology and Category, which helps it parse out nicely. If you have to, use grouping in Tableau to get there. Ideally, the same fields are not in multiple columns. So, agent quality gets its own column, and call drivers is in a separate column without overlap between the 2.
Set
Here’s where the fun starts to begin. Make sets for each of your columns. They’ll be a mess and you’ll see nulls. Choose smart, descriptive names to avoid enemies later.
Pick one, any one, of your members, set it, and go. It doesn’t matter what you choose. They’re going to change often.
Sets by themselves are not enough. If you only want single-select and always want 1 item filtered, yes, you can also use parameters. Sets allow you to select multiple or to easily have nothing selected.
If you’re going to have several of these across dashboards, you can also put your sets in folders. If I were smarter, I’d GIF this and put it on Twitters. I’m not, so your win, kids.
Level of Detail (LOD)
LOD calculations start to put the magic to work. For each unique identifier, you’re going to see if that condition is true. Here’s how:
{FIXED [UniqueIdentifier]: MAX(IF [Set] THEN 1 ELSE 0 END)}
This calculation gives you a nice Boolean answer at the grain of your identifier. Name this nicely as well. If you want to be real friendly, comment it, put this series of calculations in its own folder, and even change the default hover description to something useful.
We’ll come back to these.
Filter
Here’s where it gets tricky and coffee becomes quite handy. You have 2 parts: (1) updating the set and (2) making sure the filter affects all the other parts. If you haven’t named and tidied your sheets yet, this is a perfect opportunity to do so.
In the dashboard, make your set action. Choose select, the one sheet with the fields that made the set, and ‘Add all values to set’. Remember this worksheet name – you’ll want to avoid it later.
This makes sure that your chart updates the set. The other worksheets won’t filter…yet.
Pick a sheet that’s a target of the filter (not the one driving the set). Choose your calculated Boolean field and use it to filter this sheet. Apply it to all sheets you want this to filter (but not the source sheet).
Rinse and repeat for all items you want to filter. Be sure to avoid recursive filtering.
This tactic allows your dashboard to emulate a relational source without all the relations. It helps users see data that is related, albeit in a fuzzy manner. It works with JSON and it can work with other data problems.
And yes, you can make a dance to it.
Now, if you’ll excuse me, I’m going to step it out.
[…] Schema built on JSON. […]