The Design of Everyday Data: Data Shaping in Tableau Prep for Amateurs
If I’m known for anything, it’s probably for how often I complain about badly shaped data. Donald Norman talks about how the design of things cue us to how they work. The design of data is very similar to these ideas.
Take for example how we, as humans, like to design spreadsheets:
Even without tidy formatting, I can read this and know who is working when. I worked a lot of places with schedules made like this in some fashion. Each row is dedicated to a real-live human being and the columns are days. As a human, I can find the row that matters to me, read across and go, and make sure I’m working with people that will give me a lunch break on time.
(Side note: normally, there’d be merged cells in at least 3 spots and some color coding. For once, I’m pretending to be nice. Come back tomorrow if you want me to be meaner.)
When we work with systems, we find they make data a bit more like this:
Humans rarely make data like this for direct use. Trying to figure out who punched in or out when takes a bit more effort by eyeball, even when it’s sorted by day, employee, and then time. This format is (usually) ideal for Tableau. We often get data like this in our jobs. Once piece is scattered somewhere (usually the depths of some shared drive) in Excel and 4 other pieces live in 5 other databases. 🙂
So, how do we make this tidy? Let’s try Tableau Prep.
(Those are all my little sponges to clean the data. It’s like Fantasia, but less scary.)
If this looks familiar to Tableau Desktop, it should. You connect to data in the same way. You have the side panel to help you out or take you on an extended detour (SHINY!!!). You also have any existing work in the middle. You can see I’ve been busy…or distracted, take your pick.
When we get the point of pulling data out to the canvas, we can see Tableau knows us me well and provides clues. Donald Norman, author of Design of Everyday Things, would approve. For my schedule, I’m going to make it a bit more machine readable. My eyes like it, but Tableau doesn’t. First things first, pivot. We’re familiar with this idea from Tableau and we can rename the fields right then and there, just like we can in Tableau Desktop.
At first glance, there’s a lot going on here. So, let’s break down the interface with extremely technical jargon that’s Tableau Toddler approved.
The canvas idea is follows Tableau to a tee, as does the data piece. The canvas is where I toss my big steps as I flow through the piece.
What’s new to us is the Pick-It and Fix-It pieces.
- Pick-It is our new card interface for our steps. This is where you pick various things about the step, such as pivoting, joining, and tracking calculations.
- Fix-It (pronounced ‘fick-it’ in my house) is where – hold your hat for this one – you can interact directly with the data and change it. Some of the fancy kids call this the profile pane. We’ll dig more into this shortly after I find my socks…
Meanwhile, back at the canvas, I have a few main options I care about.
- Steps are usually where I clean the data using calculations, grouping, splitting, and deleting. I’ll do most of this work in the fix it area directly.
- Aggregates are where I may roll the data up to a higher level or grain. If I’m too lazy to dismiss a bunch of fields, I’ll also use aggregate to pick a few columns, rather than dismiss a lot of them.
- Pivot, in the Tableau sense, flattens data that came pre-shaped in a wide way I don’t like. Most of the time, these are Excel sources that are aggregations of a nice tidy database in some abyss I’ll never be able to access. Data warehouse people will argue about ‘pivot’ in a Tableau sense, but here, you’re kicking columns to rows. Rows make us happy…usually.
- Join adds more columns based on a shared field. We’ll look at this more.
- Union adds more rows. Consider it like append. Just make sure you have columns that match or you’ll be very good friends with ‘null’. Null gets dull after awhile.
- Outputs are how we kick data out to Tableau (TDE or Hyper) or CSV.
As we start pulling out data, Tableau Prep makes nice tracks. It’s inspired from subway maps, and brings this experience to life in the form of tidy, colored lines and pills. It keeps a nice log of all the changes and lets me interact directly with the data.
I didn’t even realize how many variations I had in my scheduled hours until I pulled it into Prep. As a human, I read hyphens and to as the exact same. Machines don’t. I can edit these manually or make a calculation. Since this is a one-time job, I’ve edited it manually. I can also see what rows are affected and who. Apparently, I only make this mistake when scheduling Rey.
As I tidy, my changes are logged at the top of the step, as well as below. I can hover and get details at a glance or click. The big actions I take with my manual schedule are splitting to get a start and end, creating time stamps to get a duration in minutes, and consolidating down my fields. The end result of this stage looks like this:
You can see where I’ve renamed things, filtered, grouped, and created calculations. I can also remove calculations that I’ve made and used once I no longer need them (Start and End fields that were datetime, that I only needed to get minutes). Prune away, my friends, as each change is made in real-time and you don’t have to make extra steps to prevent issues. You can also name steps, which is a good way to make great friends later.
I can now see I have 3 main types of shifts, they run either 480 minutes or 540, and that Leia and Luke are my sole full-timers. Now to bring in the rest of the data…
In this second piece, I’ve pulled out my data, making a column for each time stamp. I also aggregate up to every worked day, date, and employee, thus doing the very thing I usually complain about. Go figure…we all have our days.
This format (1 row per employee per day) lets me do the very specific analysis I want to do – shift analysis! So, I can join on day and employee and get pretty close to ready to analysis in Tableau.
One thing I really like is Tableau Prep’s join experience. I’d love to see some of this come into Desktop.
I can see I have a total of 20 rows on one side and 19 on the other, of which 16 match. The remain 4 and 3 are called out in red and I can see where I’ll have gaps. For my purposes – making sure people who were schedule worked – this is exactly what I want to call out and find.
A little bit more shaping, such as creating employee and day fields with no nulls, and this thing is ready to plug into Tableau Desktop. Want to see more? Steal it!
So, how the EXCEL do you start with Prep?
- In the floating vs tiled debated, Prep is totally tiled. Keep this in mind if you float.
- Think about what you want to do:
- Do you need more rows? Pivot.
- Do you need less rows? Clean, calc out the columns, then aggregate.
- Do you need more columns? Add a step and calculate to your heart’s content. Also, edit the data directly if you want to group.
- Do you want data from two different places?
- Join if you want more columns.
- Union if you want more rows and the field names align. If not, you’ll find null gets dull.
- Unsure? Start a new branch or step.
- Experiment as needed. Delete freely.
- Keep in mind, all the normal rules apply. Data is cute and furry when it’s controlled. Pour water on it, feed it at night, cartesian your join, and you end up with data gremlins. Data gremlins are rarely cute. Data that multiplies by a factor of X for no clear reason is usually bad news.
- Be kind and note what 1 row is. Also, you can name every part of your steps. Clean 2 is a lot like ‘Sheet 2’.
- When in doubt, export out (to Tableau) at any time. You can even pop out a sample to Tableau at every step if you really want to.
And finally, things I will likely tell you it does, when it doesn’t (yet, as of 2018.1):
- Run automagically on server – it’s coming at some point, I hear.
- Allow you to copy/move (fields, steps, branches, etc) – If I were smart, I’d make this a suggestion on the forum. Maybe you will.
- Allow you to edit the colors – maybe it’s because I’m moderately biased against red and orange…
- Connect to / export to (pick something)
You can add to this list later when I tell you wrong. I will tell you wrong a lot.
Final piece – if you, like me, hate shaping your data, this will feel a bit like home. I’m still waiting for the day when autopilot comes to data shaping and tidies itself up. This is definitely a step in the right direction.
Suggestions on the Tableau Ideas forum that relate to your points:
Tableau Prep copy and paste input (comments say that it would be better to be able to copy paste for all steps): https://community.tableau.com/ideas/8720
Import or copy calculated fields from Tableau Desktop (yeah, that one is mine, but I’m sure I’m not the only one in this case):
https://community.tableau.com/ideas/8746