How the EXCEL do I shape that?! Data Grains for (Wholesome) Analysis
When I started working with data, at some point, people started asking about grain. Now, I eat a variety of whole grains, I know wood has a grain you’re either supposed to cut along or not cut along, but I had no idea about this data grain people mentioned. It sure didn’t line up with any grain I knew.
You see, while Excel may input data with a grain, it doesn’t care that much about it. You can stitch together whatever Frankenstein monster you want with data, because you progressively shape the data with the analysis. Need data aggregated a certain way? Make a pivot table, GETPIVOTDATA, and go. Tools like SPSS favor columns for analysis with your row being the single comparison point. If you’re analyzing employees, you can have a million columns, all at the employee level. BOOM, done.
At some point, though, you start asking harder questions. You go from a hardcore data crossfitter to an Olympic-level athlete. Sure, you feel stacked and unrivaled, but data heroics can only scale so far. Enter Tableau, a tool designed for visualization and exploration. We get in, we analyze, and it’s brilliant. Other times, we pull in a data set and scratch our heads: why does this seem so hard?
The secret often lies with those pesky grains we can’t get enough of.
Whole Grain Goodness
We’ll look at 2 common scenarios. For once, we’ll start with good news. Here’s timeclock data where we have 3 columns: employee, type of punch, and timestamp. It’s the type of tall skinny data source Tableau absolutely loves. I even get real timestamps, which is near mouthwatering.
Here comes the grain part!
One badge swipe (or punch if you’re old-school like me) is a row in this data set. Ideally, in a day, a person has 4 rows: in, lunch out, lunch in, and out with an exact time of these stamps. My lowest analytical grain is the swipe. I can aggregate up to the person, day, time, or type.
Let’s go to the bad news: human curated data that’s not data, but a presentation table.
This classic schedule type is terrible as-is for source data. You’ll want to reshape this in Prep, but here’s why…
All the good stuff – half of it being things I’d need to join or use for calculations – are buried below the grain. If I want insight from this, I’m going to spend a ton of time digging, fighting, and doing all kinds of gymnastics.
We’ll want to reshape this data (and can in Prep as shown here). But first, we have to decide how.
Picking Grains
In this analysis, what I’m comparing is schedule adherence. I don’t schedule lunches (gasp!), only the day’s work. I expect people will take a lunch, but frankly, I’m less concerned when, as long as my customers are being served. This is represented quietly in the data I have. I’m also generating that schedule manually because my shop is super small. I only have 5 employees, all conveniently recruited from Star Wars.
Now, I can also make my life easier, since I’m reshaping the data anyway. My time stamp data allows me to tally in minutes. I’ll want those minutes to get a sense of how well people are adhering to the schedule. (Spoiler alert: they aren’t.)
Once it’s reshaped, we’ll have the schedule data with employee, day of the week, in and out times, and can even add a scheduled minutes column. Our data grain for schedule moves from person to a scheduled shift per person.
Now that my schedule is done, I can move to time punch data. But wait, wasn’t it clean?!
Some whole grains pair well together. Others, depending on the meal, don’t: one ends up soggy while the other is hard as rocks (I’m looking at you buckwheat). While my schedule is at the shift level, my timeclock data grain is still lower. Using relationships, I could relate these, do a few calculations, and do my analysis. It’s a viable option using the data model that I’ll explore in-depth at another time.
For this analysis, I’m going to bring the time clock grain up a level to simplify my data source and to do a join in Prep. That way, in Tableau Desktop, the data comes in as one single unified source. It’s more limiting, but it’s a bit more straightforward for the analysis I plan to do.
Now we’re cooking!
Doing this reshaping on both sides allows the tables share the same grain, essential for physically joining the data. They enrich each other without creating data gremlins, or what the data shaping kids call Cartesian effects. Here’s the beauty: this is one option of several I could do.
Now, to finish this oatmeal I started…