How the EXCEL do I shape that?! Data Grains for (Wholesome) Analysis

When I started working with data, at some point, people started asking about grain. Now, I eat a variety of whole grains, I know wood has a grain you’re either supposed to cut along or not cut along, but I had no idea about this data grain people mentioned. It sure didn’t line up with any grain I knew.

Grains like rice, oatmeal, and buckwheat shown. For some reason, there's also black eyed peas.
Grains (and legumes) I know. Photo by FOODISM360 on Unsplash

You see, while Excel may input data with a grain, it doesn’t care that much about it. You can stitch together whatever Frankenstein monster you want with data, because you progressively shape the data with the analysis. Need data aggregated a certain way? Make a pivot table, GETPIVOTDATA, and go. Tools like SPSS favor columns for analysis with your row being the single comparison point. If you’re analyzing employees, you can have a million columns, all at the employee level. BOOM, done.

At some point, though, you start asking harder questions. You go from a hardcore data crossfitter to an Olympic-level athlete. Sure, you feel stacked and unrivaled, but data heroics can only scale so far. Enter Tableau, a tool designed for visualization and exploration. We get in, we analyze, and it’s brilliant. Other times, we pull in a data set and scratch our heads: why does this seem so hard?

The secret often lies with those pesky grains we can’t get enough of.

Whole Grain Goodness

We’ll look at 2 common scenarios. For once, we’ll start with good news. Here’s timeclock data where we have 3 columns: employee, type of punch, and timestamp. It’s the type of tall skinny data source Tableau absolutely loves. I even get real timestamps, which is near mouthwatering.

Leia, Fin, Yoda, Luke, and Rey all work for me. Punches show In, Lunch out, Lunch in, and Out with timestamps that include both date and time.

Here comes the grain part!

"Timeclock" written out with a blue line coming from it. On the blue line is "1 row is a badge swipe." Above this line are: person, day, time, and type.

One badge swipe (or punch if you’re old-school like me) is a row in this data set. Ideally, in a day, a person has 4 rows: in, lunch out, lunch in, and out with an exact time of these stamps. My lowest analytical grain is the swipe. I can aggregate up to the person, day, time, or type.

This data works because the lowest level I want to analyze is the grain.

Let’s go to the bad news: human curated data that’s not data, but a presentation table.

Schedule for my 5 employees. I have Monday through Sunday across the top and schedules inconsistently added like so: "9:00 AM to 5 PM" and other variations.

This classic schedule type is terrible as-is for source data. You’ll want to reshape this in Prep, but here’s why

"Schedule" written out with a blue line coming from it. On the blue line is "1 row is a person." Above this line is person and below it are day, start, and end times.

All the good stuff – half of it being things I’d need to join or use for calculations – are buried below the grain. If I want insight from this, I’m going to spend a ton of time digging, fighting, and doing all kinds of gymnastics.

This data fails because all my insight is buried and hard to unearth.

We’ll want to reshape this data (and can in Prep as shown here). But first, we have to decide how.

Picking Grains

In this analysis, what I’m comparing is schedule adherence. I don’t schedule lunches (gasp!), only the day’s work. I expect people will take a lunch, but frankly, I’m less concerned when, as long as my customers are being served. This is represented quietly in the data I have. I’m also generating that schedule manually because my shop is super small. I only have 5 employees, all conveniently recruited from Star Wars.

The lowest level of analysis I can get from my schedule is the shift.

Now, I can also make my life easier, since I’m reshaping the data anyway. My time stamp data allows me to tally in minutes. I’ll want those minutes to get a sense of how well people are adhering to the schedule. (Spoiler alert: they aren’t.)

Once it’s reshaped, we’ll have the schedule data with employee, day of the week, in and out times, and can even add a scheduled minutes column. Our data grain for schedule moves from person to a scheduled shift per person.

"Schedule" written out with a blue line coming from it. On the blue line is "1 row is a person." Above this line are person, day, start, end, and total minutes.

Now that my schedule is done, I can move to time punch data. But wait, wasn’t it clean?!

Some whole grains pair well together. Others, depending on the meal, don’t: one ends up soggy while the other is hard as rocks (I’m looking at you buckwheat). While my schedule is at the shift level, my timeclock data grain is still lower. Using relationships, I could relate these, do a few calculations, and do my analysis. It’s a viable option using the data model that I’ll explore in-depth at another time.

Comparison of modified schedule and timeclock data. The grains are shown at slightly different levels. Person and day are linked on both sides, but a large "no" symbol shows start time, end time, and minutes on the schedule side don't line up with type and time on the punch side.

For this analysis, I’m going to bring the time clock grain up a level to simplify my data source and to do a join in Prep. That way, in Tableau Desktop, the data comes in as one single unified source. It’s more limiting, but it’s a bit more straightforward for the analysis I plan to do.

Comparison of modified schedule and timeclock data. The grains are shown in alignment at the shift level with all key fields matching. Additional data like minutes, lunches are unmatched, but we have parity. Hooray!

Now we’re cooking!

Doing this reshaping on both sides allows the tables share the same grain, essential for physically joining the data. They enrich each other without creating data gremlins, or what the data shaping kids call Cartesian effects. Here’s the beauty: this is one option of several I could do.

Now, to finish this oatmeal I started…

Beautiful bowl of oatmeal with random bits strewn around it. It's pretty in the photo but I wouldn't be happy if someone did this in my house.
Photo by Margarita Zueva on Unsplash