This quarter I set myself the goal to learn more about Tableau Prep, and a key part of that has been participating in the weekly #PreppinData challenges. Something I’ve noticed, and have been super intrigued about, is that some participants have been posting one-step solutions. That wasn’t surprising during beginner month, but now I’m seeing one-step flows covering reasonably complex multi-step data transformations. Cool!
This week I took a deeper dive into one of those one-step solutions to learn, and share, how they’re being done.
Two quick things first: They say a magician never reveals their secrets, so my apologies in advance to those Tableau Prep magicians who’d rather not see the “magic” shared. Also a hat tip to Hiroaki Morita, who’s week 7 solution I picked as the example to dive into. If I don’t do the techniques justice that’s on me, not Hiroaki!
Right, what do we mean by a “single step” solution? Basically, a single step between the input and output, like so:
For comparison here is my solution to week 7, which has two inputs and then four steps before the output:
The first thing you may notice is that there is only one source: a UNION to pull the two required data sets together. That might seem strange to you as the two tables of data are quite different. I’m going to add a quick “clean” step off the union to take a look at what it’s producing:
What we’re seeing here, boxed in red, is that the two tables are indeed unioned together, and so we get two chunks of data. The first chunk has the columns and rows from the first table of data (the couples and when their relationship started), and the second chunk has the columns and rows from the second table (the gift relevant to each year of a relationship – this challenge was about finding the right gift for each couple based on the length of their relationship).
This union approach seems to be a hallmark of single step solutions; get all of the data into one place/table so that we can operate over it in a single step. It does leave us with a challenge though as it’s not structured or related in the sorts of ways we’re used to, so let’s see how that is dealt with in the single clean step.
There are four sub-steps (or changes) in that clean step:
The first change is to calculate a consistent field across the two sets of data – in this case a “number of valentines days as a couple”. Where Year is NULL (no data) we’ll use the relationship start date [A] Where Year is not NULL we’ll remove the st, nd, rd, th letters from that Year field to just leave a number [B]. In a multi-step solution this would be the consistent field to join the two data sets together.
Because we have all the data in one data set we don’t need to join, instead we need to “look up” the gift from chunk B and plug it into chunk A (where it is missing). And we want to do that based on the consistent field calculated above. So in the screenshot above we can see that we have “number of valentines days as a couple” = 4 on the top row (for “The Loves”) but no gift (it is null). But we also have a row in chunk B with “number of days…” = 4 where we do have the gift (“Fruit/Flowers”).
The next change handles the look up. It uses a FIXED level of detail expression to say “get me the maximum gift from across the whole data set, for this number of valentines days”. Aggregations like MAX will ignore NULLs so we in essence look up the gift from chunk B:
This is potentially another hallmark of one-step solutions then: lookup the value you need from further down the combined data set, rather than using joins.
The fourth change is to filter down to just chunk A, chunk B was only there for the look up after all:
After that the solution simply removes the unecesary columns to be ready to produce the output. Clever, eh!
For me the key points of this one-step solution were:
- Get all of the data into one place/table so that we can operate over it in a single step.
- Lookup the value you need from further down the combined data set, rather than using joins.
- Filter out the data we only pulled in for the lookup.
I hope that you found this as intriguing as I did. And if you’re interested to see more one-step magic keep an eye on the #prep1stepclub X/Twitter hash tag!
UPDATE: Following a really good discussion with another member of the data community I thought I’d add a few notes about why and when you might use a one-step solution. Our conclusion was that one of the strengths of Tableau Prep is it’s clear, easy to understand and maintain, visual layout. Maintainability is a really important consideration, so you may never* use a one-step solution in production, favouring clarity and maintainability instead. However for your own professional development one-step solutions present a useful challenge. They introduce a constraint that forces you to think about problems differently, and in all likelihood use product features that you wouldn’t normally use. That gives you good practice. And afterall, why do we climb hills and mountains that we could otherwise go around?
* Although I say “never” I should point out that I haven’t performance tested common one-step solution patterns against their more natural counterparts. Consequently there may be some benefits (or indeed further drawbacks) that I’m not yet aware of.