Tableau Prep and #PreppinData 2024 week 6

The #PreppinData 2024 week 6 challenge was to find the latest salary per staff member and summarise their tax position given UK income tax bands.

We’re now into intermediary level challenges and so there are less prescriptive steps, and more options to solve the problem your way. For me the problem had two key parts: (1) get the latest row per staff member; and (2) the various calculations for salary and tax paid based on tax bands.

For part one I introduced an aggregage step to get the maximum row number per staff member, and then joined that back onto the input to return only the detals for that last row. I wondered if an alternative might have been to use a “level of detail” expression, with a filter, in a single step.

For part two I included all of the calculations in one “clean” step. I did hardcode the tax bands, and probably could have used a mapping table to allow for reuse in future years. I also hardcoded the sum of month [1] to [12], but perhaps could have found a way to allow for less months to be in the data set, in case the flow needed to be run mid-year.

Tableau Prep and #PreppinData 2024 week 5

The final week of beginner month for #PreppinData involved a bit more complexity around joins, calculations and outputs. On top of Tableau’s getting started tutorial #PreppinData has been a great way to get into Tableau Prep. I’ve invested about 12-15 hours of time and feel like I’ve got a good initial grasp of the product.

The challenge walk through provided less info on the “how”, which in some ways was quite nice as I felt more license to solve the problem my way. On the other hand I wonder if I should have made my flow less complex instead of aiming for one data set that could then be filtered down to the different outputs.

Tableau Prep and #PreppinData 2024 week 4

#PreppinData 2024 week covered using join types, in this case to understand which seats aren’t chosen given a seating plan and booking data.

I had a preconceived idea of the solution here, as I’m used to using LEFT OUTER JOIN in SQL and then having a WHERE clause that returns rows where the result from the right hand table IS NULL. So I was expecting to have a join and then a filter in my flow. However, Tableau Prep has some additional join types that let you return entries where there are only values in the left table, only values in the right table, or even a “not inner” join for entries only in the left or right but not in both. I gave the left only option a go and it did the job nicely! Great how you can click on the segment of the venn diagram representation of the join to select the type too.

Tableau Prep and #PreppinData 2024 week 3

Two lots of Tableau Prep practice this week. A forum question (see end of post) and #PreppinData 2024 week 3. The challenge for #PreppinData was to join targets from a spreadsheet, with a sheet per quarter, to our previous sales figures. And then to calculate difference from target. Similar union and clean up steps to previous challenges to get to the point where there are two data sets to join, and where we have consistent fields in both (first letter of class, and a month number). Then the join is pretty straightforward:

The forum question involved duplicating a set of rows – once for each value in a comma separated list in one of the columns. And then filtering out any cases where a value in another column appeared in the list. What I found interesting about Tableau Prep in this case was that I can specify a wildcard search for the pivot (B below), but for the initial split whilst I can select “all” it does still hardcode the number of columns split out (A below). So one of the tasks would robustly handle the introduction of more values in the comma separated list, but the other task would not … I think. The workarounds I came across seemed to be to work out how many values you could have in the string and specify enough splits to handle that number. I wonder if that could be improved…

Tableau Prep and #PreppinData 2024 week 2

Week two of getting to grips with Tableau Prep and I decided to countinue with #PreppingData. The team of Carl Allchin, Jenny Martin and Tom Prowse do a great job of picking challenges that gradually introduce you to functionality. This week covered unions, aggregation and reshaping data using pivots. I was particularly interested in pivots, as that’s a frequent challenge people have on the Tableau forums where we talk about data prep being a good option.

A more complex flow this time, and it probably took around 45-60 minutes. Below is my finished flow. And following that I share some more thoughts on this weeks use of Tableau Prep in the form of a quick “I like, I wish, I wonder” retro. Spoiler alert, there was a wow moment for me this week!

Week 2 solution:

I like:

I liked that I could copy and paste the first aggregation step and then change the type of aggregation and the description on the copy.
And there was a moment of genuine delight when I realised I could copy the pivot step and paste against a different input (with the same pattern). I loved that!

I wish:

The fields didn’t reorder between steps. This is probably my OCD talking though!
I could define a sort order for the output. It would make it easier to sense check the output (granted it is easy to open in Excel and sort though).

I wonder:

If I needed to aggregate before pivoting, as the pivot step allowed me to pick an aggregation?
if it would be valuable to have a quick create option for pivot “rows to columns”? It took me a while to spot the option to change the pivot direction (screenshot below), when my brain was looking for an “unpivot” option instead.

Getting started with Tableau Prep

I’ve been meaning to explore Tableau Prep for a while and finally took it for a test drive.

Many data professionals have experienced the need to prepare and clean data prior to analysis in tools like Tableau Desktop. Classic examples are: splitting data out of a single combined field; un-pivoting when each year of a measure is in a separate column; or maybe combining sales data from multiple differently formatted sources.

For me data preparation has usually been in our SQL Server / Data Warehouse environment, using SQL or enterprise integration and ETL tools like SSIS, WhereScape RED, or ADF & Databricks. For others it’s likely to have involved tidy up in Excel, a tool like EasyMorph, or even custom development in Python. But for analysts a number of these tools aren’t always accessible.

Tableau Prep is a tool that makes it easier and faster for analysts to combine, shape, and clean data for analysis within tools like Tableau. It provides a visual low-code environment for repeatable “flows” to get your data ship-shape. I’m often answering Tableau community forum questions with the comment that “yes you can do this in Desktop, but it would be easier in a data preparation stage”. Hence I’ve been keen to give it a go myself.

How did I get started? Three easy steps!

1. Downloaded Tableau Prep. You can download the free trial, then if you have a Creator license you can enter your license key. Chances are if you’re a Tableau Desktop user you have a Creator license.

2. Followed the Get Started with Tableau Prep Builder tutorial on the Tableau help site. I found that this took 1-2 hours, and was a nicely paced introduction to the UI and basic features. I like how it encouraged you to “give something a go” – there is always an undo button!

3. Completed the #PreppinData 2024 week 1 challenge. This took about 30 minutes as most of what I needed had been covered above!

PreppinData is one of the amazing projects that is run by, and for, Tableau community members. If you’ve participated in Makeover Monday, Workout Wednesday, Back to Viz Basics, or one of the other community projects, then the approach will be familiar to you. Each week the organisers post a challenge to social media and their site. The challenge comes with a currated input data set, some requirements for you to try in Tableau Prep (or other data prep tool of your choice), and the expected output. You can give the challenge a go, share you solution and see what other members of the community come up with. A great way to learn and practice!

Week 1 challenge – tidy and split some fake airline loyalty card data into two files ready for analysis.

My solution:

- One clean step to: split out the combined field using two custom splits (Flight Details on //, From-To on -); remove now unecesary fields; rename and fix up data types; and replace 1/0 with Yes/No.

- Two clean steps to produce the two filtered lists (card holders and non-card holders).

- Two corresponding output steps to output the required CSV files.

Summary screenshot below. Impressions so far … loving the tool! So much neater and repeatable than a series of Excel sheets, and seemed more accessible for analysts than other tools I’ve used.

LOD equivilant of LOOKUP (part 2)

In a previous post I walked through a LOD (level of detail) based alternative to the LOOKUP table calculation. In that example I was looking at sales last month, and in a recent Tableau Forums question someone was asking if it could be extended to determine average sales from the previous three months.

The answer was “yes” and you can check the whole thread on the Forums, including other options. The main difference from my previous post is that we needed to extend the group numbers from two to four, so that any given month falls into 4 “higher level groups” that we can target with a LOD. We then chose which LOD to use based on the index (_i).

Top M within each top N of categories

I recently helped with a Tableau community forums question where the user needed to:

filter to the top N categories based on a measure,
using a dense rank (so that categories with the same value had the same rank),
but list no more than M categories within each rank (based on another criteria)

This screenshot illustrates the requirement where we want to see the top 5 categories based on sales, where categories with the same sales have the same rank. But we only want to see the top 2 within each rank, based on lowest cost:

For example Vegatable, Fruit and Cereal are all ranked joint top on sales, however only Vegetable and Fruit should be shown as they are the lowest 2 based on cost.

My first attempt to help was a complex soluton involving PREVIOUS_VALUE (I’m a sucker for this function!) but on reflection I ended up providing an option based on more usual table calculations (RANK_…) and a level of detail expresssion. I’ll go through this option, and why it was necesary, over the rest of this blog post. And I might even share the PREVIOUS_VALUE one in a separate post for those interested!

First up the data set I used to test the solution looked like this:

You’ll note that there is some work to do to aggregate up to the category level.

Filtering to the top-N categories on sales is relatively straightforward. I use RANK_DENSE because I want categories with the same sales to be ranked the same, and to have no gaps in the ranking.

Rank – Sales =

RANK_DENSE(SUM([Sales]),'desc')

I can then drag that onto filters and filter to up to 5:

Things get a bit more complicated next when I want the lowest two on cost within each ranking. Your first thought might be to create another ranking based on cost that is partitioned by (restarts for every) ranking on sales. However when you get into those settings within the table calculation you’ll notice that you can’t use the existing table calculation. “Rank – Sales” is not in the list!

In fact no measures will show up in that partitioning list, yet we need to partition by the sum of sales or the ranking based on sum of sales. This is where the level of detail expression comes in. We can create a sum of sales that can be turned into a dimension.

Category Sales =

{FIXED [Category]: SUM([Sales])}

NB: Once you’ve created this, right click it and convert to dimension. Also if you need other filters on your view you’ll need to remember that the FIXED level of detail expression is busting out of that level of detail (filters) on the view. To get around that you can either use context filters (which apply prior to the level of detail expression being evaluated), or you can add the other dimensions you’re filtering on into the list of dimensions you’re fixing by (before the colon in the calc above).

Now that we have a dimension of category sales we can create another ranking table calculation that is partitioned by that, so that we get a ranking on cost that restarts for each group of categories based on sales.

Rank – Cost =

// Results are computed along Cateogry Sales, Category.
// The calculation restarts at zero for every Category Sales
RANK_UNIQUE(SUM([Cost]),'asc')

And from here we can construct a view that lets us filter to the top N categories on sales, and within each ranking the top M based on cost:

Job done! Do you have a better or simpler approach? If so I’d love to hear about it!

Tableau tile and hex maps

If you’ve ever built a filled map in Tableau where some of the areas are too small to stand out or label effectively, then you’ve probably come across hex maps as a solution. A hex map means that each area is the same size, regardless of the actual relative sizes.

For example, note how the Nelson (NSN) region in this filled map of New Zealand is relatively small compared to other regions, and hence hard to see and label:

Whereas in this hex map version Nelson (NSN) is the same size as the other regions:

In extreme cases the relative size and colour of the border compared to the area can even skew people’s perception of the colour. Also if you want to go a step further and show how data has changed over time (or some other dimension) per area, then you may have heard of tile maps as an option:

But how do you build these in Tableau?

Tile Map Template

I tend to start with a tile map design as it’s easier to visualise in a grid! I find the map I want to reproduce and then plot the areas on a grid in Excel:

Note: Because you’re scaling up small areas and reducing to tiles it won’t be possible to faithfully reproduce the geography. What you’re aiming for is a rough approximation that still looks like the overall geographical area you’re representing, whilst allowing you to show each area at the same size. For example, in New Zealand the West Coast (13 In the grid above) should stretch up further, and I compromise on whether Tasman (10 in the grid above) is south or west of Nelson (11).

Once I’m happy with my grid I create a sheet that has the list of areas (regions in this example) along with the coordinates from the grid:

Using this data I can plot the tiles in Tableau. Dragging X to Columns and Y to rows, I get a mark/area per region. And it is laid out as per the design in Excel:

Tile Map Example

Linking the tile map data set to another data set allows me to produce a set of small multiple charts arranged as per the tile map design. Let’s take a look!

First up define the data source:

Here I’m relating a data set that has NZ population by region and year, to the list of regions and their coordinates. I’ve ended up using a relationship calculation as part of the join on region name to replace “Wanganui” with “Whanganui”, as the two data sets use different spellings.

From here building the view is like the template, dragging X onto Columns and Y onto rows. But this time I also drag Year onto Columns and the measure I want to plot onto Rows (in this case the % change in population since an index year I’ve chosen). This gives me a separate chart of % change by year for each region, in its grid location (Southland in 1, 8 for example):

Note: in this example I’ve also created a fake measure typed straight onto Rows – AVG(0.5) – on a dual synchronized axis. I’m using this with a blank custom shape as a mark that I can attach a region code label to for the first year on each tile. There is also a reference band from -0.5 to the window max of my fake measure (-50% to +50%) on this axis to provide a light grey background where we have data.

When I hide each header, apply some formatting (blue background, grid lines turned off), and include on a dashboard the result is:

Hex Map Template & Example

A hex map is on one hand simpler, and on the other more complex. It’s simpler – in my opinion – because it doesn’t suit including a chart within each hexagon (unlike the tile map). But it’s more complex in that the hexagons aren’t on a simple grid. Let’s take Nelson (NSN), Marlborough (MBH) and Tasman (TAS) in the example below:

MBH is to the east of NSN and so is +1 across on the X axis.

And in my grid system it’s also +1 down on the Y axis, but it actually starts halfway down NSN and a little indented, so that the hexagons tessellate.

TAS is completely below NSN and so is +2 down in my grid system.

I work this out by trial and error by adapting the tile map coordinates to suit a hex map. Ending up with a list of regions with X and Y coordinates again:

Creating the view is also very similar – dragging X to Columns and Y to rows, hiding headers for the X and Y pills, formatting the colour and removing things like grid lines. But this time I use a custom hexagon shape. For the example below I’ve also used a data source similar to the tile map example above where I relate the hexmap coordinates Excel sheet to the regional data I want to show.

Note: in the screenshot above I’ve sized my window and adjusted the shape size in Tableau so that the hexagons are nicely aligned (same space on each side). This is quite fiddly, and you’ll have to play around to get it right when including your map on a dashboard. Including on a dashboard is a good way to retain control over the sizing. An alternative is to draw a polygon where you have much more control over spacing but that is outside of the scope of this blog!

The finished result on a dashboard:

Feel free to use the templateS

You’re very welcome to use these templates in your own work if they’re helpful.

The workbook is available on Tableau Public (templates and examples):

https://public.tableau.com/app/profile/steve7374/viz/NZTileandHexMaps/CONTENTS

And the Excel grid data on the Tableau Forums:

https://community.tableau.com/s/news/a0A8b00002HfQecEAF/new-zealand-tile-and-hex-map-templates

I’ll be following this post up with more APAC tile and hex maps. If you’ve got any requests let me know. I’d also love to explore a way to generate the tile and hex grids from actual the original region coordinates or shapes … but that is a much more complex problem!

LOD equivalent of LOOKUP

The LOOKUP table calculation in Tableau is really handy when you want to show or use a value from a previous row in the view. For example if you are showing sales per month and need to use the sales figure from the previous month to calculate month-on-month growth.

Sales and sales previous month

In the example above our calculation for “Sales last period” is:

LOOKUP(SUM([Sales],-1))

LOOKUP is great when the level of detail you want to look back through is in the view (e.g. order month is in the view if you want to lookup sales for the previous order month). But things get tricky when that isn’t the case. And people often hit constraints when they want to use the result of a table calculation like LOOKUP in another calculation / aggregation. It’s not unusual at this point for people to ask if they can use a level of detail / LOD calculation for looking up the value from a previous month, year, or category. And unfortunately the answer is almost always “you can’t use a LOD for that”, but can you? Spoiler alert: yes!

The trouble with LODs

LOD calculations allow you to break out of the level of detail that your calculation is working on. So if your calculation is working at the month level you can break out to the quarter level, to ask questions like how does this month compare to the quarter. In the example below we could use a LOD to calculate how much the sales for 2022-06 differ from the average for the quarter that that month is in; breaking out to a higher level grouping.

But if you’re after previous month there is no single dimension or higher level grouping that will tell you that; each month has a different previous month (unlike the case above where three months share the same common quarter). The previous month for 2022-06 is 2022-05, but 2022-06 is itself the previous month of 2022-07. Any one month is in two groups when it comes to determining previous months.

Possibly as you read that last sentence an idea will be forming! Let’s pursue that idea that there are two groups in play…

A LOD for LOOKUP(…,-1)

If we put each month into two groups which are offset from each other by a month then we can calculate two previous values, one of which will be right for odd numbered months and the other for even numbered months in a sequence of months. Let’s take a look at this in action:

Firstly we need an continguous sequence number for each row (in our case each month). With months that is easy as we can use a DATEDIFF to calculate the number of months since the first month in the data set.

_i =

DATEDIFF(
 'month',
 {MIN(DATETRUNC('month',[Order Date]))},
 DATETRUNC('month',[Order Date])
 )

Using that sequence number we can create the two offset groupings using integer division.

_iGroupA =

DIV([_i]-1,2)

_iGroupB =

DIV([_i],2)

This gives us a grouping for each combination of two months:

February will be in Group B when we want it’s previous month, and in Group A when it is the previous month.

As we now have a higher level grouping that we can target in a LOD, we create two LOD calculations to get the other value in the group…

_iPrevA =

SUM({FIXED [_iGroupA]: SUM([Sales])})
- 
SUM([Sales])

_iPrevB =

SUM({FIXED [_iGroupB]: SUM([Sales])})
- 
SUM([Sales])

These calculations get the sum of sales for each grouping of two months, and then subtract the sales for the month we’re in. Hence we end up with the sales for the other month in the grouping. Now we just need to work out which of these LODs to use for each month. To do that we switch between them…

iSwitch =

([_i]-1) % 2

The modulus operator here gives us the remainder of dividing by 2 and hence a number that flicks back and forth between 0 and 1. So…

Sales last month (LOD) =

IF MIN([_iSwitch])=1
THEN [_iPrevA]
ELSE [_iPrevB]
END

You can check out the workbook for this on my Tableau Public profile: LOD equivalent of LOOKUP.

I’ve used this approach for a number of gnarly Tableau Forum questions and I think I’ve seen similar crop up once or twice before but it’s not a particularly well known approach. Often people will fall back on solutions like self joins – and to be fair shaping your data to avoid having to jump through hoops like the above is not a bad idea. But if you do need it, I hope that you find it useful!

Bitfern

Archives: Tableau