In my last blog post I looked at a LOD equivalent of RANK / INDEX table calculations in Tableau. That approach was limited to ranking a very small range of whole numbers, and left me considering other options. This post outlines another LOD approach using spatial functions!
WARNING: As before I will stress that you should rarely need a LOD-based equivalent of rank or index, and can often use table calculations when you don’t think that you can. That said, there are scenarios where a LOD equivalent can be useful: onward use of the calc or use in spatial functions being the cases I’ve seen on the Tableau community forums. And it’s also a fun challenge!
To revisit why this is a tough (fun!) challenge we have to remember that LODs really only break out of the viz level of detail to a higher or lower level of detail. So if you’re working at a month level of detail, a LOD expression can get you the figure for that month’s quarter or year, or it can work with figures at the day level. However we would usually say that there is no LOD expression (or indeed level of detail) that equates to the rows before or after any given row. And that makes finding an index or rank difficult with LODs.
The previous solution I wrote about takes any value and switches on a 1 at that value’s position in a string of 1s and 0s. So 5 = 10000 and 3 = 00100. These strings of 1s and 0s can be considered as a binary number. And that can be summed up using a LOD. For 5 and 3 the sum gives us 10100 (in binary). And from this we can count all of the 1s including and to the right of the n-th digit. So the number of 1s including and to the right of the 3rd digit = 1, and the number including and to the right of the 5th digit = 2. Hence 3 = rank/index of 1, and 5 = rank/index of 2.
As noted about this approach is constrained to quite small ranges of whole numbers. However it does give us a generalised approach to solving our problem with LODs. What we need is an alternative way to add up all the values in a range, such that we can still count how many there are “including and before” any particular value.
This is where spatial functions come in. We can wrap spatial functions into LODs, and we can do things like UNION and COLLECT to combine spatial values, and we can do things like INTERSECT to find overlaps.
Approach using LODs and spatial functions
Here’s how we can stitch this together for a LOD equivalent of RANK_DENSE (which is the same as INDEX when there are only unique values):
- Convert the values (that we want to rank) to a latitude between 0 and 90
- Convert these latitudes to buffered points (that have an area) along longitude 0
- Use a LOD to COLLECT all these buffered points (circles) together
- Also convert each value’s latitude to a similarly buffered line from that latitude back to 0
- INTERSECT the buffered line with the LOD COLLECT of buffered points
- This leaves us with just the circle equivalents of all the values up to and including this one!
- We can’t count these (as far as I know!) but we can get the combined area
- And we can divide that area by the known area of a single buffered point
- Which gives us the equivalent of a count, or RANK_DENSE
- And for ranges with only unique values, this will be the same as INDEX
Illustration
To illustrate here is a table of sales by sub-category using sample superstore data, along with INDEX and RANK_DENSE using table calcs, and a LOD version.
Art is ranked fourth here. If we convert the sales values to circles between latitude 0 and 90. And also draw a line from Art back to latidtude 0 (buffered to the same radius as the circles) then we can see that this line intersects with four of the circles, equating to the rank/index of 4 for art:
Calculations to replicate RANK_DENSE
Let’s take a look at the calculations to replicate RANK_DENSE. For this example I’m using data with non-unique values:
I’m going to use INCLUDE and EXCLUDE LODS, but you can try with FIXED too. Just remember that if you use FIXED you’ll need to account for any other filters/fields on your view (either adding filters to context or including the dimensions that you need to fix by), otherwise the FIXED LOD will bust out beyond the data you want to consider – e.g. include years that you wanted to filter out.
First up some base calcs that will be needed regardless of which version of RANK that we’re trying to replicate:
We need to know the minimum and maximum values in the range as these will end up equating to latitude 0 and 90…
In these calculations Sub-Category is the thing I want to rank and Sales is the measure I want to rank by. You will replace these with the dimension and measure you want to use from your data.
BASE (a) min
{EXCLUDE [Sub-Category]: MIN( {INCLUDE [Sub-Category]: SUM([Total Sales])} )}
BASE (b) max
{EXCLUDE [Sub-Category]: MAX( {INCLUDE [Sub-Category]: SUM([Total Sales])} )}
BASE (c) lat
( SUM([Total Sales]) - MIN([BASE (a) min]) ) / ( MIN([BASE (b) max]) - MIN([BASE (a) min]) ) * 90
Next up we convert the latitudes to a point, and we define a buffer radius to get a circle. Note that you may need to tweak this buffer radius if you’re finding that your values, when converted to latitudes and buffered are overlapping too much and giving you incorrect results.
BASE (d) point
MAKEPOINT([BASE (c) lat],0)
BASE (e) buffer m
10
Now we’re in to the RANK_DENSE specifics. We want a buffered circle, and a buffered line back to lat 0, per value. One thing to note here is that some of these spatial functions count as aggregations. And that means they won’t accept aggregated values, so we have to wrap any aggregations in another LOD first – hence the extra INCLUDE in the next calc.
RANKD (g) circle
COLLECT( {INCLUDE [Sub-Category]: BUFFER([BASE (d) point],[BASE (e) buffer m],'m') })
RANKD (h) line
BUFFER( MAKELINE( MAKEPOINT(0,0), [BASE (d) point] ), [BASE (e) buffer m], 'm')
The penultimate step is to return the overall collection of circles that intersect with a specific value’s line. To do this we use the INTERSECTION spatial function. And we use an EXCLUDE LOD on the circles and an INCLUDE LOD on the line to ensure one is the overall set and the other is specific to the value.
RANKD (i) intersect
INTERSECTION( {INCLUDE [Sub-Category]:[RANKD (h) line]}, {EXCLUDE [Sub-Category]:[RANKD (g) circle]} )
That leaves the final step being to convert the circles up to and including this specific value that we’re left with back into a count. I don’t think we can count the circles with the spatial functions that Tableau has, but we can return the AREA of the collection of circles, and we can then divide that area by the known area of a single circle (“pi r squared” – where r is the buffer)
RANKD!
1 + FLOOR(ZN( AREA([RANKD (i) intersect],'m') / ( PI() * SQUARE([BASE (e) buffer m]) ) ))
The use of FLOOR here is to cater for a little bit of overlap in values that are close enough together when converted to latitudes that the buffered circles slightly overlap. And this is the RANK_DENSE!
What about RANK_MODIFIED
In the screenshot above I also included a LOD version of RANK_MODIFED. The approach for this is a little more complex.
It involves a FIXED LOD to determine how many occurences there are of any given value (in my case SUM of Sales per Sub-Category):
BASE (f) num same
{FIXED {FIXED [Sub-Category]:SUM([Total Sales])}: COUNTD([Sub-Category]) }
Working out the radius of a circle whose area would be that number of times larger than the circle for a single occurence of the value:
RANKM (g) radius
// where r = normal radius // what R (radius) do we need for a circle with N times the area // pi*r*r*N = pi*R*R // r*r*N = R*R // R = SQRT(r*r*N) SQRT( SQUARE([BASE (e) buffer m]) * [BASE (f) num same])
We then make our buffered points (circles) use this radius, and our buffered line use the max of these radiuses (radii?)
RANKM (h) circle
COLLECT( {INCLUDE [Sub-Category]: BUFFER([BASE (d) point],AVG([RANKM (g) radius]),'m') })
RANKM (i) line – note the FIXED MAX for radius here
BUFFER( MAKELINE( MAKEPOINT(0,0), [BASE (d) point] ), MAX({FIXED: MAX([RANKM (g) radius])}), // line buffer max of circle buffers 'm')
The intersection and final calc are then pretty much the same as the rank dense approach. And because the values with multiple occurences have bigger circles, the final divsion by the area for a single occurence gives us the RANK_MODIFIED.
RANKM (j) intersect
INTERSECTION( {INCLUDE [Sub-Category]:[RANKM (i) line]}, {EXCLUDE [Sub-Category]:[RANKM (h) circle]} )
RANKM!
1 + FLOOR(ZN( AREA([RANKM (j) intersect],'m') / ( PI() * SQUARE([BASE (e) buffer m]) ) ))
RANK_MODIFIED … LOD-ified!
What about INDEX?
If your values are all unique then either of the above approaches will give you the equivalent of INDEX too – pick the first / simpler one!
However if your values are more like my example, i.e more than one occurence of some values, and you were hoping for a LOD equivalent of INDEX, then…
Sorry! That’s eluding me at the moment.
Conceptually you “just” need to subtract something off the RANK_MODIFIED calc to spread the values back out into a unique index. But, the value you need to substract is itself an index, and in the example above (ranking sub categories) we now need an index across a text field, which this solution doesn’t cater for! In some ways it doesn’t need to be an index, just a unique value for each row shown. And I’d love to hear your ideas on this final step!
Caveats and closing remarks
A couple of caveats with the above approach if you do think that you need to use it.
Firstly you should assess performance for your data. The use of spatial functions and LODs is going to introduce additional work for Tableau to do.
Secondly really do sense check if you need a LOD-based approach in the first place. Often you can still use table calcs in cases you think that you might not be able to.
And finally if you got this far I hope that you found it as fun a challenge as I did. If nothing else it’s enjoyable to test the limits of Tableau. Here’s hoping that in the future we get inbuilt functionality like LEAD and LAG in SQL!