Rollup and Cube Part 1
I have recently come across these 2 keywords: ROLLUP and CUBE. These keywords allow you to get running totals when you group by some data. In this post we’ll have a look at how ROLLUP works, and in the next post we’ll see how CUBE works, and then in the final post we’ll look at how you might use this in the real world.
I have created a table with Population data for US Counties, and now I’m going to select all of the population data for Hawaii and Rhode Island since 2016:
SELECT County.CountyName
,County.State
,Population.Year
,SUM(Population.Population)
FROM Population
INNER JOIN County
ON County.Id = Population.CountyId
WHERE County.State IN ( 'HI', 'RI' )
AND Population.Year > 2016
GROUP BY Population.Year, County.State, County.CountyName
Simple, as expected it lists out all of the population counts. What if I wanted a subtotal of how many people were in each state, and how many people there are in total across all of the states for each year? This is where ROLLUP comes in:
SELECT County.CountyName
,County.State
,Population.Year
,SUM(Population.Population)
FROM Population
INNER JOIN County
ON County.Id = Population.CountyId
WHERE County.State IN ( 'HI', 'RI' )
AND Population.Year > 2016
GROUP BY ROLLUP(Population.Year, County.State, County.CountyName)
As you can see, the magic is in the GROUP BY clause, where I say ROLLUP and list out my grouping. The order here is important. What this tells SQL is that reading from left to right, I want a total row for each year, a total row for each state for each year, and a total for each county in each state for each year.
As you can see, we now have these null’s popping up, but with totals. Row 5 for example, tells us that in 2017 there were 1,427,461 people living in Hawaii. Row 11 tells us that there are 2,438,188 people living in Rhode Island and Hawaii in 2017. Row 22 tells us that there were 2,429,070 people living in Rhode Island and Hawaii in 2018, and finally row 23 tells us that in total there have been 4,867,268 people in 2017 and 2018. This last row is a bit useless for this data as the overlap of those people would be huge, but for something like sales data, this number could be useful.
As I mentioned before, the ordering of the columns in the group by is important. If I swap State and Year then I would get a total for each state in 2017 and 2018, and a total for each state across years.
What does it cost to add these totals? Not much. There is one extra Stream Aggregate in the Execution Plan as you can see here.
When would you use ROLLUP? It makes sense for reports that are being generated straight out of SQL, however most reporting tools would do that for you. Where I think it could make some more sense is in a program that allows users to apply their own criteria, essentially building their own queries dynamically, slicing data by different criteria. However, I think the CUBE keyword would make a much more appealing case for that.