Rollup and Cube Part 3
In the previous 2 posts we have looked at how Rollup and Cube work. This time we’re going to look at how you might implement these in the real world.
I don’t think ROLLUP has a great use in the real world (feel free to comment below if you have a real use for it). The best I can see is that you have SQL Server emailing the results of a query to someone and they want subtotals. Adding totals can be done by any reporting app, probably more efficiently than SQL will do it, and it’s probably easier to debug there as well.
I do, however, think CUBE has an amazing purpose.
As we saw in part 2, CUBE gives a result slicing the data by all of the different parts of a GROUP BY clause. At every company I’ve worked at, senior management wants access to the data to generate their own reports, slicing the data how they wish to get the results they want. At some places, this would mean that management would have to wait weeks to have a new report generated for them, just to find out that the data wasn’t really what they wanted anyway. At other places, tools like SSAS are deployed that do this exceptionally well, but at the expense of having to process the cube regularly and working on out of date data.
What if SQL Server could do that for you on live data?
CREATE VIEW dbo.PopulationCube
AS
SELECT County.CountyName
,County.State
,Population.Year
,SUM(Population.Population) AS Population
FROM Population
INNER JOIN County
ON County.Id = Population.CountyId
WHERE County.State IN ( 'HI', 'RI' )
AND Population.Year > 2016
GROUP BY CUBE(County.State, Population.Year, County.CountyName)
I’ve taken our query from the last post and turned it into a view. Depending on performance, we might want to persist this data by putting it into a table. Once we have this, we can now run queries like this to find the total of people in 2017:
SELECT *
FROM PopulationCube
WHERE PopulationCube.State IS NULL
AND CountyName IS NULL
AND PopulationCube.Year = 2017
Or the same query to find out the total population in Hawaii over the 2 years:
SELECT *
FROM PopulationCube
WHERE PopulationCube.State = 'HI'
AND CountyName IS NULL
AND PopulationCube.Year IS NULL
For each query, instead of having to generate a different group by and so on, we can simply choose the row(s) we want returned by adapting the WHERE clause, specifically by setting the parts we don’t want to slice by to null or not null.
We need to remember that if we’re using a view then we’re going to have a big overhead every time we call it, so if performance sucks, we might want to look at Column Store indexes or just dump it out into a table regularly.