My Favourite, yet most Pointless, SQL Datatype
This month, for T-SQL Tuesday, Brent Ozar has asked us to write about our favourite data types.
There is a list of data types over in the docs, and most of them have a use. Numbers and dates as I’ve already mentioned, strings obviously need to go in (either as extended ASCII or Unicode), and sometimes you even need to be saving Binary types in the database. I’ve never understood the point of Money over Decimal, but there are also plenty of people with an opinion.
If you read through the docs on data types, you hit the “Other data types” section, and you start scratching your head: cursor, hierarchyid, rowversion. I’m sure XML will have a special place for some (love or hate).
The datatype that I think is the coolest has to be Spatial Geography. Under the covers, it’s probably the same as Geometry, but no where near as much fun. What is Geography? It’s what it sounds like – a way to store Latitude and Longitude data that will let you query it back again and plot it on a map, or measure distances from it.
If you get yourself a list of city co-ordinates and load them up into a table, then SSMS will even plot them out for you so you can see them:
This is all great fun. You can also measure distances between two locations using the function STDistance:
SELECT TOP 100
origin.City AS Origin,
dest.City AS Dest,
origin.Coordinates.STDistance(dest.Coordinates)/1000 AS Distance
FROM CitiesWorld AS origin
INNER JOIN CitiesWorld AS dest
ON 1=1
WHERE origin.City = 'San Francisco'
This is all a lot of fun, but like every good toy, none of this comes for free.
The Geography type is stored as a weird binary string. And this presents problems once you want to scale. You want to know all of the locations between 2 lines of longitude? You’ll need to scan the whole table. You want to find all of the locations north of the equator? You’ll have to pull apart the geography object of every record to do that.
Under the hood, this is just a CLR object that Microsoft has built for you, which means it comes with all of the usual limitations of CLR Objects – you can’t index individual parts and the query can’t go parallel. So not only do you have to scan the entire table, you have to do it single threaded.
CLR Datatypes just make your SQL Server a really expensive Application Server. When you have to pay an expensive license for each core of SQL Server, you want to use as few as possible, and leave it to do the work that it does best.
I don’t think I’ve found a use case where CLR data types make sense, in fact I usually see the CLR data type causing the problems.
So as much fun as it is to play with the geography data type, and as cool as the spatial data can look in SSMS, I cannot think of a time when this datatype would be a good idea. I’d rather store Latitude and Longitude as decimal fields and have an application layer do the hard lifting, where I can send things parallel, where I can do some maths to work out the locations I want and index them if I need to.
If you’ve got a use for Geography, or you just want to tell me I’m wrong, I’d love to hear from you below.
Rob Farley
Mar 3, 2021, 11:22 amI find that the indexes over spatial types make them really handy for finding the ‘nearest’ point. If you need to be able to look for things just between within a particular latitude or longitude, then you can have a calculated column holding the Lat or Long property. But spatial is also great for areas, with useful methods for seeing whether something is inside an area or not. When you’re dealing with data about places, it’s definitely a lot more useful than storing the coordinates as two separate values.
Greg
Mar 3, 2021, 11:37 amThanks for the thoughts. I’d not considered indexing on a calculated column for this before.
The Geography Data Type – Curated SQL
Mar 3, 2021, 12:05 am[…] Greg Dodd talks about the GEOGRAPHY data type: […]
Cav
Mar 3, 2021, 10:12 amNow I want a weirdbinarystring data type.