I have worked in organisations where it was decided that
BIGINT should always be used for primary keys, because then we’d never run out of numbers. The argument was true: they would never run out of numbers. But the same would also have been true if they had used a
TINYINT on some tables. So how do you pick the right number type?
What are my Options?
There are a handful of data types to store Integers:
| Data Type | Range | Storage |
| --- | --- | --- |
| BIGINT | -9.2 quintillion to +9.2 quintillion | 8 bytes |
| INT | -2.1 billion to +2.1 billion | 4 bytes |
| SMALLINT | -32,768 to +32,767 | 2 bytes |
| TINYINT | 0 to 255 | 1 byte |
| BIT | 0 to 1 | 1 bit |
More details can be found on Microsoft Docs.
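To make that concrete, here is a hypothetical table definition (the table and column names are my own invention, not from any real schema) showing how you might pick the smallest type that safely fits each column:

```sql
-- Hypothetical example: choose the smallest type that safely fits each column.
CREATE TABLE dbo.OrderLine
(
    OrderLineID INT      NOT NULL,  -- up to ~2.1 billion rows, so INT, not BIGINT
    Quantity    SMALLINT NOT NULL,  -- a quantity is very unlikely to exceed 32,767
    StatusCode  TINYINT  NOT NULL,  -- a handful of status values, 0 to 255 is plenty
    IsArchived  BIT      NOT NULL   -- a true/false flag
);
```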
Why is it Important?
It’s worth asking why we should care which type we pick. The obvious answer is that we need to pick something big enough to hold our largest number. As my previous employer argued, a
BIGINT will hold the biggest number, so why not just use that? The problem is that a
BIGINT also requires more storage space.
If we use a
BIGINT instead of a
TINYINT then we use 8 times as much storage. For one number that doesn’t make a big difference, but you’re probably storing more than one number in your database. The real problem isn’t storage, though; storage is cheap. The problem is speed. SQL Server stores table data in 8KB pages: every table and every index is broken down into 8KB pages. When you request data, SQL Server reads these 8KB pages from disk into memory and then works out the answer. In an OLTP system, reading from disk is slow.
We also need to remember that for a lot of SQL Servers, memory is a scarce resource. Every time you read another 8KB page into memory, there is a chance that it will push something else out of memory.
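If you want to see how many 8KB pages your own objects occupy, one way (a sketch; run it in the database you care about) is to query sys.dm_db_partition_stats, whose page-count columns are expressed in 8KB pages:

```sql
-- How many 8KB pages does each table use? Counts are in 8KB pages,
-- so multiplying by 8 gives KB.
SELECT OBJECT_NAME(object_id)       AS TableName,
       SUM(used_page_count)         AS UsedPages,
       SUM(used_page_count) * 8     AS UsedKB
FROM sys.dm_db_partition_stats
GROUP BY object_id
ORDER BY UsedPages DESC;
```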
Let’s say that you have a table with a single column called ID, holding the numbers 1 to 10,000.
If ID is a
BIGINT then some basic maths says you can store roughly 1,000
BIGINTs on a page (simple maths: treating 8KB as 8,000 bytes; the real usable space per page is close to that). You need 10 pages of data to be read from disk, stored in cache and then processed.
If ID is a
SMALLINT then the same maths says you can store the data on 3 pages. A significantly smaller number of pages to be read from disk, stored in cache and processed.
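You can test this yourself with a quick demo (a sketch; the table names are hypothetical, and the real page counts will be slightly higher than the back-of-the-envelope numbers because every row carries a few bytes of fixed overhead):

```sql
-- Demo: the same 10,000 values stored as BIGINT vs SMALLINT.
CREATE TABLE dbo.BigDemo   (ID BIGINT   NOT NULL);
CREATE TABLE dbo.SmallDemo (ID SMALLINT NOT NULL);

-- Generate the numbers 1 to 10,000 using a cross join as a row source.
INSERT INTO dbo.BigDemo (ID)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

INSERT INTO dbo.SmallDemo (ID)
SELECT ID FROM dbo.BigDemo;

-- Compare the page counts (in_row_data_page_count is in 8KB pages).
SELECT OBJECT_NAME(object_id) AS TableName, in_row_data_page_count
FROM sys.dm_db_partition_stats
WHERE object_id IN (OBJECT_ID('dbo.BigDemo'), OBJECT_ID('dbo.SmallDemo'));
```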
That difference multiplies out as well in a relational database:
- Every index you create that references that column will be smaller (more data on a page means fewer hits to disk)
- If the column is the primary key then it will be included in all of the indexes anyway
- If you use it as a foreign key, then all of the other tables will also benefit
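The foreign key point is worth sketching out, because it is where the choice multiplies furthest. In this hypothetical example (my own table names, not from the original), a TINYINT key on a small lookup table means every referencing row and every index on the referencing column is 1 byte wide instead of 8:

```sql
-- Sketch: the key's type propagates to every referencing table and index.
CREATE TABLE dbo.Status
(
    StatusID   TINYINT      NOT NULL PRIMARY KEY,  -- small lookup table
    StatusName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID  INT     NOT NULL PRIMARY KEY,
    StatusID TINYINT NOT NULL                      -- 1 byte here instead of 8
        REFERENCES dbo.Status (StatusID)
);

-- This index carries the 1-byte StatusID; with a BIGINT key it would be 8x wider.
CREATE INDEX IX_Orders_StatusID ON dbo.Orders (StatusID);
```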
It doesn’t seem like a big difference when you isolate it. Our servers today don’t really care if something is 2 bytes or 8 bytes; it’s a small amount of data. But when you multiply those numbers out across a database that is tens or hundreds of gigabytes, and especially once you start talking about terabyte databases, it adds up pretty quickly.