Integer Types

There are many different ways to store integers in SQL, how do you know which is the best?

I have worked in organisations where it was decided that BIGINT should always be used for primary keys, because then we’d never run out of numbers. While the argument was true, they wouldn’t run out of numbers, the same would also have been true if they used a TINYINT on some tables. So how do you pick the right number type?

What are my Options?

There are a handful of data types to store Integers:

BIGINT -9.2 Quintillion to +9.2 Quintillion 8 Bytes
INT -2.1 Billion to +2.1 Billion 4 Bytes
SMALLINT -32,768 to +32,768 2 Bytes
TINYINT 0 to 255 1 Bytes
BIGINT 0 to 1 1 Bit

More details can be found on Microsoft Docs.

Why is it Important?

It’s worth asking, why should we care which type we pick. The obvious answer is we need to pick something big enough to hold our largest number. As my previous employer has argued, a BIGINT will hold the biggest number, so just use that. The problem is that BIGINT also requires more storage space.

If we use a BIGINT instead of a TINYINT then we use 8 times as much storage. For one number that doesn’t make a big difference, but you’re probably storing more than 1 number in your database. But the problem isn’t actually with storage, storage is cheap, it’s with speed. SQL Server stores the table data in 8KB pages. Every table, every index is boiled down to 8KB. When you request data from SQL Server, it reads these 8KB pages from disk into memory and then works out the answer. When you have to read from disk in an OLTP system, it’s slow.

We also need to remember that for a lot of SQL Servers Memory is a scarce resource. Every time you read another 8KB page into memory there is a chance that it will push something else out of memory.

Contrived Examples

Let’s say that you have a table that has 1 column in it called ID that has records for the numbers 1 to 10,000.

If ID is a BIGINT then some basic maths says you can store 1000 BIGINTs on a page (simple maths: 8KB = 8000 bytes). You need 10 pages of data to be read from disk, stored in cache and then processed.

If ID is an SMALLINT then some basic maths says you can store the data on 3 pages. A significantly smaller number of pages to be read from disk, stored in cached and processed.

That difference multiplies out as well in a relational database:

  • Every index you create that references that column will be smaller (more data on a page means fewer hits to disk)
  • If the column is the primary key then it will be included in all of the indexes anyway
  • If you use it as a foreign key, then all of the other tables will also benefit

 

It doesn’t seem like a big difference when you isolate it. Our servers today don’t really care if something is 2 bytes or 8 bytes – it’s a small amount of data. But when you multiply those numbers out across a database that is 10’s or 100’s of GB’s and especially once you start talking about Terabyte databases, it adds up pretty quickly.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s