What is Cross Apply and Outer Apply

By Greg No comments

A few years ago I needed to join a table to the result of a table-value function and came across Cross Apply. Cross Apply and Outer Apply were introduced back in SQL 2005.

[code]
SELECT *
FROM Table1
CROSS APPLY dbo.tableFunction(table1.col2)
WHERE Table1.Col3 = Table1.Col3
[/code]

It’s simple to use.

Apply comes in two types, Cross Apply and Outer Apply. Cross Apply works like an Inner Join – the join key must be present in both datasets. Outer Apply works like a Left Join, It returns all results from the outer dataset even if there is nothing matching on the inner dataset.

Apply works by taking the outer dataset – Table 1 in the example above – and for every record it runs the command for the inner dataset – dbo.tableFunction in the example.

Often when we join tables together it is because there is a natural relationship between the two tables. When we use a Join, SQL tries to understand that relationship using statistics – it tries to answer the question “How many results am I likely to have for a particular query?” Usually it does this really well, but sometimes our data is skewed in a way that makes the statistics invalid.

Imagine the following tables:

[code]
CREATE TABLE table1(ID INT IDENTITY(1, 1) PRIMARY KEY, sometext VARCHAR(50));
CREATE TABLE table2(ID INT IDENTITY(1, 1) PRIMARY KEY, table1ID INT, someOtherText VARCHAR(50));
CREATE NONCLUSTERED INDEX table2_table1ID ON dbo.table2 ( table1ID )
[/code]

Pretty simple structure, you can add a foreign key if you like between Table1.ID = Table2.Table1ID, but it won’t make a difference for this example. Let’s say that you have 10 million rows in table1 and 100 million rows in table2.

Now do the following simple select:

[code]
SELECT table1.id, table1.sometext, table2.someOtherText
FROM dbo.table1
INNER JOIN dbo.table2 ON table2.table1ID = table1.ID
WHERE table1.sometext = ‘Something’ AND table2.SomeOtherText = ‘SomethingElse’
[/code]

 

It’ll work much like you’d expect, doing either a nested loops join or a merge join, and if the rows in table2 are fairly evenly distributed across table1 ID then it’ll run pretty well. But what if they’re not evenly distributed. What if some of the joins will return only 1 row, but others will return 1000’s of rows? SQL Stats aren’t going to work as well as we’d hope they would. And what if we also know that the value of table2.SomeOtherText is going to only contain a specific value either 1 or 0 times for any table1ID. Because we know this extra information, we can pass that on to SQL using CROSS APPLY.

The following query will give the exact same result as the INNER JOIN, but it will give SQL a bit of extra info, allowing it to also potentially use a better index (although not for our trivial example – we don’t have any other indexes):

[code]
SELECT table1.id, table1.sometext, table2.someOtherText
FROM dbo.table1
CROSS APPLY (SELECT TOP 1 table2.someOtherText, table2.table1ID
FROM dbo.table2
WHERE table1.ID = table2.table1ID) table2
[/code]

SQL will execute the select from table1, and then it will apply that dataset and find the results of the inner query. This extra info can allow SQL to query table2 in a more efficient way because it has more information about what it’s looking for.

APPLY does not replace JOIN. Generally JOIN gives the right answer, but when you start working with larger datasets, sometimes you need to give SQL a bit of extra info to work out how to get the data it needs, and APPLY can do that for you.

For more info, check out Alexander Kuznetsov blog post, which I found quite helpful.

Leave a Reply