Plain Subqueries
Subqueries
A subquery is a SELECT statement that is nested within another T-SQL statement. If a
plain subquery is executed independently of the statement in which it is nested, it
returns a result set. In other words, a plain subquery can stand alone and does not
depend on the statement in which it is nested. A subquery can return any number of
values and can appear in the column list of a SELECT statement and in the FROM, WHERE,
HAVING, and ORDER BY clauses of a T-SQL statement. A subquery can also be used as an
argument to a function call. In general, a subquery can be used anywhere an expression
is allowed.
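As a minimal sketch of these points, the batch below first runs a SELECT on its own and then nests the same SELECT as a column expression and as a function argument. The table variable @Orders and its rows are hypothetical sample data, not part of any sample database.

```sql
-- Hypothetical sample data, held in a table variable so the batch is self-contained.
DECLARE @Orders TABLE (OrderID int, ShipCity varchar(20), OrderDate datetime);
INSERT INTO @Orders VALUES (1, 'Paris',  '19980101');
INSERT INTO @Orders VALUES (2, 'Berlin', '19980115');
INSERT INTO @Orders VALUES (3, 'Paris',  '19980110');

-- Run on its own, the SELECT returns a result set (one row, 1998-01-15).
SELECT MAX(OrderDate) AS LastOrderDate
FROM @Orders;

-- The same SELECT nested as a column expression and as a function argument.
SELECT OrderID,
       (SELECT MAX(OrderDate) FROM @Orders) AS LastOrderDate,
       DATEDIFF(dd, OrderDate, (SELECT MAX(OrderDate) FROM @Orders)) AS DaysBehind
FROM @Orders
ORDER BY OrderID;
-- DaysBehind is 14, 0 and 5 for orders 1, 2 and 3.
```

Because the nested SELECT does not reference the outer query, it can be copied out and executed by itself, which is a quick way to check what a plain subquery contributes.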
Joining Virtual Tables
Joining virtual tables is one of the most powerful techniques you can build with
subqueries. In this context, virtual means that the result set you are joining is
built on the fly. The following example shows how to join a GROUP BY result set with
a real table (Person).
SELECT P.id_person,
       P.first_name,
       P.last_name,
       CONVERT(varchar(30), P.birth, 104),
       A.id_council,
       A.id_groupe,
       A.numActivities
FROM Person P
JOIN (SELECT id_person,
             MIN(id_council) id_council,
             MIN(id_groupe) id_groupe,
             COUNT(*) numActivities
      FROM Activity
      GROUP BY id_person) A ON (A.id_person = P.id_person)
WHERE P.id_person NOT IN (SELECT id_person
                          FROM Activity
                          WHERE id_council != 5)
The virtual table is referenced in the outer query by the alias A and is joined on
id_person. You can use the virtual table's columns in the outer query by prefixing
them with the alias, for example A.numActivities.
Joining more than one virtual Table (SQL Server)
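The next example shows a more complex query that uses more than one virtual table.
The variables @DatumVon and @DatumBis are not declared in the listing; they are
presumably parameters supplied by the caller (for example, of an enclosing stored
procedure) that hold the date range as strings in German format (style 104,
dd.mm.yyyy).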
--
-- Declare Variables
--
DECLARE @LaufID BIGINT
DECLARE @AbrDatum DATETIME
DECLARE @CountLauf INT
--
-- Fill Variables
--
SELECT @LaufID = MAX(LaufID),
@AbrDatum = MAX(AbrDatum),
@CountLauf = COUNT(*)
FROM AbrLauf
WHERE BuchDatum >= CONVERT(datetime, @DatumVon, 104)
AND BuchDatum < DATEADD(day, 1, CONVERT(datetime, @DatumBis, 104))
--
-- Generate Report
--
SELECT P.Nr,
P.Name,
P.Vorname,
CASE R.Rat WHEN 1 THEN 'NR' WHEN 2 THEN 'SR'
ELSE NULL END Rat,
ISNULL(Entschaedigung.Betrag, 0)
EntschaedigungBetrag,
ISNULL(Vorsorge.Betrag, 0) VorsorgeBetrag,
ISNULL(Entschaedigung.Betrag, 0) +
ISNULL(Vorsorge.Betrag, 0) Total,
CONVERT(varchar(30), @DatumVon, 104)
DatumVon,
CONVERT(varchar(30), @DatumBis, 104)
DatumBis,
@LaufID LaufID,
CONVERT(varchar(30), @AbrDatum, 104)
AbrDatum,
@CountLauf CountLauf
FROM Person P
--
-- Now join the real table P with the virtual table R ...
--
LEFT OUTER JOIN (SELECT M.PersonID,
M.Rat
FROM Ratsmitglied M
WHERE M.Eintritt = (SELECT MAX(MI.Eintritt)
FROM Ratsmitglied MI
WHERE MI.PersonID = M.PersonID)) R
ON (P.PersonID = R.PersonID)
--
-- ... then join table P with the virtual table 'Entschaedigung'
--
LEFT OUTER JOIN (SELECT PersonID,
SUM(Betrag) Betrag
FROM ExportKreditor
WHERE ExportKreditorID IN (SELECT EAEK.ExportKreditorID
FROM EntAbrExportKreditor EAEK
JOIN EntAbr EA ON (EA.EntAbrID = EAEK.EntAbrID)
JOIN Abr A ON (A.AbrID = EA.AbrID)
JOIN AbrArt AA ON (AA.AbrArtID = A.AbrArtID)
WHERE AA.Abk = 'A')
AND SollHabenBez = 'H'
AND BuchDatum >= CONVERT(datetime, @DatumVon, 104)
AND BuchDatum < DATEADD(day, 1, CONVERT(datetime, @DatumBis, 104))
GROUP BY PersonID) Entschaedigung
ON (P.PersonID = Entschaedigung.PersonID)
--
-- ... then join table P with the virtual table 'Vorsorge'
--
LEFT OUTER JOIN (SELECT PersonID,
SUM(Betrag) Betrag
FROM ExportKreditor
WHERE ExportKreditorID IN (SELECT EAEK.ExportKreditorID
FROM EntAbrExportKreditor EAEK
JOIN EntAbr EA ON (EA.EntAbrID = EAEK.EntAbrID)
JOIN Abr A ON (A.AbrID = EA.AbrID)
JOIN AbrArt AA ON (AA.AbrArtID = A.AbrArtID)
WHERE AA.Abk = 'V')
AND SollHabenBez = 'H'
AND BuchDatum >= CONVERT(datetime, @DatumVon, 104)
AND BuchDatum < DATEADD(day, 1, CONVERT(datetime, @DatumBis, 104))
GROUP BY PersonID) Vorsorge
ON (P.PersonID = Vorsorge.PersonID)
--
-- ... then the final WHERE clause, based on the virtual tables
--
WHERE ISNULL(Entschaedigung.Betrag,
0) + ISNULL(Vorsorge.Betrag, 0) > 0
ORDER BY P.Name, P.Vorname, R.Rat
Use of a Subquery in the Column List of a SELECT Statement
Suppose you would like to see the OrderID and the OrderDate of the last order that
was shipped to Paris. Along with that information, you would also like to see the
OrderDate of the last order shipped regardless of ShipCity, and the difference in
days between the two OrderDates. Here is a T-SQL SELECT statement that accomplishes
this:
SELECT TOP 1 OrderId,
       CONVERT(CHAR(10), OrderDate, 121) Last_Paris_Order,
       (SELECT CONVERT(CHAR(10), MAX(OrderDate), 121)
        FROM Northwind.dbo.Orders) Last_OrderDate,
       DATEDIFF(dd, OrderDate, (SELECT MAX(OrderDate)
                                FROM Northwind.dbo.Orders)) Day_Diff
FROM Northwind.dbo.Orders
WHERE ShipCity = 'Paris'
ORDER BY OrderDate DESC
The above code contains two subqueries. Both retrieve the OrderDate of the last
order shipped regardless of ShipCity. The first subquery is used to return a column
value (Last_OrderDate) in the final result set. The second subquery is used as an
argument in a function call: it passes the MAX(OrderDate) value to the DATEDIFF
function, which calculates the number of days between the two OrderDates.
Use of a Subquery in the WHERE clause
A subquery can be used to control which records a SELECT returns by taking part in
a condition of the WHERE clause. In this case, the result of the subquery is used on
one side of a WHERE clause condition. Here is an example:
SELECT DISTINCT country
FROM Northwind.dbo.Customers
WHERE country NOT IN (SELECT DISTINCT country
                      FROM Northwind.dbo.Suppliers)
Here we have returned a list of countries where customers live but where no supplier
is located. If you were trying to provide better delivery times to customers, you
might target these countries when looking for additional suppliers.
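One caveat is worth a small sketch: if the subquery behind NOT IN returns a NULL, the NOT IN condition is never true and the outer query returns no rows. The batch below illustrates this with hypothetical table variables (@Customers, @Suppliers) and shows NOT EXISTS, a correlated form, as one possible alternative. It assumes the default ANSI_NULLS ON setting.

```sql
-- Hypothetical sample data; not part of the Northwind database.
DECLARE @Customers TABLE (Country varchar(20) NULL);
DECLARE @Suppliers TABLE (Country varchar(20) NULL);
INSERT INTO @Customers VALUES ('France');
INSERT INTO @Customers VALUES ('Mexico');
INSERT INTO @Suppliers VALUES ('France');
INSERT INTO @Suppliers VALUES (NULL);

-- Returns no rows: 'Mexico' NOT IN ('France', NULL) evaluates to UNKNOWN.
SELECT DISTINCT Country
FROM @Customers
WHERE Country NOT IN (SELECT Country FROM @Suppliers);

-- Returns 'Mexico': NOT EXISTS only asks whether a matching row exists.
SELECT DISTINCT C.Country
FROM @Customers C
WHERE NOT EXISTS (SELECT *
                  FROM @Suppliers S
                  WHERE S.Country = C.Country);
```

Whether this matters for the Northwind query above depends on whether the Suppliers table contains a NULL country, so it is something to check rather than assume.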
Suppose a company would like to do some targeted marketing by contacting customers
in the country with the fewest orders, in the hope of increasing overall sales in
that country. Here is an example that uses a subquery to return the customer contact
information for the country with the fewest orders:
SELECT Country,
       CompanyName,
       ContactName,
       ContactTitle,
       Phone
FROM Northwind.dbo.Customers
WHERE country = (SELECT TOP 1 country
                 FROM Northwind.dbo.Customers C
                 JOIN Northwind.dbo.Orders O
                   ON C.CustomerId = O.CustomerID
                 GROUP BY country
                 ORDER BY count(*))
Here we have written a subquery that joins the Customers and Orders tables to
determine the total number of orders for each country. The subquery sorts the
countries by that count in ascending order and uses the "TOP 1" clause to return the
country with the fewest orders. That country is then used in the WHERE clause of the
outer query to determine which customer information is displayed.
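A subquery compared with = must return at most one value; if it returns more than one row, SQL Server raises error 512 ("Subquery returned more than 1 value"). The sketch below, built on a hypothetical table variable @Orders, shows how TOP 1 keeps the comparison valid. The extra ORDER BY column is an addition of this sketch that makes the choice deterministic when two countries tie.

```sql
-- Hypothetical sample data; not part of the Northwind database.
DECLARE @Orders TABLE (OrderID int, Country varchar(20));
INSERT INTO @Orders VALUES (1, 'France');
INSERT INTO @Orders VALUES (2, 'France');
INSERT INTO @Orders VALUES (3, 'Norway');

-- TOP 1 guarantees a single value, so the = comparison is valid.
-- The second ORDER BY column breaks ties between countries with equal counts.
SELECT OrderID, Country
FROM @Orders
WHERE Country = (SELECT TOP 1 Country
                 FROM @Orders
                 GROUP BY Country
                 ORDER BY COUNT(*), Country);
-- Returns order 3 (Norway, the country with the fewest orders).
```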
Use of a Subquery in the FROM clause
The FROM clause normally identifies the tables used in the T-SQL statement. You can
think of each of the tables identified in the FROM clause as a set of records. A
subquery also returns a set of records and can therefore be used in the FROM clause
just like a table. Here is an example where a subquery is used in the FROM clause of
a SELECT statement:
SELECT au_lname,
       au_fname,
       title
FROM (SELECT au_lname, au_fname, au_id
      FROM pubs.dbo.authors
      WHERE state = 'CA') as A
JOIN pubs.dbo.titleauthor ta ON A.au_id = ta.au_id
JOIN pubs.dbo.titles t ON ta.title_id = t.title_id
Here we have used a subquery to select only the author records whose state column is
equal to "CA". We have named the set returned from this subquery with the table
alias "A". We can then use this alias elsewhere in the T-SQL statement to refer to
the columns from the subquery by prefixing them with "A.", as we did in the ON
clause of the first JOIN. Sometimes using a subquery in the FROM clause reduces the
size of the set that needs to be joined. Joining fewer records can improve the
performance of the join and therefore speed up the overall execution of the query.
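For comparison, the sketch below writes the same kind of query in two ways: with the filter inside a derived table, and with a plain join plus a WHERE clause. The table variables @Authors and @TitleAuthor are hypothetical, simplified stand-ins for the pubs tables. Both forms return the same rows, and the optimizer may well choose the same plan for both, so any performance difference is best confirmed by looking at the execution plan.

```sql
-- Hypothetical, simplified stand-ins for pubs.dbo.authors and pubs.dbo.titleauthor.
DECLARE @Authors TABLE (au_id int, au_lname varchar(20), state char(2));
DECLARE @TitleAuthor TABLE (au_id int, title varchar(40));
INSERT INTO @Authors VALUES (1, 'White', 'CA');
INSERT INTO @Authors VALUES (2, 'Smith', 'KS');
INSERT INTO @TitleAuthor VALUES (1, 'Title One');
INSERT INTO @TitleAuthor VALUES (2, 'Title Two');

-- Form 1: filter inside a derived table, then join.
SELECT A.au_lname, ta.title
FROM (SELECT au_id, au_lname
      FROM @Authors
      WHERE state = 'CA') AS A
JOIN @TitleAuthor ta ON A.au_id = ta.au_id;

-- Form 2: join first, filter in the WHERE clause.
SELECT a.au_lname, ta.title
FROM @Authors a
JOIN @TitleAuthor ta ON a.au_id = ta.au_id
WHERE a.state = 'CA';
-- Both queries return a single row: White, Title One.
```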
Subquery in the FROM clause of an UPDATE statement:
SET NOCOUNT ON
CREATE TABLE x(
i INT IDENTITY,
a CHAR(1))
INSERT INTO x VALUES ('A')
INSERT INTO x VALUES ('B')
INSERT INTO x VALUES ('C')
INSERT INTO x VALUES ('D')
SELECT * FROM x
UPDATE x
SET a = b.a
FROM (SELECT MAX(a) AS a FROM x) b
WHERE i > 2
SELECT * FROM x
DROP TABLE x
Here we created a table named "x" that has four rows. Then we updated the rows where
"i" was greater than 2 with the maximum value in column "a". We used a subquery in
the FROM clause of the UPDATE statement to identify the maximum value of column "a".
Use of a Subquery in the HAVING clause
In the following example, we use a subquery to find the number of books each
publisher has published, for publishers that are not located in the state of
California. To accomplish this, we use a subquery in a HAVING clause. Here is the
code:
SELECT pub_name,
       COUNT(*) bookcnt
FROM pubs.dbo.titles t
JOIN pubs.dbo.publishers p on t.pub_id = p.pub_id
GROUP BY pub_name
HAVING p.pub_name IN (SELECT pub_name
                      FROM pubs.dbo.publishers
                      WHERE state <> 'CA')
Here the subquery returns the pub_name values for all publishers that have a state
value not equal to "CA". The HAVING condition then checks whether the pub_name of
each group is in the set returned by the subquery.
Correlated Subqueries
A correlated subquery is a SELECT statement nested inside another T-SQL statement
that contains a reference to one or more columns of the outer query. The correlated
subquery is therefore dependent on the outer query. This is the main difference
between a correlated subquery and a plain subquery. A plain subquery is not dependent
on the outer query, can be run independently of it, and will return a result set. A
correlated subquery, since it depends on the outer query, will fail with an error if
it is run by itself, because its references to outer query columns cannot be
resolved.
Conceptually, a correlated subquery is executed many times while the T-SQL statement
that contains it is processed: once for each candidate row selected by the outer
query. The outer query columns referenced in the correlated subquery are replaced
with the values from the candidate row prior to each execution. The result of each
execution determines whether the candidate row of the outer query is returned in the
final result set.
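To make the per-row evaluation visible, the sketch below places a correlated subquery in the column list, so the value it produces for each candidate row shows up in the result. The table variable @Details and its rows are hypothetical sample data.

```sql
-- Hypothetical sample data; a simplified stand-in for an order details table.
DECLARE @Details TABLE (OrderID int, ProductID int, Quantity int);
INSERT INTO @Details VALUES (1, 10, 100);
INSERT INTO @Details VALUES (2, 10, 2);
INSERT INTO @Details VALUES (3, 20, 5);
INSERT INTO @Details VALUES (4, 20, 5);

-- For each outer row, the inner query averages Quantity over the rows
-- that share the outer row's ProductID (the reference to OD.ProductID).
SELECT OD.OrderID,
       OD.ProductID,
       OD.Quantity,
       (SELECT AVG(I.Quantity)
        FROM @Details I
        WHERE I.ProductID = OD.ProductID) AS AvgForProduct
FROM @Details OD
ORDER BY OD.OrderID;
-- AvgForProduct is 51 for the ProductID 10 rows and 5 for the ProductID 20 rows.
```

Copying the inner SELECT out and running it alone fails, because OD.ProductID has no meaning outside the outer query.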
Using a Correlated Subquery in a WHERE Clause
Suppose you want a report of all OrderIDs where the customer did not purchase more
than 10% of the average quantity sold for a given product. This way you could review
these orders, and possibly contact the customers, to help determine whether there was
a reason for the low-quantity order. A correlated subquery in a WHERE clause can help
you produce this report. Here is a SELECT statement that produces the desired list of
OrderIDs:
SELECT DISTINCT OrderId
FROM Northwind.dbo.[Order Details] OD
WHERE Quantity <= (SELECT AVG(Quantity) * .1
                   FROM Northwind.dbo.[Order Details]
                   WHERE OD.ProductID = ProductID)
The correlated subquery in the above command is contained within the parentheses
that follow the comparison operator in the WHERE clause. The comparison is <=
because the report is about orders that did not exceed 10% of the average quantity.
Here you can see this correlated subquery contains a reference to "OD.ProductID".
This reference compares the outer query's "ProductID" with the inner query's
"ProductID". When this query is executed, the SQL engine will conceptually execute
the inner query, the correlated subquery, for each "[Order Details]" record. The
inner query calculates the average "Quantity" for the "ProductID" of the candidate
row being processed in the outer query. The WHERE clause then compares the candidate
row's "Quantity" with 10% of that average. If the condition is met, the row
identified by the outer query is placed in the record set that will be returned from
the complete T-SQL SELECT statement.
The code below is another example that uses a correlated subquery in the WHERE clause
to display the top two customers, based on the dollar amount associated with their
orders, per region. You might want to perform a query like this so you can reward
these customers, since they buy the most per region.
SELECT C1.CompanyName,
       C1.ContactName,
       C1.Address,
       C1.City,
       C1.Country,
       C1.PostalCode
FROM Northwind.dbo.Customers C1
WHERE C1.CustomerID IN (SELECT TOP 2 C2.CustomerId
                        FROM Northwind.dbo.[Order Details] OD
                        JOIN Northwind.dbo.Orders O on OD.OrderId = O.OrderID
                        JOIN Northwind.dbo.Customers C2 on O.CustomerID = C2.CustomerId
                        WHERE C2.Region = C1.Region
                        GROUP BY C2.Region, C2.CustomerId
                        ORDER BY SUM(OD.UnitPrice * OD.Quantity * (1 - OD.Discount)) DESC)
ORDER BY C1.Region
Here you can see the inner query is a correlated subquery because it references
"C1", which is the table alias for the "Northwind.dbo.Customers" table in the outer
query. The inner query uses the "Region" value of the row being processed in the
outer query to determine the top two customers for that region. If the "CustomerID"
of the outer row is one of those two customers, the record is placed in the record
set to be returned. Note that customers whose "Region" is NULL are never returned,
because the comparison C2.Region = C1.Region is not true when either side is NULL.
Correlated Subquery in the HAVING Clause
Say your organization wants to run a yearlong incentive program to increase revenue.
It advertises to customers that if every order they place during the year is over
$750, they will receive a rebate at the end of the year at the rate of $75 per order
placed. Below is an example of how to calculate the rebate amount. This example uses
a correlated subquery in the HAVING clause to identify the customers that qualify to
receive the rebate.
SELECT C.CustomerID,
       COUNT(*) * 75 Rebate
FROM Northwind.DBO.Customers C
JOIN Northwind.DBO.Orders O ON C.CustomerID = O.CustomerID
WHERE DATEPART(yy,OrderDate) = '1998'
GROUP BY C.CustomerId
HAVING 750 < ALL(SELECT SUM(UnitPrice * Quantity * (1 - Discount))
                 FROM Northwind.DBO.Orders O
                 JOIN Northwind.DBO.[Order Details] OD ON O.OrderID = OD.OrderID
                 WHERE O.CustomerID = C.CustomerId
                 AND DATEPART(yy,O.OrderDate) = '1998'
                 GROUP BY O.OrderId)
By reviewing this query, you can see that the correlated subquery in the HAVING
clause calculates the total order amount for each order of a customer. We use the
"CustomerID" from the outer query and the year of the order, "DATEPART(yy,OrderDate)",
to identify the Order records associated with each customer that were placed in 1998.
For these records we calculate the total order amount of each order by summing up all
of its "[Order Details]" records, using the formula
SUM(UnitPrice * Quantity * (1 - Discount)). If each and every 1998 order of a customer
has a total dollar amount greater than 750, the outer query calculates the rebate
amount using the formula "COUNT(*) * 75".
SQL Server's query engine only needs to evaluate the correlated subquery in the
HAVING clause for the customers identified by the outer query, that is, only for
customers that placed orders in 1998.
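The behaviour of the ALL quantifier can be sketched in isolation. In the batch below, the table variables @Customers and @OrderTotals are hypothetical, and the order totals are stored directly rather than computed from order detail rows.

```sql
-- Hypothetical sample data: precomputed order totals per customer.
DECLARE @Customers TABLE (CustomerID char(5));
DECLARE @OrderTotals TABLE (CustomerID char(5), OrderID int, Total money);
INSERT INTO @Customers VALUES ('AAAAA');
INSERT INTO @Customers VALUES ('BBBBB');
INSERT INTO @Customers VALUES ('CCCCC');   -- has no orders at all
INSERT INTO @OrderTotals VALUES ('AAAAA', 1, 800);
INSERT INTO @OrderTotals VALUES ('AAAAA', 2, 900);
INSERT INTO @OrderTotals VALUES ('BBBBB', 3, 800);
INSERT INTO @OrderTotals VALUES ('BBBBB', 4, 100);

-- 750 < ALL (...) is true only if every value returned by the subquery exceeds 750.
SELECT C.CustomerID
FROM @Customers C
WHERE 750 < ALL (SELECT T.Total
                 FROM @OrderTotals T
                 WHERE T.CustomerID = C.CustomerID)
ORDER BY C.CustomerID;
-- Returns AAAAA and CCCCC; BBBBB is excluded because of the 100 order.
```

CCCCC qualifies because ALL is true for an empty set. In the rebate query above this case does not arise, since the outer join to Orders with the same 1998 filter only produces customers that have at least one qualifying order.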
Performing an Update Statement Using a Correlated Subquery
A correlated subquery can even be used in an update statement. Here is an example:
create table A(A int, S int)
create table B(A int, B int)
set nocount on
insert into A(A) values(1)
insert into A(A) values(2)
insert into A(A) values(3)
insert into B values(1,1)
insert into B values(2,1)
insert into B values(2,1)
insert into B values(3,1)
insert into B values(3,1)
insert into B values(3,1)
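-- Inside the subquery, the unqualified A is column A of table B;
-- A.A refers to column A of the outer table A (the row being updated).
update A
set S = (select sum(B)
         from B
         where A.A = A
         group by A)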
select * from A
drop table A,B
A           S
----------- -----------
1           1
2           2
3           3
In the query above, we used the correlated subquery to update column S in table A
with the sum of column B in table B for the rows that have the same value in column
A as the row being updated.
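One detail of this pattern is that a row with no match in the inner table gets NULL, because the correlated subquery returns no value for it. The sketch below uses hypothetical temporary tables #A and #B, modelled on the tables above, and wraps the subquery in ISNULL as one possible way to store 0 instead. Whether 0 or NULL is the right value depends on what the column is meant to express.

```sql
-- Hypothetical temporary tables modelled on tables A and B above.
CREATE TABLE #A (A int, S int);
CREATE TABLE #B (A int, B int);
INSERT INTO #A (A) VALUES (1);
INSERT INTO #A (A) VALUES (2);
INSERT INTO #A (A) VALUES (4);   -- no matching rows in #B
INSERT INTO #B VALUES (1, 1);
INSERT INTO #B VALUES (2, 1);
INSERT INTO #B VALUES (2, 1);

-- Without ISNULL, S would be set to NULL for the row where A = 4.
UPDATE #A
SET S = ISNULL((SELECT SUM(B.B)
                FROM #B B
                WHERE B.A = #A.A), 0);

SELECT A, S FROM #A ORDER BY A;
-- Returns (1, 1), (2, 2), (4, 0).

DROP TABLE #A, #B;
```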
Conclusion
A subquery and a correlated subquery are SELECT queries coded inside another query,
known as the outer query. Both help determine the result set returned by the
complete query. A plain subquery, when executed independently of the outer query,
will return a result set and is therefore not dependent on the outer query. A
correlated subquery, by contrast, cannot be executed independently of the outer query
because it uses one or more references to columns in the outer query to determine its
result set. We hope that you now understand the difference between subqueries and
correlated subqueries, and how they can be used in your T-SQL code.