How to take make OFFSET & LIMIT with COUNT (OVER?) when having a subquery?
-
05-02-2021 - |
Pregunta
It appears I have a case I can't quite wrap my brains around. So coming here in hopes to find pointers to a query that maybe could be helpful to someone else too.
In the following, I have a query that functions correctly as far returning results goes but requires a second query that is the same as the one presented here but without OFFSET
and the output is just a COUNT(*)
of all of the rows.
I have two objectives:
- Write the query so that
COUNT(*)
is returned in the same query. Indeed I have been looking at help pieces such as the excellent SQL SERVER – How to get total row count from OFFSET / FETCH NEXT (Paging) with different ways of solving the problem, but then there's another piece... - Rewrite the join with a window function (e.g.
OVER(PARTITION BY
) or some more performant way as that query seem to generate anINDEX SCAN
andINDEX SEEK
on the table. The real query is a bit more complicated in theWHERE
part, but it looks to me even one scan could be enough if the query were a bit more straightforward so that theCOUNT
andMAX
could be had simultaneously with the outer query. Even this would be a win, but combined with having the overallCOUNT
would be even bigger.
Maybe I'm trying to chew a teeny bit more than I can chew currently, but on the other hand, maybe there is now a chance to learn something.
Here are the table and data
CREATE TABLE Temp
(
Id INT NOT NULL PRIMARY KEY,
Created INT NOT NULL,
ParentId INT,
SomeInfo INT NOT NULL,
GroupId INT NOT NULL
CONSTRAINT FK_Temp FOREIGN KEY(ParentId) REFERENCES Temp(Id)
);
-- Some root levels nodes.
INSERT INTO Temp VALUES(1, 1, NULL, 1, 1);
INSERT INTO Temp VALUES(2, 2, NULL, 2, 2);
INSERT INTO Temp VALUES(3, 3, NULL, 1, 3);
INSERT INTO Temp VALUES(13, 13, NULL, 1, 1);
-- First order child nodes.
INSERT INTO Temp VALUES(4, 4, 1, 2, 1);
INSERT INTO Temp VALUES(5, 5, 2, 1, 2);
INSERT INTO Temp VALUES(6, 6, 3, 2, 3);
-- Second order child nodes.
INSERT INTO Temp VALUES(7, 7, 4, 1, 1);
INSERT INTO Temp VALUES(8, 8, 5, 2, 2);
INSERT INTO Temp VALUES(9, 9, 6, 1, 3);
SELECT
Id,
newestTable.SomeInfo,
newestTable.Created,
CASE WHEN newestTable.RootCount > 1 THEN 1 ELSE 0 END AS IsMulti
FROM
Temp as originalTable
INNER JOIN
(
SELECT
SomeInfo,
Max(Created) AS Created,
Count(*) AS RootCount
FROM
Temp
WHERE ParentId IS NULL AND GroupId = 1
GROUP BY SomeInfo
) AS newestTable ON originalTable.SomeInfo = newestTable.SomeInfo AND originalTable.Created = newestTable.Created
/*WHERE
(
originalTable.SomeInfo = 1
)*/
ORDER BY newestTable.Created ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;
P.S. Also How to apply outer limit offset and filters in the subquery to avoid grouping over the complete table used in subquery in Postgresql looks interesting.
<edit:
It looks like
SELECT
Id,
SomeInfo,
GroupId,
ParentId,
MAX(Created) OVER(PARTITION BY SomeInfo) AS Created,
COUNT(Id) OVER(PARTITION BY SomeInfo) AS RootCount,
CASE WHEN COUNT(Id) OVER(PARTITION BY SomeInfo) > 1 THEN 1 ELSE 0 END AS IsMulti
FROM
Temp
WHERE
(
GroupId = 1 AND ParentId IS NULL
)
ORDER BY Created ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;
gets close to there. The problem is, though, there are now two result rows and it appears to me this is due to the original INNER JOIN
joining back to Temp
that cull it to one row. I wonder if there is a way to apply the conditions somehow either before or after the windowing to match more closely the original query. (And this isn't the same query, to be clear. There's just so little data, hence the queries look like being close to each other.)
Solución
So it looks like to me what you are missing is the "Return only the top Created
record for each instance". So you are getting all rows, and then watever its top Created
value is for the same SomeInfo
record. Unfortunately you can't just add the MAX(Created) = Created
into the base WHERE
clause.
If you just wrap the whole thing in a CTE
you can then just add a MAX(Created) = Created
into the WHERE
and get what you are looking for (not that i think CTE
's are the anwer for everything).
WITH CTE (ID, SomeInfo, GroupID, ParentID, Created, MaxCreated, RootCount, IsMulti)
AS
(
SELECT
Id,
SomeInfo,
GroupId,
ParentId,
Created,
MAX(Created) OVER(PARTITION BY SomeInfo) AS MaxCreated,
COUNT(Id) OVER(PARTITION BY SomeInfo) AS RootCount,
CASE WHEN COUNT(Id) OVER(PARTITION BY SomeInfo) > 1 THEN 1 ELSE 0 END AS IsMulti
FROM
Temp
)
SELECT ID, SomeInfo, GroupID, ParentID, MaxCreated AS Created, RootCount, IsMulti
FROM CTE
WHERE
(
GroupId = 1
AND ParentId IS NULL
AND Created = MaxCreated
)
ORDER BY MaxCreated ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;
In my quick test it has the same execution plan and does not take any additional execution time (See execution plan added below). (now it is a small result set so it is probably something you will still need to test with.)
Hopefully that is more of what you are looking for.