Find the movie with the highest number of awards in a given year - code duplication
21-12-2019
Question
I am trying to write a query (PostgreSQL) to get "the movie(s) with the highest number of awards in 2012."
I have the following tables:
CREATE TABLE Award(
ID_AWARD bigserial CONSTRAINT Award_pk PRIMARY KEY,
award_name VARCHAR(90),
category VARCHAR(90),
award_year integer,
CONSTRAINT award_unique UNIQUE (award_name, category, award_year));
CREATE TABLE AwardWinner(
ID_AWARD integer,
ID_ACTOR integer,
ID_MOVIE integer,
CONSTRAINT AwardWinner_pk PRIMARY KEY (ID_AWARD));
I wrote the following query, which gives the correct result, but I think it contains a lot of code duplication.
select * from
(select id_movie, count(id_movie) as awards
from Award natural join awardwinner
where award_year = 2012 group by id_movie) as SUB
where awards = (select max(count) from
(select id_movie, count(id_movie)
from Award natural join awardwinner
where award_year = 2012 group by id_movie) as SUB2);
So SUB and SUB2 are exactly the same subquery. Is there a better way to do this?
Solution
Get all winning movies
SELECT id_movie, awards
FROM (
SELECT aw.id_movie, count(*) AS awards
,rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
FROM award a
JOIN awardwinner aw USING (id_award)
WHERE a.award_year = 2012
GROUP BY aw.id_movie
) sub
WHERE rnk = 1;
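Major points
This should be simpler and faster than what has been suggested so far. Test with EXPLAIN ANALYZE. CTEs help to avoid code duplication in some cases, but not this time: a subquery does the job just fine and is typically faster.
You can run a window function over an aggregate function on the same query level. That is why this works: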
rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
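To see the aggregate-then-window order in isolation, here is a minimal sketch that runs without the real tables. The inline rows are made-up sample data (an assumption, not from the question): movies 10 and 20 tie with two awards each.

```sql
-- Hypothetical sample data standing in for awardwinner rows of one year.
WITH sample_winner(id_award, id_movie) AS (
   VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 30)
)
SELECT id_movie
     , count(*) AS awards
     , rank() OVER (ORDER BY count(*) DESC) AS rnk  -- evaluated after GROUP BY
FROM   sample_winner
GROUP  BY id_movie
ORDER  BY rnk, id_movie;
-- Result: (10, 2, 1), (20, 2, 1), (30, 1, 3)
```

Both tied movies get rnk = 1, which is why filtering on rnk = 1 returns every winner rather than an arbitrary one.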
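I suggest using explicit column names in the JOIN condition instead of NATURAL JOIN, which is prone to breakage if you later change or add columns in the underlying tables.
A JOIN condition with USING is almost as short, but does not break as easily.
Since id_movie cannot be NULL (ruled out by the JOIN condition and also part of the pk), it is shorter and slightly faster to use count(*) instead. Same result. Note that with the table definition as posted, only id_award is in the pk and id_movie is nullable, so this holds only if id_movie is in fact never NULL.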
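To illustrate the NATURAL JOIN concern above, here is a sketch of how a later schema change could alter the join silently. The created_at column is hypothetical and only serves as an example of a name that ends up in both tables.

```sql
-- Assumption: the same (hypothetical) audit column is later added to both tables.
ALTER TABLE award       ADD COLUMN created_at timestamptz;
ALTER TABLE awardwinner ADD COLUMN created_at timestamptz;

-- NATURAL JOIN now joins on (id_award, created_at): rows only match
-- when both columns are equal, so the result can shrink without any error.
SELECT count(*) FROM award NATURAL JOIN awardwinner;

-- USING names the join column explicitly and keeps joining on id_award only.
SELECT count(*) FROM award JOIN awardwinner USING (id_award);
```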
Just one movie
Shorter and faster yet, if you only need one winner:
SELECT aw.id_movie, count(*) AS awards
FROM award a
JOIN awardwinner aw USING (id_award)
WHERE a.award_year = 2012
GROUP BY 1
ORDER BY 2 DESC, 1 -- as tie breaker
LIMIT 1
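Using positional references (1, 2) here as shorthand.
I added id_movie to ORDER BY as a tie breaker in case multiple movies qualify for the win.
Other tips
You can use a common table expression to avoid code duplication: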
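For readers who find positional references hard to follow, the same query can be spelled out with names. This is only a sketch of an equivalent form, not a different technique: GROUP BY 1 refers to aw.id_movie and ORDER BY 2 to the awards output column.

```sql
SELECT aw.id_movie, count(*) AS awards
FROM   award a
JOIN   awardwinner aw USING (id_award)
WHERE  a.award_year = 2012
GROUP  BY aw.id_movie                -- same as GROUP BY 1
ORDER  BY awards DESC, aw.id_movie   -- same as ORDER BY 2 DESC, 1
LIMIT  1;
```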
with cte_s as (
select id_movie, count(id_movie) as awards
from Award natural join awardwinner
where award_year = 2012
group by id_movie
)
select
sub.id_movie, sub.awards
from cte_s as sub
where sub.awards = (select max(sub2.awards) from cte_s as sub2)
Or you can do something like this with a window function (not tested, but I think PostgreSQL allows this):
with cte_s as (
select
id_movie,
count(id_movie) as awards,
max(count(id_movie)) over() as max_awards
from Award natural join awardwinner
where award_year = 2012
group by id_movie
)
select id_movie
from cte_s
where max_awards = awards
Another way to do this is to use the rank() function (not tested, you may have to use two CTEs instead of one):
with cte_s as (
select
id_movie,
count(id_movie) as awards,
rank() over(order by count(id_movie) desc) as rnk
from Award natural join awardwinner
where award_year = 2012
group by id_movie
)
select id_movie
from cte_s
where rnk = 1
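Update: When I created this answer, my main goal was to show how to use a CTE to avoid code duplication. In general, it is better to avoid using a CTE more than once in a query if possible: the first query uses 2 table scans (or index seeks), while the second and third use only one, so I should have specified that it is better to go with those queries. Anyway, @Erwin made these tests in his answer. Just to add to his major points: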
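- I also advise against natural join because of its error-prone nature. Actually, my main RDBMS is SQL Server, which does not support it, so I am more used to explicit outer/inner join.
- It is a good habit to always use aliases in your queries, so you can avoid strange results.
- This could be a totally subjective thing, but usually if I use a table only to filter out rows from the main table of the query (as in this query, where we just want awards for the year 2012 and only filter rows from awardwinner), I prefer not to use join but exists or in instead; it seems more logical to me.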
with cte_s as (
select
aw.id_movie,
count(*) as awards,
rank() over(order by count(*) desc) as rnk
from awardwinner as aw
where
exists (
select *
from award as a
where a.id_award = aw.id_award and a.award_year = 2012
)
group by aw.id_movie
)
select id_movie
from cte_s
where rnk = 1
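The last point mentions in as an alternative to exists. A sketch of that variant, using the same tables and aliases as the query above, might look like this (untested against real data):

```sql
WITH cte_s AS (
   SELECT aw.id_movie
        , count(*) AS awards
        , rank() OVER (ORDER BY count(*) DESC) AS rnk
   FROM   awardwinner AS aw
   WHERE  aw.id_award IN (
             -- award is used only as a filter, never in the SELECT list
             SELECT a.id_award
             FROM   award AS a
             WHERE  a.award_year = 2012
          )
   GROUP  BY aw.id_movie
)
SELECT id_movie
FROM   cte_s
WHERE  rnk = 1;
```

Since id_award is the primary key of award, the subquery cannot return NULLs, so in and exists should select the same rows here.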
Don't you simply need something like this?
SELECT ID_MOVIE, COUNT(*)
FROM AwardWinner
JOIN Award ON Award.ID_AWARD = AwardWinner.ID_AWARD
WHERE award_year = 2012
GROUP BY ID_MOVIE
ORDER BY COUNT(*) DESC;
Or possibly (depending on what you are looking for):
SELECT ID_MOVIE, COUNT(DISTINCT AwardWinner.ID_AWARD)
FROM AwardWinner
JOIN Award ON Award.ID_AWARD = AwardWinner.ID_AWARD
WHERE award_year = 2012
GROUP BY ID_MOVIE
ORDER BY COUNT(*) DESC;