Question

Disclaimer: I am new to Postgres and I need some advice...

I have a huge table full of poker hands (it will contain 20M records; right now there are 2M in it):

aggregated_hands

I have a pivot table which connects a given hand with an action. Hands and actions have an M:M relationship (the pivot will hold 30M records; right now there are 3M in it):

aggregated_hand_preflop_action 

The task is simple: from time to time I have to count rows in the hands table based on some search parameters. The counting, I believe, is the issue: it makes everything slow. Right now I cache the results and it works "fine", but for the future I would like to know the best way to tune or architect something like this.

The query:

 select "cards", count(*) as "count"
 from "aggregated_hands"
 inner join "aggregated_hand_preflop_action"
     on "aggregated_hands"."id" = "aggregated_hand_preflop_action"."aggregated_hand_id"
 where "aggregated_hand_preflop_action"."preflop_action_id" = 1
   and "tag_name" = 'reg'
   and "effective" >= 1
   and "effective" <= 25
 group by "cards"

explain:

"HashAggregate  (cost=124924.83..124925.57 rows=74 width=3) (actual time=1219.146..1219.170 rows=169 loops=1)"
"  Group Key: aggregated_hands.cards"
"  Buffers: shared hit=13352 read=23346, temp read=4182 written=4152"
"  ->  Hash Join  (cost=99870.62..124348.71 rows=115225 width=3) (actual time=892.849..1194.205 rows=93671 loops=1)"
"        Hash Cond: (aggregated_hand_preflop_action.aggregated_hand_id = aggregated_hands.id)"
"        Buffers: shared hit=13352 read=23346, temp read=4182 written=4152"
"        ->  Bitmap Heap Scan on aggregated_hand_preflop_action  (cost=5765.09..20921.74 rows=265892 width=4) (actual time=20.774..60.710 rows=270934 loops=1)"
"              Recheck Cond: (preflop_action_id = 1)"
"              Heap Blocks: exact=1199"
"              Buffers: shared hit=1199 read=943"
"              ->  Bitmap Index Scan on aggregated_hand_preflop_action_preflop_action_id_index  (cost=0.00..5698.62 rows=265892 width=0) (actual time=20.628..20.628 rows=270934 loops=1)"
"                    Index Cond: (preflop_action_id = 1)"
"                    Buffers: shared read=943"
"        ->  Hash  (cost=76901.14..76901.14 rows=1048591 width=7) (actual time=871.933..871.933 rows=1059259 loops=1)"
"              Buckets: 16384  Batches: 16  Memory Usage: 2603kB"
"              Buffers: shared hit=12153 read=22403, temp written=3387"
"              ->  Seq Scan on aggregated_hands  (cost=0.00..76901.14 rows=1048591 width=7) (actual time=0.013..652.702 rows=1059259 loops=1)"
"                    Filter: ((effective >= 1::double precision) AND (effective <= 25::double precision) AND ((tag_name)::text = 'reg'::text))"
"                    Rows Removed by Filter: 1360469"
"                    Buffers: shared hit=12153 read=22403"
"Planning time: 0.288 ms"
"Execution time: 1219.413 ms"

Inserts into these tables are unimportant (they happen once per month); only the query side matters.

CREATE TABLE aggregated_hands
(
    id serial NOT NULL,
    id_hand integer NOT NULL,
    id_site integer NOT NULL,
    player_name character varying(255) NOT NULL,
    tag_name character varying(255) NOT NULL,
    cards character varying(255) NOT NULL,
    action character varying(255) NOT NULL DEFAULT ''::character varying,
    pos integer NOT NULL,
    effective double precision NOT NULL,
    nums integer NOT NULL,
    bi integer NOT NULL,
    date_played timestamp(0) without time zone NOT NULL,
    created_at timestamp(0) without time zone NOT NULL,
    updated_at timestamp(0) without time zone NOT NULL,
    player_search character varying(255) NOT NULL DEFAULT ''::character varying,
    CONSTRAINT aggregated_hands_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);

CREATE INDEX cards_index
ON aggregated_hands
USING btree
(cards COLLATE pg_catalog."default");

CREATE INDEX effective_index
ON aggregated_hands
USING btree
(effective);

CREATE INDEX player_search_index
ON aggregated_hands
USING btree
(player_search COLLATE pg_catalog."default");

CREATE INDEX stat_index
ON aggregated_hands
USING btree
(tag_name COLLATE pg_catalog."default", effective, cards COLLATE pg_catalog."default");

CREATE INDEX stat_index2
ON aggregated_hands
USING btree
(id, tag_name COLLATE pg_catalog."default", effective, cards COLLATE pg_catalog."default");

CREATE INDEX tag_name_index
ON aggregated_hands
USING btree
(tag_name COLLATE pg_catalog."default");

CREATE TABLE aggregated_hand_preflop_action
(
    aggregated_hand_id integer NOT NULL,
    preflop_action_id integer NOT NULL,
    CONSTRAINT aggregated_hand_preflop_action_pkey PRIMARY KEY (aggregated_hand_id, preflop_action_id),
    CONSTRAINT aggregated_hand_preflop_action_aggregated_hand_id_foreign FOREIGN KEY (aggregated_hand_id)
        REFERENCES aggregated_hands (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE CASCADE,
    CONSTRAINT aggregated_hand_preflop_action_preflop_action_id_foreign FOREIGN KEY (preflop_action_id)
        REFERENCES preflop_actions (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
    OIDS=FALSE
);
CREATE INDEX aggregated_hand_preflop_action_aggregated_hand_id_index
ON aggregated_hand_preflop_action
USING btree
(aggregated_hand_id);

CREATE INDEX aggregated_hand_preflop_action_preflop_action_id_index
ON aggregated_hand_preflop_action
USING btree
(preflop_action_id);

Update: Thanks for the quick turnaround. More info:

  • Slow means more than a second; it has to be super quick.
  • Right now there are 2M records in the database, but there will be 20M.
  • Uploads happen only once per month, so these are query-only tables.
  • I can scale the system and configure the server however needed. I am hosting at DigitalOcean, currently with 4GB of RAM and 2 processors.
  • I am expecting 2-3 queries per second against this table in daily usage.
  • Yes, as I said, I cache the results per query, but whenever the table is updated I have to rebuild the cache, so the query still needs to be quick.
  • I am asking these questions in advance so I still have time to rethink and re-optimize the solution. This is in a PoC phase, so I am not afraid to change anything if it is needed and reasonable.
  • I can tweak any server config or memory config files.
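On the server-config side, one concrete knob is suggested by the plan itself: the `Batches: 16` and `temp read/written` lines mean the hash join is spilling to disk because `work_mem` is too small for the hash table. A hedged sketch; the 64MB figure is an assumption to experiment with on a 4GB box, not a recommendation:

```sql
-- Assumption: 64MB is only a starting point. Re-run
-- EXPLAIN (ANALYZE, BUFFERS) and check whether "Batches" drops to 1
-- and the "temp read/written" lines disappear.
SET work_mem = '64MB';               -- per-session, for testing

-- To make it permanent (requires superuser):
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();
```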

Update2:

  • Updated the numbers (2M → 20M, etc.), including the pivot table. I guess I could get rid of the constraints, but I do not know how much they cost during a select (probably nothing), so the tweak must be somewhere else :)

Update3:

  • Is there any way to create an index on the pivot table and make the query use it? Or is the only option to migrate the pivot rows into columns, like preflop_action_1_flg boolean, preflop_action_2_flg boolean, etc.? There are not that many of them (13 right now, and at most around 30, I guess). If I do this, I can create an individual index per flag: (cards, effective, tag) where preflop_action_1_flg is true, (cards, effective, tag) where preflop_action_2_flg is true, and so on. But do I really have to do this, or can I make it work with the current design?

Solution 2

So, I checked a lot of articles on this. I did not get any advice beyond restructuring: building the indexes one by one, group by group. So I flattened out the pivot table by adding:

-- adding columns to aggregated_hands:
pa_flag_1 boolean
pa_flag_2 boolean
...

-- adding indexes to aggregated_hands
('pa_flag_1', 'tag_name', 'effective', 'cards') => pa_flag_1_index
('pa_flag_2', 'tag_name', 'effective', 'cards') => pa_flag_2_index

-- doing the migration stuff
update aggregated_hands set pa_flag_1 = true where id in (select aggregated_hand_id from aggregated_hand_preflop_action where preflop_action_id = 1);
update aggregated_hands set pa_flag_2 = true where id in (select aggregated_hand_id from aggregated_hand_preflop_action where preflop_action_id = 2);
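Spelled out as DDL, the flag columns and indexes sketched above might look like this (an assumption on my part; the NOT NULL DEFAULT false choice is mine, not from the original migration):

```sql
-- One boolean flag per preflop action (defaulting to false is assumed).
ALTER TABLE aggregated_hands ADD COLUMN pa_flag_1 boolean NOT NULL DEFAULT false;
ALTER TABLE aggregated_hands ADD COLUMN pa_flag_2 boolean NOT NULL DEFAULT false;

-- Composite indexes mirroring the filter order of the query below.
CREATE INDEX pa_flag_1_index ON aggregated_hands (pa_flag_1, tag_name, effective, cards);
CREATE INDEX pa_flag_2_index ON aggregated_hands (pa_flag_2, tag_name, effective, cards);
```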

And... that's it. Because it is a query-only table, it will grow but remain manageable. Query time went from 1500ms down to 100ms, which is a win for me:

select cards cards, count(*) count 
from aggregated_hands ahs
where 1=1
and tag_name = 'reg'
and effective between 1 and 25
and pa_flag_1 is true
group by cards;
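Since each flag is only ever tested for true, a partial index could be a smaller alternative to the full composite ones; this is a sketch of my own, not part of the original migration:

```sql
-- Only rows where pa_flag_1 is set are indexed, so the flag column
-- can be dropped from the key entirely; the index stays much smaller
-- when the 1's are sparse.
CREATE INDEX pa_flag_1_partial_index
    ON aggregated_hands (tag_name, effective, cards)
    WHERE pa_flag_1;
```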

Please, if anyone has additional comments, do not hesitate to share them...

Other suggestions

I would begin with a change like this:

select cards cards, count(*) count 
from aggregated_hands ahs
where exists
    (select *
     from aggregated_hand_preflop_action apa
     where ahs.id = apa.aggregated_hand_id 
     and apa.preflop_action_id = 1)
and tag_name = 'reg'
and effective between 1 and 25
group by cards;

But it can also depend on how sparse the 1's are in the aggregated_hand_preflop_action table.
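The EXISTS subquery above is exactly the shape a composite index on the pivot table serves well. A hedged sketch: the primary key already covers (aggregated_hand_id, preflop_action_id), but the reverse column order matches this lookup and can enable an index-only scan:

```sql
-- Reverse-order composite: fetch all hand ids for one action directly
-- from the index, without visiting the heap.
CREATE INDEX aggregated_hand_preflop_action_action_hand_index
    ON aggregated_hand_preflop_action (preflop_action_id, aggregated_hand_id);
```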

Measure first, then add this index:

create index on aggregated_hands (cards) where tag_name = 'reg' and effective between 1 and 25;

If that index doesn't show up in your explain, or you are not gaining any speed from it, get rid of it.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange