25M x 25M inner join (PostgreSQL) performance
02-11-2019
Question
I have a one-time need to do an inner join of 25M rows against 25M rows. The box is an Alienware Area-51 with 4 cores, 25GB of memory, and a SATA drive (non-system disk). So far the query has been running for 22 hours. I created a btree index on the ID (bigint) column used for the join, on both tables. Any tips? How long do you think I will have to wait?
EXPLAIN SELECT
public.products_by_location_mv.id,
public.products_by_location_mv."data_object.unique_id",
public.products_by_location_mv.location AS outline,
public.products_by_location_mv.elevation_ft,
public.products_by_location_mv."geo_product.geo_product_id" AS pid,
public.products_by_location_mv.cntry_name,
public.products_by_location_mv.product_name,
public.products_by_location_mv.product_type,
public.products_by_location_mv.product_producer,
public.products_by_location_mv.product_size,
public.products_by_location_mv.do_location,
public.products_by_location_mv.product_location,
public.obj4.uid AS oid,
public.obj4.size_bytes,
public.obj4.object_date,
public.obj4.description,
public.obj4.location AS path
INTO
public.inventory0
FROM
public.obj4
INNER JOIN
public.products_by_location_mv
ON
(
public.obj4.id = public.products_by_location_mv.id);

Hash Join  (cost=3825983.03..12908235.27 rows=24202368 width=1356)
  Hash Cond: (products_by_location_mv.id = obj4.id)
  ->  Seq Scan on products_by_location_mv  (cost=0.00..1457298.68 rows=24202368 width=721)
  ->  Hash  (cost=1414691.68..1414691.68 rows=25507868 width=643)
        ->  Seq Scan on obj4  (cost=0.00..1414691.68 rows=25507868 width=643)
No correct solution
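As a rough sanity check on the plan above, the planner's own estimates can be multiplied out to see how much data each step handles. The sketch below is only arithmetic on the `rows` and `width` figures from the EXPLAIN output. The names `PLAN_NODES`, `estimated_bytes`, and `RAM_GIB` are illustrative, and the widths are the planner's average tuple widths, not measured on-disk sizes.

```python
# Row counts and average row widths (bytes) copied from the EXPLAIN output.
PLAN_NODES = {
    "obj4 (hash build side)": (25_507_868, 643),
    "products_by_location_mv (probe side)": (24_202_368, 721),
    "join output (inventory0)": (24_202_368, 1356),
}

GIB = 1024 ** 3
RAM_GIB = 25  # memory of the box, as stated in the question


def estimated_bytes(rows, width):
    """Planner estimate of raw tuple data: row count times average width."""
    return rows * width


for name, (rows, width) in PLAN_NODES.items():
    size_gib = estimated_bytes(rows, width) / GIB
    note = "exceeds RAM" if size_gib > RAM_GIB else "below RAM"
    print(f"{name}: {size_gib:.1f} GiB ({note})")
```

Under these estimates the hash side alone comes to roughly 15 GiB. Unless `work_mem` has been raised far above its 4MB default (the question does not say), a hash table of that size cannot be held in memory at once, so the join would be expected to run in batches through temporary files. The `SELECT ... INTO` also has to write on the order of 30 GiB of new rows, which is more than the machine's RAM and all lands on the same SATA disk.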
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange