Question

I executed an update statement along the following lines yesterday:

UPDATE MainTable
Set SomeField = SubsetTable.SomeField
where MainTable.MainTableKey = SubsetTable.MainTableKey

where SubsetTable is a subset of MainTable and has the same Primary Key field. MainTable has roughly 200m records, SubsetTable has 5m records. MainTableKey is a GUID.

Both of these table have a clustered index on MainTableKey.

When I executed this query the first time it took a whopping 14 hours.

Then I added a non-clustered index to MainTableKey on both tables. Now it takes 30 minutes.

Does anyone have any ideas on why the performance gain would be so dramatic?

Était-ce utile?

La solution

I bet if you look at the execution plans:

The first query is a merge join which involved reading both tables completely.

200M rows + 5M rows = 205M rows.
205M rows / 14 hours = 4067 rows per second.

The second query is a nested loop join which reads the whole small table and seeks into the large table for each small table row.

5M + 5M rows = 10M rows.
10M rows / 40 minutes = 4166rows per second.

That these rates are approximately equal, supports my theory about which rows are read.

You don't have to guess: Run the queries with SET STATISTICS IO ON, and/or look at the execution plans.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top