Question

One of my jobs is to maintain our database. We usually have performance trouble when generating reports and working with that database.
When I started looking at the queries our ERP sends to the database, I saw a lot of completely needless subselects inside the main queries.
As I am not one of the developers who created the program we use, they do not much like it when I criticize their code and work. Let's say they do not take my reviews seriously. So I am asking you a few questions about subselects in SQL:

Do subselects take a lot more time than left outer joins?
Is there any blog, article or anything else that recommends against using subselects?
How can I prove that a query will be faster if we avoid subselects in it?

Our database server is MS SQL Server 2005.

Solution

"Show, Don't Tell" - Use SQL Profiler to identify the slow queries, then examine and compare their query plans. Look out in particular for table scans and bookmark lookups (you want to see index seeks as often as possible). The 'goodness of fit' of a query plan depends on up-to-date statistics, on which indexes are defined, and on the overall query workload.

Run the queries in SQL Server Management Studio (SSMS) and turn on Query->Include Actual Execution Plan (CTRL+M)

Think yourself lucky they're only subselects (for which the optimiser will in some cases produce equivalent 'join plans') and not correlated sub-queries!

Identify a query that performs a high number of logical reads, rewrite it using your preferred technique, and then show how few logical reads it does by comparison.
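As a rough illustration of such a rewrite, here is a minimal sketch. The tables @Customers and @Orders and their columns are hypothetical and only stand in for whatever the ERP actually queries; table variables are used so the batch runs on its own. It assumes CustomerID is unique in the customer table, otherwise the two forms are not equivalent.

```sql
-- Hypothetical sample data (not taken from the question).
DECLARE @Customers TABLE (CustomerID INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);
DECLARE @Orders    TABLE (OrderID INT PRIMARY KEY, CustomerID INT NOT NULL);

INSERT INTO @Customers VALUES (1, 'Alice');
INSERT INTO @Customers VALUES (2, 'Bob');
INSERT INTO @Orders VALUES (10, 1);
INSERT INTO @Orders VALUES (11, 1);

-- Version with a subselect in the SELECT list (evaluated per customer row).
SELECT  c.CustomerID, c.Name,
        (SELECT COUNT(*) FROM @Orders o WHERE o.CustomerID = c.CustomerID) AS OrderCount
FROM    @Customers c;

-- Rewrite with a LEFT JOIN and GROUP BY.
-- COUNT(o.OrderID) counts only matched rows, so customers without orders get 0.
SELECT  c.CustomerID, c.Name, COUNT(o.OrderID) AS OrderCount
FROM    @Customers c
LEFT JOIN
        @Orders o
ON      o.CustomerID = c.CustomerID
GROUP BY
        c.CustomerID, c.Name;
```

Both statements return the same rows here. Whether the rewrite reads fewer pages on real tables is exactly what the logical-read comparison is meant to show, so it should not be taken for granted.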

Here's a tip. To get the total number of logical reads performed, wrap the query in question with:

SET STATISTICS IO ON
GO

-- Run your query here

SET STATISTICS IO OFF
GO

Run your query, and switch to the messages tab in the results pane.

If you are interested in learning more, there is no better book than SQL Server 2008 Query Performance Tuning Distilled, which covers the essential techniques for monitoring, interpreting and fixing performance issues.

OTHER TIPS

One thing you can do is to load SQL Profiler and show them the cost (in terms of CPU time, reads and writes) of the sub-queries. It's tough to argue with cold, hard statistics.
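If running a Profiler trace is not practical, a different way to get similar numbers (not the Profiler method itself) is to query the plan-cache statistics that SQL Server 2005 exposes. This is only a sketch: it assumes you have VIEW SERVER STATE permission, and it covers only statements whose plans are still in the cache.

```sql
-- Top 10 cached statements by total logical reads since their plans were cached.
SELECT TOP 10
        qs.execution_count,
        qs.total_logical_reads,
        qs.total_logical_writes,
        qs.total_worker_time,      -- CPU time, reported in microseconds
        qs.total_elapsed_time,     -- also in microseconds
        st.text AS batch_text      -- text of the batch containing the statement
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY
        sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY
        qs.total_logical_reads DESC;
```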

I would also check the query plan for these queries to make sure appropriate indexes are being used, and table/index scans are being held to a minimum.

In general, I wouldn't say sub-queries are bad, if used correctly and the appropriate indexes are in place.

I'm not very familiar with MSSQL, as we use PostgreSQL in most of our applications. However, there should be something like "EXPLAIN" that shows you the execution plan for a query. There you should be able to see the steps the database takes to retrieve the requested data.

If you see a lot of table scans or loop joins without any index usage there, that is a strong hint that the query will be slow. With such a tool you should be able to compare the two queries (one with the join, the other with the subselect).
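On SQL Server, one counterpart to EXPLAIN is SET SHOWPLAN_TEXT, which returns the estimated plan as text without executing the statement. A minimal sketch, using the system catalog views only so that it runs on any database; the query itself is just a placeholder for one of your own:

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO.
SET SHOWPLAN_TEXT ON;
GO

-- While SHOWPLAN_TEXT is ON, statements are compiled but not executed;
-- the result set contains the plan operators instead of data.
SELECT  o.object_id, o.name
FROM    sys.objects o
WHERE   o.object_id IN
        (
        SELECT  c.object_id
        FROM    sys.columns c
        );
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Running the subselect version and the join version this way lets you check whether the optimizer produces the same operators for both.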

It is difficult to say which way is better, because it depends heavily on the indexes the optimizer can use in each case. Depending on the DBMS, the optimizer may also be able to rewrite a subquery into a join implicitly and execute that instead.

If you really want to show which is better, you have to execute both and measure the time, CPU usage and so on.
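On SQL Server such a measurement could look roughly like the sketch below. The two queries are placeholders against the system catalog so that the batch runs anywhere; substitute the real pair you want to compare. The DISTINCT in the join version is needed because an object has many columns.

```sql
SET STATISTICS TIME ON;   -- reports CPU time and elapsed time per statement
SET STATISTICS IO ON;     -- reports logical and physical reads per table
GO

-- Version A: subselect
SELECT  o.object_id, o.name
FROM    sys.objects o
WHERE   o.object_id IN
        (
        SELECT  c.object_id
        FROM    sys.columns c
        );
GO

-- Version B: join
SELECT  DISTINCT o.object_id, o.name
FROM    sys.objects o
JOIN    sys.columns c
ON      c.object_id = o.object_id;
GO

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
```

The figures appear on the Messages tab in SSMS. Run each version several times, since the first run may include compilation and disk reads that later runs do not.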

UPDATE: Probably it is this one for MSSQL -->QueryPlan

In my experience both methods can be valid. For example, an EXISTS subselect can avoid a lot of processing, because it can stop as soon as the first matching row is found.
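A minimal sketch of that case, using hypothetical table variables @Customers and @Orders (not from the question) so the batch runs on its own:

```sql
-- Hypothetical sample data.
DECLARE @Customers TABLE (CustomerID INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);
DECLARE @Orders    TABLE (OrderID INT PRIMARY KEY, CustomerID INT NOT NULL);

INSERT INTO @Customers VALUES (1, 'Alice');
INSERT INTO @Customers VALUES (2, 'Bob');
INSERT INTO @Orders VALUES (10, 1);
INSERT INTO @Orders VALUES (11, 1);

-- EXISTS: for each customer the engine only needs to find one matching order.
SELECT  c.CustomerID, c.Name
FROM    @Customers c
WHERE   EXISTS
        (
        SELECT  1
        FROM    @Orders o
        WHERE   o.CustomerID = c.CustomerID
        );

-- Join form: every matching order is joined first, so DISTINCT is needed
-- to collapse the duplicates again.
SELECT  DISTINCT c.CustomerID, c.Name
FROM    @Customers c
JOIN    @Orders o
ON      o.CustomerID = c.CustomerID;
```

Both statements return only Alice with this data. Which one is cheaper on real tables depends on the plan the optimizer picks, so compare the plans rather than assume.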

But most of the time, queries with a lot of subselects are written by developers who do not really understand SQL and who apply their classic procedural way of thinking to queries. They do not even think about joins, and they produce some awful queries. So I prefer joins, and I always check subqueries. To be completely honest, I track slow queries, and my first attempt on a slow query containing subselects is to rewrite it with joins. That works a lot of the time.

But there is no rule that says subselects are bad or slower than joins. It's just that bad SQL programmers often write subselects :-)

Do subselects take a lot more time than left outer joins?

This depends on the particular subselect and the particular left outer join.

Generally, this construct:

SELECT  *
FROM    mytable
WHERE   mycol NOT IN
        (
        SELECT  othercol
        FROM    othertable
        )

is more efficient than this:

SELECT  m.*
FROM    mytable m
LEFT JOIN
        othertable o
ON      o.othercol = m.mycol
WHERE   o.othercol IS NULL
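One caveat when swapping between these two forms: they only return the same rows when othercol cannot be NULL. The sketch below uses table variables with the same names as above, plus made-up sample values, to show the difference.

```sql
DECLARE @mytable    TABLE (mycol INT NULL);
DECLARE @othertable TABLE (othercol INT NULL);

INSERT INTO @mytable VALUES (1);
INSERT INTO @mytable VALUES (2);
INSERT INTO @othertable VALUES (1);
INSERT INTO @othertable VALUES (NULL);

-- NOT IN: "2 NOT IN (1, NULL)" evaluates to UNKNOWN, so no rows are returned.
SELECT  *
FROM    @mytable
WHERE   mycol NOT IN
        (
        SELECT  othercol
        FROM    @othertable
        );

-- LEFT JOIN / IS NULL: returns the row with mycol = 2.
SELECT  m.*
FROM    @mytable m
LEFT JOIN
        @othertable o
ON      o.othercol = m.mycol
WHERE   o.othercol IS NULL;

-- NOT EXISTS behaves like the LEFT JOIN form here: returns mycol = 2.
SELECT  m.*
FROM    @mytable m
WHERE   NOT EXISTS
        (
        SELECT  1
        FROM    @othertable o
        WHERE   o.othercol = m.mycol
        );
```

So before comparing runtimes, check that the columns involved are declared NOT NULL, or that the rewritten query really is meant to return the same result.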

See here:

Is there any blog, article or anything else that recommends against using subselects?

I would steer clear of blogs that blindly recommend avoiding subselects.

They are implemented for a reason and, believe it or not, the developers have put some effort into optimizing them.

How can I prove that a query will be faster if we avoid subselects in it?

Write a query without the subselects which runs faster.

If you post your query here, we may be able to improve it. However, the version with the subselects may turn out to be the faster one.

Try rewriting some of the queries to eliminate the sub-select and compare runtimes.

Share and enjoy.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow