Identifying rows which don't match a master row
22-10-2019
Problem
I'm comparing a bunch of tables from different databases on different servers to a master record. I need to know which servers, identified by locationID, have the non-matching rows, because they might need maintenance.
I've got a simple EXCEPT query: table1 has one row per server with all of its configuration plus a locationID column that tells me which server it is. I compare these rows to a table1_master table, which has the right settings, but I exclude the locationID since it won't match.
Simple query below:
SELECT <every column except locationID>
FROM table1
EXCEPT
SELECT <every column except locationID>
FROM table1_master
There's only one master row I compare all servers to, and I don't select its locationID here.
This is an example of the rows I'm comparing. Each row has a primary key, a single varchar column, and a giant list of settings that's dozens of columns wide. I want to compare all columns except LocationID, but I need LocationID to identify the rows.
LocationID  setting  setting  setting  setting  setting
CS02        C        Y        Y        Y        Y
CS03        C        Y        Y        Y        Y
CS06        C        Y        N        Y        Y
In this example, say CS02 is my master record: since all settings are the same in CS02 and CS03, those rows don't show up, but CS06's does. But my EXCEPT query doesn't select LocationID, so I don't actually know which row was returned.
The query returns the rows I need but NOT the locationID, so I don't know which rows are wrong. Is there any way I can include locationID in the result set while kicking out the matching rows?
The solution I thought of was to make a row for each server in the table1_master table, so each locationID is represented but every row has the same data otherwise. My EXCEPT query should then return the locationID along with my info, but is that the best way to do it?
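For reference, that workaround can be sketched without actually duplicating rows in the master table, by expanding the single master row once per server at query time (a sketch, assuming hypothetical setting columns setting1..setting3):

```sql
-- Expand the one master row to every locationID, then EXCEPT.
-- setting1..setting3 stand in for the real column list.
;WITH master_per_location AS
(
    SELECT t.locationID, m.setting1, m.setting2, m.setting3
    FROM (SELECT DISTINCT locationID FROM dbo.table1) AS t
    CROSS JOIN dbo.table1_master AS m
)
SELECT locationID, setting1, setting2, setting3
FROM dbo.table1
EXCEPT
SELECT locationID, setting1, setting2, setting3
FROM master_per_location;
```

Because every expanded master row carries a real locationID, the EXCEPT now returns the identifier along with the mismatched settings.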
Solution
You can also do this with dynamic SQL without having to manually build out all the column names.
DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);
SELECT @c1 = N'', @c2 = N'';

-- Build a comma-separated column list (@c1) and a join predicate (@c2)
-- from every column in table1 except LocationID:
SELECT
  @c1 = @c1 + ',' + QUOTENAME(name),
  @c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
FROM sys.columns
WHERE name <> 'LocationID'
  AND [object_id] = OBJECT_ID('dbo.table1');

-- s = rows of table1 that differ from the master row; join back to
-- table1 on every setting column to recover the LocationID.
SET @sql = N';WITH s AS (
  SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
  EXCEPT
  SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
' + @c2;

SELECT @sql;
--EXEC sp_executesql @sql;
You can take the output of this query as-is and store the generated query somewhere, or you can comment out the SELECT, uncomment the EXEC, and leave it as permanent dynamic SQL; in that case it will automatically adapt to column changes in the two tables.
Another idea, assuming LocationID is unique. It also occurred to me that you may want to include the master row so you can quickly spot the columns that are different:
;WITH c AS
(
  -- Pair every LocationID with the master's settings
  SELECT t.LocationID, m.setting1, m.setting2, ...
  FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
)
SELECT DISTINCT src = '> master', setting1, setting2, ...
FROM c
UNION ALL
(
  -- Rows whose settings differ from the master, keeping LocationID
  SELECT RTRIM(LocationID), setting1, setting2, ...
  FROM dbo.table1
  EXCEPT
  SELECT RTRIM(LocationID), setting1, setting2, ...
  FROM c
)
ORDER BY src;
This version is a little cheaper (mostly by avoiding the DISTINCT against the master table), at the cost of needing to specify all of the columns one more time, which again you can automate as per above:
;WITH m AS
(
SELECT setting1, setting2, ...
FROM dbo.table1_master
),
c AS
(
SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN m
)
SELECT src = '> master', setting1, setting2, ...
FROM m
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT src, setting1, setting2, ...
FROM c
)
ORDER BY src;
However, all of these options are poorer performers with worse plans than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT, even though the question is more about syntax than performance.
The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:
DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';

-- Build the full join predicate from every column in the master table:
SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master');

-- Pick any one master column to test for NULL (meaning no match found):
SELECT TOP (1) @anycol = QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master')
ORDER BY name;

SET @sql = N'SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1'
  + @and + ' WHERE m.' + @anycol + ' IS NULL;';

SELECT @sql;
--EXEC sp_executesql @sql;
Other tips
I would recommend:
- Creating a Hash field that is a persisted computed column with a definition along the lines of HASHBYTES('SHA1', Field1 + Field2 + Field3...)
- Comparing just that HASH value from your "master" to your other records
- Displaying all the actual values from non-matching rows
Something like
SELECT *
FROM Table1
WHERE HashField <> (SELECT Hashfield FROM Table1_Master)
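The computed column itself could be sketched along these lines (Field1..Field3 are hypothetical column names; note that the + operator returns NULL if any field is NULL, so this sketch uses CONCAT, which converts each argument to a string and treats NULL as an empty string):

```sql
-- Sketch of a persisted computed hash column, assuming columns
-- Field1..Field3. A delimiter between fields prevents ('AB','C')
-- from hashing identically to ('A','BC').
ALTER TABLE dbo.Table1
    ADD HashField AS
        HASHBYTES('SHA1', CONCAT(Field1, '|', Field2, '|', Field3))
    PERSISTED;
```

The same column would need to be added to Table1_Master so the single-value comparison above has something to compare against.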
What's wrong with just joining the two tables on every column (or using a WHERE clause) and selecting the rows that don't exist in the second table?
SELECT locationID
FROM table1
LEFT OUTER JOIN table1_master
ON table1.a = table1_master.a
AND table1.b = table1_master.b
AND table1.c = table1_master.c
WHERE table1_master.a IS NULL
It might not be pretty, but it should work