Selecting distinct records without using a temporary table
28-12-2020
Question
I have a third-party table that is being populated with some cluttered data, and I need to get the most recent distinct records out of it. The table gets a new row every year, or every time the "Person" changes. It relies on the row with the most recent ActiveDate being the correct one for that person. I've created a mock table and data to show this.
CREATE TABLE `Persons` (
`PersonId` varchar(200) NOT NULL,
`Name` varchar(200) NOT NULL DEFAULT '',
`ActiveDate` varchar(25) NOT NULL,
`ExpireDate` varchar(25) DEFAULT NULL,
`Job` varchar(200) NOT NULL DEFAULT '',
`Position` varchar(200) NOT NULL DEFAULT ''
);
And some mock data:
Id    | Name        | ActiveDate          | ExpireDate          | Job    | Position
------+-------------+---------------------+---------------------+--------+---------
J1234 | Doe, John   | 2010-08-15 00:00:00 | 2011-08-15 00:00:00 | Worker | Janitor
J1234 | Doe, John   | 2011-08-15 00:00:00 | 0000-00-00 00:00:00 | Worker | Janitor
777   | Doe, Jane   | 2010-06-04 00:00:00 | 0000-00-00 00:00:00 | Boss   | Janitor
777   | Doe, Jane   | 2011-04-30 00:00:00 | 0000-00-00 00:00:00 | Boss   | Janitor
654G  | Smith, Jane | 2011-01-20 00:00:00 | 0000-00-00 00:00:00 | Worker | Janitor
The table also has an ExpireDate column, which is actually set by the end user and, much to my dismay, is not always set. Currently I pull the distinct records into a dummy table and store them there for the day. I would use a temporary table, but I'm not 100% sure how to do that in MySQL, and I dislike them anyway. My current approach is only a stopgap until I find better SQL.
The data then has to be joined with a multitude of other tables to get the finished product, but I still need to deal with the initial set of distinct data first. Joining the other tables in right from the start just won't work.
So here is how I'm pulling my data, storing it, and then pulling it again later and joining it to other tables:
INSERT INTO tmp_Person (Id, `Name`, Job, `Position`)
SELECT DISTINCT Id, `Name`, Job, `Position`
FROM Person;
SELECT tmp_Person.Id,
tmp_Person.`Name`,
tmp_Person.Job,
tmp_Person.`Position`,
Pricing.Cost,
Pricing.Benefit
FROM tmp_Person
LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id
AND CL.PriceScredule = 'Major-Client'
AND CL.ExpireDate = '0000-00-00 00:00:00'
LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id
AND Inter.PriceScredule = 'Internal-Client'
AND Inter.ExpireDate = '0000-00-00 00:00:00';
How can I rewrite this so that the duplicate rows are removed without the cost of a temporary table (in any form)? Hopefully I've made this clear enough; if not, I can gladly add to it or clarify.
Solution
Replace tmp_Person with the query you use to fill the temp table, turning it into a derived table (a subquery in the FROM clause):
SELECT tmp_Person.Id,
tmp_Person.`Name`,
tmp_Person.Job,
tmp_Person.`Position`,
CL.Cost AS MajorCost,
CL.Benefit AS MajorBenefit,
Inter.Cost AS InternalCost,
Inter.Benefit AS InternalBenefit
FROM
( SELECT DISTINCT Id, `Name`, Job, `Position`
FROM Person
)
AS tmp_Person
LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id
AND CL.PriceScredule = 'Major-Client'
AND CL.ExpireDate = '0000-00-00 00:00:00'
LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id
AND Inter.PriceScredule = 'Internal-Client'
AND Inter.ExpireDate = '0000-00-00 00:00:00';
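To sanity-check the derived-table rewrite, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module, standing in for MySQL. Assumptions: the table is called Person with an Id column (as in the queries above), the '0000-00-00 00:00:00' sentinel is stored as an ordinary string, and the Pricing rows with their Cost/Benefit values are invented purely for illustration. The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the query itself uses only standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    Id TEXT NOT NULL, Name TEXT NOT NULL, ActiveDate TEXT NOT NULL,
    ExpireDate TEXT, Job TEXT NOT NULL, Position TEXT NOT NULL
);
INSERT INTO Person VALUES
    ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
    ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
    ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');

-- Hypothetical pricing rows, invented for this example only.
CREATE TABLE Pricing (
    PersonId TEXT, PriceScredule TEXT, ExpireDate TEXT, Cost REAL, Benefit REAL
);
INSERT INTO Pricing VALUES
    ('J1234', 'Major-Client',    '0000-00-00 00:00:00', 100.0, 10.0),
    ('J1234', 'Internal-Client', '0000-00-00 00:00:00',  80.0,  8.0),
    ('777',   'Major-Client',    '0000-00-00 00:00:00', 200.0, 20.0);
""")

# The DISTINCT subquery replaces tmp_Person; no table is created or filled first.
rows = conn.execute("""
SELECT tmp_Person.Id, tmp_Person.Name, tmp_Person.Job, tmp_Person.Position,
       CL.Cost AS MajorCost, CL.Benefit AS MajorBenefit,
       Inter.Cost AS InternalCost, Inter.Benefit AS InternalBenefit
FROM (SELECT DISTINCT Id, Name, Job, Position FROM Person) AS tmp_Person
LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id
    AND CL.PriceScredule = 'Major-Client'
    AND CL.ExpireDate = '0000-00-00 00:00:00'
LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id
    AND Inter.PriceScredule = 'Internal-Client'
    AND Inter.ExpireDate = '0000-00-00 00:00:00'
ORDER BY tmp_Person.Id
""").fetchall()

for row in rows:
    print(row)
```

The five Person rows collapse to three, one per person, and people without a matching Pricing row still appear thanks to the LEFT OUTER JOINs (their price columns come back as None).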
As @Andriy spotted, using Pricing.Cost or Pricing.Benefit in the SELECT list would raise an error, because the query only refers to Pricing through the aliases CL and Inter. I guess you forgot to change it when you posted.
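The aliasing error is easy to reproduce. This small sketch uses Python's sqlite3 module with a cut-down, hypothetical Pricing table (only two columns). SQLite reports the problem as an OperationalError ("no such column"); MySQL words its message differently, but the cause is the same: once a table has been given an alias, its columns must be referenced through that alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical Pricing table: just enough columns to show the alias rule.
conn.execute("CREATE TABLE Pricing (PersonId TEXT, Cost REAL)")
conn.execute("INSERT INTO Pricing VALUES ('777', 200.0)")


def run(sql):
    """Return the rows for sql, or the error message if the query is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Referencing the base table name after aliasing it is rejected...
broken = run("SELECT Pricing.Cost FROM Pricing AS CL")
# ...while going through the alias works.
fixed = run("SELECT CL.Cost FROM Pricing AS CL")

print(broken)  # an error message string
print(fixed)   # [(200.0,)]
```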
Other answers
I put this together before I realised the question was about MySQL, but the principle should be the same: it returns, for each PersonId, the record with the most recent ActiveDate from the Persons table.
select *
from
(
    -- number each person's rows from newest ActiveDate (rn = 1) to oldest
    select persons.*,
           ROW_NUMBER() over (partition by personid order by activedate desc) as rn
    from persons
) basedata
where basedata.rn = 1;
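ROW_NUMBER() and the other window functions are only available from MySQL 8.0. The sketch below runs the window-function query alongside a MAX()-per-group join that needs no window functions (so it should also suit older MySQL versions), using SQLite (which supports ROW_NUMBER() from 3.25) through Python's sqlite3 module. Assumptions: table and column names follow the CREATE TABLE statement in the question, and ActiveDate keeps the 'YYYY-MM-DD hh:mm:ss' text format, so comparing the strings orders them chronologically even though the column is a varchar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (
    PersonId TEXT NOT NULL, Name TEXT NOT NULL, ActiveDate TEXT NOT NULL,
    ExpireDate TEXT, Job TEXT NOT NULL, Position TEXT NOT NULL
);
INSERT INTO Persons VALUES
    ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
    ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
    ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');
""")

# Variant 1: ROW_NUMBER(); the newest row of each person gets rn = 1.
latest_rn = conn.execute("""
SELECT PersonId, Name, ActiveDate
FROM (
    SELECT Persons.*,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ActiveDate DESC) AS rn
    FROM Persons
) AS basedata
WHERE basedata.rn = 1
ORDER BY PersonId
""").fetchall()

# Variant 2: join every row to its person's MAX(ActiveDate); no window functions.
# Unlike ROW_NUMBER(), this returns all rows that tie on the latest date.
latest_max = conn.execute("""
SELECT p.PersonId, p.Name, p.ActiveDate
FROM Persons AS p
JOIN (
    SELECT PersonId, MAX(ActiveDate) AS MaxActive
    FROM Persons
    GROUP BY PersonId
) AS m ON m.PersonId = p.PersonId AND m.MaxActive = p.ActiveDate
ORDER BY p.PersonId
""").fetchall()

print(latest_rn)
print(latest_max)
```

Both variants return one row per person with that person's latest ActiveDate. The choice between them mostly comes down to server version and how ties should be handled.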