|
|
 |
 |
 |
 |
Removing Duplicates from a table
hi, i have a table that has 10 fields. (A, B, C, D, E, F, G, H, I, J). there is a constraint on the table to enforce unique in the fields (A, B, C). i have a staging table that contains the same fields without the constraint. this table contains duplicate data and I need to query out all the records for the unique A, B, and C combination. Can someone help me here.
Is this what you are after? SELECT MT.*, Dup.Quantity FROM MyTable MT JOIN (SELECT A, B, C, COUNT(*) AS Quantity FROM MyTable GROUP BY A, B, C HAVING COUNT(*) > 1) AS Dup ON MT.A = Dup.A AND MT.B=Dup.B AND MT.C = Dup.C RLF "R C" <R @discussions.microsoft.com> wrote in message news:E88CD0E6-CF11-43E4-A241-824F79B12CDC@microsoft.com...
> hi, > i have a table that has 10 fields. (A, B, C, D, E, F, G, H, I, J). > there is a constraint on the table to enforce unique in the fields (A, B, > C). > i have a staging table that contains the same fields without the > constraint. > this table contains duplicate data and I need to query out all the records > for the unique A, B, and C combination. Can someone help me here.
In SQL Server 2005 you can use something like this: WITH StagingCTE (rn) AS ( SELECT ROW_NUMBER() OVER(PARTITION BY A, B, C ORDER BY A, B, C) FROM Staging ) DELETE FROM StagingCTE WHERE rn > 1; HTH, Plamen Ratchev http://www.SQLStudio.com
-----------------------------------------------Reply-----------------------------------------------
this is half way there. i need to delete 1 of the 2 records so i eliminate the duplicates but i have to keep at most one copy of the record.
"Russell Fields" wrote: > Is this what you are after? > SELECT MT.*, Dup.Quantity > FROM MyTable MT > JOIN (SELECT A, B, C, COUNT(*) AS Quantity > FROM MyTable > GROUP BY A, B, C > HAVING COUNT(*) > 1) AS Dup > ON MT.A = Dup.A AND MT.B=Dup.B AND MT.C = Dup.C > RLF > "R C" <R@discussions.microsoft.com> wrote in message > news:E88CD0E6-CF11-43E4-A241-824F79B12CDC@microsoft.com... > > hi, > > i have a table that has 10 fields. (A, B, C, D, E, F, G, H, I, J). > > there is a constraint on the table to enforce unique in the fields (A, B, > > C). > > i have a staging table that contains the same fields without the > > constraint. > > this table contains duplicate data and I need to query out all the records > > for the unique A, B, and C combination. Can someone help me here.
On 5 Jun, 19:14, R C <R @discussions.microsoft.com> wrote: > this is half way there. i need to delete 1 of the 2 records so i eliminate > the duplicates but i have to keep at most one copy of the record.
Which version do you want to keep? If you just want to pick one row at random then you can use Plamen Ratchev's solution but I would say that's just sweeping the real problem under the carpet. There must surely be some meaning to the extra data, otherwise why does it exist at all? -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
-----------------------------------------------Reply-----------------------------------------------
>> I have a table [no name?] that has 10 fields [sic: columns are not fields]. (A, B, C, D, E, F, G, H, I, J). There is a constraint [fields do not have constraints; columns do] on the table to enforce unique in the fields [sic] (A, B, C). <<
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. >> I have a staging table that contains the same fields [sic] without the constraint. This table contains duplicate data [tables do not have duplicate rows; try to always call it a "SQL staging table" instead. Picky, picky.] and I need to query out all the records [sic] for the unique A, B, and C combination[s]. <<
Perhaps "SELECT DISTINCT * FROM you_never_told_us_the_name"? And what does "query out" mean? -----------------------------------------------Reply-----------------------------------------------
i like Plamen Ratchev's but i'm using sql 2000. any ideas?
"David Portas" wrote: > On 5 Jun, 19:14, R C <R @discussions.microsoft.com> wrote: > > this is half way there. i need to delete 1 of the 2 records so i eliminate > > the duplicates but i have to keep at most one copy of the record. > Which version do you want to keep? If you just want to pick one row at > random then you can use Plamen Ratchev's solution but I would say > that's just sweeping the real problem under the carpet. There must > surely be some meaning to the extra data, otherwise why does it exist > at all? > -- > David Portas, SQL Server MVP > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > --
Since your staging table has not constraints and probably just for temporary usage, then you can use Celko's advise and do something like this: SELECT DISTINCT * INTO StagingTmp FROM Staging DROP TABLE Staging EXEC sp_rename 'StagingTmp', 'Staging' Probably all other solutions will be less efficient. HTH, Plamen Ratchev http://www.SQLStudio.com
-----------------------------------------------Reply-----------------------------------------------
R C, As David suggested, I gave you the select assuming that you would want to investigate what was different between the duplicate entries. If there is nothing different, then the DISTINCT type of answer will work. If there are meaningful differences, then you will have enough data to choose which rows to delete. It will be the rows with the same A,B,C, but not a specific D, E, F, etc. RLF "R C" <R @discussions.microsoft.com> wrote in message news:95241640-B514-4705-BB1D-F21ED3272056@microsoft.com...
>i like Plamen Ratchev's but i'm using sql 2000. any ideas? > "David Portas" wrote: >> On 5 Jun, 19:14, R C <R@discussions.microsoft.com> wrote: >> > this is half way there. i need to delete 1 of the 2 records so i >> > eliminate >> > the duplicates but i have to keep at most one copy of the record. >> Which version do you want to keep? If you just want to pick one row at >> random then you can use Plamen Ratchev's solution but I would say >> that's just sweeping the real problem under the carpet. There must >> surely be some meaning to the extra data, otherwise why does it exist >> at all? >> -- >> David Portas, SQL Server MVP >> Whenever possible please post enough code to reproduce your problem. >> Including CREATE TABLE and INSERT statements usually helps. >> State what version of SQL Server you are using and specify the content >> of any error messages. >> SQL Server Books Online: >> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx >> --
On 5 Jun, 19:28, R C <R @discussions.microsoft.com> wrote: > i like Plamen Ratchev's but i'm using sql 2000. any ideas?
I can only repeat: Which version of the duplicated rows do you want to keep? Are you saying that you don't care what values get populated for columns D, E, F, G, H, I, J in the target table? If the data matters as little as that then one might wonder why you are storing it at all, or why you don't make a more effective design. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
|
 |
 |
 |
 |
|