Home     |     .Net Programming    |     cSharp Home    |     Sql Server Home    |     Javascript / Client Side Development     |     Ajax Programming

Ruby on Rails Development     |     Perl Programming     |     C Programming Language     |     C++ Programming     |     IT Jobs

Python Programming Language     |     Laptop Suggestions?    |     TCL Scripting     |     Fortran Programming     |     Scheme Programming Language


 
 
Cervo Technologies
The Right Source to Outsource

MS Dynamics CRM 3.0

Sql Server Programming

Removing Duplicates from a table


hi,
i have a table that has 10 fields.  (A, B, C, D, E, F, G, H, I, J).
there is a constraint on the table to enforce unique in the fields (A, B, C).
i have a staging table that contains the same fields without the constraint.
 this table contains duplicate data and I need to query out all the records
for the unique A, B, and C combination.  Can someone help me here.  
Is this what you are after?

SELECT MT.*, Dup.Quantity
FROM MyTable MT
    JOIN  (SELECT A, B, C, COUNT(*) AS Quantity
      FROM MyTable
      GROUP BY A, B, C
      HAVING COUNT(*) > 1) AS Dup
   ON MT.A = Dup.A AND MT.B=Dup.B AND MT.C = Dup.C

RLF

"R C" <R@discussions.microsoft.com> wrote in message

news:E88CD0E6-CF11-43E4-A241-824F79B12CDC@microsoft.com...

In SQL Server 2005 you can use something like this:

WITH StagingCTE (rn)
AS
(
  SELECT
       ROW_NUMBER() OVER(PARTITION BY A, B, C
                                                    ORDER BY A, B, C)
  FROM Staging
)
DELETE FROM StagingCTE
WHERE rn > 1;

HTH,

Plamen Ratchev
http://www.SQLStudio.com

-----------------------------------------------Reply-----------------------------------------------

this is half way there.  i need to delete 1 of the 2 records so i eliminate
the duplicates but i have to keep at most one copy of the record.

On 5 Jun, 19:14, R C <R@discussions.microsoft.com> wrote:

> this is half way there.  i need to delete 1 of the 2 records so i eliminate
> the duplicates but i have to keep at most one copy of the record.

Which version do you want to keep? If you just want to pick one row at
random then you can use Plamen Ratchev's solution but I would say
that's just sweeping the real problem under the carpet. There must
surely be some meaning to the extra data, otherwise why does it exist
at all?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

-----------------------------------------------Reply-----------------------------------------------

>> I have a table [no name?] that has 10 fields [sic: columns are not fields].  (A, B, C, D, E, F, G, H, I, J). There is a constraint [fields do not have constraints; columns do] on the table to enforce unique in the fields [sic] (A, B, C). <<

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.  It is very hard to debug code when you do not let us
see it.

>> I have a staging table that contains the same fields [sic] without the constraint. This table contains duplicate data [tables do not have duplicate rows; try to always call it a "SQL staging table" instead.  Picky, picky.] and I need to query out all the records [sic] for the unique A, B, and C combination[s].  <<

Perhaps "SELECT DISTINCT * FROM you_never_told_us_the_name"?  And what
does "query out" mean?

-----------------------------------------------Reply-----------------------------------------------
i like Plamen Ratchev's but i'm using sql 2000.  any ideas?

Since your staging table has not constraints and probably just for temporary
usage, then you can use Celko's advise and do something like this:

SELECT DISTINCT * INTO StagingTmp FROM Staging
DROP TABLE Staging
EXEC sp_rename 'StagingTmp', 'Staging'

Probably all other solutions will be less efficient.

HTH,

Plamen Ratchev
http://www.SQLStudio.com

-----------------------------------------------Reply-----------------------------------------------

R C,

As David suggested, I gave you the select assuming that you would want to
investigate what was different between the duplicate entries.  If there is
nothing different, then the DISTINCT type of answer will work.

If there are meaningful differences, then you will have enough data to
choose which rows to delete. It will be the rows with the same A,B,C, but
not a specific D, E, F, etc.

RLF

"R C" <R@discussions.microsoft.com> wrote in message

news:95241640-B514-4705-BB1D-F21ED3272056@microsoft.com...

On 5 Jun, 19:28, R C <R@discussions.microsoft.com> wrote:

> i like Plamen Ratchev's but i'm using sql 2000.  any ideas?

I can only repeat: Which version of the duplicated rows do you want to
keep? Are you saying that you don't care what values get populated for
columns D, E, F, G, H, I, J in the target table? If the data matters
as little as that then one might wonder why you are storing it at all,
or why you don't make a more effective design.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

Add to del.icio.us | Digg this | Stumble it | Powered by Megasolutions Inc