Re: VERY chalanging question



On 30 May 2006 10:39:23 -0700, groupy wrote:

input: 1.5 million records table consisting users with 4 nvchar
fields:A,B,C,D
the problem: there are many records with dublicates A's or duplicates
B's or duplicates A+B's or duplicates B+C+D's & so on. Mathematicly
there are 16-1 posibilities for each duplication.

Hi groupy,

No. Only four possibilities: duplicate A, duplicate B, duplicate C, and
duplicate D. Combinations are just a special case (you can only have a
duplicate A+B if you have both a duplicate A and a duplicate B - though
you can have duplicate A and duplicate B but no duplicate A+B).

aim: find the duplicates & filter them, leave only the unique users
which don't have ANY duplication.

This specification is incorrect. For instance, with the input like this:

num A B C D
--- --- --- --- ---
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a1 b2 c3 d3
4 a2 b1 c4 d4

there are two possible result sets, both containing two rows, that have
no duplicates anymore (1 + 2 or 3 + 4).

If the answer is "I don't care - any resultset without duplicates will
do", then the code below should run pretty fast:

CREATE TABLE #Temp
(A nvarchar(25) NOT NULL,
B nvarchar(25) NOT NULL,
C nvarchar(25) NOT NULL,
D nvarchar(25) NOT NULL)
go
CREATE UNIQUE INDEX x_A ON #Temp(A) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE INDEX x_B ON #Temp(B) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE INDEX x_C ON #Temp(C) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE INDEX x_D ON #Temp(D) WITH (IGNORE_DUP_KEY = ON)
go
INSERT INTO #Temp (A, B, C, D)
SELECT A, B, C, D
FROM YourBigTable
-- Show results
SELECT * FROM #Temp
go
DROP TABLE #Temp
go


--
Hugo Kornelis, SQL Server MVP
.



Relevant Pages

  • Re: DTS Package Duplicate Checking
    ... One way to do it is to set the temp table with a unique index and ignore ... occasionally the text file has duplicate records in it. ... > observing the primary key constraints). ...
    (microsoft.public.sqlserver.dts)
  • Re: Duplicates Error Report
    ... Ahh yes, this is the same routine to delete duplicate records from a table, ... Thanks Michael. ... >> table will not accept duplicates of the key field, ... Copy the destination table structure to a new table (TEMP). ...
    (microsoft.public.access.externaldata)
  • Re: Macro to "TransferSpreadsheet"
    ... Then append the temp. ... "Mark" wrote in message ... Is there a way to prevent duplicate ...
    (microsoft.public.access.queries)
  • Re: Duplicates Error Report
    ... > duplicate part numbers already in the access table. ... > table will not accept duplicates of the key field, ... Copy the destination table structure to a new table (TEMP). ... This should give you a Paste Errors table during the copy. ...
    (microsoft.public.access.externaldata)
  • Violation of primary key constraint
    ... one can you give me a the script so that I can count how many records i have in my temp table that have a count>1 for 'particularField'. ... Secondly, IF i dont have duplicate value for ALL four of my primary keys, which i KNOW i don't but want to double check to be sure....WHY is it saying they are duplicates, and how EXACTLY do i fix it? ...
    (microsoft.public.sqlserver.replication)