Re: VERY chalanging question - Explanation
- From: Erland Sommarskog <esquel@xxxxxxxxxxxxx>
- Date: Wed, 31 May 2006 11:40:44 +0000 (UTC)
groupy (liav.ezer@xxxxxxxxx) writes:
A | B | C | D
--------------------
a1 b1 c1 d1
a1 b2 c2 d2
a1 b1 c3 d3
a4 b4 c4 d3
a5 b5 c5 d5
a6 b6 c6 d3
The duplications are:
rows 1+2+3 on A
row 1+3 on B
rows 3+4+6 on D
the only unique (in all params) row is 5
note: finding first that row 1 similar to 2 on A & deleting it will
loose information because we WON'T know if row 1 similar to row 3 on B.
The same goes for the deletion of row 3 : it will cause lose of data
regarding it's similarity to row 4 on D
The Simple query for retriving all duplicated rows which consumes most
time is:
SELECT COUNT(*),A,B,C,D
FROM tbl
GROUP BY A,B,C,D
HAVING count(*)>1
It takes about 2 weaks on a 1.5 million rows, while all fields are
nvchars & the DB is in SQL-Server
I sincerely doubt that this statement takes two weeks to run for 1.5
million rows. Had you said 1.5 milliard rows, I could maybe have
believed it.
Anyway, first index each column individually. Then try:
DELETE tbl
FROM tbl a
JOIN tbl b ON a.A = b.A
WHERE a.B > b.B OR
a.C > b.C OR
a.D > b.D
DELETE tbl
FROM tbl a
JOIN tbl b ON a.B = b.B
WHERE a.C > b.C OR
a.D > b.D
DELETE tbl
FROM tbl a
JOIN tbl b ON a.C = b.C
WHERE a.D > b.C
After this operation, you still have the rows that have the same values
in four columns. But it is not clear from your description whether you
have such duplicates. If you have this maybe the best:
ATLER TABLE tbl ADD ident int IDENTITY
DELETE tbl
FROM tbl a
JOIN tbl b ON a.A = b.A
WHERE a.ident > b.ident
DELETE tbl
FROM tbl a
JOIN tbl b ON a.B = b.B
WHERE a.ident > b.ident
DELETE tbl
FROM tbl a
JOIN tbl b ON a.C = b.C
WHERE a.ident > b.ident
ALTER TABLE tbl DROP COLUMN ident
Note: all the above is untested. For tested solutions (at least with
regards to correctness), please post:
o CREATE TABLE statement for the table.
o INSERT statements with sample data.
o The desired result given the sample.
--
Erland Sommarskog, SQL Server MVP, esquel@xxxxxxxxxxxxx
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
.
- Follow-Ups:
- Re: VERY chalanging question - Explanation
- From: jsfromynr
- Re: VERY chalanging question - Explanation
- References:
- VERY chalanging question
- From: groupy
- Re: VERY chalanging question
- From: Hugo Kornelis
- VERY chalanging question - Explanation
- From: groupy
- VERY chalanging question
- Prev by Date: Re: VERY chalanging question - Explanation
- Next by Date: Re: Very chalanging question - Explanation
- Previous by thread: Re: VERY chalanging question - Explanation
- Next by thread: Re: VERY chalanging question - Explanation
- Index(es):
Relevant Pages
|