Re: How to get a distinct count of result set of multople table joins?
- From: Charles Hooper <hooperc2000@xxxxxxxxx>
- Date: Thu, 28 Feb 2008 09:36:21 -0800 (PST)
On Feb 28, 11:27 am, Mark D Powell <Mark.Pow...@xxxxxxx> wrote:
On Feb 27, 10:03 pm, Charles Hooper <hooperc2...@xxxxxxxxx> wrote:
I think that I now understand what you originally posted. The two
queries should be returning the same number of rows before the count,
since the UNION syntax removes duplicate rows automatically. Note
the:
"SORT (UNIQUE)" in your execution plan.
The first test is similar to your first query:
SELECT
COUNT(*)
FROM
(SELECT
*
FROM
T1,
T2,
T3
WHERE
T1.C1=T2.C1(+)
AND T1.C1=T3.C1(+)
UNION
SELECT
*
FROM
T1,
T2,
T3
WHERE
T1.C1=T2.C1(+)
AND T1.C1=T3.C1(+));
COUNT(*)
----------
2994198
The DBMS Xplan (watch for wordwrapping):
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows
| A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem |
Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
00:03:16.27 | 5392 | 73199 | 73199 | | |
| |
| 2 | VIEW | | 1 | 2806K| 2994K|
00:03:09.16 | 5392 | 73199 | 73199 | | |
| |
| 3 | SORT UNIQUE | | 1 | 2806K| 2994K|
00:03:03.17 | 5392 | 73199 | 73199 | 269M| 8322K| 37M
(1)| 572K|
| 4 | UNION-ALL | | 1 | | 7089K|
00:01:32.59 | 5342 | 0 | 0 | | |
The above took 3 minutes and 16 seconds on Oracle 11.1.0.6
Sample 2 returns the same result, just significantly faster:
SELECT
COUNT(*)
FROM
(SELECT DISTINCT
*
FROM
T1,
T2,
T3
WHERE
T1.C1=T2.C1(+)
AND T1.C1=T3.C1(+));
COUNT(*)
----------
2994198
---------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows
| A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem |
Used-Tmp|
---------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
00:01:18.14 | 2671 | 40365 | 40365 | | |
| |
| 2 | VIEW | | 1 | 1403K| 2994K|
00:01:11.67 | 2671 | 40365 | 40365 | | |
| |
| 3 | HASH UNIQUE | | 1 | 1403K| 2994K|
00:01:05.68 | 2671 | 40365 | 40365 | 351M| 15M| 46M
(1)| 337K|
|* 4 | HASH JOIN RIGHT OUTER | | 1 | 1403K| 3544K|
00:00:28.78 | 2671 | 0 | 0 | 1102K| 1102K| 181K
I was under the impression that performance is not an issue but rather
it is the fact that the two queries in the initial post are not
returning the same result set.
The base query is an outer join. Since an outer join says return all
rows from the base table plus the matching data from the joined table
or null values when there is not matching row, duplicate output would
potentially be expected but duplicate output could also indicate that
a required join condition has not been specified.
The join should be exaimined to be sure all join conditions have been
properly specified. The query should be checked to make sure it is
the right query for the desired output.
Then we are back to if
select query
union
select same_query
and
select count(*) from ( select distinct query)
should produce the same result set counts.
Note that getting the same results set does potentially depend on none
of the base tables in the join being updated during the time the two
queries are ran else the view of the data will be different.
-- Mark D Powell --
Mark, very good information.
It is quite possible that I am still misunderstanding what the OP is
trying to accomplish.
The OP first stated "Is there a more efficient solution, as the UNION
takes some time to run? ~3 mins." and in a later post stated, "please
note that performance is of secondary importance to me. Being able to
obtain a correct count of the rows returned by the UNION query is my
main concern."
The first SQL example was supposed to mimic the UNION query that the
OP posted, and the second was supposed to mimic the SELECT DISTINCT
query that the OP posted. I attempted to walk the OP through a
process of examining different SQL solutions that return the same
count of rows, but roughly three times faster than the UNION. (Side
note: I also wanted to verify that the claim I made in a previous post
in this thread about GROUP BY being faster than DISTINCT still held
true in Oracle 11i, and whether or not the number of columns made a
difference for the speed at which distinct rows are obtained.)
The explain plans seem to indicate that the two query methods posted
by the OP should produce the same record counts (or more specifically,
COUNT(*) returned by the one query should be the number of rows
returned by the other). The explain plan for one of my examples
showed SORT UNIQUE, and the other showed HASH UNIQUE, but otherwise
they were very similar.
Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
.
- References:
- How to get a distinct count of result set of multople table joins?
- From: Chris ( Val )
- Re: How to get a distinct count of result set of multople table joins?
- From: Charles Hooper
- Re: How to get a distinct count of result set of multople table joins?
- From: Chris ( Val )
- Re: How to get a distinct count of result set of multople table joins?
- From: Charles Hooper
- Re: How to get a distinct count of result set of multople table joins?
- From: Mark D Powell
- How to get a distinct count of result set of multople table joins?
- Prev by Date: Re: Can't access iSQL*Plus
- Next by Date: Re: How to get a distinct count of result set of multople table joins?
- Previous by thread: Re: How to get a distinct count of result set of multople table joins?
- Next by thread: Re: How to get a distinct count of result set of multople table joins?
- Index(es):
Relevant Pages
|