Re: "'Resultcache" effect in 10.2.0.4 ??



On Jul 29, 8:01 am, jlchiappa <jlchia...@xxxxxxxxx> wrote:
cached in the SGA on the first execution and retrieved from memory on the
second execution?
Matthias
He did mention in the OP that the number of I/O's (PIO's, I assume...)
and the plans are the same, as well as ALTER SESSION FLUSH
BUFFER_CACHE and FLUSH SHARED POOL  don't make it happen.

YES, this is  exactly the current suspect...

Perhaps it is possible that two separate things are happening with the
same apparent effect:  When he shuts the db, it's slow because it has
to load the SGA, when he does it later, a SAN cache or something helps
things along.

This is the only possible explanation until now, so I´m buying it,
too.

I would think that there would be some way to flood the
OS cache he is talking about to show it is responsible.

yeah, in the test machine thiscolud be done, maybe writing a C program
malloc´íng more and more RAM until the OS begins to remove pages from
the OS cache, yes...

I had something simpler in mind, like catting big files around with
another machine on the san.

But again, the trace file should point you in the right direction as
to what to look at.


It's just so
odd that the 9i didn't show the effect, and 10g does (if I'm rereading
it correctly).

Yes you read correctly, this was exactly what happened. Strange,
agreed, but some possible explanations :

 a) as I said, the files are in cooked filesystems, maybe in Solaris
9i was doing direct I/Os (bypassing FS caches) and 10g not, due to
different code in the db or even due to different settings

Yeah, mbrc, async, all sorts of possibilities. How was it upgraded,
in place? exp/imp?


 b) this db attends an Oracle EBS 11i app, and when in 9i it was using
RULE optimization for sure (I know it was true in 9i epoch), so maybe
the RBO was generating always a lot of small I/Os and the OS caching
effect for it was small too, hardly detected, only now with CBO and
big hashes the effect is more present

Have you compared the 9i and 10g plans?


but all of this is pure speculation, only...

That's why I suggested bind peeking or something to do
with histograms.

Nope, as I said even if we test with the exact SAME values 2x the
effect is the same, so no way for bvp, I think...

Hopefully the 10046 will show more exactly where the
slow is, over all the combinations of situations.

In truth, my expectations about the 10046 are : if I could get 2 10046
trace files about 2 executions of the same SQL with the same values
but flushing buffer pool and hard parsing the SQLs each time , imho
the plans and the I/Os will be more or less the same, with this if I
see in the 10046 the same I/Os being faster in the 2nd exec, I prooved
that some "external force" is acting, something outside the db, being
OS cache, hardware cache, something like that...

Sounds like it will be informative.


I've been trying to

figure out if MBRC could somehow have this effect (different between
versions), but I don't think so - but see
http://www.freelists.org/post/oracle-l/db-file-multiblock-read-count-....
(and read the whole thread).  Then of course, it could be some solaris
specific thing, I haven't been on solaris in a long time.

Hmmm, different MBRCs ??? I haven´t thought about this, maybe this
could be a possibility...

Is there any possibility we might find out the storage hardware
architecture?

Sure : the storage is an IBM Shark (unsure about the exact model and
cache sizings), disks in RAID-5 groups, and the space is formatted in
native (cooked) Solaris filesystems, that´s it...

IMHO, you are just guessing until you can actually get a trace.  Also,
I wouldn't change anything.  Run it once with a trace, log out, log
in, and run it again with a trace.

Sure.

What if you do the following:
- restart the instance
- execute and time the query
- reboot the Solaris server and start the instance
- execute and time the query
If the second query is faster than the first one, the data is probably
cached by the SAN. If not, the OS cache is probably playing a role.

very good idea, an interesting test, will try it if possible...

Out of curiosity, it sounds like your colleague isn't very interested
in getting this fixed if he/she can't provide you with a trace?

How true : it´s the same old story, the behavior occurs only in the
big beast SQLs, that ones naturally slow and complex, so the end-users
are "used" to get bad response times, and anyway this kind of monsters
are "batch" things, they are not directly impacting day to day work,
so no much interest here, yes... In truth, my intention, more than
only helping my colleague, is to learn, receive different inputs, and
to be prepared if something like this occurs with me, only... In some
measure I already received this, received some hints and thoughts, and
the important, eliminated the chance for some obscure point (maybe a
'hidden' result cache) inside the db being present without my
knowledge, this was a main objective here, too, and this possibility
was eliminated , I wanted it...

Yeah, this is where I diverge with method-r as I understand it - the
business doesn't always know how to rate performance issues. But I
think method-r would do you well here, aside from that.

jg
--
@home.com is bogus.
The start of the real computer overlords:
http://www3.signonsandiego.com/stories/2009/jul/29/verizon-qualcomm-join-forces-venture/?uniontrib
.



Relevant Pages


Loading