Re: 10046 trace: 'latch free' then 'db file seq. read' waits




Spendius wrote:
sjaffarhussain@xxxxxxxxx a écrit:
cache buffer chain generally comes when a block accessed in the buffer,
i.e. logical read. Since the wait event is 'db file sequential read', I
guess this could be index root block hot block issues.
Can you give us the explain plan and information about the tabels
involved in the query and its statistical information and also index
information.
What is the size of buffer cache in the database?
Jaffar

Thanks for the replies. Here are the infos.

SQL> sta param k_buffers

NAME TYPE VALUE
------------------------------------ ----------- -----------
db_block_buffers integer 400000
SQL> sta param lru

NAME TYPE VALUE
------------------------------------ ----------- -----------
db_block_lru_latches integer 8

SQL> sta param cpu_c

NAME TYPE VALUE
------------------------------------ ----------- -----------
cpu_count integer 8
SQL> show sga

Total System Global Area 3875663548 bytes
Fixed Size 102076 bytes
Variable Size 597696512 bytes
Database Buffers 3276800000 bytes
Redo Buffers 1064960 bytes

Block size is 8192.

I have to add we're not experiencing latch problems at the instance
level, but we're hitting a very puzzling and unusual behaviour in a
query that gets blocked *just a few records before the end of its
result set*: thanks to the FIRST_ROWS hint this stmt immediately
starts to display its result, until it's reached 98 or 99% of what it
must display, then it hangs for about 2 minutes. When I check the
waits once it's started to hang, I see 'latch free' and 'db file seq.
read' waits alternating (with most of the waits on 'db file...'). This
*systematically* happens, every day, at any hour: a few lines
before showing the oldest records (you'll see below
the sort is descending on a date), the display in SQL*Plus hangs,
then it suddenly completes (the whole result is about 900 records,
it stops at the 890th or 892nd row, hangs, and ends up displaying
the 10 or 8 last rows). In fact we used to put this query inside
a sub-query with a ROWNUM < 700, we've removed this "SELECT
* FROM (sub-query shown below) WHERE ROWNUM < 700" stuff
and since then we've noticed it blocks for a few mn before ending.
Very peculiar.

The SELECT is as follows:
select:
SELECT /*+ first_rows */
d.tab_id tab_id,
TO_CHAR (di.time_oper, 'DD/MM/YYYY') time_oper,
...
FROM tab TAB, <= accessed thru TAB_PK
tab_inf di, <= accessed thru TAIN_IDX_NI5
tab_pnrers dpe, <= accessed thru TNPE_PK
tab_pay TAB_PAY <= accessed thru TPMT_TAB_FK_I
WHERE di.operation = 'CREATE'
AND di.distrib_code = 'BS'
AND di.tab_id = TAB.tab_id
AND TAB_PAY.tab_id = TAB.tab_id
AND TAB_PAY.TIME_OF_PAY > SYSDATE-1
AND dpe.tab_id(+) = TAB.tab_id
ORDER BY di.time_oper DESC

on TAB_INF.TIME_OPER we have a non-unique index: TAIN_IDX_NI5
that appears much too many times from the moment I start to
trace the session executing this stmt. These tables are pretty
voluminous (millions of records each except for tab_pnrers).

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=237543 Card=1
Bytes=393)
1 0 VIEW (Cost=237543 Card=1 Bytes=393)
2 1 NESTED LOOPS (OUTER) (Cost=237543 Card=1 Bytes=135)
3 2 NESTED LOOPS (Cost=237540 Card=1 Bytes=122)
4 3 NESTED LOOPS (Cost=237537 Card=1 Bytes=66)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'TAB_INF' (Cost=237535
Card=1 Bytes=24)
6 5 INDEX (FULL SCAN DESCENDING) OF 'TAIN_IDX_NI5'
(NON-UNIQUE) (Cost=35791 Card=1)
7 4 TABLE ACCESS (BY INDEX ROWID) OF 'TAB' (Cost=2
Card=823017 Bytes=34566714)
8 7 INDEX (UNIQUE SCAN) OF 'TAB_PK' (UNIQUE) (Cost=1
Card=823017)
9 3 TABLE ACCESS (BY INDEX ROWID) OF 'TAB_PAY' (Cost=3
Card=49551 Bytes=2774856)
10 9 INDEX (RANGE SCAN) OF 'TPMT_TAB_FK_I' (NON-UNIQUE)
(Cost=2 Card=49551)
11 2 INDEX (RANGE SCAN) OF 'TNPE_PK' (UNIQUE) (Cost=3
Card=1181995 Bytes=15365935)

I have a script that returns lines looking like a 10046 trace, that I
run
from the moment my displays starts to hang for these +/- 2 minutes
(I loop very quickly on a SELECT FROM V$SESSION_WAIT -forget the
"ela=" value as it's erroneous-) and I get things like the following:
SQL> sta pseudo10046 <sid#>
nam='latch free' ela=2 p1=18420778536 p2=66 p3=0
nam='latch free' ela=2 p1=18420778536 p2=66 p3=0
nam='db file sequential read' ela=-1 p1=12 p2=4342 p3=1
nam='db file sequential read' ela=-1 p1=12 p2=4342 p3=1
nam='db file sequential read' ela=-1 p1=12 p2=4342 p3=1
[... continued over thousands of lines until my first session
finishes to display my result set, I get then:]
nam='SQL*Net message from client' ela=0 p1=675562835 p2=1 p3=0
and do a <CTL-C>.

Then I put all p1=file# p2=block# when nam='db file seq. read'
in a script 'hotblocks.sql' that contains a call to another script
that reads
SQL: select owner||'.'||segment_name segment, segment_type,
tablespace_name
SQL: from sys.dba_extents
SQL: where file_id = &&1
SQL: and &&2 between block_id and (block_id+(blocks-1))
it returns me the segments as follows (I've replaced "INDEX"
with "**index**" so it's more legible here):

SQL> sta hotblocks
TNPE_PK **index** TAB1_DI1
TAB_PAY TABLE TAB1_DD1
<128 times...>
TAB_PAY TABLE TAB1_DD1
TAIN_IDX_NI5 **index** TAB1_DI2
TAIN_IDX_NI5 **index** TAB1_DI2
TAIN_IDX_NI5 **index** TAB1_DI2
TAB_PAY TABLE TAB1_DD1
TAIN_IDX_NI5 **index** TAB1_DI2
TAIN_IDX_NI5 **index** TAB1_DI2
TAB_PAY TABLE TAB1_DD1
[... over thousands of lines]

Most of the lines here show reads on blocks from
o this index quoted above (TAIN_IDX_NI5) or
o from the table TAB_PAY,
it varied during my tests (but I don't know depending on
what !). Just to display the verly last records, Oracle then
carries out thousands of reads either against this index, or
against the TAB_PAY table (the 4rth in the FROM list).

What I CAN'T understand is why Oracle starts to display
very quickly, then stays blocked for about 2 mn only to
finish displaying the 10 last rows... (when i say I get the
same behaviour everyday that means every day -and
even every 5 minutes- the result set differs and gets
bigger every hour as these tables incur many inserts/updates
all the time)

Thanks.

First instead of your simulated trace ( don't quite understand that )
why not do a real 10046 trace at level 12? You can setup a logon
database trigger if needed to get the trace started at the beginning of
the activity ... prior to the SQL statement being executed.

You do understand ( I hope ) that FIRST_ROWS hint tells the optimizer
to create if it can an execution plan that favors getting some of the
information back quickly but may not be a good choice for an execution
plan if you really need all the data.

At some point it sounds like someone had this SQL changed so that it
didn't get all the data ( only first 700 rows right ).

You or someone needs to look at the business case. Do you need all the
data or not?

If you do need all the data, then FIRST_ROWS may not be a good choice
to see the last rows.

I would recommend understanding the application and the SQL first
before making other changes.

.



Relevant Pages

  • DES 2.6 Released
    ... Also, top-n SQL ... TAPI has been extended for some commands. ... A novel proposal for hypothetical SQL queries which allow to ... Display of strata is first ordered by strata and then by predicate ...
    (comp.lang.prolog)
  • Re: put_line: any way to have nonbuffered output?
    ... that controls display from pl/sql code without using ... dbmsoutput_linesarray during the execution. ... After the execution ends, sqlplus ... As Oracle session can only execute a single SQL or PL/ ...
    (comp.databases.oracle.server)
  • Re: put_line: any way to have nonbuffered output?
    ... that controls display from pl/sql code without using ... dbmsoutput_linesarray during the execution. ... After the execution ends, sqlplus ... As Oracle session can only execute a single SQL or PL/ ...
    (comp.databases.oracle.misc)
  • Re: "Resultcache" effect in 10.2.0.4 ??
    ... 'slow', say, 15 minutes or alike, and the second execution is 5 times ... SQL caching does not explain it ... or even if the subquery is altered in a small way (example, ... get the trace 10046 from a small but reproductible example ...
    (comp.databases.oracle.server)
  • Re: performance issue after upgrade to oracle 11.2.0.1 linux 32 bit
    ... I have put both 10053 trace for both execution plan as below url, ... SQL statement is the almost immediate substitution of the original SQL ... SQL plan baselines, I believe, are another possibility (looking back ...
    (comp.databases.oracle.server)