Re: long checkpoints



On Nov 21, 8:46 am, Apostrof <abdullah.ako...@xxxxxxxxx> wrote:
<Previous post SNIPPED>
Hi Art,

Hi,

Your server's healthy overall. Metrics look good:

BR = 3.68
BTR = 1.73/hour
RAU = 99.69% - not perfect but acceptable. It probably could not hurt
to reduce the RA threshold by 50%.
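
For anyone who wants to recompute two of these figures, here is a rough Python sketch using the counters from the onstat -p output quoted further down. The formulas are assumptions about what BR (bufwaits ratio) and RAU (read-ahead utilization) stand for, picked because they reproduce the numbers above; the function names are made up for illustration.

```python
def bufwaits_ratio(bufwaits, pagreads, bufwrits):
    """Assumed BR: bufwaits as a percentage of (pagreads + bufwrits)."""
    return 100.0 * bufwaits / (pagreads + bufwrits)


def readahead_utilization(ra_pgsused, ixda_ra, idx_ra, da_ra):
    """Assumed RAU: read-ahead pages used as a percentage of pages read ahead."""
    return 100.0 * ra_pgsused / (ixda_ra + idx_ra + da_ra)


# Counters copied from the onstat -p output in this thread.
br = bufwaits_ratio(bufwaits=705363, pagreads=17141713, bufwrits=2013940)
rau = readahead_utilization(ra_pgsused=11832481, ixda_ra=31895,
                            idx_ra=2292, da_ra=11835066)

print(f"BR  = {br:.2f}")    # 3.68
print(f"RAU = {rau:.2f}%")  # 99.69%
```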


our server has 4gb memory and informix is using 500mb of it. there is
another application which is using 1-2gb of memory. the batch utility
is part of this application.
we are using cooked chunks. there is no raid5 but hp mirroring tool is

COOKED chunks without NUMAIOVPS set properly is your BIG problem. You
have 7 chunks and on 7.31 you'll need 1.5 AIO VPs per chunk plus a few
extra for message log and other cooked IO (like SET EXPLAIN output).
That means I'd set:
NUMAIOVPS 16
To begin with and monitor onstat -g iov over a normal workload
period. If you see that there are one or more AIO VPs with io/wup >=
1.0 it means that at least some of the time you have IOs waiting for a
VP to free up so you need to increase the number of AIO VPs for the
next restart. If there are any AIO VPs showing io/wup == 0.0 you can
reduce NUMAIOVPS to eliminate them.
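
The sizing rule and the io/wup check above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the `extra=5` default is an assumption chosen so that 7 chunks gives the 16 suggested above, and the io/wup values have to be read off onstat -g iov by hand (the sketch does not parse onstat output).

```python
import math


def suggest_numaiovps(cooked_chunks, per_chunk=1.5, extra=5):
    """1.5 AIO VPs per cooked chunk plus a few extra.

    extra=5 is an assumed value for "a few extra" (message log,
    SET EXPLAIN output and other cooked IO).
    """
    return math.ceil(cooked_chunks * per_chunk) + extra


def review_io_wup(io_wup_values):
    """Apply the io/wup rule of thumb to one value per AIO VP."""
    values = list(io_wup_values)
    if any(v >= 1.0 for v in values):
        # Some IOs had to wait for a VP to free up.
        return "increase"
    if any(v == 0.0 for v in values):
        # At least one AIO VP never did any work.
        return "decrease"
    return "keep"


print(suggest_numaiovps(7))             # 16
print(review_io_wup([0.4, 0.9, 1.2]))   # increase
print(review_io_wup([0.5, 0.2, 0.0]))   # decrease
```

When both conditions show up at once, the sketch favours "increase", since waiting IOs are the more serious problem.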

I still think that you could benefit from more buffers and I suspect
that is the reason that onstat -P shows that 73% of your buffer cache
is dedicated to index pages. I'm guessing that index pages and data
pages are thrashing the cache a bit. That's why your Read Cache hit
rate is only 82.2%, while your Write Cache hit rate is 92.99%. You want
to see these above 95% and 85% respectively (90 would be better but
that's application dependent). The thrashing is also part of the cause
of the LRU Writes - though the AIO VP configuration is the biggest
culprit.
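
As a quick cross-check, the two %cached figures can be recomputed from the raw counters in the onstat -p output quoted in this thread. A minimal Python sketch, assuming %cached is (buffer operations - disk operations) / buffer operations; the function name is made up for illustration.

```python
def cache_hit_pct(buffer_ops, disk_ops):
    """Percentage of buffer operations satisfied without going to disk."""
    return 100.0 * (buffer_ops - disk_ops) / buffer_ops


# Counters copied from the onstat -p Profile section in this thread.
read_pct = cache_hit_pct(buffer_ops=73963987, disk_ops=13163890)   # bufreads, dskreads
write_pct = cache_hit_pct(buffer_ops=2013940, disk_ops=141207)     # bufwrits, dskwrits

# Targets mentioned above: 95% for reads, 85% for writes.
print(f"read cache  {read_pct:.2f}%  target met: {read_pct >= 95.0}")
print(f"write cache {write_pct:.2f}%  target met: {write_pct >= 85.0}")
```

With these counters the read figure comes out at 82.20% and the write figure at 92.99%, matching the two %cached columns in the output.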

Art S. Kagel

Finally, I notice that there is almost as much write activity in the
ROOTDB dbspace as in the data dbspace. This is almost entirely
because you have the physical log configured there. You should move
the physical log to a separate dbspace, preferably on its own disk
structure away from data, root, and logical logs as much as possible.
Notice that your data write activity, physical log write activity and
logical log write activity are all about the same volume. So, the
more you can isolate these from each other the better checkpoints and
other physical write operations will perform.
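
To see how the write volume splits up, the "page Wr" column of the onstat -D chunk list quoted in this thread can be totalled per dbspace. A small Python sketch; the numbers are typed in by hand from that output, and the variable names are only illustrative.

```python
from collections import defaultdict

# dbspace number -> name, from the Dbspaces section of onstat -D.
dbspace_names = {1: "rootdbs", 2: "llogspace", 3: "datadbs", 4: "tempdbs"}

# (dbspace number, pages written) per chunk, from the Chunks section.
chunk_writes = [
    (1, 102979),  # /work_a1/db/rootchunk
    (2, 108051),  # /data1/logs/llogchunk1
    (3, 14220),   # /data1/db/datachunk1
    (3, 116564),  # /data1/db/datachunk2
    (3, 0),       # /data1/db/datachunk3
    (3, 0),       # /data1/db/datachunk4
    (4, 92610),   # /work_a1/db/tempchunk
]

writes_by_dbspace = defaultdict(int)
for dbs_number, pages_written in chunk_writes:
    writes_by_dbspace[dbspace_names[dbs_number]] += pages_written

for name, pages in writes_by_dbspace.items():
    print(f"{name:<10} {pages:>8}")
```

The totals come to 102979 pages for rootdbs, 108051 for llogspace and 130784 for datadbs, which is the "about the same volume" pattern described above.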

used with volume groups where the chunks reside.
you are right with the checkpoint information. there is only one
checkpoint in my sample output while the batch utility is working.
after batch we get an onunload backup. i think the other checkpoint
takes place while we take this backup. sometimes one, sometimes two
checkpoints occur during the batch utility. the second checkpoint may
occur during backup.
these are the outputs you want.

17:32:53 Checkpoint Completed: duration was 160 seconds.
17:32:53 Checkpoint loguniq 1250, logpos 0x1842484
17:33:05 Logical Log 1250 Complete.
17:33:07 Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1250 Complete." "Logical Log 1250 Complete."
17:33:52 Logical Log 1251 Complete.
17:33:53 Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1251 Complete." "Logical Log 1251 Complete."
17:35:19 Logical Log 1252 Complete.
17:35:20 Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1252 Complete." "Logical Log 1252 Complete."
17:38:59 Checkpoint Completed: duration was 155 seconds.
17:38:59 Checkpoint loguniq 1253, logpos 0x8fa098

onstat -D output:
-----------------

IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line -- Up 2 days 07:08:22 -- 501872 Kbytes

Dbspaces
address  number flags fchunk nchunks flags owner    name
e41dc158 1      1     1      1       N     informix rootdbs
e41ddf38 2      1     2      1       N     informix llogspace
e4206b48 3      1     3      4       N     informix datadbs
e4206c08 4      2001  7      1       N T   informix tempdbs
4 active, 2047 maximum

Chunks
address  chk/dbs offset page Rd  page Wr  pathname
e41dc218 1   1   0      4114     102979   /work_a1/db/rootchunk
e41ddc20 2   2   0      9969     108051   /data1/logs/llogchunk1
e41ddd28 3   3   0      9313329  14220    /data1/db/datachunk1
e41dde30 4   3   0      7721858  116564   /data1/db/datachunk2
e4206830 5   3   0      5        0        /data1/db/datachunk3
e4206938 6   3   0      5        0        /data1/db/datachunk4
e4206a40 7   4   0      92433    92610    /work_a1/db/tempchunk
7 active, 2047 maximum

onstat -p output:
-----------------

IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line -- Up 2 days 07:09:13 -- 501872 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
13163890 17141713 73963987 82.20   141207   434424   2013940  92.99

isamtot  open  start   read    write  rewrite delete commit rollbk
15105310 93462 1625540 7330333 521236 2764    382820 4434   0

gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs
0       0        0        0      0        0       0

ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes
0      0            0      1473.22 635.22 50       1870

bufwaits lokwaits lockreqs  deadlks dltouts ckpwaits compress seqscans
705363   0        102952514 0       0       13       38773    13827

ixda-RA idx-RA da-RA    RA-pgsused lchwaits
31895   2292   11835066 11832481   675752

onstat -P | head -20
--------------------

IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line -- Up 2 days 07:10:38 -- 501872 Kbytes
partnum total btree data other resident dirty
0 8529 7575 788 166 0 0
1048578 2 1 1 0 0 0
1048579 10 5 5 0 0 0
1048580 1 1 0 0 0 0
1048582 1 1 0 0 0 0
1048584 3 2 1 0 0 0
1048595 1 1 0 0 0 0
1048606 1 1 0 0 0 0
1048703 1 1 0 0 0 0
3145730 26 11 15 0 0 0
3145731 1 1 0 0 0 0
3145732 46 16 30 0 0 0
3145733 7 2 5 0 0 0
3145734 10 5 5 0 0 0
3145735 2 1 1 0 0 0
3145736 8 4 4 0 0 0
3145737 2 1 1 0 0 0

onstat -P | tail -20
--------------------

3145888 6 0 6 0 0 0
3145891 17 0 17 0 0 0
3145892 154 0 154 0 0 0
3145893 4 0 4 0 0 0
3145894 1 0 1 0 0 0
3145895 7 0 7 0 0 0
3145899 1 1 0 0 0 0
3145900 1 0 1 0 0 0
3145901 1 0 1 0 0 0
3145902 478 1 477 0 0 0
3145903 22 3 19 0 0 0
3145904 60 0 60 0 0 0

Totals: 200000 146840 52876 284 0 0

Percentages:
Data 26.44
Btree 73.42
Other 0.14
