Re: Update Statistics and poor performance
- From: "Art S. Kagel" <kagel@xxxxxxxxxxxxx>
- Date: Tue, 30 May 2006 17:13:56 -0400
fictishas@xxxxxxxxx wrote:
See my responses below:
My questions will make more sense after you read between the lines for
details, but here it goes:
1) Do 3 table joins use tempspace where 2 table joins do not?
2) Does update stats typically use enormous amounts of tempspace to
complete when running on large tables?
3) Is it possible that most sql's run efficiently during update stats
including 2 table joins, but 3 table joins may cause dramatic delays?
4) If the answer to number 3 is yes but specific to the 9.3UC2 engine,
can I hope for a better result in version 10.0UC4?
9.30UC2 is the last version released before the update statistics code was rewritten in 9.30xC3. That optimized code IS included in all of the 9.40 and 10.00 releases as well as in 7.31xD2 and later.
I notice that you had better luck using fewer larger udpate statistics statements. This is done automatically by the latest dostats version (Source Revision: 1.121) for the optimized server versions. For your (unoptimized) server version you can enable this updated algorithm in dostats by adding the -o flag which inverts the server version detection logic. Try that.
However, it looks most like you are running out of resources. Keep in mind that the portion of total CPU and Memory resources allocated to any particular query is dependent on the PDQPRIORITY setting of each session, the number of CPU VPs, the amount of total DS_TOTAL_MEMORY allocated, and the number of DS_MAX_QUERIES set in the ONCONFIG file. Read the Performance Guide for details.
I suspect that unreasonably (or at least optimistically) low settings for the DS_ parameters combined with high PDQ values and complex queries are the main culprits.
Art S. Kagel
Please read below for specifics on what occured and what was observed..
All suggestions are appreciated. The answer to your first question is
yes, I am working on upgrading these servers from 9.3 to 10.0UC4 now.
I'm mostly wondering if I need to be on the lookout for 3 table joins
while running update stats in the future or if this is a 9.3 issue or
some other issue all together.
Thanx,
~Tisha Oden
-------------------------------------------------------------------------------------------------
Earlier this week I attempted to run update statistics on one of our
larger production tables with 80,000,000 rows which exists on one of
our oldest servers running IDS9.3UC2. First I started with dostats
which created 13 sql statements for this table as follows:
UPDATE STATISTICS MEDIUM FOR TABLE pin_mast DISTRIBUTIONS ONLY;
UPDATE STATISTICS HIGH FOR TABLE pin_mast (customer_number)
DISTRIBUTIONS ONLY;
UPDATE STATISTICS LOW FOR TABLE pin_mast (customer_number,
batch_number, brick_number);
UPDATE STATISTICS HIGH FOR TABLE pin_mast ( dnis_set ) DISTRIBUTIONS
ONLY;
UPDATE STATISTICS HIGH FOR TABLE pin_mast ( batch_number) DISTRIBUTIONS
ONLY;
UPDATE STATISTICS LOW FOR TABLE pin_mast (customer_number, dnis_set);
UPDATE STATISTICS HIGH FOR TABLE pin_mast ( retail_plan ) DISTRIBUTIONS
ONLY;
UPDATE STATISTICS LOW FOR TABLE pin_mast (customer_number,
retail_plan);
UPDATE STATISTICS HIGH FOR TABLE pin_mast (pin) DISTRIBUTIONS ONLY;
UPDATE STATISTICS LOW FOR TABLE pin_mast (pin, customer_number,
dnis_set);
UPDATE STATISTICS HIGH FOR TABLE pin_mast (cust_ref1) DISTRIBUTIONS
ONLY;
UPDATE STATISTICS LOW FOR TABLE pin_mast (cust_ref1, cust_ref2,
customer_number);
UPDATE STATISTICS LOW FOR TABLE pin_mast (customer_number,
batch_number, seq_number);
This ran for a while and then it got to the fourth statement and we
started experiencing major delays which in turn caused a complete
failure in all programs attached to this database (major outage). In
an attempt to lessen the impact this server was having on our platform
I chose to stop update statistics and see if we could limp along until
traffic slowed down and then restart the update statistics.
This seemed to work for about 15 minutes then again the load average on
the server shot up and the response times became too slow for our
platform to continue functioning as needed. I decided that the
performance was not going to get any better and we had to restart
update stats. I chose to run the modified version suggested by Andrew
Ford in previous postings, cutting my 13 statements down to 3 in hopes
that it would finish a little quicker or provide us with somewhat
better performance perhaps after the first sql completed. This version
is as follows:
update statistics low
for table pin_mast (customer_number, dnis_set, batch_number,
brick_number,
retail_plan, seq_number, pin);
update statistics medium
for table pin_mast (cust_ref1, cust_ref2, amount_remaining,
amount_used,
renewals, first_use, last_use, last_vert,
last_horiz,
last_lata, pin_status, maint_next,
maint_count,
first_attempt, maint_last, maint_total,
recharge_date,
adjustments) distributions only;
update statistics high
for table pin_mast (customer_number, pin, dnis_set, batch_number,
retail_plan, brick_number, seq_number)
distributions only;
In both cases PDQPRIORITY was set first to 10 and then to 40 to try and
get it to run a little faster.
Running the second version did not of course solve our problems
immediately and we continued to experience slow performance and
continued outages across our platform. What I found in looking at it
was there seemed to be alot of sessions open on the server with a
particularly odd looking query. Typically there are only 1 or 2 of
these running at a time however at this moment there were 56 of them
running. When I ran the query manually through dbaccess it returned ok
taking about 1 minute. I noticed that alot of the queries were
identical and started looking through our software logs to figure out
why it was opening multiple new sessions with the same query. I found
that the code was timing out on database response and attempting to
kill the first session and open a new one. It was not actually closing
the first query and then opening a second and then timing out and so
forth.
Lets say we have 2 sides to our platform, the web side and the account
processing side. Both of these are critical and are required to be up
and working 24/7 which has always made running update stats a bit scary
and impossible and usually we try to only run it when needed to try and
avoid these types of issues. The account processing side is slightly
more important to most of our customers and typically when we have to
make a choice in which one to try and salvage we will sacrifice the web
side for the sake of the account processing side.
Having said that, once the web side was taken down and this query that
was running 56 times was closed, the account processing side began to
work efficiently even with update stats running. This was good news
for most, although considering the fact that update stats could run for
12 hours or so to complete having the web side down for that amount of
time is not really an option. So we began to look at various options
to provide some type tourniquet for the web side and limp along until
update stats completed.
The query in question is as follows:
Current SQL statement :
SELECT batch_mast.batch_number, brick_mast.brick_number,
pin_mast.seq_number, pin_mast.cust_ref1, pin_mast.cust_ref2,
pin_mast.amount_used, pin_mast.adjustments, pin_mast.retail_plan,
pin_mast.first_use, pin_mast.last_use, pin_mast.maint_last,
pin_mast.maint_next, pin_mast.maint_count, pin_mast.maint_total,
brick_mast.brick_status FROM pin_mast, brick_mast, batch_mast WHERE
pin_mast.customer_number="SOMECUST" AND pin_mast.pin="111111111111"
AND
pin_mast.dnis_set="SOMEDNIS" AND
pin_mast.customer_number=brick_mast.customer_number AND
pin_mast.brick_number=brick_mast.brick_number AND
pin_mast.batch_number=brick_mast.batch_number AND
pin_mast.customer_number=batch_mast.customer_number AND
pin_mast.batch_number=batch_mast.batch_number
I ran this query as I said and it returned in 1 minute on other
similarly designed servers this query took less then 1 second to run.
I figured this had to do with update statistics. I ran this with set
explain on both while dostats was running originally and while dostats
was not running at all and both returned the same result:
QUERY:
------
SELECT batch_mast.batch_number, brick_mast.brick_number,
pin_mast.seq_number, pin_mast.cust_ref1, pin_mast.cust_ref2,
pin_mast.amount_used, pin_mast.adjustments, pin_mast.retail_plan,
pin_mast.first_use, pin_mast.last_use, pin_mast.maint_last,
pin_mast.maint_next, pin_mast.maint_count, pin_mast.maint_total,
brick_mast.brick_status FROM pin_mast, brick_mast, batch_mast WHERE
pin_mast.customer_number="SOMECUST" AND pin_mast.pin="111111111111"
AND
pin_mast.dnis_set="SOMEDNIS" AND
pin_mast.customer_number=brick_mast.customer_number AND
pin_mast.brick_number=brick_mast.brick_number AND
pin_mast.batch_number=brick_mast.batch_number AND
pin_mast.customer_number=batch_mast.customer_number AND
pin_mast.batch_number=batch_mast.batch_number
Estimated Cost: 2
Estimated # of Rows Returned: 1
1) informix.batch_mast: INDEX PATH
(1) Index Keys: customer_number batch_number (Key-Only) (Serial,
fragments: ALL)
Lower Index Filter: informix.batch_mast.customer_number =
'SOMECUST'
2) informix.pin_mast: INDEX PATH
Filters: (informix.pin_mast.pin = '111111111111' AND
informix.pin_mast.batch_number = informix.batch_mast.batch_number )
(1) Index Keys: customer_number dnis_set (Serial, fragments: ALL)
Lower Index Filter: (informix.pin_mast.customer_number =
informix.batch_mast.customer_number AND informix.pin_mast.dnis_set =
'SOMEDNIS' )
NESTED LOOP JOIN
3) informix.brick_mast: INDEX PATH
(1) Index Keys: customer_number batch_number brick_number
(Serial, fragments: ALL)
Lower Index Filter: ((informix.pin_mast.batch_number =
informix.brick_mast.batch_number AND informix.pin_mast.customer_number
= informix.brick_mast.customer_number ) AND
informix.pin_mast.brick_number = informix.brick_mast.brick_number )
NESTED LOOP JOIN
We attempted to add a directive but that did not produce any different
result either. After further investigation we found that this query was
not all that efficient anyway and we only needed 2 of the 3 tables to
retrieve the required data set. We reworked the query with only a 2
table join and this returned in 3 seconds with update stats running,
the results of set explain are as follows:
QUERY:
------
SELECT p.batch_number, b.brick_number,
p.seq_number, p.cust_ref1, p.cust_ref2,
p.amount_used, p.adjustments, p.retail_plan,
p.first_use, p.last_use, p.maint_last,
p.maint_next, p.maint_count, p.maint_total,
b.brick_status FROM pin_mast p, brick_mast b
WHERE
p.customer_number="SOMECUST" AND p.pin="111111111111" AND
p.dnis_set="SOMEDNIS" AND
p.customer_number=b.customer_number AND
p.brick_number=b.brick_number AND
p.batch_number=b.batch_number
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) informix.p: INDEX PATH
Filters: informix.p.pin = '111111111111'
(1) Index Keys: customer_number dnis_set (Serial, fragments: ALL)
Lower Index Filter: (informix.p.dnis_set = 'SOMEDNIS' AND
informix.p.customer_number = 'SOMECUST' )
2) informix.b: INDEX PATH
(1) Index Keys: customer_number batch_number brick_number
(Serial, fragments: ALL)
Lower Index Filter: ((informix.p.batch_number =
informix.b.batch_number AND informix.p.customer_number =
informix.b.customer_number ) AND informix.p.brick_number =
informix.b.brick_number )
NESTED LOOP JOIN
we decided to do a code change and replace the 3 table join with the
above 2 table join and restart the web side of our platform. This
worked and both the web side and the account processing side were able
to work without any additional timeouts or issues even while update
stats continued to run for a full 12 hours.
-------------------------------------------------------------------------------------------------
- References:
- Update Statistics and poor performance
- From: fictishas
- Update Statistics and poor performance
- Prev by Date: Re: Looking for a job...
- Next by Date: RE: Looking for a job...
- Previous by thread: Update Statistics and poor performance
- Next by thread: Informix Restore on Tru64
- Index(es):
Relevant Pages
|