Re: update statistics



On Apr 23, 9:08 am, RedGrittyBrick <RedGrittyBr...@xxxxxxxxxxxxx>
wrote:
I want to demonstrate the potential effects of update statistics in a
repeatable way but am having trouble doing so. This is for a customer
using an elderly version of Informix SE.

Here's what I did.

1) create table item
(code integer, name char(30), type char(1), cost integer)

2) create index item_ix on item(code)
create index item_type_ix on item(type)

3) load from "item.csv" insert into item
I assumed this would be slow but avoid having the statistics updated by
creating indexes after loading. I want bad stats first. I'd previously
generated a millon lines of item.csv with sequential code values, fixed
name, type = random A-Z, cost = random 0-100

4) set explain on and time a bunch of queries
select sum(cost) from item where type = 'A'
select sum(cost) from item where type != 'A'
select sum(cost) from item where type between 'B' and 'Z'

5) update statistics for table item

6) Repeat timings. Peruse sqexplain.out.

What I found was that the query optimizer selected sequential scan for
"type != 'A'" (40s) but used the index for "type between 'B' and 'Z'"
(420s). Update statistics had no effect.

I guess I'm misunderstanding and have not devised a suitable scenario to
clearly demonstrate the effect of update statistics. Any clues or
suggestions?


As Art said, the SE optimizer doesn't do a lot with statistics. It is
also very fast in SE, and very simple. You can run UPDATE STATISTICS
(with no qualifiers whatsoever) periodically (say once a week, or once
a month; OTOH, daily doesn't hurt either) and unless you have a very
volatile table or two in the system, you'll be fine.

The SE optimizer is not cost-based - it is heuristic.

To the optimizer, there's a big difference between type != 'A' and
type BETWEEN 'B' AND 'Z'. The first includes [a-z0-9] and the
punctuation and the upper-half of the code set and control characters
and so on (as well as [B-Z]), and the second does not. Because
statistics are only statistics, the optimizer cannot rely on them
being accurate and never assumes that there are no values other than
the ones recorded in the statistics. Also, SE doesn't even use
distributions, so it would not know that the values are limited the
range A-Z; it might have second lowest (B) and second highest (Y) but
that's all.

In many respects, the fact that you don't have to worry about UPDATE
STATISTICS with SE is great -- it means it works well despite
neglect. The target for IDS is to get back to a state where you don't
have to worry about UPDATE STATISTICS either. It will still be a
while before IDS gets there.

-=JL=-

.



Relevant Pages

  • Re: How to capture SQL statement messages in DTS log file
    ... To capture the statistics date, look at the STATS_DATE function in the Books ... Then instead of PRINT in your DTS package, ... The execution of the following DTS Package succeeded: ... Update statistics for DYNAMICS User Tables ...
    (microsoft.public.sqlserver.dts)
  • Re: Auto update statistics
    ... We recently solved a problem that was> sucking up proccessor cycles and limited the number of clients on a> database server. ... It is highly likely that there> are numerous inefficient queries. ... On view was that having this on would cause spikes in>>> utilization whenever the auto update kicks in during peak load time,>>> and so this option should be turned off and update statistics should>>> be run via a job during off times. ... running a bunch of inefficient queries will put more>> stress on the server than allowing SQL Server to update statistics as it ...
    (microsoft.public.sqlserver.server)
  • Re: How to capture SQL statement messages in DTS log file
    ... To capture the statistics date, look at the STATS_DATE function in the Books ... I am rather new to SQL Server ... Then instead of PRINT in your DTS package, ... Update statistics for DYNAMICS User Tables ...
    (microsoft.public.sqlserver.dts)
  • RE: Query Help - Wrong Execution Plans
    ... My name is Michael and I would like to thank you for using Microsoft ... If I learned correctly that your query returned a large number of records ... statistics on indexed columns (as long as automatic query statistic ... we have to use UPDATE STATISTICS statement to update STATISTICS ...
    (microsoft.public.sqlserver.server)
  • Re: WITH (INDEX(ix_auftrag_knr))
    ... Ich verstehe, dass der Index nur dann benutzt wird, wenn der Optimizer das für sinnvoll erachtet. ... UPDATE STATISTICS hat nix gebracht, kann ja auch nicht, denn die Daten in der SQL-Tabelle sind noch auf dem gleichen Stand ...
    (microsoft.public.de.sqlserver)