Re: Optimizer Issues with 10.00.xC8



Thomas J. Girsch wrote:
Art S. Kagel (Oninit) wrote:
Well, unless you need colC to lead the index for sorting, how about an index on colA then colC? That would sidestep the low selectivity of the colC index. If you still need the colC only index for some queries, then keep both.

We have some queries that search based on ColA, some that search based on ColA and ColC combined, and some that search on ColC alone (but only when ColC is nonzero). Presumably we _could_ add an additional index on (ColA, ColC), but we shouldn't have to, especially since ColA by itself gets you plenty close enough -- within 18 rows out of 200 million makes for a pretty quick scan.

Interesting. That used to be a big problem. Since the engine doesn't know the values that you will ultimately supply to the replaceable parameters, the optimizer used to do a dumbed down version of the optimization that exhibited just the problems you are seeing with the queries working fine if it had exact values at prepare time, but not with parameters. I, and several other customers, complained about the problem and one of the big advances in the optimizer in 7.31/9.30 was for the engine to delay the final calculations of cost for these queries until it had the replacement values at cursor open/execute time.
This sounds like they somehow broke that in this release.

That doesn't seem to be it. I'm still getting inconsistent results, but what I'm finding is that which path it takes depends on which values it gets the _first time_ that the cursor is opened. So if it gets a ColC=0 request first, it picks the ColA index (the correct one), and does so for subsequent executions as well. But if it gets a ColC=[nonzero] request first, it picks the index on ColC, and seems to stick with it.

Again, this _only_ appears to be the case if there are distributions on the table. Without distributions, it's reliably picking the index on ColA, the correct one. Which is exactly backward from the behavior I'd expect to see. With the data skew on ColC, I would expect it to _never_ select that index when it has a value for ColA.

Makes sense to me. Without distributions it uses the older dumber optimizer algorithms which are dependend on the relative depth and number of unique values of the indexes (stored in sysindices) and the colmin & colmax values (stored in syscolumns) which are the second smallest and second greatest values for the column. Since zero is likely the smallest value of colC and the number of unique values for the colC only index is relatively low compared to the number of unique keys in the unique index on colA & colB (which is somewhere shy of 200million if I remember correctly) the colA, colB index is clearly the more selective according to the algorithms used in the absence of distributions.

Art S. Kagel
Oninit




===========================================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:

http://www.oninit.com/home/disclaimer.php

===========================================================================================

.



Relevant Pages

  • Re: Optimizer Issues with 10.00.xC8
    ... one is a composite, unique index on colA, colB and the other is a nonunique index on colC. ... If I run full statistics with distributions, the optimizer picks the index on colC every time, which is Very Bad, and takes forever. ...
    (comp.databases.informix)
  • Re: Optimizer Issues with 10.00.xC8
    ...  one is a composite, unique index on colA, colB ... and the other is a nonunique index on colC ...  If I run full statistics with distributions ... You may indeed have found a bug in the 10.00.xC8 optimizer and should ...
    (comp.databases.informix)
  • Re: Optimizer Issues with 10.00.xC8
    ... First, if the selectivity of colA is so good, but with millions of rows the default resolution of the data distributions may be too low. ... Whereas for ColC, nearly 60% of the rows have a value of ColC = 0. ... what I've found is that queries run from dbaccess always choose the best index. ... The best solution for this is to UPDATE STATISTICS LOW...DROP DISTRIBUTIONS immediately before the truncate/load operation, run that initial round of time sensitive queries without data distributions, and reinstate the distributions as soon after the initial round of queries as is practical. ...
    (comp.databases.informix)
  • Re: Optimizer Issues with 10.00.xC8
    ... query runs), but the point is that in IDS 10.00.FC6, these queries ran ...  one is a composite, unique index on colA, colB ... and the other is a nonunique index on colC ...  If I run full statistics with distributions ...
    (comp.databases.informix)
  • Re: Optimizer Issues with 10.00.xC8
    ... That would sidestep the low selectivity of the colC index. ... We have some queries that search based on ColA, some that search based on ColA and ColC combined, and some that search on ColC alone. ... So if it gets a ColC=0 request first, it picks the ColA index, and does so for subsequent executions as well. ...
    (comp.databases.informix)