Re: index bloat?



sql_server_2000_user (sethpurcell@xxxxxxxxxxx) writes:
> I updated statistics on the loan table and it had no effect.

And you used FULLSCAN?

Did you also update statistics (WITH FULLSCAN) on loan_history?

> Was that a typo and you meant to say it sounds like a good plan?

Yes. Don't believe everything I say. :-)

> Because I forced this plan with the following query:
>
> select * from (
> select * from loan
> where loan.deal_no='ML4W1') as x
> inner join loan_history as lh
> on x.exloan_id=lh.exloan_id
> where time_period=196
> option (force order, loop join)
>
> and it is so fast it makes me cry - 1 second if cached, 5 seconds if
> not cached and a large deal. Here's the plan:

And that is indeed the plan we are looking for!

> and it takes a couple minutes if not cached, 5 seconds if cached -
> definitely better than a couple hours, but nowhere near my
> corrected-nested-loop. Here's the plan:

Certainly better, but still scanning the entire time_period 196.

> And now for the obvious/hard question: now that I know what the right
> execution plan is, how can I get the query optimizer to generate it -
> *without the hints*? Because some users of the DB are using report
> designing software that will just generate the join query with no
> hints, and I don't want them sitting there for hours when they don't
> have to. Does the optimizer not choose the merge join because it
> requires a bookmark lookup? Does it not choose my nested loop join
> because the statistics are wrong somewhere?

I have no idea. I have posted a question to our internal MVP forum
to get some suggestions. So, OK, I have a vague guess: time_period
is a bit too unselective to be the first column in the index. The
statistics for the index, has a distribution histogramme for this
column only.

Thus, here is an idea: Make the PK on loan_history unclustered,
and then add a clustered index on (exloan_id, time_period), that is
the reverse key. But this is a very costly operation to do on a
305 million row table, and there are other queries that could take
a tool. (Hm, if you add a non-clustered index on the reverse only?
Could that help?)

There is also the idea that I've mentioned before, make the
index on loan.deal_no the clustered on that table. That would at
least be faster to implement. But that is also likely to have
ramifications on other queries.



--
Erland Sommarskog, SQL Server MVP, esquel@xxxxxxxxxxxxx

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

.



Relevant Pages

  • Re: Planning multiple queries
    ... > so it seems to me a new plan should be prepared for the second query, ... If you submit them all in one batch, SQL Server generates a query plan ... If no statistics get updated as result of the query, ...
    (comp.databases.ms-sqlserver)
  • Re: group by clause filling up temp space
    ... > Here is the explain plan. ... You should seriously consider tracing the session executing this query, ... mention of how current the statistics are. ... might want to adjust this to a lower value (I set it to 15 in the ...
    (comp.databases.oracle.server)
  • Re: QUERY TUNING
    ... IN THE ABOVE QUERY I HAVE A NORMAL INDEX IN THE BRAN_CODE COLUMN.. ... Cost  | ... statistics are not ran by default. ... Otherwise the explain plan can't be used. ...
    (comp.databases.oracle.server)
  • Re: index bloat?
    ... > after running the query, not from the estimated execution plan you can ... > I guess what I expect the optimizer to do is: ... that sounds like a bad plan. ... run "UPDATE STATISTICS loan WITH FULLSCAN" and try the ...
    (comp.databases.ms-sqlserver)
  • more unintuitive behavior
    ... i just updated statistics (with fullscan) on a set of tables that are ... referenced by a query, and now, the query runs slower than it did ... before i updated the statistics. ...
    (microsoft.public.sqlserver.server)