Re: simple sql using between startnumber and endnumber not performing



On 27 Jun., 23:41, y...@xxxxxxxxxxxxxxxxxxx (Malcolm Dew-Jones) wrote:
=?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@xxxxxxxxx) wrote:

: On 27 Jun., 05:45, y...@xxxxxxxxxxxxxxxxxxx (Malcolm Dew-Jones) wrote:
: > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@xxxxxxxxx) wrote:
: >
SNIP
: > $0.10

: Hello Malcolm,

: Thank you for your fast reply.
: I was thinking about something like this, not an index based but just
: the first digit as each number are all 8 digits.
: But your idea seems like a better way.

If they are actually strings then using the first digit would be the same
general idea.

: One question:
: If there are more than 9999 between fromnumber and tonumber will this
: still work ?

Are you saying that there is a correspondence between the two numbers in
each row? If so then some function of the two numbers should perhaps be
indexed instead of the two numbers independently.

: Say i select from 20000000 to 30000000.
: I belive not and in this case i should select using only startnumber
: and endnumber right ?

No, the underlying issue is the same.

The point of the index can be determined by trying to sort a list of pairs
of numbers. The following is sorted on the first number.

1 10
2 5
3 8
4 1
5 1
6 3

notice that sorting on the first number helps you find the first number
efficiently, but it will not help in finding the second number. On
average, the first number is in the middle of the list, but then you have
to scan through all the rows _starting_ at that point in the list to find
the second number. (i.e. on average you still half to scan through half
the rows).

What happens if you sort the above on both numbers? - well actually the
list is already sorted on both numbers! It doesn't help because the first
numbers are all unique so each first number corresponds to a single second
number.

The point of the suggested index is to partition the first numbers into
lumps. You can efficiently find which lumps probably have the first
number because that is sorted, but within each lump there is a sorted set
of second numbers to help you efficiently find the second numbers in each
lump. Each level of partitition should, on average, half the rows to be
examined.

The original numbers must be examined at some point, so you put them are
in the index so that only the index needs to be read, not the table.

HOWEVER, the indexing adds over head, so at some point the technique
starts to become less efficient, therefore experimentation is required to
see if this will even help in your situation.

Also, if the numbers already have many repetitions then the above will do
nothing except add over head, so as I said, experimentation and
understanding is required to see if this will even help in your situation.


Yes each startnumber is either less or equal to the endnumber.

Here is an example:

startnumber - endnumber

20000000 - 20000000
20000010 - 20000545
31234567 - 31234567
47462832 - 47469999

and so on up til the value 99999999.
Not all numbers exists and some may cross bounderies of others.

I belive i understand the "lump" part and in my case I should do the
following:

Create an index on substr(startnumber, 1, 4), substr(endnumber, 1, 4),
startnumber, endnumber

Then do a SQL like this :
SELECT *
FROM numbers
WHERE '3131' BETWEEN SUBSTR (startnumber, 1, 4) AND SUBSTR
(endnumber, 1, 4)
AND '31313131' BETWEEN startnumber AND endnumber;

Is this correct ?

Regards
Benjamin

.



Relevant Pages