Re: general design issue



On Apr 23, 12:00 am, skink <psk...@xxxxxxxxx> wrote:
hi,

i don't know much about SQL - just basics: tables, indexes, simple
selects and
i'm designing simple database storing key/value data (in fact its
read
only dictionary)

the schema is very straightforward:

CREATE TABLE words (key TEXT, entry TEXT)
CREATE INDEX wordsIdx ON words (key)

the database engine is quite slow (sqlite on embedded device) so
speed
is by far the **most**
important since i need my data in short time (40-60 ms).

the problem i have is as follows: for any key prefix (even not
existing in the database)
i'd like to know if there are keys starting with given prefix + 'a',
'b', 'c' etc

in other words: for prefix lets say 'inte' i'd like to know if there
are keys starting with
'intea', 'inteb', 'intec', 'inted' etc (just need true/false for each
letter not number of keys matching)

i cannot just run 26 queries for:

SELECT key FROM words WHERE key >= 'inteX' LIMIT 1

where X is letter [a..z] since each query takes ~15 ms

of course i don't need to iterate over all 26 letters: for example
when looking for 'intea'
my query might return 'integral' (means that there is no "intea",
"inteb", "intec", "inted", "intee"
and "intef" prefixes since the first key found has prefix "integ") so
i could start with
'inteh' in next iteratiion. but still in the worst scenario i need 26
queries...

my solution is to create second table storing prefix/32-bit-integer-
map where
each bit indicates one letter (for latin alphabet only 26 bits are
used)

CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER)
CREATE INDEX prefixesIdx ON prefixes (prefix)

and use it as follows:

SELECT bitmap FROM prefixes WHERE prefix = 'inte'

this is very fast since i need only one database access to find out
the answer for my question.

however for longer prefixes it's quite expensive, e.g. for prefix
'internationalizatio' i have to store 19
prefixes just to find out there is one or two words starting with
that
prefix.

so my question is how to make it better keeping in mind that speed is
**key** factor but
database size is also important.

any ideas?

thanks
pskink

Maybe what you need is not a relational database. For these sorts
of problems that involve key-value stores, I would use BerkeleyDB. As
it is embedded (as SQLite3), it has a small overhead, and it has
bindings (interfaces) to a lot of programming languages. (It is open
source and maintained by Oracle.)

The solution there resembles the table you have described. As a
hint I would use (in the case of BerkeleyDB) a BTree data store (if
you read through a tutorial you'll figure aut what I'm talking about),
and then when searching for a word, create a cursor (again see the
documentation), find the root word you are searching, and then just
keep going to the next word. Because of the BTree and the way the
cursor is created it assures that the next key will always be bigger
than the last one. And keep iterating until the root changes. This way
you are sure you have seen all possible words starting with the root.

As a side-note, the same solution could be obtained by using any
relational database. For example just use a query like: <<select *
from words where key > root order by key asc>> and apply the same
thechnique as above. It should work fine.

Ciprian Craciun.

P.S.: I've used this approach to both BerkeleyDB (100 million
records), SQLite3, and Postgres (both 4 million records), and worked
just fine. (It was a benchmark for sensor readings, with a schema
somehow resembling yours, and the best performance was obtained by
using BerkeleyDB, but it needs some fine tuning.)
.



Relevant Pages

  • Re: Why the police now have to ask teenage muggers: Do you eat chips?
    ... can you prefix the subject with 'SHIT' so that sensible people can ... within an Every Child <atters programme. ... life tied to this database. ... If people don't resist information gathering and information ...
    (uk.legal)
  • Re: Internet/intranet approach
    ... Thanks for your replay, but as you can imagine, it isn't the answer I'm ... My employer works with Sharepoint, this is an internet based program. ... this database has to be checked once ...
    (microsoft.public.word.vba.beginners)
  • Re: Internet/intranet approach
    ... My employer works with Sharepoint, this is an internet based program. ... this database has to be checked once ... Are you trying to search for *.htm or *.html files? ...
    (microsoft.public.word.vba.beginners)
  • unable to store data in DBI object
    ... used with the same database but different sets of data. ... idea would be to store the prefix as part of the DBI object (eg: ... my $sql = new sql( ...
    (comp.lang.perl.misc)
  • Re: Multiple applications on 1 database - Namespacing ?
    ... >> and live database, I'm reluctant to get my feet too wet into ... > prefix them with something, you'll probably want to give them a logical ... almost no similarities to the new ones, and I'm explicitly told to use this ... specific tables and procedures nearly finished (on a test sql server), ...
    (microsoft.public.sqlserver.programming)