With just one Disk I/O
- From: "David Cressey" <david.cressey@xxxxxxxxxxxxx>
- Date: Thu, 24 Nov 2005 18:34:45 GMT
Over in another thread, DonR points out, with emphasis, that Pick can
retrieve a record with just ONE disk I/O.
I don't blame Don for drawing attention to that point. It's one thing that
can make Pick applications run blazingly fast under certain circumstances.
Caution: long post follows.
Here, I want to draw attention to how a (purported) relational DBMS can do
the same thing. My reference is to DEC Rdb/VMS, as the product stood in the
summer of 1994. Over 50% of the current code base was written since the
product came under Oracle, so my description may be out of date. But it
should serve for illustrative purposes. I believe that DEC Rdb/VMS IS a
relational DBMS, but some theoreticians may dispute that point. Whatever.
There is a unit of data that Rdb reads in a single disk read. It's called a
"Database Page". The address in filespace of a database page can be
computed by simple arithmetic from two numbers, called the Storage Area
Number and the Database Page Number. Rdb computes the address in filespace,
and the operating system fetches the appropriate disk block(s).
A database page can contain a variety of what I will call, for present
purposes, a "record". A record can contain a table row, a table row
fragment, an index node, a hash bucket, a duplicate node, a system record,
or some other things.
A database has been built and loaded for fast lookup of single orders.
The following query comes in...
SELECT *
FROM ORDERS O, ORDER_ITEMS I
WHERE O.ORDER_NO = I.ORDER_NO
AND O.ORDER_NO = '1234567';
In the real world the literal order number would have been provided by a
program variable, but that need not concern us here.
The optimizer looks at the query, looks at the metadata (which is in memory
due to a prior request). The metadata includes some volume statistics
concerning the size of tables. The optimizer now comes up with a strategy.
We've burned a little CPU time, but no Disk I/Os.
Now, Rdb hashes the order number in order to come up with a target page
number, in a storage area devoted to hash buckets that index ORDERS on
ORDER_NO. It then reads this entire Database page (along with a few
adjacent pages) into a memory buffer.
It looks for the relevant hash bucket, and by golly, there it is! (Well,
not by golly.... by design... I'll just say "by design" from here onward).
The hash bucket relates the key value '1234567' to the DB address of the
record containing the table row.
By design, it's the same page just read! With no more disk I/O's it now
has the contents of ORDERS for the desired row.
It now looks up in another index, the value '1234567' in a hashed index on
ORDER_ITEMS. By design it hashes to the same database page we already read!
And there's the second hash bucket, by design! This time, the index permits
duplicates, (obviously), and the key points to a "duplicate node"
somewhere in database land.
Where? In the same page we already read! (I hope you're starting to get
the point). The duplicate node contains pointers to six records containing
table rows. Why? because this order had six items in it.
Where are these six ORDER_ITEM rows? You guessed it! They are in six more
records of the database page we already read! Well, no, actually one of
the items was in an adjacent database page, but it turns out that the one
disk I/O we did read that page into memory, as well as the target page.
Now Rdb completes the join, and returns the result to the program. Total
Disk I/O's? One!
Compared to Pick, it's probably burned more CPU time, but's that not the
point, here.
Now let's try this query:
SELECT *
FROM ORDERS O, ORDER_ITEMS I
WHERE O.ORDER_NO = I.ORDER_NO
AND O.ORDER_DATE BETWEEN '1-NOV-2005' AND '30-NOV-2005';
A disappointing 200 disk I/Os. Much slower, but still lots better than a
sequential scan of the order table, which would have required about a
million I/Os.
Moving on, we try this:
SELECT *
FROM P.PRODUCTS, ORDER_ITEMS I
WHERE P.PRODUCT_ID = I.PRODUCT_ID
AND P.PRODUCT_ID = '98765';
Again a disappointing 50 disk I/Os. The indexes aren't hashed, and the
tables aren't colocated with the indexes.
Too bad.
But all three queries are logically correct, and give logically correct
results. The designer optimized the physical design for the most urgent
query, at the expense of deoptimizing for the less urgent ones. And it's
all done without stepping outside of the
SQL-Relational framework!
My point is that comparing the relational data model with the Pick data
model can be done without prejudice due to the ability of eith one to get
the necessary data in one disk I/O. It can be done with either Pick or Rdb!
.
- Prev by Date: Database Design problem
- Next by Date: Re: So what's null then if it's not nothing?
- Previous by thread: Database Design problem
- Next by thread: Hierarchy as 'UP' constraint
- Index(es):
Relevant Pages
|