Re: DB with list datatype?




Bill Karwin wrote:
> dawn wrote:
>> You might be interested in reading the new IDC white paper entitled
>> "Because Not All Data is Flat: IBM's U2 Extended Relational DBMSs".
>> (I'm not sure why he opted to use the oft inflammatory "flat" word
>
> Oh -- I thought you were going to say that the inflammatory wording is
> "data is"! <g>

As the daughter of a linguist and a grammar school teacher, I took a
long time to give up and conform on that one (at least some of the
time). Language does evolve, and it seems that "data" is now used both
as a singular (for a collection) and as a plural. Because he is
referring to the shape of a collection in the title, he uses the
singular. Of course, I do understand if you wish to hold on to a purer
approach.

> Ok, I've read the white paper. U2 provides physical storage cohesion
> for "multivalued fields" and "nested tables". That is, related data is
> stored close together.
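For anyone who hasn't seen a multivalued field, here is a rough illustration. Pick-family systems conventionally store each record as one delimited string (a "dynamic array"), using an attribute mark (char 254) between fields, a value mark (char 253) between values in a field, and a subvalue mark (char 252) below that. Keeping the values inside the same record is one way related data ends up stored together. This is only a sketch under those assumptions, not U2's API: the CUSTOMER layout and the helper names parse_record and to_child_rows are hypothetical. It also shows how the same multivalued field would flatten into the rows of a normalized (1NF) child table.

```python
from __future__ import annotations

# Conventional Pick/MultiValue system delimiters (assumed for this sketch).
AM = chr(254)   # attribute mark: separates fields
VM = chr(253)   # value mark: separates values within a field
SVM = chr(252)  # subvalue mark: separates subvalues within a value


def parse_record(raw: str) -> list[list[list[str]]]:
    """Split a dynamic-array string into fields -> values -> subvalues."""
    return [
        [value.split(SVM) for value in field.split(VM)]
        for field in raw.split(AM)
    ]


def to_child_rows(key: str, record: list[list[list[str]]],
                  field_no: int) -> list[tuple[str, int, str]]:
    """Flatten one multivalued field into (parent_key, position, value) rows,
    the shape a normalized 1NF child table would hold.

    field_no is a zero-based index into the parsed list (Pick itself numbers
    attributes from 1); subvalues are re-joined, not flattened further.
    """
    return [
        (key, pos, SVM.join(subvalues))
        for pos, subvalues in enumerate(record[field_no], start=1)
    ]


# Hypothetical CUSTOMER record: field 0 = name, field 1 = phone numbers.
raw = "Ada Lovelace" + AM + "555-0100" + VM + "555-0199"
record = parse_record(raw)
print(record)
# [[['Ada Lovelace']], [['555-0100'], ['555-0199']]]
print(to_child_rows("C1", record, 1))
# [('C1', 1, '555-0100'), ('C1', 2, '555-0199')]
```

In the multivalued form both phone numbers live in the customer's own record; in the 1NF form they become two rows in a separate table, joined back by key and ordered by an explicit position column.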

> This is based on a DBMS architecture that dates back to products such as
> Pick, UniData, and VMark. These offered multivalued and
> multidimensional data modeling. The paper claims that this achieves
> greater performance and scalability than relational systems, and
> represents the data in a model that is easier to understand.

> However, it is not clear from the white paper that the performance
> improvement of the multidimensional data model is as relevant today as
> it was 10, 15, or 20 years ago.

I agree. Unless you are working with a single processor, this is less
relevant today.

> Processing power, the quantity of high-speed RAM, and even the speed
> of disk devices are orders of magnitude higher than they were back
> then. For instance, decreasing the need for disk seeks doesn't provide
> as much benefit when portions of data and indexes are cached in
> random-access memory. Today's hardware resources can, to some extent,
> change where the "bottleneck" is in data retrieval systems.

> I would be surprised if the performance advantage of multidimensional
> systems over relational systems is significant, given current hardware
> resources. The IBM white paper fails to give any quantitative measure
> to show this advantage.

I was surprised they did not give some benchmarks, too. Here is one I
was able to find to quote, from an IBM employee in Nov '05:

"This last weekend, we attained a new high number of users on a U2
database of 15,200 on a single system running an application!

....Principal Consultant of the U2 Lab Services group, performed
this formal benchmark at one of the IBM benchmark centers on an
IBM p590
+ 64 dual core CPUs for a total of 128 processors
+ 124 GB memory (no, that's not a typo - that's all the benchmark center
had available - odd number though)
+ 34 disk drives, striped with JFS2 on two FAStT controllers
....

The database had over 1 billion records in it."

> I'm assuming that both multidimensional and relational systems require
> "tuning" so that they make the best use of the resources available.

There are tuning techniques, but, surprisingly, many sites I know
employ few of them. Some of the biggest performance gains can come from
distributing files wisely across disks (said loosely).

From my perspective, the huge gains are in people performance
(developer productivity). I have worked with teams building software on
U2 and on relational/SQL DBMSs. My anecdotal experience is that the
"Pick" developers developed and maintained software with much higher
productivity.

Because I had believed the claims of relational theory, this experience
prompted me to start researching it myself, and I posted some blog
entries on it earlier this year at www.tincat-group.com/mewsings
(starting with the "Is Codd Dead?" entry).

Given what I have read to date and what I have seen in practice, I
think the industry would be well-served to move more toward the
U2/MV/Pick approach with non-1NF data structures (such as multivalued
attributes), two-valued logic (a point on which many relational
proponents agree), variable length data as the norm, and possibly even
descriptive rather than restrictive/prescriptive schema (with duck
typing rather than strong typing).

Given that I cared about language at the start of this response, I
apologize for that last sentence. Cheers! --dawn
