Re: When to use sub-values



it's been quite a while since i poked my head in (hey, jeff!)

....this has been one of my pets, too. i've become comfortable with a
solution, and with weighing the priorities involved, which i've
normalized with the aid of the following:
1) every problem (since i'm a solutions person, i simply call them
"applications") will eventually grow into a database problem
across any discipline and meta-level. therefore, to allow for maximum
growth and without boxing oneself into a corner, one must generalize
from the outset as if each discrete problem is a database problem --
including those broken down to, or built up from, the
micro/macro-levels. the same applies to data problems themselves
(conveniently allowing us common meta-meta application of generalized
solutioning.) whenever this is ignored (i.e. that problems are database
problems), you will find yourself reinventing the database wheel as
part of the application until you end up recreating a mini-database
management system, within a database management system, within a
database management system... within your program logic. yes, you can
do it, and it is always a fun exercise to see how far you can go
without boxing yourself in, or perhaps to discover a new optimum
efficiency for a given solution at a point in time.. but why? especially
once you understand the above.
2) "optimizations" are merely optimizations. if you utilize a solid
methodology going into a scenario, your optimizations are not far
behind. further, if kept in check and on the fringe, they can be readily
backed out as the system/sub-system/sub-sub-system... evolves. the
main reasons to have embedded data structures are data containment,
transactional containment and possible (but not guaranteed)
optimization in data access/retrieval (depending on circumstances;
access can actually get worse as the amount of embedded data
grows.)
3) in problem solving, it's tempting to think "reductionistically" to
the point of excluding modest foresight, and to assume that the first
path understood (while ignoring all others) is the efficient
approach; but this tends to be lazy more than efficient -- and should
not be passed off as any K.I.S.S. paradigm.
3b) to apply the above to the thread's discussion, imagine yourself the
application, the executive program responsible for the data. your job
may be initially to simply add more data elements. okay, you figure
it's just a handful so what the heck, i'll just iterate through the
data elements and search for the controlling attribute, eg. the
sub-SKU, to preserve uniqueness, and then tack the item at the
beginning/end of the controlling value and associated dependent
values/subvalues/sub-subvalues... it's just a handful of code. Then
you figure, why iterate through to my index-point when I can get there
with a LOCATE? Next, you'll find some need to sort the items, so either
you play around with the old bubble-sort or perhaps a more
sophisticated variation or you construct the data so you can use the
PICKBASIC SORT() statement and preserve indexing. then you figure to
simply maintain the data in a presorted sequence using the sort-feature
within LOCATE (depending on the controlling-nesting levels you want to
drop to.) unfortunately, you later determine that you need to sort by
the other dependent fields, too, so we're back to the ol' ad hoc
variety. how sad, that we've butchered the entire RDBMS and
reimplemented our own simulated RDBMS to be able to store/retrieve/sort
unnormalized embedded data structures within data structures...
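a minimal sketch of the progression described above, in Python rather than PICKBASIC. the item layout (parallel lists standing in for associated multivalues) and the names locate, add_sub_sku and sorted_by are assumptions made up for illustration; locate is only a rough analog of a sorted LOCATE, not its actual semantics.

```python
from bisect import bisect_left

# hypothetical item: parallel lists standing in for associated multivalues
item = {"sku": [], "color": [], "price": []}

def locate(values, key):
    """Rough analog of a sorted LOCATE: return (found, position)."""
    pos = bisect_left(values, key)
    return (pos < len(values) and values[pos] == key), pos

def add_sub_sku(item, sku, color, price):
    """Insert a sub-SKU in sorted position, keeping every list aligned."""
    found, pos = locate(item["sku"], sku)
    if found:
        raise ValueError(f"duplicate sub-SKU: {sku}")
    item["sku"].insert(pos, sku)
    item["color"].insert(pos, color)
    item["price"].insert(pos, price)

def sorted_by(item, field):
    """Re-sort by a dependent field: every parallel list must be permuted."""
    order = sorted(range(len(item[field])), key=lambda i: item[field][i])
    return {name: [vals[i] for i in order] for name, vals in item.items()}

add_sub_sku(item, "B20", "blue", 25)
add_sub_sku(item, "A10", "red", 30)
add_sub_sku(item, "C30", "green", 20)
by_price = sorted_by(item, "price")
```

the controlling value stays sorted cheaply, but the moment a dependent field has to drive the order, the application is permuting every associated list by hand, which is the work an RDBMS would otherwise do.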
4) now assume the complexities of today's part-data requirements: assume
some inventory parts from vendors have unique SKUs for the lowest
details of the cartesian product (e.g. for each Style, Material, Finish,
Color, Size, Options, Unit). Other parts may have unique SKUs only
for each Style-Size-Unit grouping; others for each Style-Unit grouping;
others for each Size-Unit, etc. Next, the vendor may require a unique
multi-part key for certain pairings with/without unique sub-SKUs (e.g.
Major Part#, Rocker, Blue, Matte, Large, $25 -- which, when paired
together, define the discrete order to the vendor.) Similar issues exist
for efficiently associating the PRICING to intermixed associations of
DYADs, TRIADs, etc. -- easily & potentially requiring a full cartesian
product, or up to 2^7 (128) attribute groupings in this example of
Style-Material-Finish-Color-Size-Options-Unit.
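one way to read the 2-to-the-7th figure (an interpretation, not something the post spells out) is as the number of distinct groupings of the seven attributes that a SKU or a price could depend on. a short Python sketch that enumerates them; the function name attribute_groupings is hypothetical:

```python
from itertools import combinations

ATTRIBUTES = ["Style", "Material", "Finish", "Color", "Size", "Options", "Unit"]

def attribute_groupings(attrs):
    """Every subset of the attributes, including the empty one."""
    return [combo
            for r in range(len(attrs) + 1)
            for combo in combinations(attrs, r)]

groupings = attribute_groupings(ATTRIBUTES)
dyads = [g for g in groupings if len(g) == 2]
triads = [g for g in groupings if len(g) == 3]
print(len(groupings), len(dyads), len(triads))  # prints: 128 21 35
```

under that reading, Style-Size-Unit and Style-Unit from the examples above are just two of the 128 groupings a data structure may have to key on.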
5) Finally, in determining our priorities, a good variable to help
visualize the problem is shifting our attention to the presentation
side and recognizing the complete underlying data access requirements:
especially when you want to economize on screen real-estate. here's
where the real fun begins... suppose we want to do a web presentation
from a catalog of 100,000s of items with an intuitive UI (and for
which a user is not ordering by SKU). importantly, we do not want to
have to present the user with 2^7 primary variations on the screen, or
even 10, when we can get away with one main one and have various
uniquely-paired drop-down boxes for each attribute (e.g. style,
material, finish, color, size, options, unit, price) under
ultra-sophisticated rules. unfortunately the dependencies between
these attributes will vary by part number. for example, if i buy a
T-shirt and all sizes (S, M, L, XL, XXL) have the same price, then i don't
need unique pricing for them -- therefore not requiring a noisy
screen and only requiring a drop-down for size. but maybe a fur coat
has a premium on XXL, so i either create and present separate major
part items (which i don't want to have to do) -or- i explode the
cartesian out fully for all of Size-Price -or- i explode a partial
cartesian out on: S-M-L-XL-PRICE1 and XXL-PRICE2 (in a UI that
maintains the pairing.) This becomes complex when (on a part-item
basis) certain paired attributes align in relationship while others do
not. For example Finish may be PRICE-independent and certain Sizes are
only allowed for certain Finishes, as simply one example.
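the partial-cartesian idea (S-M-L-XL-PRICE1 and XXL-PRICE2) can be sketched in Python as below. it assumes the size/price pairs are already stored as normalized rows; the prices and the name group_sizes_by_price are invented for illustration only.

```python
def group_sizes_by_price(size_prices):
    """Collapse (size, price) rows into one entry per distinct price,
    preserving the order in which sizes were listed."""
    groups = {}
    for size, price in size_prices:
        groups.setdefault(price, []).append(size)
    return [(sizes, price) for price, sizes in groups.items()]

# illustrative data only
tshirt = [("S", 12), ("M", 12), ("L", 12), ("XL", 12), ("XXL", 12)]
fur_coat = [("S", 900), ("M", 900), ("L", 900), ("XL", 900), ("XXL", 1100)]

tshirt_options = group_sizes_by_price(tshirt)      # one group: a plain size drop-down
fur_coat_options = group_sizes_by_price(fur_coat)  # two groups: the pairing must be shown
```

when the result has a single group, the UI only needs a size drop-down; more than one group means the size-price pairing has to be kept visible. deriving this per part from flat rows is a simple grouping, which is the kind of access the presentation side keeps asking for.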

Understanding the above data manipulation/access requirements otherwise
performed by an RDBMS, it would be easy to become inundated with
additional code complexity, especially should the code also have to
totally reimplement the logic for an RDBMS sub-system intermixed with
the business logic -- and simply to embed nested data-elements to
possibly save a disk I/O? Further, as the number of sub-elements grows,
the disk I/O savings is lost, and access in fact worsens with each
additional data-frame that the item grows by. By focusing on
the solutions for number "5" above, you will be pleased with the
resulting data structures.
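a back-of-the-envelope sketch of the frame argument, in Python. every number here is an assumption chosen for illustration (a 2048-byte frame, 60 bytes per embedded sub-element, one disk read per frame); real frame sizes and caching behavior vary by implementation.

```python
import math

def frames_needed(item_bytes, frame_bytes=2048):
    """Frames an item spans, assuming one disk read per frame (hypothetical model)."""
    return max(1, math.ceil(item_bytes / frame_bytes))

BYTES_PER_SUB_ELEMENT = 60  # assumed average size of one embedded sub-element

small_item = frames_needed(10 * BYTES_PER_SUB_ELEMENT)
large_item = frames_needed(500 * BYTES_PER_SUB_ELEMENT)
```

under these assumptions an item with 10 embedded sub-elements fits in one frame, while one with 500 spans 15, so reading a single sub-element drags in 15 frames where a separately keyed record would have cost about one.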

Regards! -dave
