Layout for HDF5 files
- From: happysegfault@xxxxxxxxx
- Date: 3 Aug 2005 18:01:22 -0700
Hi group,
I'm looking to build a database of numerical data, and I'm evaluating
HDF5 (most likely via PyTables) to store it. The source data is
currently collected in separate binary files, each holding between zero
and ~1000 events. Each event is described by x, y coordinates, a
magnitude, and a detector channel ID (possibly stored as an 8-bit int).
Each file will be complete before it is added to my database, and each
file has a unique ~15-character string that identifies the measurement.
I might end up with about a million measurements, each with around four
detector channels, each channel with zero to ~1000 events.
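To get a feel for the scale, here is a back-of-the-envelope calculation.
It assumes every channel reaches the ~1000-event upper bound and that
each event takes three float32 values plus one uint8 channel ID. Both
are assumptions, and the constant names are made up for illustration,
so treat the result as a ceiling rather than an estimate.

```python
# Rough upper bound on the raw event payload (no measurement IDs, no compression).
# Assumptions: every channel holds the maximum ~1000 events, and each event is
# three float32 values (x, y, magnitude) plus one uint8 channel ID.
import struct

MEASUREMENTS = 1_000_000
CHANNELS_PER_MEASUREMENT = 4
MAX_EVENTS_PER_CHANNEL = 1_000

# "<" disables padding, so this is the packed size: 3 * 4 + 1 = 13 bytes.
BYTES_PER_EVENT = struct.calcsize("<fffB")

max_events = MEASUREMENTS * CHANNELS_PER_MEASUREMENT * MAX_EVENTS_PER_CHANNEL
max_bytes = max_events * BYTES_PER_EVENT

print(f"at most {max_events:.1e} events, {max_bytes / 1e9:.0f} GB uncompressed")
```

Since channels can hold as few as zero events, the real total could be
far smaller than this bound.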
Can anyone offer some advice on how I should structure this data in a
single HDF5 file? I would like to be able to efficiently retrieve all
events in a certain magnitude range across a large set of measurements.
Here are some alternative ideas on how to lay out my HDF5 file.
1: Put all the events into one group of datasets, with the x, y,
magnitude in one float32 dataset with ~1E6 rows and 3 columns, the
detector channels in a 1D uint8 dataset with ~1E6 rows, and the
measurement IDs in a 1D string dataset. I would want to create an
index for fast lookup of rows matching a measurement ID. The drawback
here is that I would store the same ~15-character measurement ID string
for every event in that measurement. Maybe compression filters would
make that bearable. The datasets would need to be enlargeable so that
measurements can be added as time goes by.
2: Create a group for each measurement, and put the data from each
measurement in datasets under its group. I'm unsure if this will be
efficient when I'm trying to query the database. With this approach,
all the datasets can be fixed-size.
3: Same as 2, but arrange the groups into a binary search tree to speed
up queries.
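For concreteness, options 1 and 2 might look roughly like this in
PyTables. This is only a sketch under assumptions: it uses the current
snake_case PyTables API (open_file, create_earray, create_array), the
file name, node names, helper functions, and sample values are all made
up for illustration, and the magnitude selection is a plain NumPy mask
rather than an indexed query.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical sample data: measurement ID -> (events [n, 3] float32, channels [n] uint8).
SAMPLE = {
    "meas_0000000001": (
        np.array([[0.0, 1.0, 2.5], [3.0, 4.0, 7.5]], dtype=np.float32),
        np.array([0, 1], dtype=np.uint8),
    ),
    "meas_0000000002": (
        np.array([[5.0, 6.0, 3.5]], dtype=np.float32),
        np.array([2], dtype=np.uint8),
    ),
}


def write_option1(h5, sample):
    """Option 1: three parallel, enlargeable datasets with one row per event."""
    g = h5.create_group("/", "flat")
    xym = h5.create_earray(g, "xym", atom=tables.Float32Atom(), shape=(0, 3))
    chan = h5.create_earray(g, "channel", atom=tables.UInt8Atom(), shape=(0,))
    mid = h5.create_earray(
        g, "measurement_id", atom=tables.StringAtom(itemsize=15), shape=(0,)
    )
    for name, (events, channels) in sample.items():
        xym.append(events)
        chan.append(channels)
        # The ID is repeated once per event, which is the drawback noted above.
        mid.append(np.full(len(events), name.encode(), dtype="S15"))


def query_option1(h5, lo, hi):
    """Return (measurement_id, magnitude) for events with lo <= magnitude < hi."""
    g = h5.root.flat
    mag = g.xym[:, 2]  # reads the whole magnitude column into memory
    mask = (mag >= lo) & (mag < hi)
    ids = g.measurement_id[:][mask]
    return [(i.decode(), float(m)) for i, m in zip(ids, mag[mask])]


def write_option2(h5, sample):
    """Option 2: one group per measurement holding fixed-size datasets."""
    top = h5.create_group("/", "by_measurement")
    for name, (events, channels) in sample.items():
        g = h5.create_group(top, name)
        h5.create_array(g, "xym", obj=events)
        h5.create_array(g, "channel", obj=channels)


def query_option2(h5, lo, hi):
    """Same selection, but every measurement group has to be visited."""
    hits = []
    for g in h5.iter_nodes("/by_measurement", classname="Group"):
        mag = g.xym[:, 2]
        for m in mag[(mag >= lo) & (mag < hi)]:
            hits.append((g._v_name, float(m)))
    return hits


path = os.path.join(tempfile.mkdtemp(), "events_sketch.h5")
with tables.open_file(path, mode="w") as h5:
    write_option1(h5, SAMPLE)
    write_option2(h5, SAMPLE)
    flat_hits = query_option1(h5, 2.0, 4.0)
    grouped_hits = query_option2(h5, 2.0, 4.0)

# Both layouts should select the same events.
print(sorted(flat_hits) == sorted(grouped_hits), sorted(flat_hits))
```

In the option 1 layout the query reads a few large datasets, while in
the option 2 layout it has to open every measurement group, which is
the efficiency concern raised above.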
Does anyone have any wisdom to share?
Thanks,
John
[Please reply to the group, so everyone can learn]