Re: Time series storage
- From: "Jim Kennedy" <kennedy-downwithspammersfamily@xxxxxxxxx>
- Date: Tue, 18 Oct 2005 07:15:56 -0700
"Juan Carlos" <jgil@xxxxxx> wrote in message
news:1129643558.474622.138670@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Hi database experts!
>
> I have the need to store a very large (hundreds of millions) series of
> scientific observations. Each sample is made of an identifier (an
> integer), a timestamp, a floating point value, and some ancillary data
> (think a few integer fields); the "record" size is fixed at 22 bytes.
>
> The access pattern is:
>
> * All writes are performed from a single place, correspond to new data
> (the timestamp is the current time) and arrive at a pace of some 100
> records per second.
>
> * A few (up to, say, five) clients need to access historical data at
> the same time. The most common query will look like "give me all
> records between these two dates with identifiers 'foo' and 'bar'"
>
> Drivers are storage size, access performance and open source.
>
> As records are fixed-size, queries are sequential in time, and writes
> are to be performed for new data only, my first idea was to implement
> from scratch a fixed-record flat file where each record is an
> observation. But I know that reinventing the wheel is a bad thing, so I
> thought about giving common databases a try. In addition, I would have
> added goodies such as SQL.
>
> I tried MySQL, and it worked nicely. The size overhead is small even
> with indexes over the identifier and timestamp (that's what gave me the
> best performance). But, unfortunately, I learnt that MySQL can not be
> used for non-GPL commercial applications. Too bad.
>
> Then I moved to Postgres, but size overhead is enormous for such small
> size records as a 24 byte header is added to each one.
>
> What is the opinion of you experts out there? Is from scratch
> development a sensible approach (ugh!)? Is there any open source
> product which could fulfill my needs?
>
> Thank you very much in advance!
>
> Juan Carlos---
>
It isn't unusual that indexes take up more space than tables. Indexes have
to contain the data and "pointer" to the data in the table. I am familiar
with a non-opensource database and it has a feature where you can store the
table and the primary key index in the table (one and the same) So this
saves space. I don't understand your space concern. Lets say you are going
to have 500,000,000 rows. Lets say the table and index take up 60 bytes.
That means we need 280 gigs og storage. (500,000,000 X 60 / (1024 X 1024 X
1024)) You can get 300 gig sata drive for like $450. For safety you mirror
the data and buy two for $900. If you need more storage Apple makes a very
nice storage array for like $13,000 for 7 terabytes. So even if you get
the storage needs down to 30 bytes that only saves 140 gigs.
Jim
.
- Follow-Ups:
- Re: Time series storage
- From: Juan Carlos
- Re: Time series storage
- From: wem
- Re: Time series storage
- References:
- Time series storage
- From: Juan Carlos
- Time series storage
- Prev by Date: Time series storage
- Next by Date: Re: How to pair-up/match rows in table? Pair "up/down" or "on/off"
- Previous by thread: Time series storage
- Next by thread: Re: Time series storage
- Index(es):
Relevant Pages
|
Loading