Re: Time series storage




"Juan Carlos" <jgil@xxxxxx> wrote in message
news:1129643558.474622.138670@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Hi database experts!
>
> I have the need to store a very large (hundreds of millions) series of
> scientific observations. Each sample is made of an identifier (an
> integer), a timestamp, a floating point value, and some ancillary data
> (think a few integer fields); the "record" size is fixed at 22 bytes.
>
> The access pattern is:
>
> * All writes are performed from a single place, correspond to new data
> (the timestamp is the current time) and arrive at a pace of some 100
> records per second.
>
> * A few (up to, say, five) clients need to access historical data at
> the same time. The most common query will look like "give me all
> records between these two dates with identifiers 'foo' and 'bar'"
>
> Drivers are storage size, access performance and open source.
>
> As records are fixed-size, queries are sequential in time, and writes
> are to be performed for new data only, my first idea was to implement
> from scratch a fixed-record flat file where each record is an
> observation. But I know that reinventing the wheel is a bad thing, so I
> thought about giving common databases a try. In addition, I would have
> added goodies such as SQL.
>
> I tried MySQL, and it worked nicely. The size overhead is small even
> with indexes over the identifier and timestamp (that's what gave me the
> best performance). But, unfortunately, I learnt that MySQL can not be
> used for non-GPL commercial applications. Too bad.
>
> Then I moved to Postgres, but size overhead is enormous for such small
> size records as a 24 byte header is added to each one.
>
> What is the opinion of you experts out there? Is from scratch
> development a sensible approach (ugh!)? Is there any open source
> product which could fulfill my needs?
>
> Thank you very much in advance!
>
> Juan Carlos---
>

It isn't unusual that indexes take up more space than tables. Indexes have
to contain the data and "pointer" to the data in the table. I am familiar
with a non-opensource database and it has a feature where you can store the
table and the primary key index in the table (one and the same) So this
saves space. I don't understand your space concern. Lets say you are going
to have 500,000,000 rows. Lets say the table and index take up 60 bytes.
That means we need 280 gigs og storage. (500,000,000 X 60 / (1024 X 1024 X
1024)) You can get 300 gig sata drive for like $450. For safety you mirror
the data and buy two for $900. If you need more storage Apple makes a very
nice storage array for like $13,000 for 7 terabytes. So even if you get
the storage needs down to 30 bytes that only saves 140 gigs.
Jim


.



Relevant Pages

  • Re: Problems with querying date field
    ... >> If you want to put a database somewhere for safekeeping, ... database - SQL isn't about persistence, ... says nothing about the physical storage media to be used. ... store data on a tape, or a disk, or a CD, or a file, or any physical ...
    (microsoft.public.sqlserver.programming)
  • Re: Time series storage
    ... scientific observations. ... Each sample is made of an identifier (an ... with a non-opensource database and it has a feature where you can store the ... nice storage array for like $13,000 for 7 terabytes. ...
    (comp.databases)
  • Re: Information store size considerations during an Exchange 5.5 to 2003 migration
    ... Target store will be approximately the same in size because single instancing of storage is maintained. ... Full-text index will add 8-10% to the size of database. ... You also will be able to model different splits of mailboxes by SGs and DBs. ...
    (microsoft.public.exchange.admin)
  • Re: Compression Tool
    ... Our experience has been that on balance, storing "things" in a sql server database does turn out OK. ... Obviously you have to temper this with the actual usage but often the primary correspondent in these threads declines to explain this in any detail so off you go on a specific tangent that supports document storage only to find it is an image-only repository. ... NTFS is a horribly inefficient way to store a million jpgs with just the folder/naming nomenclature a significant problem. ... We have worked with SharePoint for several years now and I am constantly impressed with the way it is able to deal with large quantities of files and sites. ...
    (microsoft.public.sqlserver.server)
  • Re: Exchange 2003 Transaction Logs Growing Too Quickly
    ... The version store is used by Extensible Storage Engine to track and to ... changes that are made to the database. ... Therefore, if a transaction is open for a long time, the transaction ...
    (microsoft.public.exchange.admin)

Loading