Re: Large scale processing - language/toolkit choice
- From: Rune Allnor <allnor@xxxxxxxxxxxx>
- Date: Sun, 28 Jun 2009 10:33:48 -0700 (PDT)
On 28 Jun, 17:56, "vice" <team.f...@xxxxxxxxx> wrote:
Hello there
I am currently involved in a project analysing vast amounts of raw
acoustic data (~10TB).
Some background..
Essentially this data is from a hydrophone array, and was used as part of
a particle physics experiment. Out of curiosity, we plan to run analysis
over the files and try to isolate bioacoustic sources in the data.
What does 'isolate' mean? If this has anything to do with
Collins & Kuperman's "focalization" 'technique' - walk away
from the project. Now. It's in your best interest.
What I would like to know is recommendations for
software/language+libraries for analysing these files quickly and
efficiently.
Computing power is not really limited; I have access to a 50-core Linux
cluster on which I can run batch jobs, but having such large amounts of
data, efficiency is a priority.
I have been playing around in matlab, but seeing as I don't have much
experience with it, and the fact it's an interpreted, not compiled system, I
wondered if there were good, free libraries for C, C++, Python, Java etc. and
if they would be better than matlab for this large job.
There is a trade-off, which is based on
1) If this is a once-only project or a repeated exercise
2) How much over-all time you have to get the job done
3) How important real-time capacities are
4) How much time you want to spend programming
5) What previous programming experience you have
Items 1 and 2 first: If this is a once-only project and you
only have a few weeks or months to analyze the data, forget
about programming and use matlab.
Set up a sequence of batch jobs and let them run overnight,
over weekends and so on.
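
A minimal sketch of the batch-job idea above, written in Python
purely for illustration (the same split can be scripted in matlab
or in a shell). The function names, the file names and the
20 MB/s per-job throughput are assumptions, not measurements;
only the 10 TB and the 50 cores come from the question.

```python
def partition(files, n_jobs):
    """Split a list of files into n_jobs lists, round-robin,
    so that each batch job gets a similar share of the work."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [files[i::n_jobs] for i in range(n_jobs)]


def wall_clock_hours(total_bytes, n_jobs, bytes_per_second_per_job):
    """Rough wall-clock estimate, assuming the jobs run in parallel
    and each sustains the given throughput (an assumption)."""
    return total_bytes / (n_jobs * bytes_per_second_per_job) / 3600.0


if __name__ == "__main__":
    # Hypothetical file names; replace with the real recordings.
    files = [f"rec_{i:04d}.raw" for i in range(10)]
    for job_id, job_files in enumerate(partition(files, 3)):
        print(job_id, job_files)

    # 10 TB over 50 jobs at an assumed 20 MB/s per job.
    print(wall_clock_hours(10e12, 50, 20e6), "hours")
```

The estimate is only as good as the assumed throughput, so time
one job on a small subset of the files before trusting it.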
If you want real-time capacity, that is, you spot an interesting
segment of data and want a result right away, then matlab is
more than fast enough to produce one within a couple of minutes
at most - provided you do well-behaved analysis.
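
One reading of "well-behaved" here is that the analysis walks
through the file in fixed-size blocks instead of loading it whole.
A rough sketch under stated assumptions: the samples are taken to
be raw 16-bit signed integers in the machine's native byte order,
and per-block RMS is used only as a stand-in for whatever measure
flags an interesting segment. Neither detail is given in the
question, and block_rms is a hypothetical name.

```python
import array
import math


def block_rms(path, block_samples=65536):
    """Yield (block_index, rms) for consecutive blocks of samples.

    Assumes raw 16-bit signed samples in native byte order.
    Only one block is held in memory at a time.
    """
    itemsize = array.array("h").itemsize
    with open(path, "rb") as f:
        index = 0
        while True:
            raw = f.read(block_samples * itemsize)
            # Drop a trailing partial sample, if the file has one.
            raw = raw[: len(raw) - len(raw) % itemsize]
            if not raw:
                break
            samples = array.array("h")
            samples.frombytes(raw)
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            yield index, rms
            index += 1
```

Blocks whose RMS stands out from the rest are candidates for a
closer look; the block size is a tuning knob, not a recommendation.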
If your priority is to analyze data, and not programming,
then use matlab, as you will get working programs quickly.
If you don't have previous programming experience, use
matlab, since the learning curve is flat and you get
working programs quickly.
If you want to go with C++, the two main issues are
numerical libraries and training. There are no easy-to-use
link-straight-in libraries for linear algebra that I know
of. You *can* try LAPACK, but it is first of all a couple
of decades old, and was written for FORTRAN. The other issue
with C++ is that it is difficult to use in these applications.
It takes years of training and experience to get C++ to
run fast. C is pretty much the same as C++; maybe a bit
easier to make run fast, but still demanding to learn.
Java is slower than C and C++, as it runs (used to run) on a
virtual machine. Python - maybe. It is supposed to have
some numerical capacity, but I have no experience with it.
Fortran - forget it. Lots of numerics, but cumbersome to
use for the infrastructure that surrounds the computations
at the core.
Rune