Re: Large scale processing - language/toolkit choice



On 28 Jun, 17:56, "vice" <team.f...@xxxxxxxxx> wrote:
Hello there

I am currently involved in a project analysing vast amounts of raw
acoustic data (~10TB).

Some background..
Essentially this data is from a hydrophone array, and was used as part of
a particle physics experiment. Out of curiosity, we plan to run analysis
over the files and try to isolate bioacoustic sources in the data.

What does 'isolate' mean? If this has anything to do with
Collins & Kuperman's "focalization" 'technique' - walk away
from the project. Now. It's in your best interest.

What I would like to know is recommendations for
software/language+libraries for analysing these files quickly and
efficiently.  

Computing power is not really limited: I have access to a 50-core Linux
cluster on which I can run batch jobs. But with such large amounts of
data, efficiency is a priority.

I have been playing around in matlab, but seeing as I don't have much
experience with it, and given that it's an interpreted, not compiled, system,
I wondered if there were good, free libraries for C, C++, Python, Java etc.
and if they would be better than matlab for this large job.

There is a trade-off, which is based on

1) If this is a once-only project or a repeated exercise
2) How much overall time you have to get the job done
3) How important real-time capacities are
4) How much time you want to spend programming
5) What previous programming experience you have

Items 1 and 2 first: If this is a once-only project and you
only have a few weeks or months to analyze the data, forget
about programming and use matlab.

Set up a sequence of batch jobs and let them run overnight,
over weekends and so on.
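
The bookkeeping for such batch jobs is the same in any language. As a
minimal sketch (in Python, standard library only), the files can be dealt
out into one list per job. The function names, the batch_NNN.txt naming
and the directory paths are hypothetical, and how each list is then handed
to a job depends on the cluster's scheduler, which is not specified here.

```python
from pathlib import Path


def split_into_batches(files, n_jobs):
    """Deal files round-robin into at most n_jobs lists of near-equal length."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    ordered = sorted(files)  # fixed order, so reruns give the same split
    batches = [ordered[i::n_jobs] for i in range(n_jobs)]
    return [b for b in batches if b]  # drop empty batches when files < n_jobs


def write_batch_lists(batches, out_dir):
    """Write one text file per batch, listing the data files it covers."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for k, batch in enumerate(batches):
        list_path = out_dir / f"batch_{k:03d}.txt"  # hypothetical naming
        list_path.write_text("\n".join(str(f) for f in batch) + "\n")
        paths.append(list_path)
    return paths
```

For a 50-core cluster one might call
split_into_batches(Path("/data/hydrophone").glob("*.dat"), 50), where the
directory and the .dat extension are placeholders, and then start one
analysis job per list file. Round-robin assignment balances the number of
files per job, not their sizes, so it fits best when the files are of
similar length.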

If you want real-time capacity, that is, a quick result whenever
you spot an interesting segment of data, then matlab is more than
fast enough: it will produce a result within a couple of minutes
at most, provided the analysis is well-behaved.

If your priority is to analyze data, and not programming,
then use matlab, as you will get working programs quickly.

If you don't have previous programming experience, use
matlab, since the learning curve is gentle and you get
working programs quickly.

If you want to go with C++, the two main issues are
numerical libraries and training. There are no easy-to-use
link-straight-in libraries for linear algebra that I know
of. You *can* try LAPACK, but it is first of all a couple
of decades old, and it was written for FORTRAN. The other issue
with C++ is that it is difficult to use in these applications.
It takes years of training and experience to get C++ to
run fast.

C is pretty much the same as C++; maybe a bit
easier to make run fast, but still demanding to learn.

Java is slower than C and C++, as it runs (used to run) on a
virtual machine.

Python - maybe. It is supposed to have
some numerical capacity, but I have no experience with it.

Fortran - forget it. Lots of numerics, but cumbersome to
use for the infrastructure that surrounds the computations
at the core.

Rune