Re: Matlab Vectorisation Speed - How is it done in c++?
- From: sturlamolden <sturlamolden@xxxxxxxx>
- Date: Sun, 16 Dec 2007 17:36:05 -0800 (PST)
On 17 Des, 01:05, Phil Winder <philipwin...@xxxxxxxxxxxxxx> wrote:
Im currently porting some matlab algorithms to c++ code. The test
code I have is testing the vector math capabilities and how fast they
can go. I have found that it can be very, very fast and I am
strugglling to reproduce the speed in c++. How does matlab do it? And
how can it be reproduced in c++?
Beating the performance of vectorized Matlab code is very hard, and
usually not worth the effort.
Matlab makes calls to optimized C and Fortran libraries such as blas/
atlas, lapack and fftw. You cannot duplicate their efficacies in C++
for at least two reasons:
1. There are issues related to the language syntax that makes Fortran
particularly easy to optimize for compilers, such as lack of pointer
aliasing. This is particularly important for optimal allocation of
registers when the CPU goes into a tight loop.
2. A lot of effort have been put into making these libraries as fast
as possible. This includes optimal use of cache and branch prediction.
Duplicating these efforts on your own is going to take the rest of
your life to complete.
My advice would be this:
If you want speed in your C++ app, link and call the same libraries as
Matlab do. Most of them are available for free. Give the C++ compiler
pointer aliasing hints wherever possible.
In addition:
Use optimization level 3 on numerical code and level 2 on non-
numerical code. Process the data in chunks that fit in your L1 cache.
Force the CPU to prefetch if you know it will help. Page-align your
arrays in RAM. Memory access is terribly slow, traverse as few times
as possible. Never use strided memory access. Manually unroll tight
loops. Exploit arithmetic pipelining of four subsequent operations.
Avoid divisions, transform to a multiplcation. Exploit multiple CPUs:
use MPI or OpenMP, forkjoin with labour-sharing threadpools, etc. Use
inline assembly to access SIMD parallel registers.
Also remember Hoare's statement about optimization, quoted by D.
Knuth: "Premature optimization is the root of all evil in computer
programming." Profile your code. Direct your optimizations to the
important bottlenecks. They are likely to be few. Do as much as you
can with the bottlenecks, and never mind the reminding 90% of your
code.
But before you begin: ask yourself if the hard work is goint to be
worth the effort.
.
- Follow-Ups:
- Re: Matlab Vectorisation Speed - How is it done in c++?
- From: Tim Davis
- Re: Matlab Vectorisation Speed - How is it done in c++?
- From: Phil Winder
- Re: Matlab Vectorisation Speed - How is it done in c++?
- References:
- Matlab Vectorisation Speed - How is it done in c++?
- From: Phil Winder
- Matlab Vectorisation Speed - How is it done in c++?
- Prev by Date: Re: Row normalize a matrix
- Next by Date: linprog, frontcon and portopt error
- Previous by thread: Matlab Vectorisation Speed - How is it done in c++?
- Next by thread: Re: Matlab Vectorisation Speed - How is it done in c++?
- Index(es):
Relevant Pages
|