Re: Delete row, ridiculous performance
- From: James Tursa <aclassyguywithaknotac@xxxxxxxxxxx>
- Date: Wed, 29 Oct 2008 08:11:10 GMT
On Fri, 24 Oct 2008 07:46:10 -0700 (PDT), pbocchini@xxxxxxxxx wrote:
Hi, I was trying to speed up a code and I realized that a huge amount-------------------------------------------------------------------------------
of time was spent by trivial operations like removing a row. I made a
very simple test and the following lines come from the profiler:
< 0.01 1 1 nrows=30000;
< 0.01 1 2 ncols=3;
3
< 0.01 1 4 g=rand(nrows,ncols);
< 0.01 1 5 g2=g;
6
< 0.01 1 7 for j=1:nrows-1
12.44 29999 8 g1=g(j+1:end,:);
12.33 29999 9 g2(1,:)=[];
0.22 29999 10 end
As you can see, I tried both redefining the matrix g1 every time or
deleting a row (the effect is the same), and in both cases, it takes
way too much.
I can't believe there isn't a fast way to get this. At ewery iteration
of a loop, I need to delete the first row of a matrix. Shouldn't it be
easy and fast?
Thanks in advance for every suggestion!
Paolo
Here are some suggestions, with varying degrees of success depending
on how desperate you are and how flexible you are in changing your
data structure.
-------------------------------------------------------------------------------
Option 1) Your method:
..
nrows = 30000;
ncols = 3;
g = rand(nrows,ncols);
tic
g2 = g;
for j = 1:nrows-1
g2(1,:) = [];
end
toc
..
Elapsed time is 14.755937 seconds.
-------------------------------------------------------------------------------
Option 2) Do the row deletion inside a c-mex routine:
..
nrows = 30000;
ncols = 3;
g = rand(nrows,ncols);
tic
g2 = g;
for j = 1:nrows-1
g2 = deletefirstrow2(g2);
end
toc
..
Call this file deletefirstrow2.c:
#include <string.h>
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[])
{
mwSize m, n, i, m1, nbytes;
double *source, *target;
m = mxGetM(prhs[0]);
n = mxGetN(prhs[0]);
m1 = m - 1;
plhs[0] = mxCreateDoubleMatrix(m1, n, mxREAL);
target = mxGetPr(plhs[0]);
source = mxGetPr(prhs[0]);
nbytes = m1 * sizeof(double);
for( i=0; i<n; i++ ) {
memcpy(target, source, nbytes);
target += m1;
source += m;
}
}
..
Elapsed time is 12.553881 seconds. (17% faster)
-------------------------------------------------------------------------------
Option 3) Do the row deletion inside a c-mex routine in-place. This
option requires that you make sure that g memory is not shared with
any other variable. i.e., you can't build g with a line like g = x,
and you can't create other variables with lines like x = g. This mex
routine works by moving memory inside the data area of the input
variable, which is a bit risky unless you are very careful not to
share memory. No new memory allocation is done. At the end of your
m-file loop you will only have one row left for g even though the
memory is still allocated for the full amount. This will not be a
problem for the memory manager, since all memory for g will get
properly cleared even though it is larger than the size indicates.
..
nrows = 30000;
ncols = 3;
g = rand(nrows,ncols);
tic
for j = 1:nrows-1
deletefirstrow3(g);
end
toc
..
Call this file deletefirstrow3.c:
#include <string.h>
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[])
{
mwSize m, n, i, m1, nbytes;
double *source, *target;
m = mxGetM(prhs[0]);
n = mxGetN(prhs[0]);
m1 = m - 1;
target = mxGetPr(prhs[0]);
source = target + 1;
nbytes = m1 * sizeof(double);
for( i=0; i<n; i++ ) {
memmove(target, source, nbytes);
target += m1;
source += m;
}
mxSetM(prhs[0],m1);
}
..
Elapsed time is 3.658094 seconds. (303% faster)
-------------------------------------------------------------------------------
Option 4) Completely reformulate your problem. Instead of working with
a matrix of rows and deleting rows, restructure your data so that you
are working with a matrix of columns and deleting columns. And instead
of deleting the first column each time, restructure your data so you
are deleting the last column each time. Since MATLAB matrix memory is
column based, this will allow for a very efficient algorithm.
..
nrows = 3;
ncols = 30000;
g = rand(nrows,ncols);
tic
g2 = g;
for j = 1:ncols-1
deletelastcolumn(g2);
end
toc
..
Call this file deletelastcolumn.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[])
{
mwSize n;
if( n = mxGetN(prhs[0]) )
mxSetN(prhs[0], n - 1);
}
..
Elapsed time is 0.050578 seconds. (29054% faster)
..
Yes, you read that time correctly. Only 0.05 seconds! This dramatic
improvement in speed works because there is absolutely no data copying
involved. You simply change the size of the variable to virtually
delete the last column without actually changing or reallocating any
memory. And the beauty of this last solution is that it is completely
safe, even if the input variable is sharing memory with another
variable. So this is an example of how you can get a dramatic
improvement by first changing your data structure to allow for
efficient programming. It would be nice if you could get this effect
at the MATLAB command line like this: g(:,end) = [], but unfortunately
this unnecessarily forces a copy of the data, so you have to resort to
a c-mex routine to get this efficiency.
..
James Tursa
..
Caveat: I was lazy and did not put in all the usual input variable
checks in the c-mex routine (checking for double, checking dimensions,
etc.). It would be good to add these checks.
..
Instructions for generating the c-mex routines:
Save the source code somewhere on the MATLAB path under the
appropriate name and then make that directory your current directory.
Then issue the following commands:
(then select any C compiler, e.g. the built-in lcc compiler)mex -setup
..mex filename.c
.
- References:
- Delete row, ridiculous performance
- From: pbocchini
- Delete row, ridiculous performance
- Prev by Date: Re: two matlab's give different values
- Next by Date: Math Problem
- Previous by thread: Re: Delete row, ridiculous performance
- Next by thread: Multiple inputs with a delta 44 sound card?
- Index(es):
Relevant Pages
|
Loading