Re: Delete row, ridiculous performance



On Fri, 24 Oct 2008 07:46:10 -0700 (PDT), pbocchini@xxxxxxxxx wrote:

Hi, I was trying to speed up a code and I realized that a huge amount
of time was spent by trivial operations like removing a row. I made a
very simple test and the following lines come from the profiler:

< 0.01 1 1 nrows=30000;
< 0.01 1 2 ncols=3;
3
< 0.01 1 4 g=rand(nrows,ncols);
< 0.01 1 5 g2=g;
6
< 0.01 1 7 for j=1:nrows-1
12.44 29999 8 g1=g(j+1:end,:);
12.33 29999 9 g2(1,:)=[];
0.22 29999 10 end

As you can see, I tried both redefining the matrix g1 every time or
deleting a row (the effect is the same), and in both cases, it takes
way too much.

I can't believe there isn't a fast way to get this. At ewery iteration
of a loop, I need to delete the first row of a matrix. Shouldn't it be
easy and fast?

Thanks in advance for every suggestion!
Paolo
-------------------------------------------------------------------------------
Here are some suggestions, with varying degrees of success depending
on how desperate you are and how flexible you are in changing your
data structure.
-------------------------------------------------------------------------------
Option 1) Your method:
..
nrows = 30000;
ncols = 3;
g = rand(nrows,ncols);
tic
g2 = g;
for j = 1:nrows-1
g2(1,:) = [];
end
toc
..
Elapsed time is 14.755937 seconds.
-------------------------------------------------------------------------------
Option 2) Do the row deletion inside a c-mex routine:
..
nrows = 30000;
ncols = 3;
g = rand(nrows,ncols);
tic
g2 = g;
for j = 1:nrows-1
g2 = deletefirstrow2(g2);
end
toc
..
Call this file deletefirstrow2.c:
#include <string.h>
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[])
{
mwSize m, n, i, m1, nbytes;
double *source, *target;

m = mxGetM(prhs[0]);
n = mxGetN(prhs[0]);
m1 = m - 1;
plhs[0] = mxCreateDoubleMatrix(m1, n, mxREAL);
target = mxGetPr(plhs[0]);
source = mxGetPr(prhs[0]);
nbytes = m1 * sizeof(double);
for( i=0; i<n; i++ ) {
memcpy(target, source, nbytes);
target += m1;
source += m;
}
}
..
Elapsed time is 12.553881 seconds. (17% faster)
-------------------------------------------------------------------------------
Option 3) Do the row deletion inside a c-mex routine in-place. This
option requires that you make sure that g memory is not shared with
any other variable. i.e., you can't build g with a line like g = x,
and you can't create other variables with lines like x = g. This mex
routine works by moving memory inside the data area of the input
variable, which is a bit risky unless you are very careful not to
share memory. No new memory allocation is done. At the end of your
m-file loop you will only have one row left for g even though the
memory is still allocated for the full amount. This will not be a
problem for the memory manager, since all memory for g will get
properly cleared even though it is larger than the size indicates.
..
nrows = 30000;
ncols = 3;
g = rand(nrows,ncols);
tic
for j = 1:nrows-1
deletefirstrow3(g);
end
toc
..
Call this file deletefirstrow3.c:
#include <string.h>
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[])
{
mwSize m, n, i, m1, nbytes;
double *source, *target;

m = mxGetM(prhs[0]);
n = mxGetN(prhs[0]);
m1 = m - 1;
target = mxGetPr(prhs[0]);
source = target + 1;
nbytes = m1 * sizeof(double);
for( i=0; i<n; i++ ) {
memmove(target, source, nbytes);
target += m1;
source += m;
}
mxSetM(prhs[0],m1);
}
..
Elapsed time is 3.658094 seconds. (303% faster)
-------------------------------------------------------------------------------
Option 4) Completely reformulate your problem. Instead of working with
a matrix of rows and deleting rows, restructure your data so that you
are working with a matrix of columns and deleting columns. And instead
of deleting the first column each time, restructure your data so you
are deleting the last column each time. Since MATLAB matrix memory is
column based, this will allow for a very efficient algorithm.
..
nrows = 3;
ncols = 30000;
g = rand(nrows,ncols);
tic
g2 = g;
for j = 1:ncols-1
deletelastcolumn(g2);
end
toc
..
Call this file deletelastcolumn.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[])
{
mwSize n;
if( n = mxGetN(prhs[0]) )
mxSetN(prhs[0], n - 1);
}
..
Elapsed time is 0.050578 seconds. (29054% faster)
..
Yes, you read that time correctly. Only 0.05 seconds! This dramatic
improvement in speed works because there is absolutely no data copying
involved. You simply change the size of the variable to virtually
delete the last column without actually changing or reallocating any
memory. And the beauty of this last solution is that it is completely
safe, even if the input variable is sharing memory with another
variable. So this is an example of how you can get a dramatic
improvement by first changing your data structure to allow for
efficient programming. It would be nice if you could get this effect
at the MATLAB command line like this: g(:,end) = [], but unfortunately
this unnecessarily forces a copy of the data, so you have to resort to
a c-mex routine to get this efficiency.
..
James Tursa
..
Caveat: I was lazy and did not put in all the usual input variable
checks in the c-mex routine (checking for double, checking dimensions,
etc.). It would be good to add these checks.
..
Instructions for generating the c-mex routines:
Save the source code somewhere on the MATLAB path under the
appropriate name and then make that directory your current directory.
Then issue the following commands:
mex -setup
(then select any C compiler, e.g. the built-in lcc compiler)
mex filename.c
..
.



Relevant Pages

  • Re: Castles plans
    ... memory stick in under a second. ... both RISC OS USB systems are rather slower than they ought to be ... the memory sticks were DOS format, so I suspect that a lot of the speed ... This might also explain the extremely slow speed deleting files. ...
    (comp.sys.acorn.misc)
  • Re: USB memory sticks
    ... All that you delete is the reference in the file allocation ... way of deleting data, especially where you are only deleting it because you ... forensics can pick up residual charge). ... Given the amount of time that it takes to delete a 4Gb memory stick I have ...
    (microsoft.public.windowsxp.basics)
  • Re: dll troubles
    ... you have a potential memory corruption problem, ... I suspect you are deleting memory twice, ... > win32 dll) and I have encountered huge problems with it. ...
    (comp.lang.cpp)
  • Re: deleting this
    ... > I was just toying around idea of deleting this from a member function. ... the memory is not necessarilly released to the OS. ... happen if you try to do something with the deleted object. ...
    (comp.lang.cpp)
  • mex leaks memory?
    ... Im trying to free a 2D array however when I check memory utilization using KDE System Guard on my machine matlab doesn't free up all the memory.Any suggestions? ... Here's the code Im using.I used the print statements to get more time to watch the memory utilization.I used this test program cos Im having problem with a much bigger program and I was trying to isolate the memory leak ... int test() ... int nrhs, const mxArray *prhs) ...
    (comp.soft-sys.matlab)

Loading