Re: Using 'Save' with DC Toolbox



Alex,

Is it possible that your script (below) never gets to the
'save' command? The reason I ask is that a job has no
OutputArguments property (it is a property of each task),
and in the line before you call 'save' you say

output = get(jobz{i}, 'OutputArguments');

This will throw an error, so 'save' is never reached.
What I think you intended to call here is

output = getAllOutputArguments(jobz{i});

Also, note that there is no destroy method of a scheduler,
so the call at the bottom to

destroy(abc);

will also error, and isn't needed.
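
For what it's worth, here is a rough sketch of how the polling
loop might look with those changes. It assumes jobz and cellz
exist exactly as in your script and that every job has already
been submitted. The try/catch around 'save', the check for a
'failed' state, and the 10-second pause are my own additions,
not something your script or the toolbox requires, so treat
them as suggestions only.

```matlab
% Sketch only: jobz and cellz are taken from the original script.
flagz = false(1, length(cellz));

while ~all(flagz)
    for i = 1:length(cellz)
        if flagz(i)
            continue   % this job has already been handled
        end
        etat = get(jobz{i}, 'State');
        if strcmp(etat, 'finished')
            % Cell array: one row per task, one column per output
            output = getAllOutputArguments(jobz{i});
            % cellz(i) equals 10+(i-1)*5, as in the original file name
            fname = ['Flex_out_101407_', num2str(cellz(i)), '.mat'];
            try
                save(fname, 'output');
                destroy(jobz{i});   % remove job data only after a good save
            catch
                disp(['Could not save ', fname, ': ', lasterr]);
            end
            flagz(i) = true;
        elseif strcmp(etat, 'failed')
            % Assumption: give up on failed jobs so the loop can end
            disp(['Job ', num2str(i), ' failed; skipping it.']);
            flagz(i) = true;
        end
    end
    pause(10);   % arbitrary interval; avoids polling in a tight loop
end
```

Destroying each job only after its results are on disk means a
failed 'save' leaves the job's data with the scheduler, so you
can fetch it again later.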

Hope this helps

Jos

"Alex Kloth" <akloth@xxxxxxxxx> wrote in message
<ff080c$1cm$1@xxxxxxxxxxxxxxxxxx>...
I'm currently running large-scale neural network simulations,
and I'm using distributed computing to shorten my computation
time.

However, I'd also like to get subsets of my results as they
finish. I was originally running a simple distributed computing
script to generate the results from a variety of data sets. The
variations in the data sets consisted primarily of increasing
network sizes; as the network size increases, the computation
time scales considerably, as my code includes several matrix
multiplications and transversions. This made it difficult for
me to get the results for the small networks while the larger
networks were still running.

So, I created a program that would monitor several DC jobs
running at the same time, outputting the data from a single job
when that job's state was finished; I've appended this program
to the end of this message. However, when I go to use the 'save'
function, my program terminates, leaving me unable to deal with
the rest of my data. More importantly, the save function fails,
and I inadvertently destroy the results I just got.

I was wondering if there was a better way to do this: to run
all of my embarrassingly parallel distributed computing jobs at
the same time while saving the data as the jobs finish, without
crashing the program.
------

cd /home/alexk/backprop

load data_Flex_alt.mat
load data_Flex.mat

% C = zeros(1,20);
% C(1) = 1;
m = 2;
nu = 0.01;
theta = 0.005;
% setsize = [1,10];

abc = findResource('scheduler','type','generic');
abc.DataLocation = '/data/alexk/backprop';
abc.ClusterMatlabRoot = matlabroot;
abc.SubmitFcn = @pbsSubmitFunc;
get(abc)

jobz = {};
pd = {'/home/alexk/backprop'};
fd = {'zipser_rtrl.m'};
cellz = 10:5:50;

for i=1:length(cellz)
    jobz{i} = abc.createJob;
    set(jobz{i},'PathDependencies',pd);
    set(jobz{i},'FileDependencies',fd);
    for j=0:9
        setsize = [j*10 + 1, (j+1) * 10];
        C = eye(3,10+(i-1)*5);
        jobz{i}.createTask(@zipser_rtrl,1,{data_Flex,setsize,m,C,nu,theta});
        jobz{i}.createTask(@zipser_rtrl,1,{data_Flex_alt,setsize,m,C,nu,theta});
    end
    jobz{i}.submit;
end

flagz = zeros(1,length(cellz));

while sum(flagz) < length(cellz)
    for i=1:length(cellz)
        output = {};
        if flagz(i) == 0
            etat=get(jobz{i},'State');
            if strcmp(etat,'finished');
                output = get(jobz{i}, 'OutputArguments');
                save(['Flex_out_101407_',num2str(10+(i-1)*5),'.mat'],'output');
                flagz(i) = 1;
            end
        end
    end
end

if sum(flagz) == length(cellz)
    for i=1:length(cellz)
        destroy(jobz{i});
    end
    destroy(abc);
end

exit
