Re: Moving large amount of files, 1.750.000+



On 09.11.2008 18:04, Sebastian Newstream wrote:
> Hello fellow Rubyists!
>
> I'm trying to impress my boss and co-workers with Ruby so we
> hopefully can start to use it at work more often. I was given
> the task of moving a *large* repository of images from one
> source to the next. The repository consists of around 1.750.000
> images and requires around 350GB of space.
>
> My question is this: How do I speed up my application?
> I reused my filehandler and skipped the printing to the console,
> but it is still taking time.
>
> Also, if anyone has previous experience of handling this many files,
> any kind of tips are welcome. I'm quite worried that the array
> containing the path to all the files will flood the stack.

Sorry to disappoint you, but copying this amount of data won't be really fast regardless of programming language. You do not mention what a "source" is in your case, which operating systems are involved, or what transport medium you intend to use (local, network). If you need to transport over a network, in my experience tar with a pipe works pretty well. But no matter what you do, the slowest link determines your throughput: you cannot go faster than the network, or faster than your "sources" can read or write.
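
To make the "slowest link" point concrete, here is a back-of-the-envelope sketch in Ruby. The per-stage rates are invented placeholders, not measurements, so substitute your own figures. It only gives a lower bound because it ignores the per-file overhead of creating 1,750,000 files. Real times will be longer.

```ruby
# Rough lower bound on transfer time: total bytes divided by the
# throughput of the slowest stage in the chain.
# All rates below are HYPOTHETICAL example figures, not measurements.

TOTAL_BYTES = 350 * 10**9 # ~350 GB, as stated in the question

# Bytes per second for each stage (hypothetical).
links = {
  "source disk read"   => 80 * 10**6,  # 80 MB/s
  "network (1 Gbit/s)" => 125 * 10**6, # 10**9 bit/s / 8 bits per byte
  "target disk write"  => 60 * 10**6   # 60 MB/s
}

# The stage with the lowest rate caps the whole pipeline.
name, rate = links.min_by { |_, bytes_per_sec| bytes_per_sec }
seconds = TOTAL_BYTES.fdiv(rate)

puts "bottleneck: #{name}"
puts format("best case: %.1f hours", seconds / 3600)
```

With these made-up numbers the target disk is the bottleneck. A faster network would not help at all.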

Here's the tar variant. Since you are copying images, I assume the data is already compressed and does not need further compression (at your favorite Unix shell prompt):

$> ( cd "$source" && tar cf - . ) | ( ssh user@target "cd '$target' && tar xf -" )

If you can physically move the source disk to the target host and then do a local copy with cp -a, that's probably the fastest you can go, unless the physical move takes ages (e.g. to the moon or other remote locations).
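
The question also worries about an array holding 1,750,000 paths. The following is a minimal Ruby sketch of a local copy that never builds such an array. `Find.find` yields each path as it walks the tree, and each regular file is copied immediately. The method name `copy_tree` is hypothetical. This is an illustration, not a replacement for the advice above: it will not beat cp -a or tar, because the time is dominated by disk I/O. It also skips symlinks and other special files.

```ruby
require "find"
require "fileutils"

# HYPOTHETICAL helper: copy the tree under src_root into dst_root,
# one entry at a time. Find.find yields paths while walking, so no
# list of all paths is held in memory. Returns the number of regular
# files copied.
def copy_tree(src_root, dst_root)
  copied = 0
  Find.find(src_root) do |path|
    next if File.symlink?(path) # symlinks are skipped in this sketch

    # Path relative to src_root ("" for the root itself).
    rel    = path.delete_prefix(src_root).delete_prefix("/")
    target = rel.empty? ? dst_root : File.join(dst_root, rel)

    if File.directory?(path)
      # Find yields a directory before its contents, so it exists
      # by the time its files are copied.
      FileUtils.mkdir_p(target)
    elsif File.file?(path)
      # preserve: true keeps modification time and permission bits
      # of regular files, roughly like cp -a does.
      FileUtils.cp(path, target, preserve: true)
      copied += 1
    end
  end
  copied
end

# Usage: ruby copy_tree.rb SOURCE_DIR TARGET_DIR
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  puts "copied #{copy_tree(ARGV[0], ARGV[1])} files"
end
```

Memory use stays flat regardless of how many files there are. Only the current path and Find's queue of not-yet-visited directory entries are held at any time.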

Kind regards

robert