Re: Write to specified line number of file



1. How big is the file?

2. Do you want sampling without replacement (shuffle the original file
keeping the lines intact) or sampling with replacement (n lines randomly
chosen from the file)?

I'm going to assume that the answer to 1 is "too big for a considerate
programmer to read all into memory on a shared machine" and the answer
to 2 is "without replacement (shuffling)". I'm also going to assume that
you're on some form of UNIX machine that has the "sort" command. On
Windows, that could mean Cygwin.

So what you want to do is make a copy of the file with a random number
tacked on to the front of each line. Then sort the tagged copy
numerically on that number using the external "sort" command, and remove
the tags from the sorted copy. You'll be doing everything in Ruby except
the sort.
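
The tag/sort/untag steps above could be sketched in Ruby roughly as
follows. This is only a sketch under assumptions: the helper name
shuffle_file_external, the tab separator between tag and line, and the
integer keys below 2**62 are illustrative choices, and it assumes a
UNIX-style "sort" command accepting -n, -t, -k and -o is on the PATH.

```ruby
require "tempfile"

# Hypothetical helper: shuffle the lines of src_path into dst_path without
# holding the whole file in memory. Only the sorting is done outside Ruby.
def shuffle_file_external(src_path, dst_path, random: Random.new)
  Tempfile.create("tagged") do |tagged|
    Tempfile.create("sorted") do |sorted|
      # Step 1: copy the file, prefixing each line with a random integer
      # key and a tab. A final line lacking a newline gets one added.
      File.foreach(src_path) do |line|
        line += "\n" unless line.end_with?("\n")
        tagged.write("#{random.rand(2**62)}\t#{line}")
      end
      tagged.flush

      # Step 2: let the external sort order the tagged copy numerically
      # on the first tab-separated field.
      ok = system({ "LC_ALL" => "C" }, "sort", "-n", "-t", "\t", "-k", "1,1",
                  "-o", sorted.path, tagged.path)
      raise "external sort failed" unless ok

      # Step 3: strip the key and the first tab, keeping the rest of the
      # line (including any tabs it originally contained).
      File.open(dst_path, "w") do |out|
        File.foreach(sorted.path) { |line| out.write(line.split("\t", 2).last) }
      end
    end
  end
end
```

A call would look like shuffle_file_external("input.txt", "shuffled.txt"),
with both file names being placeholders. Two lines that happen to draw
the same key are ordered by the sort's tie-breaking rule rather than
randomly, which should be rare enough with keys this wide.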

If the answer to 1 is "small enough to fit into memory", just read the
file into memory, tag the lines with random numbers, and use a Ruby
"sort" to do the sorting, then untag the lines and write out the file.
You'll be doing everything in Ruby.
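
For the small-file case, a minimal sketch of the same tag, sort, untag
idea done entirely in Ruby might look like this. The helper names
shuffle_lines_in_memory and shuffle_file_in_memory are made up for
illustration, and the sketch assumes the whole file fits comfortably in
memory.

```ruby
# Hypothetical helper: tag each line with a random number, sort on the
# tag, then drop the tag again.
def shuffle_lines_in_memory(lines, random: Random.new)
  tagged = lines.map { |line| [random.rand, line] }       # tag
  tagged.sort_by { |key, _line| key }                     # sort on the tag
        .map { |_key, line| line }                        # untag
end

# Hypothetical helper: read a file, shuffle its lines, write the result.
def shuffle_file_in_memory(src_path, dst_path)
  lines = File.readlines(src_path).map do |line|
    # Make sure a final line without a newline still ends up on its own line.
    line.end_with?("\n") ? line : line + "\n"
  end
  File.write(dst_path, shuffle_lines_in_memory(lines).join)
end
```

Ruby's built-in Array#shuffle gives the same kind of result in one call,
so the explicit tagging is mostly useful for showing how the technique
works.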

By the way, I do this sort of thing rather often. The files in question
are data files that drive performance test scripts. They're small (under
65536 lines), so I just read them into Excel, tack on a random column,
sort on the random column, delete the random column, and write the file
back out.

If you want sampling *with* replacement, the easiest way to do it is
using R. I don't know how to do it in Ruby or Excel, since I have R. :)
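
For what it's worth, a rough Ruby sketch of sampling with replacement,
assuming the file has already been read into an array of lines: draw a
random index n times, so the same line can come up more than once. The
name sample_with_replacement is hypothetical.

```ruby
# Hypothetical helper: pick n lines at random, allowing repeats.
def sample_with_replacement(lines, n, random: Random.new)
  raise ArgumentError, "cannot sample from an empty list" if lines.empty?

  # Each draw is independent, so a line may be chosen several times.
  Array.new(n) { lines[random.rand(lines.size)] }
end
```

Something like sample_with_replacement(File.readlines("input.txt"), 100)
would then give 100 lines, where "input.txt" is a placeholder name.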

I think the "too big/shuffled" case would make an interesting Ruby quiz,
if you rule out the external sort command as "cheating".

--

M. Edward (Ed) Borasky

http://linuxcapacityplanning.com

