[R] Efficiently reading random lines from a large file
Marc Schwartz
marc_schwartz at comcast.net
Wed May 16 01:19:08 CEST 2007
On Tue, 2007-05-15 at 16:02 -0700, Juan Pablo Lewinger wrote:
> I need to read two different random lines at a time from a large
> ASCII file (120 lines x 296976 entries per line) containing space-delimited 0-1 entries.
>
> The following code does the job and it's reasonably fast for my needs:
>
> lineNumber = sample(120, 2)  # two distinct line numbers out of 120
> line1 = scan(filename, what = integer(), skip=lineNumber[1]-1, nlines=1)
> line2 = scan(filename, what = integer(), skip=lineNumber[2]-1, nlines=1)
>
> > system.time(for (i in 50){
> + lineNumber = sample(120, 2)
> + line1 = scan(filename, what = "integer", skip=lineNumber[1]-1, nlines=1)
> + line2 = scan(filename, what = "integer", skip=lineNumber[2]-1, nlines=1)
> + })
>
> Read 296976 items
> Read 296976 items
> [1] 14.24 0.12 14.51 NA NA
>
> However, I'm wondering if there's an even faster way to do this. Is there?
You might want to take a look at this post by Jim Holtman from earlier
in the year for some ideas:
http://tolstoy.newcastle.edu.au/R/e2/help/07/02/9709.html
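As one further possibility, here is a minimal sketch. It is not necessarily the approach in the linked post. The idea is to scan the file once to record the byte offset where each line starts, then use seek() to jump straight to the chosen lines. The helper names build_line_index(), read_line_at() and read_random_lines() are hypothetical. The sketch assumes "\n" line endings and whitespace-delimited integer entries. The file is opened in binary mode ("rb") so the recorded offsets are plain byte counts.

```r
# Sketch only: build_line_index(), read_line_at() and read_random_lines()
# are hypothetical helper names. Assumes "\n" line endings and
# whitespace-delimited integer entries on each line.

# One pass over the file: record the byte offset at which each line starts.
build_line_index <- function(filename, chunk_size = 1e6) {
  con <- file(filename, open = "rb")   # binary mode: offsets are raw bytes
  on.exit(close(con))
  offsets <- numeric(0)
  pos <- 0                             # bytes consumed so far
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_size)
    if (length(bytes) == 0) break
    nl <- which(bytes == as.raw(10L))  # positions of "\n" within this chunk
    offsets <- c(offsets, pos + nl)    # the next line starts just after each "\n"
    pos <- pos + length(bytes)
  }
  starts <- c(0, offsets)
  starts[starts < pos]                 # drop the "start" at EOF after a final "\n"
}

# Jump straight to a recorded offset and parse that single line.
read_line_at <- function(con, offset) {
  seek(con, where = offset, origin = "start")
  line <- readLines(con, n = 1)
  scan(text = line, what = integer(), quiet = TRUE)
}

# Pick k distinct lines at random, reusing one open connection.
read_random_lines <- function(filename, index, k = 2) {
  con <- file(filename, open = "rb")
  on.exit(close(con))
  picks <- sample(length(index), k)
  lapply(index[picks], function(off) read_line_at(con, off))
}
```

The index is built once with idx <- build_line_index(filename). After that, each call to read_random_lines(filename, idx) reads only the two chosen lines. By contrast, scan(..., skip = ) has to read past every preceding line on each call. If memory allows, there is a simpler alternative: read the whole file once into a 120 x 296976 integer matrix (roughly 140 MB) and sample rows from it.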
HTH,
Marc Schwartz