[R] Importing random subsets of a data file

Sarah Goslee sarah.goslee at gmail.com
Wed Jul 23 17:37:17 CEST 2014


Hi,

You can use scan() with the nlines and skip arguments to read in a
single line from anywhere in a file.
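Here is a minimal sketch of that approach applied to the original question, in R. It assumes a CSV with one header line, one record per line (no embedded newlines), no blank lines, and a known number of data rows. read_random_rows() is a hypothetical helper, not a base R function. The demo file is generated on the fly rather than taken from the original post.

```r
# Sketch: read a random sample of data rows from a large CSV using scan()'s
# skip and nlines arguments, so the full file is never held in memory.
# Assumptions: one header line, one record per line, no blank lines, and a
# known number of data rows. read_random_rows() is a hypothetical name.
read_random_rows <- function(path, n_rows, n_sample) {
  # First line holds the column names; read it as raw text
  # (sep = "\n" makes scan() return whole lines with quote handling off)
  header_line <- scan(path, what = "", sep = "\n", nlines = 1, quiet = TRUE)

  # Choose which data rows (1..n_rows) to keep, without replacement
  rows <- sort(sample.int(n_rows, n_sample))

  # For data row k, skip the header plus k - 1 data lines, then read one line
  picked <- vapply(rows, function(k) {
    scan(path, what = "", sep = "\n", skip = k, nlines = 1, quiet = TRUE)
  }, character(1))

  # Parse the header plus the selected lines as an ordinary CSV
  read.csv(text = c(header_line, picked))
}

# Demo on a small temporary file (stands in for the real 1-million-row file)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:20, x = (1:20)^2), tmp, row.names = FALSE)
set.seed(42)
s <- read_random_rows(tmp, n_rows = 20, n_sample = 5)
print(s)
```

Because each scan() call starts again from the top of the file, memory use stays small but the time for one sample grows with the positions of the chosen rows; for 10000 replicates over a million rows, timing should be checked on the real file.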

Sarah

On Wed, Jul 23, 2014 at 11:33 AM, Khurram Nadeem
<khurram.nadee at gmail.com> wrote:
> Hi R folks,
>
> Here is my problem.
>
> *1.* I have a large data file (say, in .csv or .txt format) containing 1
> million rows and 500 variables (columns).
>
> *2.* My statistical algorithm does not require the entire dataset but just
> a small random sample from the original 1 million rows.
>
> *3.* This algorithm needs to be applied 10000 times, each time generating a
> different random sample from the 'big' file as described in (2) above.
>
> Is there a way to 'import' only a (random) subset of rows from the .csv
> file without importing the entire dataset? A quick search on various R
> forums suggests that read.table() does not have this functionality.
> Obviously, I want to avoid importing the whole file because of memory
> issues. Looking forward to your help.
>
> Thanks,
> Khurram


-- 
Sarah Goslee
http://www.functionaldiversity.org



More information about the R-help mailing list