[R] reading in a subset of a large data set

Bert Gunter gunter.berton at gene.com
Fri Jul 11 19:51:50 CEST 2008


... for which you need ?connections and the "nrows" argument to read.table
and friends.

(also ?scan and its "nlines" argument)
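
A minimal sketch of why connections matter here, assuming a small whitespace-delimited file made up for illustration (the path and the column names "a" and "b" are hypothetical). Once a connection is opened explicitly, each read.table() call with "nrows" continues from where the previous one stopped, rather than starting again at the top of the file:

```r
# Hypothetical toy file: one header line and four data rows.
path <- tempfile(fileext = ".txt")
writeLines(c("a b", "1 2", "3 4", "5 6", "7 8"), path)

# Open the connection explicitly so the read position persists between calls.
con <- file(path, open = "r")

# First call reads the header plus the first two data rows.
first <- read.table(con, header = TRUE, nrows = 2)

# Second call continues at the third data row; there is no header left to
# read, so the column names are passed in by hand.
second <- read.table(con, header = FALSE, nrows = 2, col.names = names(first))

close(con)
```

If the file name is passed directly instead of an open connection, every call reopens the file and reads from the first line again.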

-- Bert Gunter
Genentech


 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of jim holtman
Sent: Friday, July 11, 2008 9:58 AM
To: stacey.burrows at yahoo.ca
Cc: r-help at r-project.org
Subject: Re: [R] reading in a subset of a large data set

If the data you want is contiguous, then just 'skip' the number of
records and then read the number you want.
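
A rough sketch of the contiguous case, assuming a file with one header line; the file contents and the choice of "rows 4 to 7" are made up for illustration. The header is read separately because 'skip' would otherwise discard it along with the unwanted rows:

```r
# Hypothetical example file: a header line plus 10 data rows.
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:10, chromosome = rep(c("1", "2"), each = 5)),
            path, row.names = FALSE, quote = FALSE)

# Read the column names from the header line only.
hdr <- names(read.table(path, header = TRUE, nrows = 1))

# Skip the header line plus the first 3 data rows, then read the next 4 rows.
block <- read.table(path, skip = 1 + 3, nrows = 4, col.names = hdr)
```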

If you want to select a random sample, then check out
http://article.gmane.org/gmane.comp.lang.r.general/78318/match=random+read

In your case, where you want to read conditionally based on values,
you may have to read in a subset, select the records you want, and
then continue reading the file.  At the end, you can combine the
pieces into a single data frame.
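
One way this read-select-continue loop might look, as a sketch only. The function name read_subset, the chunk size, the two-column layout, and the toy data are all assumptions made for illustration; a real file would need its own column classes. Each chunk is pulled with readLines() so that the end of the file is easy to detect, then parsed with read.table(text = ...):

```r
# Hypothetical input: a whitespace-delimited file with a header line.
path <- tempfile(fileext = ".txt")
write.table(data.frame(chromosome = rep(c("1", "2", "X"), times = 4),
                       pos = 1:12),
            path, row.names = FALSE, quote = FALSE)

# read_subset() is a made-up helper; 'keep' is a function that returns a
# logical vector marking the rows of a chunk to retain.
read_subset <- function(path, keep, chunk_size = 5) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Consume the header once; later reads continue from the current position.
  hdr <- strsplit(readLines(con, n = 1), "[[:space:]]+")[[1]]
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break          # end of file reached
    # Fixed column classes keep every chunk's types consistent for rbind().
    chunk <- read.table(text = lines, col.names = hdr,
                        colClasses = c("character", "integer"))
    pieces[[length(pieces) + 1]] <- chunk[keep(chunk), , drop = FALSE]
  }
  # Reassemble the selected rows into a single data frame.
  do.call(rbind, pieces)
}

chr1 <- read_subset(path, function(d) d$chromosome == "1")
```

Only one chunk plus the rows already selected are held in memory at any time, so the chunk size can be tuned to whatever the machine can handle.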

On Fri, Jul 11, 2008 at 12:25 PM, Stacey Burrows
<stacey.burrows at yahoo.ca> wrote:
> I have a huge dataset of which I only want to read in a subset. Is
> it possible to use read.table to read in only a subset of the data? For
> example, something like read.table('~/data.txt', subset = chromosome=='1' )
>
> If not, then why not? This seems to be a feature available in all other
> statistical software.
>
> Thanks,
> Stacey
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



