[R] reading in a subset of a large data set
Bert Gunter
gunter.berton at gene.com
Fri Jul 11 19:51:50 CEST 2008
... for which you need ?connections and the "nrows" argument to read.table
and friends (see also ?scan and its "nlines" argument).
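A minimal sketch of the connection idea, assuming a small whitespace-delimited file with a header line. The file, its column names and the chunk sizes are made up for illustration. When read.table() or scan() is handed a connection that is already open, each call resumes where the previous one stopped, so successive calls walk through the file without loading all of it.

```r
# Illustrative stand-in for a large file: a header plus 12 data rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("chromosome position value",
             sprintf("%d %d %.1f", rep(1:3, each = 4), 1:12, (1:12) / 2)),
           tmp)

# Open the connection once; read.table() does not close a connection it
# did not open itself, so the second call continues after the first.
con <- file(tmp, open = "r")
first  <- read.table(con, header = TRUE, nrows = 5)   # header + data rows 1-5
second <- read.table(con, header = FALSE, nrows = 5,  # data rows 6-10
                     col.names = names(first))
close(con)

# The same idea with scan(): 'nlines' limits how many lines each call consumes.
con <- file(tmp, open = "r")
hdr  <- scan(con, what = "", nlines = 1, quiet = TRUE)               # header fields
rows <- scan(con, what = list(0L, 0L, 0), nlines = 3, quiet = TRUE)  # data rows 1-3
close(con)
```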
-- Bert Gunter
Genentech
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jim holtman
Sent: Friday, July 11, 2008 9:58 AM
To: stacey.burrows at yahoo.ca
Cc: r-help at r-project.org
Subject: Re: [R] reading in a subset of a large data set
If the data you want is contiguous, then just 'skip' the number of
records and then read the number you want.
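For a contiguous block, the 'skip' and 'nrows' arguments of read.table() are enough. A sketch on a made-up file (names and sizes are illustrative only). Because 'skip' also skips the header line, the column names are taken from a separate one-row read.

```r
# Illustrative file: a header plus 12 data rows (4 per chromosome).
tmp <- tempfile(fileext = ".txt")
writeLines(c("chromosome position value",
             sprintf("%d %d %.1f", rep(1:3, each = 4), 1:12, (1:12) / 2)),
           tmp)

# Column names from the header only (reads the header plus one data row).
hdr <- names(read.table(tmp, header = TRUE, nrows = 1))

# Data rows 5-8: skip the header and the first 4 data rows, then read 4.
block <- read.table(tmp, header = FALSE, skip = 1 + 4, nrows = 4,
                    col.names = hdr)
```

'skip' still has to read past the skipped lines; it just does not keep them, so memory use stays small but reading time grows with the offset.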
If you want to select a random sample, then check out
http://article.gmane.org/gmane.comp.lang.r.general/78318/match=random+read
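This is not necessarily the method in the linked post. One simple alternative, sketched here assuming a file with a single header line and one record per line: count the data lines in a first pass, pick line numbers with sample.int(), then keep only those lines in a second chunked pass. sample_rows() is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: draw n random data rows from a text file with one
# header line, holding at most chunk_size lines in memory at a time.
sample_rows <- function(path, n, chunk_size = 10000L) {
  # Pass 1: count the data lines (everything after the header).
  con <- file(path, open = "r")
  header <- readLines(con, n = 1L)
  total <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    total <- total + length(lines)
  }
  close(con)

  # Choose which data lines to keep (assumes n <= total).
  wanted <- sort(sample.int(total, n))

  # Pass 2: re-read in chunks, keeping only the chosen line numbers.
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1L)                  # skip the header
  kept <- character(0)
  seen <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    idx <- wanted[wanted > seen & wanted <= seen + length(lines)] - seen
    kept <- c(kept, lines[idx])
    seen <- seen + length(lines)
  }
  read.table(text = c(header, kept), header = TRUE)
}

# Usage on a small illustrative file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("chromosome position value",
             sprintf("%d %d %.1f", rep(1:3, each = 4), 1:12, (1:12) / 2)),
           tmp)
set.seed(1)
smp <- sample_rows(tmp, n = 3, chunk_size = 5)
```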
In your case, where you want to read conditionally based on values,
you may have to read in a chunk, select the records you want, and
then continue reading the file. At the end, you can combine the
pieces into a single data frame.
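A minimal sketch of that read-filter-combine loop, assuming a whitespace-delimited file with a header line. read_filtered() and its 'keep' argument are hypothetical names; 'keep' stands in for a condition such as chromosome == '1'.

```r
# Hypothetical helper: read 'path' in chunks of 'chunk_size' rows, keep the
# rows for which keep(chunk) is TRUE, and bind the pieces together at the end.
read_filtered <- function(path, keep, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # First call: read the header plus the first chunk of rows.
  chunk <- read.table(con, header = TRUE, nrows = chunk_size)
  cols <- names(chunk)
  pieces <- list(chunk[0, , drop = FALSE])  # zero-row template for rbind

  repeat {
    hits <- chunk[which(keep(chunk)), , drop = FALSE]  # NA counts as FALSE
    if (nrow(hits) > 0L) pieces[[length(pieces) + 1L]] <- hits
    if (nrow(chunk) < chunk_size) break     # short chunk: end of file
    # Later calls resume where the previous one stopped. At end of file
    # read.table() may return zero rows or signal an error; either ends
    # the loop (note this sketch also swallows genuine read errors).
    chunk <- tryCatch(
      read.table(con, header = FALSE, nrows = chunk_size, col.names = cols),
      error = function(e) NULL)
    if (is.null(chunk)) break
  }

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage on a small illustrative file: keep chromosome 2, which spans
# the boundary between the first and second chunks.
tmp <- tempfile(fileext = ".txt")
writeLines(c("chromosome position value",
             sprintf("%d %d %.1f", rep(1:3, each = 4), 1:12, (1:12) / 2)),
           tmp)
chr2 <- read_filtered(tmp, keep = function(d) d$chromosome == 2,
                      chunk_size = 5)
```

If column types could differ between chunks (for example, a column that is entirely NA in one chunk), passing an explicit colClasses to both read.table() calls keeps them consistent.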
On Fri, Jul 11, 2008 at 12:25 PM, Stacey Burrows
<stacey.burrows at yahoo.ca> wrote:
> I have a huge dataset, of which I only want to read in a subset. Is it
> possible to use read.table to read in only a subset of the data? For
> example, something like read.table('~/data.txt', subset = chromosome=='1' )
>
> If not, then why not? This seems to be a feature available in all other
> statistical software.
>
> Thanks,
> Stacey
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?