[R] Filtering a table

Dennis Murphy djmuser at gmail.com
Tue Aug 16 18:44:45 CEST 2011


Hi:

You're in the neighborhood, but not quite there.

>> dbhmean <- mean (exp1 [time==575 & species ==1])

This doesn't work because exp1 is a data frame, which has both rows
and columns. You want to select the *rows* for which time == 575 and
species == 1, so you either need to put a comma after species == 1 and
before the closing bracket (so that the condition selects rows of the
data frame rather than columns) or use the subset() function instead.
The other problem is that you never use the dbh variable.
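
To make the row/column point concrete, here is a small sketch on a
made-up data frame (the name toy and its values are invented for
illustration; they are not your data). It also shows what happens when
stand-alone variables called time and species are created as in your
code: the test then no longer looks at the columns at all, which is
presumably why an empty result came back.

```r
# Toy data frame; the values are made up for illustration only.
# It has 3 rows and 3 columns, so a length-3 logical vector is a
# valid index for either dimension.
toy <- data.frame(time    = c(5L, 575L, 575L),
                  species = c(1L, 1L, 2L),
                  dbh     = c(4, 10, 30))

keep <- toy$time == 575L & toy$species == 1L   # FALSE TRUE FALSE

# No comma: the logical vector selects *columns* (only "species" here)
cols <- toy[keep]

# With a comma: the logical vector selects *rows* (only row 2 here)
rows <- toy[keep, ]

# Stand-alone variables named like the columns, as in the original post:
# the test below compares the numbers 1 and 2 with 575 and 1, not the
# columns of toy, so it evaluates to a single FALSE
time    <- 1
species <- 2
empty <- toy[time == 575 & species == 1]   # same as toy[FALSE]
ncol(empty)                                # no columns were selected
```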

You can get the data subset by any of the following methods:

exp1[with(exp1, time == 575L & species == 1L), ]
exp1[exp1$time == 575L & exp1$species == 1L, ]
exp1[exp1[['time']] == 575L & exp1[['species']] == 1L, ]
subset(exp1, time == 575L & species == 1L)
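
If you want to convince yourself that these four forms agree, a quick
check on a made-up data frame could look like the following (toy and
its values are assumptions for illustration only):

```r
# Toy data frame; values are invented, not taken from the original post
toy <- data.frame(time    = c(5L, 575L, 575L, 575L),
                  species = c(1L, 1L, 2L, 1L),
                  dbh     = c(4, 10, 30, 20))

a <- toy[with(toy, time == 575L & species == 1L), ]
b <- toy[toy$time == 575L & toy$species == 1L, ]
c3 <- toy[toy[["time"]] == 575L & toy[["species"]] == 1L, ]
d <- subset(toy, time == 575L & species == 1L)

# Each comparison checks that two of the subsets hold the same rows
all.equal(a, b)
all.equal(a, c3)
all.equal(a, d)
```

One difference to keep in mind: if time or species contains NA, the
bracket forms return rows of NAs for those cases, whereas subset()
drops them.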

The mean of dbh can then be computed with a line such as

with(subset(exp1, time == 575L & species == 1L), mean(dbh))

or similarly with any other subset method. In case you were wondering,
the trailing L in the logical tests for equality indicates that the
value is to be treated as an integer rather than a floating point
value; since time and species appear to be integer-valued in your
data, this provides a safe way to test for equality.
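
As a sketch of "any other subset method", the mean can also be taken
directly on the dbh column after picking the rows. The data frame
below is again made up for illustration; only the column names come
from your post.

```r
# Toy data frame; values are invented, not taken from the original post
toy <- data.frame(time    = c(5L, 575L, 575L, 575L),
                  species = c(1L, 1L, 2L, 1L),
                  dbh     = c(4, 10, 30, 20))

sel <- toy$time == 575L & toy$species == 1L

m1 <- with(subset(toy, time == 575L & species == 1L), mean(dbh))
m2 <- mean(toy$dbh[sel])        # index the dbh vector by the row selector
m3 <- mean(toy[sel, "dbh"])     # pick rows and the single dbh column

# The L suffix only changes the storage type of the constant
typeof(575L)   # "integer"
typeof(575)    # "double"
```

If the columns happen to be stored as doubles rather than integers,
the comparison with 575L still works, because the integer is converted
to a double before the test is made.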

HTH,
Dennis

On Tue, Aug 16, 2011 at 1:38 AM, Santini Silvana <nadiasilvana at yahoo.com> wrote:
> Hello, I have a big table with 3 columns and 103918 rows. This is the example,
>
>            time             species                 dbh
> 5 1 4.9377297
> 575 1  11.64127213
> 575 1  109.8182438
> 575 1 8.029809521
> 5 1  24.32501874
> 575 1  4.895992119
> 575 1  11.40567637
> 575 1  2.795090562
> 575 1 21.79281837
> 575 1  52.57476174
> 575 1  27.7290919
> 575 1  3.23262083
> 575 2  19.30612651
> 575 1  2.956672964
> 575 1  111.690689
> 575 1  11.82499086
> 575 1  63.86200585
> 575 1  111.8312759
> 575 1  49.23078501
> 25 1  2.810866156
> 575 1  10.93097209
> 575 1  23.7930745
> 575 1 21.68010008
> 575 1  13.32423271
> 575 1  23.10306499
> 575 1  59.646657
> 1000 2  20.47707761
> 575 1 3.255755538
> 575 1  29.3392412
> 575 1  2.578551542
> 575 1  52.71564453
> 575 1  119.8069955
> 575 1  83.45738555
> 575 1  7.763555744
> 725 1  2.578551542
> I would like to calculate the dbh mean but only for species 1, time 575. I have tried this
>
>> names (exp1)
> [1] "time"    "species" "dbh"
>> time = c(1)
>> species=c(2)
>> dbh=c(3)
>> dbhmean <- mean (exp1 [time==575 & species ==1])
>> dbhmean
> named list()
>
> I am not sure why it appears the message "named list()"...
>
> Can anybody give me some hints on how to do this correctly?
>
> Thanks.


