[R] Failure to subset in R v 2.8.0
Rolf Turner
r.turner at auckland.ac.nz
Mon Dec 1 23:53:32 CET 2008
I just tried:
set.seed(42)
c <- data.frame(month=sample(1:12,50,TRUE),blah=sample(letters[1:4],50,TRUE))
c[c$month==11,]
and got
   month blah
1     11    b
21    11    a
28    11    b
30    11    a
39    11    a
47    11    b
All appears to be in harmony. So there would appear to be something funny
about your data frame ``c'', rather than there being anything wrong with ``[''.
Is c$month a factor, perhaps? If so, what are its levels? Also have a look
at str(c).

BTW ``c'' is a lousy name for an object, since it is the name of the
built-in function which effects concatenation.
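
The factor check suggested above can be sketched as follows. This is only an illustration: the data frame ``d'' and its contents are hypothetical stand-ins, not the poster's data, and the stray leading space is an assumed defect chosen to show how a factor level can fail to match.

```r
# Hypothetical stand-in data: month stored as a factor, with one value
# carrying a stray leading space (an assumption made for illustration).
d <- data.frame(month = factor(c("11", "3", " 11")),
                blah  = c("a", "b", "c"))

str(d)               # shows the storage type of every column
is.factor(d$month)   # TRUE here, because month was built with factor()
levels(d$month)      # " 11" and "11" appear as separate levels

# Comparing a factor with a number compares against the level labels,
# so only the row whose label is exactly "11" matches.
exact <- d$month == 11
d[exact, ]

# Converting via character lets as.numeric() ignore the surrounding
# whitespace, so both rows holding 11 are recovered.
m <- as.numeric(as.character(d$month))
d[m == 11, ]
```

If is.factor() returns FALSE on the real column, this explanation does not apply and the cause lies elsewhere.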
cheers,
Rolf Turner
On 2/12/2008, at 11:32 AM, Alan Cohen wrote:
> Hello,
>
> I've been using a pre-release version of R v 2.8.0 for Windows for
> the last couple months. I think that there have been consistent
> problems with subsetting data sets, but I had usually been able to
> find work-arounds or was unable to confirm this as a bug. I think
> now I have, and would love advice on what to do if I've made some
> error.
>
> The data set in question ("c") has 500,000 observations and 44
> variables. The problematic variable, "month," takes integer values
> 1:12, and all are present in the data set:
>
>> unique(c$month)
> [1] 11 10 9 8 12 1 7 4 6 2 5 3
>
> However, I can't select observations of c for certain values of month:
>
>> c[c$month==11,]
> [1] STATE DISTRICT TALUK VILLAGE
> TYPE SERIALNO INTDATE QH101P
> [9] QH114 QH115A1 QH115B1 QH115C1
> QH115A2 QH115B2 QH115C2 QH115A3
> [17] QH115B3 QH115C3 QH115A4 QH115B4
> QH115C4 QH115A5 QH115B5 QH115C5
> [25] QH116 QH117A1 QH117B1 QH117C1
> QH117A2 QH117B2 QH117C2 QH117A3
> [33] QH117B3 QH117C3 QH117A4 QH117B4
> QH117C4 QH117A5 QH117B5 QH117C5
> [41] phase year month stdistid.rch
> <0 rows> (or 0-length row.names)
>
> I get the same result for c[c[,43]==11,], and
>
>> length(c$month[c$month==11])
> [1] 0
>
> This is true for most values of month (1,2,4,5,7,8,10,11), but the
> multiples of 3 work, apparently correctly.
>
> Other variables do not have this problem (the columns shift in the
> email, but these three observations have month=11):
>
>> c[c$STATE==11,][1:3,]
> STATE DISTRICT TALUK VILLAGE TYPE SERIALNO INTDATE QH101P
> QH114 QH115A1 QH115B1 QH115C1 QH115A2 QH115B2 QH115C2 QH115A3 QH115B3
> 87556 11 2 1 1 1 5 1187 6
> 0 0 0 0 0 0 0 0 0
> 87557 11 2 1 1 1 10 1187 3
> 0 0 0 0 0 0 0 0 0
> 87558 11 2 1 1 1 14 1187 5
> 0 0 0 0 0 0 0 0 0
> QH115C3 QH115A4 QH115B4 QH115C4 QH115A5 QH115B5 QH115C5 QH116
> QH117A1 QH117B1 QH117C1 QH117A2 QH117B2 QH117C2 QH117A3 QH117B3
> QH117C3
> 87556 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0
> 87557 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0
> 87558 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0
> QH117A4 QH117B4 QH117C4 QH117A5 QH117B5 QH117C5 phase year
> month stdistid.rch
> 87556 0 0 0 0 0 0 1 1998
> 11 1102
> 87557 0 0 0 0 0 0 1 1998
> 11 1102
> 87558 0 0 0 0 0 0 1 1998
> 11 1102
>
> The data set is read directly from a csv file, where all
> variables should be stored in the same way, and using
> as.numeric(as.character(c$month)) does not help. Nor does restarting R,
> restarting the computer, or trying the operation on smaller subsets
> of c. I'd appreciate any help you can provide.
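
One possibility consistent with the symptoms quoted above, though not confirmed anywhere in this message, is that the month values are stored as doubles that print as whole numbers without being exactly equal to them. A minimal sketch of how to test for that, using a hypothetical data frame ``d'' with deliberately perturbed values and an arbitrarily chosen tolerance:

```r
# Hypothetical stand-in data: values that print as 11, 3, 11 at the
# default precision but are not all exactly equal to those integers.
d <- data.frame(month = c(11, 3, 11) + c(1e-12, 0, -1e-12),
                blah  = c("a", "b", "c"))

print(d$month)               # default precision hides the discrepancy
print(d$month, digits = 17)  # more digits reveal the stored values

# Exact comparison fails for the perturbed entries.
exact <- d$month == 11

# Comparing within a tolerance (the value 1e-8 is an assumption) finds them.
tol  <- 1e-8
near <- abs(d$month - 11) < tol
d[near, ]

# If the column is meant to hold whole numbers, rounding once and storing
# integers makes later exact comparisons safe.
d$month <- as.integer(round(d$month))
d[d$month == 11L, ]
```

If print(..., digits = 17) shows exact whole numbers on the real data, this explanation can be ruled out.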