[R] Failure to subset in R v 2.8.0

Rolf Turner r.turner at auckland.ac.nz
Mon Dec 1 23:53:32 CET 2008


I just tried:

set.seed(42)
c <- data.frame(month=sample(1:12,50,TRUE), blah=sample(letters[1:4],50,TRUE))
c[c$month==11,]

and got

    month blah
1     11    b
21    11    a
28    11    b
30    11    a
39    11    a
47    11    b

All appears to be in harmony.  So there would appear to be something funny
about your data frame ``c'', rather than there being anything wrong with
``[''.  Is c$month a factor, perhaps?  If so, what are its levels?  Also
have a look at str(c).
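
The checks suggested above could be sketched roughly as follows.  This is
only an illustration: the data frame ``dat'' and its values are made up, and
the near-11 value is an assumed possibility (a numeric column that prints
as 11 without being exactly 11), not something established about the
original data.

```r
# Toy stand-in for the poster's data; `dat` and its values are hypothetical.
dat <- data.frame(month = c(11 + 1e-9, 3, 11, 6))

str(dat)                 # storage type of each column
class(dat$month)         # "numeric", "integer", "factor", ...
is.factor(dat$month)     # if TRUE, inspect levels(dat$month)

# Default printing rounds to 7 significant digits, which can hide a
# small discrepancy; sprintf() shows the stored values in full.
print(unique(dat$month))
sprintf("%.17g", unique(dat$month))

# Exact comparison skips the near-11 value; a tolerance picks it up.
exact  <- dat[dat$month == 11, , drop = FALSE]
approx <- dat[abs(dat$month - 11) < 1e-6, , drop = FALSE]
nrow(exact)
nrow(approx)
```

If the full-precision values differ from whole numbers, comparing with a
tolerance (or rounding the column first) may be a reasonable work-around.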

BTW ``c'' is a lousy name for an object, since it is the name of the
built-in function which effects concatenation.
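
A small sketch of the name clash, using made-up values.  R still finds the
function when ``c'' is used in call position, so nothing breaks outright,
but the code becomes harder to read:

```r
# A data frame stored under the name `c` (values are hypothetical).
c <- data.frame(month = c(11, 3))

# R skips non-function objects when looking up a name in call position,
# so this still calls the built-in concatenation function.
v <- c(1, 2, 3)

is.data.frame(c)       # the name now refers to the data frame
is.function(base::c)   # the built-in is still reachable explicitly

rm(c)                  # remove the masking object again
```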

	cheers,

		Rolf Turner

On 2/12/2008, at 11:32 AM, Alan Cohen wrote:

> Hello,
>
> I've been using a pre-release version of R v 2.8.0 for Windows for  
> the last couple months.  I think that there have been consistent  
> problems with subsetting data sets, but I had usually been able to  
> find work-arounds or was unable to confirm this as a bug.  I think  
> now I have, and would love advice on what to do if I've made some  
> error.
>
> The data set in question ("c") has 500,000 observations and 44  
> variables.  The problematic variable, "month," takes integer values  
> 1:12, and all are present in the data set:
>
>> unique(c$month)
>  [1] 11 10  9  8 12  1  7  4  6  2  5  3
>
> However, I can't select observations of c for certain values of month:
>
>> c[c$month==11,]
>  [1] STATE        DISTRICT     TALUK        VILLAGE      TYPE         SERIALNO     INTDATE      QH101P
>  [9] QH114        QH115A1      QH115B1      QH115C1      QH115A2      QH115B2      QH115C2      QH115A3
> [17] QH115B3      QH115C3      QH115A4      QH115B4      QH115C4      QH115A5      QH115B5      QH115C5
> [25] QH116        QH117A1      QH117B1      QH117C1      QH117A2      QH117B2      QH117C2      QH117A3
> [33] QH117B3      QH117C3      QH117A4      QH117B4      QH117C4      QH117A5      QH117B5      QH117C5
> [41] phase        year         month        stdistid.rch
> <0 rows> (or 0-length row.names)
>
> I get the same result for c[c[,43]==11,], and
>
>> length(c$month[c$month==11])
> [1] 0
>
> This is true for most values of month (1,2,4,5,7,8,10,11), but the  
> multiples of 3 work, apparently correctly.
>
> Other variables do not have this problem (the columns shift in the  
> email, but these three observations have month=11):
>
>> c[c$STATE==11,][1:3,]
>       STATE DISTRICT TALUK VILLAGE TYPE SERIALNO INTDATE QH101P QH114 QH115A1 QH115B1 QH115C1 QH115A2 QH115B2 QH115C2 QH115A3 QH115B3
> 87556    11        2     1       1    1        5    1187      6     0       0       0       0       0       0       0       0       0
> 87557    11        2     1       1    1       10    1187      3     0       0       0       0       0       0       0       0       0
> 87558    11        2     1       1    1       14    1187      5     0       0       0       0       0       0       0       0       0
>       QH115C3 QH115A4 QH115B4 QH115C4 QH115A5 QH115B5 QH115C5 QH116 QH117A1 QH117B1 QH117C1 QH117A2 QH117B2 QH117C2 QH117A3 QH117B3 QH117C3
> 87556       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
> 87557       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
> 87558       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
>       QH117A4 QH117B4 QH117C4 QH117A5 QH117B5 QH117C5 phase year month stdistid.rch
> 87556       0       0       0       0       0       0     1 1998    11         1102
> 87557       0       0       0       0       0       0     1 1998    11         1102
> 87558       0       0       0       0       0       0     1 1998    11         1102
>
> The data set is read directly from a csv file, where all variables
> should be stored in the same way, and using
> as.numeric(as.character(c$month)) does not help.  Nor does restarting R,
> restarting the computer, or trying the operation on smaller subsets of c.
> I'd appreciate any help you can provide.
