[R] Error in .subset(x, j) : only 0's may be mixed with negative subscripts
David Winsemius
dwinsemius at comcast.net
Tue Jun 23 23:10:04 CEST 2009
On Jun 23, 2009, at 1:18 PM, Russell Ivory wrote:
> I have a data set called datastep4 with 211484 rows and 95 columns
>
WHY ALL OF THE UNNEEDED EMPTY LINES???
>
>> dim(datastep4)
>
> [1] 211484 95
>
> The first few column names are given below, note the first one is
> "RESPONDED"
>
>> names(datastep4)[1:5]
>
> [1] "RESPONDED" "VAR_30" "VAR_31" "VAR_32" "VAR_33"
>
> A table of RESPONDED shows mostly zeros
>
>> table(datastep4$RESPONDED)
>
> 0 1
>
> 210582 902
>
> I reduce the data set by pulling out the RESPONDED column, then verify
> all is well
>
>> test <- datastep4[,-datastep4$RESPONDED]
It may have "worked", but perhaps not for the reason you thought it
should. Look carefully at this example:
> str(data2)
'data.frame': 300 obs. of 5 variables:
$ x1 : num 0.0592 0.3976 0.9512 0.675 0.7129 ...
$ x2 : num 0.625 0.328 0.721 0.779 0.233 ...
$ y : num 0.685 0.694 1.589 1.461 0.921 ...
$ grp: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
$ one: num 1 1 1 1 1 1 1 1 1 1 ...
> str(data2[,-data2$one])
'data.frame': 300 obs. of 4 variables:
$ x2 : num 0.625 0.328 0.721 0.779 0.233 ...
$ y : num 0.685 0.694 1.589 1.461 0.921 ...
$ grp: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
$ one: num 1 1 1 1 1 1 1 1 1 1 ...
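Notice that the "one" column was _not_ removed. Column x1 was dropped
instead: -data2$one is a vector of -1's, which means "drop column 1".
Your code only appeared to work because RESPONDED happens to be the
first column of datastep4.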
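The same behaviour is easy to reproduce on a tiny made-up data frame
(`toy` and its column names are invented for illustration, they are not
your data). Negating a 0/1 column gives only 0's and -1's, so the column
index means "drop column 1" no matter which column was named:

```r
# Hypothetical toy data; the names are made up for illustration.
toy <- data.frame(a = c(10, 20, 30),
                  b = c("x", "y", "z"),
                  flag = c(0, 1, 0))

idx <- -toy$flag          # 0 -1 0: zeros are ignored, -1 means "drop column 1"
dropped <- toy[, idx]

names(dropped)            # "b" "flag": column "a" went away, "flag" stayed
```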
>> dim(test)
>
> [1] 211484 94
>
>> names(test)[1:5]
>
> [1] "VAR_30" "VAR_31" "VAR_32" "VAR_33" "VAR_34"
>
>> class(test)
>
> [1] "data.frame"
>
>> test[1:10,1:10]
>
> VAR_30 VAR_31 VAR_32 VAR_33 VAR_34 VAR_37 VAR_38 VAR_42 VAR_45
> VAR_46
>
> 1 0 0 0 0 15198 0 0 6 NA
>
> 3 0 0 0 0 8491 0 0 4 NA
>
> 4 0 0 0 0 0 0 0 0 NA
>
> 5 0 0 0 0 67671 0 0 7 NA
>
> 7 0 0 0 0 1334 0 0 1 NA
>
> 9 0 0 0 0 0 0 0 2 NA
>
> 10 0 0 0 0 24169 0 0 10 NA
>
> 11 0 0 0 0 438 0 0 3 NA
>
> 12 0 0 0 0 2158 0 0 1 NA
>
> 13 0 0 0 0 18804 0 0 4 NA
>
>>
>
> If I reduce the data frame datastep4 by removing a few records where
> the
> variable G102 is not 1, and removing the column named "G102" (which is
> column 84),
>
> I end up with a smaller set called datastep5 with 192701 rows and 94
> columns
>
>> datastep5 <- datastep4[datastep4$G102 != 1,-84]
>
This code does the _opposite_ of what you stated: it keeps only those
records where G102 is not equal to 1. (And if that is not an integer
type column, the result of an exact comparison could be unreliable as
well.)
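A small sketch of the difference, using an invented data frame (`d` and
its values are assumptions for illustration only). It also shows a side
effect worth knowing: a comparison against NA yields NA, and a logical
row index containing NA returns a row filled with NA.

```r
# Hypothetical data; this G102 is made up and includes a missing value.
d <- data.frame(G102 = c(1, 2, NA, 1), RESPONDED = c(0, 1, 0, 0))

not_one <- d[d$G102 != 1, ]   # what the posted code selects, plus an all-NA row
is_one  <- d[d$G102 == 1, ]   # records where G102 is 1, plus an all-NA row

# which() discards the NA comparisons, so only real matches are kept
is_one_clean <- d[which(d$G102 == 1), ]

nrow(not_one)        # 2: the row with G102 == 2 and one NA row
nrow(is_one_clean)   # 2
```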
>>
>
>> dim(datastep5)
>
> [1] 192701 94
>
>> names(datastep5)[1:5]
>
> [1] "RESPONDED" "VAR_30" "VAR_31" "VAR_32" "VAR_33"
>
>> table(datastep5$RESPONDED)
>
> 0 1
>
> 141096 584
>
>
> Now, if I want to reduce this data set by removing the RESPONDED
> column
> as was done for datastep4, it blows up
>
>> test <- datastep5[,-datastep5$RESPONDED]
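I am guessing that datastep5$RESPONDED now contains NAs: the table
above accounts for only 141680 of the 192701 rows, and an NA cannot be
mixed with negative subscripts. Either way, you are abusing the
indexing conventions. Try instead either: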
test <- datastep5[,-1]
Or if you want to imagine that you cannot remember the column number
of "RESPONDED" then this will "work":
test <- datastep5[ , -which(names(datastep5)=="RESPONDED")]
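Two other name-based variants, sketched on an invented data frame (`d`
is hypothetical; only the column names are borrowed from the post). They
may be preferable to `-which(...)`, which gives an empty index when the
name is not found and therefore returns no columns at all.

```r
# Hypothetical data for illustration.
d <- data.frame(RESPONDED = c(0, 1, 0), VAR_30 = 1:3, VAR_31 = 4:6)

# Keep every column whose name is not "RESPONDED"
test1 <- d[, names(d) != "RESPONDED"]
test2 <- d[, setdiff(names(d), "RESPONDED")]

names(test1)   # "VAR_30" "VAR_31"

# Pitfall: with a name that does not exist, -which() is integer(0)
# and the result has no columns at all.
oops <- d[, -which(names(d) == "NO_SUCH_COLUMN")]
ncol(oops)     # 0
```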
>
> Error in .subset(x, j) : only 0's may be mixed with negative
> subscripts
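The error itself can be reproduced on invented data (`d` is
hypothetical): an NA among otherwise zero or negative subscripts is
enough to trigger it. Note that table() leaves NAs out by default, which
is why its counts need not add up to the number of rows.

```r
# Hypothetical data with one missing value in the 0/1 column.
d <- data.frame(RESPONDED = c(0, 1, NA, 0), VAR_30 = 1:4)

table(d$RESPONDED)                    # NA is not counted by default
table(d$RESPONDED, useNA = "ifany")   # shows the NA as well

# c(0, -1, NA, 0) as a column index: the NA makes R signal an error
res <- tryCatch(d[, -d$RESPONDED], error = function(e) conditionMessage(e))
res   # the error message, returned as a character string
```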
> Merrick Bank confidentiality trailer elided
David Winsemius, MD
Heritage Laboratories
West Hartford, CT