[R] dataframe subsetting behaviour

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Thu Jan 23 00:15:04 CET 2003


Douglas Grove <dgrove at fhcrc.org> writes:

> Hi,
> 
> I'm trying to understand a behaviour that I have encountered
> and can't fathom.
> 
> 
> Here's some code I will use to illustrate the behaviour:
> 
> # start with some data frame "a" having some named columns
> a <- data.frame(a=rep(1,3),c=rep(2,3),d=rep(3,3),e=rep(4,3))
> 
> # create a subset of the original data frame, but include a
> # name "b" that is not present in my original data frame
> b <- a[,c("a","b","c")]
> 
> 
> ## Up until now no errors are issued, but the following commands
> ## will give the error shown:
> 
> b[1,]     ## "Error in x[[j]] : subscript out of bounds"
> b[1,2]    ## "Error in "names<-.default"(*tmp*, value = cols) : 
>           ##  names attribute must be the same length as the vector"
> 
> 
> Can anyone explain to me the meaning of these error messages in terms
> of what R is actually doing?  These error messages had me baffled and 
> it took me hours to track down that the source of the error was an 
> incorrect column name in my data frame subsetting.

Looks like a (semi-)bug. Indexing outside of the data frame creates a
"column" which is really the single value NULL, e.g. 

> dput(a[,4:5])
structure(list(e = c(4, 4, 4), "NA" = NULL), .Names = c("e",
NA), row.names = c("1", "2", "3"), class = "data.frame")

This will print because the format.data.frame called inside
print.data.frame will recycle the NULL and give you

> a[,4:5]
  e   NA
1 4 NULL
2 4 NULL
3 4 NULL

However, it confuses the h*ck out of "[.data.frame"

> (a[,4:5])[2]
Error in "[.data.frame"((a[, 4:5]), 2) : undefined columns selected
> (a[,4:5])[,2]
NULL
> (a[,4:5])[,1]
[1] 4 4 4

and likewise the examples you found. The main issue, however, is that
you have managed to construct a corrupt data frame. So indexing outside
the data frame should probably either give an error or return a column
of NA.
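
In the meantime, one way to catch a mistyped column name at the point
of subsetting is to check the requested names against names() first.
This is only a minimal sketch: select_cols is a hypothetical helper
made up for illustration, not an existing R function.

```r
# Hypothetical helper (not part of R): stop early and name the
# columns that are absent, instead of building a corrupt data frame.
select_cols <- function(df, cols) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop("columns not in data frame: ", paste(absent, collapse = ", "))
  }
  # drop = FALSE keeps the result a data frame even for one column
  df[, cols, drop = FALSE]
}

a <- data.frame(a = rep(1, 3), c = rep(2, 3), d = rep(3, 3), e = rep(4, 3))

ok <- select_cols(a, c("a", "c"))        # both columns exist
bad <- try(select_cols(a, c("a", "b", "c")), silent = TRUE)  # "b" is absent
```

Here "ok" is an ordinary 3 x 2 data frame, while "bad" holds a
try-error whose message names the offending column "b".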

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



