[Rd] A couple of issues with colClasses/setAs

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Sep 8 11:31:53 CEST 2004


From ?read.table (this is about read.table, despite the subject line, I 
believe?)

colClasses: character.  A vector of classes to be assumed for the columns.

"NULL" is not a class in my book (and certainly not one a column can
have).  So no wonder it does not work, and failing in an undocumented case
is not a bug.

We can look into making it work, but once you start skipping columns I 
think you should be using scan().  (I also suspect scan did not accept 
NULL when this was implemented.)
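
A minimal sketch of the scan() route, assuming a current R in which a NULL
component of 'what' causes that field to be skipped.  The temporary file
stands in for test.dat from the quoted message, and the names path, fields
and V2 are only for illustration:

```r
# Recreate the two-line test file in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c("1 a", "2 b"), path)

# One 'what' component per field: NULL skips the ID, "" reads a character field.
fields <- scan(path, what = list(NULL, ""), quiet = TRUE)

# Convert the field that was kept to a factor by hand.
V2 <- factor(fields[[2]])
```

The result is a bare factor rather than a data frame, which is all that is
wanted once the ID column has been dropped.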

On 8 Sep 2004, Peter Dalgaard wrote:

> Consider this:
> 
> $ cat test.dat
> 1 a
> 2 b
> 
> Now, we want to read the 2nd column as a factor and ignore the first
> (since it's just a sequential ID). 

Well, a data frame has to have row names anyway, so that's not actually an
advantage.

> We can't just put "factor" among
> the colClasses (would have been nice), so let's try this instead
> 
> > setAs("character","factor",as.factor)
> Arguments in definition changed from (x) to (from)
> > read.table("test.dat",colClasses=c("numeric","factor"))
> Error in inherits(x, "factor") : Object "x" not found
> 
> which is a bit peculiar: Why does it change the argument when that's
> going to create a function that doesn't work?? You do need to spell it
> out:
> 
> > setAs("character","factor",function(from)as.factor(from))
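
The failure mode can be reproduced without setAs at all.  This is only a
sketch of the mechanism the messages above suggest (the formal argument is
renamed while the body still refers to x), not a description of what setAs
does internally.  to_factor and renamed are hypothetical names:

```r
# Hypothetical stand-in for as.factor: a one-argument function whose body uses x.
to_factor <- function(x) factor(x)

# Rename the formal argument only, leaving the body untouched.
renamed <- to_factor
formals(renamed) <- alist(from = )

# The body still asks for x, which no longer exists as an argument.
result <- tryCatch(renamed("a"), error = function(e) conditionMessage(e))

# Spelling the function out with 'from' avoids the mismatch.
spelled_out <- function(from) factor(from)
ok <- spelled_out("a")
```

Here result holds the error message rather than a factor, while ok is a
factor as intended.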

> And now we get somewhere
> 
> > read.table("test.dat",colClasses=c("numeric","factor"))
>   V1 V2
> 1  1  a
> 2  2  b

Might be a good idea to teach colClasses about "factor".
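
For comparison, a sketch assuming a reasonably recent R, where ?read.table
lists "factor" among the accepted colClasses values so that no setAs call
is needed.  The temporary file again stands in for test.dat:

```r
# Recreate the two-line test file in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c("1 a", "2 b"), path)

# Ask for the second column as a factor directly.
d <- read.table(path, colClasses = c("numeric", "factor"))
```

Whether this works depends on the R version in use, so check ?read.table
locally before relying on it.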

> 
> but suppose we want to get rid of col.1:
> 
> > read.table("test.dat",colClasses=c("NULL","factor"))
> Error in data[[i]] : subscript out of bounds
> 
> which looks like a pretty clear bug. In contrast, this works fine
> 
> > read.table("test.dat",colClasses=c("NULL","character"))
>   V2
> 1  a
> 2  b
> 
> so the issue only arises when you have nontrivial coercions.
> 
> Presumably, the issue is that the colClasses in those cases
> miscalculate indices by forgetting the columns that were skipped.
> 
> 
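
Since the "NULL" plus "character" combination is reported above to work,
one workaround that does not depend on the coercion step is to read the
kept column as character and convert it afterwards.  A sketch, with a
temporary file standing in for test.dat:

```r
# Recreate the two-line test file in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c("1 a", "2 b"), path)

# Skip the ID column and read the second column as plain character.
d <- read.table(path, colClasses = c("NULL", "character"))

# Convert to a factor after reading, outside read.table.
d$V2 <- factor(d$V2)
```

This keeps the column skipping and the coercion as two separate steps, so
any index bookkeeping inside read.table is not involved in the conversion.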

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


