[Rd] read.table() and NULL for colClasses

Henrik Bengtsson hb at maths.lth.se
Wed Jul 28 21:11:29 CEST 2004


Hi,

is there a reason for not supporting NULL or "NULL" values for argument
colClasses in read.table(), much as NULL values can be used for argument
'what' in scan()? This would help quite a bit when reading large data files
where only a few columns are of interest.
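For reference, this is the scan() behaviour the request refers to: a NULL component in the 'what' list makes scan() skip that field, while a NULL placeholder is kept at that position in the returned list. This is a minimal sketch. The text= argument is used only for convenience. It exists in current R, but older versions such as 1.9.1 may need a file or textConnection() instead.

```r
# Three whitespace-separated records; only fields 2 and 3 are wanted.
txt <- "a 1 x\nb 2 y\nc 3 z"

# A NULL component in 'what' tells scan() to skip that field;
# the result still contains a NULL placeholder in that position.
res <- scan(text = txt, what = list(NULL, 0, ""), quiet = TRUE)
str(res)
```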

I've modified read.table() so that it also calls scan(what=...) with NULLs
for the fields to be skipped. Here's the diff of readtable.R (from
R-1.9.1.tgz; 9,591,217 bytes):

diff readtable.new.R readtable.R
117,123d116
<     # Skip NULL columns in scan()
<     void <- sapply(colClasses, FUN=identical, "NULL") |
<             sapply(colClasses, FUN=is.null)
<     # If all (data) columns are NULL, return empty data frame.
<     if (sum(!void) <= 1*rlabp)
<       return(data.frame())
<     what[void] <- list(NULL)
131c124
<     nlines <- length(data[[which(!void)[1]]])
---
>     nlines <- length(data[[1]])
161c154
<     for (i in (1:cols)[!known & !void]) {
---
>     for (i in 1:cols) {
171,178d163
<     # Skipped row names equals row.names=NULL.
<     if (rlabp) {
<       if (void[1]) {
<         row.names <- NULL
<         data <- data[-1]
<       }
<       void <- void[-1]
<     }
201,202d185
<     # Remove NULL columns
<     data[void] <- NULL
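To illustrate the idea behind the patch outside read.table(), here is a minimal sketch built directly on scan(). The helper read_selected() is hypothetical and not part of R. It assumes whitespace-separated text with no header and no row names. It reads kept fields as character and then converts them with methods::as(). It also omits the all-NULL and row-name cases that the patch handles separately.

```r
# Hypothetical helper (not part of R): mimics the proposed read.table()
# behaviour by passing NULL in scan(what=) for columns marked "NULL".
read_selected <- function(text, colClasses) {
  colClasses <- as.list(colClasses)
  # TRUE for entries given as "NULL" or as NULL inside a list
  void <- vapply(colClasses,
                 function(cl) is.null(cl) || identical(cl, "NULL"),
                 logical(1))
  # Template for scan(): character for kept fields, NULL for skipped ones
  what <- rep(list(""), length(colClasses))
  what[void] <- list(NULL)
  data <- scan(text = text, what = what, quiet = TRUE)
  # Convert only the kept columns, using their original indices
  for (i in which(!void)) {
    data[[i]] <- methods::as(data[[i]], colClasses[[i]])
  }
  # Drop the NULL placeholders left by scan(), as the patch does
  data <- data[!void]
  names(data) <- paste0("V", which(!void))
  as.data.frame(data, stringsAsFactors = FALSE)
}
```

For example, read_selected("a 1 x\nb 2 y", c("NULL", "numeric", "character")) gives a data frame with only the columns V2 (numeric) and V3 (character). The skipped field is never stored as a character vector.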

and a diff for read.table.Rd:

diff read.table.new.Rd read.table.Rd
102,104c102
<     \code{NA} when \code{\link{type.convert}} is used.  Columns for
<     which the value is \code{"NULL"} (or \code{NULL} in a list) are
<     skipped. NB: \code{as} is
---
>     \code{NA} when \code{\link{type.convert}} is used.  NB: \code{as} is
181,183c179
<   the five atomic vector classes. Skipping columns with \code{"NULL"}
<   (or \code{NULL}) will also require less memory.
<
---
>   the five atomic vector classes.

Note that setting a colClasses entry to "NULL" already has an effect, which I
assume is unintentional. During the data conversion, which happens *after*
scan() has read the data anyway, "NULL" will remove a column via
as(x, "NULL"), but unfortunately the wrong column. If the above modifications
are not wanted, could there at least be a warning for the latter?
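The following sketch shows one way such a misalignment can arise. It is not a trace of the actual R-1.9.1 code. Assigning NULL to a list element with [[<- deletes that element immediately, so a loop that keeps using the original column indices is off by one for every later column. The data and class vector are made up, and the sketch assumes the conversion yields NULL, as described above. Collecting a logical mask and dropping the columns after the loop, as the patch does with data[void] <- NULL, avoids the shift.

```r
# Made-up scanned columns (all character) and requested classes.
scanned <- list(V1 = c("a", "b"), V2 = c("1", "2"), V3 = c("3", "4"))
colClasses <- c("NULL", "numeric", "NULL")

# Naive loop: naive[[i]] <- NULL deletes element i at once, so later
# columns shift one position left of their entry in colClasses.
naive <- scanned
for (i in seq_along(colClasses)) {
  if (i > length(naive)) break  # list is now shorter than colClasses
  if (colClasses[i] == "NULL") {
    naive[[i]] <- NULL
  } else {
    naive[[i]] <- methods::as(naive[[i]], colClasses[i])
  }
}
# Result: V2 is never converted, and V3 (meant to be skipped) survives
# with V2's "numeric" class.

# Mask approach: convert by original index, drop all NULL columns at the end.
void <- colClasses == "NULL"
fixed <- scanned
for (i in which(!void)) fixed[[i]] <- methods::as(fixed[[i]], colClasses[i])
fixed[void] <- NULL
```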

Best wishes

Henrik Bengtsson

Dept. of Mathematical Statistics @ Centre for Mathematical Sciences 
Lund Institute of Technology/Lund University, Sweden (+2h UTC)
+46 46 2229611 (off), +46 708 909208 (cell), +46 46 2224623 (fax)
h b @ m a t h s . l t h . s e, http://www.maths.lth.se/~hb/
