[Rd] read.table() and NULL for colClasses
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jul 28 22:12:56 CEST 2004
NULL is not a valid value for colClasses and I don't see why you thought
it was. colClasses has to be character according to the documentation, so
"NULL" is allowed but not NULL.
Your diff appears to be backwards for a patch (the new file is listed first). A patch against the
current R-devel sources is what is needed, including some regression
tests.
On Wed, 28 Jul 2004, Henrik Bengtsson wrote:
> Hi,
>
> is there a reason for not supporting NULL or "NULL" values for argument
> colClasses in read.table(), much like you can use NULL values for argument
> 'what' in scan()? This would help quite a bit when reading large data files
> where only a few columns are of interest.
Is that a common enough case to make this worth the code complication,
given that scan() (or better, a DBMS) can be used? The usual reason is
that R is maintained by a small and overworked team and adding
complications needs justification, not not adding them.
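The scan() route suggested above can be sketched as follows. This is a minimal illustration, assuming a recent R (it uses the `text` argument of scan()) and made-up three-column data where only the first and third columns are wanted. A NULL element of `what` makes scan() skip that field, and the resulting NULL component is dropped before building the data frame.

```r
# Made-up comma-separated input: name, an unwanted integer column, a value.
txt <- c("a,1,2.5", "b,2,3.5", "c,3,4.5")

# NULL in 'what' tells scan() to skip the second field entirely.
fields <- scan(text = txt, what = list(name = "", NULL, value = 0),
               sep = ",", quiet = TRUE)

# Skipped fields leave NULL components; remove them, then build the data frame.
fields <- fields[!vapply(fields, is.null, logical(1))]
df <- as.data.frame(fields, stringsAsFactors = FALSE)
```

The skipped column is never stored as data, which is where the memory saving for large files comes from.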
> I've modified read.table() so that it also calls scan(what=...) with NULLs for
> the fields to be skipped. Here's the diff of readtable.R (from
> R-1.9.1.tgz; 9,591,217 bytes):
>
> diff readtable.new.R readtable.R
> 117,123d116
> < # Skip NULL columns in scan()
> < void <- sapply(colClasses, FUN=identical, "NULL") |
> < sapply(colClasses, FUN=is.null)
> < # If all (data) columns are NULL, return empty data frame.
> < if (sum(!void) <= 1*rlabp)
> < return(data.frame())
> < what[void] <- list(NULL)
> 131c124
> < nlines <- length(data[[which(!void)[1]]])
> ---
> > nlines <- length(data[[1]])
> 161c154
> < for (i in (1:cols)[!known & !void]) {
> ---
> > for (i in 1:cols) {
> 171,178d163
> < # Skipped row names equals row.names=NULL.
> < if (rlabp) {
> < if (void[1]) {
> < row.names <- NULL
> < data <- data[-1]
> < }
> < void <- void[-1]
> < }
> 201,202d185
> < # Remove NULL columns
> < data[void] <- NULL
>
> and a diff for read.table.Rd:
>
> diff read.table.new.Rd read.table.Rd
> 102,104c102
> < \code{NA} when \code{\link{type.convert}} is used. Columns for
> < which the value is \code{"NULL"} (or \code{NULL} in a list) are
> < skipped. NB: \code{as} is
> ---
> > \code{NA} when \code{\link{type.convert}} is used. NB: \code{as} is
> 181,183c179
> < the five atomic vector classes. Skipping columns with \code{"NULL"}
> < (or \code{NULL}) will also require less memory.
> <
> ---
> > the five atomic vector classes.
>
> Note that setting a colClasses entry to "NULL" already has an effect, which I
> assume is unintentional. In the data conversion, which happens *after* scan()
> has read the data anyway, "NULL" will drop a column via as(x, "NULL"), but
> unfortunately the wrong column. If the above modifications are not accepted,
> maybe a warning could be given for the latter?
That's not usage as documented so the effect is definitely unintentional.
We can't catch all misuses!
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595