[R] column-wise deletion in data-frames

Mon Jul 18 16:43:38 CEST 2005

On Mon, 18 Jul 2005, Peter Dalgaard wrote:

> Chuck Cleland <ccleland at optonline.net> writes:
>
>>> data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
>>>
>>> So only X1, X3 and X5 are vars without any NAs and there are some vars (X2 and
>>> X4 stacked in between that have NAs). Now, how can I extract those former vars
>>> in a new dataset or remove all those latter vars in between that have NAs
>>> (without missing a single row)?
>>> ...
>>
>>    Someone else will probably suggest something more elegant, but how
>> about this:
>>
>> newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]
>
> (I think that's supposed to be any(), not all(), and which() is
> crossing the creek to fetch water.)
>
> This should do it:
>
> data[,apply(!is.na(data),2,all)]

If `data' is a data frame, apply() will first coerce it to a matrix,
which means copying the whole object.  I would do something like

keep <- sapply(data, function(x) all(!is.na(x)))
data[keep]

to use the list-like structure of a data frame and make the fewest 
possible copies.
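
For anyone who wants to try this, here is a small self-contained sketch.
The example data frame below is made up for illustration (it is not the
original poster's data); only the sapply() idiom comes from the
suggestion above.

```r
# Hypothetical example data: X2 and X4 contain NAs, the others do not.
data <- data.frame(
  X1 = c(1, 2, 3),
  X2 = c(NA, 5, 6),
  X3 = c("a", "b", "c"),
  X4 = c(7, NA, NA),
  X5 = c(TRUE, FALSE, TRUE)
)

# sapply() walks the data frame as a list of columns and returns one
# logical per column: TRUE when that column has no missing values.
keep <- sapply(data, function(x) all(!is.na(x)))

# Single-bracket list-style indexing keeps the selected columns and
# every row; the result is still a data frame.
newdata <- data[keep]

print(names(newdata))
print(nrow(newdata) == nrow(data))
```

Because no matrix is built along the way, the columns that are kept
should retain their own types (numeric, character, logical) rather
than being coerced to a common one.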

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595