[R] column-wise deletion in data-frames

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jul 18 18:05:17 CEST 2005


On Mon, 18 Jul 2005, Peter Dalgaard wrote:

> Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>
>> On Mon, 18 Jul 2005, Peter Dalgaard wrote:
>>
>>> Chuck Cleland <ccleland at optonline.net> writes:
>>>
>>>>> data <- as.data.frame(cbind(X1,X2,X3,X4,X5))
>>>>>
>>>>> So only X1, X3 and X5 are vars without any NAs and there are some vars (X2 and
>>>>> X4) stacked in between that have NAs. Now, how can I extract those former vars
>>>>> in a new dataset or remove all those latter vars in between that have NAs
>>>>> (without missing a single row)?
>>>>> ...
>>>>
>>>>    Someone else will probably suggest something more elegant, but how
>>>> about this:
>>>>
>>>> newdata <- data[,-which(apply(data, 2, function(x){all(is.na(x))}))]
>>>
>>> (I think that's supposed to be any(), not all(), and which() is
>>> crossing the creek to fetch water.)
>>>
>>> This should do it:
>>>
>>> data[,apply(!is.na(data),2,all)]
>>
>> If `data' is a data frame, apply will coerce it to a matrix.
>
> So will is.na()...

Not quite.  is.na on a data frame will create a matrix by cbind-ing
columns.  I was mainly commenting on Chuck Cleland's version, which
coerces a data frame to a matrix and then pulls out each column of the
matrix, something that is quite wasteful of space.  Forming the logical
matrix is.na(data) is also, I think, wasteful.
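
As a rough illustration of the intermediates involved, here is a small
sketch.  The data frame `df` and its contents are invented for the example
(they are not the original poster's data), and vapply() is used only as a
stricter stand-in for sapply().

```r
# Hypothetical example data: X2 and X4 contain NAs, as in the question.
df <- data.frame(X1 = 1:4,
                 X2 = c(1, NA, 3, 4),
                 X3 = letters[1:4],
                 X4 = c(NA, NA, 2.5, 1),
                 X5 = c(TRUE, FALSE, TRUE, TRUE))

# is.na() on a data frame returns a logical matrix of the same dimensions.
na_mat <- is.na(df)
is.matrix(na_mat)        # TRUE
typeof(na_mat)           # "logical"

# apply() converts a data frame with as.matrix() first; with mixed column
# types the result is a character matrix.
typeof(as.matrix(df))    # "character"

# Working over the columns as list elements needs only one logical
# vector per column at any time.
keep <- vapply(df, function(x) !any(is.na(x)), logical(1))
names(df)[keep]          # "X1" "X3" "X5"
```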

>> I would do
>> something like
>>
>> keep <- sapply(data, function(x) all(!is.na(x)))
>> data[keep]
>>
>> to use the list-like structure of a data frame and make the fewest
>> possible copies.
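
A side effect of the list-style indexing is worth a quick sketch.  The data
below are made up for illustration; the point is only that `df[keep]` and
`df[, keep]` can differ when a single column survives.

```r
# Hypothetical data: only X1 is free of NAs.
df <- data.frame(X1 = 1:3,
                 X2 = c(NA, 2, 3),
                 X3 = c("a", NA, "c"))

keep <- sapply(df, function(x) all(!is.na(x)))

list_style   <- df[keep]     # list-style indexing: stays a data frame
matrix_style <- df[, keep]   # matrix-style indexing: drops to a vector here

is.data.frame(list_style)    # TRUE
is.data.frame(matrix_style)  # FALSE
```

With matrix-style indexing, `drop = FALSE` would be needed to keep a
one-column result as a data frame.
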
>
> I think the amount of copying is the same, but your version doesn't
> need to store the entire is.na(data) at once.
>
> Nitpick: !any(is.na(x)) should be marginally faster than all(!is.na(x)).

I doubt it is measurably so.
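
For anyone who wants to check on their own machine, a minimal sketch
follows.  The vector length, the repeat count and the helper names `f_any`
and `f_all` are arbitrary choices for the example, and the timings will
vary with hardware and R version.

```r
set.seed(1)
x    <- rnorm(1e6)           # hypothetical test vector with no NAs
x_na <- x
x_na[5e5] <- NA              # same vector with a single NA

f_any <- function(v) !any(is.na(v))
f_all <- function(v) all(!is.na(v))

# The two forms agree on both inputs.
identical(f_any(x),    f_all(x))
identical(f_any(x_na), f_all(x_na))

# Rough timings over repeated calls.
system.time(for (i in 1:20) f_any(x))
system.time(for (i in 1:20) f_all(x))
```

In versions of R from 3.1.0 onwards, anyNA(x) is another option that avoids
forming is.na(x) in full.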

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



