[R] How do I delete multiple blank variables from a data frame?

Allan Engelhardt allane at cybaea.com
Sat Mar 19 09:36:43 CET 2011



On 19/03/11 01:35, Joshua Wiley wrote:
> Hi Rita,
>
> This is far from the most efficient or elegant way, but:
>
> ## two column data frame, one all NAs
> d <- data.frame(1:10, NA)
> ## use apply to create logical vector and subset d
> d[, apply(d, 2, function(x) !all(is.na(x)))]

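This works, but apply() first converts d to a matrix, which is not needed.
If performance is an issue, try sapply(), which works on the columns
directly:

d[, sapply(d, function(x) !all(is.na(x))), drop = FALSE]

On your test data frame d, apply() is about 3x slower, and the gap grows
for larger data frames. The drop = FALSE keeps the result a data frame
even when only one column survives, as happens with d.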
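
For the original question about 55 data frames, the same test can be
wrapped in a function and applied to a list with lapply() instead of an
explicit loop. This is only a sketch: drop_blank_cols and the two small
frames are made up for illustration, and it assumes the blank variables
were read in as NA (not as empty strings).

```r
## Hypothetical helper: keep only columns with at least one non-NA value
drop_blank_cols <- function(df) {
    keep <- vapply(df, function(x) !all(is.na(x)), logical(1))
    ## drop = FALSE keeps a data frame even if a single column is left
    df[, keep, drop = FALSE]
}

## Two small stand-ins for the 55 real data frames
frames <- list(
    a = data.frame(P1 = 1:3, P2 = NA, Q1 = c(NA, 2, 3)),
    b = data.frame(P1 = NA, P2 = 4:6, Q1 = NA)
)

cleaned <- lapply(frames, drop_blank_cols)
names(cleaned$a)  # "P1" "Q1"
names(cleaned$b)  # "P2"
```

Each element of cleaned keeps its own set of columns, so the blanks do
not have to be the same in every data frame.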

For the related problem of removing columns of constant-or-na values, 
the best I could come up with is

zv.1 <- function(x) {
     ## The literal approach
     y <- var(x, na.rm = TRUE)
     return(is.na(y) || y == 0)
}
sapply(train, zv.1)

See 
http://www.cybaea.net/Blogs/Data/R-Eliminating-observed-values-with-zero-variance.html 
for the benchmarks.
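
To actually remove the flagged columns, negate the logical vector and
subset. A small sketch follows; the train data frame here is invented
for illustration, and it assumes every column is numeric, since var()
is not meant for factors or character columns.

```r
zv.1 <- function(x) {
    ## TRUE when the column is constant or entirely NA
    y <- var(x, na.rm = TRUE)
    is.na(y) || y == 0
}

## Made-up stand-in for train: one varying column, two constant, one all NA
train <- data.frame(a = c(1, 2, 3),
                    b = c(5, 5, 5),
                    c = c(NA, 7, 7),
                    d = NA_real_)

constant <- sapply(train, zv.1)
kept <- train[, !constant, drop = FALSE]
names(kept)  # "a"
```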

Allan


> I am just apply()ing to each column (the 2) of d, the function
> !all(is.na(x)) which will return FALSE if all of x is missing and TRUE
> otherwise.  The result is a logical vector the same length as the
> number of columns in d that is used to subset only the d columns with
> at least some non-missing values.  For documentation see:
>
> ?apply
> ?is.na
> ?all
> ?"["
> ?Logic
>
> HTH,
>
> Josh
>
> On Fri, Mar 18, 2011 at 3:35 PM, Rita Carreira<ritacarreira at hotmail.com>  wrote:
>> Dear List Members,
>>
>> I have 55 data frames, each of which with 272 variables and 267 observations. Some of these variables are blanks but the blanks are not the same for every data frame. I would like to write a procedure in which I import a data frame, see which variables are blank, and delete those variables. My data frames have variables named P1 to P136 and Q1 to Q136.
>>
>> I have a couple of questions regarding this issue:
>>
>> 1) Is a loop an efficient way to address this problem? If not, what are my alternatives and how do I implement them?
>> 2) I have been playing with a single data frame to try to figure out a way of having R go through the columns and see which ones it should delete. I have figured out how to delete rows with missing data (newdata <- na.omit(olddata)) but how do I do it for columns???
>>
>> Thank you very much for your help and have a great weekend!
>>
>> Rita
>> ________________________________________
>> "If you think education is expensive, try ignorance"--Derek Bok


