[R] deleting columns from a dataframe where NA is more than 15 percent of the column length
Faz Jones
jonesfaz4 at gmail.com
Mon Aug 6 09:18:38 CEST 2012
Thank you. That was very informative and helpful. It works.
On Aug 5, 2012, at 10:21 PM, arun <smartpink111 at yahoo.com> wrote:
> Hi,
>
> Try this:
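> # Example data: x has 3 of 9 values NA, y has 1 of 9, z has 2 of 9
> dat1 <- data.frame(x = c(NA, NA, rnorm(6, 15), NA), y = c(NA, rnorm(8, 15)), z = c(rnorm(7, 15), NA, NA))
> # Keep only the columns whose fraction of NA is at most 15%
> dat1[which(colMeans(is.na(dat1)) <= .15)]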
>          y
> 1       NA
> 2 13.53085
> 3 12.89453
> 4 15.02625
> 5 14.00387
> 6 15.34618
> 7 15.69293
> 8 15.62377
> 9 14.76479
>
> # You can also use apply(), sapply(), etc.
> dat2 <- data.frame(x = c(NA, NA, rnorm(6, 15), NA), y = c(NA, rnorm(8, 15)), z = c(rnorm(7, 15), NA, NA), u = rnorm(9, 15))
> dat2[apply(dat2, 2, function(x) mean(is.na(x)) <= .15)]
>
> # Equivalent alternatives:
> # dat2[sapply(dat2, function(x) mean(is.na(x)) <= .15)]
> # dat2[which(colMeans(is.na(dat2)) <= .15)]
>
>          y        u
> 1       NA 14.56278
> 2 16.49940 16.25761
> 3 14.11368 14.08768
> 4 14.95139 14.01923
> 5 14.99517 15.91936
> 6 14.46359 14.07573
> 7 15.09702 13.94888
> 8 15.99967 14.97171
> 9 15.51924 15.59981
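
The three variants quoted above (colMeans, sapply, apply) should select the same columns. Here is a minimal sketch that checks this on data shaped like dat2. The fixed seed and the names dat, na_share and keep_* are assumptions added for illustration and are not part of the original reply.

```r
# Sketch: confirm that the three variants pick the same columns.
set.seed(1)  # assumption: fixed seed, only so the example is reproducible
dat <- data.frame(
  x = c(NA, NA, rnorm(6, 15), NA),  # 3 of 9 values missing
  y = c(NA, rnorm(8, 15)),          # 1 of 9 values missing
  z = c(rnorm(7, 15), NA, NA),      # 2 of 9 values missing
  u = rnorm(9, 15)                  # no missing values
)

na_share <- colMeans(is.na(dat))    # fraction of NA in each column

# Names of the columns each variant would keep (NA share <= 15%)
keep_colmeans <- names(dat)[na_share <= 0.15]
keep_sapply   <- names(dat)[sapply(dat, function(col) mean(is.na(col)) <= 0.15)]
keep_apply    <- names(dat)[apply(dat, 2, function(col) mean(is.na(col)) <= 0.15)]

print(round(na_share, 3))
print(keep_colmeans)
```

Note that apply() first converts the data frame to a matrix, while sapply() works column by column, so sapply() or colMeans(is.na(...)) is probably the more natural choice for a data frame.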
>
> A.K.
>
> ----- Original Message -----
> From: Faz Jones <jonesfaz4 at gmail.com>
> To: r-help at r-project.org
> Cc:
> Sent: Sunday, August 5, 2012 9:04 PM
> Subject: [R] deleting columns from a dataframe where NA is more than 15 percent of the column length
>
> I have a data frame with 10 columns, all of the same length. I want to
> drop any column in which more than 15% of the values are NA. Do I first
> need to write a function that calculates the percentage of NA in each
> column, and then build another data frame by applying that function?
> What is the best way to do this?
>
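
On the question of whether a separate function is needed: it is not required, but wrapping the filter in one can make the threshold explicit. Here is a minimal sketch, in which the name drop_na_cols, the argument max_na and the sample data example_df are all hypothetical and do not come from the thread.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# Assumes df has at least one row, so that the NA fractions are well defined.
drop_na_cols <- function(df, max_na = 0.15) {
  stopifnot(is.data.frame(df), max_na >= 0, max_na <= 1)
  na_share <- colMeans(is.na(df))          # fraction of NA in each column
  df[, na_share <= max_na, drop = FALSE]   # drop = FALSE keeps a data frame
}

# Small hand-made example: 'a' is 50% NA, 'b' is 0% NA, 'c' is 10% NA.
example_df <- data.frame(
  a = c(rep(NA, 5), 1:5),
  b = 1:10,
  c = c(NA, 2:10)
)

cleaned <- drop_na_cols(example_df)
print(names(cleaned))
```

With the default threshold this should keep columns b and c. Calling drop_na_cols(example_df, max_na = 0) would keep only columns with no NA at all.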