[R] Identifying and Removing NA Columns and factor Columns with more than x Levels
arun
smartpink111 at yahoo.com
Thu Aug 30 20:25:58 CEST 2012
Hi,
For the NA part of both questions (1 and 3), try this:
dat1<-data.frame(Temp=c(5,10,9,15,NA,14,25,21,24,23,21,24,35,35,36,34,32,33),
                 Temp2=c(5,10,9,15,15,14,25,21,24,23,21,24,35,35,36,34,32,33),
                 Month=rep(c("January","February","March","April","May","June"),each=3),
                 Roof=as.factor(rep(1:6,times=3)),
                 stringsAsFactors=TRUE) #makes Month a factor; must be set explicitly in R >= 4.0.0
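#columns with one or more NAs (drop=FALSE keeps a data.frame when only one column matches)
dat1[,colMeans(is.na(dat1))!=0,drop=FALSE]
#columns without any NAs
dat1[,colMeans(is.na(dat1))==0,drop=FALSE]
#or
dat1[,complete.cases(t(dat1)),drop=FALSE]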
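The same NA test can also be written as one logical flag per column, which may be easier to reuse on a 60-column data set. This is only a sketch: it assumes R >= 3.1.0 for anyNA(), and the data frame `df` and its columns are made up for illustration.

```r
# Hypothetical example data; 'df' and its column names are not from the original post.
df <- data.frame(a = c(1, NA, 3),
                 b = c("x", "y", "z"),
                 c = c(NA, NA, 1))

# One logical per column: TRUE if the column holds at least one NA.
has_na <- vapply(df, anyNA, logical(1))

# drop = FALSE keeps a data.frame even when a single column is selected.
with_na    <- df[, has_na, drop = FALSE]   # columns containing NAs
without_na <- df[, !has_na, drop = FALSE]  # columns free of NAs

names(with_na)     # "a" "c"
names(without_na)  # "b"
```

Keeping `has_na` as a named logical vector also lets you inspect which columns were flagged before subsetting.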
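#Second part of both questions (2 and 4): the example uses 4 levels as the cutoff; in your case, it is 32.
#columns that are factors with >=4 levels
dat1[unlist(lapply(dat1,function(x) length(levels(x))>=4))]
#or
dat1[sapply(dat1,function(x) length(levels(x))>=4)]
#and, to remove them (keeps non-factor columns and factors with <4 levels)
dat1[sapply(dat1,function(x) length(levels(x))<4)]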
I assume you wanted these as separate solutions.
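If a single step is preferred instead, the two removals could be combined along these lines. This is a rough sketch, not a tested general solution: the function name `drop_na_and_big_factors`, its argument `max_levels`, and the example data `dat` are all hypothetical, and anyNA() assumes R >= 3.1.0.

```r
# Hypothetical helper: drops columns that contain any NA and factor columns
# with at least 'max_levels' levels. Names are made up for illustration.
drop_na_and_big_factors <- function(df, max_levels = 32) {
  has_na <- vapply(df, anyNA, logical(1))
  big_factor <- vapply(df,
                       function(x) is.factor(x) && nlevels(x) >= max_levels,
                       logical(1))
  # drop = FALSE keeps a data.frame even if only one column survives.
  df[, !(has_na | big_factor), drop = FALSE]
}

# Small made-up data set: Temp has an NA, Roof has 6 levels, Side has 2.
dat <- data.frame(Temp  = c(5, NA, 9, 15, 14, 25),
                  Temp2 = c(5, 10, 9, 15, 14, 25),
                  Roof  = factor(1:6),
                  Side  = factor(rep(c("N", "S"), 3)))

kept <- drop_na_and_big_factors(dat, max_levels = 4)
names(kept)  # "Temp2" "Side"
```

Checking is.factor() explicitly means character or numeric columns are never counted as factors, whichever way the data frame was created.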
A.K.
----- Original Message -----
From: "Lopez, Dan" <lopez235 at llnl.gov>
To: "R help (r-help at r-project.org)" <r-help at r-project.org>
Cc:
Sent: Thursday, August 30, 2012 11:38 AM
Subject: [R] Identifying and Removing NA Columns and factor Columns with more than x Levels
Hi,
How do you subset a dataframe so that you only have columns:
1. that contain one or more NAs?
2. that contain factors with greater than or equal to 32 levels?
How do you remove from a dataframe columns**
3. with one or more NA's?
4. that contain factors with greater than or equal to 32 levels?
** I know how to remove columns at a basic level but I am trying to figure out a more efficient way of performing these particular tasks (my data set has 60 columns).
For NA's I essentially used summary(mtcars) and manually made a note of where NA's appeared, then used:
mtcars1<-mtcars1[,!(names(mtcars1)%in% c("hp","wt","vs"))]
I did something similar for factors with greater than x levels only I used str(mtcars) to help me identify them.
BTW I know mtcars doesn't have any of these issues. I just used it as a quick reference.
Dan