[R] Subsetting dataframes based on column names
David Winsemius
dwinsemius at comcast.net
Wed Sep 23 00:35:43 CEST 2009
On Sep 22, 2009, at 5:58 PM, Corey Sparks wrote:
> Dear R users,
> I am interested in taking the columns from multiple dataframes, the
> problem is that the different dataframes have different combinations
> of the same variable names, here's a simple example:
> a <- 1:10
> b <- 1:10
> c <- 21:30
> d <- 31:40
>
> dat.a<-data.frame(a,b,c,d)
> names(dat.a)<-c("a", "b", "c", "d")
>
> dat.b<-data.frame(a,c,d)
> names(dat.b)<-c("a", "c", "d")
>
> I would like to first check whether the names in the larger dataframe
> match those of the smaller one (they have the same variables):
>
> names(dat.a)%in%names(dat.b)
>
>
> Could anyone help with this problem? I would basically like to form
> a subset of dat.a that matches the variable names in dat.b. If
> there were only a few variables this would be easy, but I have
> between 4 and 5 thousand variables in each dataset.
I have never tried this on the scale you propose, but on your toy
example, here's what works:
> names(dat.a)%in%names(dat.b) # your code, which returns a logical vector
[1] TRUE FALSE TRUE TRUE
> subset(dat.a, select= names(dat.a)%in%names(dat.b) )
    a  c  d
1   1 21 31
2   2 22 32
3   3 23 33
4   4 24 34
5   5 25 35
6   6 26 36
7   7 27 37
8   8 28 38
9   9 29 39
10 10 30 40
>
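For the large case, ordinary bracket indexing may be simpler than subset(). A minimal sketch, assuming the toy dat.a and dat.b from the question: setdiff() lists the names that differ, and either intersect() or the %in% logical vector selects the shared columns (keep, sub.a and sub.a2 are just illustrative names). I have not timed this on 4,000+ columns, but %in% is built on match(), which uses hashing, so it should scale reasonably.

```r
# Toy data from the question. data.frame() takes the column names
# from the variable names, so the names<- calls are not needed.
a <- 1:10
b <- 1:10
c <- 21:30
d <- 31:40
dat.a <- data.frame(a, b, c, d)
dat.b <- data.frame(a, c, d)

# Which columns of dat.a are absent from dat.b, and vice versa?
setdiff(names(dat.a), names(dat.b))   # "b"
setdiff(names(dat.b), names(dat.a))   # character(0)

# Keep the columns of dat.a whose names also occur in dat.b,
# in dat.a's column order. drop = FALSE keeps a data frame
# even if only one column matches.
keep <- intersect(names(dat.a), names(dat.b))
sub.a <- dat.a[, keep, drop = FALSE]

# Equivalent, using the logical vector from %in% directly
# (list-style indexing always returns a data frame).
sub.a2 <- dat.a[names(dat.a) %in% names(dat.b)]
```

Both versions keep dat.a's column order and give the a, c, d columns shown above. If you would rather have dat.b's order, dat.a[names(dat.b)] should work, provided every name in dat.b also exists in dat.a (otherwise R stops with "undefined columns selected").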
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
More information about the R-help mailing list