[R] Filtering a dataset's columns by another dataset's column names
marc_schwartz at comcast.net
Fri Feb 27 18:36:55 CET 2009
on 02/27/2009 11:27 AM Josh B wrote:
> Hello all,
> I hope some of you can come to my rescue, yet again.
> I have two genetic datasets, and I want one of the datasets to have only the columns that are in common with the other dataset.
> Here is a toy example (my real datasets have hundreds of columns):
> Dataset 1:
> Individual SNP1 SNP2 SNP3 SNP4 SNP5
> 1 A G T C A
> 2 T C A G T
> 3 A C T C A
> Dataset 2:
> Individual SNP1 SNP3 SNP5 SNP6 SNP7
> 4 A T T G C
> 5 T A A G G
> 6 A A T C G
> I want Dataset1 to have only columns that are also represented in Dataset 2, i.e., I want to generate a new Dataset 3 that looks like this:
> Individual SNP1 SNP3 SNP5
> 1 A T A
> 2 T A T
> 3 A T A
> Does anyone know how I could do this? Keep in mind that this is not a simple merge, as in the "merge" function.
> Thanks very much for your help everyone.
> Josh B.
> Same.Cols <- intersect(names(DF1), names(DF2))
> Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"
> rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T
See ?intersect, which returns the column names common to both data
frames; you can then use them to subset each data frame before rbind().
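
If the goal is only the "Dataset 3" described in the question (Dataset 1 restricted to the shared columns, without stacking Dataset 2 underneath), a minimal sketch might look like the following. The data frames are rebuilt here from the toy example and are assumed to be named DF1 and DF2 as above; DF3 is a name introduced purely for illustration.

```r
# Toy data rebuilt from the question; DF1/DF2 are assumed names for the two datasets.
DF1 <- data.frame(Individual = 1:3,
                  SNP1 = c("A", "T", "A"),
                  SNP2 = c("G", "C", "C"),
                  SNP3 = c("T", "A", "T"),
                  SNP4 = c("C", "G", "C"),
                  SNP5 = c("A", "T", "A"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Individual = 4:6,
                  SNP1 = c("A", "T", "A"),
                  SNP3 = c("T", "A", "A"),
                  SNP5 = c("T", "A", "T"),
                  SNP6 = c("G", "G", "C"),
                  SNP7 = c("C", "G", "G"),
                  stringsAsFactors = FALSE)

# Column names present in both data frames, in DF1's order.
Same.Cols <- intersect(names(DF1), names(DF2))

# DF3 (hypothetical name): DF1 restricted to the shared columns.
# drop = FALSE keeps the result a data frame even if only one column is shared.
DF3 <- DF1[, Same.Cols, drop = FALSE]
print(DF3)
```

Because intersect() keeps the order of its first argument, the columns of DF3 appear in the same order as in DF1.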