[R] Compare two data sets

jim holtman jholtman at gmail.com
Wed Mar 26 03:37:06 CET 2008


Here is one way to find the common rows: build a key for each row by
pasting its two columns together, then intersect the keys.  You can then
use the keys that come back to reconstruct a new data frame:

> f1 <- read.table(textConnection("V1      V2
+ YBL064C YBR067C
+ YBL064C YBR204C
+ YBL064C YDR368W
+ YBL064C YJL067W
+ YBL064C YPR160W
+ YBR053C YGL089C
+ YBR053C YHR113W
+ YBR053C YNL328C"), header=TRUE)
>
> f2 <- read.table(textConnection("V1      V2
+ YBL064C YBR067C
+ YBL064C YBR204C
+ YBL064C YDR368W"), header=TRUE)
>
> f1$key <- paste(f1$V1, f1$V2)
> f2$key <- paste(f2$V1, f2$V2)
>
> # now find the ones in common
> intersect(f1$key, f2$key)
[1] "YBL064C YBR067C" "YBL064C YBR204C" "YBL064C YDR368W"
>
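
For the reconstruction step, a minimal sketch (not part of the session
above) could look like the following. It assumes the same f1 and f2 data,
read here with read.table(text = ...) instead of textConnection(); the
names common_keys, common and common_merge are illustrative only. It also
assumes no value contains a space, since paste() joins the two columns
with a space and keys could otherwise collide.

```r
# Same example data as in the session above
f1 <- read.table(text = "V1 V2
YBL064C YBR067C
YBL064C YBR204C
YBL064C YDR368W
YBL064C YJL067W
YBL064C YPR160W
YBR053C YGL089C
YBR053C YHR113W
YBR053C YNL328C", header = TRUE, stringsAsFactors = FALSE)

f2 <- read.table(text = "V1 V2
YBL064C YBR067C
YBL064C YBR204C
YBL064C YDR368W", header = TRUE, stringsAsFactors = FALSE)

# One key per row, built from both columns
f1$key <- paste(f1$V1, f1$V2)
f2$key <- paste(f2$V1, f2$V2)

# Keys present in both data frames
common_keys <- intersect(f1$key, f2$key)

# Keep the rows of f1 whose key is shared, and drop the helper column
common <- f1[f1$key %in% common_keys, c("V1", "V2")]

# Possible alternative: merge() joins on all shared column names by default
common_merge <- merge(f1[c("V1", "V2")], f2[c("V1", "V2")])
```

Both common and common_merge should then hold the rows that appear in
both data sets. Note that merge() sorts its result on the join columns and
repeats rows that are duplicated in the inputs, so the two approaches can
differ on data with duplicate rows.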


On Tue, Mar 25, 2008 at 9:18 PM, Suhaila Zainudin
<suhaila.zainudin at gmail.com> wrote:
> Hi,
>
> I have a similar query (how to compare 2 datasets), but my dataset is a bit
> different.
> I want to compare each row in dataset 1 with the rows in dataset 2 and get
> the rows that are common to both datasets.
>
> For example;
>
> I have a file (named mysample).
>
> V1      V2
> YBL064C YBR067C
> YBL064C YBR204C
> YBL064C YDR368W
> YBL064C YJL067W
> YBL064C YPR160W
> YBR053C YGL089C
> YBR053C YHR113W
> YBR053C YNL328C
>
> And I have another file (myref) as follows
>
> V1      V2
> YBL064C YBR067C
> YBL064C YBR204C
> YBL064C YDR368W
>
>
> When I try to intersect the two files, I get a NULL data frame.
>
> > intersect(myref,mysample)
> NULL data frame with 0 rows
>
> What I am hoping to get out of intersect for the above files is
>
> YBL064C YBR067C
> YBL064C YBR204C
> YBL064C YDR368W
>
> Are there any R functions that can achieve what I want to do?
> Or should I merge the data which is currently in 2 columns into single
> column and use intersect again?
>
> Thanks for any feedback!



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


