[R] Intersect 2 lists+bring extra columns

arun smartpink111 at yahoo.com
Sun Sep 1 23:45:20 CEST 2013


Hi,

If I understand it correctly:

fruit <- read.csv("example.csv", header=TRUE, stringsAsFactors=FALSE, sep="\t")
res <- merge(fruit["reference"], fruit[,-1], by.x="reference", by.y="list")
res
#   reference information
#1 grapefruit        pink
#2      lemon      yellow
#3       pear       green
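
Since example.csv itself is not shown here, the following is only a sketch of the same merge on a made-up data frame. The column names (reference, list, information) come from the code above; the row values are assumptions chosen to give the same three matches.

```r
# Hypothetical stand-in for example.csv; the real file's rows are not shown.
fruit <- data.frame(
  reference   = c("apple", "grapefruit", "lemon", "pear"),
  list        = c("pear", "lemon", "grapefruit", "banana"),
  information = c("green", "yellow", "pink", "yellow"),
  stringsAsFactors = FALSE
)

# Left side: the reference names only.
# Right side: the list of interest together with its information column.
# merge() keeps only the rows where reference equals list (an inner join)
# and sorts the result by the key column.
res <- merge(fruit["reference"], fruit[, c("list", "information")],
             by.x = "reference", by.y = "list")
res
#    reference information
# 1 grapefruit        pink
# 2      lemon      yellow
# 3       pear       green
```

"banana" is in list but not in reference, and "apple" is in reference but not in list, so neither appears in the result.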

If the dataset has duplicate entries in the second and third columns, merge() returns one row per duplicate, and unique() drops the repeats. For example:
fruit1 <- fruit
fruit1[4,2] <- "lemon"
fruit1[4,3] <- "yellow"
res2 <- merge(fruit1["reference"], fruit1[,-1], by.x="reference", by.y="list")
res2
#   reference information
#1 grapefruit        pink
#2      lemon      yellow
#3      lemon      yellow
#4       pear       green

unique(res2)
#   reference information
#1 grapefruit        pink
#2      lemon      yellow
#4       pear       green
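
If the "if it is in column 1, keep column 3 with it" logic is easier to follow than a merge, a rough alternative is to subset with %in%. This is only a sketch on made-up data (the row values are assumptions, not the contents of example.csv), and unlike merge() it keeps the rows in their original order.

```r
# Hypothetical data with one duplicated entry ("lemon"/"yellow") in the list.
fruit <- data.frame(
  reference   = c("apple", "grapefruit", "lemon", "pear", "plum"),
  list        = c("pear", "lemon", "grapefruit", "banana", "lemon"),
  information = c("green", "yellow", "pink", "yellow", "yellow"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose list entry appears anywhere in the reference column.
keep <- fruit$list %in% fruit$reference

# Keep the matching rows along with their information, then drop exact repeats.
hits <- unique(fruit[keep, c("list", "information")])
hits
#         list information
# 1       pear       green
# 2      lemon      yellow
# 3 grapefruit        pink
```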

A.K.



Hi everyone, 

I am pretty new to R, so please be patient.

I am trying to intersect 2 columns, and for the rows that intersect, I want
the information from the 3rd column to be brought along with them. I think it
will be easier to explain with an example (example.csv).

In my example, I have a reference list of fruit (first column) and my fruit
of interest (second column), and then in the third column I have color
information about the fruit of interest in the second column.

Currently, to find the intersection between columns 1 and 2, I use
>fruit<-read.csv("//Users//J//Desktop//example.csv", header=TRUE) 
>output<-intersect(fruit[,1],fruit[,2]) 
>write.table(data.frame(output),"output.xls", col.names=TRUE, row.names=FALSE) 

However, it would save me a lot of time if the color information from
column 3 could be saved with the overlap. I normally have reference lists of
several hundred entries and lists of interest with several thousand, so
bringing over the information column manually would be hard.

Is there some sort of "if" function I could be using? I would really like
something like: if an entry in column 2 also appears in column 1, then the
column 3 value from that entry's row is stored with it. I can think through
the logic, but I am not sure how to do it in R.

Any help would be much appreciated!   



