Hi Arun,

Many thanks for following this through. Based on your earlier suggestion
about CEBPA, I also edited the code and now it seems to work (I removed the
non-existing names). I looked into the current code that you sent below
and learnt a few more things, so thank you again.
Your R skills are very good. Could you suggest some resources for advanced
R programming? I am fairly comfortable with R, but need to keep abreast
of the latest R packages that can make life a lot simpler.

Thanks again,
-M

On Fri, Jul 26, 2013 at 2:20 PM, arun kirshna [via R] <ml-node+s789695n4672460h72@n4.nabble.com> wrote:

> Hi,
> Using the example code without removing CEBPA:
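> library(plyr)  # for join_all()
> gset <- read.table("Names.txt", header=TRUE, stringsAsFactors=FALSE)
> temp1 <- read.table("Data.txt", header=TRUE, stringsAsFactors=FALSE)
> lst1 <- split(temp1, temp1$Names)  # one data frame per name
> mat1 <- combn(gset[,1], 2)         # all pairs of names, one pair per column
> # keep a pair only if both names are in lst1 (a missing name gives NULL, length 0)
> lst2 <- lapply(split(mat1, col(mat1)), function(x) {
>   lst1[x][all(lapply(lst1[x], length) == 2)] })
> # inner-join each complete pair on patient_id; label rows Name1_Name2_n
> lst3 <- lapply(lst2[lapply(lst2, length) == 2], function(x) {
>   x1 <- join_all(x, by="patient_id", type="inner")
>   x2 <- x1["patient_id"]
>   row.names(x2) <- if (nrow(x1) != 0) paste(x1[,1], x1[,3], 1:nrow(x1), sep="_") else NULL
>   x2 })
> Reduce(rbind, lst3)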
> #                   patient_id
> #DNMT3A_FLT3_1 LAML-AB-2811-TB
> #DNMT3A_FLT3_2 LAML-AB-2816-TB
> #DNMT3A_FLT3_3 LAML-AB-2818-TB
> #DNMT3A_IDH1_1 LAML-AB-2802-TB
> #DNMT3A_IDH1_2 LAML-AB-2822-TB
> #DNMT3A_NPM1_1 LAML-AB-2802-TB
> #DNMT3A_NPM1_2 LAML-AB-2809-TB
> #DNMT3A_NPM1_3 LAML-AB-2811-TB
> #DNMT3A_NPM1_4 LAML-AB-2816-TB
> #DNMT3A_NRAS_1 LAML-AB-2816-TB
> #FLT3_NPM1_1   LAML-AB-2811-TB
> #FLT3_NPM1_2   LAML-AB-2812-TB
> #FLT3_NPM1_3   LAML-AB-2816-TB
> #FLT3_NRAS_1   LAML-AB-2816-TB
> #IDH1_NPM1_1   LAML-AB-2802-TB
> #NPM1_NRAS_1   LAML-AB-2816-TB
>
>
>
>
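The pair-by-pair inner join above amounts to intersecting the patient IDs of the two names. A minimal base-R sketch of that idea without plyr, assuming the same long format (one row per name and patient); the toy data and the helper name `pair_overlap` are hypothetical, and unlike `join_all()` it counts each patient once even if a name has duplicate rows for that patient.

```r
# Hypothetical toy data in the long format of Data.txt
temp1 <- data.frame(
  Names      = c("DNMT3A", "DNMT3A", "FLT3", "FLT3", "NPM1"),
  patient_id = c("P1",     "P2",     "P2",   "P3",   "P1"),
  stringsAsFactors = FALSE)

# Hypothetical helper: shared patient IDs for every pair of names
pair_overlap <- function(df) {
  ids <- split(df$patient_id, df$Names)   # patient IDs per name
  pairs <- combn(names(ids), 2)           # all pairs, one per column
  shared <- lapply(seq_len(ncol(pairs)), function(j)
    intersect(ids[[pairs[1, j]]], ids[[pairs[2, j]]]))
  names(shared) <- paste(pairs[1, ], pairs[2, ], sep = "_")
  shared
}

res <- pair_overlap(temp1)
str(res)
```

Pairs whose element has length 0 (here FLT3_NPM1) share no patient.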
> ######## From your original dataset:
> gset <- read.table("SampleGenes.txt", header=TRUE, stringsAsFactors=FALSE)
> temp0 <- read.table("LAML-TB.final_analysis_set.txt", header=TRUE,
>                     stringsAsFactors=FALSE, sep="\t")
> temp1 <- temp0[, c("Hugo_Symbol", "firehose_patient_id")]
> str(temp1)
> #'data.frame':    2221 obs. of  2 variables:
> # $ Hugo_Symbol        : chr  "TBX15" "TCHHL1" "DNMT3A" "IDH1" ...
> # $ firehose_patient_id: chr  "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" ...
> lst1 <- split(temp1, temp1$Hugo_Symbol)  # one data frame per gene
> length(lst1)
> #[1] 1607
> mat1 <- combn(gset[,1], 2)  # generate all pairs of gene names
> lst2 <- lapply(split(mat1, col(mat1)), function(x) {
>   lst1[x][all(lapply(lst1[x], length) == 2)] })
> length(lst2)
> #[1] 105
> lst3 <- lapply(lst2[lapply(lst2, length) == 2], function(x) {
>   x1 <- join_all(x, by="firehose_patient_id", type="inner")
>   x2 <- x1["firehose_patient_id"]
>   row.names(x2) <- if (nrow(x1) != 0) paste(x1[,1], x1[,3], 1:nrow(x1), sep="_") else NULL
>   x2 })
> res <- Reduce(rbind, lst3)
> nrow(res)
> #[1] 234
> head(res)
> #            firehose_patient_id
> #NPM1_FLT3_1     LAML-AB-2811-TB
> #NPM1_FLT3_2     LAML-AB-2812-TB
> #NPM1_FLT3_3     LAML-AB-2816-TB
> #NPM1_FLT3_4     LAML-AB-2818-TB
> #NPM1_FLT3_5     LAML-AB-2825-TB
> #NPM1_FLT3_6     LAML-AB-2836-TB
>
>
>
>
> Regarding your second question:
> setdiff(gset[,1], unique(temp1[,1]))  # CEBPA was not found in temp1[,1]
> #[1] "CEBPA"
> mat2 <- combn(gset[gset[,1] %in% temp1[,1], 1], 2)  # pairs of the names present in temp1
> vec1 <- apply(mat2, 2, paste, collapse="_")         # all possible pair labels
> vec2 <- unique(gsub("(.*\\_.*)\\_.*", "\\1", row.names(res)))  # pairs found in res (drop the _n suffix)
> setdiff(vec1, vec2)  # pairs with no common patient id
> #[1] "NPM1_TP53"   "NPM1_EZH2"   "NPM1_RUNX1"  "NPM1_ASXL1"  "NPM1_KDM6A"
> #[6] "FLT3_TP53"   "FLT3_EZH2"   "FLT3_KRAS"   "FLT3_ASXL1"  "FLT3_KDM6A"
> #[11] "IDH1_TP53"   "IDH1_KRAS"   "NRAS_IDH2"   "NRAS_KRAS"   "NRAS_ASXL1"
> #[16] "NRAS_KDM6A"  "TP53_EZH2"   "TP53_IDH2"   "TP53_RUNX1"  "TP53_KRAS"
> #[21] "TP53_WT1"    "TP53_ASXL1"  "TP53_KDM6A"  "EZH2_IDH2"   "EZH2_WT1"
> #[26] "EZH2_ASXL1"  "EZH2_KDM6A"  "IDH2_TET2"   "IDH2_KDM6A"  "RUNX1_KDM6A"
> #[31] "KRAS_WT1"    "KRAS_KDM6A"  "WT1_ASXL1"   "WT1_TET2"    "WT1_KDM6A"
> #[36] "ASXL1_TET2"  "ASXL1_KDM6A" "TET2_KDM6A"
> A.K.
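A sketch of an alternative for the second question that does not rely on parsing row names: tabulate a name-by-patient 0/1 incidence matrix with `table()`, then the matrix product with its transpose gives the number of shared patients for every pair of names; pairs with a count of 0 have no common patient ID. Using a factor with the full gene list as levels keeps absent names (such as CEBPA) as all-zero rows, so their pairs are reported too. The toy data and the gene list `genes` are hypothetical stand-ins for the thread's files.

```r
# Hypothetical toy data (long format) and gene list (stand-in for gset[,1])
temp1 <- data.frame(
  Names      = c("DNMT3A", "DNMT3A", "FLT3", "FLT3", "NPM1"),
  patient_id = c("P1",     "P2",     "P2",   "P3",   "P1"),
  stringsAsFactors = FALSE)
genes <- c("DNMT3A", "FLT3", "NPM1", "CEBPA")  # CEBPA is absent from temp1

# Name x patient counts; factor levels keep absent names as zero rows
counts <- unclass(table(factor(temp1$Names, levels = genes), temp1$patient_id))
inc <- (counts > 0) * 1      # 0/1 incidence; duplicate rows count once
shared <- inc %*% t(inc)     # shared[i, j] = patients carrying both name i and j

# Look up every pair (same order as combn) in the shared-count matrix
pairs <- combn(genes, 2)
n_shared <- shared[t(pairs)]  # character-matrix indexing by dimnames
names(n_shared) <- paste(pairs[1, ], pairs[2, ], sep = "_")
no_common <- names(n_shared)[n_shared == 0]
print(no_common)
```

Names that occur in the data but not in `genes` become `NA` in the factor and are dropped by `table()`.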
>
>
> ----- Original Message -----
> From: arun <[hidden email]>
> To: Manisha <[hidden email]>
> Cc: R help <[hidden email]>
> Sent: Friday, July 26, 2013 11:18 AM
> Subject: Re: Pairwise comparison between columns, logic
>
> Hi Manisha,
> I haven't run your dataset yet, as I am on my way to college. But from the
> error reported, I think it is due to some missing combinations in one
> of the datasets. For example, if you run my previous code without removing
> CEBPA, i.e.:
> mat1 <- combn(gset[,1], 2)
> # fails for pairs containing CEBPA, which is not a name in lst1
> lst2 <- lapply(split(mat1, col(mat1)), function(x) {
>   x1 <- join_all(lst1[x], by="patient_id", type="inner")
>   x1["patient_id"] })
> #Error: All inputs to rbind.fill must be data.frames
>
>
> So, check whether both names of every combination are present in `lst1`.
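A minimal sketch of that check, assuming `lst1` is the per-name list from `split()` and `mat1` the `combn()` matrix as in the code above; the toy objects are hypothetical. Pairs with a name missing from `names(lst1)` are listed and dropped before any join is attempted.

```r
# Hypothetical toy objects shaped like lst1 and mat1 in the thread
temp1 <- data.frame(Names      = c("DNMT3A", "FLT3", "NPM1"),
                    patient_id = c("P1",     "P1",   "P2"),
                    stringsAsFactors = FALSE)
lst1 <- split(temp1, temp1$Names)
mat1 <- combn(c("DNMT3A", "FLT3", "CEBPA"), 2)  # CEBPA has no entry in lst1

# TRUE for pairs (columns) whose two names both exist in lst1
ok <- colSums(matrix(mat1 %in% names(lst1), nrow = 2)) == 2
print(mat1[, !ok, drop = FALSE])   # pairs containing a name absent from lst1
mat1_ok <- mat1[, ok, drop = FALSE]
```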
>
> 2. I will get back to you once I have run it.
> A.K.
>
>
>
>
>
> ________________________________
> From: Manisha <[hidden email]>
> To: arun <[hidden email]>
> Sent: Friday, July 26, 2013 11:09 AM
> Subject: Re: Pairwise comparison between columns, logic
>
>
>
> Hi Arun,
> I ran the script on a larger dataset and I am running into the
> following error:
> Error: All inputs to rbind.fill must be data.frames
> after this step:
> lst2 <- lapply(split(mat1, col(mat1)), function(x) {
>   x1 <- join_all(lst1[x], by="firehose_patient_id", type="inner")
>   x1["firehose_patient_id"] })
>
> I tried a few things to solve the issue, but without success. The input
> files and data have the same format as in the code you posted.
> Could you suggest something?
> I have attached the input files on which I am trying to run the code, and
> the code as well; I have made minor changes to it.
>
> 2. I have another question. With your code, how do I also capture those
> pairs of names that do not have any common patient ID?
>
> Thanks again,
> -M
>
>
> On Fri, Jul 26, 2013 at 9:29 AM, arun <[hidden email]> wrote:
>
> >Hi M,
> >
> >No problem.
> >Regards,
> >Arun
> >
> >
> >
> >
> >----- Original Message -----
> >From: "[hidden email]" <[hidden email]>
> >To: [hidden email]
> >Cc:
> >Sent: Friday, July 26, 2013 9:27 AM
> >Subject: Re: Pairwise comparison between columns, logic
> >
> >Thanks for the code. It is elegant and does what I need. I learnt some
> >new things.
> >-M
> >
> >
> >_____________________________________
> >Sent from http://r.789695.n4.nabble.com
> >
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>


--
View this message in context: http://r.789695.n4.nabble.com/Pairwise-comparison-between-columns-logic-tp4672356p4672461.html
Sent from the R help mailing list archive at Nabble.com.