[R] Pairwise comparison between columns, logic

arun smartpink111 at yahoo.com
Fri Jul 26 20:19:16 CEST 2013


Hi,
Using the example code without removing CEBPA:
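library(plyr) # provides join_all()
gset<- read.table("Names.txt",header=TRUE,stringsAsFactors=FALSE)
temp1<- read.table("Data.txt",header=TRUE,stringsAsFactors=FALSE)
lst1<-split(temp1,temp1$Names) # one data frame per name
mat1<-combn(gset[,1],2) # all pairs of names, one pair per column
# keep a pair only if both names are present in lst1 (each element is then a 2-column data frame)
lst2<-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})
# inner-join each remaining pair on patient_id and label the rows Name1_Name2_index
lst3<-lapply(lst2[lapply(lst2,length)==2],function(x) {
  x1<- join_all(x,by="patient_id",type="inner")
  x2<-x1["patient_id"]
  row.names(x2)<-if(nrow(x1)!=0) paste(x1[,1],x1[,3],1:nrow(x1),sep="_") else NULL
  x2
})
Reduce(rbind,lst3)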
#                   patient_id
#DNMT3A_FLT3_1 LAML-AB-2811-TB
#DNMT3A_FLT3_2 LAML-AB-2816-TB
#DNMT3A_FLT3_3 LAML-AB-2818-TB
#DNMT3A_IDH1_1 LAML-AB-2802-TB
#DNMT3A_IDH1_2 LAML-AB-2822-TB
#DNMT3A_NPM1_1 LAML-AB-2802-TB
#DNMT3A_NPM1_2 LAML-AB-2809-TB
#DNMT3A_NPM1_3 LAML-AB-2811-TB
#DNMT3A_NPM1_4 LAML-AB-2816-TB
#DNMT3A_NRAS_1 LAML-AB-2816-TB
#FLT3_NPM1_1   LAML-AB-2811-TB
#FLT3_NPM1_2   LAML-AB-2812-TB
#FLT3_NPM1_3   LAML-AB-2816-TB
#FLT3_NRAS_1   LAML-AB-2816-TB
#IDH1_NPM1_1   LAML-AB-2802-TB
#NPM1_NRAS_1   LAML-AB-2816-TB
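
For anyone without Names.txt and Data.txt, here is a minimal sketch of what happens for a single pair. The toy data frame and the names `temp1_toy`, `lst1_toy` and `both` are made up for illustration, and base `merge()` is used in place of `join_all()`, so treat it as an approximation of the step above rather than the same code:

```r
# Toy stand-in for Data.txt (hypothetical values)
temp1_toy <- data.frame(
  Names = c("DNMT3A", "DNMT3A", "FLT3", "FLT3"),
  patient_id = c("P1", "P2", "P2", "P3"),
  stringsAsFactors = FALSE
)

# One data frame per name, as in lst1
lst1_toy <- split(temp1_toy, temp1_toy$Names)

# Inner join on patient_id keeps only the patients present in both tables
both <- merge(lst1_toy[["DNMT3A"]], lst1_toy[["FLT3"]],
              by = "patient_id", suffixes = c("_a", "_b"))

# Label each shared patient as Name1_Name2_index
rownames(both) <- paste(both$Names_a, both$Names_b, seq_len(nrow(both)), sep = "_")
both["patient_id"]
```

`seq_len(nrow(both))` also copes with an empty join, so no separate check for zero rows is needed in this version.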




########From your original dataset:
gset<- read.table("SampleGenes.txt",header=TRUE,stringsAsFactors=FALSE)
temp0<- read.table("LAML-TB.final_analysis_set.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
temp1<- temp0[,c("Hugo_Symbol","firehose_patient_id")] # keep only the gene symbol and patient id columns
str(temp1)
#'data.frame':    2221 obs. of  2 variables:
# $ Hugo_Symbol        : chr  "TBX15" "TCHHL1" "DNMT3A" "IDH1" ...
# $ firehose_patient_id: chr  "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" ...
lst1<-split(temp1,temp1$Hugo_Symbol) 
 length(lst1)
#[1] 1607
mat1<-combn(gset[,1],2) # generate all pairwise combinations of gene names
lst2<-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})
length(lst2)
#[1] 105


lst3<-lapply(lst2[lapply(lst2,length)==2],function(x) {
  x1<- join_all(x,by="firehose_patient_id",type="inner")
  x2<-x1["firehose_patient_id"]
  row.names(x2)<-if(nrow(x1)!=0) paste(x1[,1],x1[,3],1:nrow(x1),sep="_") else NULL
  x2
})
res<-Reduce(rbind,lst3)
nrow(res)
#[1] 234
head(res)
#            firehose_patient_id
#NPM1_FLT3_1     LAML-AB-2811-TB
#NPM1_FLT3_2     LAML-AB-2812-TB
#NPM1_FLT3_3     LAML-AB-2816-TB
#NPM1_FLT3_4     LAML-AB-2818-TB
#NPM1_FLT3_5     LAML-AB-2825-TB
#NPM1_FLT3_6     LAML-AB-2836-TB




Regarding your second question:
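setdiff(gset[,1],unique(temp1[,1])) # CEBPA is in gset but not in temp1[,1]
#[1] "CEBPA"
mat2<- combn(gset[-5,1],2) # all pairs, excluding row 5 of gset
vec1<- apply(mat2,2,paste,collapse="_") # pair labels such as "NPM1_FLT3"
vec2<-unique(gsub("(.*\\_.*)\\_.*","\\1",row.names(res))) # strip the trailing index from the row names
setdiff(vec1,vec2) # pairs with no common patient id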
 #[1] "NPM1_TP53"   "NPM1_EZH2"   "NPM1_RUNX1"  "NPM1_ASXL1"  "NPM1_KDM6A" 
 #[6] "FLT3_TP53"   "FLT3_EZH2"   "FLT3_KRAS"   "FLT3_ASXL1"  "FLT3_KDM6A" 
#[11] "IDH1_TP53"   "IDH1_KRAS"   "NRAS_IDH2"   "NRAS_KRAS"   "NRAS_ASXL1" 
#[16] "NRAS_KDM6A"  "TP53_EZH2"   "TP53_IDH2"   "TP53_RUNX1"  "TP53_KRAS"  
#[21] "TP53_WT1"    "TP53_ASXL1"  "TP53_KDM6A"  "EZH2_IDH2"   "EZH2_WT1"   
#[26] "EZH2_ASXL1"  "EZH2_KDM6A"  "IDH2_TET2"   "IDH2_KDM6A"  "RUNX1_KDM6A"
#[31] "KRAS_WT1"    "KRAS_KDM6A"  "WT1_ASXL1"   "WT1_TET2"    "WT1_KDM6A"  
#[36] "ASXL1_TET2"  "ASXL1_KDM6A" "TET2_KDM6A" 
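
If it helps, the pairs without a common patient can also be found by counting shared ids per pair directly, without going through the row names of `res`. This is only a sketch on made-up data: `temp1_toy`, `genes`, `ids_by_gene`, `n_shared` and `no_overlap` are hypothetical names, and it assumes every gene in `genes` occurs in the data:

```r
# Toy stand-in for temp1 (hypothetical values)
temp1_toy <- data.frame(
  Hugo_Symbol = c("NPM1", "NPM1", "FLT3", "TP53"),
  firehose_patient_id = c("P1", "P2", "P2", "P3"),
  stringsAsFactors = FALSE
)
genes <- c("NPM1", "FLT3", "TP53")

# Patient ids per gene
ids_by_gene <- split(temp1_toy$firehose_patient_id, temp1_toy$Hugo_Symbol)

# All unordered pairs of genes, one pair per column
pairs <- combn(genes, 2)

# Number of patients shared by each pair
n_shared <- apply(pairs, 2, function(p) {
  length(intersect(ids_by_gene[[p[1]]], ids_by_gene[[p[2]]]))
})
names(n_shared) <- apply(pairs, 2, paste, collapse = "_")

# Pairs with no patient in common
no_overlap <- names(n_shared)[n_shared == 0]
no_overlap
```

The same `n_shared` vector gives the pairs that do overlap via `n_shared > 0`, so both questions can be answered from one pass.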
A.K.


----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Manisha <manishabh77 at gmail.com>
Cc: R help <r-help at r-project.org>
Sent: Friday, July 26, 2013 11:18 AM
Subject: Re: Pairwise comparison between columns, logic

Hi Manisha,
I didn't run your dataset as I am on my way to college.  But from the error reported, I think it is due to some combinations missing from one of the datasets.  For example, if you run my previous code without removing CEBPA, i.e.:
mat1<- combn(gset[,1],2)


lst2<-lapply(split(mat1,col(mat1)),function(x) {x1<-join_all(lst1[x],by="patient_id",type="inner");x1["patient_id"] } )
#Error: All inputs to rbind.fill must be data.frames


So, check whether every name in the combinations is present in `lst1`.
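
A rough way to do that check, sketched on made-up data (`genes`, `lst1_toy`, `missing_genes` and `bad_pairs` are hypothetical names, not objects from your script):

```r
# Hypothetical stand-ins: the names of interest and a per-name list like lst1
genes <- c("DNMT3A", "FLT3", "CEBPA")
lst1_toy <- list(
  DNMT3A = data.frame(Names = "DNMT3A", patient_id = "P1", stringsAsFactors = FALSE),
  FLT3   = data.frame(Names = "FLT3",   patient_id = "P1", stringsAsFactors = FALSE)
)

# Names requested but absent from the split list
missing_genes <- setdiff(genes, names(lst1_toy))

# Pairs that involve at least one absent name; for these, lst1[x] contains
# a NULL element instead of a data frame
pairs <- combn(genes, 2)
bad <- apply(pairs, 2, function(p) any(p %in% missing_genes))
bad_pairs <- apply(pairs[, bad, drop = FALSE], 2, paste, collapse = "_")

missing_genes
bad_pairs
```

If `missing_genes` comes back empty, the error most likely has another cause.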

2. I will get back to you once I run it.
A.K.





________________________________
From: Manisha <manishabh77 at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Friday, July 26, 2013 11:09 AM
Subject: Re: Pairwise comparison between columns, logic



Hi Arun,
I ran the script on a larger dataset and I am running into the following error:
Error: All inputs to rbind.fill must be data.frames
after the step:
lst2<-lapply(split(mat1,col(mat1)),function(x) {x1<-join_all(lst1[x],by="firehose_patient_id",type="inner");x1["firehose_patient_id"]}) 
I tried a few things but could not resolve the issue. The format of the input files and data is the same as in the code you posted.
Could you suggest something?
I have attached the input files I am running the code on, along with the code itself, to which I made minor changes.

2. I have another question. With your code, how do I also capture those pairs of names that do not have any common patient id?

Thanks again,
-M


On Fri, Jul 26, 2013 at 9:29 AM, arun <smartpink111 at yahoo.com> wrote:

>Hi M,
>No problem.
>Regards,
>Arun
>
>
>
>
>----- Original Message -----
>From: "manishabh77 at gmail.com" <manishabh77 at gmail.com>
>To: smartpink111 at yahoo.com
>Cc:
>Sent: Friday, July 26, 2013 9:27 AM
>Subject: Re: Pairwise comparison between columns, logic
>
>Thanks for the code. It is elegant and does what I need. Learnt some new things.
>-M
>
>


