[R] Pairwise comparison between columns, logic
arun
smartpink111 at yahoo.com
Fri Jul 26 20:19:16 CEST 2013
Hi,
Using the example code without removing CEBPA:
gset<- read.table("Names.txt",header=TRUE,stringsAsFactors=FALSE)
temp1<- read.table("Data.txt",header=TRUE,stringsAsFactors=FALSE)
lst1<-split(temp1,temp1$Names)
mat1<-combn(gset[,1],2)
library(plyr)
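# keep a pair only if both genes occur in lst1 (a present gene is a 2-column data.frame, so length()==2; a missing one is NULL)
lst2<-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(sapply(lst1[x],length)==2)]})
# inner-join each complete pair on patient_id; x1 holds gene 1, patient_id, gene 2, so rows are labelled GENE1_GENE2_n
lst3<-lapply(lst2[sapply(lst2,length)==2],function(x) {
  x1<- join_all(x,by="patient_id",type="inner")
  x2<- x1["patient_id"]
  row.names(x2)<- if(nrow(x1)!=0) paste(x1[,1],x1[,3],1:nrow(x1),sep="_") else NULL
  x2 })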
Reduce(rbind,lst3)
# patient_id
#DNMT3A_FLT3_1 LAML-AB-2811-TB
#DNMT3A_FLT3_2 LAML-AB-2816-TB
#DNMT3A_FLT3_3 LAML-AB-2818-TB
#DNMT3A_IDH1_1 LAML-AB-2802-TB
#DNMT3A_IDH1_2 LAML-AB-2822-TB
#DNMT3A_NPM1_1 LAML-AB-2802-TB
#DNMT3A_NPM1_2 LAML-AB-2809-TB
#DNMT3A_NPM1_3 LAML-AB-2811-TB
#DNMT3A_NPM1_4 LAML-AB-2816-TB
#DNMT3A_NRAS_1 LAML-AB-2816-TB
#FLT3_NPM1_1 LAML-AB-2811-TB
#FLT3_NPM1_2 LAML-AB-2812-TB
#FLT3_NPM1_3 LAML-AB-2816-TB
#FLT3_NRAS_1 LAML-AB-2816-TB
#IDH1_NPM1_1 LAML-AB-2802-TB
#NPM1_NRAS_1 LAML-AB-2816-TB
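For anyone without plyr, a minimal base-R sketch of the same idea swaps join_all() for intersect() on per-gene vectors of patient ids. The toy data frame, gene names and patient ids below are invented for illustration, and the result is a named list (one element per pair) rather than the GENE1_GENE2_n row labels above.

```r
# Hypothetical toy data: gene names and patient ids are made up for illustration
temp1 <- data.frame(
  Names      = c("DNMT3A", "DNMT3A", "FLT3", "FLT3", "NPM1", "NPM1"),
  patient_id = c("P1",     "P2",     "P1",   "P3",   "P1",   "P2"),
  stringsAsFactors = FALSE
)
genes <- c("DNMT3A", "FLT3", "NPM1", "CEBPA")  # CEBPA has no rows in temp1

# One character vector of patient ids per gene
ids <- split(temp1$patient_id, temp1$Names)

# Only pair up genes that actually occur in the data
present <- genes[genes %in% names(ids)]
pairs <- combn(present, 2, simplify = FALSE)

# For each pair, the patients carrying both genes
shared <- lapply(pairs, function(p) intersect(ids[[p[1]]], ids[[p[2]]]))
names(shared) <- vapply(pairs, paste, character(1), collapse = "_")
shared
```

Filtering `genes` by name before calling combn() means a gene with no rows never reaches the join step.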
########From your original dataset:
gset<- read.table("SampleGenes.txt",header=TRUE,stringsAsFactors=FALSE)
temp0<- read.table("LAML-TB.final_analysis_set.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
temp1<- temp0[,c("Hugo_Symbol","firehose_patient_id")]
str(temp1)
#'data.frame': 2221 obs. of 2 variables:
# $ Hugo_Symbol : chr "TBX15" "TCHHL1" "DNMT3A" "IDH1" ...
# $ firehose_patient_id: chr "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" ...
lst1<-split(temp1,temp1$Hugo_Symbol)
length(lst1)
#[1] 1607
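mat1<-combn(gset[,1],2) # generate all pairwise combinations of gene names (one pair per column)
# keep a pair only if both genes occur in lst1; otherwise the element is an empty list
lst2<-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(sapply(lst1[x],length)==2)]})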
length(lst2)
#[1] 105
# inner-join each complete pair on firehose_patient_id; x1 holds gene 1, patient id, gene 2
lst3<-lapply(lst2[sapply(lst2,length)==2],function(x) {
  x1<- join_all(x,by="firehose_patient_id",type="inner")
  x2<- x1["firehose_patient_id"]
  row.names(x2)<- if(nrow(x1)!=0) paste(x1[,1],x1[,3],1:nrow(x1),sep="_") else NULL
  x2 })
res<-Reduce(rbind,lst3)
nrow(res)
#[1] 234
head(res)
# firehose_patient_id
#NPM1_FLT3_1 LAML-AB-2811-TB
#NPM1_FLT3_2 LAML-AB-2812-TB
#NPM1_FLT3_3 LAML-AB-2816-TB
#NPM1_FLT3_4 LAML-AB-2818-TB
#NPM1_FLT3_5 LAML-AB-2825-TB
#NPM1_FLT3_6 LAML-AB-2836-TB
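To see how many patients each pair shares, including pairs that share none, a different technique from the join above is to cross-multiply a patient-by-gene incidence table. This is a sketch on invented toy data in the temp1 layout; `tab`, `inc` and `co` are hypothetical names.

```r
# Hypothetical toy data in the layout of temp1 (gene symbol, patient id)
temp1 <- data.frame(
  Hugo_Symbol         = c("NPM1", "NPM1", "FLT3", "FLT3", "IDH1", "NPM1"),
  firehose_patient_id = c("P1",   "P2",   "P1",   "P2",   "P3",   "P1"),
  stringsAsFactors = FALSE
)

# Patient x gene counts, turned into a 0/1 incidence matrix
# (a gene listed twice for the same patient still counts once)
tab <- table(temp1$firehose_patient_id, temp1$Hugo_Symbol)
inc <- (unclass(tab) > 0) * 1L

# crossprod(inc) = t(inc) %*% inc: entry [g1, g2] is the number of patients with both genes
co <- crossprod(inc)
co
```

Off-diagonal zeros are the pairs with no common patient, and the diagonal gives the number of patients per gene.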
Regarding your second question:
setdiff(gset[,1],unique(temp1[,1])) # genes of interest absent from temp1[,1]; only CEBPA here
#[1] "CEBPA"
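mat2<- combn(gset[-5,1],2) # all pairs without CEBPA (assumes CEBPA is the 5th entry of gset)
vec1<- apply(mat2,2,paste,collapse="_") # label for every candidate pair, e.g. "NPM1_FLT3"
vec2<-unique(gsub("(.*\\_.*)\\_.*","\\1",row.names(res))) # pairs present in res, trailing "_n" counter stripped
setdiff(vec1,vec2) # pairs that share no patient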
#[1] "NPM1_TP53" "NPM1_EZH2" "NPM1_RUNX1" "NPM1_ASXL1" "NPM1_KDM6A"
#[6] "FLT3_TP53" "FLT3_EZH2" "FLT3_KRAS" "FLT3_ASXL1" "FLT3_KDM6A"
#[11] "IDH1_TP53" "IDH1_KRAS" "NRAS_IDH2" "NRAS_KRAS" "NRAS_ASXL1"
#[16] "NRAS_KDM6A" "TP53_EZH2" "TP53_IDH2" "TP53_RUNX1" "TP53_KRAS"
#[21] "TP53_WT1" "TP53_ASXL1" "TP53_KDM6A" "EZH2_IDH2" "EZH2_WT1"
#[26] "EZH2_ASXL1" "EZH2_KDM6A" "IDH2_TET2" "IDH2_KDM6A" "RUNX1_KDM6A"
#[31] "KRAS_WT1" "KRAS_KDM6A" "WT1_ASXL1" "WT1_TET2" "WT1_KDM6A"
#[36] "ASXL1_TET2" "ASXL1_KDM6A" "TET2_KDM6A"
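The `gset[-5,1]` step depends on CEBPA's position in gset. A sketch that drops absent genes by name and tests each pair for an empty intersection directly, again on made-up toy data (`genes` stands in for `gset[,1]`; `present`, `ids` and `no_overlap` are hypothetical names):

```r
# Hypothetical toy inputs: genes of interest and gene/patient rows
genes <- c("NPM1", "FLT3", "IDH1", "CEBPA")
temp1 <- data.frame(
  Hugo_Symbol         = c("NPM1", "NPM1", "FLT3", "IDH1"),
  firehose_patient_id = c("P1",   "P2",   "P1",   "P3"),
  stringsAsFactors = FALSE
)

# Drop genes with no rows at all (CEBPA here) by name, not by position
present <- intersect(genes, temp1$Hugo_Symbol)
ids <- split(temp1$firehose_patient_id, temp1$Hugo_Symbol)

# Keep each pair whose patient sets do not intersect
pairs <- combn(present, 2, simplify = FALSE)
no_overlap <- Filter(function(p) length(intersect(ids[[p[1]]], ids[[p[2]]])) == 0, pairs)
no_common <- vapply(no_overlap, paste, character(1), collapse = "_")
no_common
```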
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Manisha <manishabh77 at gmail.com>
Cc: R help <r-help at r-project.org>
Sent: Friday, July 26, 2013 11:18 AM
Subject: Re: Pairwise comparison between columns, logic
Hi Manisha,
I didn't run your dataset as I am on the way to college. But, judging from the error reported, I think it is due to some missing combinations in one of the datasets. For example, you get the same error if you run my previous code without removing CEBPA,
i.e.:
mat1<- combn(gset[,1],2)
lst2<-lapply(split(mat1,col(mat1)),function(x) {x1<-join_all(lst1[x],by="patient_id",type="inner");x1["patient_id"] } )
#Error: All inputs to rbind.fill must be data.frames
So, check whether all the combinations are available in `lst1`.
2. I will get back to you once I have run it.
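A minimal sketch of that check, on invented toy data: a gene missing from `lst1` comes back as NULL, so pairs containing it can be dropped before joining. Base `merge()` is swapped in for `join_all()` here only to keep the example free of plyr; `ok` and `joined` are hypothetical names.

```r
# Hypothetical toy data; "CEBPA" is deliberately absent from temp1
temp1 <- data.frame(
  Names      = c("NPM1", "NPM1", "FLT3"),
  patient_id = c("P1",   "P2",   "P1"),
  stringsAsFactors = FALSE
)
lst1 <- split(temp1, temp1$Names)

is.null(lst1[["CEBPA"]])  # TRUE: a missing gene gives NULL, not a data.frame

mat1 <- combn(c("NPM1", "FLT3", "CEBPA"), 2)
pairs <- split(mat1, col(mat1))

# Skip pairs with a gene absent from lst1, then inner-join on patient_id
ok <- vapply(pairs, function(x) all(x %in% names(lst1)), logical(1))
joined <- lapply(pairs[ok], function(x)
  merge(lst1[[x[1]]], lst1[[x[2]]], by = "patient_id")["patient_id"])
joined
```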
A.K.
________________________________
From: Manisha <manishabh77 at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Friday, July 26, 2013 11:09 AM
Subject: Re: Pairwise comparison between columns, logic
Hi Arun,
I ran the script on a larger dataset and I seem to be running into the following error:
Error: All inputs to rbind.fill must be data.frames
after the step:
lst2<-lapply(split(mat1,col(mat1)),function(x) {x1<-join_all(lst1[x],by="firehose_patient_id",type="inner");x1["firehose_patient_id"]})
I tried a few things to solve the issue but I am not able to. The format of input files and data are same as in the code you posted.
Could you suggest something?
I have attached the input files on which I am trying to run the code, along with my code, to which I have made minor changes.
2. I have another question. Using your code, how do I also capture those pairs of names that do not have any common patient id?
Thanks again,
-M
On Fri, Jul 26, 2013 at 9:29 AM, arun <smartpink111 at yahoo.com> wrote:
>Hi M,
>No problem.
>Regards,
>Arun
>
>
>
>
>----- Original Message -----
>From: "manishabh77 at gmail.com" <manishabh77 at gmail.com>
>To: smartpink111 at yahoo.com
>Cc:
>Sent: Friday, July 26, 2013 9:27 AM
>Subject: Re: Pairwise comparison between columns, logic
>
>Thanks for the code. It is elegant and does what I need. Learnt some new things.
>-M
>
>
>_____________________________________
>Sent from http://r.789695.n4.nabble.com
>