[R] new question

Sat Mar 16 04:09:14 CET 2013

Hi,
Try this:

directory<- "/home/arunksa111/dados" 
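#modified the function
# Returns list(entries of `directory`, full paths of the matching PepInfo files)
GetFileList <- function(directory, number){
  setwd(directory)   # later output (e.g. the PDF) is written here
  filelist1 <- dir()
  # the pattern is a regular expression, so escape the dot and anchor the end
  lista <- dir(directory, pattern = paste0("MSMS_", number, "PepInfo\\.txt$"), full.names = TRUE, recursive = TRUE)
  output <- list(filelist1, lista)
  return(output)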
    }
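
To see what dir() returns with pattern, full.names and recursive, here is a minimal sketch on a throwaway folder tree. The folder "dados_demo" and the files created in it are made up for illustration; they are not the real data.

```r
# Build a small temporary folder tree (hypothetical names, not the real data)
root <- file.path(tempdir(), "dados_demo")
for (f in c("a1", "c1")) {
  dir.create(file.path(root, f), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, f, "MSMS_23PepInfo.txt"))
  file.create(file.path(root, f, "MSMS_24PepInfo.txt"))
}

# pattern is a regular expression; recursive = TRUE also searches the subfolders
hits <- dir(root, pattern = "MSMS_23PepInfo\\.txt$", full.names = TRUE, recursive = TRUE)

# folder each match came from
basename(dirname(hits))
```

With full.names = TRUE the paths can be passed straight to read.table(), whatever the working directory is.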

res <- GetFileList(directory, 23)   # call once and reuse both parts
file.list.names <- res[[1]]
lista <- res[[2]]
FacGroup<-c(0,1,0,2,2,0,3)

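# Reads the files whose FacGroup entry is non-zero; `files` and `folder.names`
# default to the objects created above
ReadDir <- function(FacGroup, files = lista, folder.names = file.list.names){
  keep <- FacGroup != 0
  read.list <- lapply(files[keep], function(x) read.table(x, header = TRUE, sep = "\t"))
  names(read.list) <- folder.names[keep]
  return(read.list)
}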
ListFacGroup<-ReadDir(FacGroup)
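
A small sketch of how the FacGroup != 0 selection picks the folders, assuming the seven folder names shown earlier in the thread. The paths are placeholders and nothing is read from disk.

```r
# Placeholder folder names and paths (assumed for illustration only)
folders  <- c("a1", "a2", "c1", "c2", "c3", "t1", "t2")
paths    <- file.path("dados", folders, "MSMS_23PepInfo.txt")
FacGroup <- c(0, 1, 0, 2, 2, 0, 3)

# A non-zero entry keeps the folder at that position; 0 drops it
keep     <- FacGroup != 0
selected <- setNames(paths[keep], folders[keep])

names(selected)   # folders that would be read
FacGroup[keep]    # group code of each selected folder
```

FacGroup must have one entry per folder, in the same order as dir() returns them.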

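z.boxplot <- function(lst){
  # keep only the rows with FDR < 0.01 in each data frame
  new.list <- lapply(lst, function(x) x[x$FDR < 0.01, ])
  pdf("VeraBP.pdf")
  on.exit(dev.off())   # close the PDF even if a plot fails
  # one boxplot of FDR by charge (z) per folder, titled with the folder name
  for (nm in names(new.list)) {
    boxplot(FDR ~ z, data = new.list[[nm]], xlab = "Charge", ylab = "FDR", main = nm)
  }
  invisible(new.list)
}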
z.boxplot(ListFacGroup)
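
If you want to try the filter-and-plot step without the real files, here is a rough sketch on made-up tables. The values, the list names and the output file "toyBP.pdf" are assumptions; only the column names z and FDR come from your data.

```r
# Made-up tables standing in for two PepInfo files (values are invented)
toy <- list(
  a2 = data.frame(z = c(2, 2, 3, 3, 2, 3, 2),
                  FDR = c(0.001, 0.004, 0.02, 0.008, 0.03, 0.002, NA)),
  c2 = data.frame(z = c(2, 3, 3, 4),
                  FDR = c(0.009, 0.5, 0.003, 0.0005))
)

# Keep rows with FDR below 0.01; which() also drops rows where FDR is NA
kept <- lapply(toy, function(x) x[which(x$FDR < 0.01), ])

# One page per table, written to a temporary PDF
out <- file.path(tempdir(), "toyBP.pdf")
pdf(out)
for (nm in names(kept)) {
  boxplot(FDR ~ z, data = kept[[nm]], xlab = "Charge", ylab = "FDR", main = nm)
}
dev.off()
```

Using which() is a choice in this sketch: plain logical indexing, as in z.boxplot above, would keep an all-NA row for every missing FDR.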

A.K.

________________________________
From: Vera Costa <veracosta.rt at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Friday, March 15, 2013 2:08 PM
Subject: Re: new question

Sorry, could you give me a little more help?

Using the same data, I need a boxplot by groups.

Below are the functions I'm using. The last one (z.boxplot) is what I need help with; the others are OK. Thank you once more.

GetFileList <- function(directory,number){
 setwd(directory)
 filelist1<-dir()[file.info(dir())$isdir]
    direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
 direct<-lapply(direct,function(x) paste(directory,"/",x,sep=""))
    lista<-unlist(direct)
 output<- list(filelist1,lista)
 return(output)
    }

ReadDir<-function(FacGroup){
 list.new<-lista[FacGroup!=0]
 read.list<-lapply(list.new, function(x) read.table(x,header=TRUE, sep = "\t"))
 names(read.list)<-file.list.names[FacGroup!=0]
 return (read.list)
} 

directory<-"C:/Users/Vera Costa/Desktop/dados.lixo"
 file.list.names<-GetFileList(directory,23) [[1]]
 lista<-GetFileList(directory,23) [[2]]
FacGroup<-c(0,1,0,2,2,0,3)
ListFacGroup<-ReadDir(FacGroup)
#zPValues(ListFacGroup,FacGroup) 

z.boxplot <- function(lista) {
#I need eliminate all data with FDR<0.01
new.list<-lista[FDR<0.01]
#boxplots split by groups 
boxplot(FDR ~ z, data = dct1,  xlab = "Charge", ylab = "FDR",main=(paste("t",i)))
 }
z.boxplot(ListFacGroup)

2013/3/13 Vera Costa <veracosta.rt at gmail.com>

No problem!
>Sorry for my questions.
>
>
>
>2013/3/13 arun <smartpink111 at yahoo.com>
>
>As I mentioned earlier, I don't find it useful to do ANOVA on that kind of data. Previously, I also tried chisq.test. It gave warnings, and you then responded that it was not correct. I would suggest that you dput() an example dataset of the specific columns that you want to compare (possibly by row) and post it to the R-help list. If you get a reply, you can then apply it to your whole list of files. Sorry, I am busy today.
>>
>>________________________________
>>From: Vera Costa <veracosta.rt at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Wednesday, March 13, 2013 9:43 AM
>>
>>Subject: Re: new question
>>
>>
>>Ok. Thank you.
>>Could you help me to apply this?
>>
>>
>>
>>2013/3/13 arun <smartpink111 at yahoo.com>
>>
>>You are comparing one data point to another, which doesn't make sense. For ANOVA, you need replication to calculate the degrees of freedom (df). Maybe you could try chisq.test.
>>>
>>>________________________________
>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>To: arun <smartpink111 at yahoo.com>
>>>Sent: Wednesday, March 13, 2013 8:56 AM
>>>
>>>Subject: Re: new question
>>>
>>>
>>>I agree with you.
>>>
>>>I wrote these tests because I need to compare against some test. I agree it is not very correct, but what is Bioconductor? I need to eliminate some data (rows) that are not very significant, based on some statistic. What is your idea? How can I do this?
>>>
>>>
>>>
>>>2013/3/13 arun <smartpink111 at yahoo.com>
>>>
>>>Ok.
>>>>
>>>>"
>>>>
>>>>I need a t test (it's in this function). But I also need a corrected chisq.test and an ANOVA on the attached data.
>>>>"
>>>>What do you mean by this?
>>>>
>>>>Though I calculated the t test by comparing a single value against another for each row, I don't think it makes sense statistically. Here, you are estimating each mean from just one value, which then is the mean, and comparing it with another value. I think there are some Bioconductor packages that do this kind of comparison (I don't remember the names). Also, I am not sure what kind of inference you want from the chi-square test, or from the ANOVA (using just 2 data points?) if the comparison is row-wise.
>>>>
>>>>
>>>>
>>>>
>>>>________________________________
>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>To: arun <smartpink111 at yahoo.com>
>>>>Sent: Tuesday, March 12, 2013 6:04 PM
>>>>
>>>>Subject: Re: new question
>>>>
>>>>
>>>>Ok. It isn't the last code...
>>>>You sent me this code
>>>>
>>>>directory<- "/home/arunksa111/data.new"
>>>>#first function
>>>>filelist<-function(directory,number,list1){
>>>>setwd(directory)
>>>>filelist1<-dir(directory)
>>>>
>>>>direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>
>>>>list1<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t",stringsAsFactors=FALSE))
>>>>names(list1)<-filelist1
>>>>list2<- list(filelist1,list1)
>>>>return(list2)
>>>>}
>>>>foldernames1<-filelist(directory,23,list1)[[1]]
>>>>foldernames1
>>>>#[1] "a1" "c1" "c2" "c3" "t1" "t2"
>>>>lista<-filelist(directory,23,list1)[[2]] #lista output
>>>>
>>>>FacGroup<- c("c1","c3","t2")
>>>>
>>>>#Second function
>>>>f<-function(listRes,Toselect){
>>>>res2<-split(listRes,gsub("[0-9]","",names(listRes)))
>>>>res3<-lapply(seq_along(res2),function(i) lapply(res2[[i]],function(x) x[x[["FDR"]]<0.01,c("Seq","Mod","z","spec")]))
>>>>res4<-lapply(res3,function(x) x[names(x)[names(x)%in%Toselect]])
>>>>res4New<- lapply(res4,function(x) lapply(names(x), function(i) do.call(rbind,lapply(x[i],function(x) cbind(folder_name=i,x))) ))
>>>>library(plyr)
>>>>library(data.table)
>>>>res5<-lapply(res4New,function(x) lapply(x,function(x1){ x1<- data.table(x1);x1[,spec:=paste(spec,collapse=","),by=c("Seq","Mod","z")]}))
>>>>res6<- lapply(res5,function(x) lapply(x,function(x1) {x1$counts<-sapply(x1$spec, function(x2) length(gsub("\\s", "", unlist(strsplit(x2, ",")))));x3<-as.data.frame(x1);names(x3)[6]<- as.character(unique(x3$folder_name));x3[,-c(1,5)]}))
>>>>
>>>>res7<-lapply(res6,function(x) Reduce(function(...) merge(...,by=c("Seq","Mod","z"),all=TRUE),x))
>>>> res8<-res7[lapply(res7,length)!=0]
>>>> res9<- Reduce(function(...) merge(...,by=c("Seq","Mod","z"),all=TRUE),res8)
>>>>res9[is.na(res9)] <- 0
>>>>return(res9)
>>>>}
>>>>
>>>>f(lista,FacGroup)
>>>> head(f(lista,FacGroup))
>>>> #                    Seq        Mod z c1 c3 t2
>>>>#1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2  0  0  1
>>>>#2  aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2  0  0  1
>>>>#3       aAAAAAAAAAGAAGGR 1-n_acPro/ 2  0  0  1
>>>>#4  aAAAAAAAGAAGGRGSGPGRR 1-n_acPro/ 2  1  0  0
>>>>#5            AAAAAAALQAK            2  0  1  1
>>>>#6         aAAAAAGAGPEMVR 1-n_acPro/ 2  0  0  2
>>>>
>>>>resCounts<- f(lista,FacGroup)
>>>>t.test.p.value <- function(...) {
>>>>    obj<-try(t.test(...), silent=TRUE)
>>>>    if (is(obj, "try-error")) return(NA) else return(obj$p.value)
>>>> }
>>>>
>>>>#3rd function for p-value
>>>>fpv<- function(Countdata){
>>>>resNew<-do.call(cbind,lapply(split(names(Countdata)[4:ncol(Countdata)],gsub("[0-9]","",names(Countdata)[4:ncol(Countdata)])), function(i) {x<-if(ncol(Countdata[i])>1) rowSums(Countdata[i]) else Countdata[i]; colnames(x)<-NULL;x}))
>>>>indx<-combn(names(resNew),2)
>>>>resPval<-do.call(cbind,lapply(seq_len(ncol(indx)),function(i) {x<-as.data.frame(apply(resNew[,indx[,i]],1,t.test.p.value)); colnames(x)<-paste("Pvalue",paste(indx[,i],collapse=""),sep="_");x}))
>>>>resF<-cbind(resCounts,resPval)
>>>>resF
>>>>}
>>>>
>>>>fpv(resCounts)
>>>>
>>>>
>>>>I need a t test (it's in this function). But I also need a corrected chisq.test and an ANOVA on the attached data.
>>>>Sorry!
>>>>
>>>>
>>>>On 12 Mar 2013 at 20:08, "arun" <smartpink111 at yahoo.com> wrote:
>>>>
>>>>Where is the "t-test above" that you refer to?
>>>>>
>>>>>Which dataset do you want to do this on?
>>>>>
>>>>>________________________________
>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>Sent: Tuesday, March 12, 2013 1:50 PM
>>>>>Subject: Re: new question
>>>>>
>>>>>
>>>>>Hi.
>>>>>
>>>>>Could I ask for a little help?
>>>>>
>>>>>Could you help me to do a chisq.test (corrected) and an ANOVA, like the t-test above? After that I need to remove all data with p-values < 0.05.
>>>>>
>>>>>Sorry and thank you again
>>>>>
>>>>>
>>>>>
>>>>>2013/3/7 arun <smartpink111 at yahoo.com>
>>>>>
>>>>>Hi,
>>>>>>
>>>>>>
>>>>>>directory<- "/home/arunksa111/dados" #renamed directory to dados
>>>>>>
>>>>>>filelist<-function(directory,number,list1){
>>>>>>setwd(directory)
>>>>>>filelist1<-dir(directory)
>>>>>>direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>list1<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t",stringsAsFactors=FALSE))
>>>>>>names(list1)<-filelist1
>>>>>>list2<- list(filelist1,list1)
>>>>>>return(list2)
>>>>>>}
>>>>>>foldernames1<-filelist(directory,23,list1)[[1]]
>>>>>>foldernames1
>>>>>>#[1] "a1" "a2" "c1" "c2" "c3" "t1" "t2"
>>>>>>
>>>>>>lista<-filelist(directory,23,list1)[[2]] #lista output 
>>>>>>
>>>>>>#If you look at the column classes:
>>>>>> lapply(lista,function(x) sapply(x,class)) #some spec were integer, and some were character
>>>>>>#so convert spec to character in every table:
>>>>>> listaNew<-lapply(lista,function(x) within(x,{spec<- as.character(spec)}))
>>>>>>
>>>>>>
>>>>>>FacGroup<- c("c1","c3","t2")
>>>>>>#Second function
>>>>>>#f<- function(....)
>>>>>>
>>>>>>head(f(listaNew,FacGroup))
>>>>>>
>>>>>>#                     Seq        Mod z c1 c3 t2
>>>>>>#1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2  0  0  1
>>>>>>#2  aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2  0  0  1
>>>>>>#3       aAAAAAAAAAGAAGGR 1-n_acPro/ 2  0  0  1
>>>>>>#4  aAAAAAAAGAAGGRGSGPGRR 1-n_acPro/ 2  1  0  0
>>>>>>#5            AAAAAAALQAK            2  0  1  1
>>>>>>#6         aAAAAAGAGPEMVR 1-n_acPro/ 2  0  0  2
>>>>>>
>>>>>>
>>>>>>
>>>>>>A.K.
>>>>>>
>>>>>>
>>>>>>
>>>>>>________________________________
>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>
>>>>>>Sent: Thursday, March 7, 2013 7:12 AM
>>>>>>Subject: Re: new question
>>>>>>
>>>>>>
>>>>>>
>>>>>>Hi.
>>>>>>
>>>>>>Sorry to ask about this again, but when I run this code I get this error:
>>>>>>
>>>>>>Error in `[.data.table`(x1, , `:=`(spec, paste(spec, collapse = ",")),  :
>>>>>>  Type of RHS ('character') must match LHS ('integer'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
>>>>>>
>>>>>>Could you help me with this? How can I get rid of it?
>>>>>>
>>>>>>Thank you
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>2013/2/28 arun <smartpink111 at yahoo.com>
>>>>>>
>>>>>>Hi,
>>>>>>>directory<- "/home/arunksa111/data.new"
>>>>>>>#first function
>>>>>>>filelist<-function(directory,number,list1){
>>>>>>>setwd(directory)
>>>>>>>filelist1<-dir(directory)
>>>>>>>
>>>>>>>direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>>list1<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t",stringsAsFactors=FALSE))
>>>>>>>names(list1)<-filelist1
>>>>>>>list2<- list(filelist1,list1)
>>>>>>>return(list2)
>>>>>>>}
>>>>>>>foldernames1<-filelist(directory,23,list1)[[1]]
>>>>>>>foldernames1
>>>>>>>#[1] "a1" "c1" "c2" "c3" "t1" "t2"
>>>>>>>lista<-filelist(directory,23,list1)[[2]] #lista output
>>>>>>>
>>>>>>>FacGroup<- c("c1","c3","t2")
>>>>>>>
>>>>>>>#Second function
>>>>>>>f<-function(listRes,Toselect){
>>>>>>>res2<-split(listRes,gsub("[0-9]","",names(listRes)))
>>>>>>>res3<-lapply(seq_along(res2),function(i) lapply(res2[[i]],function(x) x[x[["FDR"]]<0.01,c("Seq","Mod","z","spec")]))
>>>>>>>res4<-lapply(res3,function(x) x[names(x)[names(x)%in%Toselect]])
>>>>>>>res4New<- lapply(res4,function(x) lapply(names(x), function(i) do.call(rbind,lapply(x[i],function(x) cbind(folder_name=i,x))) ))
>>>>>>>library(plyr)
>>>>>>>library(data.table)
>>>>>>>res5<-lapply(res4New,function(x) lapply(x,function(x1){ x1<- data.table(x1);x1[,spec:=paste(spec,collapse=","),by=c("Seq","Mod","z")]}))
>>>>>>>res6<- lapply(res5,function(x) lapply(x,function(x1) {x1$counts<-sapply(x1$spec, function(x2) length(gsub("\\s", "", unlist(strsplit(x2, ",")))));x3<-as.data.frame(x1);names(x3)[6]<- as.character(unique(x3$folder_name));x3[,-c(1,5)]}))
>>>>>>> 
>>>>>>>res7<-lapply(res6,function(x) Reduce(function(...) merge(...,by=c("Seq","Mod","z"),all=TRUE),x))
>>>>>>> res8<-res7[lapply(res7,length)!=0]
>>>>>>> res9<- Reduce(function(...) merge(...,by=c("Seq","Mod","z"),all=TRUE),res8)
>>>>>>>res9[is.na(res9)] <- 0
>>>>>>>return(res9)
>>>>>>>}
>>>>>>>
>>>>>>>f(lista,FacGroup)
>>>>>>> head(f(lista,FacGroup))
>>>>>>> #                    Seq        Mod z c1 c3 t2
>>>>>>>#1 aAAAAAAAAAAAAAATATAGPR 1-n_acPro/ 2  0  0  1
>>>>>>>#2  aAAAAAAAAAAASSPVGVGQR 1-n_acPro/ 2  0  0  1
>>>>>>>#3       aAAAAAAAAAGAAGGR 1-n_acPro/ 2  0  0  1
>>>>>>>#4  aAAAAAAAGAAGGRGSGPGRR 1-n_acPro/ 2  1  0  0
>>>>>>>#5            AAAAAAALQAK            2  0  1  1
>>>>>>>#6         aAAAAAGAGPEMVR 1-n_acPro/ 2  0  0  2
>>>>>>>
>>>>>>>resCounts<- f(lista,FacGroup)
>>>>>>>t.test.p.value <- function(...) {
>>>>>>>    obj<-try(t.test(...), silent=TRUE)
>>>>>>>    if (is(obj, "try-error")) return(NA) else return(obj$p.value)
>>>>>>> }
>>>>>>>
>>>>>>>#3rd function for p-value
>>>>>>>fpv<- function(Countdata){
>>>>>>>resNew<-do.call(cbind,lapply(split(names(Countdata)[4:ncol(Countdata)],gsub("[0-9]","",names(Countdata)[4:ncol(Countdata)])), function(i) {x<-if(ncol(Countdata[i])>1) rowSums(Countdata[i]) else Countdata[i]; colnames(x)<-NULL;x}))
>>>>>>>indx<-combn(names(resNew),2)
>>>>>>>resPval<-do.call(cbind,lapply(seq_len(ncol(indx)),function(i) {x<-as.data.frame(apply(resNew[,indx[,i]],1,t.test.p.value)); colnames(x)<-paste("Pvalue",paste(indx[,i],collapse=""),sep="_");x}))
>>>>>>>resF<-cbind(resCounts,resPval)
>>>>>>>resF
>>>>>>>}
>>>>>>>
>>>>>>>fpv(resCounts)
>>>>>>>
>>>>>>>
>>>>>>>A.K.
>>>>>>>
>>>>>>>________________________________
>>>>>>>From: Vera Costa <veracosta.rt at gmail.com>
>>>>>>>To: arun <smartpink111 at yahoo.com>
>>>>>>>Sent: Thursday, February 28, 2013 11:30 AM
>>>>>>>Subject: new question
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Sorry about my question, but I need one more small thing... I need to split my function into one that reads the data and one that processes it.
>>>>>>>
>>>>>>>First I need to get the names of the files and read the data, and then a new function does my analysis.
>>>>>>>
>>>>>>>So, I did this:
>>>>>>>
>>>>>>>directory<-"C:/Users/Vera Costa/Desktop/data.new" 
>>>>>>>filelist<-function(directory,number){
>>>>>>>setwd(directory)
>>>>>>>filelist<-dir(directory)
>>>>>>>return(filelist)
>>>>>>>direct<-dir(directory,pattern = paste("MSMS_",number,"PepInfo.txt",sep=""), full.names = FALSE, recursive = TRUE)
>>>>>>>lista<-lapply(direct, function(x) read.table(x,header=TRUE, sep = "\t"))
>>>>>>>names(lista)<-filelist
>>>>>>>return(lista)
>>>>>>>}
>>>>>>>filelist(directory,23)
>>>>>>>
>>>>>>>
>>>>>>>###"a1" "a2" "c1" "c2" "c3" "t1" "t2"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>and after
>>>>>>>
>>>>>>>f<-function(filelist,FacGroup){
>>>>>>>
>>>>>>>res2<-split(lista,names(lista))
>>>>>>> res3<- lapply(res2,function(x) {names(x)<-paste(gsub(".*_","",names(x)),1:length(x),sep="");x})
>>>>>>>res3
>>>>>>>#Freq FDR<0.01
>>>>>>> res4<-lapply(seq_along(res3),function(i) lapply(res3[[i]],function(x) x[x[["FDR"]]<0.01,c("Seq","Mod","z","spec")]))
>>>>>>> names(res4)<- names(res2)
>>>>>>> res4
>>>>>>>  res4New<-lapply(res4,function(x) lapply(names(x),function(i) do.call(rbind,lapply(x[i],function(x) cbind(folder_name=i,x))) ))
>>>>>>> res5<- lapply(res4New,function(x) if(length(x)>1) tail(x,-1) else NULL)
>>>>>>> library(plyr)
>>>>>>> library(data.table)
>>>>>>> res6<- lapply(res5,function(x) lapply(x,function(x1) {x1<-data.table(x1); x1[,spec:=past
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>How can I use lista in the second function? Could you help me?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: VeraBP.pdf
Type: application/pdf
Size: 7177 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130315/b6134bc9/attachment.pdf>