[R] speed up process

Ivan Calandra ivan.calandra at uni-hamburg.de
Fri Feb 25 12:57:21 CET 2011


Dear Jim,

I've tried to use Rprof() as you advised me, but I don't understand how 
it works.
I've done this:
Rprof(for (i in seq_along(seq.yvar)){
   all_my_commands
})
summaryRprof()

But I got this error:
Error in summaryRprof() : no lines found in ‘Rprof.out’

I couldn't really understand from the help page what I should do.
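
For reference, a minimal sketch of how Rprof() is usually driven. Its first argument is the name of the output file, not the expression to profile: profiling is switched on with Rprof(file), the code of interest is run, and profiling is switched off with Rprof(NULL). In the call above, as far as I can tell, the loop is evaluated as the file-name argument and returns NULL, which turns profiling off, so nothing gets recorded. The workload slow_step() and the name prof_file are placeholders for illustration only, not part of the original script.

```r
# Placeholder workload standing in for the body of the real loop.
slow_step <- function(n = 300) {
  x <- matrix(rnorm(n * n), n)
  solve(crossprod(x) + diag(n))
}

prof_file <- tempfile(fileext = ".out")  # hypothetical output file
Rprof(prof_file, interval = 0.01)        # start sampling the call stack
for (i in 1:50) {
  res <- slow_step()
}
Rprof(NULL)                              # stop profiling before summarising

prof <- summaryRprof(prof_file)
head(prof$by.self)                       # functions ranked by time spent in their own code
```

If the loop finishes faster than the sampling interval, no samples are written and summaryRprof() fails with the same "no lines found" error, so the profiled code needs to run long enough to be sampled.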

In any case, I am sure that the function tstsreg() is what takes the 
most computing time, but I wanted to optimize the rest of the code to 
gain as much speed as possible.
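
One way to check how much the rest of the code could gain at most is to time the expensive call on its own with system.time() and compare it with the whole step. A rough sketch, in which slow_fit() is a made-up stand-in for tstsreg() (so that it runs without the WRS package) and the data are simulated:

```r
set.seed(1)
x <- runif(50)
y <- 2 * x + rnorm(50)

# Hypothetical stand-in for an expensive regression such as tstsreg()
slow_fit <- function(x, y, reps = 300) {
  for (b in seq_len(reps)) fit <- lm(y ~ x)
  coef(fit)
}

# Time the expensive call alone ...
t_fit <- system.time(cf <- slow_fit(x, y))[["elapsed"]]

# ... and the same call plus the surrounding bookkeeping
t_all <- system.time({
  cf  <- slow_fit(x, y)
  lab <- paste("Y=", signif(cf[1], 3), "+", signif(cf[2], 3), "X", sep = "")
})[["elapsed"]]

c(fit_only = t_fit, whole_step = t_all)
```

If the two numbers are close, the remaining code has little left to give, whichever looping construct is used.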

Ivan

On 2/25/2011 12:30, Jim Holtman wrote:
> Use Rprof to find where the time is being spent. It is probably in 'plot', which would imply that the 'for' loop is not the bottleneck and that the cost is therefore beyond your control.
>
> On Feb 25, 2011, at 6:19, Ivan Calandra <ivan.calandra at uni-hamburg.de> wrote:
>
>> Thanks Nick for your quick answer.
>> It does work (no missed bracket!) but unfortunately doesn't really speed anything up: with my real data, it takes 82.78 s with the double lapply() instead of 83.59 s with the double loop (a gain of about 0.8 s).
>>
>> It looks like my double loop was not that bad. Does anyone know another faster way to do this?
>>
>> Thanks again in advance,
>> Ivan
>>
>> On 2/25/2011 11:41, Nick Sabbe wrote:
>>> Simply avoid the for loops by using lapply (I may have missed a bracket
>>> here or there because I did this without opening R).
>>> I haven't checked the speed-up, though.
>>>
>>> lapply(seq.yvar, function(k){
>>>     plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
>>> xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
>>>     lapply(seq_along(mydata_list), function(j){
>>>       foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
>>> pos=mypos[j], name.dat=names(mydata_list)[j])
>>>       return(NULL)
>>>     })
>>>     invisible(NULL)
>>> })
>>>
>>> HTH,
>>>
>>> Nick Sabbe
>>> --
>>> ping: nick.sabbe at ugent.be
>>> link: http://biomath.ugent.be
>>> wink: A1.056, Coupure Links 653, 9000 Gent
>>> ring: 09/264.59.36
>>>
>>> -- Do Not Disapprove
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>>> Behalf Of Ivan Calandra
>>> Sent: Friday, 25 February 2011 11:20
>>> To: r-help
>>> Subject: [R] speed up process
>>>
>>> Dear users,
>>>
>>> I have a double for loop that does exactly what I want, but it is quite
>>> slow. This is hardly noticeable with this simplified example, but it
>>> becomes a problem with my real data. Can anyone help me improve it?
>>>
>>> The data and code for foo_reg() are available at the end of the email; I
>>> preferred going directly into the problematic part.
>>> Here is the code (I tried to simplify it but I cannot do it too much or
>>> else it wouldn't represent my problem). It might also look too complex
>>> for what it is intended to do, but my colleagues who are also supposed
>>> to use it don't know much about R. So I wrote it so that they don't have
>>> to modify the critical parts to run the script for their needs.
>>>
>>> #column indexes for function
>>> ind.xvar<- 2
>>> seq.yvar<- 3:4
>>> #position vector for legend(), stupid positioning but it doesn't matter here
>>> mypos<- c("topleft", "topright","bottomleft")
>>>
>>> #run the function for columns 3&4 as y (seq.yvar) with column 2 as x
>>> #(ind.xvar) for all 3 datasets (mydata_list)
>>> par(mfrow=c(2,1))
>>> for (i in seq_along(seq.yvar)){
>>>     k<- seq.yvar[i]
>>>     plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
>>> xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
>>>     for (j in seq_along(mydata_list)){
>>>       foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
>>> pos=mypos[j], name.dat=names(mydata_list)[j])
>>>     }
>>> }
>>>
>>> I tried with lapply() or mapply() but couldn't manage to pass the
>>> arguments for names() and col= correctly, e.g. for the 2nd loop:
>>> lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
>>> yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
>>> mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
>>> mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))
>>>
>>> Thanks in advance for any hints.
>>> Ivan
>>>
>>>
>>>
>>>
>>> #create data (it looks horrible with these datasets but it doesn't
>>> #matter here)
>>> mydata1<- structure(list(species = structure(1:8, .Label = c("alsen",
>>> "gogor", "loalb", "mafas", "pacyn", "patro", "poabe", "thgel"), class =
>>> "factor"), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc =
>>> c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
>>> 119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483,
>>> 43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651,
>>> 50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names =
>>> c("species", "fruit", "Asfc", "Tfv"), row.names = c(NA, 8L), class =
>>> "data.frame")
>>>
>>> mydata2<- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
>>> mydata3<- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
>>> mydata_list<- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)
>>>
>>> #function for regression
>>> library(WRS)
>>> foo_reg<- function(dat, xvar, yvar, mycol, pos, name.dat){
>>>    tsts<- tstsreg(dat[[xvar]], dat[[yvar]])
>>>    tsts_inter<- signif(tsts$coef[1], digits=3)
>>>    tsts_slope<- signif(tsts$coef[2], digits=3)
>>>    abline(tsts$coef, lty=1, col=mycol)
>>>    legend(x=pos, lty=1, col=mycol,
>>>      legend=c(paste("TSTS ", name.dat, ": Y=", tsts_inter, "+", tsts_slope, "X", sep="")))
>>> }
>>>
>> -- 
>> Ivan CALANDRA
>> PhD Student
>> University of Hamburg
>> Biozentrum Grindel und Zoologisches Museum
>> Abt. Säugetiere
>> Martin-Luther-King-Platz 3
>> D-20146 Hamburg, GERMANY
>> +49(0)40 42838 6231
>> ivan.calandra at uni-hamburg.de
>>
>> **********
>> http://www.for771.uni-bonn.de
>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php


