[R] How do I make this faster?

Mon Apr 11 11:13:12 CEST 2011

On 04/11/2011 10:28 AM, Andreas Borg wrote:
> Hi Hasan,
>
> I'd be happy to help you, but I am not able to run your code. You use
> commandArgs to retrieve arguments of the R program, but which ones do
> you actually provide?
>
> Best regards,
>
> Andreas
>
> Hasan Diwan schrieb:
>> I was on vacation the last week and wrote some code to run a 500-day
>> correlation between the Nasdaq tracking stock (QQQ) and 191 currency
>> pairs. The initial run took 9 hours(!) and I'd like to make it faster.
>> So, I'm including my code below, in hopes that somebody will be able
>> to figure out how to make it faster, either through parallelisation or
>> by making changes. I've marked the places where Rprof showed me it was
>> slowing down:
>> currencyCorrelation <- function(lagtime = 1) {
>>   require(quantmod)
>>
>>   dataTrack <- getSymbols(commandArgs(trailingOnly=T)[1],
>>                           from='2009-11-21', to='2011-04-03')
>>   stockData <- get(dataTrack)
>>   currencies <- row.names(oanda.currencies[grep(pattern='oz.', fixed=T,
>>     x=as.vector(oanda.currencies$oanda.df.1.length.oanda.df...2....1.)) == F])
>>   correlations <- vector()
>>   values <- list()
>>   # optimise these loops using the apply family
>>   for (i in currencies) {
>>     for (j in currencies) {
>>       if (i == j) next()
>>       fx <- getFX(paste(i, j, sep='/'), from='2009-11-20', to='2011-04-02')
>>       # Prepare data by getting rates for market days only
>>       fx <- get(fx)
>>       fx <- fx[which(index(fx) %in% index(QQQ$QQQ.Close))]
>>       correlation <- cor(fx, QQQ$QQQ.Close)
>>       correlations <- c(correlations, correlation)
In this piece of code you concatenate correlation onto correlations.
Because the vector grows on every iteration, R has to allocate a new,
larger block of memory and copy the old contents each time.
Preallocating the space you need (a rough overestimate is also fine)
will make this much faster. Instead of creating zero-length objects for
'correlations' and 'values' before the start of the loop, create them at
the desired length and assign values by index inside the loop rather
than concatenating. This could possibly speed up your code by several
orders of magnitude.
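
A minimal sketch of what I mean, using simulated data so it runs without
a network connection. The three currency codes, the 'ref' series and the
counter 'k' are made up for illustration; in your code the simulated line
would be the getFX call and 'ref' would be the stock's closing prices.
With the i == j case skipped, the final length is n * (n - 1) for n
currencies.

```r
currencies <- c("EUR", "USD", "JPY")   # hypothetical subset, for illustration
n_pairs <- length(currencies) * (length(currencies) - 1)  # ordered pairs, i != j

# Allocate both result vectors at their final length before the loop.
correlations <- rep(NA_real_, n_pairs)
values <- character(n_pairs)

set.seed(42)
ref <- rnorm(50)   # simulated stand-in for the closing prices

k <- 0             # index of the next free slot
for (i in currencies) {
  for (j in currencies) {
    if (i == j) next
    k <- k + 1
    fx <- ref + rnorm(50)              # simulated stand-in for the FX series
    correlations[k] <- cor(fx, ref)    # assign by index instead of c()
    values[k] <- paste(paste(i, j, sep = "/"), correlations[k], sep = ",")
  }
}
```

Slots that are never filled stay NA (or "" in 'values'), so an
overestimate of the length can simply be trimmed after the loop.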

cheers,
Paul
>>       string <- paste(paste(i,j,sep='/'), correlation, sep=',')
>>       values <- c(values,paste(string,'\n', sep=''))
>>     }
>>   }
>>   # TODO eliminate NA's
>>   values <- values[!is.na(correlations)]
>>   correlations <- correlations[is.na(correlations) == F]
>>   values <- values[order(correlations, decreasing=T)]
>>   write.table(values, file=commandArgs(trailingOnly=T)[2], sep='',
>>               qmethod=NULL, quote = F, row.names=F, col.names=F)
>>   rm('currencies', 'correlations', 'values', 'fx', 'string')
>>   return()
>> }
>> lagtime <- as.integer(commandArgs(trailingOnly=T)[3])
>> if (is.na(lagtime)) lagtime <- 1
>> print(paste(Sys.time(), '<--- starting', lagtime,
>>             'day lag currencies correlation with',
>>             commandArgs(trailingOnly=T)[1],
>>             'from 2009-11-20 to 2011-04-03'))
>> currencyCorrelation(lagtime)
>> print(paste(Sys.time(), '<--- ended, results in',
>>             commandArgs(trailingOnly=T)[2]))
>>
>>
>>   
>
>


-- 
Paul Hiemstra, MSc
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770