[R] Aggregating subsets of data in a larger vector with sapply

Jim Holtman jholtman at gmail.com
Mon Jan 10 02:50:23 CET 2011


Split the data by truncating the time to the second, then process each group; this avoids the subsetting you are doing on every iteration. Also, merge the data so that Direction and Size are in the same frame. It looks like you can subset by "buy" to begin with.
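A minimal base-R sketch of that advice, assuming the data have already been merged into one data frame. The names `trades`, `time`, `Direction` and `Size` are hypothetical, and the sample rows are copied from the head() output in the question. The buys are subset first, each timestamp is truncated to the second, and the sizes are summed per second with `tapply()`, so there is no per-second subsetting loop.

```r
# Hypothetical sample built from the head() output in the question
trades <- data.frame(
  time = as.POSIXct(c("2011-01-05 09:30:00", "2011-01-05 09:30:02",
                      "2011-01-05 09:30:02", "2011-01-05 09:30:04",
                      "2011-01-05 09:30:04", "2011-01-05 09:30:04"),
                    tz = "UTC"),
  Direction = c("unkn", "sell", "buy", "buy", "buy", "buy"),
  Size = c(" 865", " 100", " 100", " 100", " 100", "  41"),
  stringsAsFactors = FALSE
)

# Subset by "buy" first, then convert the padded strings to numbers
buys <- trades[trades$Direction == "buy", ]
buys$Size <- as.numeric(buys$Size)

# Truncate each timestamp to the whole second (%S drops fractions)
sec <- format(buys$time, "%Y-%m-%d %H:%M:%S")

# One vectorised pass: total buy size for each second
buySize <- tapply(buys$Size, sec, sum)
print(buySize)
```

Because the buys are filtered out before grouping, seconds that contain no buys do not appear in the result at all, whereas the original sapply() loop returns 0 for them.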


On Jan 9, 2011, at 19:10, rivercode <aquanyc at gmail.com> wrote:

> 
> 
> I have 40,000 rows of buy/sell trade data and am trying to add up the buy sizes
> for each second. The code works, but it is very slow. Any suggestions on how to
> improve the sapply function?
> 
> library(xts)  # endpoints() comes from the xts package
> 
> # secEP: index of the last row in each second of the xts time series,
> # which has multiple entries per second (the first element is 0)
> secEP = endpoints(xSym$Direction, "secs")
> d = xSym$Direction
> s = xSym$Size
> buySize = sapply(1:(length(secEP)-1), function(y) {
>    i = (secEP[y] + 1):secEP[y+1]  # rows between consecutive endpoints
>    sum(as.numeric(s[i][d[i] == "buy"]))  # total buy size in that second
> })
> 
> Object details:
> 
> secEP = numeric vector of one-second endpoints in xSym$Direction.
> 
>> head(xSym$Direction)
>                    Direction
> 2011-01-05 09:30:00 "unkn"   
> 2011-01-05 09:30:02 "sell"   
> 2011-01-05 09:30:02 "buy"    
> 2011-01-05 09:30:04 "buy"    
> 2011-01-05 09:30:04 "buy"    
> 2011-01-05 09:30:04 "buy" 
> 
>> head(xSym$Size)
>                    Size  
> 2011-01-05 09:30:00 " 865"
> 2011-01-05 09:30:02 " 100"
> 2011-01-05 09:30:02 " 100"
> 2011-01-05 09:30:04 " 100"
> 2011-01-05 09:30:04 " 100"
> 2011-01-05 09:30:04 "  41"
> 
> Thanks,
> Chris
> 
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
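If the endpoints vector from the question is kept, the sapply() loop can also be replaced by a single cumulative sum. This is a swapped-in technique, not the approach Jim suggests. The sketch assumes plain vectors: `secEP` as xts::endpoints() would return it (starting at 0), `d` holding the directions and `s` the sizes already converted to numbers; the sample values are hypothetical, taken from the head() output in the question.

```r
# Hypothetical plain vectors mirroring head(xSym) in the question
d <- c("unkn", "sell", "buy", "buy", "buy", "buy")
s <- as.numeric(c(" 865", " 100", " 100", " 100", " 100", "  41"))

# Endpoints as endpoints(x, "secs") would give for these six rows:
# a leading 0, then the last row of 09:30:00, 09:30:02 and 09:30:04
secEP <- c(0, 1, 3, 6)

# Running total of buy sizes, with a leading 0; non-buy rows add 0
cs <- c(0, cumsum(ifelse(d == "buy", s, 0)))

# Buy total per second = difference of running totals at the endpoints
buySize <- diff(cs[secEP + 1])
print(buySize)  # one value per second, 0 where a second had no buys
```

This returns the same length(secEP) - 1 values as the original loop, including zeros, but does all the work in two vectorised passes. With whole-number trade sizes the running sums are exact.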


