[R] apply family functions

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Aug 7 18:42:40 CEST 2010


sapply() is not significantly faster than a for loop. A vectorized 
solution generally is, but you pay for it in RAM, because it builds a 
copy of the matching row of oc for every row of dat:

 > dat.oc <- oc[dat$Class,]
 > dat$Flag <- ifelse(with(dat.oc,
       (Open <= dat$Close_date & dat$Close_date <= Close) |
       (Open1 <= dat$Close_date & dat$Close_date <= Close1)),
       "Valid", "Invalid")
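
For anyone who wants to run the vectorized approach end to end, here is a minimal self-contained sketch. It rebuilds oc and dat from the quoted message, and it makes two assumptions that are not part of the two lines above: it also checks the third window (Open2/Close2) that the quoted loop tests, and it keeps the inclusive bounds used above rather than the strict "<" on the close date used in the quoted loop. The names win, d and in_window are only illustrative.

```r
# Rebuild the example data from the quoted message.
ini <- as.Date("2010-01-01")
oc <- data.frame(
  Open   = ini + 0:6,
  Close  = ini + 365 + 0:6,
  Open1  = ini + 365 * 2 + 0:6,
  Close1 = ini + 365 * 3 + 0:6,
  Open2  = ini + 365 * 4 + 0:6,
  Close2 = ini + 365 * 5 + 0:6,
  row.names = c("AAA", "C", "AA", "A", "CC", "BB", "B")
)
dat <- data.frame(
  Class = c("AAA", "C", "CC", "BB", "B", "A"),
  Close_date = ini + c(0, 0, 0, 109, 39, 24),
  stringsAsFactors = FALSE
)

# One row of oc per row of dat, looked up by row name.
# This is the extra copy that costs RAM.
win <- oc[dat$Class, ]
d <- dat$Close_date

# Assumption: all three windows are checked, with inclusive bounds.
in_window <- (win$Open  <= d & d <= win$Close)  |
             (win$Open1 <= d & d <= win$Close1) |
             (win$Open2 <= d & d <= win$Close2)

dat$Flag <- ifelse(in_window, "Valid", "Invalid")
print(dat)
```

With this data the first record (class AAA, closed on its own open date) comes out as "Valid", which is the result the original poster expected.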

If you are really hurting for RAM, you might take a rather less 
computationally efficient approach:

 > dat$Flag <- sapply(seq_along(dat$Class), function(idx, ddat, toc) {
       cl <- ddat[idx, "Class"]
       cld <- ddat[idx, "Close_date"]
       if ((toc[cl, "Open"] <= cld && cld <= toc[cl, "Close"]) ||
           (toc[cl, "Open1"] <= cld && cld <= toc[cl, "Close1"])) {
           "Valid"
       } else {
           "Invalid"
       }
   }, ddat = dat, toc = oc)
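
Since sapply() buys little over a for loop here, the same row-by-row check can be sketched as a plain loop. The sketch below is only an illustration under the same two-window assumption as the code above; the names flag, ok, n_once and n_all are hypothetical. It also shows why the loop in the quoted message flags the first record as "Invalid": for (i in length(ind)) runs the body once, with i equal to 6, so every date is compared against the row for class "A" (Open 2010-01-04). Iterating with seq_along() or seq_len() visits every row.

```r
# Rebuild the example data from the quoted message (first two windows only).
ini <- as.Date("2010-01-01")
oc <- data.frame(
  Open   = ini + 0:6,
  Close  = ini + 365 + 0:6,
  Open1  = ini + 365 * 2 + 0:6,
  Close1 = ini + 365 * 3 + 0:6,
  row.names = c("AAA", "C", "AA", "A", "CC", "BB", "B")
)
dat <- data.frame(
  Class = c("AAA", "C", "CC", "BB", "B", "A"),
  Close_date = ini + c(0, 0, 0, 109, 39, 24),
  stringsAsFactors = FALSE
)

# length(x) is a single number, so this loop body runs exactly once.
n_once <- 0
for (i in length(dat$Class)) n_once <- n_once + 1

# seq_along(x) is 1, 2, ..., length(x), so this one runs once per row.
n_all <- 0
for (i in seq_along(dat$Class)) n_all <- n_all + 1

# Row-by-row check with a preallocated result vector.
flag <- character(nrow(dat))
for (i in seq_len(nrow(dat))) {
  cl  <- dat$Class[i]
  cld <- dat$Close_date[i]
  ok <- (oc[cl, "Open"]  <= cld && cld <= oc[cl, "Close"]) ||
        (oc[cl, "Open1"] <= cld && cld <= oc[cl, "Close1"])
  flag[i] <- if (ok) "Valid" else "Invalid"
}
dat$Flag <- flag
print(dat)
```

Only one row of oc is looked at per iteration, so no full-size copy of the lookup table is created.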


Steven Kang wrote:
> ini <- as.Date("2010/1/1", "%Y/%m/%d")
> # Generate arbitrary data frame consisting of date values
> oc <- data.frame(Open = seq(ini, ini + 6, 1),
>                  Close = seq(ini + 365, ini + 365 + 6, 1),
>                  Open1 = seq(ini + 365*2, ini + 365*2 + 6, 1),
>                  Close1 = seq(ini + 365*3, ini + 365*3 + 6, 1),
>                  Open2 = seq(ini + 365*4, ini + 365*4 + 6, 1),
>                  Close2 = seq(ini + 365*5, ini + 365*5 + 6, 1))
> rownames(oc) <- c("AAA", "C", "AA", "A", "CC", "BB", "B")
>
>
> > oc
>     Open       Close      Open1      Close1     Open2      Close2
> AAA 2010-01-01 2011-01-01 2012-01-01 2012-12-31 2013-12-31 2014-12-31
> C   2010-01-02 2011-01-02 2012-01-02 2013-01-01 2014-01-01 2015-01-01
> AA  2010-01-03 2011-01-03 2012-01-03 2013-01-02 2014-01-02 2015-01-02
> A   2010-01-04 2011-01-04 2012-01-04 2013-01-03 2014-01-03 2015-01-03
> CC  2010-01-05 2011-01-05 2012-01-05 2013-01-04 2014-01-04 2015-01-04
> BB  2010-01-06 2011-01-06 2012-01-06 2013-01-05 2014-01-05 2015-01-05
> B   2010-01-07 2011-01-07 2012-01-07 2013-01-06 2014-01-06 2015-01-06
>
> dat <- data.frame(Class = c("AAA", "C", "CC", "BB", "B", "A"), Close_date =
> c(ini, ini, ini, ini+109, ini+39, ini+24), stringsAsFactors = FALSE)
> ind <- sapply(dat$Class, function(x) match(x, rownames(oc)))
>
> for (i in length(ind))  {
>     dat[["Flag"]] <- sapply(dat[["Close_date"]], function(x)
>         ifelse((x >= oc[ind[[i]], 1] & x < oc[ind[[i]], 2]) |
>                (x >= oc[ind[[i]], 3] & x < oc[ind[[i]], 4]) |
>                (x >= oc[ind[[i]], 5] & x < oc[ind[[i]], 6]),
>                "Valid", "Invalid"))
> }
>
> > dat
>   Class Close_date Flag
> 1 AAA   2010-01-01 Invalid
> 2 C     2010-01-01 Invalid
> 3 CC    2010-01-01 Invalid
> 4 BB    2010-04-20 Valid
> 5 B     2010-02-09 Valid
> 6 A     2010-01-25 Valid
>
> The first record is flagged as "Invalid" when it should really be
> "Valid".
>
> Any suggestions on resolving this would be great.
>
> Many thanks.


