[R] apply family functions
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sat Aug 7 18:42:40 CEST 2010
sapply() is not significantly faster than a for loop. Vectorization
generally is, but you pay for it in RAM, since whole intermediate vectors
are built at once.
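One rough way to check this on your own machine is to time three equivalent range tests with system.time(). The data sizes and the helper names in_range_loop(), in_range_sapply() and in_range_vec() are hypothetical. Timings depend on the R version and hardware, so treat the numbers as indicative only.

```r
# Rough timing sketch on synthetic data (sizes and names are hypothetical).
# All three functions answer the same question: is each x inside [lo, hi]?
set.seed(1)
n  <- 1e5
x  <- runif(n)
lo <- runif(n, 0, 0.5)
hi <- lo + 0.5

in_range_loop <- function(x, lo, hi) {
  out <- logical(length(x))                 # preallocate the result
  for (k in seq_along(x)) out[k] <- lo[k] <= x[k] && x[k] <= hi[k]
  out
}
in_range_sapply <- function(x, lo, hi) {
  sapply(seq_along(x), function(k) lo[k] <= x[k] && x[k] <= hi[k])
}
in_range_vec <- function(x, lo, hi) {
  lo <= x & x <= hi                         # builds full-length temporaries
}

print(system.time(r_loop   <- in_range_loop(x, lo, hi)))
print(system.time(r_sapply <- in_range_sapply(x, lo, hi)))
print(system.time(r_vec    <- in_range_vec(x, lo, hi)))

# The three approaches must agree; only their speed and memory use differ
stopifnot(identical(r_loop, r_sapply), identical(r_loop, r_vec))
```

The vectorized version evaluates `lo <= x` and `x <= hi` as full-length logical vectors before combining them, which is where the extra memory goes.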
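dat.oc <- oc[dat$Class, ]   # one row of oc per record, looked up by name
dat$Flag <- ifelse(with(dat.oc,
                        (Open  <= dat$Close_date & dat$Close_date <= Close)  |
                        (Open1 <= dat$Close_date & dat$Close_date <= Close1) |
                        (Open2 <= dat$Close_date & dat$Close_date <= Close2)),
                   "Valid", "Invalid")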
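The same vectorized idea extends to any number of Open/Close pairs. Below is a minimal sketch, assuming the oc and dat data frames from the quoted question (rebuilt here so the block runs on its own). The windows list is a hypothetical helper, and match() is used for an exact lookup of each Class against the row names of oc.

```r
# Data copied from the quoted question
ini <- as.Date("2010/1/1", "%Y/%m/%d")
oc <- data.frame(Open   = seq(ini,           ini + 6,           1),
                 Close  = seq(ini + 365,     ini + 365 + 6,     1),
                 Open1  = seq(ini + 365 * 2, ini + 365 * 2 + 6, 1),
                 Close1 = seq(ini + 365 * 3, ini + 365 * 3 + 6, 1),
                 Open2  = seq(ini + 365 * 4, ini + 365 * 4 + 6, 1),
                 Close2 = seq(ini + 365 * 5, ini + 365 * 5 + 6, 1))
rownames(oc) <- c("AAA", "C", "AA", "A", "CC", "BB", "B")
dat <- data.frame(Class = c("AAA", "C", "CC", "BB", "B", "A"),
                  Close_date = c(ini, ini, ini, ini + 109, ini + 39, ini + 24),
                  stringsAsFactors = FALSE)

# Hypothetical helper: the column-name pairs to test
windows <- list(c("Open", "Close"), c("Open1", "Close1"), c("Open2", "Close2"))

# Exact row lookup: one row of oc per record in dat
dat.oc <- oc[match(dat$Class, rownames(oc)), ]

# One logical vector per window, combined with element-wise OR
in_any <- Reduce(`|`, lapply(windows, function(w)
  dat.oc[[w[1]]] <= dat$Close_date & dat$Close_date <= dat.oc[[w[2]]]))

dat$Flag <- ifelse(in_any, "Valid", "Invalid")
print(dat)
```

Adding another window then only means adding one more pair to windows.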
If you are really hurting for RAM, you might take a rather less
computationally efficient approach:
dat$Flag <- sapply(seq_len(nrow(dat)), function(idx, ddat, toc) {
  cl  <- ddat[idx, "Class"]        # row name of oc for this record
  cld <- ddat[idx, "Close_date"]
  if ((toc[cl, "Open"]  <= cld && cld <= toc[cl, "Close"])  ||
      (toc[cl, "Open1"] <= cld && cld <= toc[cl, "Close1"]) ||
      (toc[cl, "Open2"] <= cld && cld <= toc[cl, "Close2"])) {
    "Valid"
  } else {
    "Invalid"
  }
}, ddat = dat, toc = oc)
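For reference, the loop in the question goes wrong for two reasons. First, `for (i in length(ind))` iterates over the single value 6, so only `ind[[6]]`, the row for class "A" (Open 2010-01-04), is ever used; that is why AAA with Close_date 2010-01-01 comes out "Invalid". Second, each pass assigns the whole Flag column, so even a loop over `seq_along(ind)` would leave every row judged by the last class. A minimal corrected loop, keeping the question's half-open intervals, might look like this (the names flag, lim and ok are illustrative, and ind is computed with match() directly):

```r
# Data copied from the quoted question
ini <- as.Date("2010/1/1", "%Y/%m/%d")
oc <- data.frame(Open   = seq(ini,           ini + 6,           1),
                 Close  = seq(ini + 365,     ini + 365 + 6,     1),
                 Open1  = seq(ini + 365 * 2, ini + 365 * 2 + 6, 1),
                 Close1 = seq(ini + 365 * 3, ini + 365 * 3 + 6, 1),
                 Open2  = seq(ini + 365 * 4, ini + 365 * 4 + 6, 1),
                 Close2 = seq(ini + 365 * 5, ini + 365 * 5 + 6, 1))
rownames(oc) <- c("AAA", "C", "AA", "A", "CC", "BB", "B")
dat <- data.frame(Class = c("AAA", "C", "CC", "BB", "B", "A"),
                  Close_date = c(ini, ini, ini, ini + 109, ini + 39, ini + 24),
                  stringsAsFactors = FALSE)

ind  <- match(dat$Class, rownames(oc))  # row of oc for each record
flag <- character(nrow(dat))            # preallocate one flag per record
for (i in seq_along(ind)) {             # visit every record, not just the last
  x   <- dat$Close_date[i]
  lim <- oc[ind[i], ]                   # the windows for this record's class
  ok  <- (x >= lim$Open  & x < lim$Close)  |
         (x >= lim$Open1 & x < lim$Close1) |
         (x >= lim$Open2 & x < lim$Close2)
  flag[i] <- if (ok) "Valid" else "Invalid"  # write only this record's flag
}
dat$Flag <- flag
print(dat)
```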
Steven Kang wrote:
> ini <- as.Date("2010/1/1", "%Y/%m/%d")
> # Generate an arbitrary data frame consisting of date values
> oc <- data.frame(Open   = seq(ini, ini + 6, 1),
>                  Close  = seq(ini + 365, ini + 365 + 6, 1),
>                  Open1  = seq(ini + 365*2, ini + 365*2 + 6, 1),
>                  Close1 = seq(ini + 365*3, ini + 365*3 + 6, 1),
>                  Open2  = seq(ini + 365*4, ini + 365*4 + 6, 1),
>                  Close2 = seq(ini + 365*5, ini + 365*5 + 6, 1))
> rownames(oc) <- c("AAA", "C", "AA", "A", "CC", "BB", "B")
>
> > oc
>           Open      Close      Open1     Close1      Open2     Close2
> AAA 2010-01-01 2011-01-01 2012-01-01 2012-12-31 2013-12-31 2014-12-31
> C   2010-01-02 2011-01-02 2012-01-02 2013-01-01 2014-01-01 2015-01-01
> AA  2010-01-03 2011-01-03 2012-01-03 2013-01-02 2014-01-02 2015-01-02
> A   2010-01-04 2011-01-04 2012-01-04 2013-01-03 2014-01-03 2015-01-03
> CC  2010-01-05 2011-01-05 2012-01-05 2013-01-04 2014-01-04 2015-01-04
> BB  2010-01-06 2011-01-06 2012-01-06 2013-01-05 2014-01-05 2015-01-05
> B   2010-01-07 2011-01-07 2012-01-07 2013-01-06 2014-01-06 2015-01-06
>
> dat <- data.frame(Class = c("AAA", "C", "CC", "BB", "B", "A"),
>                   Close_date = c(ini, ini, ini, ini+109, ini+39, ini+24),
>                   stringsAsFactors = FALSE)
> ind <- sapply(dat$Class, function(x) match(x, rownames(oc)))
>
> for (i in length(ind)) {
>   dat[["Flag"]] <- sapply(dat[["Close_date"]], function(x)
>     ifelse((x >= oc[ind[[i]], 1] & x < oc[ind[[i]], 2]) |
>            (x >= oc[ind[[i]], 3] & x < oc[ind[[i]], 4]) |
>            (x >= oc[ind[[i]], 5] & x < oc[ind[[i]], 6]),
>            "Valid", "Invalid"))
> }
>
> > dat
>   Class Close_date    Flag
> 1   AAA 2010-01-01 Invalid
> 2     C 2010-01-01 Invalid
> 3    CC 2010-01-01 Invalid
> 4    BB 2010-04-20   Valid
> 5     B 2010-02-09   Valid
> 6     A 2010-01-25   Valid
> The first record (AAA) is flagged as "Invalid" when it should really be
> "Valid".
>
> Any suggestions on resolving this would be great.
>
> Many thanks.
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>