[R] if else for cumulative sum error

David L Carlson dcarlson at tamu.edu
Tue Dec 2 23:22:46 CET 2014


Let's try a different approach. You don't need a loop for this. First we need a reproducible example:

> set.seed(42)
> dadosmax <- data.frame(above=runif(150) + .5)

Now compute your sums using cumsum() and diff() and then compute enchday using ifelse(). See the manual pages for each of these functions to understand how they work:

> sums <- diff(c(0, cumsum(dadosmax$above)), 45)
> dadosmax$enchday <- c(ifelse(sums >= 45, 1, 0), rep(NA, 44))

> dadosmax$enchday
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [26]  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [76]  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[101]  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

See the NA's? Those are what David Winsemius is talking about. For the 106th value, 106+44 is 150, but for the 107th value, 107+44 is 151, which does not exist. Fortunately diff() understands that and stops at 106, but we have to add 44 NA's so that the result has 150 values, the number of rows in your data frame.
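To see why the cumsum()/diff() combination gives window sums, here is a small sketch on a made-up vector with a window of 3 instead of 45. The names x, k, cs, roll and padded are just for illustration and are not part of your data:

```r
# Toy data: 6 values and a window of k = 3 (the thread uses 150 rows, k = 45)
x <- c(1, 2, 3, 4, 5, 6)
k <- 3

# cumsum() with a leading 0: cs[i + 1] is the sum of x[1:i]
cs <- c(0, cumsum(x))

# Lagged difference: cs[i + k] - cs[i] is the sum of x[i:(i + k - 1)]
roll <- diff(cs, lag = k)
roll
# 6 9 12 15  (only length(x) - k + 1 = 4 complete windows exist)

# Pad with k - 1 NA's so the result lines up with the original rows
padded <- c(roll, rep(NA, k - 1))
length(padded) == length(x)
```

The same arithmetic with k = 45 and 150 rows gives 106 complete windows and 44 NA's.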

You might find this plot informative as well:

> plot(sums, type="l")
> abline(h=45)

Another way to get there is to use sapply() which will add the NA's for us:

> sums <- sapply(1:150, function(x) sum(dadosmax$above[x:(x+44)]))
> dadosmax$enchday <- ifelse(sums >= 45, 1, 0)

But it won't be as fast if you have a large data set.
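If you want to convince yourself that the two approaches give the same answer, a quick check along these lines should do it. This is only a sketch on the simulated data from the example; sums1 and sums2 are names I made up for the comparison:

```r
set.seed(42)
dadosmax <- data.frame(above = runif(150) + .5)

# Approach 1: cumsum()/diff(), padded with NA's by hand
sums1 <- c(diff(c(0, cumsum(dadosmax$above)), lag = 45), rep(NA, 44))

# Approach 2: sapply(); indexing past row 150 returns NA, so sum() is NA
sums2 <- sapply(1:150, function(x) sum(dadosmax$above[x:(x + 44)]))

# Same values up to floating-point rounding, with NA's in the same places
all.equal(sums1, sums2)
identical(is.na(sums1), is.na(sums2))
```

all.equal() is used rather than identical() for the values because the two methods add the numbers in a different order, so tiny rounding differences are possible.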

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, December 2, 2014 2:50 PM
To: Jefferson Ferreira-Ferreira
Cc: r-help at r-project.org
Subject: Re: [R] if else for cumulative sum error


On Dec 2, 2014, at 12:26 PM, Jefferson Ferreira-Ferreira wrote:

> Thank you for the replies.
> 
> David,
> 
> I tried your modified form
> 
> for (i in 1:seq_along(rownames(dadosmax))){


No. It is either 1:... or seq_along(...). In this case perhaps 1:(nrow(dadosmax)-44) would be safer.
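A minimal sketch of a loop with that safer range, assuming made-up data in place of yours (the simulated column and the helper names n and last are only for illustration):

```r
set.seed(42)
dadosmax <- data.frame(above = runif(150) + .5)  # made-up example data
n <- nrow(dadosmax)

# 1:seq_along(...) uses only the first element of seq_along(...), so it
# collapses to 1:1; seq_len() gives the intended sequence of row numbers
last <- n - 44             # last row that still has a full 45-row window
dadosmax$enchday <- NA     # rows without a full window stay NA
for (i in seq_len(last)) {
  dadosmax$enchday[i] <- if (sum(dadosmax$above[i:(i + 44)]) >= 45) 1 else 0
}
```

Stopping at n - 44 means i + 44 never points past the last row, so the if() condition is never NA.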

You do not seem to have understood that you cannot use an index of i+44 when i is going to be the entire set of rows of the dataframe. There is "no there there" to quote Gertrude Stein's slur against Oakland. In fact there is no there there at i+1 when you get to the end. You either need to only go to row

>  dadosmax$enchday[i] <- if ( (sum(dadosmax$above[i:(i+44)])) >= 45) 1 else
> 0
> }
> 
> However, I'm receiving this warning:
> Warning message:
> In 1:seq_along(rownames(dadosmax)) :
>  numerical expression has 2720 elements: only the first used
> 
> I can't figure out why only the first row was calculated...

You should of course read these, but the error is not from your if-statement but rather from your for-loop indexing.

?'if'
?ifelse
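For what it is worth, the difference between the two can be sketched on a toy vector (x and res are made-up names; the NA stands in for what you get when indexing runs past the end of a column):

```r
x <- c(50, 40, NA)

# ifelse() is vectorized: one result per element, and NA stays NA
ifelse(x >= 45, 1, 0)
# 1 0 NA

# if () wants a single TRUE or FALSE; an NA condition is an error,
# caught here with tryCatch() so that the script keeps running
res <- tryCatch(if (x[3] >= 45) 1 else 0,
                error = function(e) conditionMessage(e))
res
```

In an English locale res should hold the same "missing value where TRUE/FALSE needed" message that appears in the original post.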


> Any ideas?
> 
> 
> 
> Em Tue Dec 02 2014 at 15:22:25, John McKown <john.archie.mckown at gmail.com>
> escreveu:
> 
>> On Tue, Dec 2, 2014 at 12:08 PM, Jefferson Ferreira-Ferreira <
>> jecogeo at gmail.com> wrote:
>> 
>>> Hello everybody;
>>> 
>>> I'm writing a code where part of it is as follows:
>>> 
>>> for (i in nrow(dadosmax)){
>>>  dadosmax$enchday[i] <- if (sum(dadosmax$above[i:(i+44)]) >= 45) 1 else 0
>>> }
>>> 
>> 
>> Without some test data for any validation, I would try the following
>> formula
>> 
>> dadosmax$enchday[i] <- if
>> (sum(dadosmax$above[i:(min(i+44,nrow(dadosmax)))]) >= 45) 1 else 0
>> 
>> 
>> 
>>> 
>>> That is, for each row of my data frame, sum a specific column (0 or 1) over
>>> that row plus the next 44 rows. If it is >= 45 then enchday is 1, else 0.
>>> 
>>> The following error is returned:
>>> 
>>> Error in if (sum(dadosmax$above[i:(i + 44)]) >= 45) 1 else 0 :
>>>  missing value where TRUE/FALSE needed
>>> 
>>> I've tested the ifelse statement assigning different values to i and it
>>> works. So I'm wondering if this error is due to the fact that at the end of
>>> my data frame there aren't 45 rows to sum anymore. I tried to use "try",
>>> but it simply hides the error.
>>> 
>>> How can I deal with this? Any ideas?
>>> Thank you very much.
>>> 
>>>        [[alternative HTML version deleted]]
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
>> 
>> --
>> The temperature of the aqueous content of an unremittingly ogled
>> culinary vessel will not achieve 100 degrees on the Celsius scale.
>> 
>> Maranatha! <><
>> John McKown
>> 
> 

David Winsemius
Alameda, CA, USA
