[R] Sum function and missing values --- need to mimic SAS sum function

Henrik Bengtsson hb at biostat.ucsf.edu
Mon Jan 26 22:46:40 CET 2015


In case anyone wonders, this behavior is expected and consistent with
the note "the sum of an empty set is zero, by definition" in
help("sum"), i.e.

> x <- numeric(0)
> str(x)
 num(0)
> sum(x)
[1] 0

Analogously, prod(numeric(0)) gives 1.0.
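
To connect this to the original question: with na.rm=TRUE the NAs are
dropped before summing, so an all-NA vector reduces to the empty case
above. A small illustration (the variable name x is only for this example):

```r
x <- rep(NA_real_, times = 10)

x[!is.na(x)]           # what remains once the NAs are removed: numeric(0)
sum(x, na.rm = TRUE)   # sum of the empty set: 0
prod(x, na.rm = TRUE)  # product of the empty set: 1
sum(x)                 # without na.rm the NA propagates: NA
```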


To the OP: if what you're after at the end of the day is the sample mean,
note that mean() returns NaN in this case, e.g.

> x <- rep(NA_real_, times=10)
> mean(x, na.rm=TRUE)
[1] NaN
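
If the goal is the behavior the OP describes for SAS (NA when every value
is missing, otherwise the sum of the non-missing values), one possible
sketch is a small wrapper. The name sum_sas and the example data frame df
are hypothetical, made up for this illustration. They are not part of
base R, and I have not checked the wrapper against SAS itself:

```r
# Hypothetical helper: NA if all values are missing, otherwise the sum of
# the non-missing values. Note that an empty vector also yields NA here,
# because all(is.na(numeric(0))) is TRUE.
sum_sas <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_sas(c(1, NA, 3))   # 4
sum_sas(c(NA, NA))     # NA

# Row-wise use on a made-up data frame with the OP's column names
df <- data.frame(Variable.1 = c(1, NA, NA),
                 Variable.2 = c(2, 5, NA))
df$SumValue <- apply(df[, c("Variable.1", "Variable.2")], 1, sum_sas)
```

Unlike subsetting away the rows where both variables are NA, this keeps
every row and marks the all-missing ones with NA in SumValue.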

/Henrik

On Mon, Jan 26, 2015 at 1:17 PM, Ista Zahn <istazahn at gmail.com> wrote:
> Try with na.rm=TRUE.
> On Jan 26, 2015 4:04 PM, "MacQueen, Don" <macqueen1 at llnl.gov> wrote:
>
>> I'm a little puzzled by the assertion that the result is 0.0 when all the
>> elements are NA:
>>
>> > sum(NA)
>> [1] NA
>>
>> > sum(c(NA,NA))
>> [1] NA
>>
>> > sum(rep(NA, 10))
>> [1] NA
>>
>> > sum(as.numeric(letters[1:4]))
>> [1] NA
>> Warning message:
>> NAs introduced by coercion
>>
>>
>> Considering that the example snippet of code has several other aspects
>> besides using sum(), among them subsetting rows of a data frame when there
>> are apparently NAs in some of its variables ... I wonder if the reason for
>> the failure of that snippet has been misunderstood?
>>
>>
>> --
>> Don MacQueen
>>
>> Lawrence Livermore National Laboratory
>> 7000 East Ave., L-627
>> Livermore, CA 94550
>> 925-423-1062
>>
>>
>>
>>
>>
>> On 1/25/15, 3:21 PM, "Allen Bingham" <aebingham2 at gmail.com> wrote:
>>
>> >I understand that in order to get the sum function to ignore missing
>> >values I need to supply the argument na.rm=TRUE. However, when summing
>> >numeric values in which ALL components are "NA" ... the result is 0.0 ...
>> >instead of NA (or, in the case of SAS, "."), which is what I would get
>> >from SAS.
>> >
>> >Accordingly, I've had to go to 'extreme' measures to get the sum function
>> >to result in NA if all arguments are missing (otherwise give me a sum of
>> >all non-NA elements).
>> >
>> >So for example here's a snippet of code that ALMOST does what I want:
>> >
>> >
>> >SumValue <- apply(subset(InputDataFrame,
>> >                         !is.na(Variable.1) | !is.na(Variable.2),
>> >                         select = c(Variable.1, Variable.2)),
>> >                  1, sum, na.rm = TRUE)
>> >
>> >In reality this does NOT give me records with NA for SumValue ... but it
>> >doesn't give me values for any records in which both Variable.1 and
>> >Variable.2 are NA --- which is "good enough" for my purposes.
>> >
>> >I'm guessing with a little more work I could come up with a way to adapt
>> >the code above so that I could get it to work like SAS's sum function ...
>> >
>> >... but before I go that extra mile I thought I'd ask others if they know
>> >of functions in either base R ... or in a package that will better mimic
>> >the SAS sum function.
>> >
>> >Any suggestions?
>> >
>> >Thanks.
>> >______________________________________
>> >Allen Bingham
>> >aebingham2 at gmail.com
>> >
>> >______________________________________________
>> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >PLEASE do read the posting guide
>> >http://www.R-project.org/posting-guide.html
>> >and provide commented, minimal, self-contained, reproducible code.
>>


