[R] Sum function and missing values --- need to mimic SAS sum function
Allen Bingham
aebingham2 at gmail.com
Mon Jan 26 22:49:12 CET 2015
Don,
The default for the sum function is to NOT remove NA before summing (i.e.,
option na.rm=FALSE). Here are the results with na.rm=TRUE:
> sum(NA,na.rm=TRUE)
[1] 0
> sum(c(NA,NA),na.rm=TRUE)
[1] 0
> sum(rep(NA,10),na.rm=TRUE)
[1] 0
> sum(as.numeric(letters[1:4]),na.rm=TRUE)
[1] 0
Warning message:
NAs introduced by coercion
Hope that explains it a bit better.
Others have replied with suggested solutions to my 'problem', and the one by
John Fox is what I need (an actual function that I can use in an apply
statement), although the suggested code by Sven Templer is appealing in its
simplicity.
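
For readers of the archive, here is a minimal sketch of the kind of function described above: one that returns NA when every element is NA and otherwise sums the non-NA elements. It is not the code posted by John Fox or Sven Templer; the name sas_sum and the example data are assumptions made up for illustration.

```r
# Hypothetical helper (the name is an assumption, not from the thread):
# returns NA when every element is NA, otherwise the sum of the non-NA elements.
sas_sum <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Made-up example data; the column names follow the snippet quoted below.
InputDataFrame <- data.frame(
  Variable.1 = c(1, NA, 3, NA),
  Variable.2 = c(10, 20, NA, NA)
)

# Row-wise sums: rows with at least one value get a sum,
# the row where both variables are NA gets NA.
SumValue <- apply(InputDataFrame[, c("Variable.1", "Variable.2")], 1, sas_sum)
SumValue
```

Because the function takes a single vector, it can be passed straight to apply without subsetting the data frame first. Note that under this sketch an empty vector also yields NA, since all() of an empty logical vector is TRUE.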
Allen
-----Original Message-----
From: MacQueen, Don [mailto:macqueen1 at llnl.gov]
Sent: Monday, January 26, 2015 1:03 PM
To: Allen Bingham; r-help at r-project.org
Subject: Re: [R] Sum function and missing values --- need to mimic SAS sum
function
I'm a little puzzled by the assertion that the result is 0.0 when all the
elements are NA:
> sum(NA)
[1] NA
> sum(c(NA,NA))
[1] NA
> sum(rep(NA, 10))
[1] NA
> sum(as.numeric(letters[1:4]))
[1] NA
Warning message:
NAs introduced by coercion
Considering that the example snippet of code has several other aspects
besides using sum(), among them subsetting rows of a data frame when there
are apparently NAs in some of its variables ... I wonder if the reason for the
failure of that snippet has been misunderstood?
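
One possible way to look at this, sketched on made-up data (the values are assumptions; only the column names come from the quoted snippet): subset() drops the rows where both variables are NA before sum() ever sees them, so the result has fewer elements than the data frame has rows.

```r
# Made-up data; the column names follow the quoted snippet.
InputDataFrame <- data.frame(
  Variable.1 = c(1, NA, 3, NA),
  Variable.2 = c(10, 20, NA, NA)
)

# The quoted snippet: rows where both variables are NA are removed by subset(),
# so sum(..., na.rm = TRUE) is never applied to an all-NA row.
SumValue <- apply(
  subset(InputDataFrame,
         !is.na(Variable.1) | !is.na(Variable.2),
         select = c(Variable.1, Variable.2)),
  1, sum, na.rm = TRUE
)

length(SumValue)      # number of rows that survived the subset
nrow(InputDataFrame)  # number of rows in the original data
```

With this data the two counts differ by one, which means SumValue cannot be attached back to InputDataFrame as a column without further work.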
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 1/25/15, 3:21 PM, "Allen Bingham" <aebingham2 at gmail.com> wrote:
>I understand that in order to get the sum function to ignore missing
>values I need to supply the argument na.rm=TRUE. However, when summing
>numeric values in which ALL components are "NA" ... the result is 0.0
>... instead of NA (or, in the case of SAS, "."), which is what I would
>get from SAS.
>
>Accordingly, I've had to go to 'extreme' measures to get the sum
>function to result in NA if all arguments are missing (otherwise give
>me a sum of all non-NA elements).
>
>So for example here's a snippet of code that ALMOST does what I want:
>
>
>SumValue<-apply(subset(InputDataFrame,!is.na(Variable.1)|!is.na(Variable.2),
>select=c(Variable.1,Variable.2)),1,sum,na.rm=TRUE)
>
>In reality this does NOT give me records with NA for SumValue ... but
>it doesn't give me values for any records in which both Variable.1 and
>Variable.2 are NA --- which is "good enough" for my purposes.
>
>I'm guessing with a little more work I could come up with a way to
>adapt the code above so that I could get it to work like SAS's sum
>function ...
>
>... but before I go that extra mile I thought I'd ask others if they
>know of functions in either base R ... or in a package that will better
>mimic the SAS sum function.
>
>Any suggestions?
>
>Thanks.
>______________________________________
>Allen Bingham
>aebingham2 at gmail.com
>