[Rd] var/sd and NAs in R2.7.0
Simon Urbanek
simon.urbanek at r-project.org
Fri May 16 17:37:19 CEST 2008
Robert,
this was discussed before:
https://stat.ethz.ch/pipermail/r-devel/2007-December/047594.html
and it *is* mentioned in NEWS:
    o   co[rv](use = "complete.obs") now always gives an error if there
        are no complete cases: they used to give NA if
        method = "pearson" but an error for the other two methods.
        (Note that this is pretty arbitrary, but zero-length vectors
        always give an error so it is at least consistent.)

        cor(use="pair") used to give diagonal 1 even if the variable
        was completely missing for the rank methods but NA for the
        Pearson method: it now gives NA in all cases.

        cor(use="pair") for the rank methods gave a matrix result with
        dimensions > 0 even if one of the inputs had 0 columns.

[sd(.., na.rm=TRUE) ends up as cov(.., use="complete.obs"), which is why
var and sd are affected by this change.]
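
If you need the same result on every R version without inspecting the
arguments of var, one option is to drop the NAs yourself before calling it.
A minimal sketch follows; safe_var and safe_sd are hypothetical helper
names, not functions in R, and the matrix m is made-up example data:

```r
# Hypothetical helpers (not part of R): return NA rather than an error
# when x contains no non-missing values.
safe_var <- function(x) {
    x <- x[!is.na(x)]                        # drop missing values by hand
    if (length(x) == 0L) NA_real_ else var(x)
}
safe_sd <- function(x) sqrt(safe_var(x))

# Column-wise use on a matrix where one column is entirely missing.
m <- cbind(a = c(0.01, -0.02, 0.03), b = c(NA, NA, NA))
apply(m, 2, safe_sd)
```

Because the NAs are removed before var is called, this does not depend on
how a given R version treats na.rm or use.
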
Cheers,
Simon
On May 16, 2008, at 11:19 AM, McGehee, Robert wrote:
> I know I can get around this, I just would prefer that if R is breaking
> backwards compatibility, then it's intentional (maybe it is, I just
> don't know). That is, I don't want to require my entire company to
> upgrade to 2.7.0 just so I can deploy a fix here, and I'd prefer not to
> check the argument list of var every time I use it.
>
> if ("use" %in% names(formals(var))) {
>     var(x, na.rm=TRUE, use="p")
> } else {
>     var(x, na.rm=TRUE)
> }
>
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Friday, May 16, 2008 11:03 AM
> To: McGehee, Robert
> Cc: R-devel
> Subject: Re: [Rd] var/sd and NAs in R2.7.0
>
> Try
>
> var(c(NA, NA, NA), use = "pairwise.complete.obs")
>
>
> On Fri, May 16, 2008 at 10:56 AM, McGehee, Robert
> <Robert.McGehee at geodecapital.com> wrote:
>> Hello all,
>> I just upgraded to R 2.7.0 and found that the behavior of 'var' and 'sd'
>> has changed in the presence of NAs (this wasn't explicit in the NEWS file,
>> though I see it probably has to do with the change for cor/cov). Anyway,
>> I just want to make sure that it was intentional to produce an error
>> when the input is all NAs and na.rm=TRUE, rather than returning an NA
>> (like R 2.6.2), or NaN (like the function 'mean' does). That is, isn't the
>> purpose of 'na.rm=TRUE' to, in part, suppress these error messages?
>>
>> Specifically,
>>> var(c(NA, NA, NA), na.rm=TRUE) # R2.6.2
>> [1] NA
>>> var(c(NA, NA, NA), na.rm=TRUE) # R2.7.0
>> Error during wrapup: no complete observations in cov/cor
>>
>> I think I can get the old behavior by setting use='p', but the 'sd'
>> function does not have a 'use' argument and I'd like not to get an error
>> here. Anyway, I'm a fan of the old behavior (not producing an error),
>> but if there was a reason to change this when na.rm=TRUE, I would
>> request that the 'sd' function be updated to be able to revert to the
>> old behavior as well.
>>
>> FYI: I 'apply' these functions to large matrices of stock return time
>> series with missing values, and don't want the whole calculation to fail
>> just because I'm missing stock returns for one company.
>>
>> Thanks,
>> Robert
>>
>> Robert McGehee, CFA
>> Geode Capital Management, LLC
>> One Post Office Square, 28th Floor | Boston, MA | 02109
>> Tel: 617/392-8396 Fax:617/476-6389
>> mailto:robert.mcgehee at geodecapital.com
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>