[Rd] sd(NA)

Patrick Burns pburns at pburns.seanet.com
Mon Dec 3 11:40:20 CET 2007


I like the 2.6.x behaviour better.  Consider:

x <- array(1:30, c(10,3))
x[,1] <- NA
x[-1,2] <- NA
x[1,3] <- NA

sd(x, na.rm=TRUE)

# 2.7.0
Error in var(x, na.rm = na.rm) : no complete element pairs

# 2.6.x
[1]       NA       NA 2.738613

The reason to put 'na.rm=TRUE' into the call is to avoid
getting an error due to missing values. (And, yes, in finance
it is entirely possible to have a matrix with all NAs in a
column.)

I think the way out is to allow there to be a conceptual
difference between computing a value with no data, and
computing a value on all NAs after removing NAs.  The
first is clearly impossible.  The second has some actual
value, but we don't have enough information to have an
estimate of the value.
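
To make that distinction concrete, here is a minimal sketch of a wrapper that treats the two cases differently. The name sd_or_na is hypothetical (it is not part of base R or of any proposed patch), and the rule "fewer than two remaining values gives NA" is my assumption about what the 2.6.x-style result should be:

```r
# Hypothetical helper, not part of base R: separates "no data supplied"
# from "data supplied, but too little left after removing NAs".
sd_or_na <- function(x, na.rm = FALSE) {
  if (length(x) == 0L)
    stop("'x' is empty")        # no data at all: nothing can be computed
  if (na.rm)
    x <- x[!is.na(x)]           # drop missing values, as documented for sd()
  if (length(x) < 2L)
    return(NA_real_)            # a value exists, but we cannot estimate it
  sqrt(sum((x - mean(x))^2) / (length(x) - 1L))
}

# The matrix from the example above, handled one column at a time.
x <- array(1:30, c(10, 3))
x[, 1] <- NA
x[-1, 2] <- NA
x[1, 3] <- NA

res <- apply(x, 2, sd_or_na, na.rm = TRUE)
print(res)   # NA for columns 1 and 2; sqrt(7.5), about 2.738613, for column 3
```

With this, an all-NA column yields NA rather than an error, while sd_or_na(numeric(0)) still fails because there was never any data to begin with.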

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Prof Brian Ripley wrote:

>On Sun, 2 Dec 2007, Wolfgang Huber wrote:
>
>>Dear Prof. Ripley
>>
>>I noted a change in the behaviour of "cov", which is very reasonable:
>>
>>## R version 2.7.0 Under development (unstable) (2007-11-30 r43565)
>>> cov(as.numeric(NA), as.numeric(NA), use="complete.obs")
>>Error in cov(as.numeric(NA), as.numeric(NA), use = "complete.obs") :
>>  no complete element pairs
>>
>>whereas earlier behavior was, for example:
>>## R version 2.6.0 Patched (2007-10-23 r43258)
>>> cov(as.numeric(NA), as.numeric(NA), use="complete.obs")
>>[1] NA
>>
>>I wanted to ask whether the effect this has on "sd" is desired:
>>
>>## R version 2.7.0 Under development (unstable) (2007-11-30 r43565)
>>> sd(NA, na.rm=TRUE)
>>Error in var(x, na.rm = na.rm) : no complete element pairs
>>
>>## R version 2.6.0 Patched (2007-10-23 r43258)
>>> sd(NA, na.rm=TRUE)
>>[1] NA
>
>That is a bug fix: see the NEWS entry.  The previous behaviour of
>
>> sd(numeric(0))
>Error in var(x, na.rm = na.rm) : 'x' is empty
>> sd(NA_real_, na.rm=TRUE)
>[1] NA
>
>was not as documented:
>
>      This function computes the standard deviation of the values in
>      'x'. If 'na.rm' is 'TRUE' then missing values are removed before
>      computation proceeds.
>
>so somehow an empty vector had a sd() if computed one way, and not if
>computed another.


