[R] Problem with NA data when computing standard error

Tue Apr 8 23:00:57 CEST 2008

On Tue, Apr 8, 2008 at 12:44 PM, LeCzar <sirnixu at gmail.com> wrote:
>
>  Hey,
>
>  I want to compute means and standard errors as two tables like this:
>
>   se<-function(x)sqrt(var(x)/length(x))
>
>

The missing values are not your main problem.

When x is a matrix or data frame, var computes the variance-covariance
matrix.  Its off-diagonal covariance values can be negative, so taking
the square root of the whole matrix is a mistake.

For example, run

> example(var)

to get some matrices to work with, including C1.

> C1[3,4] <- NA
> C1[3,5] <- NA

Observe that you can calculate

> var(C1, na.rm=TRUE)

but taking sqrt of that result is not useful: sqrt returns NaN, with a
warning, for any negative covariance.
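
The same behaviour can be seen on a much smaller case. This is only a sketch on made-up data (the matrix m and its column names are hypothetical, not from the original question): two columns that move in opposite directions have a negative covariance, and sqrt turns it into NaN.

```r
# Hypothetical toy data: two columns that move in opposite directions
m <- cbind(a = c(1, 2, 3, 4), b = c(8, 6, 5, 1))

v <- var(m)                      # 2 x 2 variance-covariance matrix
v["a", "b"] < 0                  # TRUE: the covariance is negative

# sqrt() warns and returns NaN for the negative entries
r <- suppressWarnings(sqrt(v))
is.nan(r["a", "b"])              # TRUE

# The diagonal holds the variances, so this is safe
sqrt(diag(v))
```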

To get the standard errors, you need only the variances, which sit on
the diagonal of that matrix:

> diag(var(C1, na.rm=TRUE))

The diagonal elements are variances, so they are non-negative, and

> sqrt(diag(var(C1, na.rm=TRUE)))

works as well.

But you have the separate problem of dividing each one by the square
root of the length, and since there are missing values that length is
not the same for every column.  Maybe somebody knows a smarter way, but
this appears to count the non-missing values in each column correctly:

validX <- colSums( ! is.na(C1))

This gives their square roots:

sqrt(validX)
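
To see what the count looks like, here is a small sketch on made-up data (the matrix m is hypothetical): is.na marks the missing cells, ! flips the marks, and colSums adds up the TRUE values in each column.

```r
# Hypothetical toy matrix with one missing value in column b
m <- cbind(a = c(1, 2, 3, 4), b = c(8, NA, 5, 1))

n_valid <- colSums(!is.na(m))    # a: 4, b: 3
sqrt(n_valid)                    # per-column divisor for the standard error
```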

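Putting that together, it seems to me you could try

se <- function(x) {
    # standard deviation of each column: roots of the variances
    myDiag <- sqrt(diag(var(x, na.rm=TRUE)))
    # number of non-missing values in each column
    validX <- colSums(!is.na(x))
    # standard error: standard deviation over root of the valid count
    myDiag/sqrt(validX)
}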

That works for me:

> se(C1)
       Fertility      Agriculture      Examination        Education
       50.740226       110.808614        39.390611        39.303898
        Catholic Infant.Mortality
      328.272207         4.513863
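
One caveat worth checking before relying on this: as far as I can tell from the documentation, var with na.rm=TRUE on a matrix drops every row that contains an NA (complete cases), while colSums(!is.na(x)) counts non-missing values column by column, so the variance and the count may be based on different rows. A minimal sketch of a per-column alternative follows, assuming each column should use all of its own non-missing values; the name se_by_col and the toy data are hypothetical.

```r
# Hypothetical helper: standard error of each column, using only that
# column's non-missing values for both the variance and the count
se_by_col <- function(x) {
    x <- as.matrix(x)
    v <- apply(x, 2, var, na.rm = TRUE)   # per-column variance
    n <- colSums(!is.na(x))               # per-column valid count
    sqrt(v / n)
}

# Toy data with one missing value in column b
m <- cbind(a = c(1, 2, 3, 4), b = c(8, NA, 5, 1))
se_by_col(m)
```

On data without missing values the two approaches should agree; they differ only when the NAs are spread unevenly across columns.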

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas