[R] Various Errors using Survey Package

Thu Feb 13 02:50:03 CET 2003

On Wed, 12 Feb 2003, Thompson, Trevor wrote:

> Hi,
>
> I have been experimenting with the new Survey package.  Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
>
> Error 1):  The first error I get is when I try to specify the complex survey
> design.
>
> nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
> check.strata=TRUE)
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data =
> nhis.df,  :
>         Clusters not nested in strata
>
> My data are sorted by strata, psu.  Can someone tell me what the structure
> has to be for a stratified sample with clustering?  Looking at the code, it
> appears to me that it does not allow more than 1 observation per psu [i.e.
> any(sc > 1)].

  The problem is probably that your id numbers for PSU start up again in
each stratum (e.g. you have a PSU numbered 1 in each stratum).  If so, you
need the nest=TRUE option to tell svydesign() that all the PSUs numbered 1
in different strata are really different PSUs.
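
As a rough illustration of what "not nested" means here, the sketch below
uses a made-up toy data frame (the values and the psu_unique column are
hypothetical, not taken from NHIS) in which PSU ids restart within each
stratum.  It only uses base R; the svydesign() calls at the end are left as
comments and assume the survey package is installed.

```r
# Toy data: PSU ids restart at 1 within each stratum (hypothetical values).
toy <- data.frame(
  strata = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu    = c(1, 1, 2, 2, 1, 1, 2, 2),
  probs  = rep(0.1, 8),
  y      = c(3, 5, 4, 6, 7, 9, 8, 10)
)

# Count how many distinct strata each PSU id appears in.
# Any count above 1 means the ids are reused across strata.
strata_per_psu <- tapply(toy$strata, toy$psu,
                         function(s) length(unique(s)))
print(strata_per_psu)

# One way to get globally unique ids: combine stratum and PSU.
toy$psu_unique <- interaction(toy$strata, toy$psu, drop = TRUE)
strata_per_unique <- tapply(toy$strata, toy$psu_unique,
                            function(s) length(unique(s)))
print(all(strata_per_unique == 1))

# With the survey package, either call below should then pass the
# nesting check (not run here):
# library(survey)
# d1 <- svydesign(ids = ~psu, strata = ~strata, probs = ~probs,
#                 data = toy, nest = TRUE)
# d2 <- svydesign(ids = ~psu_unique, strata = ~strata, probs = ~probs,
#                 data = toy)
```

Using nest=TRUE is usually the simpler choice, since it leaves the original
id column untouched.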

> Error 2).  If I go ahead and specify check.strata=FALSE, then svydesign runs
> ok.  I then tried using the svymean function.  In the following example, if
> I specify na.rm=TRUE, I get the error below:

No, it doesn't run OK; it just doesn't report an error.  If the PSUs are
not nested in the strata, the design object is still wrong.

> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
>
> I traced this to the svyCprod call within svymean.   SvyCprod calls rowsum
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.

With missing data you do need to use the variables stored in the design
object, not a separate data frame; otherwise the observations dropped for
missing values no longer line up with the design information. That is, you
want
  svymean(~crc10yr, design=nhis.design, na.rm=TRUE)
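
The length mismatch behind that error can be reproduced in base R.  This is
only a sketch with made-up vectors (x, strata and keep are hypothetical
names), not the package's internal code: it shows rowsum() failing when the
values are shortened but the grouping vector is not, and working once both
are subset together.

```r
x      <- c(2, NA, 4, 6, NA, 8)   # hypothetical outcome with missing values
strata <- c(1, 1, 1, 2, 2, 2)     # one stratum label per observation

keep <- !is.na(x)

# Mismatch: x is shortened to 4 values, strata still has 6, so rowsum()
# stops with an error; the message is captured instead of aborting.
res <- tryCatch(rowsum(x[keep], strata),
                error = function(e) conditionMessage(e))
print(res)

# Consistent: drop the same rows from both vectors.
totals <- rowsum(x[keep], strata[keep])
print(totals)
```

Passing a formula lets the function take the variable from the design
object itself, so the rows removed for missing values and the design
information are subset together.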

> Error 3).  I then tried svymean on another variable with na.rm=FALSE.  I got
> the following error:
>
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
>
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to because the arguments
> to rowsum and the rval object do not appear to have a names attribute.  Does
> anyone know what the problem here might be?

This might be the same problem, in which case
    svymean(~age, design=nhis.design)
should work.  You should also make sure you have version 1.0 of `survey'
rather than any of the 0.9-x versions that went up briefly on CRAN.

If you tell me where to find the NHIS data I will look at them. There
shouldn't be any special requirements on the format (other than using
nest=TRUE if PSUs don't have globally unique ids).  I've looked at data
from some NCHS studies that are used as examples by Stata, and I don't
have any of these problems.

Incidentally, you should try writing to the package maintainer first,
rather than the list. In this case it doesn't matter, since I read the
list frequently, but it might in other cases.

	-thomas