[R] Various Errors using Survey Package

Thu Feb 13 14:13:03 CET 2003

Dr. Lumley,

Thanks for your response.  I want to point out that I did try using the
nest=TRUE option earlier and got the same error with svydesign.  I checked
and I was using version 0.9-1.  I have updated this to version 1.0 and I am
no longer getting an error.  

Your other suggestions work too, of course.  Still, if you are interested
in looking at the NHIS data, it is available at:

http://www.cdc.gov/nchs/nhis.htm 

Thanks again for your help.  I will first e-mail the package maintainer
directly in the future.

-Trevor

-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Wednesday, February 12, 2003 8:49 PM
To: Thompson, Trevor
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Various Errors using Survey Package

On Wed, 12 Feb 2003, Thompson, Trevor wrote:

> Hi,
>
> I have been experimenting with the new Survey package.  Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
>
> Error 1):  The first error I get is when I try to specify the complex survey
> design.
>
> nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
> check.strata=TRUE)
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data =
> nhis.df,  :
>         Clusters not nested in strata
>
> My data are sorted by strata, psu.  Can someone tell me what the structure
> has to be for a stratified sample with clustering?  Looking at the code, it
> appears to me that it does not allow more than 1 observation per psu [i.e.
> any(sc > 1)].

  The problem is probably that your id numbers for PSU start up again in
each stratum (e.g. you have a PSU numbered 1 in each stratum).  If so, you
need the nest=TRUE option to tell svydesign() that all the PSUs numbered 1
in different strata are really different PSUs.
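
As a rough illustration in base R (toy data with made-up values; this is
not the package's internal code), PSU ids that restart in each stratum can
be detected, and made globally unique, along these lines. The names toy,
strata_per_psu and psu_unique are hypothetical.

```r
# Toy data (hypothetical): PSU ids restart at 1 within each stratum.
toy <- data.frame(
  strata = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu    = c(1, 1, 2, 2, 1, 1, 2, 2)
)

# For each PSU id, count the number of distinct strata it appears in.
# A count above 1 means the ids are not unique across strata.
strata_per_psu <- tapply(toy$strata, toy$psu,
                         function(s) length(unique(s)))
print(any(strata_per_psu > 1))

# Combine stratum and PSU labels to get ids that are unique across strata.
toy$psu_unique <- interaction(toy$strata, toy$psu, drop = TRUE)
strata_per_unique <- tapply(toy$strata, toy$psu_unique,
                            function(s) length(unique(s)))
print(all(strata_per_unique == 1))
```

With ids like toy$psu, the design needs nest=TRUE; with ids like
toy$psu_unique, each PSU already belongs to exactly one stratum.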

> Error 2).  If I go ahead and specify check.strata=FALSE, then svydesign runs
> ok.  I then tried using the svymean function.  In the following example, if
> I specify na.rm=TRUE, I get the error below:

No, it doesn't run ok; it just doesn't report an error.

> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
>
> I traced this to the svyCprod call within svymean.   svyCprod calls rowsum
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.

With missing data you do need to use the data stored in the design object,
not a separate data frame, otherwise it will get confused. That is, you want
  svymean(~crc10yr, design=nhis.design, na.rm=TRUE)
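
The kind of length mismatch described above can be reproduced with base R's
rowsum() alone. This is only a sketch on made-up vectors (x, strata and keep
are hypothetical names), not what svymean() does internally.

```r
# Toy vectors (hypothetical): x has missing values, strata has none.
x      <- c(1, NA, 3, 4, NA, 6)
strata <- c(1, 1, 1, 2, 2, 2)
keep   <- !is.na(x)

# Dropping NAs from x but not from strata leaves the lengths out of step,
# so rowsum() stops with an error; capture its message instead of failing.
res <- tryCatch(rowsum(x[keep], strata),
                error = function(e) conditionMessage(e))
print(res)

# Subsetting both vectors with the same index keeps them aligned.
ok <- rowsum(x[keep], strata[keep])
print(ok)
```

Keeping the variable and the design information in one object, and
subsetting them together, avoids this kind of misalignment.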

> Error 3).  I then tried svymean on another variable with na.rm=FALSE.  I got
> the following error:
>
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
>
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to because the arguments
> to rowsum and the rval object do not appear to have a names attribute.  Does
> anyone know what the problem here might be?

This might be the same problem, in which case
    svymean(~age, design=nhis.design)
should work.  You should also make sure you have version 1.0 of `survey'
rather than any of the 0.9-x versions that went up briefly on CRAN.

If you tell me where to find the NHIS data I will look at them. There
shouldn't be any special requirements on the format (other than using
nest=TRUE if PSUs don't have globally unique ids).  I've looked at data
from some NCHS studies that are used as examples by Stata, and I don't
have any of these problems.

Incidentally, you should try writing to the package maintainer first,
rather than the list. In this case it doesn't matter, since I read the
list frequently, but it might in other cases.

	-thomas