[R] Various Errors using Survey Package
tlumley at u.washington.edu
Thu Feb 13 02:50:03 CET 2003
On Wed, 12 Feb 2003, Thompson, Trevor wrote:
> I have been experimenting with the new Survey package. Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
> Error 1): The first error I get is when I try to specify the complex survey
> nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data =
> nhis.df, :
> Clusters not nested in strata
> My data are sorted by strata, psu. Can someone tell me what the structure
> has to be for a stratified sample with clustering? Looking at the code, it
> appears to me that it does not allow more than 1 observation per psu [i.e.
> any(sc > 1)].
The problem is probably that your id numbers for PSU start up again in
each stratum (e.g. you have a PSU numbered 1 in each stratum). If so, you
need the nest=TRUE option to tell svydesign() that all the PSUs numbered 1
in different strata are really different PSUs.
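
A minimal sketch of the nesting issue described above. The data frame `toy` and its values are hypothetical and made up for illustration, not taken from NHIS; the column names simply mirror the ones in the question. The base-R check only illustrates what "not nested" means and is not necessarily how svydesign() tests it internally.

```r
library(survey)

# Hypothetical toy data: two strata, and the PSU ids restart at 1 in each stratum.
toy <- data.frame(
  strata = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu    = c(1, 1, 2, 2, 1, 1, 2, 2),
  probs  = rep(0.1, 8),
  y      = c(3, 4, 5, 6, 7, 8, 9, 10)
)

# Illustration only: count how many distinct strata each PSU id appears in.
# A count above 1 means the ids are not globally unique.
strata.per.psu <- tapply(toy$strata, toy$psu, function(s) length(unique(s)))
ids.reused <- any(strata.per.psu > 1)

# Without nest=TRUE the reused ids are expected to trigger the nesting error,
# so the call is wrapped in try() to keep the script running.
no.nest <- try(
  svydesign(ids = ~psu, strata = ~strata, probs = ~probs, data = toy),
  silent = TRUE
)

# With nest=TRUE, PSU 1 in stratum 1 and PSU 1 in stratum 2 are treated
# as different PSUs.
toy.design <- svydesign(ids = ~psu, strata = ~strata, probs = ~probs,
                        data = toy, nest = TRUE)
```

If the PSU ids were already unique across strata, nest=TRUE would not be needed.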
> Error 2). If I go ahead and specify check.strata=FALSE, then svydesign runs
> ok. I then tried using the svymean function. In the following example, if
> I specify na.rm=TRUE, I get the error below:
No, it doesn't run OK; it just doesn't report an error.
> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
> I traced this to the svyCprod call within svymean. SvyCprod calls rowsum
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.
With missing data you do need to use the data stored in the design object,
not a separate data frame, otherwise it will get confused. That is, you want
svymean(~crc10yr, design=nhis.design, na.rm=TRUE)
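
A minimal sketch of the formula interface with missing values. It assumes a hypothetical toy data set (not NHIS) with equal sampling probabilities; the variable `y` stands in for crc10yr.

```r
library(survey)

# Hypothetical toy data with two missing values in y.
toy <- data.frame(
  strata = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu    = c(1, 1, 2, 2, 1, 1, 2, 2),
  probs  = rep(0.1, 8),
  y      = c(3, NA, 5, 6, 7, 8, NA, 10)
)

toy.design <- svydesign(ids = ~psu, strata = ~strata, probs = ~probs,
                        data = toy, nest = TRUE)

# Refer to the variable by formula so that svymean() takes it from the data
# stored in the design object. The missing rows can then be dropped from the
# variable and the design information together.
y.mean <- svymean(~y, design = toy.design, na.rm = TRUE)
print(y.mean)
```

Passing `toy$y` instead of `~y` would hand svymean() a bare vector with no link to the rows of the design, which is the situation the reply above warns about.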
> Error 3). I then tried svymean on another variable with na.rm=FALSE. I got
> the following error:
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to because the arguments
> to rowsum and the rval object do not appear to have a names attribute. Does
> anyone know what the problem here might be?
This might be the same problem, in which case the same formula-style call
(with ~age in place of ~crc10yr) should work. You should also make sure you
have version 1.0 of `survey' rather than any of the 0.9-x versions that went
up briefly on CRAN.
If you tell me where to find the NHIS data I will look at them. There
shouldn't be any special requirements on the format (other than using
nest=TRUE if PSUs don't have globally unique ids). I've looked at data
from some NCHS studies that are used as examples by Stata, and I don't
have any of these problems.
Incidentally, you should try writing to the package maintainer first,
rather than the list. In this case it doesn't matter, since I read the
list frequently, but it might in other cases.