[R] Various Errors using Survey Package
Thompson, Trevor
tkt2 at cdc.gov
Thu Feb 13 14:13:03 CET 2003
Dr. Lumley,
Thanks for your response. I want to point out that I did try using the
nest=TRUE option earlier and got the same error with svydesign. I checked
and I was using version 0.9-1. I have updated this to version 1.0 and I am
no longer getting an error.
Your other suggestions work too of course. Still, if you are interested
in looking at the NHIS data, it is available at:
http://www.cdc.gov/nchs/nhis.htm
Thanks again for your help. I will first e-mail the package maintainer
directly in the future.
-Trevor
-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Wednesday, February 12, 2003 8:49 PM
To: Thompson, Trevor
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Various Errors using Survey Package
On Wed, 12 Feb 2003, Thompson, Trevor wrote:
> Hi,
>
> I have been experimenting with the new Survey package. Specifically, I was
> trying to use some of the functions on the public-use survey data from NHIS
> (2000 Sample Adult file).
>
> Error 1): The first error I get is when I try to specify the complex survey
> design.
>
> nhis.design<-svydesign(ids=~psu, probs=~probs, strata=~strata, data=nhis.df,
>                        check.strata=TRUE)
> Error in svydesign(ids = ~psu, probs = ~probs, strata = ~strata, data =
> nhis.df, :
> Clusters not nested in strata
>
> My data are sorted by strata, psu. Can someone tell me what the structure
> has to be for a stratified sample with clustering? Looking at the code, it
> appears to me that it does not allow more than 1 observation per psu [i.e.
> any(sc > 1)].
The problem is probably that your id numbers for PSU start up again in
each stratum (e.g. you have a PSU numbered 1 in each stratum). If so, you
need the nest=TRUE option to tell svydesign() that all the PSUs numbered 1
in different strata are really different PSUs.
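
The PSU numbering issue described above can be checked before calling
svydesign(). The following is a minimal sketch in base R, not code from the
survey package; the toy data frame and the names `toy`, `strata_per_psu`,
`ids_reused` and `psu_unique` are made up for illustration.

```r
# Hypothetical toy data: PSU ids restart at 1 within each stratum.
toy <- data.frame(
  strata = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu    = c(1, 1, 2, 2, 1, 1, 2, 2)
)

# For each PSU id, count the number of distinct strata it appears in.
strata_per_psu <- tapply(toy$strata, toy$psu,
                         function(s) length(unique(s)))

# TRUE when at least one PSU id is shared by several strata, i.e. the
# ids are not globally unique.
ids_reused <- any(strata_per_psu > 1)

# One way to picture "really different PSUs": combine the stratum and
# PSU labels into a single id that is unique across the whole sample.
toy$psu_unique <- interaction(toy$strata, toy$psu, drop = TRUE)
n_unique_psu <- nlevels(toy$psu_unique)
```

If `ids_reused` is TRUE for the real data, that would be consistent with the
"Clusters not nested in strata" message, and nest=TRUE is the option to try.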
> Error 2). If I go ahead and specify check.strata=FALSE, then svydesign runs
> ok. I then tried using the svymean function. In the following example, if
> I specify na.rm=TRUE, I get the error below:
No, it doesn't run ok; it just doesn't report an error.
> > svymean(nhis.df$crc10yr, design=nhis.design, na.rm=TRUE)
> Error in rowsum.default(x, strata) : Incorrect length for 'group'
>
> I traced this to the svyCprod call within svymean. svyCprod calls rowsum
> and the group argument ("strata") appears to be the full length of that
> column rather than the subset with non-missing data.
With missing data you do need to use the data stored in the design object,
not a separate data frame, otherwise it will get confused. That is, you want
svymean(~crc10yr, design=nhis.design, na.rm=TRUE)
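
The length mismatch reported in the question can be reproduced with rowsum()
alone. This is a minimal base R sketch, not the internals of svymean() or
svyCprod(); the vectors `x` and `strata` are made-up values.

```r
# Hypothetical values: one observation is missing.
x      <- c(1, NA, 3, 4)
strata <- c(1, 1, 2, 2)
keep   <- !is.na(x)

# Dropping the NA from x but not from the grouping vector leaves the two
# with different lengths, so rowsum() stops with an error. tryCatch()
# captures the message instead of halting the script.
mismatch <- tryCatch(rowsum(x[keep], strata),
                     error = function(e) conditionMessage(e))

# Subsetting both vectors with the same index keeps them aligned.
ok <- rowsum(x[keep], strata[keep])
```

Passing a formula such as ~crc10yr presumably lets the variable and the
strata be taken from the same design object, so that they are subset
together when na.rm=TRUE.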
> Error 3). I then tried svymean on another variable with na.rm=FALSE. I got
> the following error:
>
> > svymean(nhis.df$age, design=nhis.design)
> Error in drop(rval) : names attribute must be the same length as the vector
>
> I also traced this error to a call to rowsum within the function svyCprod.
> I'm not sure what names attribute this is referring to because the arguments
> to rowsum and the rval object do not appear to have a names attribute. Does
> anyone know what the problem here might be?
This might be the same problem, in which case
svymean(~age, design=nhis.design)
should work. You should also make sure you have version 1.0 of `survey'
rather than any of the 0.9-x versions that went up briefly on CRAN.
If you tell me where to find the NHIS data I will look at them. There
shouldn't be any special requirements on the format (other than using
nest=TRUE if PSUs don't have globally unique ids). I've looked at data
from some NCHS studies that are used as examples by Stata, and I don't
have any of these problems.
Incidentally, you should try writing to the package maintainer first,
rather than the list. In this case it doesn't matter, since I read the
list frequently, but it might in other cases.
-thomas