[R] calibrate function
Thomas Lumley
tlumley at u.washington.edu
Mon Jan 12 11:50:14 CET 2009
On Sun, 11 Jan 2009, Elsa et Stéphane BOUEE wrote:
> Hi all,
>
> I have a question on the package « survey ».
Please don't send questions to the list and to me separately. Either one is ok, but not both.
> I am having difficulty using the function calibrate. Although it works
> well with a single factor variable, I cannot use it with two, and I get the
> message
>
> Error in regcalibrate.survey.design2(design, formula, population,
> aggregate.stage = aggregate.stage, : Population and sample totals are not
> the same length.
>
> Here is the format I use as a data.frame:
<snip>
> My program is:
>
> grap<-svydesign(id=~1, data=ecodiaMG)
>
> regMG <-c(region1MG_NE =852, region1MG_NO=662, region1MG_P=636,
> region1MG_SE=961, region1MG_SO=545)
>
> sexMG <-c(sexe1MG_F =976, sexe1MG_H=2680)
>
> ageMG <-c(age_cl1MG_40 =380, age_cl1MG_4054=2099, age_cl1MG_54=1177)
>
> grap2<- calibrate(grap, formula= ~ age_cl1-1, c(ageMG))
>
> grap3<- calibrate(grap2, formula= ~ sexe1-1, c(sexMG))
>
> grap4<- calibrate(grap3, formula= ~region1-1, c(regMG))
>
> I can calibrate the variables one by one, which is wrong, so I would like to
> do it all at once:
>
> grap2<- calibrate(grap, formula= ~ age_cl1+ sexe1+ regMG -1, c(ageMG, sexMG,
> regMG ))
>
You need to drop the first level of sexe1 and region1 from the population totals (I assume you mean region1, not regMG, in the formula argument).
grap2<- calibrate(grap, formula= ~ age_cl1+ sexe1+ region1 -1,
c(ageMG, sexMG[-1], regMG[-1] ))
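A minimal sketch of the same pattern on invented data may help. It assumes the survey package is installed. The data frame `toy`, its weights, and the population totals are all hypothetical and are not the poster's ecodiaMG data. With the default linear calibration, the total for the dropped level (sexe1 F) is not supplied directly; it follows from the age totals and the total for H.

```r
library(survey)  # assumed to be installed

# Invented sample: 12 units, each with a starting weight of 10.
toy <- data.frame(
  age_cl1 = factor(rep(c("40", "4054", "54"), times = 4)),
  sexe1   = factor(rep(c("F", "H"), each = 6)),
  w       = rep(10, 12)
)
des <- svydesign(id = ~1, weights = ~w, data = toy)

# Invented population totals; both margins sum to 130.
age_tot <- c(age_cl140 = 50, age_cl14054 = 45, age_cl154 = 35)
sex_tot <- c(sexe1F = 60, sexe1H = 70)

# All levels of the first factor, all but the first level of the second.
cal <- calibrate(des, formula = ~ age_cl1 + sexe1 - 1,
                 population = c(age_tot, sex_tot[-1]))

# Calibrated weight totals by sex; F is implied rather than supplied.
sex_cal <- tapply(weights(cal), toy$sexe1, sum)
sex_cal
```

Passing `c(age_tot, sex_tot)` instead would give a vector of length 5 against a model matrix with 4 columns, which is the situation that produces the length error quoted above.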
The population totals for calibrate() are the column totals for the regression model matrix specified by the formula. With the default settings, when you have a single factor
~region1
the model matrix has an intercept column and then one column for each level of the factor except the first. Using the -1 notation
~region1 -1
removes the intercept and so requires a column for each level of the factor.
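The two codings can be checked directly with model.matrix() from base R. This is only a sketch: the factor below is a made-up stand-in for region1, with invented values.

```r
# Made-up factor standing in for region1; levels set explicitly.
region1 <- factor(c("NE", "NO", "P", "SE", "SO", "NE", "P"),
                  levels = c("NE", "NO", "P", "SE", "SO"))

# Default coding: an intercept plus one column per level except the first.
with_intercept <- model.matrix(~ region1)
colnames(with_intercept)

# With -1: no intercept, so every level gets its own column.
no_intercept <- model.matrix(~ region1 - 1)
colnames(no_intercept)
```

Both matrices have five columns, but only the second has a column for the first level (NE), so only the second lines up with a vector of totals for all five regions.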
When you have two or more factors and no intercept, the first factor is coded with a column for each of its levels. Those columns sum to a constant column, so the model effectively includes an intercept, and every remaining factor is coded with a column for each level except the first. That is why the population vector must omit the first level of sexe1 and region1.
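As a rough check of the column count, the sketch below builds the model matrix for three factors without an intercept and compares it with the population totals from the original post. The six rows of data are invented, and the factor levels (MG_40, MG_F, MG_NE, and so on) are an assumption guessed from the names of the totals.

```r
# Population totals as given in the original post.
ageMG <- c(age_cl1MG_40 = 380, age_cl1MG_4054 = 2099, age_cl1MG_54 = 1177)
sexMG <- c(sexe1MG_F = 976, sexe1MG_H = 2680)
regMG <- c(region1MG_NE = 852, region1MG_NO = 662, region1MG_P = 636,
           region1MG_SE = 961, region1MG_SO = 545)

# Invented rows; the level names are assumed, not taken from the real data.
toy <- data.frame(
  age_cl1 = factor(c("MG_40", "MG_4054", "MG_54", "MG_4054", "MG_54", "MG_40"),
                   levels = c("MG_40", "MG_4054", "MG_54")),
  sexe1   = factor(c("MG_F", "MG_H", "MG_H", "MG_F", "MG_H", "MG_H"),
                   levels = c("MG_F", "MG_H")),
  region1 = factor(c("MG_NE", "MG_NO", "MG_P", "MG_SE", "MG_SO", "MG_NE"),
                   levels = c("MG_NE", "MG_NO", "MG_P", "MG_SE", "MG_SO"))
)

mm <- model.matrix(~ age_cl1 + sexe1 + region1 - 1, data = toy)
colnames(mm)

# The age columns add up to 1 in every row, i.e. an implicit intercept.
age_cols <- rowSums(mm[, grep("^age_cl1", colnames(mm)), drop = FALSE])

too_long <- c(ageMG, sexMG, regMG)            # length 10: triggers the error
pop      <- c(ageMG, sexMG[-1], regMG[-1])    # length 8: matches ncol(mm)
c(ncol(mm), length(too_long), length(pop))
```

Under the assumed level names, the names of `pop` also agree with the column names of the model matrix, which is a convenient way to confirm that the totals are in the right order.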
One way to be sure which model matrix corresponds to a formula is to use the formula in a regression model, e.g. with svyglm(), and see which coefficients appear.
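A sketch of that check, assuming the survey package is installed. The data frame, the response `y`, and the weights are invented purely so that the model can be fitted; only the coefficient names matter here.

```r
library(survey)  # assumed to be installed

set.seed(1)
# Invented data: the response y is random noise, used only to fit a model.
toy <- data.frame(
  y       = rnorm(12),
  age_cl1 = factor(rep(c("40", "4054", "54"), times = 4)),
  sexe1   = factor(rep(c("F", "H"), each = 6)),
  w       = rep(1, 12)
)
des <- svydesign(id = ~1, weights = ~w, data = toy)

# Same right-hand side as would be passed to calibrate().
fit <- svyglm(y ~ age_cl1 + sexe1 - 1, design = des)
coef_names <- names(coef(fit))
coef_names
```

The coefficient names list every level of age_cl1 but only the second level of sexe1, which is the set of columns the population totals have to cover.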
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle