[R] calibrate function

Thomas Lumley tlumley at u.washington.edu
Mon Jan 12 11:50:14 CET 2009


On Sun, 11 Jan 2009, Elsa et Stéphane BOUEE wrote:

> Hi all,
>
> I have a question about the package "survey".

Please don't send questions to the list and to me separately. Either one is ok, but not both.

> I am having difficulty using the function ‘calibrate’. Although it works
> well with a single factor variable, I cannot use it with two, and I get the
> message
>
> “Error in regcalibrate.survey.design2(design, formula, population,
> aggregate.stage = aggregate.stage,  :   Population and sample totals are not
> the same length.”
>
> Here is the format I use as a data.frame:
<snip>
> My program is:
>
> grap<-svydesign(id=~1, data=ecodiaMG)
>
> regMG <-c(region1MG_NE =852, region1MG_NO=662, region1MG_P=636,
> region1MG_SE=961, region1MG_SO=545)
>
> sexMG <-c(sexe1MG_F =976, sexe1MG_H=2680)
>
> ageMG <-c(age_cl1MG_40 =380, age_cl1MG_4054=2099, age_cl1MG_54=1177)
>
> grap2<- calibrate(grap, formula= ~ age_cl1-1, c(ageMG))
>
> grap3<- calibrate(grap2, formula= ~ sexe1-1, c(sexMG))
>
> grap4<- calibrate(grap3, formula= ~region1-1, c(regMG))
>
> I can calibrate the variables one by one, which is wrong, so I would like to
> do it all at once:
>
> grap2<- calibrate(grap, formula= ~ age_cl1+ sexe1+ regMG -1, c(ageMG, sexMG,
> regMG ))
>

You need to drop the totals for the first level of sexe1 and region1 (I assume you mean region1, not regMG, in the formula argument).

grap2<- calibrate(grap, formula= ~ age_cl1+ sexe1+ region1 -1,
     c(ageMG, sexMG[-1], regMG[-1] ))


The population totals for calibrate() are the column totals for the regression model matrix specified by the formula. With the default settings, when you have a single factor
   ~region1
the model matrix has an intercept column and then a column for each level of the factor except the first.  Using the -1 notation
   ~region1 -1
removes the intercept and so requires a column for each level of the factor.
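
A minimal sketch of the difference, using base R's model.matrix() rather than the survey package. The data frame `toy` and its level names are made up for illustration and are not the poster's data:

```r
# Hypothetical toy data: one factor with three levels (NE, NO, P)
toy <- data.frame(region1 = factor(c("NE", "NO", "P", "NE", "P")))

# Default coding: an intercept plus one column per level except the first
mm_int <- model.matrix(~ region1, data = toy)
colnames(mm_int)

# With -1: no intercept, one column per level
mm_noint <- model.matrix(~ region1 - 1, data = toy)
colnames(mm_noint)

# The column totals of the no-intercept matrix are the per-level counts,
# which is what the population totals must line up with
colSums(mm_noint)
```

In the first case the population vector would need a total for the intercept (the population size) followed by totals for every level but the first; in the second it needs one total per level.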

When you have two or more factors and no intercept, the first factor is coded with a column for each of its levels. The sum of these columns is a constant column, so the model effectively includes an intercept, and all remaining factors are coded with a column for each level except the first.
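
This can be seen directly with model.matrix() from base R. The sketch below uses a small invented data frame (`toy`, with shortened level names), so only the pattern of columns is meaningful:

```r
# Hypothetical toy data with three factors
toy <- data.frame(
  age_cl1 = factor(c("40", "4054", "54", "4054", "54", "40")),
  sexe1   = factor(c("F", "H", "H", "F", "H", "F")),
  region1 = factor(c("NE", "NO", "P", "NE", "NO", "P"))
)

# No intercept: only the first factor keeps a column for every level
mm <- model.matrix(~ age_cl1 + sexe1 + region1 - 1, data = toy)
colnames(mm)

# The age_cl1 indicator columns add up to 1 in every row,
# so together they act as an implicit intercept
age_cols <- grep("^age_cl1", colnames(mm))
rowSums(mm[, age_cols])
```

The matrix has 3 + 1 + 2 = 6 columns: all three age classes, then sexe1 and region1 without their first levels.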

One way to be sure which model matrix corresponds to the formula is to use the formula in a regression model, e.g. with svyglm(), and see which coefficients appear.
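
A rough sketch of that check, with two assumptions made loudly: base R's lm() stands in for svyglm() so that the example runs without the survey package, and `toy` is an invented data frame whose level names are guessed from the names in the population vectors above. The response `y` is arbitrary and only there so a model can be fitted:

```r
# Hypothetical stand-in for the real data: one row per combination of levels
toy <- expand.grid(
  age_cl1 = c("MG_40", "MG_4054", "MG_54"),
  sexe1   = c("MG_F", "MG_H"),
  region1 = c("MG_NE", "MG_NO", "MG_P", "MG_SE", "MG_SO")
)
toy$y <- seq_len(nrow(toy))  # arbitrary response, used only to fit a model

# Population totals as given in the question
ageMG <- c(age_cl1MG_40 = 380, age_cl1MG_4054 = 2099, age_cl1MG_54 = 1177)
sexMG <- c(sexe1MG_F = 976, sexe1MG_H = 2680)
regMG <- c(region1MG_NE = 852, region1MG_NO = 662, region1MG_P = 636,
           region1MG_SE = 961, region1MG_SO = 545)

# Drop the first level of every factor after the first one
pop <- c(ageMG, sexMG[-1], regMG[-1])

# Coefficient names from a regression on the same formula
# (lm() here is an assumption standing in for svyglm())
fit <- lm(y ~ age_cl1 + sexe1 + region1 - 1, data = toy)
names(coef(fit))

# The population vector should match those names in length and order
length(pop) == length(coef(fit))
identical(names(pop), names(coef(fit)))
```

If the lengths differ, calibrate() reports that population and sample totals are not the same length, as in the original error message.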

          -thomas


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



