[R] Survey Design / Rake questions

Thomas Lumley tlumley at u.washington.edu
Fri Aug 29 19:23:16 CEST 2008


On Thu, 28 Aug 2008, Farley, Robert wrote:

> I'm feeling like I just don't get it.  My attempt at rake now fails
> with:
> Error in postStratify.survey.design(design, strata[[i]],
> population.margins[[i]],  :
>  Stratifying variables don't match

Ah. Now we have an easy one to fix.  This means that the names of the 
variables don't match, which they don't, because the variable names in the 
formula are lineon and NumStn and the variable names in the population 
tables are StnName and StnTraveld.  You just need to rename the variables 
in the population tables.
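
A minimal sketch of that renaming, assuming the population tables are built exactly as in the listing quoted below (ByEBOn with a StnName column, ByEBNum with a StnTraveld column). It uses only base R. The rake() call is left as a comment because it needs the survey package and the EBDesign object from the original session.

```r
# Population margins, built as in the quoted listing.
StnName <- as.factor(c("Warner Center", "De Soto", "Pierce College", "Tampa",
                       "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys",
                       "Woodman", "Valley College", "Laurel Canyon",
                       "North Hollywood"))
EBOnNewTots <- c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100,
                 123.65, 0)
ByEBOn <- data.frame(StnName, Freq = EBOnNewTots)

StnTraveld <- as.factor(1:12)
EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)
ByEBNum <- data.frame(StnTraveld, Freq = EBNumStn)

# Rename the stratifying columns so they match the variables in the
# formulas ~lineon and ~NumStn.
names(ByEBOn)[names(ByEBOn) == "StnName"] <- "lineon"
names(ByEBNum)[names(ByEBNum) == "StnTraveld"] <- "NumStn"

# With the names aligned, the original call can be retried unchanged
# (requires library(survey) and EBDesign from the listing):
# RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum))
```

Equivalently, the tables could be created with the right names from the start, e.g. data.frame(lineon=StnName, Freq=EBOnNewTots).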

 	-thomas


> The factors in the data frame look fine.  Should I have the same
> structure in the design?
>> str(EBDesign$lineon)
> NULL
>> str(EBSurvey$lineon)
> Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
>> str(ByEBOn$StnName)
> Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
>> all(levels(EBSurvey$lineon)==StnName)
> [1] TRUE
>> #
>> str(EBDesign$NumStn)
> NULL
>> str(EBSurvey$NumStn)
> Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
>> str(ByEBNum$StnTraveld)
> Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>> all(levels(EBSurvey$NumStn)==StnTraveld)
> [1] TRUE
>
> A complete listing is below:
> **************************************************
> **************************************************
> **************************************************
>> sessionInfo()        # List loaded packages
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] graphics  grDevices utils     datasets  stats     methods   base
>
>
> other attached packages:
> [1] survey_3.8-1   fortunes_1.3-5 moonsun_0.1    prettyR_1.3-2
> foreign_0.8-29
>> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav",
> use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
>> #===============================================================================
>> temp <- sub(' +$', '', SurveyData$direction_)
>> SurveyData$direction_ <- temp
>> #===============================================================================
>> # Calc. # stations traversed from StnOn/StnOff
>> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
>> #################################################### Kludge
>> mean(SurveyData$NumStn)
> [1] 6.785276
>> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
>> mean(SurveyData$NumStn)
> [1] 6.789877
>> ####################################################
>> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
>> #===============================================================================
>> # Adjust one direction at a time.  Start W/ EB {learn subsetting later}
>> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
>> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
>> #===============================================================================
>> # New Marginals {start w/ 2 dimensions: StnOn X Distance}
>> StnName <- as.factor(c( "Warner Center", "De Soto", "Pierce College",
> "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys",
> "Woodman", "Valley College", "Laurel Canyon", "North Hollywood"))
>> EBOnNewTots       <- c(            1000,       600,             1200,
> 500,     1000,      500,       200,         250,       1000,       300,
> 100,          123.65,                0 )
>> ByEBOn  <- data.frame(StnName, Freq=EBOnNewTots)
>> #
>> StnTraveld <- as.factor(1:12)
>> EBNumStn   <- c(673.65,     800, 1000, 1000,  800,  700,  600, 500,
> 400, 200,  50, 50 )
>> ByEBNum    <- data.frame(StnTraveld, Freq=EBNumStn)
>> #
>> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn,
> ByEBNum) )
> Error in postStratify.survey.design(design, strata[[i]],
> population.margins[[i]],  :
>  Stratifying variables don't match
>> #
>> str(EBDesign$lineon)
> NULL
>> str(EBSurvey$lineon)
> Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
>> str(ByEBOn$StnName)
> Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
>> all(levels(EBSurvey$lineon)==StnName)
> [1] TRUE
>> #
>> str(EBDesign$NumStn)
> NULL
>> str(EBSurvey$NumStn)
> Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
>> str(ByEBNum$StnTraveld)
> Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>> all(levels(EBSurvey$NumStn)==StnTraveld)
> [1] TRUE
>> #
> **************************************************
> **************************************************
> **************************************************
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Thursday, August 28, 2008 11:43
> To: Farley, Robert
> Cc: r-help at r-project.org
> Subject: Re: [R] Survey Design / Rake questions
>
> On Mon, 25 Aug 2008, Farley, Robert wrote:
>
>> I see a number of things that bother me.
>>  1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ..."
>>         Even though "StnTraveld  <- c(as.factor(1:12))"
>
> You don't want the c()
>> a<-as.factor(1:12)
>> str(a)
>  Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>> str(c(a))
>  int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
>
> As the help for c() says  "all attributes except names are removed.",
> which includes the factor levels.
>
>>  2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the
>> data.  Where would they have come from?
>
> No, that's just R printing things in columns
>> a<-factor(1:12, labels=c(1:11,"antidisestablishmentarianism"))
>> a
>  [1] 1                            2
>  [3] 3                            4
>  [5] 5                            6
>  [7] 7                            8
>  [9] 9                            10
> [11] 11                           antidisestablishmentarianism
> Levels: 1 2 3 4 5 6 7 8 9 10 11 antidisestablishmentarianism
>
>
>>  3) I'd like to verify that the order (value) of "EBSurvey$lineon"
>> matches my definition in "StnName"
>
> all(levels(EBSurvey$lineon)==StnName)
>
> 	-thomas
>
>
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


